First normal form
- Last Updated: March 30, 2020
- 2 minute read
- OpenEdge
- Version 12.2
- Documentation
The first rule of normalization is that you must remove duplicate columns or columns that contain more than one value to a new table. The columns of a table in the first normal form have these characteristics:
- They contain only one value
- They occur once and do not repeat
First, examine an un-normalized Customer table, as shown in the following figure.
| Cust Num | Name | Street | Order Number |
|---|---|---|---|
| 101 | Jones, Sue | 2 Mill Ave. | M31, M98, M129 |
| 102 | Hand, Jim | 12 Dudley St. | M56 |
| 103 | Lee, Sandy | 45 School St. | M37, M40 |
| 104 | Tan, Steve | 67 Main St. | M41 |
Here, the Order Number column has more than one entry. This makes it very difficult to perform even the simplest tasks, such as deleting an order, finding the total number of orders for a customer, or printing orders in sorted order. To perform any of those tasks, you need a complex algorithm to examine each value in the Order Number column for each row. You can eliminate the complexity by updating the table so that each column in a table consists of exactly one value.
The following figure shows the same Customer table in a different un-normalized format which contains only one value per column.
| Cust Num | Name | Street | Order Number1 | Order Number2 | Order Number3 |
| 101 | Jones, Sue | 2 Mill Ave. | M31 | M98 | M129 |
| 102 | Hand, Jim | 12 Dudley St. | M56 | Null | Null |
| 103 | Lee, Sandy | 45 School St. | M37 | M140 | Null |
| 104 | Tan, Steve | 67 Main St. | M41 | Null | Null |
Here, instead of a single Order Number column, there are three separate but duplicate columns for multiple orders. This format is also not efficient. What happens if a customer has more than three orders? You must either add a new column or clear an existing column value to make a new entry. It is difficult to estimate a reasonable maximum number of orders for a customer. If your business is brisk, you might have to create 200 Order Number columns for a row. But if a customer has only 10 orders, the database will contain 190 null values for this customer.
Furthermore, it is difficult and time consuming to retrieve data with repeating columns. For example, to determine which customer has Order Number M98, you must look at each Order Number column individually (all 200 of them) in every row to find a match.
To reduce the Customer table to the first normal form, split it into two smaller tables, one table to store only Customer information and another to store only Order information. Table 3 shows the normalized Customer table, and Table 4 shows the new Order table.
| Cust Num (Primary key) | Name | Street |
|---|---|---|
| 101 | Jones, Sue | 2 Mill Ave. |
| 102 | Hand, Jim | 12 Dudley St. |
| 103 | Lee, Sandy | 45 School St. |
| 104 | Tan, Steve | 67 Main St. |
| Order Number (Primary key) | Cust Num (Foreign key) |
|---|---|
| M31 | 101 |
| M98 | 101 |
| M129 | 101 |
| M56 | 102 |
| M37 | 103 |
| M140 | 103 |
| M41 | 104 |
There is now only one instance of a column in the Customer and Order tables, and each column contains exactly one value. The Cust Num column in the Order table relates to the Cust Num column in the Customer table.
A table that is normalized to the first normal form has these advantages:
- It allows you to create any number of orders for each customer without having to add new columns.
- It allows you to query and sort data for orders very quickly because you search only one column—Order Number.
- It uses disk space more efficiently because no empty columns are stored.