Good RDB Design with the Concept of Normal Forms
- Liu, An-Chi 劉安齊
What is a good design for a relational database (RDB)?
You might have seen an excel data sheet filled with lots of columns. I can imagine that kind of excel sheet is very messy. A well-designed database should design and create a model, and then implement the model into the database.
For RDB, we use the ER model. However, we might still design a model with deflects. For example, a table combines student information and department information.
In this case,
DNUM is followed by
D_HEAD are followed by
DNUM. That’s dangerous because it might cause anomalies while inserting, deleting, or modifying a tuple of data.
This is an example of bad designs. So, I am going to introduce to 5 concepts of normal forms.
When we create an RDB, we should not only base on the ER model but also need to consider that if the database is normal forms. Then we will have a good design database.
The concept of normal forms was first proposed by Codd in 1972.
- functional dependencies
- 1st normal form
- 2nd normal form
- 3rd normal form
- multi-valued dependency
- 4th normal form
- join dependency
- 5th normal form
¶ First Normal Form
The value of an attribute should be an atomic value. It could not be multi-value, array, composite values, or any other relation. (ER model has already defined)
¶ Second Normal Form
A primary key could be a combination of keys. However, if one of a key in the primary key can determine more than two attributes in the table, there is a partial relationship between the primary key and non-prime attributes. Then, it violates the second normal form.
In this case, the primary key is
PID, and any attributes should be determined by the two. But,
SNAME is only determined by
SID, so it violates the rule. Therefore, this table should divide into three new tables to follow the concept.
¶ Third Normal Form
If there are attributes that follow a non-primary key, these attributes should be another table. Just as the example:
¶ Fourth Normal Form
A table is in 4NF if and only if, for every one of its non-trivial multi-valued dependencies X ↠ Y, X is a superkey. That is, X is either a candidate key or a superset thereof.
Removing “bad” multi-valued dependencies help us get into 4th normal.
form, and into a better design.
Considering the following example:
In this case, both
Pizza Variety and
Delivery Area are determined by
Restaurant, and the two should be a combination.
Pizza Variety and
Delivery Area are independent of each other, it violates 4NF. Because, if the restaurant
Pizza A1 adds a new kind pizza
Cheese Pizza, it needs to add three rows for the three locations in the table. (Because
Pizza Variety is not binding
Delivery Area, any
Delivery Area should have this new
In other words, adding a new kind of pizza, the table needs to insert rows, which is the new pizza with each area, and it is easy to make errors when updating. To eliminate the possibility of these anomalies, 4NF suggests that the table should be split.
¶ Fifth Normal Form
Assume the table meets 1NF to 4NF.
Considering a table that has
Traveling Salesman(primary key),
Imagining an extreme case:
Traveling Salesman has certain
Brands and certain
Product Types in their repertoire. If
Brand B1 and
Brand B2 are in their repertoire, and
Product Type P is in their repertoire, then (assuming
Brand B1 and
Brand B2 both make Product Type P,) the
Traveling Salesman must offer products of Product Type P those made by Brand B1 and those made by Brand B2.
That is to say, the
Product are combined (4NF), the
Salesman is not able to only sell a certain
Brand but exclude one of the products of that
Then to solve this, splitting the able to three would make sense.
The ER model is good for RDB, but the database might easily make an error if we design not well.
Following these concepts of normal forms would reduce the possibility of anomalies, which makes the database more clear and reliable.