Normalization in Database
Enormous data is store in the database and it will be very difficult to manage, update, delete, or retrieve data from the database if it is not stored in an organized manner. Normalization is a way to organize the data and minimize the redundancy of the data.
In this blog, I will be discussed the following topics:
1.What is normalization
2.Why need normalization
3.Types of normalization
What is normalization?
Normalization is a process to organize the data into multiple related tables to minimize redundancy. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Why need normalization?
Repetition of similar data is increasing the size of the database. Redundancy of data also creates some other issues like:
a.Insertion problem (Insertion Anomaly)
b.Deletion problem (Deletion Anomaly)
c.Updatation problem (Updatation Anomaly)
To Insert the redundant data for every new row (in the above table) is an Insertion anomaly because of repetition of data will only increase as we insert more entries to our table.
In the above table, we are saving two pieces of the information in every row i.e student information and branch information and if we delete all information of the student then our branch information will be also deleted unintentionally which leads to Deletion Anomaly.
If we want to modify the hod name in the above table then we have to modify it in all rows and during modification, even a single row left out unmodified will leads to inconsistent data which leads to Updatation Anomaly.
So normalization breaks the table into two different tables Student table and Branch table so we can perform the update, delete and insert separately for student and branch.
Types of normalization
We can achieve normalization in various ways. We have three basic normal forms and one advance form give below:
- 1st Normal Form
- 2nd Normal Form
- 3rd Normal Form
1st Normal Form(1NF)
It is the first step in the normalization process, It expects to design the table in such a way so that it will be easier to extended and can retrieve the data easily from it whenever required.
If the table in our database is not even in the first normal form it will be considered a bad database design. So the first normal form is the basic requirement of any database.
There are four basics rules that a table should be followed to be in the first normal form.
a. Each column should contain a single value (atomic value).
b. In each column the values stored must be of the same kind or type.
c. Each column should have a unique name.
d. Order in which data is saved doesn’t matter.
Let’s see the example table which follows all four rules of 1st normal form.
2nd Normal Form(2NF)
For a table to be in 2nd normal form it must satisfy two conditions:
a. It should be in 1st normal form.
b.There should be no any partial dependency in the table.
what is Partial dependency
Partial dependency is one kind of functional dependency that occurs when the primary key must be the candidate key and non-prime attribute are depends on the part of the candidate key.
let’s see in the above example
Here candida keys (student_id and subject_id) form the primary key but the teacher’s name dependent only on the subject_id so there is a partial dependency in the given table which must be avoided.
As a solution, Create a separate table for the teacher and use the teacher_id wherever you want.
3rd Normal Form(3NF)
For a table to be in 3rd normal form it must satisfy two conditions:
a. It should be in 2nd normal form.
b.There should be no transitive dependency in the table.
When attribute in the table depends on some non-prime attribute but not on the prime attributes.
In the above table, the total marks depend on the exam_name(eg. Internal exam, practical exam, final exam) not on the primary key (student_id and subject_id) so it is a transitive dependency which should be avoided.
As a solution put exam_name and total_marks in a separate table and use the exam_id whenever require.
Boyce-Codd normal form (BCNF)
It is an upgraded version of the 3rd normal form.
a. It should be in 3rd normal form.
b. For any dependency A — >B, A should be super key.
In the above table (student_id nad subject) are the primary key but the professor can lead the subject i.e dependency of the primary attribute on a non-primary attribute which should be avoided.
So as a Solution create two different tables in one table there will be professor_id, professor and subject and in the other table use professor_id with student_id.
In this blog, we discuss the normalization and why we need normalization in the Database and we saw the different types of normalization techniques. That’s it for now I hope this blog is helpful for understanding normalization.