Normalisation is the process of adequately organising data in the given database. When done systematically, normalisation will decompose the tables to eliminate any data redundancy or data repetition. It will also reduce any undesirable characteristics of a database like an Update, Insertion, and Deletion anomalies. Normalising a database is a multi-step approach that can put the data into a tabular form by removing all duplicated data from the relational tables.
Two primary objectives of normalisation are:
- Eliminating the redundant data from the database, and
- Ensuring logical data dependencies
After-effects of skipping normalisation
If there is a tale that is not appropriately normalised and has data redundancy, it can consume a lot of memory space and make it difficult for the DBAs to handle or update the database. There is also a possibility of data loss. Data insertion, maintenance, updating, and deletion anomalies can also happen frequently if the database is not appropriately normalised. Here is an example for you to understand the scope of these anomalies further.
Here is a sample data of academic students:
|Roll Number||Name||Branch||HOD||Dept. Phone|
In the above table, we can see the data of six students under the ‘computers’ branch of study as it is evident from the table, data for the branch, HOD, and the dept. The phone is the same but repeated for all the students. This is called data redundancy, and now you may have understood how it could eat up additional unwanted memory space.
Insertion Anomaly. In the above example, suppose if a new student is joining the academic institution and until that student opts for a specific branch, i.e., computers, the date of that student cannot be inserted into the table, or it has to be put as Null. Also, in another case, if you have to enter the data of 100 students of a branch, then again, branch information may get repeated for all 100 students in it. These situations are called insertion anomalies.
Update Anomaly. Another sample case could be the professor (Prof. X) leaving the college one day, or he may no longer be the HOD of the computer science branch. In such a case, there would be a need to update all students’ information again. If there is any mistake in or miss out on any single record, it may lead to inconsistent data gain. This is called an update anomaly.
Deletion Anomaly. Again, at a time, there may be a need to keep two pieces of information together. For example, the student info and branch info may have to stay together. So, once the academic year ends, the students’ records may get deleted, and we will be losing the corresponding branch information. This unwanted effect is called the deletion anomaly. Normalisation effectively works to tackle all these anomalies.
Rules of normalisation. As pointed out by RemoteDBA.com experts, the general normalisation rules are categorised into the following forms:
- First Normal Form
- Second Normal Form
- Third Normal Form
- BCNF (Boyce and Codd Normal)
- Fourth Normal Form
A table needs to follow these rules to be in First Normal Form:
- It should have only single-valued attributes. This means each column of the table should be single-valued and should not contain any multiple values.
- All values stored in a column need to be of the same domain. This means the attribute domains should not change at all. This can be found as a commonsense rule. Each column at the values stored in it must be of the same type.
- Sy, for example, if there is a common DOB for a set of people to be saved, still you cannot use the names of a few of them in that column along with the common DOB in the corresponding column.
- All columns in the table must have unique names. Doing so will help to avoid any confusion while retrieving data, or while performing any other stored data. If more columns have the same name, the DBMS system itself will get confused in read and write operations.
The data storage order does not matter.
The below rules needed to be satisfied for a table to be in 2NF:
- It has to be 1NF by default.
- There should not be any partial dependency. Partial dependency means that a particular attribute in the given table depends only on one part of the primary key instead of depending on the whole key.
A table would be in Third Normal Form only when it satisfies the following norms:
- It has to be 2NF compliant.
- There should not be any transitive dependency.
BCNF or Boyce and Codd Normal Form
BCNF is an up-gradation of the 3NF. Boyce and Codd Normal form of normalisation deals with the particular types of anomalies which cannot be handled by the Third Normal Form. For example, a Third Normal Form table that does not have multiple overlapping keys, as seen in the above example, can be in the BCNF. To be BCNF, a table should satisfy the following conditions:
- R should be in the 3NF.
- For each of the functional dependencies as X → Y, the super key should be X.
To understand BCNF in further detail, you may refer to the BCNF tutorial article we published in this same series.
Fourth Normal Form or 4NF
A table can be counted as 4NF if:
- It complies with BCNF.
- There is no multivalued dependency on it.
For the DBAs, it is vital to understand the previous Normal Forms before getting over to the 4NF. By understanding the dependencies and best normalisation practices, you will be able to design your relational database better.