e-Consult | Notes

Show understanding of the normalisation process

Resources | Subject Notes | Computer Science

Database Concepts: Normalisation

Normalisation is a crucial process in database design that aims to reduce data redundancy and improve data integrity. It involves organising a database into multiple tables and defining relationships between those tables. The goal is to eliminate anomalies that can occur during data insertion, deletion, and modification.

Why is Normalisation Important?

Without normalisation, databases can suffer from several problems:

Redundancy: Data is repeated unnecessarily, leading to wasted storage space.
Update Anomalies: Modifying data requires updating multiple places, increasing the risk of inconsistencies.
Insertion Anomalies: It might be impossible to add new information without also providing unnecessary data.
Deletion Anomalies: Deleting data can unintentionally remove related information.

Normal Forms

Normal forms (1NF, 2NF, 3NF, BCNF) are a set of rules that define the level of normalisation a database table should achieve. Each normal form builds upon the previous one, addressing specific types of anomalies.

First Normal Form (1NF)

A table is in 1NF if:

Each cell contains a single value (no repeating groups).
There is a primary key defined.

Example of a table not in 1NF:

Employee ID	Employee Name	Phone Numbers
101	Alice Smith	555-1234, 555-5678
102	Bob Johnson	555-9012

To achieve 1NF, the \"Phone Numbers\" column should be split into a separate table.

Second Normal Form (2NF)

A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the entire primary key.

This means that no non-key attribute can depend on only a part of the primary key (in a composite key).

Example of a table not in 2NF:

Order ID	Product ID	Product Name	Price
1	A1	Widget	10
2	A2	Gadget	20

In this example, the `Product Name` depends only on `Product ID`, not on the entire primary key (`Order ID`, `Product ID`). To achieve 2NF, this table should be split into two tables: `Orders` and `Products`.

Third Normal Form (3NF)

A table is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key.

Transitive dependency occurs when a non-key attribute depends on another non-key attribute, which in turn depends on the primary key.

Example of a table not in 3NF:

Employee ID	Employee Name	Department ID	Department Name
101	Alice Smith	1	Sales
102	Bob Johnson	2	Marketing

Here, `Department Name` depends on `Department ID`, which is a non-key attribute. To achieve 3NF, the table should be split into two tables: `Employees` and `Departments`.

Normalisation Process

Identify the primary key(s) for each table.
Identify functional dependencies between attributes.
Apply normal forms iteratively, starting with 1NF and progressing to 2NF, 3NF, and so on.
Create new tables as needed to eliminate redundancy and anomalies.
Define relationships between the new tables using foreign keys.

Example of Normalisation

Consider a table containing information about students and their courses:

Student ID	Student Name	Course ID	Course Name	Instructor
1	John Doe	CS101	Intro to Programming	Dr. Smith
2	Jane Lee	CS101	Intro to Programming	Dr. Smith

This table is not normalised because it has redundancy (Course Name and Instructor are repeated for each student taking the same course). It also has update anomalies (if the instructor changes, it needs to be updated in multiple rows).

The table can be normalised into two tables:

Students Table:

Student ID	Student Name
1	John Doe
2	Jane Lee

Courses Table:

Course ID	Course Name	Instructor
CS101	Intro to Programming	Dr. Smith

StudentCourses Table:

Student ID	Course ID
1	CS101
2	CS101

This normalised structure eliminates redundancy and improves data integrity.