Resources | Subject Notes | Computer Science
Normalisation is a crucial process in database design that aims to reduce data redundancy and improve data integrity. It involves organising a database into multiple tables and defining relationships between those tables. The goal is to eliminate anomalies that can occur during data insertion, deletion, and modification.
Without normalisation, databases can suffer from several problems:
Normal forms (1NF, 2NF, 3NF, BCNF) are a set of rules that define the level of normalisation a database table should achieve. Each normal form builds upon the previous one, addressing specific types of anomalies.
A table is in 1NF if:
Example of a table not in 1NF:
Employee ID | Employee Name | Phone Numbers |
---|---|---|
101 | Alice Smith | 555-1234, 555-5678 |
102 | Bob Johnson | 555-9012 |
To achieve 1NF, the \"Phone Numbers\" column should be split into a separate table.
A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the entire primary key.
This means that no non-key attribute can depend on only a part of the primary key (in a composite key).
Example of a table not in 2NF:
Order ID | Product ID | Product Name | Price |
---|---|---|---|
1 | A1 | Widget | 10 |
2 | A2 | Gadget | 20 |
In this example, the `Product Name` depends only on `Product ID`, not on the entire primary key (`Order ID`, `Product ID`). To achieve 2NF, this table should be split into two tables: `Orders` and `Products`.
A table is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key.
Transitive dependency occurs when a non-key attribute depends on another non-key attribute, which in turn depends on the primary key.
Example of a table not in 3NF:
Employee ID | Employee Name | Department ID | Department Name |
---|---|---|---|
101 | Alice Smith | 1 | Sales |
102 | Bob Johnson | 2 | Marketing |
Here, `Department Name` depends on `Department ID`, which is a non-key attribute. To achieve 3NF, the table should be split into two tables: `Employees` and `Departments`.
Consider a table containing information about students and their courses:
Student ID | Student Name | Course ID | Course Name | Instructor |
---|---|---|---|---|
1 | John Doe | CS101 | Intro to Programming | Dr. Smith |
2 | Jane Lee | CS101 | Intro to Programming | Dr. Smith |
This table is not normalised because it has redundancy (Course Name and Instructor are repeated for each student taking the same course). It also has update anomalies (if the instructor changes, it needs to be updated in multiple rows).
The table can be normalised into two tables:
Students Table:
Student ID | Student Name |
---|---|
1 | John Doe |
2 | Jane Lee |
Courses Table:
Course ID | Course Name | Instructor |
---|---|---|
CS101 | Intro to Programming | Dr. Smith |
StudentCourses Table:
Student ID | Course ID |
---|---|
1 | CS101 |
2 | CS101 |
This normalised structure eliminates redundancy and improves data integrity.