Show understanding of the normalisation process

Resources | Subject Notes | Computer Science

Database Concepts: Normalisation

Normalisation is a crucial process in database design that aims to reduce data redundancy and improve data integrity. It involves organising a database into multiple tables and defining relationships between those tables. The goal is to eliminate anomalies that can occur during data insertion, deletion, and modification.

Why is Normalisation Important?

Without normalisation, databases can suffer from several problems:

  • Redundancy: Data is repeated unnecessarily, leading to wasted storage space.
  • Update Anomalies: Modifying data requires updating multiple places, increasing the risk of inconsistencies.
  • Insertion Anomalies: It might be impossible to add new information without also providing unnecessary data.
  • Deletion Anomalies: Deleting data can unintentionally remove related information.

Normal Forms

Normal forms (1NF, 2NF, 3NF, BCNF) are a set of rules that define the level of normalisation a database table should achieve. Each normal form builds upon the previous one, addressing specific types of anomalies.

First Normal Form (1NF)

A table is in 1NF if:

  • Each cell contains a single value (no repeating groups).
  • There is a primary key defined.

Example of a table not in 1NF:

Employee ID Employee Name Phone Numbers
101 Alice Smith 555-1234, 555-5678
102 Bob Johnson 555-9012

To achieve 1NF, the \"Phone Numbers\" column should be split into a separate table.

Second Normal Form (2NF)

A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the entire primary key.

This means that no non-key attribute can depend on only a part of the primary key (in a composite key).

Example of a table not in 2NF:

Order ID Product ID Product Name Price
1 A1 Widget 10
2 A2 Gadget 20

In this example, the `Product Name` depends only on `Product ID`, not on the entire primary key (`Order ID`, `Product ID`). To achieve 2NF, this table should be split into two tables: `Orders` and `Products`.

Third Normal Form (3NF)

A table is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key.

Transitive dependency occurs when a non-key attribute depends on another non-key attribute, which in turn depends on the primary key.

Example of a table not in 3NF:

Employee ID Employee Name Department ID Department Name
101 Alice Smith 1 Sales
102 Bob Johnson 2 Marketing

Here, `Department Name` depends on `Department ID`, which is a non-key attribute. To achieve 3NF, the table should be split into two tables: `Employees` and `Departments`.

Normalisation Process

  1. Identify the primary key(s) for each table.
  2. Identify functional dependencies between attributes.
  3. Apply normal forms iteratively, starting with 1NF and progressing to 2NF, 3NF, and so on.
  4. Create new tables as needed to eliminate redundancy and anomalies.
  5. Define relationships between the new tables using foreign keys.

Example of Normalisation

Consider a table containing information about students and their courses:

Student ID Student Name Course ID Course Name Instructor
1 John Doe CS101 Intro to Programming Dr. Smith
2 Jane Lee CS101 Intro to Programming Dr. Smith

This table is not normalised because it has redundancy (Course Name and Instructor are repeated for each student taking the same course). It also has update anomalies (if the instructor changes, it needs to be updated in multiple rows).

The table can be normalised into two tables:

Students Table:

Student ID Student Name
1 John Doe
2 Jane Lee

Courses Table:

Course ID Course Name Instructor
CS101 Intro to Programming Dr. Smith

StudentCourses Table:

Student ID Course ID
1 CS101
2 CS101

This normalised structure eliminates redundancy and improves data integrity.