Show understanding of the normalisation process

Resources | Subject Notes | Computer Science

Database Concepts - Normalisation

Database Concepts: Normalisation

Normalisation is a crucial process in database design aimed at reducing data redundancy and improving data integrity. It involves organising a database into multiple tables and defining relationships between those tables. The goal is to eliminate anomalies that can occur during insertion, update, and deletion operations.

Why is Normalisation Important?

Poorly designed databases can suffer from several anomalies:

  • Insertion Anomaly: Difficulty adding new data without also providing values for existing columns.
  • Update Anomaly: Updating a single piece of information requires updating it in multiple places, increasing the risk of inconsistencies.
  • Deletion Anomaly: Deleting a record unintentionally removes other related data.

Normalisation addresses these anomalies by structuring the data logically.

Normal Forms

Normal forms (1NF, 2NF, 3NF, BCNF) are a set of rules that define the conditions a relational database must satisfy to achieve optimal structure. Each normal form builds upon the previous one, addressing specific types of redundancy.

1NF (First Normal Form)

A table is in 1NF if:

  • Each cell contains a single, atomic (indivisible) value.
  • There are no repeating groups of columns.

Example of a table in 1NF:

EmployeeID FirstName LastName PhoneNumbers
101 Alice Smith 555-1234
102 Bob Johnson 555-5678

Example of a table *not* in 1NF (repeating group):

EmployeeID FirstName LastName PhoneNumbers
101 Alice Smith 555-1234, 555-9876
102 Bob Johnson 555-5678

2NF (Second Normal Form)

A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the entire primary key.

This means that if the primary key is composite (made up of multiple columns), no non-key attribute can depend only on *part* of the primary key.

Example:

EmployeeID FirstName LastName DepartmentID Salary
101 Alice Smith 1 60000
102 Bob Johnson 2 70000

In this example, the primary key is (EmployeeID, DepartmentID). `FirstName` and `Salary` depend only on `EmployeeID`, not on the entire primary key. To achieve 2NF, we would split the table into two:

Employee Table: (EmployeeID, FirstName, LastName, DepartmentID)

Department Table: (DepartmentID, Salary)

3NF (Third Normal Form)

A table is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key.

Transitive dependency occurs when a non-key attribute depends on another non-key attribute, which in turn depends on the primary key.

Example:

EmployeeID FirstName LastName Email DepartmentID
101 Alice Smith alice.smith@example.com 1
102 Bob Johnson bob.johnson@example.com 2

In this example, `Email` depends on `LastName` (because `Email` is typically derived from `LastName`). Since `Email` is transitively dependent on the primary key (`EmployeeID`), the table is not in 3NF. We would split the table into two:

Employee Table: (EmployeeID, FirstName, LastName, DepartmentID)

Contact Table: (EmployeeID, Email)

BCNF (Boyce-Codd Normal Form)

BCNF is a stricter version of 3NF. A table is in BCNF if for every functional dependency X -> Y, X is a superkey.

BCNF is less commonly used than 3NF, but it's important to be aware of.

Summary of Normal Forms

Normal Form Conditions Purpose
1NF Atomic values, no repeating groups Eliminates repeating groups
2NF In 1NF, all non-key attributes fully functionally dependent on the primary key Eliminates partial dependencies
3NF In 2NF, no transitive dependencies Eliminates transitive dependencies
BCNF In 3NF, every functional dependency X -> Y where X is a superkey Ensures all dependencies are properly represented

Choosing the appropriate level of normalisation involves balancing the benefits of reduced redundancy with the potential for increased join operations during data retrieval. Higher normal forms generally lead to more joins, which can impact performance.