Resources | Subject Notes | Computer Science
Normalisation is a crucial process in database design aimed at reducing data redundancy and improving data integrity. It involves organising a database into multiple tables and defining relationships between those tables. The goal is to eliminate anomalies that can occur during insertion, update, and deletion operations.
Poorly designed databases can suffer from several anomalies:
Normalisation addresses these anomalies by structuring the data logically.
Normal forms (1NF, 2NF, 3NF, BCNF) are a set of rules that define the conditions a relational database must satisfy to achieve optimal structure. Each normal form builds upon the previous one, addressing specific types of redundancy.
A table is in 1NF if:
Example of a table in 1NF:
EmployeeID | FirstName | LastName | PhoneNumbers |
---|---|---|---|
101 | Alice | Smith | 555-1234 |
102 | Bob | Johnson | 555-5678 |
Example of a table *not* in 1NF (repeating group):
EmployeeID | FirstName | LastName | PhoneNumbers |
---|---|---|---|
101 | Alice | Smith | 555-1234, 555-9876 |
102 | Bob | Johnson | 555-5678 |
A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the entire primary key.
This means that if the primary key is composite (made up of multiple columns), no non-key attribute can depend only on *part* of the primary key.
Example:
EmployeeID | FirstName | LastName | DepartmentID | Salary |
---|---|---|---|---|
101 | Alice | Smith | 1 | 60000 |
102 | Bob | Johnson | 2 | 70000 |
In this example, the primary key is (EmployeeID, DepartmentID). `FirstName` and `Salary` depend only on `EmployeeID`, not on the entire primary key. To achieve 2NF, we would split the table into two:
Employee Table: (EmployeeID, FirstName, LastName, DepartmentID)
Department Table: (DepartmentID, Salary)
A table is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key.
Transitive dependency occurs when a non-key attribute depends on another non-key attribute, which in turn depends on the primary key.
Example:
EmployeeID | FirstName | LastName | DepartmentID | |
---|---|---|---|---|
101 | Alice | Smith | alice.smith@example.com | 1 |
102 | Bob | Johnson | bob.johnson@example.com | 2 |
In this example, `Email` depends on `LastName` (because `Email` is typically derived from `LastName`). Since `Email` is transitively dependent on the primary key (`EmployeeID`), the table is not in 3NF. We would split the table into two:
Employee Table: (EmployeeID, FirstName, LastName, DepartmentID)
Contact Table: (EmployeeID, Email)
BCNF is a stricter version of 3NF. A table is in BCNF if for every functional dependency X -> Y, X is a superkey.
BCNF is less commonly used than 3NF, but it's important to be aware of.
Normal Form | Conditions | Purpose |
---|---|---|
1NF | Atomic values, no repeating groups | Eliminates repeating groups |
2NF | In 1NF, all non-key attributes fully functionally dependent on the primary key | Eliminates partial dependencies |
3NF | In 2NF, no transitive dependencies | Eliminates transitive dependencies |
BCNF | In 3NF, every functional dependency X -> Y where X is a superkey | Ensures all dependencies are properly represented |
Choosing the appropriate level of normalisation involves balancing the benefits of reduced redundancy with the potential for increased join operations during data retrieval. Higher normal forms generally lead to more joins, which can impact performance.