Show understanding of hashing algorithms

Resources | Subject Notes | Computer Science

Cambridge A-Level Computer Science 9618 - 13.2 File Organisation and Access - Hashing Algorithms

File Organisation and Access - Hashing Algorithms

This section explores different methods for organizing files on a storage device and delves into the concept of hashing algorithms, which are crucial for efficient data retrieval.

File Organisation

Files are typically organized in a hierarchical structure, resembling a tree. This structure allows for easy navigation and management of a large number of files. The most common file system is the hierarchical file system.

Hierarchical File Systems

A hierarchical file system uses directories (also known as folders) to group related files. The root directory is the top-level directory, and all other directories branch out from it. This creates a tree-like structure.

Example:

/ (Root Directory)
├── Documents
│   ├── Report.docx
│   └── Presentation.pptx
├── Pictures
│   ├── Vacation2023
│   │   ├── beach.jpg
│   │   └── mountains.png
│   └── Profile.jpg
└── Music
    ├── Artist1
    │   ├── Song1.mp3
    │   └── Song2.mp3
    └── Artist2
        └── Song3.mp3

File Access Methods

To access files, the operating system uses various methods. These methods differ in their efficiency and suitability for different scenarios.

Sequential Access

Sequential access involves reading or writing files in the order the data is stored on the storage device. This is simple to implement but can be slow for accessing specific parts of a file.

Random Access

Random access allows direct access to any part of a file without having to read through the preceding data. This is much faster than sequential access, especially for large files.

Indexed Access

Indexed access combines elements of sequential and random access. An index is created that maps keywords or other identifiers to the locations of the data within the file. This allows for faster retrieval of specific data.

Hashing Algorithms

A hashing algorithm is a function that takes an input (which can be of any size) and produces a fixed-size output, called a hash value or hash code. This hash value is used to represent the input data.

Properties of a Good Hashing Algorithm

  1. Deterministic: The same input always produces the same hash value.
  2. Efficient: The algorithm should be fast to compute.
  3. Uniform Distribution: The hash values should be evenly distributed across the possible output range to minimize collisions.

Collision Handling

A collision occurs when two different inputs produce the same hash value. Different techniques are used to handle collisions:

  • Separate Chaining: Each index in the hash table points to a linked list of elements that have the same hash value.
  • Open Addressing: If a collision occurs, the algorithm probes for an empty slot in the hash table. Common probing techniques include linear probing, quadratic probing, and double hashing.

Common Hashing Algorithms

Algorithm Description Typical Use
Division Method The hash value is calculated as the remainder when the input is divided by the size of the hash table. Simple and widely used.
Multiplication Method The hash value is calculated as an integer part of the product of the input and a constant (between 0 and 1). Can provide a more uniform distribution than the division method.
Universal Hashing A family of hash functions is chosen randomly, and the best one for a given set of inputs is selected. Provides good performance even with unknown input data.

Applications of Hashing Algorithms

Hashing algorithms are used in various applications, including:

  • Hash Tables: For efficient data storage and retrieval.
  • Data Integrity Checks: To detect if a file has been modified.
  • Password Storage: To store passwords securely (using salting and hashing).