Resources | Subject Notes | Computer Science
This section explores different methods for organizing files on a storage device and delves into the concept of hashing algorithms, which are crucial for efficient data retrieval.
Files are typically organized in a hierarchical structure, resembling a tree. This structure allows for easy navigation and management of a large number of files. The most common file system is the hierarchical file system.
A hierarchical file system uses directories (also known as folders) to group related files. The root directory is the top-level directory, and all other directories branch out from it. This creates a tree-like structure.
Example:
/ (Root Directory) ├── Documents │ ├── Report.docx │ └── Presentation.pptx ├── Pictures │ ├── Vacation2023 │ │ ├── beach.jpg │ │ └── mountains.png │ └── Profile.jpg └── Music ├── Artist1 │ ├── Song1.mp3 │ └── Song2.mp3 └── Artist2 └── Song3.mp3
To access files, the operating system uses various methods. These methods differ in their efficiency and suitability for different scenarios.
Sequential access involves reading or writing files in the order the data is stored on the storage device. This is simple to implement but can be slow for accessing specific parts of a file.
Random access allows direct access to any part of a file without having to read through the preceding data. This is much faster than sequential access, especially for large files.
Indexed access combines elements of sequential and random access. An index is created that maps keywords or other identifiers to the locations of the data within the file. This allows for faster retrieval of specific data.
A hashing algorithm is a function that takes an input (which can be of any size) and produces a fixed-size output, called a hash value or hash code. This hash value is used to represent the input data.
A collision occurs when two different inputs produce the same hash value. Different techniques are used to handle collisions:
Algorithm | Description | Typical Use |
---|---|---|
Division Method | The hash value is calculated as the remainder when the input is divided by the size of the hash table. | Simple and widely used. |
Multiplication Method | The hash value is calculated as an integer part of the product of the input and a constant (between 0 and 1). | Can provide a more uniform distribution than the division method. |
Universal Hashing | A family of hash functions is chosen randomly, and the best one for a given set of inputs is selected. | Provides good performance even with unknown input data. |
Hashing algorithms are used in various applications, including: