1.3 Compression (3)
1.
Explain how Huffman coding achieves compression. Provide a simple example using the following character frequencies: A: 0.4, B: 0.2, C: 0.3, D: 0.1. Illustrate the resulting Huffman codes.
Huffman coding is a variable-length prefix coding algorithm used for lossless data compression. It assigns shorter codes to more frequent symbols and longer codes to less frequent symbols. This results in an overall reduction in the average number of bits required to represent the data.
Example:
- Calculate the frequency of each character: A: 0.4, B: 0.2, C: 0.3, D: 0.1
- Create a priority queue (min-heap) with the characters and their frequencies.
- Repeatedly combine the two nodes with the lowest frequencies into a new node with a frequency equal to the sum of their frequencies. This process continues until only one node remains, which represents the root of the Huffman tree.
- Assign codes to each character based on the path taken from the root to the character in the Huffman tree. Left branches are assigned '0' and right branches are assigned '1'.
Huffman Codes:
| Character | Frequency | Huffman Code |
|-----------|-----------|--------------|
| A | 0.4 | 0 |
| B | 0.2 | 110 |
| C | 0.3 | 10 |
| D | 0.1 | 111 |
Average Code Length: (0.4 * 1) + (0.2 * 3) + (0.3 * 2) + (0.1 * 3) = 0.4 + 0.6 + 0.6 + 0.3 = 1.9 bits per character. Uncompressed, a fixed-length encoding would require 8 bits per character, so the compression ratio is approximately 8/1.9 ≈ 4.2:1. This demonstrates how Huffman coding can achieve significant compression, especially when there are large differences in character frequencies. A short code sketch of the construction follows.
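A minimal Python sketch of this construction, using the standard-library heapq module, might look as follows. The function name huffman_codes and the tie-breaking counter are illustrative choices rather than a fixed API; equally optimal trees can assign the '0'/'1' labels differently (here B and D swap relative to the table above), but the code lengths and the 1.9-bit average are unchanged.

```python
import heapq

def huffman_codes(freqs):
    """Build Huffman codes for a {symbol: frequency} mapping."""
    # Heap entries are (frequency, tie_breaker, tree); the counter stops
    # Python from comparing trees when two frequencies are equal.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-frequency nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))  # ...merged
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):        # internal node: recurse down
            walk(tree[0], prefix + "0")    # left branch -> '0'
            walk(tree[1], prefix + "1")    # right branch -> '1'
        else:                              # leaf: record the symbol's code
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freqs = {"A": 0.4, "B": 0.2, "C": 0.3, "D": 0.1}
codes = huffman_codes(freqs)
print(codes)  # {'A': '0', 'C': '10', 'D': '110', 'B': '111'}
avg = sum(freqs[s] * len(c) for s, c in codes.items())
print(f"average code length: {avg:.1f} bits per character")  # 1.9
```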
2.
Describe the fundamental difference between lossy and lossless data compression. Provide an example of each type of compression and explain the trade-off between compression ratio and data fidelity.
Lossless compression reduces file size without losing any of the original data. The original data can be perfectly reconstructed from the compressed file. A common example is ZIP compression, which is often used for archiving files. Other examples include Run-Length Encoding (RLE) and Huffman coding. Lossless compression typically achieves lower compression ratios compared to lossy methods because it must preserve all the original information.
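To make "perfectly reconstructed" concrete, here is a minimal run-length encoding sketch in Python. The helper names rle_encode and rle_decode are illustrative rather than from any particular library; the final assert checks that the round trip recovers the input exactly, which is the defining property of lossless compression.

```python
def rle_encode(data):
    """Collapse runs of repeated characters into (char, count) pairs."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs

def rle_decode(runs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)

original = "AAAABBBCCD"
encoded = rle_encode(original)
print(encoded)                          # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
assert rle_decode(encoded) == original  # lossless: the round trip is exact
```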
Lossy compression achieves higher compression ratios by discarding some of the original data. This discarded data is deemed less important, often based on perceptual models (e.g., removing high-frequency components in images that are less likely to be noticed). Examples include JPEG for images and MP3 for audio. The trade-off is that the decompressed data is not an exact replica of the original; some information is lost. The level of lossiness is often adjustable, allowing users to balance file size and quality. Higher compression levels result in smaller files but greater data loss.
3.
Describe the difference between lossless and lossy data compression. For each type, provide an example of a compression algorithm and explain the trade-off between compression ratio and data fidelity.
Lossless data compression reduces file size without losing any of the original data. The original data can be perfectly reconstructed from the compressed file. A common example is ZIP compression, which is used for archiving files. Lossless compression typically achieves lower compression ratios compared to lossy compression, but it preserves data integrity.
Lossy data compression reduces file size by discarding some of the original data. This results in a smaller file size but with a reduction in data fidelity. A common example is JPEG compression for images. Lossy compression can achieve significantly higher compression ratios than lossless compression, but the reconstructed data is not identical to the original. The trade-off is that higher compression ratios lead to greater data loss and a lower quality reconstructed file.
Here's a table summarizing the differences:
| Feature | Lossless | Lossy |
|---------|----------|-------|
| Data Loss | No | Yes |
| Compression Ratio | Lower | Higher |
| Example | ZIP | JPEG |
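The fidelity trade-off can be illustrated with a toy quantiser that simply discards the low-order bits of each 8-bit sample. This is not how JPEG or MP3 work internally (they use frequency transforms and perceptual models), but it shows the defining behaviour of lossy compression: the representation gets smaller, and the reconstruction is only approximate.

```python
def lossy_quantize(samples, bits_dropped):
    """Toy lossy scheme: discard the low-order bits of each 8-bit sample."""
    return [s >> bits_dropped for s in samples]

def dequantize(quantized, bits_dropped):
    """Approximate reconstruction: shift back, but the dropped detail is gone."""
    return [q << bits_dropped for q in quantized]

samples = [200, 201, 203, 198, 64, 65]
q = lossy_quantize(samples, bits_dropped=3)   # each value now fits in 5 bits
restored = dequantize(q, bits_dropped=3)
print(restored)  # [200, 200, 200, 192, 64, 64] -- close to, but not, the original
```

Dropping more bits gives a higher compression ratio and a larger reconstruction error, mirroring the adjustable quality settings of real lossy formats.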