Show understanding of how sound is represented and encoded

1.2 Multimedia: Sound Representation and Encoding

This section explores how sound is represented digitally and the various encoding methods used to store and transmit audio.

Sound as a Physical Phenomenon

Sound is a mechanical wave that propagates through a medium (like air) as vibrations. These vibrations create variations in air pressure, which our ears detect as sound.

Digital Representation of Sound

To represent sound digitally, we need to sample the sound wave at regular intervals. This process is called sampling.

The amplitude of the sound wave is measured at each sample point, and this amplitude is then converted into a numerical value. This numerical representation forms the basis of digital audio.

The rate at which we take these samples is known as the sampling rate, typically measured in Hertz (Hz) or samples per second (sps). A higher sampling rate results in a more accurate representation of the original sound.

The number of bits used to represent the amplitude at each sample is known as the bit depth. A higher bit depth allows for a greater range of amplitude values and thus a higher dynamic range; each additional bit adds roughly 6 dB of dynamic range.
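The two steps above, sampling at regular intervals and converting each amplitude to an integer, can be sketched in a few lines. This is a minimal illustration, not a real audio pipeline; the function name and the test tone (a 1 kHz sine) are chosen for the example.

```python
import math

def sample_and_quantize(freq_hz, fs_hz, bit_depth, duration_s):
    """Sample a sine wave at fs_hz samples per second and quantize
    each amplitude to a signed integer of the given bit depth."""
    n_samples = int(fs_hz * duration_s)
    max_level = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16-bit audio
    samples = []
    for i in range(n_samples):
        t = i / fs_hz                                  # time of this sample
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # value in [-1, 1]
        samples.append(round(amplitude * max_level))     # quantize to integer
    return samples

# 10 ms of a 1 kHz tone at CD-quality settings (44.1 kHz, 16-bit)
samples = sample_and_quantize(1000, 44100, 16, 0.01)
```

Raising `fs_hz` captures the wave's shape more faithfully; raising `bit_depth` makes each amplitude measurement more precise.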

Key Parameters in Sound Representation

Parameter | Description | Typical Values
Sampling Rate (Fs) | Number of samples taken per second. Determines the highest frequency that can be accurately represented (the Nyquist frequency, Fs/2). | 44.1 kHz, 48 kHz, 96 kHz
Bit Depth (b) | Number of bits used to represent the amplitude of each sample. Determines the dynamic range (number of possible amplitude levels). | 8-bit, 16-bit, 24-bit
Number of Channels (n) | Number of independent audio signals. | 1 (mono), 2 (stereo), 6 (5.1 surround)
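Together these three parameters fix the size of an uncompressed recording: file size = sampling rate × (bit depth ÷ 8) × channels × duration. A small sketch of that arithmetic (the function name is illustrative):

```python
def pcm_size_bytes(fs_hz, bit_depth, channels, duration_s):
    """Uncompressed (PCM) audio size:
    samples/second x bytes/sample x channels x seconds."""
    return int(fs_hz * (bit_depth // 8) * channels * duration_s)

# One minute of CD-quality stereo audio (44.1 kHz, 16-bit, 2 channels)
size = pcm_size_bytes(44100, 16, 2, 60)  # 10,584,000 bytes, about 10 MB
```

This quickly adds up, which is the main motivation for the compressed formats described next.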

Encoding Formats

Digital audio data is stored in various formats, each with its own encoding scheme. These formats can be broadly categorized as:

  • Uncompressed Formats: These formats store the audio data directly, without any loss of information. Examples include WAV and AIFF.
  • Compressed Formats: These formats use various algorithms to reduce the file size of the audio data, often at the expense of some audio quality. Examples include MP3, AAC, and Ogg Vorbis.

Encoding Techniques

Compressed audio formats employ different encoding techniques:

  • Lossless Compression: These techniques allow the original audio data to be perfectly reconstructed from the compressed data. Examples include FLAC and Apple Lossless.
  • Lossy Compression: These techniques discard some audio data that is deemed perceptually irrelevant, resulting in smaller file sizes but potentially lower audio quality. Examples include MP3, AAC, and Ogg Vorbis. The amount of data discarded is controlled by the bitrate.
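Because a lossy encoder targets a bitrate, the compressed size depends only on that bitrate and the duration, not on the raw sample data. A minimal sketch of the calculation (the 128 kbps figure is just a common MP3 setting used for illustration):

```python
def compressed_size_bytes(bitrate_kbps, duration_s):
    """Size of a lossy-compressed stream at a fixed bitrate:
    (bits per second x seconds) / 8 bits per byte."""
    return int(bitrate_kbps * 1000 * duration_s / 8)

# One minute of audio encoded at 128 kbps
size = compressed_size_bytes(128, 60)  # 960,000 bytes
```

Compared with the roughly 10 MB per minute of uncompressed CD-quality stereo, a 128 kbps stream is about an eleven-fold reduction, which shows why lossy formats dominate streaming and portable playback.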

The Nyquist-Shannon Sampling Theorem

This theorem states that to accurately reconstruct a signal, the sampling rate must be at least twice the highest frequency component in the signal. The highest frequency that can be represented is called the Nyquist frequency ($F_N = Fs/2$).

For example, the human hearing range is typically considered to be between 20 Hz and 20 kHz. Therefore, a sampling rate of at least 40 kHz is required to accurately represent the entire audible spectrum. In practice, CD audio uses 44.1 kHz, leaving some headroom above 40 kHz for the anti-aliasing filter.
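When a signal contains frequencies above the Nyquist limit, those components are not simply lost: they "fold" back into the representable range and appear as a lower frequency, an effect called aliasing. A small sketch of the folding calculation (the function name is illustrative):

```python
def alias_frequency(signal_hz, fs_hz):
    """Apparent frequency of a sampled tone after folding about
    the Nyquist frequency fs/2."""
    f = signal_hz % fs_hz        # sampling cannot distinguish f from f + k*fs
    return min(f, fs_hz - f)     # fold into the range [0, fs/2]

# A 5 kHz tone sampled at 8 kHz (Nyquist limit 4 kHz) folds to 3 kHz
alias_frequency(5000, 8000)  # 3000
# A 3 kHz tone, below the Nyquist limit, is represented correctly
alias_frequency(3000, 8000)  # 3000
```

This is why real systems place a low-pass (anti-aliasing) filter before the sampler: frequencies above Fs/2 must be removed before sampling, because they cannot be distinguished from their aliases afterwards.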

Suggested diagram: Illustrating the sampling process of a sound wave and the Nyquist frequency.

Conclusion

Understanding the principles of sound representation and encoding is crucial for working with multimedia applications. The choice of sampling rate, bit depth, and encoding format depends on the desired audio quality and the available storage space.