Floating-point numbers are used to represent real numbers (numbers with a fractional part) in computers. They are represented in a format similar to scientific notation. This allows for a wide range of values to be represented, from very small to very large, using a fixed number of bits.
The most common standard for representing floating-point numbers is IEEE 754. This standard defines how floating-point numbers are stored in memory. We will focus on the single-precision (32-bit) and double-precision (64-bit) formats, which are commonly used.
A single-precision floating-point number is typically represented using 32 bits, divided into three parts:
The format can be summarized as follows:
Bit Position | Name | Description |
---|---|---|
31 | Sign Bit | 0: Positive, 1: Negative |
30-23 | Exponent | Represents the power of 2 (8 bits). It is biased. |
22-0 | Significand | Represents the digits of the number (normalized); 23 bits are stored. |
The exponent is biased to allow for the representation of both positive and negative exponents. The bias is a fixed value that is subtracted from the stored exponent to recover the actual exponent. For single-precision, the bias is 127, so the actual exponent is calculated as: actual exponent = stored exponent - 127
The significand is normalized so that there is exactly one non-zero digit to the left of the binary point. In binary that digit is always 1, so the leading '1' is implicit and not stored, giving one extra bit of precision for the same number of stored bits. A sketch of how these fields can be inspected is shown below.
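As an illustration, the single-precision fields can be extracted in Python using the standard struct module. The helper name decode_float32 is illustrative, not part of any standard library:

```python
import struct

def decode_float32(value):
    """Unpack a value into IEEE 754 single-precision fields (illustrative helper)."""
    # Pack the value as a 32-bit big-endian float, then read back the raw bits.
    bits = struct.unpack('>I', struct.pack('>f', value))[0]

    sign = (bits >> 31) & 0x1                # bit 31
    stored_exponent = (bits >> 23) & 0xFF    # bits 30-23 (8 bits)
    fraction = bits & 0x7FFFFF               # bits 22-0 (23 stored bits)

    actual_exponent = stored_exponent - 127  # remove the bias
    significand = 1 + fraction / 2**23       # restore the implicit leading 1

    return sign, actual_exponent, significand

# 0.15625 = 1.25 * 2^-3, so we expect sign 0, exponent -3, significand 1.25
print(decode_float32(0.15625))   # (0, -3, 1.25)
```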
A double-precision floating-point number is typically represented using 64 bits, divided into three parts:
The format can be summarized as follows:
Bit Position | Name | Description |
---|---|---|
63 | Sign Bit | 0: Positive, 1: Negative |
62-52 | Exponent | Represents the power of 2 (11 bits). It is biased. |
51-0 | Significand | Represents the digits of the number (normalized); 52 bits are stored. |
The exponent is biased to allow for the representation of both positive and negative exponents. For double-precision, the bias is 1023. This means that the actual exponent is calculated as: actual exponent = stored exponent - 1023
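A corresponding sketch for double-precision, again using Python's struct module with an illustrative helper name:

```python
import struct

def decode_float64(value):
    """Unpack a value into IEEE 754 double-precision fields (illustrative helper)."""
    bits = struct.unpack('>Q', struct.pack('>d', value))[0]

    sign = (bits >> 63) & 0x1                  # bit 63
    stored_exponent = (bits >> 52) & 0x7FF     # bits 62-52 (11 bits)
    fraction = bits & ((1 << 52) - 1)          # bits 51-0 (52 stored bits)

    actual_exponent = stored_exponent - 1023   # remove the bias
    significand = 1 + fraction / 2**52         # restore the implicit leading 1

    return sign, actual_exponent, significand

# -6.5 = -1.625 * 2^2, so we expect sign 1, exponent 2, significand 1.625
print(decode_float64(-6.5))   # (1, 2, 1.625)
```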
Floating-point representation has limitations: most real numbers cannot be stored exactly, so values are rounded to the nearest representable number, precision is limited by the number of significand bits, and small rounding errors can accumulate over many calculations.
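A brief illustration of rounding error, using Python's built-in floats (which are IEEE 754 double-precision values):

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum
# is only approximately 0.3.
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False

# Comparisons should therefore use a tolerance rather than exact equality.
print(math.isclose(0.1 + 0.2, 0.3))   # True
```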
Consider a single-precision floating-point number with the binary representation 0 01111100 10000000000000000000000. We can decode it field by field.
The sign bit is 0, so the number is positive. The exponent field is 01111100, which is 124 in decimal; subtracting the bias of 127 gives an actual exponent of 124 - 127 = -3. The significand field is 10000000000000000000000; with the implicit leading '1' restored, it represents 1.1 in binary, which is 1.5 in decimal. Therefore, the number is 1.5 * 2^-3 = 1.5 / 8 = 0.1875.
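This worked example can be checked with a short Python sketch that assembles the bit pattern directly and reinterprets it as a float:

```python
import struct

# Sign = 0, stored exponent = 0b01111100 (124), fraction = 0b100... (only bit 22 set)
bits = (0 << 31) | (0b01111100 << 23) | (1 << 22)

# Reinterpret the 32-bit pattern as a single-precision float.
value = struct.unpack('>f', struct.pack('>I', bits))[0]
print(value)   # 0.1875, i.e. 1.5 * 2^-3
```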