Convert binary floating-point real numbers into denary and vice versa

Cambridge A-Level Computer Science 9618 - 13.3 Floating-point Numbers

13.3 Floating-point Numbers: Representation and Manipulation

Introduction

Floating-point numbers are the standard way computers approximate real numbers. They are used extensively in scientific computing, graphics, and general-purpose applications. This section details how these numbers are represented in binary, how to convert between binary and decimal (denary), and which limitations follow from the finite number of bits.

Binary Floating-Point Representation

A binary floating-point number is typically represented using three parts:

Sign bit: Indicates whether the number is positive or negative (0 for positive, 1 for negative).
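Exponent: Represents the power of 2 by which the significand is multiplied. It is stored with a fixed bias added so that the stored field is never negative.
Significand (Mantissa): Represents the significant digits of the number. It is normalized to have a leading '1' before the binary point (except for denormalized numbers), so only the fractional bits need to be stored.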

The general form of a floating-point number is:

$$(-1)^s \times 1.m \times 2^e$$

Where:

$s$ is the sign bit (0 or 1)
$m$ is the fractional part of the significand (the bits after the binary point)
$e$ is the exponent
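
As a rough illustration of the three parts, the sketch below splits a number into its sign bit, stored exponent field, and fraction bits. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits, bias 127); the function name `float32_fields` is made up for this example.

```python
import struct

def float32_fields(x):
    """Return (sign, stored_exponent, fraction) for x stored as IEEE 754 single precision."""
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # top bit
    stored_exponent = (bits >> 23) & 0xFF  # next 8 bits, still including the bias
    fraction = bits & 0x7FFFFF             # low 23 bits: the m in 1.m
    return sign, stored_exponent, fraction

sign, stored_exponent, fraction = float32_fields(-54.0)
# Assumed bias of 127 is subtracted to recover the true exponent e.
print(sign, stored_exponent - 127, format(fraction, "023b"))
# prints: 1 5 10110000000000000000000
```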

Conversion from Binary to Decimal

To convert a binary floating-point number to decimal, we follow these steps:

Determine the sign bit.
Determine the exponent and significand from the binary representation.
Calculate the value of the exponent: $e = \text{exponent binary value} - \text{bias}$ (where bias is a constant specific to the floating-point format).
Calculate the significand: The significand is the implicit leading 1 followed by the stored fraction bits (e.g., stored bits 101 give 1.101). Convert this binary number to decimal.
Calculate the value: $(-1)^s \times \text{significand decimal value} \times 2^e$
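
The steps above can be sketched as a small function. This is a minimal sketch for normalized numbers only; it assumes a bias of 127, and the name `decode_float32` and its parameters are illustrative rather than part of any library.

```python
def decode_float32(sign, exponent_bits, fraction_bits, bias=127):
    """Decode a normalized binary floating-point number given as bit strings."""
    # Step 3: true exponent = stored exponent value - bias.
    e = int(exponent_bits, 2) - bias
    # Step 4: significand = implicit leading 1 plus the weighted fraction bits.
    significand = 1.0
    for position, bit in enumerate(fraction_bits, start=1):
        if bit == "1":
            significand += 2.0 ** -position
    # Step 5: combine sign, significand, and power of two.
    return (-1) ** sign * significand * 2.0 ** e

print(decode_float32(1, "10000100", "1011"))  # prints: -54.0
```

Trailing zero bits in the fraction can be omitted, since they add nothing to the significand.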

Example: Converting a Binary Floating-Point Number to Decimal

Consider the binary floating-point number $-1.1011_2 \times 2^5$ (assuming the standard IEEE 754 single-precision format with a bias of 127).

1. Sign bit: 1 (negative)

2. Exponent: 5. The stored exponent field is $5 + 127 = 132$, which is 10000100 in binary.

3. Significand: 1.1011 (binary) = 1 + 1/2 + 0/4 + 1/8 + 1/16 = 1 + 0.5 + 0 + 0.125 + 0.0625 = 1.6875 (decimal)

4. Calculate the value: $(-1)^1 \times 1.6875 \times 2^5 = -1 \times 1.6875 \times 32 = -54$
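
One way to cross-check this result is to build the 32-bit pattern by hand and let Python interpret it. The sketch assumes the single-precision layout with the exponent field 10000100 (132, which is 5 plus the bias of 127) and the fraction bits 1011 padded with zeros to 23 bits.

```python
import struct

# sign (1 bit) + stored exponent (8 bits) + fraction 1011 padded to 23 bits
bits = "1" + "10000100" + "1011" + "0" * 19
raw = int(bits, 2).to_bytes(4, "big")   # the same 32 bits as 4 bytes
(value,) = struct.unpack(">f", raw)     # interpret the bytes as a big-endian float32
print(value)  # prints: -54.0
```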

Conversion from Decimal to Binary (Floating-Point)

Converting a decimal number to binary floating-point involves the following steps:

Convert the decimal number to binary.
Normalize the binary number: Move the binary point so that there is only one non-zero digit to the left of the binary point.
Determine the sign bit (0 for positive, 1 for negative).
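Determine the exponent: The true exponent is $e = \lfloor \log_2(|x|) \rfloor$, where $|x|$ is the absolute value of the number. This equals the number of places the binary point moved during normalization (negative if it moved to the right). The stored exponent is $e + \text{bias}$.
Construct the floating-point number using the sign bit, the stored exponent, and the fraction bits of the significand (the leading 1 is implicit and not stored).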
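
The steps above can be sketched in code as follows. This is a simplified sketch that assumes a nonzero, finite input in the normalized range, a bias of 127, and a 23-bit fraction; the name `encode_float32` is hypothetical. It uses `math.frexp` rather than a logarithm to find the exponent, which avoids rounding trouble near powers of two.

```python
import math

def encode_float32(x, bias=127, fraction_width=23):
    """Return (sign, exponent, fraction) bit strings for a normalized number x."""
    sign = "1" if x < 0 else "0"
    # frexp gives |x| = m * 2**exp with 0.5 <= m < 1, so 2*m is the 1.xxx significand.
    m, exp = math.frexp(abs(x))
    e = exp - 1
    significand = m * 2
    # Keep only the fraction bits; round to the nearest representable value.
    fraction = round((significand - 1) * 2 ** fraction_width)
    if fraction == 2 ** fraction_width:  # rounding carried into the next power of two
        fraction = 0
        e += 1
    return sign, format(e + bias, "08b"), format(fraction, f"0{fraction_width}b")

print(encode_float32(54.0))
# prints: ('0', '10000100', '10110000000000000000000')
```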

Example: Converting a Decimal Number to Binary Floating-Point

Convert the decimal number 54 to binary floating-point (using the same IEEE 754 single-precision format with a bias of 127).

Binary representation of 54: 110110
Normalize: $1.10110 \times 2^5$ (the binary point moves 5 places to the left)
Sign bit: 0 (positive)
Exponent: The binary point moved 5 places, so the true exponent is $\lfloor \log_2(54) \rfloor = 5$. The stored exponent is $5 + 127 = 132$, which is 10000100 in binary.
Floating-point number: 0 10000100 10110000000000000000000 (sign, 8-bit exponent, 23-bit fraction; the leading 1 of the significand is implicit and not stored)
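
The bit pattern for 54 can be checked with Python's standard `struct` module, which packs a value into IEEE 754 single precision. The variable names are only for illustration.

```python
import struct

# Pack 54.0 as a big-endian float32, then read the same 4 bytes as an unsigned integer.
(bits,) = struct.unpack(">I", struct.pack(">f", 54.0))
pattern = format(bits, "032b")
# Split into sign (1 bit), stored exponent (8 bits), fraction (23 bits).
print(pattern[0], pattern[1:9], pattern[9:])
# prints: 0 10000100 10110000000000000000000
```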

Limitations of Floating-Point Representation

Floating-point numbers have limitations due to their finite precision. This can lead to rounding errors, especially when performing many calculations. Some numbers cannot be represented exactly, resulting in approximations.

Common Issues

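Rounding Errors: Due to the limited number of bits, not all real numbers can be represented exactly. For example, 0.1 has no finite binary expansion, so it is stored as the nearest representable value.
Denormalized Numbers: In IEEE 754, numbers whose stored exponent field is all zeros (with a non-zero fraction) are denormalized. They have no implicit leading 1, which lets them represent values closer to zero than the smallest normalized number, but with reduced precision.
Overflow and Underflow: When the exponent is too large to be stored, overflow occurs and the result becomes infinity. When it is too small, underflow occurs and the result becomes a denormalized number or zero.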
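
These issues can be observed directly. The sketch below mostly uses Python's built-in `float`, which is IEEE 754 double precision (an 11-bit exponent and 52-bit fraction, not the single-precision format used in the examples above); the helper `to_float32` is a made-up name that rounds a value to single precision via `struct`.

```python
import math
import struct
import sys

def to_float32(x):
    """Round x to the nearest single-precision value and return it as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Rounding error: 0.1 and 0.2 are stored as approximations, so the sum is slightly off.
print(0.1 + 0.2 == 0.3)              # prints: False
print(math.isclose(0.1 + 0.2, 0.3))  # prints: True
# Fewer fraction bits give a different approximation of 0.1.
print(to_float32(0.1) == 0.1)        # prints: False

# Overflow: the result is too large for the exponent field and becomes infinity.
print(math.isinf(1e308 * 10))        # prints: True

# Denormalized numbers lie below the smallest normalized value but above zero.
tiny = sys.float_info.min / 2
print(0 < tiny < sys.float_info.min) # prints: True

# Underflow: halving the smallest denormalized double rounds to zero.
print(5e-324 / 2 == 0.0)             # prints: True
```

Comparing with a tolerance, as `math.isclose` does, is a common way to work around rounding error when testing floating-point values for equality.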

Summary

Floating-point numbers are a crucial part of computer science. Understanding their representation and how to convert between binary and decimal is essential for working with numerical data in various applications. However, it's important to be aware of the limitations and potential errors associated with floating-point arithmetic.