
Floating Point Essentials

Definitions and Overview

In computing, floating point numbers are commonly used to encode rational numbers that lie beyond the available integer range or that are smaller than the smallest integer unit. Examples of floating point values are -210.25, 64 and 0.40625. Contrary to a common misconception, the floating point format is not the only way to express numbers with significant digits after the decimal point. Scaled integers or fixed point formats (for instance, Delphi's currency type) can serve that purpose too. However, letting the decimal point float makes the range of representable numbers much larger than with fixed point. For example, take a decimal fixed point format with 6 digits, two of which come after the decimal point. Such a format can store numbers such as 5745.24, 9999.00 or 0.01. A decimal floating point format using 6 digits, by contrast, could store numbers such as 97845 or 0.00054, neither of which fits the fixed point format.
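The comparison can be tried out with a minimal Python sketch. It is only an illustration under stated assumptions: the helper names `to_fixed` and `to_float6` are hypothetical, the fixed point format is modelled as an integer count of hundredths, and the floating point format is modelled with the standard `decimal` module limited to 6 significant digits.

```python
from decimal import Decimal, Context

# Fixed point: 6 decimal digits, 2 of them after the decimal point.
# Values are stored as scaled integers (units of 0.01).
SCALE = 100
MAX_SCALED = 999_999  # largest value that fits in 6 digits

def to_fixed(text):
    """Return the scaled integer for a decimal string, or None if it does not fit exactly."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value() or abs(scaled) > MAX_SCALED:
        return None
    return int(scaled)

# Floating point: 6 significant decimal digits, with the position of the
# decimal point tracked separately by the decimal module.
SIX_DIGITS = Context(prec=6)

def to_float6(text):
    """Round a decimal string to 6 significant digits."""
    return SIX_DIGITS.create_decimal(text)

print(to_fixed("5745.24"))    # 574524
print(to_fixed("97845"))      # None: too large for 4 integer digits
print(to_fixed("0.00054"))    # None: needs more than 2 decimal places
print(to_float6("97845") == Decimal("97845"))      # True
print(to_float6("0.00054") == Decimal("0.00054"))  # True
```

Both values that the fixed point helper rejects are held exactly by the 6-digit floating point model, because the digits are spent where the number needs them.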

However, this greater range comes at the cost of precision: for a given storage size, some of the space must hold the position of the decimal point, which leaves less room for significant digits.
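A small Python sketch can make the trade-off concrete. It assumes the IEEE 754 single format, which keeps 24 significant bits out of its 32, and uses the standard `struct` module to round a value to single precision; the helper name `to_single` is hypothetical.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest 32-bit single and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

n = 2**24 + 1                  # 16777217 fits easily in a 32-bit integer
as_single = to_single(float(n))

print(n)                       # 16777217
print(int(as_single))          # 16777216: the last unit was lost
```

A 32-bit integer stores 16777217 exactly, while a 32-bit single cannot, because part of its storage is reserved for the exponent.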

There are several ways to represent floating point numbers, but in computing the most common approach uses three components: a sign bit, the significant digits (the significand, traditionally called the mantissa) and an exponent. Typically, the base is binary rather than decimal. The magnitude of a binary floating point value is therefore calculated as follows, with the sign bit selecting positive or negative:

significant digits × 2^exponent
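As an illustration, the example values from the introduction can be split into these components with Python's standard `math.frexp`. This is a sketch rather than a description of the stored encoding: `frexp` scales the significant digits into the range 0.5 to 1, which is one of several possible conventions, and the helper name `decompose` is hypothetical.

```python
import math

def decompose(x):
    """Split x into (sign, significand, exponent) with x == sign * significand * 2**exponent."""
    sign = -1 if math.copysign(1.0, x) < 0 else 1
    significand, exponent = math.frexp(abs(x))  # 0.5 <= significand < 1 for nonzero x
    return sign, significand, exponent

for value in (-210.25, 64.0, 0.40625):
    sign, significand, exponent = decompose(value)
    # Recombining the three parts gives back the original value exactly.
    assert sign * significand * 2.0**exponent == value
    print(value, "=", sign, "*", significand, "* 2 **", exponent)
```

For instance, 0.40625 comes out as 0.8125 × 2^-1 and 64 as 0.5 × 2^7.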

Different computer systems can use different encodings for floating point values, and my generation will certainly remember the difficulties of sharing information between systems with different and often incompatible formats. Fortunately, the IEEE Standard for Floating-Point Arithmetic (IEEE 754) has been widely adopted over the last few decades and is the dominant format on modern PCs.

[Illustration: graphic representation of the single, double and extended data types]

For programmers of PC applications, the 32-bit single (known as float in C) and 64-bit double formats of modern Intel CPUs, together with the 80-bit extended format, are essential knowledge.
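The storage sizes of single and double can be checked with a short Python sketch using the standard `struct` module. It assumes that Python's own float is a double, which holds on typical PC platforms; the helper name `round_trip` is hypothetical. The 80-bit extended format is left out because `struct` has no format code for it.

```python
import struct

# The "<" prefix selects the standard sizes regardless of platform.
print(struct.calcsize("<f"))   # 4 bytes = 32 bits (single)
print(struct.calcsize("<d"))   # 8 bytes = 64 bits (double)

def round_trip(fmt, x):
    """Store x in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

x = 0.1
print(round_trip("<d", x) == x)   # True: the value is already a double
print(round_trip("<f", x) == x)   # False: single keeps fewer significant bits
```

Storing the same value in the smaller format changes it slightly, which is why it matters to know which format a given variable uses.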
