Fixed-Point Representation
Fixed-point representation extends 1s complement and 2s complement, the representations used for negative numbers, to fractions. We simply specify how many bits are used for the whole-number part and how many for the fraction part.
The representation is called fixed-point because the position of the dot (i.e., the decimal point) is fixed within the number system.
Representation
Note that this dot is a purely imaginary construct. In the actual bit representation, none of the bits corresponds to the dot. For instance, if we have 8-bit numbers, we can allocate 6 bits for the whole-number part and 2 bits for the fraction part. In this case, the assumed binary point[1] sits between the 6 whole-number bits and the 2 fraction bits, as shown in the image on the right. The yellow boxes are the whole-number bits and the green boxes are the fraction bits.
Fixed-point representation works with any negative-number representation, although Excess-N is rarely used with it.
Examples
Using 1s complement, we can represent the following numbers with a 6-bit whole-number part and a 2-bit fraction part:
- (26.75)10 = (011010.11)1s
- (-1.25)10 = (111110.10)1s
  - Start from (1.25)10 = (000001.01)2
  - Invert all the bits: (111110.10)1s
Using 2s complement, we can represent the following numbers with a 6-bit whole-number part and a 2-bit fraction part:
- (26.75)10 = (011010.11)2s
- (-1.25)10 = (111110.11)2s
  - Start from (1.25)10 = (000001.01)2
  - Invert all the bits: (111110.10)1s
  - Add 1 at the least significant bit (i.e., add (000000.01)2): (111110.11)2s
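The conversions above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `to_fixed` and its parameters are hypothetical, and it assumes the value is an exact multiple of the smallest fraction step so that no rounding is needed.

```python
from fractions import Fraction

def to_fixed(value, int_bits=6, frac_bits=2, scheme="2s"):
    """Return the bit string of `value`, with the imaginary binary point shown.

    `value` is a decimal string such as "-1.25"; `scheme` is "1s" or "2s".
    Hypothetical helper written for illustration only.
    """
    total = int_bits + frac_bits
    # Shift the binary point right by frac_bits places.
    scaled = Fraction(value) * (1 << frac_bits)
    if scaled.denominator != 1:
        raise ValueError("value needs more fraction bits than available")
    n = int(scaled)
    magnitude = abs(n)
    # Simplification: this also rejects the most negative 2s complement value.
    if magnitude >= 1 << (total - 1):
        raise ValueError("value does not fit in the whole-number bits")
    if n >= 0:
        pattern = magnitude
    elif scheme == "1s":
        pattern = (1 << total) - 1 - magnitude   # invert all the bits
    else:
        pattern = (1 << total) - magnitude       # invert, then add 1
    bits = format(pattern, f"0{total}b")
    return bits[:int_bits] + "." + bits[int_bits:]

print(to_fixed("26.75", scheme="1s"))   # 011010.11
print(to_fixed("-1.25", scheme="1s"))   # 111110.10
print(to_fixed("-1.25", scheme="2s"))   # 111110.11
```

Decimal strings and `Fraction` are used instead of floats so that the arithmetic stays exact.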
Resolution
As the examples above show, there is a smallest step, or resolution, between the numbers we can represent: every representable number is a multiple of this resolution. The resolution depends purely on the number of bits used for the fraction part.
Consider the fixed-point representation above, with a 6-bit whole-number part and a 2-bit fraction part. Its resolution is 2^-2 = 0.25, so all representable numbers are multiples of 0.25.
- 26.75 = 107 × 0.25
- -1.25 = -5 × 0.25
Both numbers are indeed multiples of 0.25. We can say something more about this resolution by looking at the multipliers in binary.
In 1s complement:
- 26.75 = 107 × 0.25
  - (107)10 = (01101011)1s
  - Add back the binary point: (011010.11)1s
- -1.25 = -5 × 0.25
  - (-5)10 = (11111010)1s
  - Add back the binary point: (111110.10)1s
In 2s complement:
- 26.75 = 107 × 0.25
  - (107)10 = (01101011)2s
  - Add back the binary point: (011010.11)2s
- -1.25 = -5 × 0.25
  - (-5)10 = (11111011)2s
  - Add back the binary point: (111110.11)2s
Notice how, in both cases, the binary representation of the multiplier is exactly the fixed-point representation with the binary point removed.
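The same observation can be checked in the other direction: drop the binary point, read the bits as a signed integer, and multiply by the resolution. The sketch below assumes a hypothetical helper named `from_fixed`; it is an illustration rather than a definitive decoder.

```python
from fractions import Fraction

def from_fixed(bits, scheme="2s"):
    """Decode a string like "111110.11" into (multiplier, value).

    Hypothetical helper; `scheme` is "1s" or "2s".
    """
    int_part, frac_part = bits.split(".")
    total = len(int_part) + len(frac_part)
    raw = int(int_part + frac_part, 2)      # bits without the binary point
    if raw >> (total - 1):                  # sign bit is set: negative
        if scheme == "2s":
            multiplier = raw - (1 << total)
        else:
            multiplier = raw - ((1 << total) - 1)
    else:
        multiplier = raw
    resolution = Fraction(1, 1 << len(frac_part))   # 2^-frac_bits
    return multiplier, multiplier * resolution

print(from_fixed("011010.11", "1s"))   # (107, Fraction(107, 4))
print(from_fixed("111110.10", "1s"))   # (-5, Fraction(-5, 4))
print(from_fixed("111110.11", "2s"))   # (-5, Fraction(-5, 4))
```

Here `Fraction(107, 4)` is 26.75 and `Fraction(-5, 4)` is -1.25, matching the multiples of 0.25 listed above.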
Approximation
The resolution is the reason fixed-point representation is merely an approximation of the real numbers. Consider a number that is not a multiple of the resolution, such as (0.125)10. Its binary representation is (0.001)2, so we need a 3-bit fraction to represent it exactly. If we use a 6-bit whole-number part and a 2-bit fraction part, we must represent it as one of the following:
- Round Up: (000000.01)2 = (0.25)10
- Round Down: (000000.00)2 = (0.0)10
There are many ways to round a number, and we will not discuss them here. Most often, we simply truncate the binary representation to the number of bits available.
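The two candidates for (0.125)10 can be computed directly. The following is a small sketch with a hypothetical function name, `quantize`; it assumes a non-negative input, for which truncating the excess fraction bits is the same as rounding down.

```python
import math
from fractions import Fraction

def quantize(value, frac_bits=2):
    """Return the nearest representable values (below, above) for `value`.

    Hypothetical helper; `value` is a non-negative decimal string.
    """
    resolution = Fraction(1, 1 << frac_bits)    # 2^-frac_bits
    steps = Fraction(value) / resolution        # how many resolution steps
    round_down = math.floor(steps) * resolution # same as truncating the bits
    round_up = math.ceil(steps) * resolution
    return round_down, round_up

down, up = quantize("0.125")
print(float(down), float(up))   # 0.0 0.25
```

When the value is already a multiple of the resolution, both results are equal and no approximation occurs.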
Exercises
Decimal to Fixed-Point
Convert (-36.03125)10 to a 16-bit fixed-point number in 1s complement with a 10-bit integer part and a 6-bit fraction part. The bit arrangement is shown below:
Write your answer in binary. Truncate any excess bits (if any).
Answer: (1111011011.111101)1s
Steps
- Convert (36.03125)10 to binary:
  - (100100.00001)2
- Add bits until we have a 10-bit integer part and a 6-bit fraction part:
  - (0000100100.000010)2
- Convert to negative using 1s complement:
  - Invert: (1111011011.111101)1s
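The steps can be double-checked with a short Python sketch. The function name `ones_complement_steps` is hypothetical, and the code assumes a positive magnitude that fits in the integer bits; any excess fraction bits are truncated, as the exercise asks.

```python
from fractions import Fraction

def ones_complement_steps(magnitude="36.03125", int_bits=10, frac_bits=6):
    """Return (padded positive bits, inverted bits) for -magnitude.

    Hypothetical helper that mirrors the steps of the exercise.
    """
    total = int_bits + frac_bits
    # Shift the binary point right by frac_bits; int() drops excess bits.
    scaled = int(Fraction(magnitude) * (1 << frac_bits))
    positive = format(scaled, f"0{total}b")     # pad with leading zeros
    inverted = "".join("1" if b == "0" else "0" for b in positive)

    def with_point(bits):
        return bits[:int_bits] + "." + bits[int_bits:]

    return with_point(positive), with_point(inverted)

positive, negative = ones_complement_steps()
print(positive)   # 0000100100.000010
print(negative)   # 1111011011.111101
```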
[1] Strictly speaking, the term decimal point applies to base 10. The base-2 counterpart is called the binary point.