Data Representation

Two’s Complement

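Positive numbers are stored as ordinary binary
To negate a number, flip every bit and add one; doing the same again recovers the original

Another trick: scanning from right to left, keep everything up to and including the first 1, then flip all the bits to its left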
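
Both negation methods can be sketched in Python. The function names and the fixed bit width `n` are assumptions for illustration; values are treated as raw `n`-bit patterns.

```python
def negate_flip_add(x: int, n: int) -> int:
    """Two's complement negation: flip all n bits, then add one."""
    mask = (1 << n) - 1
    return ((x ^ mask) + 1) & mask


def negate_scan(x: int, n: int) -> int:
    """Same result via the scanning trick: keep the bits up to and
    including the rightmost 1, flip every bit to its left."""
    bits = format(x & ((1 << n) - 1), f"0{n}b")
    idx = bits.rfind("1")          # position of the rightmost 1
    if idx == -1:                  # x is zero, and -0 is 0
        return 0
    flipped = "".join("1" if b == "0" else "0" for b in bits[:idx])
    return int(flipped + bits[idx:], 2)


print(format(negate_flip_add(0b0110, 4), "04b"))  # 1010
print(format(negate_scan(0b0110, 4), "04b"))      # 1010
```

Negating twice returns the original pattern, which is the "reverse" direction in the rule.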

Subtraction

Suppose we do $A - B$ with 4 bits

Convert $B$ to its two’s complement form, i.e. flip the bits and add one (e.g. for $B = 0110$: $1111 - 0110 + 0001 = 1010$)

$2^{n} - 1 - B + 1 = 2^{n} - B$

Addition

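$A + 2^{n} - B = 2^{n} + A - B$

The $2^{n}$ term is the carry out of the top bit; with only $n$ bits it is discarded, leaving $A - B$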
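
A minimal sketch of subtraction by adding the two’s complement, assuming 4-bit values by default. The helper name `subtract` is hypothetical.

```python
def subtract(a: int, b: int, n: int = 4) -> int:
    """Compute A - B on n-bit patterns using only flip, add, and mask."""
    mask = (1 << n) - 1
    neg_b = ((b ^ mask) + 1) & mask   # two's complement of B, i.e. 2^n - B
    total = a + neg_b                 # 2^n + A - B (before dropping the carry)
    return total & mask               # keep n bits: the 2^n carry is discarded


print(format(subtract(0b0111, 0b0110), "04b"))  # 0001  (7 - 6 = 1)
```

The result is an `n`-bit pattern, so a negative difference shows up in its two’s complement form.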

Signed vs Unsigned

	Unsigned	Signed 2’s Complement
Minimum	$000 \dots 0 0_{2}$	$100 \dots 0 0_{2}$
Maximum	$111 \dots 1 1_{2}$	$011 \dots 1 1_{2}$
Range	$[0, 2^{k} - 1]$	$[- 2^{k - 1}, 2^{k - 1} - 1]$

A signed integer overflows when a result exceeds the maximum and underflows when it falls below the minimum
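
The ranges in the table, and what happens when a sum leaves the signed range, can be sketched as follows. The function names are assumptions, and the wrap-around models fixed-width hardware arithmetic rather than Python's unbounded integers.

```python
def unsigned_range(k: int) -> tuple[int, int]:
    return 0, (1 << k) - 1


def signed_range(k: int) -> tuple[int, int]:
    return -(1 << (k - 1)), (1 << (k - 1)) - 1


def add_signed(a: int, b: int, k: int) -> tuple[int, bool]:
    """Add two k-bit signed values; return (wrapped result, out of range?)."""
    lo, hi = signed_range(k)
    exact = a + b
    wrapped = (exact - lo) % (1 << k) + lo   # wrap back into [lo, hi]
    return wrapped, not (lo <= exact <= hi)


print(signed_range(4))        # (-8, 7)
print(add_signed(7, 1, 4))    # (-8, True)
print(add_signed(-8, -1, 4))  # (7, True)
```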

Sign Extension

For signed integers
Fill the new high-order bits with the sign bit, e.g. 1001 → 1111 1001
The value is preserved
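
A short sketch of sign extension on raw bit patterns. `sign_extend` and `to_signed` are hypothetical helpers, and the input is assumed to fit in `from_bits`.

```python
def sign_extend(x: int, from_bits: int, to_bits: int) -> int:
    """Widen a from_bits pattern to to_bits by copying the sign bit."""
    sign = (x >> (from_bits - 1)) & 1
    if sign:
        # set every new high-order bit to 1
        x |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return x


def to_signed(x: int, bits: int) -> int:
    """Interpret a bit pattern as a two's complement value."""
    return x - (1 << bits) if (x >> (bits - 1)) & 1 else x


wide = sign_extend(0b1001, 4, 8)
print(format(wide, "08b"))                       # 11111001
print(to_signed(0b1001, 4), to_signed(wide, 8))  # -7 -7
```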

Zero Extension

For unsigned integers
Fill the new high-order bits with 0
Also used for bitwise logical operations

Floating Point Numbers

Normalized Scientific Notation

Scientific notation with no leading 0’s

Important

In binary, the leading digit of a normalized number is always 1, so it does not need to be stored

Important

A bias of $2^{n - 1} - 1$, where $n$ is the number of exponent bits, is added to the exponent so that every stored exponent is non-negative, which makes comparison faster

Single Precision

32 bits: 1 sign bit, 8 exponent bits, 23 significand (fraction) bits

Range

Exponent fields 0000 0000 and 1111 1111 are reserved

Smallest
- Exponent: 0000 0001, i.e. $1 - 127 = - 126$
- Fraction: 000...00
- $\pm 1.0 \times 2^{- 126}$
Largest
- Exponent: 1111 1110, i.e. $254 - 127 = 127$
- Fraction: 111...11, so the significand is $\approx 2$
- $\pm 2 \times 2^{127} \approx \pm 3.4 \times 10^{38}$
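
The two limits can be checked by building the bit patterns directly and reinterpreting them with Python's standard `struct` module. The helper `bits_to_float` is an assumption for illustration.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


smallest = bits_to_float(0x00800000)  # exponent 0000 0001, fraction all 0
largest = bits_to_float(0x7F7FFFFF)   # exponent 1111 1110, fraction all 1

print(smallest == 2.0 ** -126)                   # True
print(largest == (2 - 2 ** -23) * 2.0 ** 127)    # True
```

The largest significand is exactly $2 - 2^{-23}$, which is why it is written as approximately 2.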

Double Precision

64 bits: 1 sign bit, 11 exponent bits, 52 significand (fraction) bits

Bias

e.g. 8-bit exponent: stored values (0 → 255), bias 127
actual exponent = stored - 127 → (-127 → 128); excluding the reserved 0 and 255: (-126 → 127)
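
A small sketch of the bias arithmetic, using the bias $2^{n - 1} - 1$ for an $n$-bit exponent. The function names are hypothetical.

```python
def bias(n: int) -> int:
    """Bias for an n-bit exponent field."""
    return 2 ** (n - 1) - 1


def stored(exp: int, n: int) -> int:
    """Stored (biased) exponent for an actual exponent."""
    return exp + bias(n)


def actual(e: int, n: int) -> int:
    """Actual exponent recovered from a stored value."""
    return e - bias(n)


print(bias(8), bias(11))             # 127 1023
print(stored(0, 8))                  # 127
print(actual(1, 8), actual(254, 8))  # -126 127
```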

Formula

$X = (-1)^{S} \times (1.F) \times 2^{E - Bias}$
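
The formula can be applied field by field to a 32-bit pattern. This is a sketch for normalized single-precision values only (stored exponent 1 to 254); `decode_single` is a hypothetical name and the result is compared against Python's `struct` module.

```python
import struct


def decode_single(bits: int) -> float:
    """Apply X = (-1)^S * (1.F) * 2^(E - 127) to a 32-bit pattern."""
    s = bits >> 31              # 1 sign bit
    e = (bits >> 23) & 0xFF     # 8 exponent bits
    f = bits & 0x7FFFFF         # 23 fraction bits
    return (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - 127)


pattern = 0x40B80000  # 0 | 1000 0001 | 0111 000...0
print(decode_single(pattern))                                 # 5.75
print(struct.unpack(">f", struct.pack(">I", pattern))[0])     # 5.75
```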

Conversion Example

$5.75_{10} = 101.11_{2}$

use short division (repeated division by 2) for the integer part
use repeated multiplication by 2 for the fractional part

0.75 x 2 = 1.5
0.5 x 2 = 1.0 (stop when the fractional part is 0)

read off the integer part of each product from top to bottom: .11
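
The two hand methods can be sketched as below. The function names and the `max_bits` cut-off are assumptions; the cut-off is needed because many decimal fractions never terminate in binary.

```python
def int_to_binary(n: int) -> str:
    """Repeated division by 2; remainders are read from bottom to top."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))
        n //= 2
    return "".join(reversed(digits))


def frac_to_binary(f: float, max_bits: int = 23) -> str:
    """Repeated multiplication by 2; integer parts are read top to bottom."""
    digits = []
    while f and len(digits) < max_bits:
        f *= 2
        bit = int(f)          # integer part of the product
        digits.append(str(bit))
        f -= bit              # continue with the fractional part
    return "".join(digits)


print(int_to_binary(5) + "." + frac_to_binary(0.75))  # 101.11
```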

then shift the result until there is only one single digit in the integer part

101.11 → 1.0111 x 2^2 (shift by 2 digits)

then we have to store the sign, significand, exponent

storing: E = bias + exp
retrieve: exp = E - bias

for an n-bit exponent, the bias is the max value of n - 1 bits, e.g. with 8 bits the bias is 127, so an exponent of 0 is stored as 127

bias = $2^{n - 1} - 1$
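
The whole conversion (normalize, bias the exponent, pack the three fields) can be sketched as follows. `encode_single` is a hypothetical helper that assumes a nonzero value in the normalized range and truncates extra fraction bits instead of rounding, so it only matches the hardware encoding for values that fit exactly, such as 5.75.

```python
import struct


def encode_single(x: float) -> int:
    """Pack sign, biased exponent, and fraction into a 32-bit pattern."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    sign = 1 if x < 0 else 0
    m = abs(x)
    exp = 0
    while m >= 2:       # shift right until one digit remains before the point
        m /= 2
        exp += 1
    while m < 1:        # shift left for values below 1
        m *= 2
        exp -= 1
    fraction = int((m - 1) * 2 ** 23)   # drop the implicit leading 1
    return (sign << 31) | ((exp + 127) << 23) | fraction


bits = encode_single(5.75)   # 1.0111 x 2^2, so E = 2 + 127 = 129
print(format(bits, "032b"))  # 01000000101110000000000000000000
print(bits == int.from_bytes(struct.pack(">f", 5.75), "big"))  # True
```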

Minimum (denormalized) for single precision:
(sign 0)(exponent 0000 0000)(fraction 000…0001)

$(-1)^{s} \times 0.F \times 2^{-126} = 1 \times 2^{-23} \times 2^{-126} = 2^{-149}$
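
This value can be checked by reinterpreting the bit pattern `0x00000001` with Python's `struct` module. The variable names are for illustration only.

```python
import struct

# sign 0, exponent 0000 0000, fraction 000...0001
tiny = struct.unpack(">f", struct.pack(">I", 0x00000001))[0]

print(tiny == 2.0 ** -149)              # True
print(tiny == 2.0 ** -23 * 2.0 ** -126)  # True
```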
