Computer number format

A computer number format is the internal representation of numeric values in digital computer and calculator hardware and software.^[1] Normally, numeric values are stored as groupings of bits, named for the number of bits that compose them. The encoding between numerical values and bit patterns is chosen for convenience of the operation of the computer; the bit format used by the computer's instruction set generally requires conversion for external use such as printing and display. Different types of processors may have different internal representations of numerical values. Different conventions are used for integer and real numbers. Most calculations are carried out with number formats that fit into a processor register, but some software systems allow representation of arbitrarily large numbers using multiple words of memory.

Binary number representation

Computers represent data in sets of binary digits. The representation is composed of bits, which in turn are grouped into larger sets such as bytes.

Table 1: Binary to Octal
Binary String	Octal value
000	0
001	1
010	2
011	3
100	4
101	5
110	6
111	7

Table 2: Number of values for a bit string.
Length of Bit String (b)	Number of Possible Values (N)
1	2
2	4
3	8
4	16
5	32
6	64
7	128
8	256
9	512
10	1024
...
$b$	$2^b=N$

A bit is a binary digit that represents one of two states. The concept of a bit can be understood as a value of either 1 or 0, on or off, yes or no, true or false, or encoded by a switch or toggle of some kind.

While a single bit, on its own, is able to represent only two values, a string of bits may be used to represent larger values. For example, a string of three bits can represent up to eight distinct values as illustrated in Table 1.

As the number of bits composing a string increases, the number of possible 0 and 1 combinations increases exponentially. A single bit allows only two value combinations, two bits combined can make four separate values, and so on. The number of possible combinations doubles with each binary digit added, as illustrated in Table 2.
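
The doubling shown in Table 2 can be checked by enumerating every pattern directly. The following is a minimal Python sketch; the function name `bit_strings` is chosen here for illustration and does not come from any standard.

```python
from itertools import product

def bit_strings(b):
    """Return every bit string of length b, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=b)]

# Each extra bit doubles the number of distinct patterns.
for b in range(1, 4):
    patterns = bit_strings(b)
    print(b, len(patterns), patterns)
```

For a length of three, the list holds the eight patterns from Table 1.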

Groupings with a specific number of bits are used to represent varying things and have specific names.

A byte is a bit string containing the number of bits needed to represent a character. On most modern computers, this is an eight-bit string. Because the definition of a byte is related to the number of bits composing a character, some older computers have used a different bit length for their byte.^[2] In many computer architectures, the byte is used to address specific areas of memory. For example, even though 64-bit processors may address memory sixty-four bits at a time, they may still split that memory into eight-bit pieces. This is called byte-addressable memory. Historically, many CPUs read data in some multiple of eight bits.^[3] Because the byte size of eight bits is so common, but the definition is not standardized, the term octet is sometimes used to explicitly describe an eight-bit sequence.

A nibble (sometimes nybble) is a number composed of four bits.^[4] Being a half-byte, the nibble was named as a play on words. A person may need several nibbles for one bite from something; similarly, a nybble is a part of a byte. Because four bits allow for sixteen values, a nibble is sometimes known as a hexadecimal digit.^[5]
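
As an illustration of the nibble as a hexadecimal digit, the sketch below splits an eight-bit byte into its two four-bit halves. It assumes an eight-bit byte; `split_nibbles` is a hypothetical helper name.

```python
def split_nibbles(byte):
    """Split an 8-bit value into its (high, low) 4-bit halves."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in eight bits")
    return byte >> 4, byte & 0x0F

high, low = split_nibbles(0b10100111)
# Each nibble prints as exactly one hexadecimal digit.
print(format(high, "X"), format(low, "X"))
```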

Octal and hex number display

Converting between bases

Table 3: Comparison of Values in Different Bases
Decimal Value	Binary Value	Octal Value	Hexadecimal Value
0	000000	00	00
1	000001	01	01
2	000010	02	02
3	000011	03	03
4	000100	04	04
5	000101	05	05
6	000110	06	06
7	000111	07	07
8	001000	10	08
9	001001	11	09
10	001010	12	0A
11	001011	13	0B
12	001100	14	0C
13	001101	15	0D
14	001110	16	0E
15	001111	17	0F

Each of these number systems is a positional system, but while decimal weights are powers of 10, the octal weights are powers of 8 and the hex weights are powers of 16. To convert from hex or octal to decimal, one multiplies the value of each digit by the weight of its position and then adds the results. For example:

$\text{octal } 756$

$= (7 * 8^2) + (5 * 8^1) + (6 * 8^0)$

$= (7 * 64) + (5 * 8) + (6 * 1)$

$= 448 + 40 + 6$

$= \text{decimal } 494$

$\text{hex } 3b2$

$= (3 * 16^2) + (11 * 16^1) + (2 * 16^0)$

$= (3 * 256) + (11 * 16) + (2 * 1)$

$= 768 + 176 + 2$

$= \text{decimal } 946$
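
The same digit-times-weight procedure can be written as a short Python sketch. The name `to_decimal` and the digit table are illustrative; Python's built-in `int(text, base)` performs the same conversion.

```python
DIGITS = "0123456789abcdef"

def to_decimal(text, base):
    """Sum digit * base**position, working from the rightmost digit."""
    total = 0
    for position, char in enumerate(reversed(text.lower())):
        digit = DIGITS.index(char)  # raises ValueError for unknown characters
        if digit >= base:
            raise ValueError(f"{char!r} is not a base-{base} digit")
        total += digit * base ** position
    return total

print(to_decimal("756", 8))    # 494
print(to_decimal("3b2", 16))   # 946
```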

Representing fractions in binary

Fixed-point numbers

Fixed-point formatting can be useful to represent fractions in binary.

The number of bits needed for the precision and range desired must be chosen to store the fractional and integer parts of a number. For instance, using a 32-bit format, 16 bits may be used for the integer and 16 for the fraction.

The eight's bit is followed by the four's bit, then the two's bit, then the one's bit. The fractional bits continue the pattern set by the integer bits. The next bit is the half's bit, then the quarter's bit, then the ⅛'s bit, and so on. For example:

				integer bits	fractional bits
0.500	=	¹⁄₂	=	00000000 00000000.10000000 00000000
1.250	=	1 ¹⁄₄	=	00000000 00000001.01000000 00000000
7.375	=	7 ³⁄₈	=	00000000 00000111.01100000 00000000

This form of encoding cannot represent some values exactly in binary. For example, for the fraction $\tfrac{1}{5}$ (0.2 in decimal), the closest approximations would be as follows:

13107 / 65536	=	00000000 00000000.00110011 00110011	=	0.1999969... in decimal
13108 / 65536	=	00000000 00000000.00110011 00110100	=	0.2000122... in decimal

Even if more digits are used, an exact representation is impossible. The situation is analogous to writing $\tfrac{1}{3}$ in decimal: the expansion 0.333333333... continues indefinitely, and if it is prematurely terminated, the value does not represent $\tfrac{1}{3}$ precisely.
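
The 16.16 layout used in the examples above can be sketched in Python by scaling a value by 65536 and storing the rounded integer. This is a minimal sketch for non-negative values only; the names `to_fixed` and `show` are assumptions made for illustration.

```python
FRACTION_BITS = 16
SCALE = 1 << FRACTION_BITS  # 65536 steps per unit

def to_fixed(value):
    """Round a non-negative number to the nearest 16.16 fixed-point code."""
    return round(value * SCALE)

def show(code):
    """Format a 32-bit code as integer bits, a point, then fractional bits."""
    bits = format(code & 0xFFFFFFFF, "032b")
    return bits[:16] + "." + bits[16:]

print(show(to_fixed(7.375)))        # 0000000000000111.0110000000000000
code = to_fixed(0.2)
print(code, show(code), code / SCALE)
```

For 0.2 the rounded code is 13107, the closer of the two neighbouring approximations.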

Floating-point numbers

While both unsigned and signed integers are used in digital systems, even a 32-bit integer cannot cover the full range of numbers a calculator can handle, and that is before fractions are considered. To approximate the greater range and precision of real numbers, we have to abandon signed integers and fixed-point numbers and go to a "floating-point" format.

In the decimal system, we are familiar with floating-point numbers of the form (scientific notation):

1.1030402 × 10⁵ = 1.1030402 × 100000 = 110304.02

or, more compactly:

1.1030402E5

which means "1.1030402 times 1 followed by 5 zeroes". We have a certain numeric value (1.1030402) known as a "significand", multiplied by a power of 10 (E5, meaning 10⁵ or 100,000), known as an "exponent". If we have a negative exponent, that means the number is multiplied by a 1 that many places to the right of the decimal point. For example:

2.3434E-6 = 2.3434 × 10⁻⁶ = 2.3434 × 0.000001 = 0.0000023434

The advantage of this scheme is that by using the exponent we can get a much wider range of numbers, even if the number of digits in the significand, or the "numeric precision", is much smaller than the range. Similar binary floating-point formats can be defined for computers. There are a number of such schemes; the most popular has been defined by the Institute of Electrical and Electronics Engineers (IEEE). The IEEE 754-2008 standard specification defines a 64-bit floating-point format with:

an 11-bit binary exponent, using "excess-1023" format. Excess-1023 means the exponent appears as an unsigned binary integer from 0 to 2047; subtracting 1023 gives the actual signed value
a 52-bit significand, also an unsigned binary number, defining a fractional value with a leading implied "1"
a sign bit, giving the sign of the number.

Let's see what this format looks like by showing how such a number would be stored in 8 bytes of memory:

byte 0:	S	x10	x9	x8	x7	x6	x5	x4
byte 1:	x3	x2	x1	x0	m51	m50	m49	m48
byte 2:	m47	m46	m45	m44	m43	m42	m41	m40
byte 3:	m39	m38	m37	m36	m35	m34	m33	m32
byte 4:	m31	m30	m29	m28	m27	m26	m25	m24
byte 5:	m23	m22	m21	m20	m19	m18	m17	m16
byte 6:	m15	m14	m13	m12	m11	m10	m9	m8
byte 7:	m7	m6	m5	m4	m3	m2	m1	m0

where "S" denotes the sign bit, "x" denotes an exponent bit, and "m" denotes a significand bit. Once the bits here have been extracted, they are converted with the computation:

<sign> × (1 + <fractional significand>) × 2^{<exponent> - 1023}

This scheme provides numbers valid out to about 15 decimal digits, with the following range of numbers:

	maximum	minimum
positive	1.797693134862316E+308	4.940656458412465E-324
negative	-4.940656458412465E-324	-1.797693134862316E+308
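
The field extraction and conversion formula for the 64-bit format can be sketched with Python's `struct` module. The sketch assumes that Python's `float` is an IEEE 754 64-bit value, which holds on common platforms; the helper names are illustrative, and `rebuild` covers only normal numbers, not zeros, subnormals, infinities or NaNs.

```python
import struct
import sys

def decode_double(value):
    """Split a float into its sign, biased exponent and fraction fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 bits, excess-1023
    fraction = bits & ((1 << 52) - 1)     # 52 stored significand bits
    return sign, exponent, fraction

def rebuild(sign, exponent, fraction):
    """Apply sign * (1 + fraction) * 2**(exponent - 1023) to normal numbers."""
    return (-1) ** sign * (1 + fraction / 2 ** 52) * 2.0 ** (exponent - 1023)

fields = decode_double(-6.5)
print(fields, rebuild(*fields))
print(rebuild(0, 2046, 2 ** 52 - 1) == sys.float_info.max)  # largest finite value
```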

The specification also defines several special values that are not defined numbers, and are known as NaNs, for "Not A Number". These are used by programs to designate invalid operations and the like.
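
A short Python sketch of how a NaN behaves, assuming the usual IEEE 754 semantics that Python's `float` follows on common platforms:

```python
import math

nan = float("nan")
invalid = float("inf") - float("inf")   # an invalid operation yields a NaN

print(math.isnan(invalid))      # True
print(nan == nan)               # False: a NaN compares unequal even to itself
print(math.isnan(nan + 1.0))    # True: NaNs propagate through arithmetic
```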

Some programs also use 32-bit floating-point numbers. The most common scheme uses a 23-bit significand with a sign bit, plus an 8-bit exponent in "excess-127" format, giving seven valid decimal digits.

byte 0:	S	x7	x6	x5	x4	x3	x2	x1
byte 1:	x0	m22	m21	m20	m19	m18	m17	m16
byte 2:	m15	m14	m13	m12	m11	m10	m9	m8
byte 3:	m7	m6	m5	m4	m3	m2	m1	m0

The bits are converted to a numeric value with the computation:

<sign> × (1 + <fractional significand>) × 2^{<exponent> - 127}

leading to the following range of numbers:

	maximum	minimum
positive	3.402823E+38	1.401298E-45
negative	-1.401298E-45	-3.402823E+38
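
The 32-bit layout can be explored the same way. The sketch below uses Python's `struct` module to round a value to single precision and to pull out the three fields; the function names are assumptions for illustration.

```python
import struct

def to_single(value):
    """Round a Python float to the nearest 32-bit float and widen it back."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

def decode_single(value):
    """Return the sign, excess-127 exponent and 23-bit fraction fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

print(decode_single(-0.15625))   # (1, 124, 2097152)
# Only about seven decimal digits survive the trip through 32 bits.
print(to_single(0.1))
```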

Such floating-point numbers are known as "reals" or "floats" in general, but with a number of variations:

A 32-bit float value is sometimes called a "real32" or a "single", meaning "single-precision floating-point value".

A 64-bit float is sometimes called a "real64" or a "double", meaning "double-precision floating-point value".

The relation between numbers and bit patterns is chosen for convenience in computer manipulation; eight bytes stored in computer memory may represent a 64-bit real, two 32-bit reals, four signed or unsigned 16-bit integers, or some other kind of data that fits into eight bytes. The only difference is how the computer interprets them. If the computer stored four unsigned 16-bit integers and then read them back from memory as a 64-bit real, the result would almost always be a perfectly valid real number, though it would be junk data.
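
The following Python sketch reinterprets the same eight bytes in several ways. It assumes 16-bit integers for the "four integers" case and big-endian byte order; the sample values 1, 2, 3, 4 are arbitrary.

```python
import struct

# Store four unsigned 16-bit integers: eight bytes in total.
raw = struct.pack(">4H", 1, 2, 3, 4)

as_double = struct.unpack(">d", raw)[0]    # one 64-bit real
as_singles = struct.unpack(">2f", raw)     # two 32-bit reals
as_signed = struct.unpack(">4h", raw)      # four signed 16-bit integers

print(raw.hex())
print(as_double)     # a valid but meaningless tiny number
print(as_singles)
print(as_signed)
```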

Only a finite range of real numbers can be represented with a given number of bits. Arithmetic operations can overflow or underflow, producing a value too large or too small to be represented.
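
Overflow and underflow can be observed directly. The sketch below assumes Python's `float` is a 64-bit IEEE 754 value, where an overflowing multiplication produces infinity and an underflowing division produces zero.

```python
import math
import sys

largest = sys.float_info.max
overflowed = largest * 2       # too large to represent: becomes infinity

smallest = 5e-324              # smallest positive 64-bit value
underflowed = smallest / 2     # too small to represent: becomes zero

print(overflowed, math.isinf(overflowed))
print(underflowed)
```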

The representation has a limited precision. For example, only 15 decimal digits can be represented with a 64-bit real. If a very small floating-point number is added to a large one, the result is just the large one. The small number was too small to even show up in 15 or 16 digits of resolution, and the computer effectively discards it. Analyzing the effect of limited precision is a well-studied problem. Estimates of the magnitude of round-off errors and methods to limit their effect on large calculations are part of any large computation project. The precision limit is different from the range limit, as it affects the significand, not the exponent.
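
The absorption of a small addend can be shown with a minimal Python sketch. `math.fsum` is used here only as one example of a technique that limits round-off in a sum; it is not the only such method.

```python
import math

large = 1e16
small = 1.0

# The small addend falls below the resolution of the large value.
print(large + small == large)   # True

values = [large] + [small] * 10
naive = sum(values)             # each 1.0 is discarded in turn
careful = math.fsum(values)     # tracks the lost low-order parts

print(naive)
print(careful)
```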

The significand is a binary fraction that doesn't necessarily perfectly match a decimal fraction. In many cases a sum of reciprocal powers of 2 does not match a specific decimal fraction, and the results of computations will be slightly off. For example, the decimal fraction "0.1" is equivalent to an infinitely repeating binary fraction: 0.000110011 ...^[6]
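
The repeating expansion of 0.1 can be generated by repeated doubling, and the stored approximation can be inspected exactly with Python's `fractions` module. This is an illustrative sketch; `binary_digits` is a hypothetical helper.

```python
from fractions import Fraction

def binary_digits(value, count):
    """Return the first `count` binary digits after the point of 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value)        # 1 if doubling carried past the point, else 0
        digits.append(str(bit))
        value -= bit
    return "".join(digits)

print("0." + binary_digits(Fraction(1, 10), 12))   # 0.000110011001

stored = Fraction(0.1)              # exact value of the 64-bit approximation
print(stored == Fraction(1, 10))    # False
print(0.1 + 0.2 == 0.3)             # False: the small errors show up in sums
```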

Numbers in programming languages

Programming in assembly language requires the programmer to keep track of the representation of numbers. Where the processor does not support a required mathematical operation, the programmer must work out a suitable algorithm and instruction sequence to carry out the operation; on some microprocessors, even integer multiplication must be done in software.

High-level programming languages such as LISP and Python offer an abstract number that may be an expanded type such as rational, bignum, or complex. Mathematical operations are carried out by library routines provided by the implementation of the language. A given mathematical symbol in the source code, by operator overloading, will invoke different object code appropriate to the representation of the numerical type; mathematical operations on any number—whether signed, unsigned, rational, floating-point, fixed-point, integral, or complex—are written exactly the same way.
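
As a sketch of this in Python, the single function below works unchanged on integers, rationals and complex numbers, because the `+` and `/` operators dispatch on the operand types. The function `average` is an example written for this illustration.

```python
from fractions import Fraction

def average(a, b):
    """The same source expression serves every numeric type."""
    return (a + b) / 2

print(average(1, 2))                             # 1.5
print(average(Fraction(1, 3), Fraction(1, 2)))   # 5/12
print(average(1 + 2j, 3 - 2j))                   # (2+0j)

# Integers grow beyond any fixed register width (a bignum).
print(2 ** 100)
```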

Some languages, such as REXX and Java, provide decimal floating-point operations, which produce rounding errors of a different form.
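
Python's `decimal` module, used here as a stand-in for the decimal facilities of the languages named above, shows the different character of the errors: decimal fractions such as 0.1 become exact, while values such as one third still have to be rounded. The helper `third_times_three` is hypothetical.

```python
from decimal import Decimal, localcontext

def third_times_three(digits):
    """Compute (1/3) * 3 with `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        third = Decimal(1) / Decimal(3)   # rounded to `digits` digits
        return third * 3

print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
print(third_times_three(10))                               # 0.9999999999
```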

Notes and references

The initial version of this article was based on a public domain article from Greg Goebel's Vectorsite.

↑ Jon Stokes (2007). Inside the machine: an illustrated introduction to microprocessors and computer architecture. No Starch Press. p. 66. ISBN 978-1-59327-104-6.
↑ "byte definition". Retrieved 24 April 2012.
↑ "Microprocessor and CPU (Central Processing Unit)". Network Dictionary. Retrieved 1 May 2012.
↑ "nybble definition". Retrieved 3 May 2012.
↑ "Nybble". TechTerms.com. Retrieved 3 May 2012.
↑ Goebel, Greg. "Computer Numbering Format". Retrieved 10 September 2012.

This article is issued from Wikipedia - version of the Friday, April 22, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.