IEEE-754 Reference Material

http://babbage.cs.qc.edu/IEEE-754/References.xhtml

This page provides some reference material for the IEEE-754 floating-point standard. It is a companion page to the
three calculator pages:
Convert decimal numbers to IEEE-754 representation. Enter a decimal value and see how it would be encoded as a single-precision or double-precision IEEE-754 floating-point number.
Analyze a single-precision IEEE-754 value. Enter a 32-bit value in hexadecimal, and see the corresponding decimal value and the breakdown of all the fields.
Analyze a double-precision IEEE-754 value. Enter a 64-bit value in hexadecimal, and see the corresponding decimal value and the breakdown of all the fields.
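The conversions these calculators perform can be approximated outside the browser. What follows is a minimal sketch in Python, not the calculators' own JavaScript. The helper names to_hex and from_hex are invented for illustration, and the sketch assumes the standard struct module, which packs floats in IEEE-754 binary32 and binary64 form.

```python
import struct

def to_hex(value, fmt):
    """Encode a float as big-endian IEEE-754 hex; fmt is 'f' (32-bit) or 'd' (64-bit)."""
    return struct.pack(">" + fmt, value).hex().upper()

def from_hex(hex_digits):
    """Decode 8 hex digits as single precision or 16 hex digits as double precision."""
    raw = bytes.fromhex(hex_digits)
    fmt = {4: "f", 8: "d"}[len(raw)]   # choose the format from the byte count
    return struct.unpack(">" + fmt, raw)[0]

print(to_hex(1.0, "f"))        # single-precision encoding of 1.0
print(to_hex(1.0, "d"))        # double-precision encoding of 1.0
print(from_hex("C0000000"))    # decode a single-precision pattern
```

Passing 8 hex digits selects single precision and 16 selects double precision; any other length raises a KeyError in this sketch.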
What’s Here
History of the Calculators
Some References
Kevin’s Report
Kevin’s Tables
History of the Calculators

These calculators came into being in the fall semester of 1997 when I assigned my Computer Organization course an
optional project to write a program that would print the values of the fields of IEEE-754 floating-point numbers. One
student, Quanfei Wen, decided to do the program as an interactive web page, using JavaScript to do the calculations.
I put Quanfei’s pages up on my web site as a resource for other students taking my Computer Organization course,
where they were found by Kevin Brewer, who was working for Delco Electronics at the time. Kevin noticed some
special cases that Quanfei’s calculators didn’t handle, and volunteered to work up the precise versions of the code that
the calculators now use. Kevin also did a web search for information on the IEEE-754 standard, which is included in
the Kevin’s Report section below, and prepared some web tables listing the parameters of the standard.
The calculators have turned out to be a popular resource on the web for academics and industry alike. I know of
translations into Spanish and German, and they get over 10,000 hits a month on my web server.
In December 2004 I started reworking these pages to improve their appearance and conformance to web standards.
The JavaScript code behind them has been moved to separate files, which interested users can download from
the js/ directory. The original pages, with embedded JavaScript code, remain available at my old Computer
Organization course web site.
Some References
“In 1976 Intel began to design a floating-point co-processor for its i8086/8 and i432 microprocessors. Dr. John
Palmer persuaded Intel that it needed an arithmetic standard to prevent different boxes with “Intel” on the outside from
computing disparate results inside. At Stanford ten years earlier, Palmer had heard a visiting professor, William
Kahan, analyze commercially significant arithmetics and assess how much their anomalies inflated the costs of reliable
and portable numerical software. Kahan had also enhanced the numerical prowess of a successful line of Hewlett-
Packard calculators. Palmer, now the manager of Intel’s floating-point effort, recruited Kahan as a consultant to help
design the arithmetic for the i432 ( which died later ) and for the i8086/8’s upcoming i8087 coprocessor.” From An
Interview with the Old Man of Floating-Point, a short, fascinating read on the development of the IEEE-754
Floating-Point standard.
The IEEE-754 Standard for Binary Floating-Point Arithmetic was published in 1985. You can order a copy of the
standard from the IEEE. The web site for the IEEE-754 is a good place to go for links to information about IEEE-754
and floating-point in general.
A popular paper on the mathematical properties of floating-point numbers, including the IEEE-754 standard, is What
Every Computer Scientist Should Know About Floating-Point Arithmetic, which is also available from docs.sun.com.
Last updated 2010-04-01.
The Q4, 1999 issue of the Intel Technology Journal had an article, IA-64 Floating-Point operations and the IEEE
standard for binary floating-point arithmetic, by M. Cornea-Hasegan and B. Norin, which provides a summary of the
IEEE-754 standard and includes a table of ten different floating-point formats used by Intel’s 64-bit microprocessors
(the IA-64 architecture). Most of the article discusses features of the IA-64 floating-point instruction set. [2008-11-14:
Link updated to point to PDF version of the article.]
A significant floating-point standard, which pre-dates the IEEE-754 standard, is the “hexadecimal encoding” used on
IBM mainframes. This format uses sixteen instead of two as the base to which the exponent is raised. The IBM S/390
G5 processor was the first one to integrate IBM’s traditional hexadecimal encoding and IEEE-754 in the same
floating-point unit. It is described in the paper, The S/390 G5 floating point unit by E. M. Schwarz and C. A.
Krygowski, which appeared in the IBM Journal of Research and Development, vol. 43, No. 5/6, September/November
1999, pp 707-721.
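To make the base-16 idea concrete, here is a rough sketch of decoding a 32-bit hexadecimal-format value. The field layout used (1 sign bit, a 7-bit excess-64 exponent, and a 24-bit fraction with no hidden bit) is not stated on this page; it is the commonly documented layout for the IBM single-precision format and is taken here as an assumption. The function name is hypothetical.

```python
def ibm_hex_single_to_float(bits):
    """Decode a 32-bit IBM hexadecimal floating-point pattern (assumed layout)."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    exponent = (bits >> 24) & 0x7F               # excess-64 power of sixteen
    fraction = (bits & 0xFFFFFF) / float(1 << 24)  # 0.f, no hidden bit
    return sign * fraction * 16.0 ** (exponent - 64)

print(ibm_hex_single_to_float(0x42640000))
print(ibm_hex_single_to_float(0xC276A000))
```

Because the exponent scales by sixteen rather than two, the leading hexadecimal digit of the fraction may carry up to three leading zero bits, which is one reason the two encodings differ in precision behaviour.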
Kevin’s Report
The rest of the material on this page came from Kevin J. Brewer, who worked for Delco Electronics at the time he
wrote it. In addition to the material below, Kevin greatly refined the JavaScript code for the IEEE-754 Calculator
page originally written by a Queens College student, Quanfei Wen.
At the end of this page are Kevin’s Charts, which summarize the IEEE-754 single and double precision formats.
If you find a broken link in the material below, please let me know, especially if you know where the page has moved.
(Send mail to vickery at babbage.cs.qc.edu with “IEEE-754” in the Subject line.) Where there are links that I know
are broken in what follows, I put the broken link in [square brackets] and preserved Kevin’s surrounding text.
Kevin suggested, “Scroll up and down from the locations cited below in order to learn other information about the
IEEE-754 standard.”
The source which showed me that there were actually positive and negative NaNs and introduced me to a new special number,
Indeterminate, was [ this page ] (link updated 2007-10-13). To find the table showing these NaNs and Indeterminate, use the Edit |
Find... command on the string “the corresponding values”. Scroll up a little in order to take a look at the “Special Operations”
table. And right above that table is the list of special numbers and their meanings.
The source which introduced me to the concepts of “signaling” and “quiet” NaNs was
[ http://www.cas.american.edu/~studdard/classes/fall1995/4028201/notes/17oct95/I.html ]. To find the section on
“signaling” and “quiet” NaNs, use the Edit | Find... command on the string “NaNs can be signaling or quiet”.
The source which allowed me to distinguish between “signaling” and “quiet” NaNs was [ this page ]. To find the section on
NaNs and the encodings of other special numbers, use the Edit | Find... command on the string “The definition of NaNs”.
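The distinction can be sketched in a few lines for single precision. The sketch follows the bit ranges given in the summary charts on this page: an all-ones exponent with a non-zero mantissa is a NaN, the most significant mantissa bit separates quiet from signaling, and the single pattern FFC00000 is labelled Indeterminate. The function name is hypothetical.

```python
def classify_nan_single(bits):
    """Classify a 32-bit pattern as a kind of NaN, following this page's charts."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent != 0xFF or mantissa == 0:
        return "not a NaN"            # finite value or an infinity
    if bits == 0xFFC00000:
        return "indeterminate"        # negative quiet NaN with only the top bit set
    # Top mantissa bit set means quiet, clear means signaling.
    return "quiet NaN" if mantissa & 0x400000 else "signaling NaN"

for pattern in (0x7FC00000, 0x7F800001, 0xFFC00000, 0x7F800000):
    print(f"{pattern:08X}", classify_nan_single(pattern))
```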
[ This source ] shows the mathematical equations which define the various IEEE-754 values and ranges.
The source which introduced me to IEEE-754’s four rounding modes and the guard, round, and “sticky” bits was [ this page ].
To find the section on rounding, use the Edit | Find... command on the string “four different rounding modes”.
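As an illustration of how the extra bits drive rounding, the sketch below drops low-order bits from an integer significand. It assumes one common convention that this page does not spell out: the guard bit is the first bit dropped, the round bit is the second, and the sticky bit is the OR of everything below. The function name and the mode names ("nearest", "zero", "up", "down") are made up for this example.

```python
def round_significand(sig, shift, mode="nearest", negative=False):
    """Drop the low `shift` bits of a non-negative integer significand."""
    if shift <= 0:
        return sig << -shift
    kept = sig >> shift
    dropped = sig & ((1 << shift) - 1)
    guard = (dropped >> (shift - 1)) & 1                  # first dropped bit
    rest = dropped & ((1 << (shift - 1)) - 1)
    round_bit = (rest >> (shift - 2)) & 1 if shift >= 2 else 0
    sticky = int(rest & ((1 << max(shift - 2, 0)) - 1) != 0)  # OR of the remainder
    inexact = guard | round_bit | sticky
    if mode == "nearest":      # round to nearest, ties to even
        up = guard and (round_bit or sticky or (kept & 1))
    elif mode == "zero":       # truncate
        up = False
    elif mode == "up":         # toward +infinity
        up = inexact and not negative
    elif mode == "down":       # toward -infinity
        up = inexact and negative
    else:
        raise ValueError("unknown rounding mode")
    return kept + (1 if up else 0)

print(round_significand(0b1010, 2))          # exact tie, kept value is even
print(round_significand(0b1110, 2))          # exact tie, kept value is odd
print(round_significand(0b1001, 2, "up"))    # inexact, positive, toward +infinity
```

The result is a magnitude; a real implementation would also renormalize when rounding carries out of the top bit.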
Some sources on the Web claim that IEEE-754 specifies four floating-point formats in two groups, basic and extended, with a
“single-precision” and a “double-precision” format in each of the two groups. To find this information, use the Edit | Find...
command on the string “IEEE 754 specifies four” on
[ http://www.cas.american.edu/~studdard/classes/fall1995/4028201/notes/17oct95/I.html ] and the Edit | Find... command on
the string “The other two formats” on [ this page ].
Upon reading the IEEE-754 standard, one learns from “Table 1, Summary of Format Parameters” on page 9 that the extended
formats are very loosely defined with unspecified exponential biases and only lower bounds for precisions and exponents,
while the basic formats are specified exactly in terms of field widths and semantics. The extended formats are so loosely
defined that particular implementations of these formats may be so different that numerical approximation routines using them
could be non-portable.
Other sources on the Web claim that IEEE-754 specifies only three floating-point formats, “single-precision”, “double-precision”,
and “quadruple-precision”. One source [ http://www.iac.tut.fi/usr/local/doc/Fortran-90/Version-2.0/f9a200_spd.txt ] shows the
three IEEE-754 formats and their max and min values in DEC’s Fortran-90 documentation. To find the section on the three
IEEE-754 formats, use the Edit | Find... command on the string “32-bit IEEE”. [ Another source ] shows the encodings of the
special numbers and the number of bits in each field for each of the three IEEE-754 formats. To find the sections on the three
IEEE-754 formats, use the Edit | Find... command on the string “For single-precision floating point numbers” and start
scrolling down.
When comparing the format parameters of “extended double-precision” in IEEE-754’s Table 1 and those of the so-called
“quadruple-precision”, one finds that the “quadruple-precision” format is simply a specific instance of the “extended double-
precision” format. Similarly, one will note that “double-precision” is a specific instance of “extended single-precision”.
The 80-bit “extended-precision” format is used “internally” by the Intel 80x87 floating-point math “co-processor” in order to
be able to shift operands back and forth without any loss of precision in the IEEE-754 64-bit (and 32-bit) format. To find this
information, use the Edit | Find... command on the string “it also implements an “extended-precision” format” on
[ http://www.cas.american.edu/~studdard/classes/fall1995/4028201/notes/17oct95/I.html ].
A source which describes the exponential bias of Intel’s 80-bit “extended-precision” format and its usage of the additional
bits it contains relative to the “double-precision” format is [ webster.cs.ucr.edu ]. (Link updated 2005-05-03). To find this data,
use the Edit | Find... command on the string “In order to help ensure accuracy”.
[ http://webster.cs.ucr.edu/Page_asm/ArtofAssembly/CH14/CH14-3.html ] states that Intel’s “extended-precision” format
supports non-normalized numbers (values very close to zero whose most significant mantissa bit is zero rather than one). To
find this support information, use the Edit | Find... command on the string “Normalized values provide”.
When one compares these stated and implied format parameters of Intel’s “extended-precision” with those of “extended
double-precision” in Table 1, one finds that the “extended-precision” format is a specific instance of the “extended double-
precision” format, similarly to the “quadruple-precision” format.
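The "specific instance" argument amounts to checking a concrete format's parameters against the bounds that Table 1 gives for the extended formats. A small sketch of that check follows; the dictionary keys and the function name are invented for illustration, and the numbers are copied from Table 1 on this page.

```python
# Bounds for the two extended formats, as listed in Table 1.
SINGLE_EXTENDED = {"p": 32, "emax": 1023, "emin": -1022, "exp_bits": 11, "width": 43}
DOUBLE_EXTENDED = {"p": 64, "emax": 16383, "emin": -16382, "exp_bits": 15, "width": 79}

def is_instance_of(fmt, bounds):
    """True when every parameter of fmt lies within the extended-format bounds."""
    return (fmt["p"] >= bounds["p"]
            and fmt["emax"] >= bounds["emax"]
            and fmt["emin"] <= bounds["emin"]
            and fmt["exp_bits"] >= bounds["exp_bits"]
            and fmt["width"] >= bounds["width"])

double = {"p": 53, "emax": 1023, "emin": -1022, "exp_bits": 11, "width": 64}
quadruple = {"p": 113, "emax": 16383, "emin": -16382, "exp_bits": 15, "width": 128}
intel_extended = {"p": 64, "emax": 16383, "emin": -16382, "exp_bits": 15, "width": 80}

print(is_instance_of(double, SINGLE_EXTENDED))
print(is_instance_of(quadruple, DOUBLE_EXTENDED))
print(is_instance_of(intel_extended, DOUBLE_EXTENDED))
```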
Table 1: Expanded Summary of Format Parameters

No.   Parameter                         Single       Single Extended  Double       Double Extended  Quadruple +   Extended #
(1)   p (precision, apparent mantissa   24           ≥ 32             53           ≥ 64             113           64
      width in bits)
(2)   Decimal digits of precision       7.22         ≥ 9.63           15.95        ≥ 19.26          34.01         19.26
      p / log2(10)
(3)   Mantissa’s MS-Bit                 hidden bit   unspecified      hidden bit   unspecified      hidden bit    explicit bit
(4)   Actual mantissa width in bits     23           ≥ 31             52           ≥ 63             112           64
(5)   Emax                              +127         ≥ +1023          +1023        ≥ +16383         +16383        +16383
(6)   Emin                              -126         ≤ -1022          -1022        ≤ -16382         -16382        -16382
(7)   Exponent bias                     +127         unspecified      +1023        unspecified      +16383        +16383
(8)   Exponent width in bits            8            ≥ 11             11           ≥ 15             15            15
(9)   Sign width in bits                1            1                1            1                1             1
(10)  Format width in bits              32           ≥ 43             64           ≥ 79             128           80
      (9) + (8) + (4)
(11)  Range Magnitude Maximum           3.4028E+38   ≥ 1.7976E+308    1.7976E+308  ≥ 1.1897E+4932   1.1897E+4932  1.1897E+4932
      2^(Emax + 1)
(12)  Range Magnitude Minimum           1.1754E-38   ≤ 2.2250E-308    2.2250E-308  ≤ 3.3621E-4932   3.3621E-4932  3.3621E-4932
      2^Emin
(13)  Range Magnitude Minimum           1.4012E-45   ≤ 1.0361E-317    4.9406E-324  ≤ 3.6451E-4951   6.4751E-4966  1.8225E-4951
      (Denormalized)
      2^(Emin - (4))
(14)  FORTRAN Language Type             REAL*4                        REAL*8                        REAL*16       REAL*10
(15)  C Language Type                   float                         double                        long double   long double

© Copyright 1985 by The Institute of Electrical and Electronics Engineers, Inc
+ Although the “quadruple-precision” name and the particular parameters of its format are not specified in the IEEE-754 standard, it is a legally derived IEEE-754 format because its parameters are specific subset elements within the bounds of those specified for the “extended double-precision” format.
# Like the “quadruple-precision” format, Intel’s “extended-precision” format is a legal IEEE-754 format derived from the “extended double-precision” format.
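Several rows of Table 1 are derived from the others by the formulas printed under their names. The sketch below recomputes rows (2), (4), (10), (11), (12) and (13) for the two basic formats. It assumes a hidden-bit format, so the stored mantissa width is p - 1, and it uses exact fractions so that 2^(Emax + 1) for double precision does not overflow. The function and key names are hypothetical.

```python
import math
from fractions import Fraction

def derived_parameters(p, emax, emin, exp_bits):
    """Derive Table 1 rows (2), (4), (10)-(13) for a hidden-bit format."""
    stored = p - 1                                        # row (4)
    return {
        "decimal_digits": p / math.log2(10),              # row (2): p / log2(10)
        "mantissa_bits": stored,
        "format_width": 1 + exp_bits + stored,            # row (10): (9) + (8) + (4)
        "max_bound": Fraction(2) ** (emax + 1),           # row (11): 2^(Emax + 1)
        "min_normal": Fraction(2) ** emin,                # row (12): 2^Emin
        "min_denormal": Fraction(2) ** (emin - stored),   # row (13): 2^(Emin - (4))
    }

single = derived_parameters(24, 127, -126, 8)
double = derived_parameters(53, 1023, -1022, 11)
print(round(single["decimal_digits"], 2), single["format_width"])
print(float(single["min_normal"]), float(single["min_denormal"]))
print(float(double["min_normal"]), float(double["min_denormal"]))
```

Row (11) is an exclusive upper bound; the largest finite values themselves appear in the summary charts further down this page.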
Other sources on IEEE-754 include:
[ http://spectra.eng.hawaii.edu/Courses/EE361.S95/Lectures/Lec38/lec38.3.html ]
[ http://www.ece.uiuc.edu/~ece291/lecture/l11.html ]
[ Carleton University ]
[ Grinnell College ]
[ http://cch.loria.fr/documentation/IEEE754/index.html#wkahan Papers on Floating-Point by William Kahan -- “The Father of IEEE-754” ] Broken link, but see Prof. Kahan’s home page.
Kevin’s Summary Charts

Storage Layout and Ranges of Floating-Point Numbers
IEEE-754 floating-point numbers require three component fields: the sign, the exponent, and the mantissa. The exponential
base is 2 and is never stored in any way with the value in either the registers or memory (it is implied). In order to allow the
exponent and mantissa, when taken together, to vary monotonically, the signed exponent is represented in excess-127 unsigned
form for single precision and excess-1023 for double precision. This excess-127 (or excess-1023) representation is indicated
by the variable “e” below.
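A short sketch of pulling the three fields out of a value, using Python's struct module to obtain the raw bit pattern. The function names are invented for illustration. The last lines check the monotonic property described above for a few positive values: sorting by (e, m) gives the same order as sorting by value.

```python
import struct

def fields_single(value):
    """Return (sign, biased exponent e, mantissa m) of the 32-bit encoding."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fields_double(value):
    """Return (sign, biased exponent e, mantissa m) of the 64-bit encoding."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

s, e, m = fields_single(-6.5)
print(s, e, e - 127, hex(m))      # sign, excess-127 exponent, true exponent, mantissa

# Positive values in increasing order keep that order as (e, m) pairs.
values = [0.75, 1.5, 2.0, 1e10]
pairs = [fields_single(v)[1:] for v in values]
print(pairs == sorted(pairs))
```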
Since IEEE-754 floating-point numbers are stored in a signed magnitude form, the ranges and binary patterns of the positive
and negative numbers are symmetric about the midpoint of the entire range of values (between the positive and negative
zeros). As a result, essentially any statement made in regard to the positive numbers is also true of the negative numbers and
vice versa.
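The symmetry can be seen directly in the bit patterns: flipping only the sign bit turns a value into its negative, and the two zeros differ in that one bit while still comparing equal. A minimal single-precision sketch, with hypothetical helper names:

```python
import struct

def bits_single(value):
    """Raw 32-bit pattern of a value rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def negate_bits(bits):
    """Flip the sign bit (bit 31) and leave exponent and mantissa alone."""
    return bits ^ 0x80000000

print(f"{bits_single(3.25):08X} {negate_bits(bits_single(3.25)):08X}")
print(f"{bits_single(0.0):08X} {bits_single(-0.0):08X}", 0.0 == -0.0)
```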
The range of positive floating-point numbers is split into normalized numbers (normal numbers) which preserve the full
precision of the mantissa, including the hidden bit, (24 bits for single precision and 53 bits for double precision) and
denormalized numbers (subnormal numbers, so-called unnormalized numbers) which have from 1 to 23 significant bits for
single precision and 1 to 52 bits for double precision.
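The loss of precision in the denormalized range can be counted from the bit pattern. In this sketch (single precision only, hypothetical function name) a normalized value always reports 24 significant bits, while a denormalized value reports the position of its highest set mantissa bit, which ranges from 1 to 23.

```python
def significant_bits_single(bits):
    """Number of significant bits carried by a finite 32-bit pattern."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:
        raise ValueError("infinity or NaN has no significand precision")
    if exponent == 0:
        return mantissa.bit_length()   # 0 for zero, 1..23 for denormalized
    return 24                          # 23 stored bits plus the hidden bit

for pattern in (0x00000001, 0x007FFFFF, 0x00800000, 0x7F7FFFFF):
    print(f"{pattern:08X}", significant_bits_single(pattern))
```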
The number line tables below, which show the layout for single (32-bit) and double (64-bit) precision floating-point numbers
and their special values, were inspired by the table on [ this page ] (link updated 2007-10-13). To find the table on which these
two are based, use the Edit | Find... command on the string “the corresponding values”. In their column headers, these tables
indicate the number of bits in each field along with their bit ranges in square brackets.
The values shown in the Decimal Range column of the tables are the end points of their respective ranges with the IEEE-754
round-to-nearest value mode applied. JavaScript uses IEEE-754 double precision floating-point with round-to-nearest value
mode to perform all of its arithmetic operations including its input string to numeric conversion routine. Therefore, by default,
double (64-bit) precision conversions are automatically rounded to values matching these tables. In order for single (32-bit)
precision conversions to be rounded to values matching these tables, the user must click the Rounded button on those pages
where it is present.
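The same two-step behaviour (arithmetic in double precision, then an explicit rounding to single precision) can be imitated in Python, whose float type is also an IEEE-754 double on common platforms. This is only a sketch of the idea behind the Rounded button, not the calculators' code. It assumes the default round-to-nearest floating-point environment, and the function name is made up.

```python
import struct

def round_to_single(value):
    """Round a double to the nearest single-precision value, returned as a double."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

x = 0.1                                   # already rounded once, to double precision
print(repr(x), repr(round_to_single(x)))  # the single-precision neighbour differs
print(struct.pack(">f", x).hex().upper())
```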
32-bit Single Precision

Range Name              Sign (s)  Exponent (e)  Mantissa (m)  Hexadecimal Range  Range                       Decimal Range §
                        1 [31]    8 [30-23]     23 [22-0]

Quiet -NaN              1         11..11        11..11        FFFFFFFF
                                                :             :
                                                10..01        FFC00001
Indeterminate           1         11..11        10..00        FFC00000
Signaling -NaN          1         11..11        01..11        FFBFFFFF
                                                :             :
                                                00..01        FF800001
-Infinity               1         11..11        00..00        FF800000          < -(2 - 2^-23) × 2^127      ≤ -3.4028235677973365E+38
(Negative Overflow)
Negative Normalized     1         11..10        11..11        FF7FFFFF          -(2 - 2^-23) × 2^127        -3.4028234663852886E+38
-1.m × 2^(e-127)                  :             :             :                 :                           :
                                  00..01        00..00        80800000          -2^-126                     -1.1754943508222875E-38
Negative Denormalized   1         00..00        11..11        807FFFFF          -(1 - 2^-23) × 2^-126       -1.1754942106924411E-38
-0.m × 2^(-126)                                 :             :                 :                           :
                                                00..01        80000001          -2^-149                     -1.4012984643248170E-45
                                                                                (-(1 + 2^-52) × 2^-150) *   (-7.0064923216240862E-46) *
Negative Underflow      1         00..00        00..00        80000000          -2^-150                     -7.0064923216240861E-46
                                                                                :                           :
                                                                                < -0                        < -0
-0                      1         00..00        00..00        80000000          -0                          -0
+0                      0         00..00        00..00        00000000          0                           0
Positive Underflow      0         00..00        00..00        00000000          > 0                         > 0
                                                                                :                           :
                                                                                2^-150                      7.0064923216240861E-46
Positive Denormalized   0         00..00                                        ((1 + 2^-52) × 2^-150) *    (7.0064923216240862E-46) *
0.m × 2^(-126)                                  00..01        00000001          2^-149                      1.4012984643248170E-45
                                                :             :                 :                           :
                                                11..11        007FFFFF          (1 - 2^-23) × 2^-126        1.1754942106924411E-38
Positive Normalized     0         00..01        00..00        00800000          2^-126                      1.1754943508222875E-38
1.m × 2^(e-127)                   :             :             :                 :                           :
                                  11..10        11..11        7F7FFFFF          (2 - 2^-23) × 2^127         3.4028234663852886E+38
+Infinity               0         11..11        00..00        7F800000          > (2 - 2^-23) × 2^127       ≥ 3.4028235677973365E+38
(Positive Overflow)
Signaling +NaN          0         11..11        00..01        7F800001
                                                :             :
                                                01..11        7FBFFFFF
Quiet +NaN              0         11..11        10..00        7FC00000
                                                :             :
                                                11..11        7FFFFFFF
64-bit Double Precision

Range Name              Sign (s)  Exponent (e)  Mantissa (m)  Hexadecimal Range  Range                       Decimal Range §
                        1 [63]    11 [62-52]    52 [51-0]

Quiet -NaN              1         11..11        11..11        FFFFFFFFFFFFFFFF
                                                :             :
                                                10..01        FFF8000000000001
Indeterminate           1         11..11        10..00        FFF8000000000000
Signaling -NaN          1         11..11        01..11        FFF7FFFFFFFFFFFF
                                                :             :
                                                00..01        FFF0000000000001
-Infinity               1         11..11        00..00        FFF0000000000000  < -(2 - 2^-52) × 2^1023     ≤ -1.7976931348623158E+308
(Negative Overflow)
Negative Normalized     1         11..10        11..11        FFEFFFFFFFFFFFFF  -(2 - 2^-52) × 2^1023       -1.7976931348623157E+308
-1.m × 2^(e-1023)                 :             :             :                 :                           :
                                  00..01        00..00        8010000000000000  -2^-1022                    -2.2250738585072014E-308
Negative Denormalized   1         00..00        11..11        800FFFFFFFFFFFFF  -(1 - 2^-52) × 2^-1022      -2.2250738585072010E-308
-0.m × 2^(-1022)                                :             :                 :                           :
                                                00..01        8000000000000001  -2^-1074                    -4.9406564584124654E-324
                                                                                (-(1 + 2^-52) × 2^-1075) *  (-2.4703282292062328E-324) *
Negative Underflow      1         00..00        00..00        8000000000000000  -2^-1075                    -2.4703282292062327E-324
                                                                                :                           :
                                                                                < -0                        < -0
-0                      1         00..00        00..00        8000000000000000  -0                          -0
+0                      0         00..00        00..00        0000000000000000  0                           0
Positive Underflow      0         00..00        00..00        0000000000000000  > 0                         > 0
                                                                                :                           :
                                                                                2^-1075                     2.4703282292062327E-324
Positive Denormalized   0         00..00                                        ((1 + 2^-52) × 2^-1075) *   (2.4703282292062328E-324) *
0.m × 2^(-1022)                                 00..01        0000000000000001  2^-1074                     4.9406564584124654E-324
                                                :             :                 :                           :
                                                11..11        000FFFFFFFFFFFFF  (1 - 2^-52) × 2^-1022       2.2250738585072010E-308
Positive Normalized     0         00..01        00..00        0010000000000000  2^-1022                     2.2250738585072014E-308
1.m × 2^(e-1023)                  :             :             :                 :                           :
                                  11..10        11..11        7FEFFFFFFFFFFFFF  (2 - 2^-52) × 2^1023        1.7976931348623157E+308
+Infinity               0         11..11        00..00        7FF0000000000000  > (2 - 2^-52) × 2^1023      ≥ 1.7976931348623158E+308
(Positive Overflow)
Signaling +NaN          0         11..11        00..01        7FF0000000000001
                                                :             :
                                                01..11        7FF7FFFFFFFFFFFF
Quiet +NaN              0         11..11        10..00        7FF8000000000000
                                                :             :
                                                11..11        7FFFFFFFFFFFFFFF
§ Your least significant digits may differ.
* The minimum magnitude values of the denormalized ranges are represented by a single significant bit (a bit whose value is 1) at
the right-hand end of the format’s mantissa. For single (32-bit) and double (64-bit) precision, these minimum range values are
1.4012984643248170E-45 and 4.9406564584124654E-324 respectively. The values 7.0064923216240862E-46 and
2.4703282292062328E-324 are each a little more than half of these minima. Each is represented by one significant bit just to the
right of its format’s storable mantissa, plus another 1-bit a double-precision mantissa width (52 bits) further to the right.
Under the IEEE-754 round-to-nearest mode, these values are therefore rounded up to the denormalized range minimum values.
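The double-precision case can be reproduced with any correctly rounding decimal-to-binary conversion. The sketch below uses Python's float(), which on common platforms is an IEEE-754 double with round-to-nearest string conversion; that platform behaviour is an assumption of the example. The decimal strings are the ones quoted in the chart above.

```python
import math

minimum = math.ldexp(1.0, -1074)                      # 2^-1074, smallest denormalized double
just_over_half = float("2.4703282292062328E-324")     # a little more than 2^-1075
just_under_half = float("2.4703282292062327E-324")    # a little less than 2^-1075

print(just_over_half == minimum)    # rounds up to the range minimum
print(just_under_half == 0.0)       # underflows to zero
```

The two inputs differ only in their last printed digit, yet they land on opposite sides of the halfway point between zero and the smallest denormalized value.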
