[Document initially written in 2004.]

The traditional FPU of the x86 processors (a.k.a. x87) can be configured to round its results either in extended precision (64-bit significands) or in double precision (53-bit significands); internal rounding in single precision is also supported, but not used in practice. Some operating systems, such as FreeBSD, NetBSD up to version 6, and Microsoft Windows (but not always?), chose to configure the processor so that, by default, it rounds in double precision. On the other hand, under GNU Linux, the rounding is done in extended precision (I do not know any other system having the same behavior by default). This is a bad choice for the reasons given below.

Note: the
SSE2
instructions, used by 64-bit systems in particular, are
*not* concerned by these problems.

More up-to-date information may be found on Wikipedia.

Before listing the drawbacks of the extended precision, one should note that the (optional) extended precision was introduced in the IEEE-754 standard essentially to allow an accurate double-precision implementation of the elementary functions of the mathematical library (exponential, logarithm, trigonometric and hyperbolic functions, etc), thus for very particular cases.

The file `/usr/include/fpu_control.h` under
Linux/x86 says:

#define _FPU_EXTENDED 0x300 /* libm requires double extended precision. */

but tests (with
mpcheck
in particular) show that the math library does not seem to need extended
precision: double precision seems to be sufficient. Or does anyone have a
counter-example?
**Update:** As shown in
glibc
bug 706, the `pow`

function is inaccurate in
some cases, and even more when the processor is configured in double
precision.

Even if in a programming language, a type corresponding to
the double precision is used (for instance, the type `double`
on most C implementations), a processor computing in extended precision
internally will not necessarily give the same results as with a processor
computing in double precision. For instance, let us consider the two
floating-point double-precision numbers
`x` = 2^{53} + 2
and
`y` = 1 − 2^{−16},
and their sum in the rounding-to-nearest mode
`z` = `x` + `y`.
On a system working in double precision, the result is the rounding of
2^{53} + 3 − 2^{−16},
that is, 2^{53} + 2 (closer to the
exact result than the following machine number
2^{53} + 4 is). On a system working in
extended precision, the result will be first rounded in extended precision,
that is, 2^{53} + 3, then rounded in
double precision (when stored to memory, for instance), that is,
2^{53} + 4 (by using the *even
rounding* rule, since 2^{53} + 3
is the middle of two consecutive machine numbers). Therefore the obtained
final result is different.

Note: this phenomenon, called *double rounding*,
occurs in the rounding-to-nearest mode only, but this is the rounding
mode that is used in general.

First the extended precision leads to portability problems, even if much care is taken at the type level, in particular because the support for extended precision is optional, and in practice, many processors do not support it. Also note that the extended precision is not completely normalized. Even if obtaining exactly the same result on two different machines is not regarded as important and the requirements are linked only to the accuracy of the results, the extended precision is useless, since a portable program must assume the worst case.

If the portability is not important and the software needs to
compute in extended precision on some given machine (by using the
`long double` type in C, for instance), the software can
still configure the processor in rounding to extended precision (in a
non-portable way unfortunately, but anyway, the portability has already
been sacrificed).

Concerning the portability, it is thus preferable to have everywhere a processor whose default behavior is to round to the double precision, without any major drawback for the programs needing extended precision.

The goal of the extended precision is to allow to have more accurate
basic operations in order to obtain more accurate final results. But most
programs have not been written specially for the extended precision, and in
general, one gets no guarantee. For instance, many programs written in C use
the type `double`, and even if the intermediate computations are
carried out in extended precision, some results may automatically be converted
to double precision, and these conversions cannot be controlled.

The extended precision can be needed by some software. But in this
case, it is important to be able to prove the error bounds and not to rely
on assumptions which could be wrong. In particular, using the extended
precision with the type `double` will not allow the user to
obtain better certified error bounds; on the contrary, the bounds may be
worse because of the double-rounding problem. The only solution is to
write non-portable code (as told above), whether the default precision is
the double one or the extended one.

For the other software, the choice of the default precision has little importance, but rounding in double precision is preferable concerning the proofs related to the error bounds.

Some algorithms are based on the correct-rounding property and may become completely wrong if a double rounding can occur. Concerning this point, my paper The Euclidean division implemented with a floating-point division and a floor gives an example particularly useful for programs written in ECMAScript (often referred to as Javascript) or using XPath.

Moreover, a same floating-point expression evaluated twice can yield
different results and make a comparison fail. It is particularly true with
GCC, which does not conform
to some points of the ISO C standard. Several people
encountered these problems; see the
GCC
bug 323 and the duplicates (which are in fact not always
real duplicates, since the problem is not always the same, even though
the first cause is the extended precision), and my program
`tst-ieee754.c`.
Note that GCC bug 323 has
been fixed in
GCC 4.5,
as long as the right compiler options are used, like asking to conform
to the C standard (see also the corresponding
patch):
GCC now supports
handling floating-point excess precision arising from use of the x87
floating-point unit in a way that conforms to ISO C99.
This is enabled with

`-fexcess-precision=standard` and with
standards conformance options such as `-std=c99`, and may
be disabled using `-fexcess-precision=fast`.

Finally, some standards such as Java and XPath (and also XSLT, as it is partly based on XPath) require IEEE double precision with correct rounding, in order to obtain completely reproducible results on different machines or software. Default rounding in extended precision makes their implementation non portable (as explained above); and in practice, most implementations do not change the internal precision of the processor, thus are buggy (see below).

Note: Java also has a single-precision type (`float`),
but the choice between the rounding in double precision and the rounding in
extended precision does not solve the rounding problems concerning this type;
in both cases, without special code, the implementation would be incorrect
on this point. However one may assume that no developers concerned in the
numerical behavior of their program use the `float` type for the
calculations, especially as this would not be faster.

Test the XPath arithmetic
with this XSLT stylesheet, to be applied to any
XML file, on itself for instance. The following
XSLT processors give an incorrect result (message
Not a conforming XPath implementation.

) under Linux/x86:

LibXSLT (

`xsltproc`command). Last tested version: 1.1.7, under Debian/unstable. Bug reports: bug 123297 on the GNOME BTS, bug 206549 on the Debian BTS (closed), bug 247465 on the Debian BTS.Sablotron (

`sabcmd`command). Last tested version: 1.0, under Debian/unstable. Bug report: bug 257843 on the Debian BTS.Xalan-C++ (

`xalan`command). Last tested version: 1.8.0 under Debian/unstable.

Test the arithmetic of your JVM with this Java class (Java source). Correct result:

$ java test z = 9.007199254740994E15 d = 0.0

Incorrect result, when the processor rounds in extended precision:

$ java test z = 9.007199254740996E15 d = 2.0

The GNU interpreter gij (last tested version: 4.3.3) gives this incorrect result under Linux/x86. Bug 255525 on the Debian BTS. Bug 16122 on GCC Bugzilla.

This bug was fixed in the following JVMs:

SableVM. Fixed in the 1.1.7 release (it was bug 1 on the SableVM BTS). Fixed in the Debian package 1.1.6-4 (see bug 255524 on the Debian BTS).

JamVM. Fixed in the 1.2.1 release (see bug 260410 on the Debian BTS).

Kaffe Virtual Machine. Fixed in the 1.1.6-2 Debian package (see bug 255502 on the Debian BTS).

CACAO. Fixed in 0.94-2 Debian package (version 0.94-1 was incorrect, but not previous versions, see bug 350729 on the Debian BTS).

Note: Sun's and IBM's JVMs always give a correct result.

For more information about Java numerics: Improving Java for Numerical Computation (in particular, see Section Use of floating-point hardware) on the JavaNumerics web site, and Updates to the Java Language Specification for JDK Release 1.2 Floating Point.

The problem of the single precision remains (is it often used?). Developers of JIT compilers may be interested in the article Optimizing precision overhead for x86 processors by Takeshi Ogasawara, Hideaki Komatsu and Toshio Nakatani, in Software — Practice and Experience, 34(9):875–893, July 2004.

Test of the Javascript arithmetic of your browser: . The number 0 is the correct result. If you get 2, this probably means that the Javascript implementation of your browser uses extended precision, and this is incorrect (for more information, see Section 8.5 of the ECMAScript Language Specification). If you get nothing, this means that your browser does not support Javascript or Javascript has been disabled.

The following web browsers give the incorrect result 2 under Linux/x86:

Mozilla (last tested version: 1.8b Gecko/20050114), but see below concerning Firefox. Bug 309797 on the Debian BTS.

Opera (last tested version: 11.00).

The bug has apparently been fixed in Firefox. Version 1.5 Gecko/20060110 gave an incorrect result (idem until Gecko 1.9.1), but recent versions such as Firefox 3.5.6 and later seem to behave correctly. Bug 264912 on Bugzilla (now closed).

A critical bug has been found in PHP, but it is related to the wider exponent range (and GCC bug 323), rather than the extended precision (the code in question runs in the double-precision mode). See below.

In fact, the configuration of the x87 FPU in double-precision mode does not solve all the problems. Indeed, the x87 registers also have a wider exponent range. This means that to hope to be portable, one (in general, the compiler) should store the value into memory to really obtain a result in the double-precision format, but with a loss of efficiency. Then, a value in the subnormal range of the double-precision format will still be on 53 bits in an x87 register instead of having a reduced precision, and storing the value into memory will yield a double rounding. It is possible to handle these cases correctly, but this is no longer so simple. And though these cases are rare (they are only underflow cases), they can occur in practice, as shown below.

A bug in PHP due to the wider exponent range was found by Rick Regan on 2010-12-30: PHP Hangs On Numeric Value 2.2250738585072011e-308. See also his bug report and his explanations. In fact, the real bug is GCC bug 323, but PHP needs to avoid it. Consequence for a web server: possible denial of service.

An interesting article by S. W. Whiteley on Linux and the extended precision. Note: it was written in August 2001, therefore it does not mention the novelties since this date.

David Goldberg's article What Every Computer Scientist Should Know About Floating-Point Arithmetic, with the addendum Differences Among IEEE 754 Implementations by Doug Priest (article in PDF, with the addendum).

webmaster@vinc17.org