21 Nov

Pentium Floating Point Division Bug

Date: Mon, 21 Nov 94 20:12:47 PST
To: Fun_People
Subject: Pentium Floating Point Division Bug

Forwarded-by: bostic@bellcore.bellcore.com@CS.Berkeley.EDU (Keith Bostic)
Forwarded-by: Elan Amir <elan@mercenary.CS.Berkeley.EDU>
Forwarded-by: Nick Kralevich <nickkral@po.EECS.Berkeley.EDU>

From: moler@mathworks.com (Cleve Moler)
Newsgroups: comp.sys.intel
Subject: MATLAB and the FDIV bug
Date: 15 Nov 1994 23:30:22 -0500

Pentium Floating Point Division Bug

There has been a flurry of activity over the last few days on the Internet newsgroup comp.sys.intel that should interest MATLAB users. A serious design flaw has been discovered in the floating point unit on Intel's Pentium chip. Double precision divisions involving operands with certain bit patterns can produce incorrect results.

The most dramatic example seen so far can be extracted from a posting last night by Tim Coe of Vitesse Semiconductor. In MATLAB, his example becomes

    x = 4195835
    y = 3145727
    z = x - (x/y)*y

With exact computation, z would be zero. In fact, we get zero on most machines, including those using Intel 286, 386 and 486 chips. Even with roundoff error, z should not be much larger than eps*x, which is about 9.3e-10. But, on the Pentium,

    z = 256

The relative error, z/x, is about 2^(-14) or 6.1e-5. The computed quotient, x/y, is accurate to only 14 bits.

An article in last week's edition of Electronic Engineering Times credits Prof. Thomas Nicely, a mathematics professor at Lynchburg College in Virginia, with the first public announcement of the Pentium division bug. One of Nicely's examples involves

    p = 824633702441

With exact computation

    q = 1 - (1/p)*p

would be zero. With floating point computation, q should be on the order of eps. On most machines, we find that

    q = eps/2 = 2^(-53) ~= 1.11e-16

But on the Pentium

    q = 2^(-28) ~= 3.72e-09

This is roughly single precision accuracy and is typical of most of the examples that had been posted before Coe's analysis.
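The two checks above are easy to repeat on any machine. What follows is only a minimal sketch, written in Python rather than MATLAB so that it runs without a MATLAB license. The function names, the looks_flawed helper, and the factor of 2 in the roundoff bound are assumptions made for illustration and do not come from the original postings. The value 256 is the Pentium result reported above, not something this code computes.

```python
import sys

# Machine epsilon for IEEE double precision (2**-52), the quantity MATLAB calls eps.
EPS = sys.float_info.epsilon

# Operands from the examples in the post.
COE_X = 4195835.0
COE_Y = 3145727.0
NICELY_P = 824633702441.0

# Residual reported in the post for the flawed Pentium (not computed here).
REPORTED_PENTIUM_Z = 256.0


def coe_residual(x=COE_X, y=COE_Y):
    """Return z = x - (x/y)*y, which is zero in exact arithmetic."""
    return x - (x / y) * y


def nicely_residual(p=NICELY_P):
    """Return q = 1 - (1/p)*p, which is zero in exact arithmetic."""
    return 1.0 - (1.0 / p) * p


def reported_relative_error():
    """Relative error z/x implied by the Pentium result quoted in the post."""
    return REPORTED_PENTIUM_Z / COE_X


def looks_flawed():
    """Hypothetical helper: True if either residual exceeds a roundoff bound.

    The bounds (2*eps*x and eps) are illustrative assumptions: one rounding
    in the division plus one in the multiplication.
    """
    return (abs(coe_residual()) > 2 * EPS * COE_X
            or abs(nicely_residual()) > EPS)


if __name__ == "__main__":
    print("Coe residual z        =", coe_residual())
    print("Nicely residual q     =", nicely_residual())
    print("roundoff scale eps*x  =", EPS * COE_X)
    print("reported Pentium z/x  =", reported_relative_error())
    print("division looks flawed:", looks_flawed())
```

On hardware with correct division both residuals should stay within the roundoff bounds and looks_flawed() should return False. On an affected Pentium the first residual would be expected to come out near the reported 256.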
The bit patterns of the operands involved in these examples are very special. The denominator in Coe's example is

    y = 3*2^20 - 1

Nicely's research involves a theorem about sums of reciprocals of prime numbers. His example involves a prime of the form

    p = 3*2^38 - 18391

We're not sure yet how many operands cause the Pentium's floating point division to fail, or even what operands produce the largest relative error. It is certainly true that failures are very rare. But, as far as we are concerned, the real difficulty is having to worry about this at all. There are so many other things that can go wrong with computer hardware, and software, that, at least, we ought to be able to rely on the basic arithmetic.

The bug is definitely in the Pentium chip. It occurs at all clock rates. The bug does not affect other arithmetic operations, or the built-in transcendental functions.

Intel has recently made changes to the on-chip Programmable Logic Array that fix the bug, and is now believed to be producing error-free CPUs. It remains to be seen how long it will take for these to reach users.

An unnamed Intel spokesman is quoted in the EE Times article as saying "If customers are concerned, they can call and we'll replace any of the parts that contain the bug." But, at the MathWorks, we have our own friends and contacts at Intel and we're unable to confirm this policy. We'll let you know when we hear anything more definite. In the meantime, the phone number for Customer Service at Intel is 800-628-8686.

  -- Cleve Moler
     Chairman and Chief Scientist
     The MathWorks, Inc.

© 1994 Peter Langston