Algorithms / Comparing floating-point numbers
Due to rounding errors, most floating-point numbers cannot be represented precisely in the computer. This led to some weird situations like:
0.1 + 0.2 = 0.30000000000000004
Different languages handle this calculation differently◹.
So, we should not directly compare two floating-point numbers, but use the approximate approach, called epsilon comparison:
approxEqAbs(a, b) =
diff := abs(a - b)
return diff <= ε
Where ε is some acceptable error margin, often called epsilon or machine epsilon. It’s the distance between 1 and the next largest floating-point number, or float(1 + ε) != 1
.
For example, for IEEE-754 single precision, ε = 2^-23:
1 = 1.00000000000000000000000
1 + epsilon_m = 1.00000000000000000000001
^ ^
2^0 2^(-23)
For double precision, ε = 2^-52:
1 = 1.0000000000000000000000000000000000000000000000000000
1 + epsilon_m = 1.0000000000000000000000000000000000000000000000000001
^ ^
2^0 2^(-52)
It’s predefined in many languages, for example:
Number.EPSILON
in JavaScript/TypeScriptf16_epsilon
,f32_epsilon
,f64_epsilon
,f128_epsilon
in ZigFLT_EPSILON
in C++f32::EPSILON
,f64::EPSILON
in Rust
See Values for standard hardware floating-point arithmetic◹ table on Wikipedia for more details.
The above approximation sounds right, but it only works for small numbers around zero. This won’t work correctly for larger numbers because the gap between floats grows larger.
A better approach for this case is to use relative epsilon comparison: Calculate the absolute difference between the two numbers a and b, and if the difference is smaller than a fixed portion of the larger number, then a and b can be considered equal.
approxEqRelative(a, b) =
largest := max(a, b)
diff := abs(a - b)
return diff <= largest * ε
In reality, when implementing these methods, we also need to pay attention to some edge cases. Such as, if a
and b
are zeros or infinities, we should just compare them directly with ==
, and if any of them is NaN
, the comparison should just return false
.
Following is the implementation of approxEqAbs
and approxEqRelative
in Zig’s standard library:
lib/std/math.zig
pub fn approxEqAbs(comptime T: type, x: T, y: T, tolerance: T) bool {
assert(@typeInfo(T) == .Float);
assert(tolerance >= 0);
// Fast path for equal values (and signed zeros and infinites).
if (x == y)
return true;
if (isNan(x) or isNan(y))
return false;
return fabs(x - y) <= tolerance;
}
pub fn approxEqRel(comptime T: type, x: T, y: T, tolerance: T) bool {
assert(@typeInfo(T) == .Float);
assert(tolerance > 0);
// Fast path for equal values (and signed zeros and infinites).
if (x == y)
return true;
if (isNan(x) or isNan(y))
return false;
return fabs(x - y) <= max(fabs(x), fabs(y)) * tolerance;
}
References:
- https://courses.engr.illinois.edu/cs357/fa2019/references/ref-1-fp/◹
- https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/◹
- https://floating-point-gui.de/errors/comparison/◹