15.1 Floating-point representations

Stan’s arithmetic is implemented using double-precision arithmetic. The floating-point arithmetic of most²⁶ modern computers follows the IEEE Standard for Floating-Point Arithmetic (IEEE 754).

15.1.1 Finite values

The double-precision component of the IEEE 754 standard specifies the representation of real values using a fixed pattern of 64 bits (8 bytes). All values are represented in base two (i.e., binary). The representation is divided into two signed components:

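significand (1 sign bit and 52 fraction bits): base value representing significant digits; normal values also have an implicit leading 1 bit, giving 53 bits of precision
exponent (11 bits): power of two multiplied by the base, stored with a bias of 1023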

The value of a finite floating-point number is

$v = (-1)^s \times c \times 2^q,$

where $s$ is the sign bit, $c$ is the significand, and $q$ is the exponent.
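As a sketch of how the pieces fit together, the C++ below copies a `double`'s bits into an integer and splits out the sign, the 11-bit biased exponent, and the 52 stored fraction bits. It assumes an IEEE 754 platform where `double` is 64-bit binary64. The struct `DoubleFields` and the functions `decode` and `reassemble_normal` are hypothetical names for illustration. `reassemble_normal` rebuilds the value for normal numbers only, taking the significand as 1 plus the fraction divided by $2^{52}$ and the exponent as the stored field minus 1023.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical container for the raw binary64 fields of a double.
struct DoubleFields {
  std::uint64_t sign;      // 1 bit: 0 for positive, 1 for negative
  std::uint64_t exponent;  // 11 bits, stored with a bias of 1023
  std::uint64_t fraction;  // 52 bits; normal values have an implicit leading 1
};

// Copy the bit pattern of x into an integer (memcpy avoids aliasing problems)
// and split it into its three fields.
DoubleFields decode(double x) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit doubles");
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return DoubleFields{bits >> 63, (bits >> 52) & 0x7FFu, bits & 0xFFFFFFFFFFFFFull};
}

// Rebuild a normal value as (-1)^s * c * 2^q, where c = 1 + fraction / 2^52
// and q = exponent - 1023. Not valid for zero, subnormals, infinities, or NaN.
double reassemble_normal(const DoubleFields& f) {
  const double c = 1.0 + std::ldexp(static_cast<double>(f.fraction), -52);
  const int q = static_cast<int>(f.exponent) - 1023;
  return (f.sign ? -1.0 : 1.0) * std::ldexp(c, q);
}
```

For example, `decode(1.0)` has sign 0, exponent field 1023 (so $q = 0$), and fraction 0 (so $c = 1$).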

15.1.2 Normality

A normal floating-point value does not use any leading zeros in its significand; subnormal numbers may use leading zeros. Not all I/O systems support subnormal numbers.
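The distinction can be checked with the C++ standard library's `std::fpclassify`, wrapped below in a hypothetical helper `is_subnormal`. Halving the smallest positive normal value produces a subnormal rather than zero, assuming the compiler is not configured to flush subnormals to zero (as some fast-math settings do).

```cpp
#include <cmath>
#include <limits>

// True when x is subnormal: finite, nonzero, and smaller in magnitude than the
// smallest positive normal double (std::numeric_limits<double>::min()).
bool is_subnormal(double x) {
  return std::fpclassify(x) == FP_SUBNORMAL;
}

// Halving the smallest normal value loses the implicit leading 1,
// producing a subnormal value with a leading zero in its significand.
double half_of_smallest_normal() {
  return std::numeric_limits<double>::min() / 2.0;
}
```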

15.1.3 Ranges and extreme values

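Some exponent values are reserved (for zero, subnormal numbers, infinities, and not-a-number), so that legal exponent values for normal numbers range between $-2^{10} + 2 = -1022$ and $2^{10} - 1 = 1023$. The sign of the significand is stored separately from its magnitude, and the significand of a normal number lies in the range $[1, 2)$. Floating point allows the representation of both very large and very small magnitudes. Some extreme values are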

largest normal finite number: $\approx 1.8 \times 10^{308}$
largest subnormal finite number: $\approx 2.2 \times 10^{-308}$ (just below the smallest positive normal number)
smallest positive normal number: $\approx 2.2 \times 10^{-308}$
smallest positive subnormal number: $\approx 4.9 \times 10^{-324}$
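In C++ these extremes are available from `std::numeric_limits<double>` (`max`, `min`, and `denorm_min`). The largest subnormal has no named accessor, so the sketch below takes the next representable value below the smallest normal with `std::nextafter`. The function names are hypothetical. `std::ilogb` reports the unbiased binary exponent, recovering the normal exponent range of $-1022$ to $1023$; the similarly named `numeric_limits<double>::min_exponent` and `max_exponent` use a different convention ($-1021$ and $1024$) and are not used here.

```cpp
#include <cmath>
#include <limits>

using limits = std::numeric_limits<double>;

double largest_normal()     { return limits::max(); }         // about 1.8e308
double smallest_normal()    { return limits::min(); }         // about 2.2e-308
double smallest_subnormal() { return limits::denorm_min(); }  // about 4.9e-324

// The largest subnormal is the representable double just below the smallest normal.
double largest_subnormal()  { return std::nextafter(limits::min(), 0.0); }

// Unbiased exponents of the extreme normal values: 1023 and -1022.
int max_normal_exponent() { return std::ilogb(limits::max()); }
int min_normal_exponent() { return std::ilogb(limits::min()); }
```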

15.1.4 Signed zero

Because of the sign bit, there are two ways to represent zero, often called “positive zero” and “negative zero”. This distinction is irrelevant in Stan (as it is in R), because the two values are equal (i.e., 0 == -0 evaluates to true).
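The same equality holds in C++, which follows IEEE 754 here. Because `==` cannot tell the two zeros apart, detecting negative zero requires reading the sign bit directly with `std::signbit`; the helper name `is_negative_zero` is hypothetical.

```cpp
#include <cmath>

// Negative zero compares equal to positive zero, so == alone cannot detect it;
// std::signbit reads the sign bit directly.
bool is_negative_zero(double x) {
  return x == 0.0 && std::signbit(x);
}
```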

15.1.5 Not-a-number values

A specially chosen bit pattern is used for the not-a-number value (often written as NaN in programming language output, including Stan’s).

Stan provides a value function not_a_number() that returns this special not-a-number value. It is meant to represent error conditions, not missing values. Usually when not-a-number is an argument to a function, the result will be not-a-number if an exception (a rejection in Stan) is not raised.

Stan also provides a test function is_nan(x) that returns 1 if x is not-a-number and 0 otherwise.

Not-a-number values propagate under almost all mathematical operations. For example, all of the built-in arithmetic operators (addition, subtraction, multiplication, division, and negation) return not-a-number if any of their arguments are not-a-number. Built-in functions such as log and exp have the same behavior, propagating not-a-number values.
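A sketch of this propagation at the IEEE 754 level, written in C++ rather than Stan: the hypothetical function `pipeline` chains the arithmetic operators with `std::log` and `std::exp`, and a not-a-number input (here `std::numeric_limits<double>::quiet_NaN()`, the C++ analogue of not_a_number()) comes out the other end as not-a-number. It does not model the Stan functions that reject instead.

```cpp
#include <cmath>
#include <limits>

// Chain the arithmetic operators and two built-in functions on a single input.
// A NaN input propagates through every step; a finite input stays finite.
double pipeline(double x) {
  double y = (x + 1.0) * 2.0 - 3.0;  // addition, multiplication, subtraction
  y = -y / 4.0;                       // negation, division
  return std::exp(std::log(std::fabs(y) + 1.0));
}

double quiet_nan() { return std::numeric_limits<double>::quiet_NaN(); }
```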

Most of Stan’s built-in functions will throw exceptions (i.e., reject) when any of their arguments is not-a-number.

Comparisons with not-a-number always return false, even the comparison of not-a-number with itself. That is, not_a_number() == not_a_number() somewhat confusingly returns false. That is why there is a built-in is_nan() function in Stan (and std::isnan() in C++). The only exception is the inequality operator !=, which remains coherent as the negation of ==, so that not_a_number() != not_a_number() returns true.
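The comparison rules can be seen directly in C++, which follows IEEE 754 here. The sketch contrasts a hypothetical `naive_is_nan` that tests with `==` (and therefore never succeeds) with the self-comparison `x != x` and the standard `std::isnan`. It assumes no fast-math compiler flags, which may let the optimizer assume not-a-number never occurs.

```cpp
#include <cmath>
#include <limits>

// Wrong: NaN compares unequal to everything, including another NaN,
// so this returns false for every input.
bool naive_is_nan(double x) {
  return x == std::numeric_limits<double>::quiet_NaN();
}

// Works: != is the one comparison that is true for NaN against itself.
bool self_compare_is_nan(double x) {
  return x != x;
}

// Preferred: the standard library test, analogous to Stan's is_nan().
bool library_is_nan(double x) {
  return std::isnan(x);
}
```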

Undefined operations often return not-a-number values. For example, sqrt(-1) will evaluate to not-a-number.
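For comparison, the corresponding C++ library calls also return not-a-number rather than raising an error on an IEEE 754 platform. The function name below is hypothetical; besides `std::sqrt(-1.0)`, it checks `std::log(-1.0)` and `std::acos(2.0)`, whose arguments are likewise outside the functions' real domains.

```cpp
#include <cmath>

// Each of these real-valued operations is mathematically undefined,
// and IEEE 754 arithmetic returns NaN for all of them.
bool undefined_operations_return_nan() {
  return std::isnan(std::sqrt(-1.0))
      && std::isnan(std::log(-1.0))
      && std::isnan(std::acos(2.0));
}
```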

15.1.6 Positive and negative infinity

There are also two special values representing positive infinity ($\infty$) and negative infinity ($-\infty$). These are not as pathological as not-a-number, but are often used to represent error conditions such as overflow and underflow. For example, rather than raising an error or returning not-a-number, log(0) evaluates to negative infinity. Exponentiating negative infinity leads back to zero, so that 0 == exp(log(0)). Nevertheless, this should not be done in Stan because the chain rule used to calculate the derivatives will attempt illegal operations and return not-a-number.
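A hand-written C++ sketch of both halves of this paragraph, assuming IEEE 754 semantics for floating-point division by zero (as GCC and Clang provide): `std::exp(std::log(0.0))` returns exactly zero, but the chain-rule derivative of $\exp(\log x)$, which is $\exp(\log x) \times (1/x)$, evaluates to $0 \times \infty$ at $x = 0$, which is not-a-number. The function names are hypothetical, and this is not how Stan's automatic differentiation is implemented; it only illustrates the kind of illegal operation involved.

```cpp
#include <cmath>

// Forward value: log(0) is -infinity and exp(-infinity) is exactly 0.
double exp_of_log(double x) {
  return std::exp(std::log(x));
}

// Chain rule for d/dx exp(log(x)): exp(log(x)) * d/dx log(x) = exp(log(x)) * (1 / x).
// At x = 0 this is 0 * infinity, which IEEE 754 defines as NaN.
double exp_of_log_derivative(double x) {
  return std::exp(std::log(x)) * (1.0 / x);
}
```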

There are value functions positive_infinity() and negative_infinity() as well as a test function is_inf().

Positive and negative infinity have the expected comparison behavior, so that negative_infinity() < 0 evaluates to true (represented with 1 in Stan). Also, negating positive infinity leads to negative infinity and vice versa.
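The C++ analogues are `std::numeric_limits<double>::infinity()` and `std::isinf`. The sketch below, with hypothetical helper names, checks the ordering and negation behavior just described.

```cpp
#include <cmath>
#include <limits>

double positive_inf() { return std::numeric_limits<double>::infinity(); }

// Negating positive infinity gives negative infinity.
double negative_inf() { return -positive_inf(); }

// The infinities sit beyond every finite double in the ordering.
bool bracketed_by_infinities(double x) {
  return negative_inf() < x && x < positive_inf();
}
```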

Positive infinity added to either itself or a finite value produces positive infinity. Negative infinity behaves the same way. However, subtracting positive infinity from itself produces not-a-number, not zero. Similarly, dividing one infinite value by another produces not-a-number.
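These rules can be checked in C++ on an IEEE 754 platform. The struct and function names below are hypothetical, and the finite operand `1e300` is an arbitrary large value chosen to show that even huge finite numbers are absorbed by infinity.

```cpp
#include <cmath>
#include <limits>

// Results of the infinity arithmetic rules described above.
struct InfinityArithmetic {
  double inf_plus_inf;          // +inf
  double inf_plus_finite;       // +inf
  double neg_inf_plus_neg_inf;  // -inf
  double neg_inf_plus_finite;   // -inf
  double inf_minus_inf;         // NaN, not zero
  double inf_div_inf;           // NaN
};

InfinityArithmetic infinity_arithmetic() {
  const double inf = std::numeric_limits<double>::infinity();
  return {inf + inf, inf + 1e300, -inf + -inf, -inf + 1e300, inf - inf, inf / inf};
}

// True when every rule above holds on this platform.
bool infinity_rules_hold() {
  const double inf = std::numeric_limits<double>::infinity();
  const InfinityArithmetic r = infinity_arithmetic();
  return r.inf_plus_inf == inf && r.inf_plus_finite == inf
      && r.neg_inf_plus_neg_inf == -inf && r.neg_inf_plus_finite == -inf
      && std::isnan(r.inf_minus_inf) && std::isnan(r.inf_div_inf);
}
```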