15.1 Floating-point representations
Stan’s arithmetic is implemented using double-precision floating-point arithmetic. The behavior of most[26] modern computers follows the IEEE Standard for Floating-Point Arithmetic (IEEE 754).
15.1.1 Finite values
The double-precision component of the IEEE 754 standard specifies the representation of real values using a fixed pattern of 64 bits (8 bytes). All values are represented in base two (i.e., binary). The representation is divided into two signed components:
significand (53 bits): base value representing significant digits
exponent (11 bits): power of two multiplied by the base
The value of a finite floating-point number is
\[ v = (-1)^s \times c \, 2^q \]
where \(s\) is the sign bit, \(c\) is the significand, and \(q\) is the exponent.
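The formula can be checked by pulling a double apart bit by bit. The following is a minimal sketch in Python rather than Stan, assuming Python floats are IEEE 754 doubles (true on common platforms). The helper name `decompose` is hypothetical, and the sketch handles only normal, finite values.

```python
import struct

def decompose(x: float):
    """Split a finite, normal double into (s, c, q) with x == (-1)**s * c * 2**q."""
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                      # 1 sign bit
    biased_exp = (bits >> 52) & 0x7FF   # 11 exponent bits
    fraction = bits & ((1 << 52) - 1)   # 52 explicitly stored significand bits
    # Normal numbers carry an implicit leading 1, giving 53 significant bits.
    c = (1 << 52) | fraction
    # Remove the exponent bias (1023) and account for c being an integer.
    q = biased_exp - 1023 - 52
    return s, c, q

s, c, q = decompose(-6.5)
value = (-1) ** s * c * 2.0 ** q   # reassemble the value from its parts
```

Here the significand is treated as an integer, so the exponent `q` is shifted by 52 relative to the convention in which the significand lies between 1 and 2.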
15.1.2 Normality
A normal floating-point value does not use any leading zeros in its significand; subnormal numbers may use leading zeros. Not all I/O systems support subnormal numbers.
15.1.3 Ranges and extreme values
There are some reserved exponent values so that legal exponent values range between \(-(2^{10}) + 2 = -1022\) and \(2^{10} - 1 = 1023\). Legal significand values are between \(-2^{52}\) and \(2^{52} - 1\). Floating point allows the representation of both really big and really small values. Some extreme values are
largest normal finite number: \(\approx 1.8 \times 10^{308}\)
largest subnormal finite number: \(\approx 2.2 \times 10^{-308}\)
smallest positive normal number: \(\approx 2.2 \times 10^{-308}\)
smallest positive subnormal number: \(\approx 4.9 \times 10^{-324}\)
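These extreme values can be reproduced on any platform that uses IEEE 754 doubles. The sketch below uses Python rather than Stan, and the variable names are illustrative only.

```python
import math
import sys

largest_normal = sys.float_info.max           # about 1.8e308
smallest_normal = sys.float_info.min          # 2**-1022, about 2.2e-308
smallest_subnormal = math.ldexp(1.0, -1074)   # 2**-1074, about 4.9e-324
# The largest subnormal sits one representable step below the smallest normal.
largest_subnormal = smallest_normal - smallest_subnormal

overflowed = largest_normal * 2.0        # exceeds the range: positive infinity
underflowed = smallest_subnormal / 2.0   # too small to represent: rounds to zero
```

Going past either end of the range does not raise an error in this sketch; the result silently becomes infinity or zero.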
15.1.4 Signed zero
Because of the sign bit, there are two ways to represent zero, often called “positive zero” and “negative zero.” This distinction is irrelevant in Stan (as it is in R), because the two values are equal (i.e., 0 == -0 evaluates to true).
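The equality of the two zeros can be seen in any IEEE 754 environment. The following Python sketch (not Stan code) shows that the comparison succeeds even though the sign bit is still stored; `math.copysign` is used here only to make that bit visible.

```python
import math

pos_zero = 0.0
neg_zero = -0.0

equal = pos_zero == neg_zero                 # the two zeros compare equal
sign_of_neg = math.copysign(1.0, neg_zero)   # -1.0: the sign bit is still set
sign_of_pos = math.copysign(1.0, pos_zero)   # 1.0
```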
15.1.5 Not-a-number values
A specially chosen bit pattern is used for the not-a-number value (often written as NaN in programming language output, including Stan’s).
Stan provides a value function not_a_number() that returns this special not-a-number value. It is meant to represent error conditions, not missing values. Usually when not-a-number is an argument to a function, the result will be not-a-number if an exception (a rejection in Stan) is not raised.
Stan also provides a test function is_nan(x) that returns 1 if x is not-a-number and 0 otherwise.
Not-a-number values propagate under almost all mathematical operations. For example, all of the built-in arithmetic operations (addition, subtraction, multiplication, division, negation) return not-a-number if any of their arguments are not-a-number. The built-in functions such as log and exp have the same behavior, propagating not-a-number values.
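The propagation rule is a property of IEEE 754 arithmetic rather than of Stan specifically, so it can be sketched in Python. The list below is illustrative and is not an exhaustive set of operations.

```python
import math

nan = float("nan")

results = [
    nan + 1.0,       # addition
    nan - 1.0,       # subtraction
    nan * 0.0,       # multiplication, even by zero
    nan / 2.0,       # division
    -nan,            # negation
    math.exp(nan),   # exp propagates not-a-number
    math.log(nan),   # log propagates not-a-number
]
all_nan = all(math.isnan(r) for r in results)
```

Every entry in `results` is not-a-number, including the product with zero.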
Most of Stan’s built-in functions will throw exceptions (i.e., reject) when any of their arguments is not-a-number.
Comparisons with not-a-number always return false, up to and including comparison with itself. That is, not_a_number() == not_a_number() somewhat confusingly returns false. That is why there is a built-in is_nan() function in Stan (and in C++). The only exception is the inequality operator (!=), which remains coherent: not_a_number() != not_a_number() returns true.
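The same comparison rules hold in other IEEE 754 environments. As a small sketch in Python (not Stan), with `math.isnan` standing in for Stan’s is_nan():

```python
import math

nan = float("nan")

comparisons = {
    "nan == nan": nan == nan,   # False, even against itself
    "nan < 1.0": nan < 1.0,     # False
    "nan > 1.0": nan > 1.0,     # False
    "nan <= nan": nan <= nan,   # False
    "nan != nan": nan != nan,   # True, the one coherent comparison
}
detected = math.isnan(nan)      # the reliable way to test for not-a-number
```

Because `x == x` is false only for not-a-number, a dedicated test function is the clearer way to detect it.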
Undefined operations often return not-a-number values. For example, sqrt(-1) will evaluate to not-a-number.
15.1.6 Positive and negative infinity
There are also two special values representing positive infinity (\(\infty\)) and negative infinity (\(-\infty\)). These are not as pathological as not-a-number, but are often used to represent error conditions such as overflow and underflow. For example, rather than raising an error or returning not-a-number, log(0) evaluates to negative infinity. Exponentiating negative infinity leads back to zero, so that 0 == exp(log(0)). Nevertheless, this should not be done in Stan because the chain rule used to calculate the derivatives will attempt illegal operations and return not-a-number.
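One way the derivative can go wrong is sketched below in Python. This is a hand-written illustration of the chain rule at \(x = 0\), not a description of Stan’s automatic differentiation internals. Python’s own `math.log(0.0)` and `1 / 0.0` raise exceptions, so the IEEE 754 results are written in by hand as an assumption.

```python
import math

# IEEE 754 results at x = 0, entered by hand because Python raises instead.
log_x = -math.inf    # log(0)
dlog_dx = math.inf   # d/dx log(x) = 1 / x evaluated at x = 0

# The value itself is well behaved: exp(-inf) is 0.0.
value = math.exp(log_x)

# Chain rule: d/dx exp(log(x)) = exp(log(x)) * (1 / x).
# At x = 0 this is 0.0 * inf, which is not-a-number.
gradient = math.exp(log_x) * dlog_dx
```

The value is the expected zero, but the gradient is not-a-number, which is the kind of failure the paragraph above warns about.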
There are value functions positive_infinity() and negative_infinity() as well as a test function is_inf().
Positive and negative infinity have the expected comparison behavior, so that negative_infinity() < 0 evaluates to true (represented with 1 in Stan). Also, negating positive infinity leads to negative infinity and vice-versa.
Positive infinity added to either itself or a finite value produces positive infinity. Negative infinity behaves the same way. However, attempts to subtract positive infinity from itself produce not-a-number, not zero. Similarly, attempts to divide one infinite value by another result in a not-a-number value.
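These rules for infinite values can be checked with a short Python sketch (not Stan code), where `math.inf` plays the role of positive_infinity() and `math.isinf` the role of is_inf():

```python
import math

inf = math.inf

# Adding infinity to itself or to a finite value keeps the same infinity.
sums = (inf + inf, inf + 1.0, -inf + -inf, -inf - 1.0)

# Negation swaps the two infinities.
negated = (-inf, -(-inf))

# Subtracting or dividing like-signed infinities is undefined.
difference = inf - inf
quotient = inf / inf

# Comparisons behave as expected.
ordered = -inf < 0.0 < inf
```

The tuple `sums` holds two positive and two negative infinities, while `difference` and `quotient` are both not-a-number.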
[26] The notable exception is Intel’s optimizing compilers under certain optimization settings.