Stan Math Library
5.0.0
Automatic Differentiation
|
By default the stan-math
library is not thread safe which is due to the requirement that the autodiff stack uses a global autodiff tape which records all operations of functions being evaluated. Starting with version 2.18 of stan-math
threading support can be switched on using the compile-time switch STAN_THREADS
. Defining this variable at compile time can be achieved by adding to make/local
Once this is set stan-math
will use the C++11 thread_local
facility such that the autodiff stack is maintained per thread and not anymore globally. This allows the use of autodiff in a threaded application as this enables the calculation of the derivatives of a function inside a thread (the function itself may not use threads). Only if the function to be evaluated is composed of independent tasks it maybe possible to evaluate derivatives of a function in a threaded approach. An example of a threaded gradient evaluation is the map_rect
function in stan-math
.
In addition to making stan-math
thread safe this also turns on parallel execution support of the map_rect
function. Currently, the maximal number of threads being used by the function is controlled by the environment variable STAN_NUM_THREADS
at runtime. Setting this variable to a positive integer number defines the maximal number of threads being used. In case the variable is set to the special value of -1
this requests that as many threads as physical cores are being used. If the variable is not set a single thread is used. Any illegal value (not an integer, zero, other negative) will cause an exception to be thrown.
The Intel TBB library is used in stan-math since version 2.21.0. The Intel TBB library uses a threadpool internally and distributes work through a task-based approach. The tasks are dispatched to the threadpool via a the Intel TBB work-stealing scheduler. For example, whenever threading is enabled via STAN_THREADS
the map_rect
function in stan-math will use the tbb::parallel_for
of the TBB. This will execute the work chunks given to map_rect
with scheduling and thus load-balance CPU core utilization.
By default stan-math builds only the main tbb
library by defining the makefile
variable
The Intel TBB provides in addition to the main library memory allocators which are specifically designed to speedup threaded programs. These speedups have so far only been observed on MacOS systems for Stan programs such that on MacOS the default is set to
Users may override the default choices by defining TBB_LIBRARIES
in the make/local
file manually. Please refer to the pull request which merged the Intel TBB for further details on the performance evaluations.
Threading support requires a fully C++11 compliant compiler which has a working thread_local
implementation. Below you find for each operating system what is known to work. Known to work configurations refers to run successfully by developers.
The compiler support for the C++11 thread_local
keyword for the major open-source compilers is available since these versions:
-pthread
to the CXXFLAGS
variablethread_local
which should be fixed with clang >=4.0Known to work:
Should work:
thread_local
keyword since Mac OS Sierra (Xcode 8.0)Known to work:
With clang
on linux there are issues during the linking step of programs which happens on old distributions like ubuntu trusty 14.04 (the 16.04 LTS is fine). A solution can be found here. It is likely that clang 4.0 if used with libc++ 4.0 will work just fine, but developers have not yet confirmed this.
Known to work: