Stan provides high-level parallelization
via multi-threading through the
map_rect and reduce_sum functions in a Stan program.
Stan also provides low-level parallelization
on GPU hardware using the OpenCL framework to speed
up matrix operations.
Both of these features require building executables that call the appropriate
libraries. This is done via makefile variables which set the appropriate
C++ compiler and linker flags:
STAN_THREADS - compiler directives for threading
STAN_OPENCL - compiler directives for GPU-aware matrix operations
These options can be combined: a program that uses multi-threading can also run on a machine with GPU hardware, in which case the call to make sets both variables.
For example, to compile the program
parallel_logistic.stan, which uses
reduce_sum for within-chain
parallelization, on an OpenCL (GPU) machine:
> make STAN_THREADS=TRUE STAN_OPENCL=TRUE /path/to/parallel_logistic
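Instead of passing these variables on every call to make, they can be recorded once in a local makefile. The following is a minimal sketch, assuming a CmdStan-style installation whose build reads a make/local file; CMDSTAN_DIR is a hypothetical variable introduced here for illustration, and a temporary directory stands in for the real installation path.

```shell
# Assumption: CMDSTAN_DIR is a hypothetical variable naming the CmdStan directory.
# A temporary directory stands in for it so the sketch runs anywhere.
CMDSTAN_DIR="${CMDSTAN_DIR:-$(mktemp -d)}"
mkdir -p "$CMDSTAN_DIR/make"

# Append the two makefile variables to make/local (assumed to be read by the build).
printf 'STAN_THREADS=TRUE\nSTAN_OPENCL=TRUE\n' >> "$CMDSTAN_DIR/make/local"

# Show what was recorded.
cat "$CMDSTAN_DIR/make/local"
```

With the variables stored this way, a plain call to make for the program would pick up both settings, provided the installation follows this convention.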
In addition, multi-threaded programs must be told how many threads to run in parallel
via the shell environment variable
STAN_NUM_THREADS.
Generally, this number should not exceed the number of available cores.
If this variable isn’t set, then the program will run single-threaded.
To run a single chain on a 4-core machine for the program parallel_logistic:
> export STAN_NUM_THREADS=4
> ./parallel_logistic sample data file=data.csv ...
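The guidance that the thread count should not exceed the number of available cores can be checked in the shell before launching a run. This is a rough sketch, not part of Stan itself: it assumes getconf _NPROCESSORS_ONLN is available to report the number of online cores (it falls back to 1 otherwise), and the names cores and requested are illustrative.

```shell
# Query the number of online cores; fall back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Illustrative request, matching the 4-core example above.
requested=4

# Cap the requested thread count at the number of cores.
if [ "$requested" -gt "$cores" ]; then
  STAN_NUM_THREADS=$cores
else
  STAN_NUM_THREADS=$requested
fi

# Export so that a program started from this shell sees the value.
export STAN_NUM_THREADS
echo "STAN_NUM_THREADS=$STAN_NUM_THREADS"
```

The compiled program can then be started from the same shell session, so that it inherits the exported value.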