Stan has support for different types of parallelization: multi-threading with Intel Threading Building Blocks (TBB), multi-processing with Message Passing Interface (MPI) and manycore processing with OpenCL.
Multi-threading in Stan is available through two mechanisms: reduce with summation and rectangular map. Rectangular map can also be used with multi-processing.
The advantages of reduce with summation are:
- More flexible argument interface, avoiding the packing and unpacking that is necessary with rectangular map.
- Partitions data for parallelization automatically (this is done manually in rectangular map).
- Is easier to use.
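To make the argument interface concrete, here is a minimal sketch of reduce with summation applied to a logistic regression. The model, the data names (`N`, `y`, `x`), and the function name `partial_sum` are illustrative assumptions, not part of the text above. The `array[N] int` declaration syntax assumes Stan 2.26 or newer.

```stan
functions {
  // Log likelihood of one slice of the outcomes.
  // start and end are the positions of the slice in the full array,
  // so shared arguments such as x can be indexed to match.
  real partial_sum(array[] int y_slice,
                   int start, int end,
                   vector x,
                   vector beta) {
    return bernoulli_logit_lpmf(y_slice | beta[1] + beta[2] * x[start:end]);
  }
}
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
  vector[N] x;
}
parameters {
  vector[2] beta;
}
model {
  // grainsize = 1 leaves the choice of slice size to the scheduler
  int grainsize = 1;
  beta ~ std_normal();
  // y is sliced automatically; x and beta are passed through unchanged
  target += reduce_sum(partial_sum, y, grainsize, x, beta);
}
```

Note that `x` and `beta` are passed with their natural types and no packing step is needed. The slices only run in parallel when the model is compiled with threading enabled (`STAN_THREADS`).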
The advantages of rectangular map are:
- Returns one vector per shard (concatenated into a single vector), while reduce with summation returns only a scalar.
- Can be parallelized across multiple cores and multiple computers, while reduce with summation can only be parallelized across multiple cores on a single machine.
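For comparison, a minimal sketch of rectangular map on a small logistic regression might look as follows. The fixed size of 12 observations, the split into 3 shards of 4, and the names `lr`, `xs`, and `ys` are illustrative assumptions. The manual sharding in `transformed data` is the packing step that reduce with summation avoids.

```stan
functions {
  // Mapped function: returns the log likelihood of one shard
  // as a vector of length 1.
  vector lr(vector beta, vector theta,
            array[] real x, array[] int y) {
    real lp = bernoulli_logit_lpmf(y | beta[1] + to_vector(x) * beta[2]);
    return [lp]';
  }
}
data {
  array[12] int<lower=0, upper=1> y;
  array[12] real x;
}
transformed data {
  // Manually partition the data into 3 rectangular shards of size 4
  array[3, 4] int ys = { y[1:4], y[5:8], y[9:12] };
  array[3, 4] real xs = { x[1:4], x[5:8], x[9:12] };
  // No shard-specific parameters in this model
  array[3] vector[0] theta;
}
parameters {
  vector[2] beta;
}
model {
  beta ~ std_normal();
  // map_rect returns the concatenated per-shard vectors
  target += sum(map_rect(lr, beta, theta, xs, ys));
}
```

Each shard must have the same size, which is why the data are arranged as rectangular arrays before the call.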
The actual speedup gained from using these functions will depend on many details. It is strongly recommended to parallelize only the most computationally expensive operations in a Stan program. Often this is the evaluation of the log likelihood for the observed data. When it is not clear which parts of the model are the most computationally expensive, we recommend using profiling, which is available in Stan 2.26 and newer.
Since only portions of a Stan program will run in parallel, the maximal speedup one can achieve is capped, a phenomenon described by Amdahl’s law.
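As a rough illustration of this cap, Amdahl's law can be written as follows, where the symbols are introduced here for illustration: $p$ is the fraction of run time that can be parallelized and $n$ is the number of cores.

$$
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
$$

For example, if 90% of the run time is spent in the parallelized log likelihood ($p = 0.9$), then 4 cores give a speedup of at most $1 / (0.1 + 0.225) \approx 3.1$, and no number of cores can give more than a factor of 10. This idealized bound ignores communication and scheduling overhead, so observed speedups are typically lower.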