Stan Math Library
4.9.0
Automatic Differentiation
The kernel executor allows OpenCL kernels to be executed asynchronously.
GPUs can perform reads, writes, and computation at the same time. To maximize throughput to the device, the kernel executor assigns read and write events to matrices. Write events are blocking: no further operation that uses a matrix can proceed until its pending write event has finished. Read events, however, can run concurrently with one another. For example, consider matrices A and B created from the Eigen matrices A_eig and B_eig, matrices C and D that are each computed from A and B, and a matrix E that is computed from C and D.
When A and B are created from the Eigen matrices A_eig and B_eig, each is assigned a write event on its write event stack. C and D depend on A and B, while E depends on C and D. When C's operation is executed, A and B are assigned events on their read event stacks, while C is assigned an event on its write event stack. Once A and B have finished their write events, the kernel that computes C can begin. The execution that creates D also waits for the write events of A and B, but does not have to wait for the execution of C to finish. Executing E requires waiting for the write events of both C and D.
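The dependency rules described above can be illustrated with a small stand-alone model. This is only a sketch, not Stan Math code: ToyMatrix, ToyExecutor, upload, run, and build_example are hypothetical names, events are plain integers rather than cl::Event objects, and the rule that a writer also waits for pending reads of its output is an assumption of this sketch.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a device matrix: it only tracks event ids.
struct ToyMatrix {
  std::string name;
  std::vector<int> write_events;  // events that write to this matrix
  std::vector<int> read_events;   // events that read from this matrix
};

// Hypothetical scheduler: hands out event ids and records, for each event,
// the set of earlier events it has to wait for.
struct ToyExecutor {
  std::vector<std::set<int>> waits_on;  // waits_on[e] = prerequisites of e

  // Host-to-device copy: a write event with no prerequisites.
  int upload(ToyMatrix& m) {
    int e = new_event({});
    m.write_events.push_back(e);
    return e;
  }

  // Kernel that reads `inputs` and writes `output`.
  int run(const std::vector<ToyMatrix*>& inputs, ToyMatrix& output) {
    std::set<int> deps;
    // A reader waits for the pending writes of each input, not for
    // other readers of the same input.
    for (ToyMatrix* in : inputs) {
      deps.insert(in->write_events.begin(), in->write_events.end());
    }
    // Assumption of this sketch: a writer waits for pending reads and
    // writes of its output.
    deps.insert(output.write_events.begin(), output.write_events.end());
    deps.insert(output.read_events.begin(), output.read_events.end());
    int e = new_event(deps);
    for (ToyMatrix* in : inputs) in->read_events.push_back(e);
    output.write_events.push_back(e);
    return e;
  }

 private:
  int new_event(const std::set<int>& deps) {
    waits_on.push_back(deps);
    return static_cast<int>(waits_on.size()) - 1;
  }
};

// Replays the A, B, C, D, E scenario from the text.
ToyExecutor build_example() {
  ToyExecutor ex;
  ToyMatrix A{"A"}, B{"B"}, C{"C"}, D{"D"}, E{"E"};
  ex.upload(A);         // event 0: write A
  ex.upload(B);         // event 1: write B
  ex.run({&A, &B}, C);  // event 2: read A and B, write C
  ex.run({&A, &B}, D);  // event 3: read A and B, write D
  ex.run({&C, &D}, E);  // event 4: read C and D, write E
  return ex;
}
```

In this model the events that compute C and D both wait only on the two uploads, so neither blocks the other, while the event that computes E waits on both of them.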
Classes | |
struct | stan::math::opencl_kernels::in_buffer |
An in_buffer signifies a cl::Buffer argument used as input. | |
struct | stan::math::opencl_kernels::out_buffer |
An out_buffer signifies a cl::Buffer argument used as output. | |
struct | stan::math::opencl_kernels::in_out_buffer |
An in_out_buffer signifies a cl::Buffer argument used as both input and output. | |
struct | stan::math::opencl_kernels::kernel_cl< Args > |
Creates functor for kernels. | |
Functions | |
template<typename T , require_not_matrix_cl_t< T > * = nullptr> | |
const T & | stan::math::opencl_kernels::internal::get_kernel_args (const T &t) |
Extracts the kernel's arguments, used in the global and local kernel constructor. | |
template<typename K , require_matrix_cl_t< K > * = nullptr> | |
const cl::Buffer & | stan::math::opencl_kernels::internal::get_kernel_args (const K &m) |
Extracts the kernel's arguments, used in the global and local kernel constructor. | |
template<typename T , require_not_matrix_cl_t< T > * = nullptr> | |
void | stan::math::opencl_kernels::internal::assign_event (const cl::Event &e, const T &) |
Assigns the event to a matrix_cl. | |
template<typename T , require_same_t< T, cl::Event > * = nullptr> | |
void | stan::math::opencl_kernels::internal::assign_events (const T &) |
Adds the event to any matrix_cl in the arguments, depending on whether each is an in_buffer, out_buffer, or in_out_buffer. | |
template<typename T , require_not_matrix_cl_t< T > * = nullptr> | |
tbb::concurrent_vector< cl::Event > | stan::math::opencl_kernels::internal::select_events (const T &m) |
Select events from kernel arguments. | |
auto | stan::math::opencl_kernels::compile_kernel (const char *name, const std::vector< std::string > &sources, const std::unordered_map< std::string, int > &options) |
Compile an OpenCL kernel. | |
stan::math::opencl_kernels::kernel_cl< Args >::kernel_cl (const char *name, std::vector< std::string > sources, std::unordered_map< std::string, int > options={}) | |
Creates a functor for kernels that only need the global work size to be defined. | |
template<typename... CallArgs> | |
auto | stan::math::opencl_kernels::kernel_cl< Args >::operator() (cl::NDRange global_thread_size, CallArgs &&... args) const |
Executes a kernel. | |
template<typename... CallArgs> | |
auto | stan::math::opencl_kernels::kernel_cl< Args >::operator() (cl::NDRange global_thread_size, cl::NDRange thread_block_size, CallArgs &&... args) const |
Executes a kernel. | |
int | stan::math::opencl_kernels::kernel_cl< Args >::get_option (const std::string option_name) const |
Retrieves an option used for compiling the kernel. | |
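
The way a buffer tag can steer event assignment may be easier to see in a stand-alone sketch. The code below is not the library's implementation: toy_in, toy_out, toy_in_out, toy_matrix, toy_assign_event, and toy_kernel are hypothetical names, events are integers rather than cl::Event objects, and recording an in/out argument on both stacks is an assumption of this sketch.

```cpp
#include <vector>

// Hypothetical tags that play the roles of in_buffer, out_buffer,
// and in_out_buffer.
struct toy_in {};
struct toy_out {};
struct toy_in_out {};

// Hypothetical matrix that stores event ids instead of cl::Event objects.
struct toy_matrix {
  std::vector<int> read_events;
  std::vector<int> write_events;
};

// One overload per tag decides which event stack receives the event.
inline void toy_assign_event(toy_in, int e, toy_matrix& m) {
  m.read_events.push_back(e);
}
inline void toy_assign_event(toy_out, int e, toy_matrix& m) {
  m.write_events.push_back(e);
}
// Assumption of this sketch: an in/out argument is recorded on both stacks.
inline void toy_assign_event(toy_in_out, int e, toy_matrix& m) {
  m.read_events.push_back(e);
  m.write_events.push_back(e);
}

// Hypothetical functor whose tag list is fixed in its type, in the spirit
// of kernel_cl< Args >.
template <typename... Tags>
struct toy_kernel {
  template <typename... Ms>
  void operator()(int e, Ms&... ms) const {
    static_assert(sizeof...(Tags) == sizeof...(Ms),
                  "one tag is required per matrix argument");
    // Expand the tag pack and the argument pack in lockstep.
    int unused[] = {0, (toy_assign_event(Tags{}, e, ms), 0)...};
    (void)unused;
  }
};
```

Because the tags live in the functor's type, the choice of event stack for each argument is resolved at compile time by overload resolution, and a call with the wrong number of arguments fails to compile.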