Automatic Differentiation
 
Kernel Executor

Detailed Description

The kernel executor allows OpenCL kernels to be executed asynchronously.

GPUs can perform reads, writes, and computation at the same time. To maximize throughput to the device, the Kernel Executor assigns read and write events to matrices. Write events are blocking: no further operations on the matrix can be completed until the write event has finished. Read events, however, can run concurrently with one another. Consider the following example:

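matrix_cl<double> A = from_eigen_cl(A_eig);  // write event on A
matrix_cl<double> B = from_eigen_cl(B_eig);  // write event on B
matrix_cl<double> C = A * B;  // reads A and B, writes C
matrix_cl<double> D = A + B;  // reads A and B, writes D; does not wait for C
matrix_cl<double> E = C + D;  // reads C and D, writes E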

In the above, when A and B are created from the Eigen matrices A_eig and B_eig, each is assigned a write event on its write event stack. C and D depend on A and B, while E depends on C and D. When C's operation is executed, A and B are each assigned an event on their read event stacks, while C is assigned an event on its write event stack. Once A and B have finished their write events, the kernel that computes C can begin. The execution that creates D also waits for the write events of A and B, but does not have to wait for the execution of C to finish. Executing E requires waiting for the write events of both C and D.
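The dependency rules described above can be modeled on the host without any OpenCL calls. The following is a minimal sketch, not the library's implementation: `event_id`, `toy_matrix`, and `enqueue_binary_op` are hypothetical names introduced only for illustration, events are plain integers rather than `cl::Event` objects, and every output is assumed to be a freshly created matrix with no pending events of its own.

```cpp
#include <vector>

// Hypothetical stand-in for cl::Event: an integer id is enough to show ordering.
using event_id = int;

// Hypothetical stand-in for matrix_cl: it only tracks its two event stacks.
struct toy_matrix {
  std::vector<event_id> read_events;
  std::vector<event_id> write_events;
};

// Models enqueuing out = op(lhs, rhs) as event `e`.
// Returns the events the kernel has to wait for: the pending writes of both
// inputs. Pending reads of the inputs are deliberately not included, which is
// why two kernels that read the same matrices can overlap.
inline std::vector<event_id> enqueue_binary_op(event_id e, toy_matrix& lhs,
                                               toy_matrix& rhs,
                                               toy_matrix& out) {
  std::vector<event_id> wait_list = lhs.write_events;
  wait_list.insert(wait_list.end(), rhs.write_events.begin(),
                   rhs.write_events.end());
  lhs.read_events.push_back(e);   // the inputs are being read by event e
  rhs.read_events.push_back(e);
  out.write_events.push_back(e);  // the output is being written by event e
  return wait_list;
}

// Replays the A..E example and returns the wait list of the kernel computing E.
inline std::vector<event_id> wait_list_of_e() {
  toy_matrix A, B, C, D, E;
  A.write_events.push_back(1);     // copying A_eig to the device
  B.write_events.push_back(2);     // copying B_eig to the device
  enqueue_binary_op(3, A, B, C);   // C = A * B waits for events 1 and 2
  enqueue_binary_op(4, A, B, D);   // D = A + B waits for events 1 and 2, not 3
  return enqueue_binary_op(5, C, D, E);  // E = C + D waits for events 3 and 4
}
```

In this model the wait lists for C and D are identical, so neither blocks the other, while E waits for both of them.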

Classes

struct  stan::math::opencl_kernels::in_buffer
 An in_buffer signifies a cl::Buffer argument used as input.
 
struct  stan::math::opencl_kernels::out_buffer
 An out_buffer signifies a cl::Buffer argument used as output.
 
struct  stan::math::opencl_kernels::in_out_buffer
 An in_out_buffer signifies a cl::Buffer argument used as both input and output.
 
struct  stan::math::opencl_kernels::kernel_cl< Args >
 Creates a functor for kernels.
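The buffer tag types listed here can be read as compile-time markers that decide which event stack a kernel argument's event is recorded on. The sketch below illustrates that idea with plain overloads. It is an assumption-laden model, not the library's code: `in_tag`, `out_tag`, `in_out_tag`, `toy_matrix`, and this `assign_event` signature are all hypothetical, and treating an in/out argument as both a read and a write is an assumption of the sketch.

```cpp
#include <vector>

namespace buffer_sketch {

using event_id = int;  // hypothetical stand-in for cl::Event

// Hypothetical stand-in for matrix_cl with its two event stacks.
struct toy_matrix {
  std::vector<event_id> read_events;
  std::vector<event_id> write_events;
};

// Hypothetical tags playing the role of in_buffer, out_buffer, in_out_buffer.
struct in_tag {};
struct out_tag {};
struct in_out_tag {};

// An input argument is only read by the kernel.
inline void assign_event(in_tag, event_id e, toy_matrix& m) {
  m.read_events.push_back(e);
}

// An output argument is only written by the kernel.
inline void assign_event(out_tag, event_id e, toy_matrix& m) {
  m.write_events.push_back(e);
}

// An in/out argument is both read and written (an assumption of this sketch).
inline void assign_event(in_out_tag, event_id e, toy_matrix& m) {
  m.read_events.push_back(e);
  m.write_events.push_back(e);
}

// Any argument that is not a matrix (for example a scalar) carries no events,
// so this overload intentionally does nothing.
template <typename Tag, typename T>
inline void assign_event(Tag, event_id, const T&) {}

}  // namespace buffer_sketch
```

Choosing the stack through the tag type means the decision is made at compile time from the kernel's declared argument list, with no runtime branching per argument.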
 

Functions

template<typename T , require_not_matrix_cl_t< T > * = nullptr>
const T & stan::math::opencl_kernels::internal::get_kernel_args (const T &t)
 Extracts the kernel's arguments, used in the global and local kernel constructor.
 
template<typename K , require_matrix_cl_t< K > * = nullptr>
const cl::Buffer & stan::math::opencl_kernels::internal::get_kernel_args (const K &m)
 Extracts the kernel's arguments, used in the global and local kernel constructor.
 
template<typename T , require_not_matrix_cl_t< T > * = nullptr>
void stan::math::opencl_kernels::internal::assign_event (const cl::Event &e, const T &)
 Assigns the event to a matrix_cl.
 
template<typename T , require_same_t< T, cl::Event > * = nullptr>
void stan::math::opencl_kernels::internal::assign_events (const T &)
 Adds the event to every matrix_cl among the arguments, according to whether each one is an in_buffer, out_buffer, or in_out_buffer.
 
template<typename T , require_not_matrix_cl_t< T > * = nullptr>
tbb::concurrent_vector< cl::Event > stan::math::opencl_kernels::internal::select_events (const T &m)
 Select events from kernel arguments.
 
auto stan::math::opencl_kernels::compile_kernel (const char *name, const std::vector< std::string > &sources, const std::unordered_map< std::string, int > &options)
 Compile an OpenCL kernel.
 
 stan::math::opencl_kernels::kernel_cl< Args >::kernel_cl (const char *name, std::vector< std::string > sources, std::unordered_map< std::string, int > options={})
 Creates a functor for kernels that only need to define the global work size.
 
template<typename... CallArgs>
auto stan::math::opencl_kernels::kernel_cl< Args >::operator() (cl::NDRange global_thread_size, CallArgs &&... args) const
 Executes a kernel.
 
template<typename... CallArgs>
auto stan::math::opencl_kernels::kernel_cl< Args >::operator() (cl::NDRange global_thread_size, cl::NDRange thread_block_size, CallArgs &&... args) const
 Executes a kernel.
 
int stan::math::opencl_kernels::kernel_cl< Args >::get_option (const std::string option_name) const
 Retrieves an option used for compiling the kernel.
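The members listed above suggest a functor that is constructed once with a name and compile options, then called with a work size followed by the kernel's arguments. The following host-only sketch shows that shape under stated assumptions: `toy_kernel` and `make_toy_kernel` are hypothetical, the "kernel" is an ordinary C++ callable rather than compiled OpenCL source, the global size is a `std::size_t` instead of a `cl::NDRange`, and the option name used in the usage note is only illustrative.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

namespace functor_sketch {

// Hypothetical host-side analogue of a kernel functor. It stores a name and
// integer options, and forwards call arguments to a plain callable.
template <typename F>
class toy_kernel {
 public:
  toy_kernel(std::string name, F body,
             std::unordered_map<std::string, int> options = {})
      : name_(std::move(name)),
        body_(std::move(body)),
        options_(std::move(options)) {}

  // "Executes" the kernel: the work size comes first, the remaining arguments
  // are perfectly forwarded to the stored callable.
  template <typename... CallArgs>
  auto operator()(std::size_t global_size, CallArgs&&... args) const {
    return body_(global_size, std::forward<CallArgs>(args)...);
  }

  // Looks up an option; std::unordered_map::at throws std::out_of_range
  // if the option was never set.
  int get_option(const std::string& option_name) const {
    return options_.at(option_name);
  }

  const std::string& name() const { return name_; }

 private:
  std::string name_;
  F body_;
  std::unordered_map<std::string, int> options_;
};

// Helper that deduces the callable type, so lambdas can be used directly.
template <typename F>
toy_kernel<F> make_toy_kernel(std::string name, F body,
                              std::unordered_map<std::string, int> options = {}) {
  return toy_kernel<F>(std::move(name), std::move(body), std::move(options));
}

}  // namespace functor_sketch
```

A call such as `make_toy_kernel("scale", [](std::size_t n, int f) { return static_cast<int>(n) * f; }, {{"THREAD_BLOCK_SIZE", 32}})` yields an object that can be invoked as `scale(4, 3)` and queried with `scale.get_option("THREAD_BLOCK_SIZE")`.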