Detailed Description

The kernel executor allows OpenCL kernels to be executed asynchronously.

GPUs can perform reads, writes, and computation at the same time. To maximize throughput to the device, the kernel executor assigns read and write events to matrices. Write events are blocking: no further operation on a matrix can proceed until its pending write events have finished. Read events, however, can run concurrently with one another. Take the following example:

matrix_cl<double> A = from_eigen_cl(A_eig);
matrix_cl<double> B = from_eigen_cl(B_eig);
matrix_cl<double> C = A * B;
matrix_cl<double> D = A + B;
matrix_cl<double> E = C + D;

In the above, when A and B are created from the Eigen matrices A_eig and B_eig, each is assigned a write event on its write event stack. C and D depend on A and B, while E depends on C and D. When C's operation is executed, A and B are each assigned an event on their read event stacks, while C is assigned an event on its write event stack. Once A and B have finished their write events, the kernel that computes C can begin. The execution that creates D also waits for the write events of A and B, but does not have to wait for the execution of C to finish. Executing E requires waiting for the write events of both C and D.
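
The dependency rules described above can be modeled without a GPU. The following is a minimal sketch in plain C++, not the library's implementation: `ToyMatrix`, `ToyExecutor`, `upload`, `run`, and `example_dependencies` are hypothetical names introduced only for illustration, and events are represented as integer ids rather than `cl::Event` objects. It assumes, as the text states, that an operation waits for the write events of its inputs, registers a read event on each input, and registers a write event on its output.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device matrix; it only tracks event ids.
struct ToyMatrix {
  std::vector<int> write_events;  // events that wrote to this matrix
  std::vector<int> read_events;   // events that read from this matrix
};

// Hypothetical stand-in for the executor. It hands out event ids and
// records which earlier events each new event has to wait for.
struct ToyExecutor {
  std::vector<std::vector<int>> waits_on;  // waits_on[e] = prerequisites of e

  // Host-to-device copy: no prerequisites, one write event on the target.
  int upload(ToyMatrix& target) {
    int e = new_event(std::vector<int>());
    target.write_events.push_back(e);
    return e;
  }

  // Binary kernel: waits for the write events of both inputs, then
  // registers a read event on each input and a write event on the output.
  int run(ToyMatrix& lhs, ToyMatrix& rhs, ToyMatrix& out) {
    std::vector<int> deps = lhs.write_events;
    deps.insert(deps.end(), rhs.write_events.begin(), rhs.write_events.end());
    int e = new_event(std::move(deps));
    lhs.read_events.push_back(e);
    rhs.read_events.push_back(e);
    out.write_events.push_back(e);
    return e;
  }

  // Appends a new event with the given prerequisites and returns its id.
  int new_event(std::vector<int> deps) {
    waits_on.push_back(std::move(deps));
    return static_cast<int>(waits_on.size()) - 1;
  }
};

// Replays the A..E example and returns the recorded wait lists.
inline std::vector<std::vector<int>> example_dependencies() {
  ToyExecutor ex;
  ToyMatrix A, B, C, D, E;
  ex.upload(A);     // event 0: write to A
  ex.upload(B);     // event 1: write to B
  ex.run(A, B, C);  // event 2: C = A * B
  ex.run(A, B, D);  // event 3: D = A + B
  ex.run(C, D, E);  // event 4: E = C + D
  return ex.waits_on;
}
```

In this model the events for C and D both wait only on the two upload events, so neither blocks the other, while the event for E waits on the events that wrote C and D.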

Classes
struct	stan::math::opencl_kernels::in_buffer
	An in_buffer signifies a cl::Buffer argument used as input.

struct	stan::math::opencl_kernels::out_buffer
	An out_buffer signifies a cl::Buffer argument used as output.

struct	stan::math::opencl_kernels::in_out_buffer
	An in_out_buffer signifies a cl::Buffer argument used as both input and output.

struct	stan::math::opencl_kernels::kernel_cl< Args >
	Creates a functor for kernels.
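
One way to picture how buffer-role tags can steer event bookkeeping is tag dispatch over a parameter pack. The sketch below is an illustration under stated assumptions, not the library's code: `in_buffer_tag`, `out_buffer_tag`, `in_out_buffer_tag`, `ToyBuffer`, `record_event`, and `toy_kernel` are hypothetical names, events are plain integers, and the rule that an input receives a read event, an output a write event, and an in/out argument both is an assumption made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical role tags, loosely modeled on the three buffer structs.
struct in_buffer_tag {};
struct out_buffer_tag {};
struct in_out_buffer_tag {};

// Hypothetical buffer that only tracks event ids.
struct ToyBuffer {
  std::vector<int> read_events;
  std::vector<int> write_events;
};

// One overload per role decides which event stack receives the event.
inline void record_event(in_buffer_tag, ToyBuffer& b, int e) {
  b.read_events.push_back(e);
}
inline void record_event(out_buffer_tag, ToyBuffer& b, int e) {
  b.write_events.push_back(e);
}
inline void record_event(in_out_buffer_tag, ToyBuffer& b, int e) {
  b.read_events.push_back(e);
  b.write_events.push_back(e);
}

// Hypothetical functor: the template arguments name the role of each
// call argument, in order.
template <typename... Tags>
struct toy_kernel {
  template <typename... Bufs>
  void operator()(int event, Bufs&... bufs) const {
    static_assert(sizeof...(Tags) == sizeof...(Bufs),
                  "one role tag is needed per buffer argument");
    // C++11-compatible pack expansion: calls record_event once per pair.
    int expand[] = {0, (record_event(Tags{}, bufs, event), 0)...};
    (void)expand;
  }
};
```

A call such as `toy_kernel<in_buffer_tag, out_buffer_tag>()(7, x, y)` would then add event 7 to the read events of `x` and to the write events of `y`. Encoding the roles in the functor's type keeps each call site free of per-argument bookkeeping.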

Functions
template<typename T , require_not_matrix_cl_t< T > * = nullptr>
const T &	stan::math::opencl_kernels::internal::get_kernel_args (const T &t)
	Extracts the kernel's arguments, used in the global and local kernel constructor.

template<typename K , require_matrix_cl_t< K > * = nullptr>
const cl::Buffer &	stan::math::opencl_kernels::internal::get_kernel_args (const K &m)
	Extracts the kernel's arguments, used in the global and local kernel constructor.

template<typename T , require_not_matrix_cl_t< T > * = nullptr>
void	stan::math::opencl_kernels::internal::assign_event (const cl::Event &e, const T &)
	Assigns the event to a `matrix_cl`.

template<typename T , require_same_t< T, cl::Event > * = nullptr>
void	stan::math::opencl_kernels::internal::assign_events (const T &)
	Adds the event to any `matrix_cl` objects in the arguments, depending on whether they are `in_buffer`, `out_buffer`, or `in_out_buffer`.

template<typename T , require_not_matrix_cl_t< T > * = nullptr>
tbb::concurrent_vector< cl::Event >	stan::math::opencl_kernels::internal::select_events (const T &m)
	Select events from kernel arguments.

auto	stan::math::opencl_kernels::compile_kernel (const char *name, const std::vector< std::string > &sources, const std::unordered_map< std::string, int > &options)
	Compile an OpenCL kernel.

	stan::math::opencl_kernels::kernel_cl< Args >::kernel_cl (const char *name, std::vector< std::string > sources, std::unordered_map< std::string, int > options={})
	Creates a functor for kernels that only need the global work size to be defined.

template<typename... CallArgs>
auto	stan::math::opencl_kernels::kernel_cl< Args >::operator() (cl::NDRange global_thread_size, CallArgs &&... args) const
	Executes a kernel.

template<typename... CallArgs>
auto	stan::math::opencl_kernels::kernel_cl< Args >::operator() (cl::NDRange global_thread_size, cl::NDRange thread_block_size, CallArgs &&... args) const
	Executes a kernel.

int	stan::math::opencl_kernels::kernel_cl< Args >::get_option (const std::string option_name) const
	Retrieves an option used for compiling the kernel.
