Automatic Differentiation
 
multiply_transpose.hpp
#ifndef STAN_MATH_OPENCL_MULTIPLY_TRANSPOSE_HPP
#define STAN_MATH_OPENCL_MULTIPLY_TRANSPOSE_HPP
#ifdef STAN_OPENCL

// (#include directives elided in this listing)

namespace stan {
namespace math {

/**
 * Computes the product of a square OpenCL matrix with its transpose.
 *
 * @param A input matrix
 * @return the product A * A^T, an A.rows() x A.rows() matrix
 */
template <typename T, typename = require_arithmetic_t<T>>
matrix_cl<T> multiply_transpose(const matrix_cl<T>& A) {
  // The remaining constructor arguments are elided in this listing.
  matrix_cl<T> temp(A.rows(), A.rows(), /* ... */);

  // Nothing to compute for an empty matrix.
  if (A.size() == 0) {
    return temp;
  }
  // Pad the dimensions so they are divisible by the thread block size
  // (local). This improves performance because the multiply kernel can
  // omit bounds-checking if statements.
  int local
      = opencl_kernels::multiply_transpose.get_option("THREAD_BLOCK_SIZE");
  int Mpad = ((A.rows() + local - 1) / local) * local;
  int wpt = opencl_kernels::multiply_transpose.get_option("WORK_PER_THREAD");
  try {
    // Global range is Mpad x (Mpad / wpt) with work-groups of
    // local x (local / wpt), so each work-item covers wpt entries
    // along the second dimension.
    opencl_kernels::multiply_transpose(cl::NDRange(Mpad, Mpad / wpt),
                                       cl::NDRange(local, local / wpt), A, temp,
                                       A.rows(), A.cols());
  } catch (cl::Error& e) {
    check_opencl_error("multiply self transpose", e);
  }
  return temp;
}
}  // namespace math
}  // namespace stan

#endif
#endif
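When testing the OpenCL path it helps to have a host-side result to compare against. The sketch below computes C = A * A^T for a row-major rows x cols matrix stored in a std::vector<double>. The name multiply_transpose_reference and the row-major storage are assumptions for illustration only; this is not part of the Stan Math API and says nothing about how the kernel itself is written. Because A * A^T is symmetric, the sketch computes only the lower triangle and mirrors it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not a Stan Math function):
// computes C = A * A^T for a row-major rows x cols matrix A.
// C is rows x rows and symmetric, so each (i, j) with j <= i is
// computed once and copied to (j, i).
std::vector<double> multiply_transpose_reference(const std::vector<double>& A,
                                                 std::size_t rows,
                                                 std::size_t cols) {
  assert(A.size() == rows * cols);
  std::vector<double> C(rows * rows, 0.0);  // empty when rows == 0
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      double sum = 0.0;
      // Dot product of row i and row j of A.
      for (std::size_t k = 0; k < cols; ++k) {
        sum += A[i * cols + k] * A[j * cols + k];
      }
      C[i * rows + j] = sum;
      C[j * rows + i] = sum;  // symmetry: (A A^T)^T = A A^T
    }
  }
  return C;
}
```

For A = [[1, 2, 3], [4, 5, 6]] this returns {14, 32, 32, 77}, i.e. the 2 x 2 matrix [[14, 32], [32, 77]]. An empty input yields an empty result, mirroring the early return in multiply_transpose().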
Symbols referenced in this file:

const matrix_cl_view& view() const
  Defined in matrix_cl.hpp:70.
matrix_cl<T>
  Represents an arithmetic matrix on the OpenCL device. Defined in matrix_cl.hpp:47.
void check_opencl_error(const char* function, const cl::Error& e)
  Throws a domain error specifying the OpenCL error that occurred.
const kernel_cl<in_buffer, out_buffer, int, int> multiply_transpose("multiply_transpose", {thread_block_helpers, multiply_transpose_kernel_code}, {{"THREAD_BLOCK_SIZE", 32}, {"WORK_PER_THREAD", 4}})
  The OpenCL kernel, with default options THREAD_BLOCK_SIZE = 32 and WORK_PER_THREAD = 4. See the docs for add().
matrix_cl<T> multiply_transpose(const matrix_cl<T>& A)
  Computes the product of a square OpenCL matrix with its transpose.
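The launch setup in multiply_transpose() rounds A.rows() up to a multiple of THREAD_BLOCK_SIZE and divides the second NDRange dimension by WORK_PER_THREAD. The sketch below reproduces only that host-side arithmetic so the resulting sizes can be checked by hand. LaunchGeometry and compute_launch_geometry are hypothetical names, and the requirement that local be divisible by wpt is an assumption added here (it keeps Mpad / wpt an exact multiple of local / wpt); neither comes from the Stan Math source.

```cpp
#include <cassert>

// Hypothetical container for the sizes passed to the two cl::NDRange objects.
struct LaunchGeometry {
  int padded_rows;  // Mpad
  int global_x;     // first global dimension: Mpad
  int global_y;     // second global dimension: Mpad / wpt
  int local_x;      // first work-group dimension: local
  int local_y;      // second work-group dimension: local / wpt
};

// Hypothetical helper mirroring the padding arithmetic in multiply_transpose().
// Assumes rows, local and wpt are positive and local is divisible by wpt.
LaunchGeometry compute_launch_geometry(int rows, int local, int wpt) {
  assert(rows > 0 && local > 0 && wpt > 0 && local % wpt == 0);
  // Integer ceiling division, then scale back: the smallest multiple of
  // local that is >= rows.
  int Mpad = ((rows + local - 1) / local) * local;
  return {Mpad, Mpad, Mpad / wpt, local, local / wpt};
}
```

With 100 rows and the kernel's default options (local = 32, wpt = 4), Mpad is 128, the global range is 128 x 32 and each work-group is 32 x 8. Rounding up to full work-groups is what lets the kernel skip per-element bounds checks, at the cost of computing on up to local - 1 padded rows.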