`vignettes/profiling.Rmd`

This vignette demonstrates how to use the new profiling functionality introduced in CmdStan 2.26.0.

Profiling identifies which parts of a Stan program are taking the longest time to run and is therefore a useful guide when working on optimizing the performance of a model.

However, be aware that the statistical assumptions that go into a model are the most important factors in overall model performance. It is often not possible to make up for model problems with just brute force computation. For ideas on how to address the performance of your model from a statistical perspective, see Gelman et al. (2020).

```r
library(cmdstanr)
check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)
```

Consider a simple logistic regression with parameters `alpha` and `beta`, covariates `X`, and outcome `y`.

```stan
data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  array[n] int y;
}
parameters {
  vector[k] beta;
  real alpha;
}
model {
  beta ~ std_normal();
  alpha ~ std_normal();
  y ~ bernoulli_logit(X * beta + alpha);
}
```

A natural first question is how much time the prior calculations take compared with the likelihood. To answer this we surround the prior and likelihood calculations with `profile` statements.

```stan
profile("priors") {
  target += std_normal_lpdf(beta);
  target += std_normal_lpdf(alpha);
}
profile("likelihood") {
  target += bernoulli_logit_lpmf(y | X * beta + alpha);
}
```

In general we recommend using a separate `.stan` file, but for convenience in this vignette we'll write the Stan program as a string and use `write_stan_file()` to write it to a temporary file.

```r
profiling_bernoulli_logit <- write_stan_file('
data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  array[n] int y;
}
parameters {
  vector[k] beta;
  real alpha;
}
model {
  profile("priors") {
    target += std_normal_lpdf(beta);
    target += std_normal_lpdf(alpha);
  }
  profile("likelihood") {
    target += bernoulli_logit_lpmf(y | X * beta + alpha);
  }
}
')
```

We can then run the model as usual and Stan will collect the profiling information for any sections with `profile` statements.

```r
# Compile the model
model <- cmdstan_model(profiling_bernoulli_logit)
# Generate some fake data
n <- 1000
k <- 20
X <- matrix(rnorm(n * k), ncol = k)
y <- 3 * X[,1] - 2 * X[,2] + 1
p <- runif(n)
y <- ifelse(p < (1 / (1 + exp(-y))), 1, 0)
stan_data <- list(k = ncol(X), n = nrow(X), y = y, X = X)
# Run one chain of the model
fit <- model$sample(data = stan_data, chains = 1)
```

The raw profiling information can then be accessed with the `$profiles()` method, which returns a list containing one data frame per chain (profiles across multiple chains are not automatically aggregated). Details on the column names are available in the CmdStan documentation.

```r
fit$profiles()
```

```
[[1]]
        name   thread_id total_time forward_time reverse_time chain_stack
1 likelihood 0x102122e00 0.71089600   0.58314100    0.1277560       51969
2     priors 0x102122e00 0.00482875   0.00293865    0.0018901       34646
  no_chain_stack autodiff_calls no_autodiff_calls
1       34646000          17323                 1
2              0          17323                 1
```
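Since `$profiles()` returns one data frame per chain, combining information across chains is a short manual step. A minimal sketch, using made-up numbers in the same shape as the output above (with a real fit you would pass `fit$profiles()` instead), sums the times and call counts by profile name:

```r
# Made-up profile data frames in the shape returned by $profiles(),
# one per chain (two hypothetical chains here)
profiles <- list(
  data.frame(name = c("likelihood", "priors"),
             total_time = c(0.711, 0.0048),
             autodiff_calls = c(17323, 17323)),
  data.frame(name = c("likelihood", "priors"),
             total_time = c(0.695, 0.0051),
             autodiff_calls = c(17101, 17101))
)
# Stack the per-chain data frames, then sum over chains by profile name
combined <- do.call(rbind, profiles)
aggregate(cbind(total_time, autodiff_calls) ~ name, data = combined, FUN = sum)
```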

The `total_time` column is the total time spent inside a given profile statement. It is clear that the vast majority of the time is spent in the likelihood function.
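One way to make that concrete is to express each section's `total_time` as a share of all profiled time. A small sketch, with the `total_time` values copied from the output above:

```r
# total_time values copied from the profiling output shown above
prof <- data.frame(name = c("likelihood", "priors"),
                   total_time = c(0.71089600, 0.00482875))
# Fraction of the profiled time spent in each section
prof$share <- prof$total_time / sum(prof$total_time)
prof  # the likelihood accounts for over 99% of the profiled time
```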

Stan’s specialized glm functions can be used to make models like this faster. In this case the likelihood can be replaced with

`target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);`

We'll keep the same `profile()` statements so that the profiling information for the new model is collected automatically, just like for the previous one.

```r
profiling_bernoulli_logit_glm <- write_stan_file('
data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  array[n] int y;
}
parameters {
  vector[k] beta;
  real alpha;
}
model {
  profile("priors") {
    target += std_normal_lpdf(beta);
    target += std_normal_lpdf(alpha);
  }
  profile("likelihood") {
    target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);
  }
}
')
```

```r
model_glm <- cmdstan_model(profiling_bernoulli_logit_glm)
fit_glm <- model_glm$sample(data = stan_data, chains = 1)
```

```r
fit_glm$profiles()
```

```
[[1]]
        name   thread_id total_time forward_time reverse_time chain_stack
1 likelihood 0x1066bde00 0.45516500   0.45357200   0.00159287       17695
2     priors 0x1066bde00 0.00399743   0.00242302   0.00157441       35390
  no_chain_stack autodiff_calls no_autodiff_calls
1              0          17695                 1
2              0          17695                 1
```

We can see from the `total_time` column that this model is much faster than the previous one.
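As a rough sketch of the comparison (the `total_time` values here are copied from the outputs above; with a real fit you would read them from `fit$profiles()[[1]]` and `fit_glm$profiles()[[1]]`):

```r
# Likelihood total_time from each model, copied from the outputs above
# (exact values will vary from run to run)
t_original <- 0.71089600
t_glm <- 0.45516500
# Speedup of the likelihood section from switching to the glm function
speedup <- t_original / t_glm
speedup  # roughly 1.56x faster
```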

The other columns of the profiling output are documented in the CmdStan documentation. The timing numbers are broken down by forward pass and reverse pass, and the `chain_stack` and `no_chain_stack` columns record how many autodiff variables were saved in the process of performing a calculation.

These numbers are all totals: the times are total times over the whole calculation, and the `chain_stack` counts are likewise total counts of autodiff variables used over the whole calculation. It is often convenient to look at per-gradient values instead (which will be more stable across runs with different seeds). To compute these, divide the totals by the `autodiff_calls` column.

```r
profile_chain_1 <- fit$profiles()[[1]]
per_gradient_timing <- profile_chain_1$total_time / profile_chain_1$autodiff_calls
print(per_gradient_timing) # two elements for the two profile statements in the model
```

```
[1] 4.103770e-05 2.787479e-07
```

After sampling (or optimization or variational inference) finishes, CmdStan stores the profiling data in CSV files in a temporary location. The paths of the profiling CSV files can be retrieved using `$profile_files()`.

```r
fit$profile_files()
```

```
[1] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/RtmppMVUoY/model_6580008f67848265f3dfd0e7ae3b0600-profile-202307251456-1-5100ca.csv"
```

These can be saved to a more permanent location with the `$save_profile_files()` method.

```r
# see ?save_profile_files for info on optional arguments
fit$save_profile_files(dir = "path/to/directory")
```

Gelman, Andrew, Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. 2020. “Bayesian Workflow.” https://arxiv.org/abs/2011.01808.