31.2 Example decision analysis

This is an old version, view current version.

This section outlines a very simple decision analysis for a commuter deciding among modes of transportation to get to work: walk, bike share, public transportation, or cab. Suppose the commuter has been taking various modes of transportation for the previous year and the transportation conditions and costs have not changed during that time. Over the year, such a commuter might accumulate two hundred observations of the time it takes to get to work given a choice of commute mode.

Step 1. Define decisions and outcomes

A decision consists of the choice of commute mode and the outcome is a time and cost. More formally,

the set of decisions is $D = 1:4$, corresponding to the commute types walking, bicycling, public transportation, and cab, respectively, and
the set of outcomes $X = \mathbb{R} \times \mathbb{R}_+$ contains pairs of numbers $x = (c, t)$ consisting of a cost $c$ and time $t \geq 0$.

Step 2. Define density of outcome conditioned on decision

The density required is $p(x \mid d),$ where $d \in D$ is a decision and $x = (c, t) \in X$ is an outcome. Being a statistical decision problem, this density will the a posterior predictive distribution conditioned on previously observed outcome and decision pairs, based on a parameter model with parameters $\theta,$ \[ p(x \mid d, x^{\textrm{obs}}, d^{\textrm{obs}}) = \int p(x \mid d, \theta) \cdot p(\theta \mid x^{\textrm{obs}}, d^{\textrm{obs}}) \, \textrm{d}\theta. \] The observed data for a year of commutes consists of choice of the chosen commute mode $d^{\textrm{obs}}_n$ and observed costs and times $x^{\textrm{obs}}_n = (c^{\textrm{obs}}_n, t^{\textrm{obs}}_n)$ for $n \in 1:200.$

For simplicity, commute time $t_n$ for trip $n$ will be modeled as lognormal for a given choice of transportation $d_n \in 1:4,$ \[ t_n \sim \textrm{lognormal}(\mu_{d[n]}, \sigma_{d[n]}). \] To understand the notation, $d_n$, also written $d[n]$, is the mode of transportation used for trip $n$. For example if trip $n$ was by bicycle, then $t_n \sim \textrm{lognormal}(\mu_2, \sigma_2),$ where $\mu_2$ and $\sigma_2$ are the lognormal parameters for bicycling.

Simple fixed priors are used for each mode of transportation $k \in 1:4,$ \[\begin{eqnarray*} \mu_k & \sim & \textrm{normal}(0, 5) \\[2pt] \sigma_k & \sim & \textrm{lognormal}(0, 1). \end{eqnarray*}\] These priors are consistent with a broad range of commute times; in a more realistic model each commute mode would have its own prior based on knowledge of the city and the time of day would be used as a covariate; here the commutes are taken to be exchangeable.

Cost is usually a constant function for public transportation, walking, and bicycling. Nevertheless, for simplicity, all costs will be modeled as lognormal, \[ c_n \sim \textrm{lognormal}(\nu_{d[n]}, \tau_{d[n]}). \] Again, the priors are fixed for the modes of transportation, \[\begin{eqnarray*} \nu_k & \sim & \textrm{normal}(0, 5) \\[2pt] \tau_k & \sim & \textrm{lognormal}(0, 1). \end{eqnarray*}\] A more realistic approach would model cost conditional on time, because the cost of a cab depends on route chosen and the time it takes.

The full set of parameters that are marginalized in the posterior predictive distribution is \[ \theta = (\mu_{1:4}, \sigma_{1:4}, \nu_{1:4}, \tau_{1:4}). \]

Step 3. Define the utility function

For the sake of concreteness, the utility function will be assumed to be a simple function of cost and time. Further suppose the commuter values their commute time at $25 per hour and has a utility function that is linear in the commute cost and time. Then the utility function may be defined as \[ U(c, t) = -(c + 25 \cdot t). \] The sign is negative because high cost is undesirable. A better utility function might have a step function or increasing costs for being late, different costs for different modes of transportation because of their comfort and environmental impact, and non-linearity of utility in cost.

Step 4. Maximize expected utility

At this point, all that is left is to calculate expected utility for each decision and choose the optimum. If the decisions consist of a small set of discrete choices, expected utility can be easily coded in Stan. The utility function is coded as a function, the observed data is coded as data, the model parameters coded as parameters, and the model block itself coded to follow the sampling distributions of each parameter.

functions {
  real U(real c, real t) {
    return -(c + 25 * t);
  }
}
data {
  int<lower=0> N;
  array[N] int<lower=1, upper=4> d;
  array[N] real c;
  array[N] real<lower=0> t;
}
parameters {
  vector[4] mu;
  vector<lower=0>[4] sigma;
  array[4] real nu;
  array[4] real<lower=0> tau;
}
model {
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 0.25);
  nu ~ normal(0, 20);
  tau ~ lognormal(0, 0.25);
  t ~ lognormal(mu[d], sigma[d]);
  c ~ lognormal(nu[d], tau[d]); 
}
generated quantities {
  array[4] real util;
  for (k in 1:4) {
    util[k] = U(lognormal_rng(mu[k], sigma[k]),
                lognormal_rng(nu[k], tau[k]));
  }
}

The generated quantities block defines an array variable util where util[k], which will hold the utility derived from a random commute for choice k generated according to the model parameters for that choice. This randomness is required to appropriately characterize the posterior predictive distribution of utility.

For simplicity in this initial formulation, all four commute options have their costs estimated, even though cost is fixed for three of the options. To deal with the fact that some costs are fixed, the costs would have to be hardcoded or read in as data, nu and tau would be declared as univariate, and the RNG for cost would only be employed when k == 4.

Defining the utility function for pairs of vectors would allow the random number generation in the generated quantities block to be vectorized.

All that is left is to run Stan. The posterior mean for util[k] is the expected utility, which written out with full conditioning, is \[\begin{eqnarray*} \mathbb{E}\!\left[U(x) \mid d = k, d^{\textrm{obs}}, x^{\textrm{obs}}\right] & = & \int U(x) \cdot p(x \mid d = k, \theta) \cdot p(\theta \mid d^{\textrm{obs}}, x^{\textrm{obs}}) \, \textrm{d}\theta \\[4pt] & \approx & \frac{1}{M} \sum_{m = 1}^M U(x^{(m)} ), \end{eqnarray*}\] where \[ x^{(m)} \sim p(x \mid d = k, \theta^{(m)} ) \] and \[ \theta^{(m)} \sim p(\theta \mid d^{\textrm{obs}}, x^{\textrm{obs}}). \]

In terms of Stan’s execution, the random generation of $x^{(m)}$ is carried out with the lognormal_rng operations after $\theta^{(m)}$ is drawn from the model posterior. The average is then calculated after multiple chains are run and combined.

It only remains to make the decision k with highest expected utility, which will correspond to the choice with the highest posterior mean for util[k]. This can be read off of the mean column of the Stan’s summary statistics or accessed programmatically through Stan’s interfaces.