## 23.3 Example: Mapping Logistic Regression

An example should help to clarify both the syntax and semantics of the mapping operation and how it may be combined with reductions built into Stan to provide a map-reduce implementation.

### Unmapped Logistic Regression

Consider the following simple logistic regression model, which is coded unconventionally to accomodate direct translation to a mapped implementation.

```
data {
int y[12];
real x[12];
}
parameters {
vector[2] beta;
}
model {
beta ~ std_normal();
y ~ bernoulli_logit(beta[1] + beta[2] * to_vector(x));
}
```

The program is unusual in that it (a) hardcodes the data size, which
is not required by the map function but is just used here for
simplicity, (b) represents the predictors as a real array even though
it needs to be used as a vector, and (c) represents the regression
coefficients (intercept and slope) as a vector even though they’re
used individually. The `bernoulli_logit`

distribution is used
because the argument is on the logit scale—it implicitly applies the
inverse logit function to map the argument to a probability.

### Mapped Logistic Regression

The unmapped logistic regression model described in the previous subsection may be implemented using Stan’s rectangular mapping functionality as follows.

```
functions {
vector lr(vector beta, vector theta, real[] x, int[] y) {
real lp = bernoulli_logit_lpmf(y | beta[1] + to_vector(x) * beta[2]);
return [lp]';
}
}
data {
int y[12];
real x[12];
}
transformed data {
// K = 3 shards
int ys[3, 4] = { y[1:4], y[5:8], y[9:12] };
real xs[3, 4] = { x[1:4], x[5:8], x[9:12] };
vector[0] theta[3];
}
parameters {
vector[2] beta;
}
model {
beta ~ std_normal();
target += sum(map_rect(lr, beta, theta, xs, ys));
}
```

The first piece of the code is the actual function to compute the
logistic regression. The argument `beta`

will contain the
regression coefficients (intercept and slope), as before. The second
argument `theta`

of job-specific parameters is not used, but
nevertheless must be present. The modeled data `y`

is passed as
an array of integers and the predictors `x`

as an array of real
values. The function body then computes the log probability mass of `y`

and
assigns it to the local variable `lp`

. This variable is then
used in `[lp]'`

to construct a row vector and then transpose it
to a vector to return.

The data are taken in as before. There is an additional transformed
data block that breaks the data up into three shards.^{42}

The value `3`

is also hard coded; a more practical program would
allow the number of shards to be controlled. There are three parallel
arrays defined here, each of size three, corresponding to the number
of shards. The array `ys`

contains the modeled data variables;
each element of the array `ys`

is an array of size four. The
second array `xs`

is for the predictors, and each element of it
is also of size four. These contained arrays are the same size
because the predictors `x`

stand in a one-to-one relationship
with the modeled data `y`

. The final array `theta`

is also
of size three; its elements are empty vectors, because there are no
shard-specific parameters.

The parameters and the prior are as before. The likelihood is now
coded using map-reduce. The function `lr`

to compute the log
probability mass is mapped over the data `xs`

and `ys`

,
which contain the original predictors and outcomes broken into shards.
The parameters `beta`

are in the first argument because they are
shared across shards. There are no shard-specific parameters, so
the array of job-specific parameters `theta`

contains only empty
vectors.

The term “shard” is borrowed from databases, where it refers to a slice of the rows of a database. That is exactly what it is here if we think of rows of a dataframe. Stan’s shards are more general in that they need not correspond to rows of a dataframe.↩