StanCon 2019
Accepted submissions were given the option being a poster or 15 minute presentation. Video links are included if a presentation was made.
 Pharmaceuticals/Medicine
 Computing prediction and tolerance intervals for a mixture of normal distributions. Jeanfrancois Michiels, Timothy Mutsvari, Oussama Errazi. Pharmalex. Abstract
 Parallel numerical ODE solution in Torsten for population models. Yi Zhang, William R. Gillespie. Metrum LLC Abstract
 Multichannel Gaussian Processes as flexible alternatives to linear models: perspectives and challenges to scaling up Bayesian inference to genomicscale data. Caetano SoutoMaior, Susan T. Harbison. Laboratory of Systems Genetics, National Heart Lung and Blood Institute, NIH. Abstract
 Estimating the prevalence of HIV infection in England using Bayesian evidence synthesis. Anne Presanis, Christopher Jackson [presenting author], Daniela De Angelis (MRC Biostatistics Unit, University of Cambridge); Peter Kirwan, Alison Brown, Ada Miltz, Ross Harris, Cuong Chau, Stephanie Migchelsen, Hamish Mohammed, Katy Davison, Sara Croxford, Sarika Desai, Kathy Lowndes, Valerie Delpech, Noel Gill (Public Health England). Abstract Video
 A DecisionTheoretic Journey From Early Clinical Data to Late Stage Efficacy using Hierarchical Joint Models.. Krzysztof Sakrejda, Eric Novik. Generable Abstract Video
 Bayesian analyses of timetoevent data using the rstanarm R package. Eren M. Elçi, Sam Brilleman. Public Health and Preventive Medicine, Monash University. Abstract Video
 Modelling enzyme kinetics with Stan. Teddy Groves. DTU BIOSUSTAIN Quantitative Modelling of Cell Metabolism Team Abstract Video
 The emergence of HIV resistance to antiretroviral therapy in southern Africa: a mechanistic metaanalysis of survey data. Julien Riou, Matthias Egger, Christian Althaus. Institute of Social and Preventive Medicine, University of Bern, Switzerland Abstract Video
 Handling missing data, censored values and measurement error in machine learning models using multiple imputation for early stage drug discovery. Rowan Swiers. AstraZeneca Abstract Video
 A Bayesian multilayered model to predict mechanisms, types, and severity of druginduced liver injury. Elizaveta Semenova, Dominic Williams, Stanley E Lazic. Data Science and Quantitative Biology group, AstraZeneca, Cambridge UK Abstract

Modeling
 Gaussian process modeling and covariate selection for longitudinal data. Juho Timonen, Aki Vehtari, Harri Lähdesmäki. Aalto University Abstract
 Estimating the effect of age and league on scoring rate in professional soccer. Benjamin Torvaney. Wefarm Abstract
 Hierarchical models for gammaray burst populations. J. Michael Burgess. MPE Abstract
 Modeling cocoa bean fermentation processes. Mauricio MorenoZambrano, Sergio Grimbs, Matthias S. Ullrich and MarcThorsten Hütt. Department of Life Sciences & Chemistry, Jacobs University Bremen Abstract Video
 Approximate leavefutureout crossvalidation for Bayesian time series models. Paul Bürkner, Jonah Gabry, Aki Vehtari. Abstract Video
 When seasonality meets Bayesian: Decomposing seasonalities in Stan. Hyunji Moon, SNU, Hyeonseop Lee, PUBG. Abstract Video

Prediction and causal inference for timetoevent outcomes truncated by death. . Leah Comment. Abstract Video
 Fast Forward Like a Lambo (skrrt skrrt). Daniel Lee. Generable Abstract Video
 The Currency of Place and the ShortTerm Rental Market. Mikael Brunila. Abstract Video
 ProfitMaximizing A/B Tests. Elea McDonnell Feit, Ron Berman. Drexel Univeristy, The Wharton School Abstract Video
 Structured priors for survey estimates in the presence of nonrepresentative data. Yuxiang Gao (University of Toronto), Lauren Kennedy (Columbia University), Daniel Simpson (University of Toronto). Abstract Video
 Chronikis: a Bayesian timeseries modeling language. Kevin S. Van Horn. Adobe Inc. Abstract Video
 A longshort term event memory statespace model for multiparty elections. Marcus Groß. INWT Statistics GmbH Abstract
 Simulation of Statistic Mechanical Systems using Stan. Forrest Eli Hurley. North Carolina State University Abstract Video
 Regularized Hierarchical Models for Remotely Sensed Forest Inventories. Nathan E. Rutenbeck SilviaTerra Abstract
 Getting the Lead out–Does New York City’s childhood lead testing make statistical sense?. Jonathan Auerbach, Breck Baldwin. Columbia Univeristy Abstract Video
 Inference
 Making Stan Faster using Sequential Monte Carlo samplers. Simon Maskell (University of Liverpool), Alessandro Varsi (University of Liverpool), Peter Green (University of Liverpool), Paul Horridge (University of Liverpool), Alejandro Diaz (University of Liverpool), Lee Devlin (University of Liverpool), Rob Moore (University of Liverpool), Katerina Chatzopoulou (University of Liverpool), Jinglai Li (University of Liverpool), Maria Sudell (University of Liverpool), Luke Mason (STFC), Robin Pinning (STFC), Jack Taylor (STFC), Vassil Alexandrov (STFC), Ed PyzerKnapp (IBM) . Abstract
 One weird trick: Nonparametric Bayesian updating by kernels. Robert Grant. BayesCamp Abstract
 Semiparametric Modeling of the Mean,Variance and Scale Parameters in Skew Normal Regression Models: A Bayesian Perspective. Héctor Zarate. Abstract
 Prior choice in logit models of discrete choice. Jim Savage. Schmidt Futures Abstract Video
 Stacking for multimodal posterior distributions. Yuling Yao, Aki Vehtari and Andrew Gelman. Abstract Video
 Bayesian leaveoneout crossvalidation for large data. Måns Magnusson, Aalto, Michael Riis Andersen, Danish Technical University, Johan Jonasson, Chalmers Technical University, Aki Vehtari, Aalto. Abstract Video
 Core Stan
 The State of GPU Computation Support for Stan. Rok Češnovar (University of Ljubljana  UL), Steve Bronder (Capital One), Davor Sluga (UL), Jure Demšar (UL), Tadej Ciglarič (UL), Sean Talts (Columbia University), Erik Štrumbelj (UL). Abstract Video
 Extending Stan’s Automatic Differentiation (AD) capabilities using dco/c++. Philip Maybank. Numerical Algorithms Group (NAG) Abstract Video
Abstracts
Regularized Hierarchical Models for Remotely Sensed Forest Inventories. Nathan E. Rutenbeck SilviaTerra
Management and conservation of the world’s forests is critical for maintaining global timber supply, as well as for the ecosystem services forestlands provide. Forest biometrics remains a field focused on traditional methods in sampling and regression, despite the fact that these methods are ill equipped to utilize the profusion of remote sensing data now available. When remote sensing data is used, it is often deployed within simple populationlevel regression models that simultaneously leave out information regarding known forest structure and sample design, and are prone to overfitting of effects at the population level. Using Stan, we show that for the prediction of forest basal area (a key inventory attribute) incorporating known structural attributes (forest stands) and sample design information into a hierarchical modeling framework along with remote sensing data can yield beneficial results in terms of reducing overfitting and improving predictive performance when compared to more conventional methods. We fit and compared four candidate models, examining their performance with respect to one another and to the conventional frequentist inferences that are so widely used for operational forest inventory. The four models we examined are 1) a hierarchical model incorporating forest stand and sample design effects; 2) a populationlevel remote sensing principal components model; 3) the hierarchical model with the addition of remote sensing principal component effects at the population level; and 4) the hierarchical and remote sensing model with the addition of regularizing horseshoe priors on remote sensing effects. The hierarchical model without remote sensing effects showed the expected shrinkage of standlevel mean basal area predictions toward the global mean. The addition of remote sensing effects showed overall reductions in prediction error in comparison to the sample design model. Incorporating regularizing priors on the remote sensing principal components effects retained signal from the remote sensing data but showed further shrinkage of predictions of standlevel mean basal area toward sample means. Our results suggest that the penalized hierarchical model can be used in developing operational forest inventories that balance information from known forest structural heterogeneity, the sample design, and remote sensing data.
A Bayesian multilayered model to predict mechanisms, types, and severity of druginduced liver injury. Elizaveta Semenova, Dominic Williams, Stanley E Lazic. Data Science and Quantitative Biology group, AstraZeneca, Cambridge UK
Abstract: Druginduced liver injury (DILI) is a major cause of attrition in drug development and a common reason for withdrawing a drug from the market. Predicting clinical liver toxicity is difficult, but preclinical in vitro assays and physical/chemical properties of drugs can be used as predictors. We developed a multilayered Bayesian model where we use assay results to predict the mechanism(s) of toxicity, use the mechanisms to predict the type of liver injury, and then combine the type of injury with the clinical dose of a drug to predict the severity of injury. The model therefore has a layered structure, enabling uncertainty to propagate through the layers. Based only on assay and physchem data, along with the clinical dose, the model enables safety pharmacologists to predict the severity, type, and mechanism of liver toxicity with good accuracy.
Getting the Lead out–Does New York City’s childhood lead testing make statistical sense?. Jonathan Auerbach, Breck Baldwin. Columbia Univeristy .
Video
Abstract: The US has dramatically reduced blood lead levels in children over the past 30 years and that effort continues. New York City (NYC) was an early adopter of lead reduction policies and that effort continues with laws that require all children be tested and with mandatory interventions for those tested blood levels (tbll) greater than 5mg/dL. But there is a statistically interesting story around how current blood level limits are set, the performance of common tests and how to apply common Bayes rule reasoning to publicly available data.
The data we have: We have high quality blood lead level (bll) tests applied nation wide (NHANES) for 5,000 children, we have NYC supplied data that provides counts for all children’s tested blood lead level, the number greater than 5mg/dL, 10mg/dL and 15/dL and claims of blood tests that widely vary from sources like FDA applications for blood testing equipment, actual studies of test performance and government testing standards.
The data we want: New York city recently dropped the threshold for intervention from 10mg/dL to 5mg/dL. It is an open question what the false positive rate is for these test thresholds with some research suggesting that it is as high as 70%. On the other extreme is an FDA applications for the LeadCare Plus testing device claim a standard deviation of .5 at the 5mg/dL which suggests a very low false positive rate…but that depends on the distribution of actual blls in the NYC population.
How we got the data we wanted: This is a simple application of Bayes rule: p(bll > 5  t >5) = p(tbll > 5  bll>5) p(bll>5)/p(tbll>5) where we don’t know p(bll>5) for NYC. NYC refused to release nonquantized data for tbll under FIOA requests, which if we had, would allow a fairly straightforward determination of false positive rates from tbll test evaluations. But we do have data for the US as a whole in nonquantized form. 
The paper describes a process of model refinement staring with naive approaches and incrementally modifying our models to better suite NYC data. The final approach, subject to change as we do more work, is to fit national NHANES data with an exponential distribution, assume that similar distributions apply to NYC and recover a believable false positive rate across a range of reported blood test performance. Along the way we show an interesting simple use of the ‘integrate_ode_rk45’ function in Stan and demonstrate Bayesian workflow.
Simulation of Statistic Mechanical Systems using Stan. Forrest Eli Hurley. North Carolina State University Video
Abstract: Bayesian statistics is closely coupled with physics. The metropolis algorithm (1953) was developed by scientists working at Los Alamos as a method for thermodynamic simulation of molecular dynamics. Not until the work of W. K. Hastings (1970) was the method generalized to arbitrary probability distributions. Hamiltonian Monte Carlo is even more deeply rooted in physics than the MetropolisHastings algorithm. The simulation of states with velocities, energies, and a Hamiltonian describes nothing other than a physical system. It matches a canonical ensemble in that there is not a fixed energy between steps, only an overall fixed temperature. The temperature is usually only implicit, but some tempering methods simulate chains at higher temperatures to smooth the probability distributions. The Ising Model, a proxy for magnetization, is a prevalent introductory model in the study of statistical mechanics. It consists of an Ndimensional grid of spin up or down particles. The energy varies depending on the alignment of spins between nearest neighbors. At low temperatures spins tend to align on a macroscopic scale; at high temperatures they become evenly distributed. We simulate the XY Model, similar to the Ising Model but allowing spins to be represented by unit vectors in two dimensions, using Stan. We create chains at several temperatures to identify the locations of phase transitions in macroscopic properties. Our work shows the applicability of Stan for computation in continuous statistical mechanical problems.
A longshort term event memory statespace model for multiparty elections. Marcus Groß. INWT Statistics GmbH
Abstract: Statespace models are a popular choice in modelling voting intentions and election results by using poll data. The presented multivariate statespace model attempts to go beyond randomwalk or Kalmanfilter approaches (with comparable performance to simple weighted survey averages) to the problem by introducing a longshort term event memory effect. This effect serves as reasonable explanation to the observation that the voter’s share partially tends to reverse to the party’s longterm trend after larger short term movements. Any event influencing the voter’s share of a party is presumed to have a convex shaped effect decomposable into a short term effect due to e.g. media spreading and a smaller long term effect remaining despite overlay effects of new events and forgetting. This effect is modelled by a mixture of a random walk and two contrasting autoregressive processes. By also taking advantage of the widely observed effect that government parties tend to fall in voter’s share, whereas the opposite effect is observed for opposition parties, mid and longterm predictions of election outcomes can be considerably be improved. The Stanmodel is fitted and evaluated on poll data from seven pollsters for the German national elections (“Bundestagswahl”) from 1994 to 2017, where low double digits (outofsample) improvements in prediction performance can be seen between 3 and 18months prior elections. By taking into account the pollsters house effects, their poll errors and even more importantly their correlations in poll errors, an appropriate and realistic estimation error can be propagated.
Bayesian leaveoneout crossvalidation for large data. Måns Magnusson, Aalto, Michael Riis Andersen, Danish Technical University, Johan Jonasson, Chalmers Technical University, Aki Vehtari, Aalto. Video
Abstract: Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leaveoneout crossvalidation (LOO) is a general approach for assessing the generalizability of a model, but unfortunately, LOO does not scale well to large datasets. We propose a combination of using approximate inference techniques and probabilityproportionaltosizesampling (PPS) for fast LOO model evaluation for large datasets. We provide both theoretical and empirical results showing good properties for large data.
Stacking for multimodal posterior distributions. Yuling Yao, Aki Vehtari and Andrew Gelman. Video
Abstract: When working with multimodal posterior distributions, MCMC algorithms can have difficulty moving between modes, and default variational or modebased approximate inferences can understate posterior uncertainty. And, even if the most important modes can be found, it is difficult to evaluate their relative weights in the posterior, which requires computing the integral of the posterior in the neighborhood of each mode. Here we propose an alternative approach, using parallel runs of MCMC, variational, or mode based inferences to hit as many modes as possible, and then using Bayesian stacking to weight the set of simulations at each mode. Bayesian stacking is a method for constructing a weighted average of distributions so as to minimize crossvalidated prediction errors. The result from stacking is not necessarily equivalent, even asymptotically, to fully Bayesian inference, but it serves many of the same goals. We discuss in the context of several theoretical and applied examples.
Chronikis: a Bayesian timeseries modeling language. Kevin S. Van Horn. Adobe Inc. Video
Abstract: Chronikis (http://chronikis.org) is an opensource language for Bayesian timeseries models that compiles to Stan and R. It currently focuses on linear statespace models, with plans to incrementally expand the class of supported models over time. The goal for Chronikis is to allow one to quickly and reliably create and apply a variety of models to a time series, doing a full Bayesian analysis on each.
Thus the Chronikis language itself focuses on concise and clear model specification, and as far as possible the task of creating efficient estimation and forecasting code is left to the compiler. These twin goals are facilitated by making the Chronikis language fully declarative: the body of a Chronikis program is just an expression whose ““value”” is a probability distribution over time series.
The compiler applies a series of semanticspreserving transformations to the body of a Chronikis program, eventually arriving at a form that it can straightforwardly translate to Stan. Along the way it infers types and shapes for all variables except the parameters of main(), reparameterizes in some cases to use noncentered parameterization, assigns each variable to the appropriate Stan block, and infers bounds for variables assigned to the parameters block.
For the sake of clarity, Chronikis supports operations for constructing complex models from simpler components. For example, here is a Chronikis program for a randomwalk model with observation noise:
def main(s_rw, s_obs: real{0.0,}, mu0: real, sigma0: real{0.0,})
=
sigma_rw ~ half_cauchy(s_rw);
sigma_obs ~ half_cauchy(s_obs);
accum(wn(sigma_rw)) + constp(mu0, sigma0) + wn(sigma_obs)
Notes on the above:

The main() parameters s_rw, s_obs, mu0, and sigma0 are prior parameters.

sigma_rw^2 and sigma_obs^2 are the randomwalk and observationerror variances.

wn(s) is a white noise process with variance s^2.

constp(m,s) is a distribution over constant time series, with a Normal(m,s) distribution for the constant value.

accum is an operator on timeseries distributions; accum(D) is a timeseries distribution whose draws are cumulative sums of a time series drawn from D.

Sum (+) is another operator on timeseries distributions; D1 + D2 is a timeseries distribution whose draws are the elementwise sum of independent draws from D1 and D2.
Chronikis also has some innovative support for (quasi)periodic timeseries model components. The period can be arbitrarily large, and need not even be an integer. One can allow the periodic pattern to slowly change over time. There is a smoothness parameter, and this bounds the size of the latent state required, regardless of how large the period may be. Chronikis accomplishes all this by constructing a linear statespace model that approximates the zeromean Gaussian process defined by variant of MacKay’s periodic kernel, modified to ensure that the realizations of the process are themselves zerocentered.
Structured priors for survey estimates in the presence of nonrepresentative data. Yuxiang Gao (University of Toronto), Lauren Kennedy (Columbia University), Daniel Simpson (University of Toronto). Video
Abstract:A central theme in the field of survey statistics is estimating populationlevel quantities through data coming from potentially nonrepresentative samples of the population. Multilevel Regression and Poststratification (MRP), a modelbased approach, is gaining traction against the traditional weighted approach for survey estimates. MRP uses partial pooling through random effects, thus shrinking model estimates to an overall mean and reducing potential overfitting. Despite MRP’s straightforward specification of prior distributions, the estimates coming from it are susceptible to bias if there is an underlying structure that the prior does not capture. This work aims to provide a new framework for specifying structured prior distributions that lead to bias reduction in MRP estimates. We use simulation studies to explore the benefit of these priors and demonstrate on US survey data.
ProfitMaximizing A/B Tests. Elea McDonnell Feit, Ron Berman. Drexel University, The Wharton School Video
Abstract: Marketers often use A/B testing as a tactical tool to compare marketing treatments in a test stage and then deploy the betterperforming treatment to the remainder of the consumer population. While these tests have traditionally been analyzed using hypothesis testing, we reframe such tactical tests as a Bayesian decision problem with an explicit tradeoff between the opportunity cost of the test (where some customers receive a suboptimal treatment) and the potential losses associated with deploying a suboptimal treatment to the remainder of the population.
We derive a closedform expression for the profitmaximizing test size and show that it is substantially smaller than that typically recommended for a hypothesis test, particularly when the response is noisy or when the total population is small. The common practice of using small holdout groups for media testing can be rationalized by asymmetric priors. The proposed test design achieves nearly the same expected regret as the flexible, yet hardertoimplement multiarmed bandit.
Adopting a Bayesian approach to experimental design requires informative priors. We show how priors can be estimated from data on past A/B test, using Stan to fit a hierarchical meta model. An R notebook will be provided which shows the complete process from metaanalysis of past experiments to determining the profitmaximizing sample size for the new A/B test.
A full paper is available at https://arxiv.org/abs/1811.00457.
The Currency of Place and the ShortTerm Rental Market. Mikael Brunila. Video
Abstract: Airbnb and shortterm rentals are raising rents in cities through the use of new technologies and by catering to culturally savvy populations. As a phenomenon of the attention economy, Airbnb is a platform where meaning becomes priced, as efficient and attractive communication is awarded by more bookings. In this paper, we look at how this capitalization of meaning can be understood by modelling the use of neighbourhood names. Using Natural Language Processing techniques and Bayesian hierarchical logit models with Intrinsic AutoRegressive priors, we explore how listings draw upon the value placed on wellknown neighbourhoods to promote themselves. Our findings separate different spatial effects as well as neighbourhood and listing level patterns that help us explain how neighbourhood names are deployed to promote shortterm rentals on Airbnb.
Fast Forward Like a Lambo (skrrt skrrt). Daniel Lee. Generable Video
Abstract: Exploring simple, automatic withinchain parallelization. For any (wellbehaved) statistical model written in the Stan language, the Stan Math library (Math) provides the gradient of the log joint probability distribution function specified. It currently provides the gradient with reversemode automatic differentiation. Math also provides forwardmode automatic differentiation, which isn’t as well tested as reversemode, but is available nonethe less. Reversemode automatic differentiation scales well as it can tackle an arbitrary number of parameters with one sweep. However, this can’t be parallelized easily. Forwardmode requires N sweeps to evaluate N directional derivatives, but each of these sweeps can be done in parallel. With the adoption of C++14 capable compilers, we’re now able to use threading as an easy paradigm to coordinate withinchain parallelization. We’ll show some of the performance considerations and some preliminary results.
Prediction and causal inference for timetoevent outcomes truncated by death. Leah Comment. Video
Abstract: Predicting customer behaviour is crucial for making decisions such as the cost of acquisition or planning for production or service capacity. In the model being presented individual purchase data of a fashion retailer is utilized to describe and predict their behaviour using Bayeasian multilayered architecture to allow for heterogeneity and latent variables, such as customer state of activity.
–>
When seasonality meets Bayesian: Decomposing seasonalities in Stan. Hyunji Moon, SNU, Hyeonseop Lee, PUBG. Video
Abstract: Multiple seasonalities play a key role in time series forecasting, especially for business time series where seasonal effects are often dramatic. Previous approaches including Fourier decomposition, exponential smoothing, and Seasonal ARIMA do not reflect distinct characteristics of each period in seasonal patterns such as unique behavior of specific day of the week in business data. We propose a multidimensional hierarchical model. Intermediate parameters for each seasonal period are first estimated, then mixture of intermediate parameters are then taken, resulting in the model which successfully reflects interactions between multiple seasonalities. Although this process leads to the reduction of data available for each parameter, a robust estimation can be obtained through a hierarchical Bayesian model. Consideration of not only the characteristics of each seasonal periods but also the interactions between characteristics from multiple seasonalities becomes possible through this model. Our new model is implemented in Stan and considerable improvements in prediction accuracy compared to previous models are achieved. Previous models include Fourier decomposition which Prophet uses to model seasonalities. Comparison has been performed on realworld dataset from a nationscale logistic network.
Approximate leavefutureout crossvalidation for Bayesian time series models. Paul Bürkner, Jonah Gabry, Aki Vehtari. Video
Abstract: One of the common goals of time series analysis is to use the observed series to inform predictions for future observations. In the absence of any actual new data to predict, crossvalidation can be used to estimate a model’s future predictive accuracy, for instance, for the purpose of model comparison or selection. As exact crossvalidation for Bayesian models is often computationally expensive, approximate crossvalidation methods have been developed; most notably methods for leaveoneout crossvalidation (LOOCV). If the actual prediction task is to predict the future given the past, LOOCV provides an overly optimistic estimate as the information from future observations is available to influence predictions of the past. To tackle the prediction task properly and account for the time series structure, we can use leavefutureout crossvalidation (LFOCV). Like exact LOOCV, exact LFOCV requires refitting the model many times to different subsets of the data. Using Pareto smoothed importance sampling, we propose a method for approximating exact LFOCV that drastically reduces the computational costs while also providing informative diagnostics about the quality of the approximation. We provide examples using Bayesian timeseries models fitted with Stan.
Handling missing data, censored values and measurement error in machine learning models using multiple imputation for early stage drug discovery. Rowan Swiers. AstraZeneca
Video
Abstract: Multiple imputation is a technique for handling missing data, censored values and measurement error. Currently it is underused in the machine learning field due to lack of familiarity and experience with the technique, whilst other missing data solutions such as full Bayesian models can be hard to set up. However, randomizationbased evaluations of Bayesianly derived repeated imputations can provide approximately valid inference of the posterior distributions and allow use of techniques which rely upon complete data such as SVMs and random Forest models.
This paper, using simulated data sets inspired by AstraZeneca drug data, shows how multiple imputation techniques can improve the analysis of data with missing values or with uncertainty. We pay close attention to the prediction of Bayesian posterior coverage due its importance in industrial applications. Comparisons are made to other commonly used methods of handling missing data such as single uniform imputation and data removal. Furthermore, we review several standard multiple imputation models and compare them on our simulated data sets. We provide recommendations on when to use each technique and where extra care is needed based upon data distributions. Finally, using simulated data, we give examples of how correct use of multiple imputation can affect investment decisions in the early stages of drug discovery.
Analysis was performed using both Python and Stan and is provided in a Jupyter notebook.
The emergence of HIV resistance to antiretroviral therapy in southern Africa: a mechanistic metaanalysis of survey data. Julien Riou, Matthias Egger, Christian Althaus. Institute of Social and Preventive Medicine, University of Bern, Switzerland Video
Abstract: Largescale campaigns providing access to antiretroviral therapy (ART) to people living with HIV in southern Africa have been ongoing since the early 2000s. The success of these campaigns is now threatened by the emergence of HIV drug resistance, most of all resistance to nonnucleoside reversetranscriptase inhibitors (NNRTI), a class of drugs constitutive of ART. Systematic reviews of crosssectional surveys have provided insights into the temporal trends of NNRTI resistance among HIVinfected individuals. However, these simple temporal trends fail to account for the local dynamics of HIV transmission and treatment that create the evolutionary pressure generating resistance mutations. Such approaches limit our general understanding of the phenomena of resistance emergence in response to drug introduction and disallow any betweencountry comparison. Here, we propose a mechanistic approach linking the observed levels of NNRTI resistance to the underlying dynamics of HIV in each country.
We developed a SIRlike model consisting of a hierarchical system of ordinary differential equations in Stan. The model considered the infection of susceptible individuals with HIV, the treatment of diagnosed individuals with ART from the early 2000s, the occurrence of resistance mutations in response to the evolutionary pressure created by ART, and the transmission of mutant, resistant viruses to susceptible individuals. The model was fitted jointly to countrylevel data regarding different aspects of the HIV epidemic (prevalence of HIV, number of people under ART and population size in 8 countries of southern Africa from 2000 to 2016) and to measurements of NNRTI resistance in crosssectional surveys (60 surveys from 2000 to 2016). Partial pooling was allowed by introducing a hierarchical structure by country on the parameters governing the occurrence of resistance, as well as a hierarchical structure by survey on resistance data.
The model could adequately reproduce the dynamics of the HIV epidemics in each country. We found substantial heterogeneity between the rates of emergence of NNRTI resistance across countries that is not explained by differences in the local dynamics of HIV transmission and treatment. Understanding the factors associated with this heterogeneity will allow public health authorities to anticipate on potential issues of drug resistance emergence related to local characteristics.
Modelling enzyme kinetics with Stan.Teddy Groves. DTU BIOSUSTAIN Quantitative Modelling of Cell Metabolism Team Video
Abstract: The advent of highthroughput technologies has transformed molecular biology into a datarich discipline. However, integrating this data into a predictive modeling framework is not trivial because many different sources of uncertainty about the molecular processes governing cell metabolism must be taken into account. In particular, reaction fluxes and steadystate reactant concentrations can at best be measured noisily, and even for the bestunderstood organisms preexperimental knowledge of kinetic parameters is incomplete and imprecise.
We are using Stan to overcome the existing limitations in the study of cell metabolism by combining preexperimental knowledge about kinetic parameters with experimental measurements of reactant concentrations and reaction fluxes. The presentation and accompanying notebook show a simple but instructive case.
We model cell metabolism as a set of differential equations describing enzymecatalysed reactions. Expert knowledge is taken into account in the form of priors over parameters describing the enzymes’ dynamics. Measured metabolite concentrations and reaction fluxes are treated as depending on the parameters via the differential equations, with random noise representing measurement error.
We will discuss how our approach compares to others in the same field, how we plan to develop our project and some of the challenges we have faced so far. The biggest challenge is that a large and complicated system of ODEs must be solved every time the joint log probability density is evaluated. We demonstrate a strategy for speeding up this calculation by exploiting the assumption that the system of reactions is at steady state.
https://www.biosustain.dtu.dk/research/scientificsections/quantitativemodellingofcellmetabolism/staffquantitativemodellingofcellmetabolism
Modeling cocoa bean fermentation processes. Mauricio MorenoZambrano, Sergio Grimbs, Matthias S. Ullrich and MarcThorsten Hütt. Department of Life Sciences & Chemistry, Jacobs University Bremen Video
Abstract:A key step in the production of chocolate is the fermentation of cocoa beans. This importance relies on its role in the development of chocolate’s flavor and aroma. Unlike other food fermentation processes, this specific fermentation is well known because of its lack of control and multiple ways in which it is performed. Here, a quantitative model of cocoa bean fermentation is constructed on previously available data regarding microbiological and metabolites dynamics. The model is formulated as a system of coupled ordinary differential equations (ODEs) with two different types of state variables: (1) Metabolite concentrations of glucose (Glc), fructose (Fru), ethanol (EtOH), lactic acid (LA) and acetic acid (Ac), and (2) population sizes of yeast (Y), lactic acid bacteria (LAB) and acetic acid bacteria (AAB). In total, the model comprehends 25 unknown parameters that were estimated using the Markov chain Monte Carlo NoUTurn sampler in Rstan. Thereafter, we demonstrate that the model can quantitatively describe existing fermentation series and that the estimated parameters can be used to extract and interpret differences in environmental conditions between two independent fermentation trials [1].
References
[1] MorenoZambrano, M., Grimbs, S., Ullrich M. S. and Hütt, MT. (2018). A mathematical model of cocoa bean fermantation. Royal Society Open Science, 5(10), 180 964.
Bayesian analyses of timetoevent data using the rstanarm R package. Eren M. Elçi, Sam Brilleman. Public Health and Preventive Medicine, Monash University. Video
Abstract: Timetoevent data refers to the observed time from a defined origin (e.g. diagnosis of a disease) until a terminating event of interest (e.g. death). Timetoevent data emerges in a range of industries and scientific disciplines, although it is particularly common in medical and pharmaceutical research. In these research fields, timetoevent data is commonly known as survival data reflecting the fact that death is an event endpoint often used in clinical studies. Analyses of survival data are widely used for decision making in clinical trials, drug development and regulatory approvals.
In this talk we introduce a flexible family of Bayesian survival models that are being integrated into the rstanarm R package through the new stan_surv modelling function. The implementation uses a familiar formula syntax for specifying covariates and censoring mechanisms, based on the widely recognised survival R package. The stan_surv modelling function accommodates standard parametric (e.g. exponential, Weibull and Gompertz) survival models under either hazard or accelerated failure time formulations. Additionally, flexible parametric (cubic splinebased) hazard models are available. These allow the timedependent baseline hazard and timedependent effects of covariates to both be modelled using flexible smooth functions. We demonstrate the software using an example dataset. We put particular emphasis on functionality that allows practitioners to implement survival analyses as part of a robust Bayesian workflow, including prior and posterior checks and efficient leaveoneout crossvalidation.
Prior choice in logit models of discrete choice. Jim Savage. Schmidt Futures Video
Abstract: In models of discrete choice, sensibleseeming priors on partworth coefficients can imply priors in the choice probability space that are highly implausible, putting close to 100% prior weight on a single choice dominating all others. This problem reveals itself in problems with initialization and poor fit quality. Yet choosing priors is complicated by the research design, including the dimensionality of choice attributes, and their scale and covariance. In this talk I provide intuition for how priors and choice attributes interact to create extreme prior choice probabilities, and describe a new method to define priors that implies closetouniform weight in the choice probability space.
Semiparametric Modeling of the Mean,Variance and Scale Parameters in Skew NormalRegression Models: A Bayesian Perspective. Héctor Zarate.
Abstract: The goal of this paper is to estimate the location, scale and shape functions in heteroscedastic semiparametric models when the response variable comes from a skew normal distribution. We rely on the connection among smoothing methods that use basis functions with penalization, mixed models and a Bayesian Markov Chain sampling simulation methodology. The novelty of our strategy lies in its potential to contribute to a simple and unified computational methodology that takes into account the factors that affect the parameters in the responses, which in turn is important for an efficient estimation and correct inference without the requirement of fully parametric models. A simulation study investigates the performance of the estimates. Finally, an application using the forecasting predictive densities, highlights the merits of our approach.
Hierarchical models for gammaray burst populations. J. Michael Burgess. MPE
Abstract: Inferring the number, rate and intrinsic properties of short gammaray bursts has been a long studied problem in the field. As it is closely related to the number of GW events expected for neutron star mergers, the topic has begun to be discussed int he literature again. However, the utilized techniques for GRBs still rely on improper statistical modeling V/Vmax estimators and in many cases, methods are simply guessed. I will discuss the use of Bayesian hierarchal models to infer population and object level parameters of inhomogeneousPoisson process distributed populations. Techniques to handle highdimensional selections effects will be introduced. The methodology will then be applied to sGRB population data with the aim of understand how many of these objects there are, where they are in the Universe and what are their properties under given modeling assumptions. The methodology is general, thus extensions to other populations can be made easily.
A DecisionTheoretic Journey From Early Clinical Data to Late Stage Efficacy using Hierarchical Joint Models.. Krzysztof Sakrejda, Eric Novik. Generable Video
Abstract: Most statistical problems end with estimating the quantities of interest which may be unobservable parameters or in the prediction context, potentially observable data. As statisticians we sometimes forget that models are often decisionmaking tools and making decisions conditional on our understanding of the uncertainties in the system is the ultimate goal of the consumers of our models. In this talk, we will introduce a decision problem of advancing therapies to latestage clinical trials from earlystage clinical data. We do this in the context of a Bayesian Joint Model.
In clinical studies, it is common to measure a clinical biomarker repeatedly over time (‘longitudinal data’). It is also common to measure the patientspecific time from a defined origin, e.g. diagnosis of a disease, until a clinical event of interest, such as death or disease progression (‘survival data’). Joint Modeling as it is called in the Survival literature aims to model both the longitudinal biomarker evolutions and survival endpoints simultaneously. Commonly, this is achieved by specifying a joint likelihood formulation for longitudinal and survival outcomes.
Joint modeling approaches provide several benefits over more traditional modeling and have applications to health through (i) improving the understanding of how biomarkers influence event endpoints; (ii) the development of dynamic risk prediction models for use in personalized medicine; and in the context of clinical trials (iii) requiring fewer patients than the event model alone.
Once the inferences from the Joint Model are obtained, we set up a Utility function describing the risk preferences of the trial’s sponsors and take its expectation with respect to the posterior distribution. The resulting function is then maximized.
The State of GPU Computation Support for Stan. Rok Češnovar (University of Ljubljana  UL), Steve Bronder (Capital One), Davor Sluga (UL), Jure Demšar (UL), Tadej Ciglarič (UL), Sean Talts (Columbia University), Erik Štrumbelj (UL). Video
Abstract: Our presentations details the current state of and future work on the OpenCLbased framework that allows the Stan automatic differentiation library to utilize GPUs. Our research was initially motivated by large Gaussian Process models where the computation is dominated by the Cholesky decomposition but has since developed into an extensible framework.
The following GPUoptimized routines for matrix algebra primitives are already available to Stan users (including reverse mode): matrix multiplication, solving triangular systems, Cholesky decomposition and some special cases. Several support functions are available in the Math library but not exposed to Stan users: matrix initialization, input validity checking, copy, pack/unpack, multiplication by scalar, and transpose. We have made progress on implementing commonly used likelihoods  4 Generalized Linear Model likelihoods can already be used: normal (identity), Bernoulli (logit), Poisson (log) and Negative Binomial (log). And data caching is now available and substantially reduces the overhead of transferring data to the GPU.
We will show how problem size, model and choice of hardware impact the speedups that we can achieve with GPU computation in Stan. Finally, we will discuss directions for future work, routines to implement next, autotuning tunable GPU parameters and advanced data caching.
Extending Stan’s Automatic Differentiation (AD) capabilities using dco/c++. Philip Maybank. Numerical Algorithms Group (NAG) Video
Abstract: Tapebased AD Libraries, such as NAG’s dco/c++ tool, keep a record of calculations that are executed by a program in order to evaluate derivatives. They are applicable to a wider range of numerical codes than tapefree AD libraries, which are typically written to compute derivatives for a specific library of functions. The Stan Math Library is a tapefree AD library. The basic idea of the work in this presentation is that dco/c++ can be used to supply derivatives to Stan. This extends the range of functions which can be used by Stan’s MCMC samplers. We illustrate this idea on a toy problem: inferring the parameters of a damped harmonic oscillator driven by white noise using Stan’s NUTS.
Estimating the prevalence of HIV infection in England using Bayesian evidence synthesis. Anne Presanis, Christopher Jackson [presenting author], Daniela De Angelis (MRC Biostatistics Unit, University of Cambridge); Peter Kirwan, Alison Brown, Ada Miltz, Ross Harris, Cuong Chau, Stephanie Migchelsen, Hamish Mohammed, Katy Davison, Sara Croxford, Sarika Desai, Kathy Lowndes, Valerie Delpech, Noel Gill (Public Health England). Video
Abstract: We present a substantive application of Stan that has informed national health policy.
Annual estimation of the number of people living with HIV in England, including those who are unaware of their infection, has, for several years, been based on a Bayesian model that combines evidence from multiple sources of data. For several demographic and risk groups, the model estimates the number of people in each group, the prevalence of HIV, and the proportion of HIV infections that are diagnosed.
In the 2018 version of this model, implemented in Stan, the strata are defined by age, gender, sexual behaviour, injecting drug use, ethnicity and region. Changes between years are also modelled. Routinelycollected data sources include a register of diagnosed HIV infections, a register of attendances at genitourinary medicine (GUM) clinics, and the national census. These are combined with data from several surveys of health and sexual behaviour among different groups, HIV testing data from unlinked anonymous surveys of drug users, and data from HIV testing of donated blood.
This is an example of a ““multiparameter evidence synthesis””, where the quantities of interest cannot be estimated directly, but can be inferred indirectly through a network of model assumptions. Potential biases due to selection, underreporting and missing data are represented explicitly through structural assumptions and informative priors. A fourlevel hierarchical model is used to borrow strength between stratumspecific parameters. Stan’s model description language makes the assumptions explicit, and its inference engine provides posterior estimates efficiently.
The estimates from 2018 demonstrate that the UNAIDS target of 90% of infections diagnosed by 2020 has been met in England, and the estimates continue to inform policies around HIV testing, treatment and prevention.
Estimating the effect of age and league on scoring rate in professional soccer. Benjamin Torvaney. Wefarm
Abstract: Understanding the effect of different factors on player output is critical to accurately evaluating player performance. In particular, it is useful to be able to project performance into the future, whether to assess a potential new signing, or to aid in squad management. To do this, we must account for footballing context. Intuitively, we know that scoring goals in the Norwegian Eliteserien is less impressive than scoring in the Premier League; however, this is rarely quantified.
If we propose a model in which a player’s expected goalscoring rate is the product of their ability, the difficulty of the competition, and a relative age effect, we can estimate the effect of each parameter from historical goalscoring tallies (accompanied by minutes played). We can extend the model to allow competition factors to vary over time, to reflect the changing dynamics of professional soccer.
Such a model yields promising results: high profile soccer stars have the highest model estimates; a clear age curve for goalscoring is produced; competition strengths vary over time in accordance with popular perception.
Multichannel Gaussian Processes as flexible alternatives to linear models: perspectives and challenges to scaling up Bayesian inference to genomicscale data. Caetano SoutoMaior, Susan T. Harbison. Laboratory of Systems Genetics, National Heart Lung and Blood Institute, NIH.
Abstract:
One weird trick: Nonparametric Bayesian updating by kernels. Robert Grant. BayesCamp
Abstract: One of the big attractions for people adopting Bayesian methods is the promise of ““updating”” their parameter estimates and predictions as more data arrive. Yesterday’s posterior becomes today’s prior. In practice, this is not always simple, requiring at the very least a complete set of sufficient statistics, random samples from an unchanging population, and no changes of mind about the probability distribution for the priors. Sometimes, one would like to update without imposing an a priori distribution on yesterday’s posterior and without estimating lots of statistics. I discuss a kernel approach, which is easily incorporated in Stan by an additional target+= statement, looping over yesterday’s posterior draws, and uniform proposal densities. I compare this with parametric updates, and explore the potential to reduce computation by using kernels weighted by counts of posterior draws inside hypercubes of parameter space.
Making Stan Faster using Sequential Monte Carlo samplers. Simon Maskell (University of Liverpool), Alessandro Varsi (University of Liverpool), Peter Green (University of Liverpool), Paul Horridge (University of Liverpool), Alejandro Diaz (University of Liverpool), Lee Devlin (University of Liverpool), Rob Moore (University of Liverpool), Katerina Chatzopoulou (University of Liverpool), Jinglai Li (University of Liverpool), Maria Sudell (University of Liverpool), Luke Mason (STFC), Robin Pinning (STFC), Jack Taylor (STFC), Vassil Alexandrov (STFC), Ed PyzerKnapp (IBM).
Abstract: Stan uses the No UTurn Sampler (NUTS), a specific instance of Markov Chain Monte Carlo (MCMC). MCMC can be slow, e.g., when dimensionality is high and it would be better if NUTS was faster. We have recently been working to improve the runtime of a solution to problems that Stan can tackle (and those that it cannot, e.g. those that would require reversible jump MCMC). Our approach has been to replace NUTS with a variant of a Sequential Monte Carlo (SMC) sampler that uses the clever ideas embodied in NUTS without coupling them to MCMC specifically. SMC samplers manipulate a population of samples, making it possible to distribute computation across each of many processors. Our work has shown that SMC samplers can be configured to exploit this parallelism (and the advances that have led to the development of, for example, the use of NUTS as a proposal distribution). This can achieve faster runtime than MCMC in terms of the number of effective samples per second (by running the SMC sampler on clusters of hundreds of cores, as are routinely used in the context of Deep Learning, for example). Furthermore, we have shown that SMC samplers can be configured to outperform MCMC by making better use of the available processing resources. This is possible because MCMC’s convergence proofs require that the single sampling chain never goes wrong while the proofs for SMC samplers only require that the samples don’t all malfunction simultaneously. Put another way, SMC samplers have an additional degree of freedom in their design and this degree of freedom can be exploited to offer improved performance relative to MCMC. This talk will explain how SMC samplers can outperform MCMC per second and per flop. We will also describe our progress to date on integrating SMC samplers into Stan: our intent is to make it possible to use all Stan files. Thus far we’re able to achieve a runtime that is over an order of magnitude faster than MCMC.
Gaussian process modeling and covariate selection for longitudinal data. Juho Timonen, Aki Vehtari, Harri Lähdesmäki. Alto University
Abstract: Longitudinal data arises when the same observational units are measured repeatedly, and is common in clinical studies. Such data is often modeled using generalized linear mixed effect models with offtheshelf software packages. These are, however, restricted to a parametric form and cannot model nonstationary disease effects. We demonstrate our new Rpackage for interpretable Bayesian nonparametric modeling of longitudinal data using additive Gaussian processes. Like the Rpackages and brms, our goal is to provide an interface to Stan with a simple and intuitive syntax. However, our Stan program is specifically designed for Gaussian process modeling of longitudinal data, allowing the user to specify a model that mixes group and individualspecific age effects or effects of other continuous or categorical covariates. We show how our package uses Stan to model nonstationary disease effects and uncertainty of the observed disease onsets, identify heterogeneous effects present in only a subset of study subjects, and handles general nonGaussian likelihoods. Furthermore, we define a way of resolving the relevance of any continuous or categorical covariate by sampling only one full model with all covariates included. Our focus is on biomedical applications, where is often vital to determine which covariates affect the response variable, in order to reduce future measurement costs or have a better interpretation about the progression of a disease.
Computing prediction and tolerance intervals for a mixture of normal distributions. Jeanfrancois Michiels, Timothy Mutsvari, Oussama Errazi. Pharmalex
Abstract: For the submission of a Biosimilar product, Biosimilarity assessment is the first step to achieve in the “Totality of Evidence” strategy as required by Authorities (e.g. FDA). The main objective of biosimilarity is to give evidence that the test biological product is as similar as possible to the reference product. The definition of ‘similar’ remains a critical component that needs to be addressed and justified. For biologicals, it is the process and its capability that should be evaluated, i.e. the risk of producing batches outside defendable limits. Thus, the first step is to set the acceptance limits. βexpectation and (β,γ), also known as BetaGamma, tolerance intervals are useful metrics to demonstrate that a test product (i.e. the biosimilar) is similar to a reference product. Biosimilarity is concluded if the βexpectation of the biosimilar product is within the (β,γ) of the reference. βexpectation interval is constructed to contain a β proportion of the population on average. A (β,γ) tolerance interval on the other hand is built to contain at least a β proportion of the population with a confidence level γ. In general, the pharmaceutical company producing the biosimilar has no access to the data of the reference product. Buying boxes of the reference product from several drugstores and analysing them is nevertheless one possible strategy to acquire knowledge on the process variability. Due to that sampling strategy, the distribution of the reference product can be quite exotic and it is likely that the distribution of the reference product is a mixture of normal distributions. Fitting a mixture of 2 normal distributions on data is performed using Stan. The output are the posterior distributions of the mean and standard deviation of the 2 normal distributions and the posterior distribution of the relative proportion of the 2 distributions. We present different algorithms to derive βexpectation and (β,γ) tolerance intervals for a mixture of 2 normal distributions. Using simulations, the operating characteristics of the intervals are shown (e.g. the capability to conclude similarity when it is actually similar).
Parallel numerical ODE solution in Torsten for population models. Yi Zhang, William R. Gillespie. Metrum LLC
Abstract: Torsten is a collection of functions to facilitate analysis of pharmacometric data using Stan. To seek an alternative to the ““map_rect”” function for withinchain parallel computation in Stan, we have implemented numerical ODE solution functions for population models with functional signatures that specify schedules of events such as doses and observations in a manner consistent with NONMEM compatible.
The population solution function feature is designed toward multilevel parallelization using Message Passing Interface(MPI). For that we first implemented Torsten’s own ODE integrators based on CVODES library. Table 1 shows MPI performance results of such an integrator on a group of 1000 Lorenz systems.
n_processor  Walltime(ms)  Speedup  efficiency 
1  10986  1.00  1.00 
2  5505  2.00  1.00 
4  3091  3.55  0.89 
8  1459  7.53  0.94 
16  1355  8.11  0.51 
32  739  14.87  0.46 
64  424  25.91  0.40 
128  382  28.76  0.22 
256  284  38.68  0.15 
512  293  37.49  0.07 
Table 1: MPI performance of the Lorenz model solved by Torsten’s BDF integrator. (n_population = 1000)
Then we developed MPIbased population solvers that are specifically designed for PKPD applications, for which ODE system size $n$ is typically in the scale of $10^0\sim 10^2$. We employ the latest standard(MPI3) functionalities for latency hiding, and test the implementation on two MPI implementations (OpenMPI and MPICH). Tables 25 show performance results of one such function on a simple twocompartment PK model($n=3$) and a more complex PKPD model($n=8$), run on a METWORX workflow.
n_processor  Walltime(ms)  Speedup  efficiency 
1  2966  1.00  1.00 
2  1544  1.92  0.96 
4  866  3.42  0.85 
8  887  3.34  0.42 
Table 2: Parallel performance of solving a twocompartment population model using pmx_solve_group_bdf and OpenMPI.
n_processor  Walltime(ms)  Speedup  efficiency 
1  45791  1.00  1.00 
2  23532  1.95  0.97 
4  13421  3.41  0.85 
8  10394  4.41  0.55 
Table 3: Parallel performance of solving a Neutropenia population model using pmx_solve_group_bdf and OpenMPI.
n_processor  Walltime(ms)  Speedup  efficiency 
1  2470  1.00  1.00 
2  1419  1.74  0.87 
4  1170  2.11  0.53 
8  860  2.87  0.36 
Table 4: Parallel performance of solving a twocompartment population model using pmx_solve_group_bdf and MPICH.
n_processor  Walltime(ms)  Speedup  efficiency 
1  45087  1.00  1.00 
2  22976  1.96  0.98 
4  14158  3.18  0.80 
8  10523  4.28  0.54 
Table 5: Parallel performance of solving a Neutropenia population model using pmx_solve_group_bdf and MPICH.
In addtional to populationlevel parallelization, we are also implementing individuallevel parallelization based on parallel time integration with multigrid. This will enables us to reduce the solution time of a single ODE system, and create a multilevel parallelization for ODEbased population models. The results of a preliminary implementation are shown in Table 6.
n_processor  Walltime(ms)  Speedup  efficiency 
1  2.8  1.00  1.00 
2  1.7  1.65  0.82 
4  1.2  2.33  0.58 
Table 6: Parallel performance of solving 10^4 steps of a single Neutropenia ODE system using parallelintime technique.