Estimate the subject-specific or marginal longitudinal trajectory
Source:R/posterior_traj.R
posterior_traj.RdThis function allows us to generate an estimated longitudinal trajectory (either subject-specific, or by marginalising over the distribution of the group-specific parameters) based on draws from the posterior predictive distribution.
Usage
posterior_traj(
object,
m = 1,
newdata = NULL,
newdataLong = NULL,
newdataEvent = NULL,
interpolate = TRUE,
extrapolate = FALSE,
control = list(),
last_time = NULL,
prob = 0.95,
ids,
dynamic = TRUE,
scale = 1.5,
draws = NULL,
seed = NULL,
return_matrix = FALSE,
...
)Arguments
- object
A fitted model object returned by the
stan_jmmodelling function. Seestanreg-objects.- m
Integer specifying the number or name of the submodel
- newdata
Deprecated: please use
newdataLonginstead. Optionally, a data frame in which to look for variables with which to predict. If omitted, the model matrix is used. Ifnewdatais provided and any variables were transformed (e.g. rescaled) in the data used to fit the model, then these variables must also be transformed innewdata. This only applies if variables were transformed before passing the data to one of the modeling functions and not if transformations were specified inside the model formula.- newdataLong, newdataEvent
Optionally, a data frame (or in the case of
newdataLongthis can be a list of data frames) in which to look for variables with which to predict. If omitted, the model matrices are used. If new data is provided, then two options are available. Either one can provide observed covariate and outcome data, collected up to some time t, and use this data to draw new individual-specific coefficients (i.e. individual-level random effects). This is the default behaviour when new data is provided, determined by the argumentdynamic = TRUE, and requiring bothnewdataLongandnewdataEventto be specified. Alternatively, one can specifydynamic = FALSE, and then predict using just covariate data, by marginalising over the distribution of the group-specific coefficients; in this case, onlynewdataLongneeds to be specified and it only needs to be a single data frame with the covariate data for the predictions for the one longitudinal submodel.- interpolate
A logical specifying whether to interpolate the estimated longitudinal trajectory in between the observation times. This can be used to achieve a smooth estimate of the longitudinal trajectory across the entire follow up time. If
TRUEthen the interpolation can be further controlled using thecontrolargument.- extrapolate
A logical specifying whether to extrapolate the estimated longitudinal trajectory beyond the time of the last known observation time. If
TRUEthen the extrapolation can be further controlled using thecontrolargument.- control
A named list with parameters controlling the interpolation or extrapolation of the estimated longitudinal trajectory when either
interpolate = TRUEorextrapolate = TRUE. The list can contain one or more of the following named elements:ipointsa positive integer specifying the number of discrete time points at which to calculate the estimated longitudinal response for
interpolate = TRUE. These time points are evenly spaced starting at 0 and ending at the last known observation time for each individual. The last observation time for each individual is taken to be either: the event or censoring time if no new data is provided; the time specified in the "last_time" column if provided in the new data (see Details section below); or the time of the last longitudinal measurement if new data is provided but no "last_time" column is included. The default is 15.epointsa positive integer specifying the number of discrete time points at which to calculate the estimated longitudinal response for
extrapolate = TRUE. These time points are evenly spaced between the last known observation time for each individual and the extrapolation distance specifed using eitheredistoreprop. The default is 15.epropa positive scalar between 0 and 1 specifying the amount of time across which to extrapolate the longitudinal trajectory, represented as a proportion of the total observed follow up time for each individual. For example specifying
eprop = 0.2means that for an individual for whom the latest of their measurement, event or censoring times was 10 years, their estimated longitudinal trajectory will be extrapolated out to 12 years (i.e. 10 + (0.2 * 10)). The default value is 0.2.edista positive scalar specifying the amount of time across which to extrapolate the longitudinal trajectory for each individual, represented in units of the time variable
time_var(from fitting the model). This cannot be specified ifepropis specified.
- last_time
A scalar, character string, or
NULL. This argument specifies the last known survival time for each individual when conditional predictions are being obtained. IfnewdataEventis provided and conditional survival predictions are being obtained, then thelast_timeargument can be one of the following: (i) a scalar, this will use the same last time for each individual innewdataEvent; (ii) a character string, naming a column innewdataEventin which to look for the last time for each individual; (iii)NULL, in which case the default is to use the time of the latest longitudinal observation innewdataLong. IfnewdataEventisNULLthen thelast_timeargument cannot be specified directly; instead it will be set equal to the event or censoring time for each individual in the dataset that was used to estimate the model. If standardised survival probabilities are requested (i.e.standardise = TRUE) then conditional survival probabilities are not allowed and therefore thelast_timeargument is ignored.- prob
A scalar between 0 and 1 specifying the width to use for the uncertainty interval (sometimes called credible interval) for the predicted mean response and the prediction interval for the predicted (raw) response. For example
prob = 0.95(the default) means that the 2.5th and 97.5th percentiles will be provided. Only relevant whenreturn_matrixisFALSE.- ids
An optional vector specifying a subset of subject IDs for whom the predictions should be obtained. The default is to predict for all individuals who were used in estimating the model or, if
newdatais specified, then all individuals contained innewdata.- dynamic
A logical that is only relevant if new data is provided via the
newdataargument. Ifdynamic = TRUE, then new group-specific parameters are drawn for the individuals in the new data, conditional on their longitudinal biomarker data contained innewdata. These group-specific parameters are then used to generate individual-specific survival probabilities for these individuals. These are often referred to as "dynamic predictions" in the joint modelling context, because the predictions can be updated each time additional longitudinal biomarker data is collected on the individual. On the other hand, ifdynamic = FALSEthen the survival probabilities will just be marginalised over the distribution of the group-specific coefficients; this will mean that the predictions will incorporate all uncertainty due to between-individual variation so there will likely be very wide credible intervals on the predicted survival probabilities.- scale
A scalar, specifying how much to multiply the asymptotic variance-covariance matrix for the random effects by, which is then used as the "width" (ie. variance-covariance matrix) of the multivariate Student-t proposal distribution in the Metropolis-Hastings algorithm. This is only relevant when
newdataEventis supplied anddynamic = TRUE, in which case new random effects are simulated for the individuals in the new data using the Metropolis-Hastings algorithm.- draws
An integer indicating the number of MCMC draws to return. The default is to set the number of draws equal to 200, or equal to the size of the posterior sample if that is less than 200.
- seed
An optional
seedto use.- return_matrix
A logical. If
TRUEthen adrawsbynrow(newdata)matrix is returned which contains all the actual simulations or draws from the posterior predictive distribution. Otherwise ifreturn_matrixis set toFALSE(the default) then a data frame is returned, as described in the Value section below.- ...
Other arguments passed to
posterior_predict, for exampledraws,re.form,seed, etc.
Value
When return_matrix = FALSE, a data frame
of class predict.stanjm. The data frame includes a column for the median
of the posterior predictions of the mean longitudinal response (yfit),
a column for each of the lower and upper limits of the uncertainty interval
corresponding to the posterior predictions of the mean longitudinal response
(ci_lb and ci_ub), and a column for each of the lower and upper
limits of the prediction interval corresponding to the posterior predictions
of the (raw) longitudinal response. The data frame also includes columns for
the subject ID variable, and each of the predictor variables. The returned
object also includes a number of attributes.
When return_matrix = TRUE, the returned object is the same as that
described for posterior_predict.
Details
The posterior_traj function acts as a wrapper to the
posterior_predict function, but allows predictions to be
easily generated at time points that are interpolated and/or extrapolated
between time zero (baseline) and the last known survival time for the
individual, thereby providing predictions that correspond to a smooth estimate
of the longitudinal trajectory (useful for the plotting via the associated
plot.predict.stanjm method). In addition it returns a data
frame by default, whereas the posterior_predict function
returns a matrix; see the Value section below for details. Also,
posterior_traj allows predictions to only be generated for a subset
of individuals, via the ids argument.
Examples
if (.Platform$OS.type != "windows" || .Platform$r_arch != "i386") {
# \donttest{
# Run example model if not already loaded
if (!exists("example_jm")) example(example_jm)
# Obtain subject-specific predictions for all individuals
# in the estimation dataset
pt1 <- posterior_traj(example_jm, interpolate = FALSE, extrapolate = FALSE)
head(pt1)
# Obtain subject-specific predictions only for a few selected individuals
pt2 <- posterior_traj(example_jm, ids = c(1,3,8))
# If we wanted to obtain subject-specific predictions in order to plot the
# longitudinal trajectories, then we might want to ensure a full trajectory
# is obtained by interpolating and extrapolating time. We can then use the
# generic plot function to plot the subject-specific predicted trajectories
# for the first three individuals. Interpolation and extrapolation is
# carried out by default.
pt3 <- posterior_traj(example_jm)
head(pt3) # predictions at additional time points compared with pt1
plot(pt3, ids = 1:3)
# If we wanted to extrapolate further in time, but decrease the number of
# discrete time points at which we obtain predictions for each individual,
# then we could specify a named list in the 'control' argument
pt4 <- posterior_traj(example_jm, control = list(ipoints = 10, epoints = 10, eprop = 0.5))
# If we have prediction data for a new individual, and we want to
# estimate the longitudinal trajectory for that individual conditional
# on this new data (perhaps extrapolating forward from our last
# longitudinal measurement) then we can do that. It requires drawing
# new individual-specific parameters, based on the full likelihood,
# so we must supply new data for both the longitudinal and event
# submodels. These are sometimes known as dynamic predictions.
ndL <- pbcLong[pbcLong$id == 8, , drop = FALSE]
ndE <- pbcSurv[pbcSurv$id == 8, , drop = FALSE]
ndL$id <- "new_subject" # new id can't match one used in training data
ndE$id <- "new_subject"
pt5 <- posterior_traj(example_jm,
newdataLong = ndL,
newdataEvent = ndE)
# By default it is assumed that the last known survival time for
# the individual is the time of their last biomarker measurement,
# but if we know they survived to some later time then we can
# condition on that information using the last_time argument
pt6 <- posterior_traj(example_jm,
newdataLong = ndL,
newdataEvent = ndE,
last_time = "futimeYears")
# Alternatively we may want to estimate the marginal longitudinal
# trajectory for a given set of covariates. To do this, we can pass
# the desired covariate values in a new data frame (however the only
# covariate in our fitted model was the time variable, year). To make sure
# that we marginalise over the random effects, we need to specify an ID value
# which does not correspond to any of the individuals who were used in the
# model estimation and specify the argument dynamic=FALSE.
# The marginal prediction is obtained by generating subject-specific
# predictions using a series of random draws from the random
# effects distribution, and then integrating (ie, averaging) over these.
# Our marginal prediction will therefore capture the between-individual
# variation associated with the random effects.
nd <- data.frame(id = rep("new1", 11), year = (0:10 / 2))
pt7 <- posterior_traj(example_jm, newdataLong = nd, dynamic = FALSE)
head(pt7) # note the greater width of the uncertainty interval compared
# with the subject-specific predictions in pt1, pt2, etc
# Alternatively, we could have estimated the "marginal" trajectory by
# ignoring the random effects (ie, assuming the random effects were set
# to zero). This will generate a predicted longitudinal trajectory only
# based on the fixed effect component of the model. In essence, for a
# linear mixed effects model (ie, a model that uses an identity link
# function), we should obtain a similar point estimate ("yfit") to the
# estimates obtained in pt5 (since the mean of the estimated random effects
# distribution will be approximately 0). However, it is important to note that
# the uncertainty interval will be much more narrow, since it completely
# ignores the between-individual variability captured by the random effects.
# Further, if the model uses a non-identity link function, then the point
# estimate ("yfit") obtained only using the fixed effect component of the
# model will actually provide a biased estimate of the marginal prediction.
# Nonetheless, to demonstrate how we can obtain the predictions only using
# the fixed effect component of the model, we simply specify 're.form = NA'.
# (We will use the same covariate values as used in the prediction for
# example for pt5).
pt8 <- posterior_traj(example_jm, newdataLong = nd, dynamic = FALSE,
re.form = NA)
head(pt8) # note the much narrower ci, compared with pt5
# }
}
#> `geom_smooth()` using formula = 'y ~ x'
#> `geom_smooth()` using formula = 'y ~ x'
#> Drawing new random effects for 1 individuals. Monitoring progress:
#>
|
| | 0%
|
|======================================================================| 100%
#> Drawing new random effects for 1 individuals. Monitoring progress:
#>
|
| | 0%
|
|======================================================================| 100%
#> id year yfit ci_lb ci_ub pi_lb pi_ub
#> 1 new1 0.0000000 0.4916872 -0.4241185 0.8481504 -1.2898612 1.439933
#> 2 new1 0.3571429 0.5504692 -0.3639245 0.9054757 -0.4663548 1.641273
#> 3 new1 0.7142857 0.6128161 -0.3037304 0.9653442 -0.4979441 1.371600
#> 4 new1 1.0714286 0.6751630 -0.2435363 1.0255971 -0.5529147 1.420842
#> 5 new1 1.4285714 0.7358875 -0.1833423 1.0858500 -0.3959071 1.359410
#> 6 new1 1.7857143 0.7955649 -0.1231482 1.1461029 -0.4307864 1.614048