R/posterior_traj.R
posterior_traj.Rd
This function allows us to generate an estimated longitudinal trajectory (either subject-specific, or by marginalising over the distribution of the group-specific parameters) based on draws from the posterior predictive distribution.
posterior_traj( object, m = 1, newdata = NULL, newdataLong = NULL, newdataEvent = NULL, interpolate = TRUE, extrapolate = FALSE, control = list(), last_time = NULL, prob = 0.95, ids, dynamic = TRUE, scale = 1.5, draws = NULL, seed = NULL, return_matrix = FALSE, ... )
object | A fitted model object returned by the
|
---|---|
m | Integer specifying the number or name of the submodel |
newdata | Deprecated: please use |
newdataLong, newdataEvent | Optionally, a data frame (or in the case of
|
interpolate | A logical specifying whether to interpolate the estimated
longitudinal trajectory in between the observation times. This can be used
to achieve a smooth estimate of the longitudinal trajectory across the
entire follow up time. If |
extrapolate | A logical specifying whether to extrapolate the estimated
longitudinal trajectory beyond the time of the last known observation time.
If |
control | A named list with parameters controlling the interpolation or
extrapolation of the estimated longitudinal trajectory when either
|
last_time | A scalar, character string, or |
prob | A scalar between 0 and 1 specifying the width to use for the
uncertainty interval (sometimes called credible interval) for the predicted
mean response and the prediction interval for the predicted (raw) response.
For example |
ids | An optional vector specifying a subset of subject IDs for whom the
predictions should be obtained. The default is to predict for all individuals
who were used in estimating the model or, if |
dynamic | A logical that is only relevant if new data is provided
via the |
scale | A scalar, specifying how much to multiply the asymptotic
variance-covariance matrix for the random effects by, which is then
used as the "width" (ie. variance-covariance matrix) of the multivariate
Student-t proposal distribution in the Metropolis-Hastings algorithm. This
is only relevant when |
draws | An integer indicating the number of MCMC draws to return. The default is to set the number of draws equal to 200, or equal to the size of the posterior sample if that is less than 200. |
seed | An optional |
return_matrix | A logical. If |
... | Other arguments passed to |
When return_matrix = FALSE
, a data frame
of class predict.stanjm
. The data frame includes a column for the median
of the posterior predictions of the mean longitudinal response (yfit
),
a column for each of the lower and upper limits of the uncertainty interval
corresponding to the posterior predictions of the mean longitudinal response
(ci_lb
and ci_ub
), and a column for each of the lower and upper
limits of the prediction interval corresponding to the posterior predictions
of the (raw) longitudinal response. The data frame also includes columns for
the subject ID variable, and each of the predictor variables. The returned
object also includes a number of attributes.
When return_matrix = TRUE
, the returned object is the same as that
described for posterior_predict
.
The posterior_traj
function acts as a wrapper to the
posterior_predict
function, but allows predictions to be
easily generated at time points that are interpolated and/or extrapolated
between time zero (baseline) and the last known survival time for the
individual, thereby providing predictions that correspond to a smooth estimate
of the longitudinal trajectory (useful for the plotting via the associated
plot.predict.stanjm
method). In addition it returns a data
frame by default, whereas the posterior_predict
function
returns a matrix; see the Value section below for details. Also,
posterior_traj
allows predictions to only be generated for a subset
of individuals, via the ids
argument.
# \donttest{ # Run example model if not already loaded if (!exists("example_jm")) example(example_jm) # Obtain subject-specific predictions for all individuals # in the estimation dataset pt1 <- posterior_traj(example_jm, interpolate = FALSE, extrapolate = FALSE) head(pt1)#> id year yfit ci_lb ci_ub pi_lb pi_ub #> 1 1 0.0000000 2.69984909 2.25764936 3.2239869 1.6413644 3.577369 #> 2 1 0.5256674 2.79254601 2.34119466 3.3084102 1.8686028 3.786467 #> 3 2 0.0000000 0.06099148 -0.28755884 0.3632144 -0.7153253 1.142746 #> 4 2 0.4982888 0.14683570 -0.19678502 0.4418192 -0.6692718 1.090876 #> 5 2 0.9993155 0.23240127 -0.10551244 0.5220164 -0.8086935 0.891891 #> 6 2 2.1026694 0.41666914 0.09548675 0.7127605 -0.4392688 1.196561# Obtain subject-specific predictions only for a few selected individuals pt2 <- posterior_traj(example_jm, ids = c(1,3,8)) # If we wanted to obtain subject-specific predictions in order to plot the # longitudinal trajectories, then we might want to ensure a full trajectory # is obtained by interpolating and extrapolating time. We can then use the # generic plot function to plot the subject-specific predicted trajectories # for the first three individuals. Interpolation and extrapolation is # carried out by default. pt3 <- posterior_traj(example_jm) head(pt3) # predictions at additional time points compared with pt1#> id year yfit ci_lb ci_ub pi_lb pi_ub #> 1 1 0 2.69984909 2.2576494 3.2239869 1.8820139 4.15414861 #> 2 2 0 0.06099148 -0.2875588 0.3632144 -0.6504792 0.77562794 #> 3 3 0 0.22635897 -0.1014481 0.6277125 -0.6215849 0.86623579 #> 4 4 0 0.65360972 0.3630433 1.0321208 -0.2365076 1.43172407 #> 5 5 0 1.14671848 0.8692571 1.4856038 0.4108954 1.81162914 #> 6 6 0 -0.73281998 -0.9630179 -0.3803159 -1.6285571 -0.03093085#>#>#># If we wanted to extrapolate further in time, but decrease the number of # discrete time points at which we obtain predictions for each individual, # then we could specify a named list in the 'control' argument pt4 <- posterior_traj(example_jm, control = list(ipoints = 10, epoints = 10, eprop = 0.5)) # If we have prediction data for a new individual, and we want to # estimate the longitudinal trajectory for that individual conditional # on this new data (perhaps extrapolating forward from our last # longitudinal measurement) then we can do that. It requires drawing # new individual-specific parameters, based on the full likelihood, # so we must supply new data for both the longitudinal and event # submodels. These are sometimes known as dynamic predictions. ndL <- pbcLong[pbcLong$id == 8, , drop = FALSE] ndE <- pbcSurv[pbcSurv$id == 8, , drop = FALSE] ndL$id <- "new_subject" # new id can't match one used in training data ndE$id <- "new_subject" pt5 <- posterior_traj(example_jm, newdataLong = ndL, newdataEvent = ndE)#> Drawing new random effects for 1 individuals. Monitoring progress: #> | | | 0% | |======================================================================| 100%# By default it is assumed that the last known survival time for # the individual is the time of their last biomarker measurement, # but if we know they survived to some later time then we can # condition on that information using the last_time argument pt6 <- posterior_traj(example_jm, newdataLong = ndL, newdataEvent = ndE, last_time = "futimeYears")#> Drawing new random effects for 1 individuals. Monitoring progress: #> | | | 0% | |======================================================================| 100%# Alternatively we may want to estimate the marginal longitudinal # trajectory for a given set of covariates. To do this, we can pass # the desired covariate values in a new data frame (however the only # covariate in our fitted model was the time variable, year). To make sure # that we marginalise over the random effects, we need to specify an ID value # which does not correspond to any of the individuals who were used in the # model estimation and specify the argument dynamic=FALSE. # The marginal prediction is obtained by generating subject-specific # predictions using a series of random draws from the random # effects distribution, and then integrating (ie, averaging) over these. # Our marginal prediction will therefore capture the between-individual # variation associated with the random effects. nd <- data.frame(id = rep("new1", 11), year = (0:10 / 2)) pt7 <- posterior_traj(example_jm, newdataLong = nd, dynamic = FALSE) head(pt7) # note the greater width of the uncertainty interval compared#> id year yfit ci_lb ci_ub pi_lb pi_ub #> 1 new1 0.0000000 0.8383967 -2.779474 2.740931 -2.771958 2.807810 #> 2 new1 0.3571429 0.8964855 -2.722729 2.800614 -2.934628 3.159185 #> 3 new1 0.7142857 0.9492078 -2.665984 2.860296 -2.875908 2.856379 #> 4 new1 1.0714286 1.0019300 -2.609239 2.919979 -2.905819 3.159848 #> 5 new1 1.4285714 1.0574264 -2.552494 2.979661 -2.384959 3.226213 #> 6 new1 1.7857143 1.1183328 -2.495749 3.039344 -2.552383 3.387019# with the subject-specific predictions in pt1, pt2, etc # Alternatively, we could have estimated the "marginal" trajectory by # ignoring the random effects (ie, assuming the random effects were set # to zero). This will generate a predicted longitudinal trajectory only # based on the fixed effect component of the model. In essence, for a # linear mixed effects model (ie, a model that uses an identity link # function), we should obtain a similar point estimate ("yfit") to the # estimates obtained in pt5 (since the mean of the estimated random effects # distribution will be approximately 0). However, it is important to note that # the uncertainty interval will be much more narrow, since it completely # ignores the between-individual variability captured by the random effects. # Further, if the model uses a non-identity link function, then the point # estimate ("yfit") obtained only using the fixed effect component of the # model will actually provide a biased estimate of the marginal prediction. # Nonetheless, to demonstrate how we can obtain the predictions only using # the fixed effect component of the model, we simply specify 're.form = NA'. # (We will use the same covariate values as used in the prediction for # example for pt5). pt8 <- posterior_traj(example_jm, newdataLong = nd, dynamic = FALSE, re.form = NA) head(pt8) # note the much narrower ci, compared with pt5#> id year yfit ci_lb ci_ub pi_lb pi_ub #> 1 new1 0.0000000 0.6866444 0.2654333 1.123719 -0.27232294 1.695580 #> 2 new1 0.3571429 0.7493349 0.3297991 1.182328 -0.20333442 1.858451 #> 3 new1 0.7142857 0.8098669 0.3941648 1.239708 -0.29720147 1.479493 #> 4 new1 1.0714286 0.8705245 0.4585306 1.297089 -0.32133975 1.528451 #> 5 new1 1.4285714 0.9327557 0.5228963 1.355529 -0.16996192 1.859970 #> 6 new1 1.7857143 0.9964271 0.5872620 1.418941 -0.08560543 1.826824# }