Methods for MixMod Objects

Dimitris Rizopoulos

2018-12-06

MixMod Objects

In this vignette we illustrate the use of a number of basic generic functions for models fitted by the mixed_model() function of package GLMMadaptive.

We start by simulating some data for a binary longitudinal outcome:

set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time

# we construct a data frame with the design: 
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))

# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)

betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes

# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))

We continue by fitting a mixed-effects logistic regression for y, assuming random intercepts and random slopes in the random-effects part:

fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
                  family = binomial())

Detailed Output, Confidence Intervals, Covariance Matrix of the MLEs and Robust Standard Errors

As in the majority of model-fitting functions in R, the print() and summary() methods display a short and a detailed output of the fitted model, respectively. For 'MixMod' objects we obtain

fm
#> 
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF, 
#>     family = binomial())
#> 
#> 
#> Model:
#>  family: binomial
#>  link: logit 
#> 
#> Random effects covariance matrix:
#>              StdDev    Corr
#> (Intercept)  0.7260        
#> time         0.2489  0.6256
#> 
#> Fixed effects:
#>    (Intercept)      sexfemale           time sexfemale:time 
#>   -2.071584799   -1.172034240    0.262702412    0.002493367 
#> 
#> log-Lik: -358.8284

and

summary(fm)
#> 
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF, 
#>     family = binomial())
#> 
#> Data Descriptives:
#> Number of Observations: 800
#> Number of Groups: 100 
#> 
#> Model:
#>  family: binomial
#>  link: logit 
#> 
#> Fit statistics:
#>    log.Lik      AIC     BIC
#>  -358.8284 731.6568 749.893
#> 
#> Random effects covariance matrix:
#>              StdDev    Corr
#> (Intercept)  0.7260        
#> time         0.2489  0.6256
#> 
#> Fixed effects:
#>                Estimate Std.Err z-value p-value
#> (Intercept)     -2.0716  0.3199 -6.4755 < 1e-04
#> sexfemale       -1.1720  0.4722 -2.4819 0.01307
#> time             0.2627  0.0564  4.6613 < 1e-04
#> sexfemale:time   0.0025  0.0781  0.0319 0.97453
#> 
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#> 
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE

The output is rather self-explanatory. However, note that the fixed-effects coefficients are on the linear predictor scale, and hence correspond to the log-odds for the intercept and to log-odds ratios for the remaining parameters. The summary() method only shows the estimated coefficients, standard errors and p-values, but no confidence intervals. These can be obtained separately using the confint() method, i.e.,

exp(confint(fm))
#>                     2.5 %  Estimate    97.5 %
#> (Intercept)    0.06729962 0.1259860 0.2358477
#> sexfemale      0.12274967 0.3097362 0.7815624
#> time           1.16444341 1.3004397 1.4523190
#> sexfemale:time 0.86021510 1.0024965 1.1683115

By default, confidence intervals are produced for the fixed effects; hence, exponentiating them yields confidence intervals for the corresponding odds ratios. Also by default, the confidence level is 95%. The following piece of code produces 90% confidence intervals for the variances of the random intercepts and slopes, and for their covariance:

confint(fm, parm = "var-cov", level = 0.90)
#>                         5 %   Estimate       95 %
#> var.(Intercept)  0.06950694 0.52709131 0.69278283
#> cov.(Int)_time  -0.04285652 0.11302036 0.47390226
#> var.time         0.02770139 0.06192966 0.05342255

The estimated variance-covariance matrix of the maximum likelihood estimates of all parameters is returned using the vcov() method, e.g.,

vcov(fm)
#>                 (Intercept)    sexfemale         time sexfemale:time
#> (Intercept)     0.102343645 -0.076210649 -0.008492563   6.175501e-03
#> sexfemale      -0.076210649  0.223010883  0.005931219  -1.767413e-02
#> time           -0.008492563  0.005931219  0.003176198  -2.954446e-03
#> sexfemale:time  0.006175501 -0.017674132 -0.002954446   6.099051e-03
#> D_11           -0.085880010 -0.020019772  0.008266537   1.722885e-03
#> D_12            0.017555327  0.004438214 -0.001891643  -4.506264e-04
#> D_22           -0.082121459 -0.020038676  0.010306830   8.189895e-05
#>                        D_11          D_12          D_22
#> (Intercept)    -0.085880010  0.0175553274 -8.212146e-02
#> sexfemale      -0.020019772  0.0044382136 -2.003868e-02
#> time            0.008266537 -0.0018916428  1.030683e-02
#> sexfemale:time  0.001722885 -0.0004506264  8.189895e-05
#> D_11            0.379264088 -0.0950463827  4.268870e-01
#> D_12           -0.095046383  0.0374304672 -1.872794e-01
#> D_22            0.426886977 -0.1872794290  1.058810e+00

The elements of this covariance matrix that correspond to the elements of the covariance matrix of the random effects (i.e., the elements D_xx) are on the log-Cholesky scale.
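
To give an idea of what this parameterization involves, the following minimal sketch maps an unconstrained parameter vector to a positive-definite covariance matrix via a Cholesky factor with an exponentiated diagonal. The exact convention used internally by GLMMadaptive (ordering of the parameters and orientation of the factor) may differ, so this is only an illustration of the general idea:

# illustration only: map unconstrained parameters (theta_11, theta_12, theta_22)
# to a 2 x 2 covariance matrix via an upper-triangular Cholesky factor whose
# diagonal is exponentiated; D = t(L) %*% L is then positive definite
chol_to_D <- function (theta) {
    L <- matrix(0, 2, 2)
    diag(L) <- exp(theta[c(1, 3)])
    L[1, 2] <- theta[2]
    crossprod(L)
}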

Robust standard errors using the sandwich estimator can be obtained by setting the logical argument sandwich to TRUE, i.e.,

vcov(fm, sandwich = TRUE)
#>                 (Intercept)    sexfemale         time sexfemale:time
#> (Intercept)     0.104351901 -0.071672817 -0.005555342   0.0023775477
#> sexfemale      -0.071672817  0.214705553  0.003147819  -0.0183462871
#> time           -0.005555342  0.003147819  0.002932681  -0.0026718810
#> sexfemale:time  0.002377548 -0.018346287 -0.002671881   0.0061708150
#> D_11           -0.085412338 -0.010902066  0.006386534   0.0033642442
#> D_12            0.021480846 -0.005633021 -0.001786834  -0.0002913227
#> D_22           -0.094979283  0.030120512  0.006798901   0.0025381625
#>                        D_11          D_12         D_22
#> (Intercept)    -0.085412338  0.0214808461 -0.094979283
#> sexfemale      -0.010902066 -0.0056330213  0.030120512
#> time            0.006386534 -0.0017868339  0.006798901
#> sexfemale:time  0.003364244 -0.0002913227  0.002538163
#> D_11            0.247885532 -0.0662560931  0.283959686
#> D_12           -0.066256093  0.0290155130 -0.147970165
#> D_22            0.283959686 -0.1479701654  0.868219167

The use of robust standard errors via the sandwich argument is also available in the summary(), confint(), anova(), marginal_coefs(), effectPlotData(), predict(), and simulate() methods.
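
For instance, the coefficients table and the confidence intervals can be recomputed with robust standard errors as follows (output not shown):

summary(fm, sandwich = TRUE)
confint(fm, sandwich = TRUE)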

Fixed and Random Effects Estimates

To extract the estimated fixed effects coefficients from a fitted mixed model, we can use the fixef() method. Similarly, the empirical Bayes estimates of the random effects are extracted using the ranef() method, and finally the coef() method returns the subject-specific coefficients, i.e., the sum of the fixed and random effects coefficients:

fixef(fm)
#>    (Intercept)      sexfemale           time sexfemale:time 
#>   -2.071584799   -1.172034240    0.262702412    0.002493367
head(ranef(fm))
#>   (Intercept)         time
#> 1  -0.4028803 -0.119484844
#> 2   0.7032286  0.284875825
#> 3   0.5090121  0.211931456
#> 4  -0.2303141 -0.004750287
#> 5   0.2429221  0.161494572
#> 6  -0.3855094 -0.182391588
head(coef(fm))
#>   (Intercept) sexfemale       time sexfemale:time
#> 1   -2.474465 -1.172034 0.14321757    0.002493367
#> 2   -1.368356 -1.172034 0.54757824    0.002493367
#> 3   -1.562573 -1.172034 0.47463387    0.002493367
#> 4   -2.301899 -1.172034 0.25795212    0.002493367
#> 5   -1.828663 -1.172034 0.42419698    0.002493367
#> 6   -2.457094 -1.172034 0.08031082    0.002493367

Marginalized Coefficients

The fixed-effects estimates in mixed models with nonlinear link functions have an interpretation conditional on the random effects. However, we often wish to obtain parameters with a marginal / population-averaged interpretation, which leads many researchers to use generalized estimating equations instead, and to deal with the potential issues these have with missing data. Nonetheless, Hedeker et al. have recently proposed an attractive solution to this problem; their approach is implemented in the function marginal_coefs(). For example, for model fm we obtain the marginalized coefficients using:

marginal_coefs(fm)
#>    (Intercept)      sexfemale           time sexfemale:time 
#>        -1.6011        -1.1226         0.1800         0.0506

In our case (a binary outcome), the function calculates marginal log odds ratios using a Monte Carlo procedure whose number of samples is determined by the M argument.
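
For example, one could increase the number of Monte Carlo samples via the M argument, at the cost of longer computing time (output not shown; the value 5000 below is only illustrative):

marginal_coefs(fm, M = 5000)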

Standard errors for the marginalized coefficients are obtained by setting std_errors = TRUE in the call to marginal_coefs(); they require a double Monte Carlo procedure, in which the argument K also comes into play. To speed up computations, the outer Monte Carlo procedure is performed in parallel using package parallel, with the number of cores specified in the cores argument:

marginal_coefs(fm, std_errors = TRUE)
#>                Estimate Std.Err z-value  p-value
#> (Intercept)     -1.6011  0.2819 -5.6806  < 1e-04
#> sexfemale       -1.1226  0.3740 -3.0015 0.002687
#> time             0.1800  0.0346  5.1999  < 1e-04
#> sexfemale:time   0.0506  0.0392  1.2907 0.196811

Fitted Values and Residuals

The fitted() method extracts fitted values from the fitted mixed model. These are always on the scale of the response variable. The type argument of fitted() specifies the type of fitted values computed. The default, type = "mean_subject", corresponds to fitted values calculated using only the fixed-effects part of the linear predictor, i.e., for a subject with random-effects values equal to zero (the “mean subject”):

head(fitted(fm))
#>         1         2         3         4         5         6 
#> 0.1118895 0.1156621 0.1647183 0.5815852 0.5940199 0.5950458

Setting type = "subject_specific" will calculate the fitted values using both the fixed and random effects parts, where for the latter the empirical Bayes estimates of the random effects are used:

head(fitted(fm, type = "subject_specific"))
#>          1          2          3          4          5          6 
#> 0.07766777 0.07914174 0.09707109 0.23765440 0.24276090 0.24318768

Finally, setting type = "marginal" will calculate fitted values by multiplying the fixed-effects design matrix with the marginalized coefficients described above (due to the required computing time, these fitted values are not displayed):

head(fitted(fm, type = "marginal"))

The residuals() method calculates residuals by subtracting the fitted values from the observed repeated-measurements outcome. Hence, this method also has a type argument with exactly the same options as the fitted() method.
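
For example, subject-specific residuals can be obtained with (output not shown):

head(residuals(fm, type = "subject_specific"))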

Effect Plots

To display the estimated longitudinal evolutions of the binary outcome we can use an effect plot. This is simply a plot of the predictions from the model, with the corresponding 95% pointwise confidence intervals.

As a first step we create a data frame that provides the setting for which the plot is to be produced; the function expand.grid() is helpful in this regard:

nDF <- with(DF, expand.grid(time = seq(min(time), max(time), length.out = 15),
                            sex = levels(sex)))

Next we use the effectPlotData() function, which does the heavy lifting: it calculates the predictions and confidence intervals from the fitted mixed model for the data frame provided above:

plot_data <- effectPlotData(fm, nDF)

Then we can produce the plot using, for example, the xyplot() function from package lattice:

library("lattice")
xyplot(pred + low + upp ~ time | sex, data = plot_data,
       type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
       xlab = "Follow-up time", ylab = "log odds")

expit <- function (x) exp(x) / (1 + exp(x))
xyplot(expit(pred) + expit(low) + expit(upp) ~ time | sex, data = plot_data,
       type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
       xlab = "Follow-up time", ylab = "Subject-Specific Probabilities")

The effectPlotData() function can also compute marginal predictions using the marginalized coefficients described above. This is achieved by setting marginal = TRUE in the respective call:

plot_data_m <- effectPlotData(fm, nDF, marginal = TRUE, cores = 4)

# we put the two groups in the same panel
my.panel.bands <- function(x, y, upper, lower, fill, col, subscripts, ..., font, 
                           fontface) {
    upper <- upper[subscripts]
    lower <- lower[subscripts]
    panel.polygon(c(x, rev(x)), c(upper, rev(lower)), col = fill, border = FALSE, ...)
}

xyplot(expit(pred) ~ time, group = sex, data = plot_data_m, 
       upper = expit(plot_data_m$upp), low = expit(plot_data_m$low), 
       type = "l", col = c("blue", "red"), 
       fill = c("#0000FF80", "#FF000080"),
       panel = function (x, y, ...) {
           panel.superpose(x, y, panel.groups = 'my.panel.bands', ...)
           panel.xyplot(x, y, lwd = 2,  ...)
}, xlab = "Follow-up time", ylab = "Marginal Probabilities")

Comparing Two Models

The anova() method can be used to compare two fitted mixed models using a likelihood ratio test. For example, we test the null hypothesis that the covariance between the random intercepts and slopes is equal to zero using

gm <- mixed_model(fixed = y ~ sex * time, random = ~ time || id, data = DF,
                  family = binomial())

anova(gm, fm)
#> 
#>       AIC    BIC log.Lik  LRT df p.value
#> gm 730.94 746.57 -359.47                
#> fm 731.66 749.89 -358.83 1.29  1  0.2566

Predictions

Using the predict() method we can calculate predictions for new subjects. As an example, we treat subject 1 from the DF dataset as a new patient (in the code below we change his id variable):

pred_DF <- DF[DF$id == 1, ][1:4, ]
pred_DF$id <- paste0("N", as.character(pred_DF$id))
pred_DF
#>   id      time  sex y
#> 1 N1 0.0000000 male 0
#> 2 N1 0.1424363 male 0
#> 3 N1 1.7055512 male 0
#> 4 N1 9.1391210 male 0

We start by computing predictions based only on the fixed-effects part of the model; because of the nonlinear link function, these are the predictions for a subject with random-effects values equal to zero, which are not the same as the average (marginal) predictions:

predict(fm, newdata = pred_DF, type_pred = "response",
        type = "mean_subject", se.fit = TRUE)
#> $pred
#>         1         2         3         4 
#> 0.1118895 0.1156621 0.1647183 0.5815852 
#> 
#> $se.fit
#>         1         2         3         4 
#> 0.3199119 0.3162100 0.2874263 0.4608707

Population-averaged predictions can be obtained by first calculating the marginalized coefficients (using marginal_coefs()) and multiplying them with the fixed-effects design matrix; this is achieved using the option type = "marginal" (due to the required computing time, these predictions are not shown):

predict(fm, newdata = pred_DF, type_pred = "response",
        type = "marginal", se.fit = FALSE)

Finally, we calculate subject-specific predictions; the standard errors are calculated with a Monte Carlo scheme (for details check the online help file):

predict(fm, newdata = pred_DF, type_pred = "response",
        type = "subject_specific", se.fit = TRUE)
#> $pred
#>          1          2          3          4 
#> 0.07623721 0.07729536 0.08982069 0.17702658 
#> 
#> $se.fit
#>          1          2          3          4 
#> 0.02789734 0.02809443 0.03218202 0.11386294 
#> 
#> $low
#>          1          2          3          4 
#> 0.03879535 0.03989853 0.04526640 0.04706515 
#> 
#> $upp
#>         1         2         3         4 
#> 0.1472472 0.1478227 0.1662718 0.4965264 
#> 
#> $success_rate
#> [1] 0.5033333

Suppose now that we want predictions at time points at which no responses y have been recorded, e.g.,

future_Times <- pred_DF[1:3, c("id", "time", "sex")]
future_Times$time <- c(3, 4, 10)
future_Times
#>   id time  sex
#> 1 N1    3 male
#> 2 N1    4 male
#> 3 N1   10 male

Predictions at these time points can be calculated by providing this data frame in the newdata2 argument of predict():

predict(fm, newdata = pred_DF, newdata2 = future_Times, type_pred = "response",
        type = "subject_specific", se.fit = TRUE)
#> $pred
#>          1          2          3          4 
#> 0.07623721 0.07729536 0.08982069 0.17702658 
#> 
#> $pred2
#>         1         2         3 
#> 0.1015482 0.1115191 0.1905586 
#> 
#> $se.fit
#>          1          2          3          4 
#> 0.02789734 0.02809443 0.03218202 0.11386294 
#> 
#> $se.fit2
#>          1          2          3 
#> 0.03879429 0.04618988 0.12848454 
#> 
#> $low
#>          1          2          3          4 
#> 0.03879535 0.03989853 0.04526640 0.04706515 
#> 
#> $upp
#>         1         2         3         4 
#> 0.1472472 0.1478227 0.1662718 0.4965264 
#> 
#> $low2
#>          1          2          3 
#> 0.04976927 0.04925355 0.04708840 
#> 
#> $upp2
#>         1         2         3 
#> 0.1919316 0.2181698 0.5587416

Simulate

The simulate() method can be used to simulate response outcome data from a fitted mixed model. For example, we simulate two realizations of our dichotomous outcome:

head(simulate(fm, nsim = 2, seed = 123), 10)
#>       [,1] [,2]
#>  [1,]    1    0
#>  [2,]    0    0
#>  [3,]    1    0
#>  [4,]    1    1
#>  [5,]    1    1
#>  [6,]    1    0
#>  [7,]    1    1
#>  [8,]    1    0
#>  [9,]    0    0
#> [10,]    0    0

By setting acount_MLEs_var = TRUE in the call to simulate() we also account for the variability of the maximum likelihood estimates when simulating new responses. This is achieved by simulating each new response vector using parameter values drawn from a multivariate normal distribution with mean the MLEs and covariance matrix the covariance matrix of the MLEs:

head(simulate(fm, nsim = 2, seed = 123, acount_MLEs_var = TRUE), 10)
#>       [,1] [,2]
#>  [1,]    0    0
#>  [2,]    1    0
#>  [3,]    1    0
#>  [4,]    0    0
#>  [5,]    0    0
#>  [6,]    1    0
#>  [7,]    0    0
#>  [8,]    1    0
#>  [9,]    0    0
#> [10,]    0    1