In this vignette we illustrate how to correct p-values for multiple comparisons using the multcomp and emmeans packages.
We start by simulating some data for a binary longitudinal outcome:
set.seed(1234)
n <- 300 # number of subjects
K <- 4 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we constuct a data frame with the design:
# everyone has a baseline measurment, and then measurements at K time points
DF <- data.frame(id = rep(seq_len(n), each = K),
time = gl(K, 1, n*K, labels = paste0("Time", 1:K)),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ 1, data = DF)
betas <- c(-2.13, 1, rep(c(1.2, -1.2), K-1)) # fixed effects coefficients
D11 <- 1 # variance of random intercepts
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
We fit a mixed effects logistic regression for y
assuming random intercepts for the random-effects part, and the main effects of sex
and time
for the fixed-effects part.
fm <- mixed_model(fixed = y ~ sex + time, random = ~ 1 | id, data = DF,
family = binomial())
The uncorrected p-values for the 4 time points are give by the summary()
method:
summary(fm)
#>
#> Call:
#> mixed_model(fixed = y ~ sex + time, random = ~1 | id, data = DF,
#> family = binomial())
#>
#> Data Descriptives:
#> Number of Observations: 1200
#> Number of Groups: 300
#>
#> Model:
#> family: binomial
#> link: logit
#>
#> Fit statistics:
#> log.Lik AIC BIC
#> -583.4797 1178.959 1201.182
#>
#> Random effects covariance matrix:
#> StdDev
#> (Intercept) 1.448146
#>
#> Fixed effects:
#> Estimate Std.Err z-value p-value
#> (Intercept) -2.2865 0.2496 -9.1597 < 1e-04
#> sexfemale 0.8853 0.2448 3.6171 0.00029798
#> timeTime2 0.3066 0.2266 1.3533 0.17594625
#> timeTime3 -0.5049 0.2463 -2.0501 0.04035455
#> timeTime4 0.5633 0.2237 2.5184 0.01178997
#>
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#>
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE
To perform the pairwise comparisons and obtain corrected p-values, we load the multcomp package and use the glht()
function. Because no specific methods exist for MixMod
object returned by mixed_model()
, we need to specify the vcov.
and coef.
arguments of glht()
, i.e.,
library("multcomp")
#> Loading required package: mvtnorm
#> Loading required package: survival
#> Loading required package: TH.data
#> Loading required package: MASS
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:GLMMadaptive':
#>
#> negative.binomial
#>
#> Attaching package: 'TH.data'
#> The following object is masked from 'package:MASS':
#>
#> geyser
fm_mc <- glht(fm, linfct = mcp(time = "Tukey"),
vcov. = vcov(fm, "fixed"), coef. = fixef)
summary(fm_mc)
#>
#> Simultaneous Tests for General Linear Hypotheses
#>
#> Multiple Comparisons of Means: Tukey Contrasts
#>
#>
#> Fit: mixed_model(fixed = y ~ sex + time, random = ~1 | id, data = DF,
#> family = binomial())
#>
#> Linear Hypotheses:
#> Estimate Std. Error z value Pr(>|z|)
#> Time2 - Time1 == 0 0.3066 0.2266 1.353 0.52811
#> Time3 - Time1 == 0 -0.5049 0.2463 -2.050 0.16945
#> Time4 - Time1 == 0 0.5633 0.2237 2.518 0.05691 .
#> Time3 - Time2 == 0 -0.8115 0.2423 -3.350 0.00465 **
#> Time4 - Time2 == 0 0.2567 0.2165 1.186 0.63520
#> Time4 - Time3 == 0 1.0682 0.2406 4.440 < 0.001 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Adjusted p values reported -- single-step method)
We continue our illustration by including the interaction term between sex
and time
, and we focus on the difference between males and females for the various time points. We start by fitting the model:
gm <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
family = binomial())
To compute the estimated log odds for males and females at the different time points we use the emmeans()
functions from the emmeans package:
library("emmeans")
gm_mc <- emmeans(gm, ~ sex | time)
gm_mc
#> time = Time1:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -2.737017 0.3421930 Inf -3.407703 -2.0663308
#> female -1.209182 0.2550117 Inf -1.708995 -0.7093678
#>
#> time = Time2:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -1.675620 0.2755058 Inf -2.215602 -1.1356387
#> female -1.435439 0.2629832 Inf -1.950877 -0.9200015
#>
#> time = Time3:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -4.398350 0.5452336 Inf -5.466988 -3.3297118
#> female -1.389421 0.2612568 Inf -1.901475 -0.8773671
#>
#> time = Time4:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -1.336933 0.2610534 Inf -1.848589 -0.8252781
#> female -1.252104 0.2564246 Inf -1.754687 -0.7495210
#>
#> Confidence level used: 0.95
The corresponding pairwise comparisons are performed by the pairs()
function:
pairs(gm_mc)
#> time = Time1:
#> contrast estimate SE df z.ratio p.value
#> male - female -1.52783517 0.4097558 Inf -3.729 0.0002
#>
#> time = Time2:
#> contrast estimate SE df z.ratio p.value
#> male - female -0.24018095 0.3664609 Inf -0.655 0.5122
#>
#> time = Time3:
#> contrast estimate SE df z.ratio p.value
#> male - female -3.00892881 0.5851176 Inf -5.142 <.0001
#>
#> time = Time4:
#> contrast estimate SE df z.ratio p.value
#> male - female -0.08482945 0.3550390 Inf -0.239 0.8112
For additional examples in testing interactions with the emmeans package check the vignette: vignette("interactions", package = "emmeans")
.