Linear Mixed Models — Comprehensive Reference Cheatsheet
West, Welch & Gałecki · 3rd Edition · With contributions from Gillespie
§1 · Core Concepts & Motivation
Why LMM?
- Data are clustered / nested / grouped — violating OLS independence assumption
- Handles unbalanced designs (unequal group sizes, missing data) gracefully
- Accounts for within-group correlation explicitly
- Allows both subject-specific and population-average inference
- Handles repeated measures and longitudinal data correctly
- Flexible modeling of heterogeneous variance
General LMM Equation
Matrix form (Laird & Ware, 1982)
y = Xβ + Zb + ε
| Symbol | Meaning |
|---|---|
| y | n×1 response vector |
| X | n×p design matrix (fixed effects) |
| β | p×1 fixed-effect parameter vector |
| Z | n×q design matrix (random effects) |
| b | q×1 random-effect vector, b ~ N(0, G) |
| ε | n×1 residual vector, ε ~ N(0, R) |
Marginal distribution
y ~ N(Xβ, ZGZᵀ + R)
≡ N(Xβ, V)
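As a quick numerical check, the marginal covariance V = ZGZᵀ + R can be assembled directly for a toy random-intercept design (the numbers below are illustrative, not from the text):

```python
import numpy as np

# Toy random-intercept design: 2 clusters x 3 observations each.
# Z maps each observation to its cluster's random intercept.
Z = np.kron(np.eye(2), np.ones((3, 1)))   # 6x2 indicator matrix
G = 1.5 * np.eye(2)                        # Var(b): sigma2_u = 1.5
R = 0.5 * np.eye(6)                        # Var(eps): sigma2_e = 0.5

V = Z @ G @ Z.T + R                        # marginal covariance of y

print(V[0, 0])   # 2.0  = sigma2_u + sigma2_e (total variance)
print(V[0, 1])   # 1.5  = sigma2_u (same-cluster covariance)
print(V[0, 3])   # 0.0  (different clusters are independent)
```

The same-cluster covariance σ²_u over the total σ²_u + σ²_e is exactly the ICC of §4 (here 1.5/2.0 = 0.75).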
Key Assumptions
- Normality: b ~ N(0, G) and ε ~ N(0, R)
- Independence: b ⊥ ε (random effects independent of residuals)
- Linearity: E[y|X, Z] = Xβ + Zb
- Homoscedasticity: constant residual variance, unless modeled otherwise via the R matrix
- Correct covariance structure for R and G
- No measurement error in X and Z
Violations of normality are less critical for fixed effects (CLT applies) but matter more for BLUP accuracy
Estimation Methods
| Method | Full Name | Use When |
|---|---|---|
ML | Maximum Likelihood | Comparing models with different fixed effects |
REML | Restricted ML | Comparing models with same fixed effects; default for variance estimation |
BLUP | Best Linear Unbiased Predictor | Predicting random effects (E[b|y]) |
BLUE | Best Linear Unbiased Estimator | Fixed effects: β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y |
REML cannot be used to compare models differing in fixed effects — use ML for that
Marginal vs. Conditional
Marginal (Population-Average) Model
Describes the average response in the population; random effects are integrated out. Focus on Xβ. Useful for policy, group comparisons.
Conditional (Subject-Specific) Model
Describes the response for a specific subject/cluster given their random effects b. Focus on Xβ + Zb. Useful for individual predictions.
LMMs are naturally both — β gives population-level inference, BLUPs give individual predictions
Terminology Map
| Context | = LMM Term |
|---|---|
| HLM Level-1 residuals | ε (within-cluster) |
| HLM Level-2 residuals | b (between-cluster) |
| Random intercept model | Z = 1, G = σ²_b (scalar) |
| Growth curve model | Z includes time; random slope |
| Repeated measures ANOVA | Special case of LMM with compound symmetry |
| Split-plot design | 2-level LMM with whole-plot & subplot error |
§2 · Fixed vs. Random Effects
Fixed Effects (β)
- Parameters that are constant across the population
- Levels are exhaustive — we care about those specific levels
- Inference is to those specific levels only
- Estimated by GLS: β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y
- Examples: treatment group, time point, age, sex, dose
- Can include interactions between fixed predictors
Var(β̂) = (XᵀV⁻¹X)⁻¹
[estimated by plugging in V̂]
Random Effects (b)
- Factor levels are a random sample from a larger population
- Inference extends to the entire population of levels
- Not "estimated" — predicted via BLUPs (shrinkage estimators)
- Characterized by their variance components (G matrix)
- Examples: subjects, schools, clinics, batches, farms
- Random effects shrink toward 0 — borrow strength
BLUP formula
b̂ = GZᵀV⁻¹(y - Xβ̂)
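The GLS and BLUP formulas can be verified numerically. A minimal sketch with made-up variance components (treated as known) and a simulated response, checking that the BLUP equals the familiar shrunken cluster-mean deviation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4 clusters x 5 obs, intercept-only fixed part, variance
# components (sigma2_u = 2, sigma2_e = 1) treated as known for the sketch.
J, n_j = 4, 5
n = J * n_j
X = np.ones((n, 1))
Z = np.kron(np.eye(J), np.ones((n_j, 1)))
G = 2.0 * np.eye(J)
R = 1.0 * np.eye(n)
V = Z @ G @ Z.T + R

y = 10.0 + Z @ rng.normal(0, np.sqrt(2.0), J) + rng.normal(0, 1.0, n)

Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # GLS / BLUE
b_hat = G @ Z.T @ Vinv @ (y - X @ beta_hat)                  # BLUP

# In the balanced random-intercept case the BLUP is the cluster-mean
# deviation shrunk by lambda = sigma2_u / (sigma2_u + sigma2_e / n_j).
lam = 2.0 / (2.0 + 1.0 / n_j)
dev = np.array([y[Z[:, j] == 1].mean() for j in range(J)]) - beta_hat[0]
print(np.allclose(b_hat, lam * dev))   # True
```

With σ²_u = 2, σ²_e = 1, n_j = 5 the shrinkage factor is 2/(2 + 0.2) ≈ 0.91 — this is the "borrow strength" behavior described above.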
Decision Guide: Fixed or Random?
| Question | Fixed | Random |
|---|---|---|
| Levels exhaustive? | Yes ✓ | No (sample) |
| Inference to other levels? | No | Yes ✓ |
| Number of levels | Any | Prefer ≥ 5–6 |
| Interest in variance? | No | Yes ✓ |
| Predict individual effects? | No | Yes ✓ |
| Treatment conditions | Yes ✓ | Rare |
| Blocking factor | Sometimes | Preferred ✓ |
G Matrix — Random Effect Covariance
Random intercept only
G = σ²_u · I
Random intercept + slope (unstructured)
G = | σ²_u₀ σ_u₀u₁ |
| σ_u₀u₁ σ²_u₁ |
Multiple random effects (general)
G = Cov(b) [q × q positive definite matrix]
σ²_u₀: variance of random intercepts
σ²_u₁: variance of random slopes
σ_u₀u₁: covariance between intercepts and slopes
§3 · Covariance Structures (R matrix)
Independence (ID)
R = σ²·I
- Default; residuals uncorrelated
- Valid for simple random intercept models
- 1 parameter: σ²
Compound Symmetry (CS)
Var(y_ij) = σ² + σ²_b
Cov(y_ij, y_ik) = σ²_b [j ≠ k]
- All pairs equally correlated: ρ = σ²_b/(σ²_b + σ²)
- Implied by pure random intercept model
- Equivalent to repeated measures ANOVA
- 2 parameters: σ², σ²_b
Unstructured (UN)
R = Σ [free n×n pos.def. matrix]
- Most flexible; separate variance + all covariances
- Parameters: n(n+1)/2 — grows rapidly
- Risk of non-convergence with small samples
- Use when no theoretical structure assumed
AR(1) — Autoregressive
Corr(y_ij, y_ik) = ρ^|j-k|
Cov = σ² · ρ^|t_j - t_k|
- Correlation decays with time lag
- Appropriate for equally-spaced longitudinal data
- 2 parameters: σ², ρ (|ρ| < 1)
- Parsimonious; popular in practice
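A sketch of building the AR(1)/power correlation matrix from a vector of (possibly unequally spaced) times — `ar1_corr` is an illustrative helper, not a library function:

```python
import numpy as np

def ar1_corr(times, rho):
    """AR(1)-type correlation: rho**|t_j - t_k| (power form, so it also
    handles unequally spaced measurement times)."""
    t = np.asarray(times, dtype=float)
    return rho ** np.abs(t[:, None] - t[None, :])

C = ar1_corr([0, 1, 2, 3], 0.6)
print(C[0, 1])   # 0.6    (lag 1)
print(C[0, 3])   # 0.216  (lag 3: 0.6**3 — correlation decays with lag)
```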
Toeplitz (TOEP)
Corr(y_ij, y_ik) = ρ_|j-k|
[separate ρ per lag]
- Correlation depends on lag only
- More flexible than AR(1)
- Parameters: σ² + (n−1) correlation params
- Between CS and UN in complexity
Heterogeneous Structures
CSH: Var(y_ij) = σ²_j [heterogeneous CS]
ARH(1): separate σ²_j + AR(1) correlation
- Allow variance to differ by time point / group
- Suffix H = heterogeneous version
- More parameters but handle unequal variances
Covariance Structure Selection Strategy
AIC — Akaike Information Criterion
AIC = -2ℓ + 2p
BIC — Bayesian (Schwarz) Information Criterion
BIC = -2ℓ + p·ln(n)
- Fit candidate structures with same fixed effects using REML
- Compare AIC/BIC — smaller is better
- Use LRT only for nested covariance structures
- For non-nested structures: use AIC/BIC only
- Prefer parsimony when AIC differences are small (<2–4)
| Structure | Params | Nested in UN? |
|---|---|---|
| ID | 1 | Yes |
| CS | 2 | Yes |
| AR(1) | 2 | Yes |
| TOEP | n | Yes |
| UN | n(n+1)/2 | — |
| CSH | n+1 | Yes |
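The AIC/BIC arithmetic behind the selection strategy is worth seeing once; the log-likelihoods below are invented for illustration (CS with 2 covariance parameters vs. UN with 10, n = 200):

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2*loglik + 2*k."""
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    """Schwarz criterion: -2*loglik + k*ln(n)."""
    return -2 * loglik + k * math.log(n)

# Hypothetical REML log-likelihoods for two covariance structures
print(aic(-512.3, 2), round(bic(-512.3, 2, 200), 1))    # CS: 1028.6 1035.2
print(aic(-508.9, 10), round(bic(-508.9, 10, 200), 1))  # UN: 1037.8 1070.8
```

Here UN fits slightly better in raw likelihood but loses on both criteria once its extra parameters are penalized — the parsimony principle from the bullets above.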
§4 · Intraclass Correlation Coefficient (ICC)
ICC Definition & Interpretation
2-level random intercept model
ICC = ρ = σ²_u / (σ²_u + σ²_ε)
- Proportion of total variance due to between-cluster differences
- = correlation between two observations in the same cluster
- Range: 0 ≤ ICC ≤ 1
- ICC = 0 → no clustering; OLS is fine
- ICC = 1 → all variance between clusters
Typical ICC values: education 0.05–0.30; clinical 0.01–0.05; psychology 0.10–0.50
ICC for 3-Level Models
3-level: students in classes in schools
ICC₂ = σ²_u / (σ²_v + σ²_u + σ²_ε)
ICC₃ = σ²_v / (σ²_v + σ²_u + σ²_ε)
| ICC | Numerator | Interpretation |
|---|---|---|
| ICC₂ | σ²_u | % variance at level 2 (class) |
| ICC₃ | σ²_v | % variance at level 3 (school) |
Correlation between level-1 units sharing only the level-3 cluster
ρ₃ = σ²_v / (σ²_v + σ²_u + σ²_ε) [same school, different class]
Same level-2 and level-3 cluster
ρ₂₃ = (σ²_v + σ²_u) / (σ²_v + σ²_u + σ²_ε) [same class, same school]
Design Effect (Deff)
Effective sample size reduction
Deff = 1 + (n_j - 1) · ICC
Effective N
n_eff = N / Deff
- n_j = average cluster size
- Deff > 1 → clustered sample less efficient than simple random
- Ignoring ICC inflates Type I error
- Used for sample size planning
Even ICC = 0.05 with cluster size 20 → Deff = 1.95 (halves effective N)
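The design-effect formulas translate directly into code (helper names are ours); this reproduces the 0.05/20 example above:

```python
def design_effect(icc, cluster_size):
    """Deff = 1 + (n_j - 1) * ICC for average cluster size n_j."""
    return 1 + (cluster_size - 1) * icc

def effective_n(total_n, icc, cluster_size):
    """Effective sample size after accounting for clustering."""
    return total_n / design_effect(icc, cluster_size)

print(design_effect(0.05, 20))              # 1.95
print(round(effective_n(1000, 0.05, 20)))   # 513 — roughly half of N = 1000
```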
§5 · Two-Level Models (Hierarchical / Nested)
Model Taxonomy — 2 Levels
MODEL 1 — Unconditional Means (Null)
y_ij = γ_00 + u_0j + e_ij
No predictors. Partitions variance. Compute ICC.
↓
MODEL 2 — Random Intercept, Level-1 predictor
y_ij = γ_00 + γ_10·X_ij + u_0j + e_ij
Fixed slope for X; random intercept across clusters
↓
MODEL 3 — Random Intercept + Level-2 predictor
y_ij = γ_00 + γ_01·W_j + γ_10·X_ij + u_0j + e_ij
W_j explains variance in intercepts across clusters
↓
MODEL 4 — Random Slopes
y_ij = γ_00 + γ_10·X_ij + u_0j + u_1j·X_ij + e_ij
Slope of X varies across clusters; test cov(u_0j, u_1j)
↓
MODEL 5 — Cross-Level Interaction
y_ij = γ_00 + γ_01·W_j + γ_10·X_ij + γ_11·W_j·X_ij + u_0j + u_1j·X_ij + e_ij
W_j moderates effect of X on y (slope-as-outcome)
HLM Notation — 2 Levels
Level 1 (within cluster — observations)
y_ij = β_0j + β_1j·X_ij + e_ij
e_ij ~ N(0, σ²)
Level 2 (between clusters)
β_0j = γ_00 + γ_01·W_j + u_0j
β_1j = γ_10 + γ_11·W_j + u_1j
⎡u_0j⎤ ~ N(0, G) = N(0, ⎡τ_00 τ_01⎤)
⎣u_1j⎦ ⎣τ_01 τ_11⎦
Combined (reduced-form) equation
y_ij = γ_00 + γ_01·W_j + γ_10·X_ij
+ γ_11·W_j·X_ij
+ u_0j + u_1j·X_ij + e_ij
HLM notation and LMM matrix notation are equivalent — same model, different presentation
KEY VARIANCE COMPONENTS
σ² = level-1 residual variance (within clusters)
τ_00 = variance of random intercepts (between clusters)
τ_11 = variance of random slopes
τ_01 = covariance of intercepts and slopes
Centering Strategies
Grand Mean Centering (GMC)
X*_ij = X_ij - X̄..
- Interpret intercept as group mean at avg predictor
- Fixed-effect estimate = total effect (within + between)
- Use for: macro-level questions, cross-level interactions
- Random intercept now = deviation from grand mean
Group Mean Centering (CWC)
X*_ij = X_ij - X̄_j
- Removes between-group variation from level-1 predictor
- Within-group effect only
- Add X̄_j (the group mean of X) as a level-2 predictor to recover the between effect
- Separates within from between effects cleanly
- Preferred for individual-level questions
No Centering
X*_ij = X_ij
- Intercept = predicted y when X = 0 (meaningful only if 0 in range)
- Estimate = total effect conflated
- OK if 0 is natural/meaningful (e.g., time = 0 at baseline)
- Avoid for cross-level interactions — creates multicollinearity
Centering choice affects interpretation of intercept variance (τ_00)!
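A small numeric illustration of the two centering schemes (made-up data, two clusters of three observations):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])   # level-1 predictor
g = np.array([0, 0, 0, 1, 1, 1])                   # cluster id

grand = x.mean()                                    # 6.5
group_means = np.array([x[g == j].mean() for j in (0, 1)])[g]

x_gmc = x - grand          # grand-mean centered (GMC)
x_cwc = x - group_means    # centered within cluster (CWC)

# CWC removes ALL between-cluster variation: each cluster mean becomes 0
print(np.allclose([x_cwc[g == j].mean() for j in (0, 1)], 0.0))   # True
# GMC = CWC + (group mean deviation): the between part GMC keeps
print(np.allclose(x_gmc, x_cwc + (group_means - grand)))          # True
```

The second identity is why adding X̄_j at level 2 alongside a CWC predictor cleanly separates within- from between-cluster effects.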
§6 · Three-Level Models
3-Level HLM Notation
Level 1 — within clusters (e.g., students)
y_ijk = π_0jk + π_1jk·a_ijk + e_ijk
e_ijk ~ N(0, σ²)
Level 2 — classroom (j in school k)
π_0jk = β_00k + β_01k·X_jk + r_0jk
π_1jk = β_10k + β_11k·X_jk + r_1jk
⎡r_0jk⎤ ~ N(0, T_π)
⎣r_1jk⎦
Level 3 — school (k)
β_00k = γ_000 + γ_001·W_k + u_00k
β_10k = γ_100 + γ_101·W_k + u_10k
⎡u_00k⎤ ~ N(0, T_β)
⎣u_10k⎦
3-Level Variance Components
| Component | Symbol | Represents |
|---|---|---|
| Level-1 variance | σ² | Within-classroom variation |
| Level-2 intercept var | τ²_π00 | Between-classroom, within-school |
| Level-2 slope var | τ²_π11 | Slope variance at classroom level |
| Level-3 intercept var | τ²_β00 | Between-school variation in means |
| Level-3 slope var | τ²_β10 | Slope variance at school level |
ICC at level 2 (classroom)
ICC₂ = τ²_π00 / (τ²_π00 + τ²_β00 + σ²)
ICC at level 3 (school)
ICC₃ = τ²_β00 / (τ²_π00 + τ²_β00 + σ²)
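Plugging illustrative variance components into the two formulas above (function and argument names are ours):

```python
def iccs_3level(tau2_school, tau2_class, sigma2):
    """ICC2 (classroom share) and ICC3 (school share) of total variance."""
    tot = tau2_school + tau2_class + sigma2
    return tau2_class / tot, tau2_school / tot

# Hypothetical components: school 2.0, classroom 3.0, student 5.0
icc2, icc3 = iccs_3level(2.0, 3.0, 5.0)
print(icc2, icc3)   # 0.3 0.2
```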
Cross-Level Interactions — 3 Levels
- L1 × L2: Level-1 predictor moderated by level-2 variable (most common)
- L1 × L3: Level-1 predictor moderated by level-3 variable
- L2 × L3: Level-2 predictor moderated by level-3 variable
- L1 × L2 × L3: Three-way cross-level interaction (rare)
Example: L1 × L3 cross-level interaction
y_ijk = γ_000 + γ_100·a_ijk + γ_001·W_k
+ γ_101·W_k·a_ijk + ...
Always include lower-order terms when modeling cross-level interactions
3-Level vs. 2-Level: When to Use
| Condition | Action |
|---|---|
| Data has 3 natural levels | Use 3-level model |
| ICC₃ ≈ 0 (negligible school variance) | Collapse to 2-level (ignore school) |
| Very few level-3 units (<5) | Treat level 3 as fixed effect |
| Predictors at all 3 levels | 3-level needed |
| Cross-level interactions involving L3 | 3-level needed |
| Research Q about L3 variance | 3-level needed |
§7 · Hypothesis Testing
Tests for Fixed Effects
t-test (single parameter)
t = β̂_p / SE(β̂_p) ~ t_df [Satterthwaite or KR df]
H₀: β_p = 0
F-test (joint test of multiple parameters)
F = (Lβ̂)ᵀ [L(XᵀV⁻¹X)⁻¹Lᵀ]⁻¹ (Lβ̂) / r
~ F(r, df) [r = rank of L]
H₀: Lβ = 0
Likelihood Ratio Test (nested models, ML)
LRT = -2(ℓ_reduced - ℓ_full) ~ χ²_df
df = difference in # fixed effect parameters
Use Kenward-Roger (KR) or Satterthwaite df approximation for small samples — avoid asymptotic df
Tests for Random Effects / Variance Components
LRT for variance components (REML)
LRT = -2(ℓ_REML_reduced - ℓ_REML_full)
H₀: σ²_u = 0 (e.g., no random intercept needed)
- Null hypothesis on the boundary of parameter space (σ² ≥ 0)
- Standard χ² is too conservative — p-value should be halved
- For single variance: use mixture χ² = ½χ²₀ + ½χ²₁
- For multiple df: corrections more complex (Self & Liang)
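The boundary correction for a single variance component is a one-liner. A stdlib-only sketch (it uses the identity that the χ²₁ survival function equals erfc(√(x/2)), avoiding a scipy dependency):

```python
import math

def p_boundary_chi2_1(lrt):
    """p-value from the 0.5*chi2(0) + 0.5*chi2(1) mixture for H0: sigma2_u = 0.
    For lrt > 0 the point mass at zero contributes nothing, so this is
    simply half the standard chi2(1) p-value."""
    if lrt <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(lrt / 2))

naive_p = math.erfc(math.sqrt(3.84 / 2))                      # unhalved chi2(1) p
print(round(naive_p, 3), round(p_boundary_chi2_1(3.84), 3))   # 0.05 0.025
```

At the usual 3.84 critical value, the naive test reports p ≈ 0.05 while the correct boundary p-value is ≈ 0.025 — halving the p-value doubles the power of the test.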
Wald test for variance (less preferred)
Z = σ̂²_u / SE(σ̂²_u) ~ N(0,1) [approximate]
Always use REML (not ML) when comparing models that differ only in random effects structure
Degrees of Freedom Approximations
| Method | When to Use | Notes |
|---|---|---|
| Satterthwaite | Default in R lmerTest; DDFM=SAT in SAS PROC MIXED | Fast; adequate for most cases |
| Kenward-Roger | Small samples, complex designs | More accurate; adjusts SE too |
| Between-Within | Simple balanced designs | Splits df into between/within subject |
| Containment | Nested designs | Classic ANOVA-like approach |
| Residual | Large samples only | Uses n-p df; OK when N huge |
Type I / II / III SS Tests
| Type | Tests | Use When |
|---|---|---|
| Type I | Sequential — order matters | A priori hierarchical models |
| Type II | Each effect adjusted for all others at same or lower level | No interaction present |
| Type III | Each effect adjusted for ALL others including interactions | Default; unbalanced data with interactions |
Type III tests require sum-to-zero (effects) coding of factors. Always verify the contrast coding scheme.
Kenward-Roger Adjustment
- Adjusts both the F-statistic and the denominator df
- Accounts for uncertainty in estimating variance components
- Uses a scaled Wald statistic with inflated standard errors
- Preferred when: small number of clusters, complex covariance, REML
- Available in SAS PROC MIXED (DDFM=KR); in R via the pbkrtest and lmerTest packages
KR adjusted test statistic
F_KR = (Lβ̂)ᵀ Φ̂⁻¹ (Lβ̂) / r ~ F(r, ν_KR)
[Φ̂ = KR-adjusted covariance of Lβ̂]
LRT: ML vs. REML Rules
| Comparison | Method | Justification |
|---|---|---|
| Fixed effects differ | ML | REML ℓ not comparable across fixed effects |
| Random effects differ, same fixed | REML | REML accounts for fixed-effect uncertainty |
| Nested covariance structures | REML | Same fixed effects — valid comparison |
| AIC/BIC comparison | REML (same fixed) | Comparable on same scale |
§8 · Contrasts & Estimable Functions
Types of Contrasts
| Type | Definition | Example |
|---|---|---|
| Pairwise | Compare two means: μ_i - μ_j | Treatment A vs. B |
| Helmert | Each level vs. mean of subsequent | Time 1 vs. avg(T2,T3) |
| Polynomial | Linear, quadratic, cubic trends | Dose-response curve |
| Deviation | Each level vs. grand mean | Factor effect coding |
| Simple | Each vs. reference category | Dummy coding |
| User-defined | Custom L matrix | Any specific hypothesis |
Estimable Functions & L Matrix
Contrast: scalar hypothesis
Lβ = c where L is a 1×p row vector
H₀: Lβ = 0
Estimate: Lβ̂ = c̃
SE = √[L(XᵀV⁻¹X)⁻¹Lᵀ]
Multiple contrasts simultaneously
H₀: Lβ = 0 (L is r×p, rank r)
F = (Lβ̂)ᵀ[L·Var(β̂)·Lᵀ]⁻¹(Lβ̂) / r
A function Lβ is estimable if L lies in the row space of X — equivalently, L(XᵀX)⁻(XᵀX) = L
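The contrast algebra is easy to sandbox with numpy; β̂ and its covariance below are invented (three groups under reference coding: intercept, grpB, grpC):

```python
import numpy as np

# Hypothetical fitted quantities from an LMM
beta_hat = np.array([10.0, 2.0, 3.5])
vcov = np.array([[ 0.25, -0.10, -0.10],
                 [-0.10,  0.40,  0.05],
                 [-0.10,  0.05,  0.45]])

L = np.array([0.0, 1.0, -1.0])    # H0: grpB effect = grpC effect

est = L @ beta_hat                 # contrast estimate L*beta_hat
se = np.sqrt(L @ vcov @ L)         # SE = sqrt(L Var(beta_hat) L')

print(est, round(se, 4))           # -1.5 0.866
```

Note the SE uses the full covariance: Var(Lβ̂) = v₂₂ + v₃₃ − 2·v₂₃ = 0.40 + 0.45 − 0.10 = 0.75, not just the sum of the two variances.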
Planned vs. Post-hoc Comparisons
Planned (a priori) Contrasts
Specified before seeing data. No need to adjust for multiple comparisons if contrasts are orthogonal and pre-specified. Use t or F tests directly.
Post-hoc (exploratory) Comparisons
Adjust for multiple testing. Choose from: Bonferroni, Tukey HSD (all pairwise), Scheffé (any contrast), Holm (step-down), Benjamini-Hochberg (FDR).
Multiple Comparison Adjustments
| Method | Controls | Best For |
|---|---|---|
| Bonferroni | FWER | Few planned comparisons; conservative |
| Holm | FWER | More powerful than Bonferroni |
| Tukey HSD | FWER (all pairs) | All pairwise means; balanced |
| Scheffé | FWER (any contrast) | Exploratory; very conservative |
| BH (FDR) | FDR | Many comparisons; genomics, etc. |
| Simulation | FWER (exact) | Complex designs; SAS SIMULATE |
Least Squares Means (LSMeans) in LMMs
- LSMeans = population marginal means — average over nuisance factors
- = Lβ̂ for specific L matrix averaging over covariates at their means
- Account for imbalance in the design (unlike simple cell means)
- Default in SAS PROC MIXED (LSMEANS statement)
- In R: use the emmeans package
LSMean for group i
μ̂_i = L_i β̂
where L_i averages over all other factors
Pairwise difference
μ̂_i - μ̂_j = (L_i - L_j)β̂
SE² = (L_i-L_j) Var(β̂) (L_i-L_j)ᵀ
LSMeans are preferred over raw means when the design is unbalanced or when covariates are present
Adjust for multiple LSMean comparisons
LSMEANS trt / ADJUST=TUKEY;
LSMEANS trt / ADJUST=BON;
LSMEANS trt / PDIFF;
§9 · Hierarchical Model Building Process
2-Level Model Building Steps
- Unconditional Means Model (Null) — No predictors. Fit with REML. Compute ICC = σ²_u / (σ²_u + σ²_e). A non-negligible ICC justifies the multilevel approach.
- Unconditional Growth / Level-1 Predictors — Add level-1 predictors (e.g., time) with fixed effects. Keep random intercepts. REML. Compare AIC/BIC to Step 1.
- Random Slopes — Allow slopes of key level-1 predictors to vary randomly. Test variance τ_11 using LRT (REML). Keep if significant or theoretically motivated.
- Level-2 Predictors — Add group-level (level-2) predictors to explain intercept (and slope) variance. Switch to ML for LRT comparing fixed effects.
- Cross-Level Interactions — Test whether level-2 variables moderate level-1 effects (slope-as-outcome). Use ML for model comparison.
- Final Model — Refit with REML for final variance component estimates. Report fixed effects, variance components, ICC, fit indices.
3-Level Model Building Steps
- Unconditional Means (3-level null) — Fit with three random intercepts (level 1, 2, 3). Compute ICC₂ and ICC₃. REML.
- Level-1 Predictors — Add individual-level (L1) predictors. Test fixed slopes. Consider centering. REML.
- Random Slopes at Level 2 — Allow L1 slopes to vary across L2 units. Test τ²_π11 with LRT (REML). Half the χ² p-value.
- Random Slopes at Level 3 — Allow L1 slopes to vary across L3 units. Test τ²_β10 with LRT (REML).
- Level-2 Predictors — Add classroom-level predictors (L2) to explain L2 variance in intercepts and slopes. ML for LRT.
- Level-3 Predictors — Add school-level predictors (L3). ML for comparison.
- Cross-Level Interactions — L1×L2, L1×L3, L2×L3. ML for comparison.
- Final Refit with REML — Final estimates and inference. Report all variance components at each level.
Model Comparison Toolkit
| Tool | Formula | Use |
|---|---|---|
| AIC | -2ℓ + 2k | Non-nested; smaller = better |
| BIC | -2ℓ + k·ln(n) | Penalizes params more; smaller = better |
| LRT | -2Δℓ ~ χ²_Δk | Nested models only |
| Pseudo-R² | 1 - σ²_model/σ²_null | Variance explained at each level |
| ΔAIC < 2 | — | Models essentially equivalent |
| ΔAIC 2–7 | — | Some support for better model |
| ΔAIC > 10 | — | Clear preference for better model |
Pseudo-R² at Each Level
Snijders & Bosker (1994) — Level-1 R²
R²₁ = 1 - (σ²_new + τ²_new) / (σ²_null + τ²_null)
Level-2 R² (variance in intercepts explained)
R²₂ = 1 - τ²_new / τ²_null
Proportion of variance explained (each level)
PVE_L1 = (σ²_null - σ²_model) / σ²_null
PVE_L2 = (τ²_null - τ²_model) / τ²_null
Pseudo-R² can decrease when adding predictors in LMMs. Use cautiously and do not interpret as OLS R².
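In code (values invented; as the caveat above notes, the result can legitimately come out negative when adding a predictor shifts variance between levels):

```python
def pve(var_null, var_model):
    """Proportional reduction in a variance component vs. the null model."""
    return (var_null - var_model) / var_null

print(pve(4.0, 3.0))    # 0.25: 25% of that level's variance explained
print(pve(2.0, 2.3))    # negative: the component *grew* after adding predictors
```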
§10 · Longitudinal Data Analysis
Longitudinal LMM Framework
Individual growth curve model
y_ij = (β_0 + u_0i) + (β_1 + u_1i)·t_ij
+ β_2·X_i + β_3·X_i·t_ij + ε_ij
| Term | Meaning |
|---|---|
| β_0 + u_0i | Individual baseline (random intercept) |
| β_1 + u_1i | Individual growth rate (random slope) |
| β_2 | Between-person covariate effect at baseline |
| β_3 | Covariate × time interaction (moderates growth) |
| ε_ij | Within-person residual (serial correlation) |
Center time at baseline (t=0) so the intercept = baseline mean, and random intercept variance = between-person baseline variability
Time Specification Strategies
| Time Model | Code | When |
|---|---|---|
| Linear | time | Constant rate of change |
| Quadratic | time + time² | Accelerating/decelerating change |
| Cubic | time + time² + time³ | More complex nonlinearity |
| Piecewise | Splines / dummy slopes | Change point exists |
| Categorical | Factor(time) | Unstructured change pattern |
| Log time | log(time) | Rapid early change, plateau |
Piecewise linear (change at time t*)
y_ij = β_0 + β_1·t_ij + β_2·(t_ij - t*)_+ + u_0i + u_1i·t_ij + ε_ij
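The (t − t*)₊ hinge term is just a clipped column in the design matrix; a sketch (`piecewise_basis` is an illustrative helper):

```python
import numpy as np

def piecewise_basis(t, knot):
    """Fixed-effect design columns for y = b0 + b1*t + b2*(t - knot)_+ .
    The slope is b1 before the knot and b1 + b2 after it."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t), t, np.maximum(t - knot, 0.0)])

B = piecewise_basis([0, 1, 2, 3, 4], knot=2)
print(B[1].tolist())   # [1.0, 1.0, 0.0]  before the change point: hinge is 0
print(B[4].tolist())   # [1.0, 4.0, 2.0]  after: extra slope term is active
```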
Choosing R Matrix for Longitudinal Data
Start with
Random intercept + slope (models within-person correlation implicitly)
↓ if residuals still correlated
Add serial correlation to R
AR(1) for equally-spaced time; SP(POW) for unequal spacing
↓ if variances change over time
Allow heterogeneous variance
ARH(1) or UN structure; or model variance as function of time
↓ fully unstructured
Unstructured (UN)
Separate variance at each time + all covariances; most general
Handling Missing Data in Longitudinal LMMs
| Missing Type | Definition | LMM Valid? |
|---|---|---|
| MCAR | Missing completely at random | Yes ✓ |
| MAR | Missing at random (given observed data) | Yes ✓ (REML) |
| MNAR | Missing not at random (informative) | Biased — needs selection/pattern mixture model |
- LMMs use all available data — no listwise deletion needed
- Handles unbalanced time points naturally
- Each person can have different observation times
- Valid under MAR; check dropout patterns
Test MAR assumption: Does time of dropout predict outcomes? Use pattern mixture models if MNAR suspected.
Time-Varying vs. Time-Invariant Covariates
| Type | Varies? | Level | Example |
|---|---|---|---|
| Time-invariant | No | Between-person (L2) | Sex, baseline age, group |
| Time-varying | Yes | Within-person (L1) | Current BMI, mood, medication |
| Time itself | Yes | L1 (in 2-level) | Week, day, visit number |
Time-varying covariate decomposition (Curran & Bauer)
X_ij = X̄_i + (X_ij - X̄_i)
= between-person + within-person component
Center time-varying covariates within-person to separate within from between effects
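The Curran & Bauer decomposition in code (toy data, two persons with three visits each; both components would then enter the model as separate predictors):

```python
import numpy as np

# Hypothetical time-varying covariate (e.g., mood score)
x = np.array([3.0, 4.0, 5.0, 8.0, 9.0, 10.0])
pid = np.array([0, 0, 0, 1, 1, 1])              # person id

person_mean = np.array([x[pid == i].mean() for i in (0, 1)])[pid]
within = x - person_mean        # level-1 (within-person) component
between = person_mean           # level-2 (between-person) component

print(np.allclose(x, between + within))   # True: X_ij = Xbar_i + (X_ij - Xbar_i)
print(within.tolist())                    # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```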
Autoregressive / Transition Models
Lagged response as predictor
y_ij = β_0 + β_1·y_i(j-1) + β_2·X_ij + u_0i + ε_ij
- Regress current outcome on previous value
- β_1 = autoregressive coefficient (stability)
- Accounts for serial dependence explicitly
- Useful for studying short-term change
- Biased if u_0i correlated with lagged y — use system GMM or within-person centering
Including lagged y as predictor and random intercept simultaneously can cause Nickell bias in short panels
§11 · Software Reference (SAS, R, Stata, SPSS)
SAS PROC MIXED — Key Syntax
Basic 2-level random intercept
PROC MIXED DATA=mydata METHOD=REML;
CLASS school id group time;
MODEL score = time group time*group / SOLUTION DDFM=KR;
RANDOM INTERCEPT / SUBJECT=school TYPE=VC;
REPEATED / SUBJECT=id(school) TYPE=AR(1);
LSMEANS group / PDIFF ADJUST=TUKEY;
ESTIMATE 'Trt vs Ctrl at T1' group 1 -1 time*group 1 0 -1 0;
RUN;
Random slope model
RANDOM INTERCEPT time / SUBJECT=id TYPE=UN;
[TYPE=UN allows covariance between intercept and slope]
3-level: students in classes in schools
RANDOM INTERCEPT / SUBJECT=school;
RANDOM INTERCEPT / SUBJECT=class(school);
R — lme4 & nlme Syntax
lme4 — random intercept
library(lme4); library(lmerTest)
m1 <- lmer(score ~ time*group + (1|school),
data=d, REML=TRUE)
summary(m1) # Satterthwaite df via lmerTest
lme4 — random intercept + slope
m2 <- lmer(score ~ time + (1 + time|id), data=d)
m2b <- lmer(score ~ time + (1|id) + (0+time|id), data=d)
# m2b = uncorrelated random effects
nlme — custom covariance structure
library(nlme)
m3 <- lme(score ~ time, random = ~1|school,
correlation = corAR1(form=~time|id),
data=d, method="REML")
emmeans for LSMeans / contrasts
library(emmeans)
emmeans(m1, ~ group | time)
pairs(emmeans(m1, ~ group), adjust="tukey")
Key R Functions Cheatsheet
| Function | Package | Purpose |
|---|---|---|
| lmer() | lme4 | Fit LMM (no p-values by default) |
| lme() | nlme | LMM with flexible covariance |
| summary() | base | Fixed effects, variance components |
| anova(m1, m2) | base | LRT between nested models |
| AIC(), BIC() | base | Information criteria |
| emmeans() | emmeans | LSMeans, contrasts |
| icc() | performance | ICC calculation |
| r2() | performance | Conditional/marginal R² |
| check_model() | performance | Assumption diagnostics |
| VarCorr() | lme4 | Extract variance components |
| ranef() | lme4 | Extract BLUPs |
| fixef() | lme4 | Extract fixed effects |
| simulate() | lme4 | Parametric bootstrap |
| PBmodcomp() | pbkrtest | Parametric bootstrap LRT |
| KRmodcomp() | pbkrtest | Kenward-Roger test |
Stata & SPSS Commands
Stata — mixed
mixed score c.time##i.group || school: || id: time, reml
estat icc
margins group, at(time=(0 1 2))
test 2.group = 3.group
SPSS — MIXED
MIXED score BY group WITH time
/FIXED = group time group*time | SSTYPE(3)
/RANDOM INTERCEPT | SUBJECT(school) COVTYPE(VC)
/REPEATED = time | SUBJECT(id) COVTYPE(AR1)
/EMMEANS = TABLES(group) COMPARE ADJ(BONFERRONI).
Python — statsmodels
import statsmodels.formula.api as smf
model = smf.mixedlm("score ~ time*group",
data=df, groups=df["school"],
re_formula="~time")
result = model.fit(reml=True)
§12 · Assumptions, Diagnostics & Remedies
Normality of Residuals (ε)
- QQ-plot of marginal residuals (y - Xβ̂)
- QQ-plot of conditional residuals (y - Xβ̂ - Zb̂)
- Shapiro-Wilk test (small n only; over-powered for large n)
- Histogram of residuals
Remedy
Transform y (log, sqrt, Box-Cox)
Or use robust SE estimation
Or use nonparametric bootstrap
Normality of Random Effects (b)
- QQ-plot of BLUPs (empirical Bayes estimates)
- Check for outliers in BLUP distribution
- Difficult to assess with few clusters (<30)
- Fixed effects estimates robust to this violation
R: QQ-plot of BLUPs
qqnorm(ranef(m)$school[,1])
qqline(ranef(m)$school[,1])
Homoscedasticity
- Plot residuals vs. fitted values
- Plot residuals vs. each predictor
- Levene's test within groups
- If violated: model heterogeneous R (CSH, ARH)
- Or use weighted LMM
- Or model variance as function of a covariate
Influential Observations
| Measure | Detects |
|---|---|
| Cook's D | Influential observations on fixed effects |
| DFFITS | Influence on fitted values |
| Leverage (h_ii) | High-leverage points |
| MDFFITS | Multivariate cluster-level influence |
| COVTRACE/COVRATIO | Influence on covariance of β̂ |
In LMMs, influence can be at the observation OR cluster level — check both
Linearity & Functional Form
- Plot residuals vs. each continuous predictor
- Added Variable Plots (partial regression plots)
- Component + Residual plots (CERES)
- Add polynomial terms if nonlinearity detected
- Consider splines for flexible modeling
- Use GAMMs if strong nonlinearity
Multicollinearity
- Variance Inflation Factor (VIF) for fixed effects
- VIF > 10 (or 5): potential problem
- Condition number of X'X: >30 concerning
- Remedies: centering, orthogonalization, ridge regression, PCA
- Cross-level interactions are prone to multicollinearity — center predictors
R: check VIF
car::vif(lm(y ~ X1 + X2 + ..., data=d))
# VIF for fixed effects part
§13 · Full Reporting Checklist & Quick Reference
Reporting Checklist for LMM
- State model equation clearly (HLM or matrix notation)
- Report all fixed effects: estimate, SE, t/F, df (method), p-value, CI
- Report all variance components: σ², τ_00, τ_11, τ_01 with SE or CI
- Report ICC (and ICC at each level for 3-level)
- State estimation method (ML vs REML) and software
- Report df approximation method (KR, Satterthwaite, etc.)
- Report sample sizes at all levels (N subjects, J clusters, K schools)
- Report covariance structure of R and G matrices with justification
- Report model selection criteria (AIC/BIC for final choice)
- Report assumption checks and any violations
- Report LSMeans or contrasts with multiplicity adjustment
- Report pseudo-R² at each level if possible
Common Pitfalls & Remedies
| Pitfall | Remedy |
|---|---|
| Using ML for variance component inference | Switch to REML |
| Using REML for fixed effects LRT | Switch to ML |
| Using standard χ² for testing σ²=0 | Halve the p-value (boundary test) |
| Ignoring clustering (OLS on clustered data) | Use LMM or cluster-robust SE |
| Too few clusters for random effects (<5) | Treat as fixed effect instead |
| Not centering in cross-level interactions | Center level-1 predictors |
| Convergence failure | Scale predictors; simpler G; different optimizer |
| Singular fit (boundary estimate) | Simplify random effects structure |
| Treating time as factor unnecessarily | Model as continuous if monotone trend |
| Ignoring serial correlation in longitudinal data | Model R with AR(1) or add R structure |
Effect Size Measures
| Measure | Formula | Interpretation |
|---|---|---|
| Marginal R² (R²_m) | Var(Xβ̂) / Var(Xβ̂ + Zb̂ + ε̂) | Variance explained by fixed effects |
| Conditional R² (R²_c) | Var(Xβ̂ + Zb̂) / total var | Variance explained by entire model |
| Cohen's d (LMM) | γ_fixed / √(σ²_total) | Standardized fixed effect |
| PVE at L1 | (σ²_null - σ²_model) / σ²_null | Within-cluster variance explained |
| PVE at L2 | (τ²_null - τ²_model) / τ²_null | Between-cluster variance explained |
Nakagawa & Schielzeth (2013) — in R (performance package, lme4 models)
performance::r2(model)
# returns R²_marginal and R²_conditional
Minimum Cluster Requirements
| Purpose | Min. Clusters (L2) | Notes |
|---|---|---|
| Estimate ICC reliably | ≥ 30 | More is always better |
| Random intercept only | ≥ 10–20 | Minimum for stable estimates |
| Random intercept + slope | ≥ 30–50 | More variance components to estimate |
| Cross-level interaction | ≥ 50 | Need adequate power |
| 3-level (L3 units) | ≥ 20 L3 units | With ≥ 3–5 L2 per L3 |
Rule of thumb: the 30/30 rule — at least 30 clusters with at least 30 obs each for stable estimates (Kreft; see Maas & Hox for simulation evidence)
Master Formula Reference
LMM general
y = Xβ + Zb + ε
b ~ N(0, G)
ε ~ N(0, R)
y ~ N(Xβ, V=ZGZᵀ+R)
GLS estimator
β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y
Var(β̂) = (XᵀV⁻¹X)⁻¹
BLUP
b̂ = GZᵀV⁻¹(y - Xβ̂)
ICC (2-level)
ρ = σ²_u/(σ²_u + σ²_ε)
Design effect
Deff = 1 + (n̄-1)·ρ
Information criteria
AIC = -2ℓ + 2k
BIC = -2ℓ + k·ln(n)
LRT
G² = -2Δℓ ~ χ²_Δk