Public Health Nepal

Linear Mixed Models — Comprehensive Reference Cheatsheet

West, Welch & Gałecki · 3rd Edition · With contributions from Gillespie
§1 · Core Concepts & Motivation

Why LMM?

  • Data are clustered / nested / grouped — violating OLS independence assumption
  • Handles unbalanced designs (unequal group sizes, missing data) gracefully
  • Accounts for within-group correlation explicitly
  • Allows both subject-specific and population-average inference
  • Handles repeated measures and longitudinal data correctly
  • Flexible modeling of heterogeneous variance

General LMM Equation

Matrix form (Laird & Ware, 1982):  y = Xβ + Zb + ε

  Symbol   Meaning
  y        n×1 response vector
  X        n×p design matrix (fixed effects)
  β        p×1 fixed-effect parameter vector
  Z        n×q design matrix (random effects)
  b        q×1 random-effect vector, b ~ N(0, G)
  ε        n×1 residual vector, ε ~ N(0, R)

Marginal distribution:  y ~ N(Xβ, ZGZᵀ + R)  ≡  N(Xβ, V)
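To make the marginal form concrete: in a random-intercept model, each cluster contributes a block of V with τ² + σ² on the diagonal and τ² off it. A minimal Python sketch (the helper name and example values are my own, not from the book):

```python
def marginal_cov_block(n_j, tau2, sigma2):
    """One cluster's block of V = Z G Z' + R for a random-intercept model:
    tau2 (intercept variance) in every cell, plus sigma2 on the diagonal."""
    return [[tau2 + (sigma2 if r == c else 0.0) for c in range(n_j)]
            for r in range(n_j)]

# Cluster of size 3 with tau2 = 0.5, sigma2 = 1.0:
block = marginal_cov_block(3, tau2=0.5, sigma2=1.0)
```

The block shows exactly the compound-symmetry pattern this marginal model implies: equal covariance τ² for any two observations in the same cluster.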

Key Assumptions

  • Normality: b ~ N(0, G) and ε ~ N(0, R)
  • Independence: b ⊥ ε (random effects independent of residuals)
  • Linearity: E[y|X, Z] = Xβ + Zb
  • Homoscedasticity: constant residual variance, unless modeled otherwise via R
  • Correct covariance structure for R and G
  • No measurement error in X and Z
Violations of normality are less critical for fixed effects (CLT applies) but matter more for BLUP accuracy

Estimation Methods

  Method   Full Name                        Use When
  ML       Maximum Likelihood               Comparing models with different fixed effects
  REML     Restricted ML                    Comparing models with same fixed effects; default for variance estimation
  BLUP     Best Linear Unbiased Predictor   Predicting random effects, E[b|y]
  BLUE     Best Linear Unbiased Estimator   Fixed effects: β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y
REML cannot be used to compare models differing in fixed effects — use ML for that

Marginal vs. Conditional

Marginal (Population-Average) Model
Describe average response in population; random effects integrated out. Focus on Xβ. Useful for policy, group comparisons.
Conditional (Subject-Specific) Model
Describe response for a specific subject/cluster given their random effects b. Focus on Xβ + Zb. Useful for individual predictions.
LMMs are naturally both — β gives population-level inference, BLUPs give individual predictions

Terminology Map

  Context                    LMM Term
  HLM Level-1 residuals      ε (within-cluster)
  HLM Level-2 residuals      b (between-cluster)
  Random intercept model     Z = 1, G = σ²_b (scalar)
  Growth curve model         Z includes time; random slope
  Repeated measures ANOVA    Special case of LMM with compound symmetry
  Split-plot design          2-level LMM with whole-plot & subplot error
§2 · Fixed vs. Random Effects

Fixed Effects (β)

  • Parameters that are constant across the population
  • Levels are exhaustive — we care about those specific levels
  • Inference is to those specific levels only
  • Estimated by GLS: β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y
  • Examples: treatment group, time point, age, sex, dose
  • Can include interactions between fixed predictors
Var(β̂) = (XᵀV⁻¹X)⁻¹  [estimated by plugging in V̂]

Random Effects (b)

  • Factor levels are a random sample from a larger population
  • Inference extends to the entire population of levels
  • Not "estimated" — predicted via BLUPs (shrinkage estimators)
  • Characterized by their variance components (G matrix)
  • Examples: subjects, schools, clinics, batches, farms
  • Random effects shrink toward 0 — borrow strength
BLUP formula b̂ = GZᵀV⁻¹(y - Xβ̂)
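In the balanced random-intercept case, the BLUP formula reduces to shrinking each cluster's observed mean deviation toward 0 by the factor τ² / (τ² + σ²/n_j). A small illustrative Python sketch (function name is my own):

```python
def blup_shrinkage(tau2, sigma2, n_j):
    """Shrinkage factor applied to (cluster mean - grand mean) in the BLUP
    for a balanced random-intercept model."""
    return tau2 / (tau2 + sigma2 / n_j)

# Larger clusters are shrunk less: the BLUP trusts their own mean more.
small = blup_shrinkage(tau2=1.0, sigma2=1.0, n_j=1)    # 0.5
large = blup_shrinkage(tau2=1.0, sigma2=1.0, n_j=100)  # close to 1
```

This is the "borrow strength" idea in one line: sparse clusters are pulled hard toward the population mean, data-rich clusters hardly at all.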

Decision Guide: Fixed or Random?

  Question                      Fixed       Random
  Levels exhaustive?            Yes ✓       No (sample)
  Inference to other levels?    No          Yes ✓
  Number of levels              Any         Prefer ≥ 5–6
  Interest in variance?         No          Yes ✓
  Predict individual effects?   No          Yes ✓
  Treatment conditions          Yes ✓       Rare
  Blocking factor               Sometimes   Preferred ✓

G Matrix — Random Effect Covariance

Random intercept only:  G = σ²_u · I
Random intercept + slope (unstructured):
  G = ⎡ σ²_u₀    σ_u₀u₁ ⎤
      ⎣ σ_u₀u₁   σ²_u₁  ⎦
Multiple random effects (general):  G = Cov(b)  [q × q positive definite matrix]
  • σ²_u₀: variance of random intercepts
  • σ²_u₁: variance of random slopes
  • σ_u₀u₁: covariance between intercepts and slopes
§3 · Covariance Structures (R matrix)

Independence (ID)

R = σ²·I
  • Default; residuals uncorrelated
  • Valid for simple random intercept models
  • 1 parameter: σ²

Compound Symmetry (CS)

Var(y_ij) = σ² + σ²_b
Cov(y_ij, y_ik) = σ²_b   [j ≠ k]
  • All pairs equally correlated: ρ = σ²_b/(σ²_b + σ²)
  • Implied by pure random intercept model
  • Equivalent to repeated measures ANOVA
  • 2 parameters: σ², σ²_b

Unstructured (UN)

R = Σ [free n×n pos.def. matrix]
  • Most flexible; separate variance + all covariances
  • Parameters: n(n+1)/2 — grows rapidly
  • Risk of non-convergence with small samples
  • Use when no theoretical structure assumed

AR(1) — Autoregressive

Corr(y_ij, y_ik) = ρ^|j-k|
Cov(y_ij, y_ik) = σ² · ρ^|t_j - t_k|
  • Correlation decays with time lag
  • Appropriate for equally-spaced longitudinal data
  • 2 parameters: σ², ρ  (|ρ| < 1)
  • Parsimonious; popular in practice
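A quick Python sketch (illustrative, not from the book) of the AR(1) correlation matrix for equally spaced occasions:

```python
def ar1_corr(n_times, rho):
    """AR(1) correlation matrix: Corr(y_j, y_k) = rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n_times)]
            for j in range(n_times)]

R = ar1_corr(4, rho=0.5)  # correlation halves with each additional lag
```

Note how only two parameters (σ², ρ) generate the full n×n structure, which is exactly why AR(1) is so parsimonious compared with Toeplitz or unstructured.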

Toeplitz (TOEP)

Corr(y_ij, y_ik) = ρ_|j-k| [separate ρ per lag]
  • Correlation depends on lag only
  • More flexible than AR(1)
  • Parameters: σ² + (n−1) correlation params
  • Between CS and UN in complexity

Heterogeneous Structures

CSH: Var(y_ij) = σ²_j with common correlation  [heterogeneous CS]
ARH(1): separate σ²_j combined with AR(1) correlation
  • Allow variance to differ by time point / group
  • Suffix H = heterogeneous version
  • More parameters but handle unequal variances

Covariance Structure Selection Strategy

AIC — Akaike Information Criterion AIC = -2ℓ + 2p
BIC — Bayesian (Schwarz) Information Criterion BIC = -2ℓ + p·ln(n)
  • Fit candidate structures with same fixed effects using REML
  • Compare AIC/BIC — smaller is better
  • Use LRT only for nested covariance structures
  • For non-nested structures: use AIC/BIC only
  • Prefer parsimony when AIC differences are small (<2–4)
  Structure   Params      Nested in UN?
  ID          1           Yes
  CS          2           Yes
  AR(1)       2           Yes
  TOEP        n           Yes
  UN          n(n+1)/2    —
  CSH         n+1         Yes
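The AIC/BIC formulas used in this comparison are simple enough to compute by hand. A hedged Python sketch comparing two hypothetical candidate structures (the log-likelihood values are made up for illustration):

```python
import math

def aic(loglik, k):
    """AIC = -2*loglik + 2*k (k = number of estimated parameters)."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """BIC = -2*loglik + k*ln(n)."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical REML log-likelihoods for two R structures (3 time points):
aic_cs = aic(-150.0, k=2)   # compound symmetry: sigma^2, sigma^2_b
aic_un = aic(-148.5, k=6)   # unstructured: 3*4/2 = 6 parameters
# Smaller is better; here CS wins despite UN's higher log-likelihood,
# because UN pays a larger parameter penalty.
```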
§4 · Intraclass Correlation Coefficient (ICC)

ICC Definition & Interpretation

2-level random intercept model ICC = ρ = σ²_u / (σ²_u + σ²_ε)
  • Proportion of total variance due to between-cluster differences
  • = correlation between two observations in the same cluster
  • Range: 0 ≤ ICC ≤ 1
  • ICC = 0 → no clustering; OLS is fine
  • ICC = 1 → all variance between clusters
Typical ICC values: education 0.05–0.30; clinical 0.01–0.05; psychology 0.10–0.50
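The 2-level ICC formula as a one-line Python helper (illustrative; the variance components below are made up):

```python
def icc(sigma2_u, sigma2_e):
    """2-level ICC: between-cluster variance over total variance."""
    return sigma2_u / (sigma2_u + sigma2_e)

rho = icc(sigma2_u=0.25, sigma2_e=0.75)  # 25% of variance between clusters
```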

ICC for 3-Level Models

3-level example (students in classes in schools):
  ICC₂ = σ²_v / (σ²_v + σ²_u + σ²_ε)
  ICC₃ = (σ²_v + σ²_u) / (σ²_v + σ²_u + σ²_ε)

  ICC    Numerator      Interpretation
  ICC₂   σ²_v           % variance at level 3 (school)
  ICC₃   σ²_v + σ²_u    % variance at levels 2+3

Correlation between level-1 units in the same school but different classes:
  ρ = σ²_v / (σ²_v + σ²_u + σ²_ε)
Correlation between level-1 units in the same class (hence same school):
  ρ = (σ²_v + σ²_u) / (σ²_v + σ²_u + σ²_ε)
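The 3-level variance partition behind these ratios can be sketched in Python (helper name and example values are my own):

```python
def variance_shares(sigma2_e, sigma2_u, sigma2_v):
    """Proportion of total variance at level 1 (students), level 2 (classes),
    and level 3 (schools) in a 3-level null model."""
    total = sigma2_e + sigma2_u + sigma2_v
    return sigma2_e / total, sigma2_u / total, sigma2_v / total

# Hypothetical components: most variance within classes, least between schools.
shares = variance_shares(sigma2_e=0.6, sigma2_u=0.3, sigma2_v=0.1)
```

Any of the ICC-style ratios above is just a sum of these shares over the relevant levels.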

Design Effect (Deff)

Effective sample size reduction Deff = 1 + (n_j - 1) · ICC
Effective N n_eff = N / Deff
  • n_j = average cluster size
  • Deff > 1 → clustered sample less efficient than simple random
  • Ignoring ICC inflates Type I error
  • Used for sample size planning
Even ICC = 0.05 with cluster size 20 → Deff = 1.95 (halves effective N)
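A small Python sketch of the design-effect formulas, reproducing the worked example above (ICC = 0.05 with cluster size 20):

```python
def design_effect(icc, avg_cluster_size):
    """Deff = 1 + (n_j - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1) * icc

def effective_n(total_n, deff):
    """n_eff = N / Deff."""
    return total_n / deff

deff = design_effect(icc=0.05, avg_cluster_size=20)  # 1.95
n_eff = effective_n(1000, deff)                      # ~513 of 1000 nominal
```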
§5 · Two-Level Models (Hierarchical / Nested)

Model Taxonomy — 2 Levels

MODEL 1 — Unconditional Means (Null)
y_ij = γ_00 + u_0j + e_ij
No predictors. Partitions variance. Compute ICC.
MODEL 2 — Random Intercept, Level-1 predictor
y_ij = γ_00 + γ_10·X_ij + u_0j + e_ij
Fixed slope for X; random intercept across clusters
MODEL 3 — Random Intercept + Level-2 predictor
y_ij = γ_00 + γ_01·W_j + γ_10·X_ij + u_0j + e_ij
W_j explains variance in intercepts across clusters
MODEL 4 — Random Slopes
y_ij = γ_00 + γ_10·X_ij + u_0j + u_1j·X_ij + e_ij
Slope of X varies across clusters; test cov(u_0j, u_1j)
MODEL 5 — Cross-Level Interaction
y_ij = γ_00 + γ_01·W_j + γ_10·X_ij + γ_11·W_j·X_ij + u_0j + u_1j·X_ij + e_ij
W_j moderates effect of X on y (slope-as-outcome)

HLM Notation — 2 Levels

Level 1 (within cluster — observations):
  y_ij = β_0j + β_1j·X_ij + e_ij,   e_ij ~ N(0, σ²)
Level 2 (between clusters):
  β_0j = γ_00 + γ_01·W_j + u_0j
  β_1j = γ_10 + γ_11·W_j + u_1j
  ⎡u_0j⎤ ~ N(0, G),  G = ⎡τ_00  τ_01⎤
  ⎣u_1j⎦                 ⎣τ_01  τ_11⎦
Combined (reduced-form) equation:
  y_ij = γ_00 + γ_01·W_j + γ_10·X_ij + γ_11·W_j·X_ij + u_0j + u_1j·X_ij + e_ij
HLM notation and LMM matrix notation are equivalent — same model, different presentation

KEY VARIANCE COMPONENTS
  • σ² = level-1 residual variance (within clusters)
  • τ_00 = variance of random intercepts (between clusters)
  • τ_11 = variance of random slopes
  • τ_01 = covariance of intercepts and slopes

Centering Strategies

Grand Mean Centering (GMC)
X*_ij = X_ij - X̄..
  • Interpret intercept as group mean at avg predictor
  • Fixed-effect estimate = total effect (within + between)
  • Use for: macro-level questions, cross-level interactions
  • Random intercept now = deviation from grand mean
Group Mean Centering (CWC)
X*_ij = X_ij - X̄_j
  • Removes between-group variation from level-1 predictor
  • Within-group effect only
  • Add W̄_j (group mean) at level 2 to recover between effect
  • Separates within from between effects cleanly
  • Preferred for individual-level questions
No Centering
X*_ij = X_ij
  • Intercept = predicted y when X = 0 (meaningful only if 0 in range)
  • Fixed-effect estimate conflates within- and between-group effects
  • OK if 0 is natural/meaningful (e.g., time = 0 at baseline)
  • Avoid for cross-level interactions — creates multicollinearity
Centering choice affects interpretation of intercept variance (τ_00)!
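The two centering schemes are easy to sketch in Python (illustrative helpers; names are my own):

```python
from collections import defaultdict

def grand_mean_center(x):
    """GMC: subtract the overall mean from every observation."""
    xbar = sum(x) / len(x)
    return [xi - xbar for xi in x]

def group_mean_center(x, groups):
    """CWC: subtract each observation's own group mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, g in zip(x, groups):
        sums[g] += xi
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [xi - means[g] for xi, g in zip(x, groups)]
```

With x = [1, 2, 3, 4] and groups a, a, b, b, GMC keeps the between-group spread while CWC removes it entirely, which is exactly why CWC isolates the within-group effect.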
§6 · Three-Level Models

3-Level HLM Notation

Level 1 — within clusters (e.g., students):
  y_ijk = π_0jk + π_1jk·a_ijk + e_ijk,   e_ijk ~ N(0, σ²)
Level 2 — classroom (j in school k):
  π_0jk = β_00k + β_01k·X_jk + r_0jk
  π_1jk = β_10k + β_11k·X_jk + r_1jk
  (r_0jk, r_1jk)ᵀ ~ N(0, T_π)
Level 3 — school (k):
  β_00k = γ_000 + γ_001·W_k + u_00k
  β_10k = γ_100 + γ_101·W_k + u_10k
  (u_00k, u_10k)ᵀ ~ N(0, T_β)

3-Level Variance Components

  Component               Symbol     Represents
  Level-1 variance        σ²         Within-classroom variation
  Level-2 intercept var   τ²_π00     Between-classroom, within-school
  Level-2 slope var       τ²_π11     Slope variance at classroom level
  Level-3 intercept var   τ²_β00     Between-school variation in means
  Level-3 slope var       τ²_β10     Slope variance at school level

ICC at level 2 (classroom):  ICC₂ = τ²_π00 / (τ²_π00 + τ²_β00 + σ²)
ICC at level 3 (school):     ICC₃ = τ²_β00 / (τ²_π00 + τ²_β00 + σ²)

Cross-Level Interactions — 3 Levels

  • L1 × L2: Level-1 predictor moderated by level-2 variable (most common)
  • L1 × L3: Level-1 predictor moderated by level-3 variable
  • L2 × L3: Level-2 predictor moderated by level-3 variable
  • L1 × L2 × L3: Three-way cross-level interaction (rare)
Example: L1 × L3 cross-level interaction y_ijk = γ_000 + γ_100·a_ijk + γ_001·W_k + γ_101·W_k·a_ijk + ...
Always include lower-order terms when modeling cross-level interactions

3-Level vs. 2-Level: When to Use

  Condition                                Action
  Data has 3 natural levels                Use 3-level model
  ICC₃ ≈ 0 (negligible school variance)    Collapse to 2-level (ignore school)
  Very few level-3 units (<5)              Treat level 3 as fixed effect
  Predictors at all 3 levels               3-level needed
  Cross-level interactions involving L3    3-level needed
  Research question about L3 variance      3-level needed
§7 · Hypothesis Testing

Tests for Fixed Effects

t-test (single parameter):
  t = β̂_p / SE(β̂_p) ~ t_df   [Satterthwaite or KR df],   H₀: β_p = 0
F-test (joint test of multiple parameters):
  F = (Lβ̂)ᵀ [L(XᵀV⁻¹X)⁻¹Lᵀ]⁻¹ (Lβ̂) / r ~ F(r, df)   [r = rank of L],   H₀: Lβ = 0
Likelihood Ratio Test (nested models, ML):
  LRT = -2(ℓ_reduced - ℓ_full) ~ χ²_df,   df = difference in number of fixed-effect parameters
Use Kenward-Roger (KR) or Satterthwaite df approximation for small samples — avoid asymptotic df

Tests for Random Effects / Variance Components

LRT for variance components (REML):
  LRT = -2(ℓ_REML,reduced - ℓ_REML,full),   H₀: σ²_u = 0 (e.g., no random intercept needed)
  • Null hypothesis on the boundary of parameter space (σ² ≥ 0)
  • Standard χ² is too conservative — p-value should be halved
  • For single variance: use mixture χ² = ½χ²₀ + ½χ²₁
  • For multiple df: corrections more complex (Self & Liang)
Wald test for variance (less preferred):
  Z = σ̂²_u / SE(σ̂²_u) ~ N(0,1)   [approximate]
Always use REML (not ML) when comparing models that differ only in random effects structure
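For a single variance component, the boundary correction can be computed directly by halving the χ²₁ tail probability. A Python sketch using the closed form P(χ²₁ > x) = erfc(√(x/2)) (function name is my own):

```python
import math

def boundary_lrt_pvalue(lrt_stat):
    """p-value for H0: sigma2_u = 0 under the 0.5*chi2(0) + 0.5*chi2(1)
    mixture, i.e., half the usual chi-square(1) p-value."""
    p_chi2_df1 = math.erfc(math.sqrt(lrt_stat / 2.0))  # chi2(1) survival fn
    return 0.5 * p_chi2_df1
```

With LRT = 3.841 (the χ²₁ 0.95 quantile) the naive p-value is 0.05 while the mixture gives about 0.025, illustrating how conservative the uncorrected test is.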

Degrees of Freedom Approximations

  Method           When to Use                             Notes
  Satterthwaite    Default in SAS PROC MIXED, R lmerTest   Fast; adequate for most cases
  Kenward-Roger    Small samples, complex designs          More accurate; adjusts SE too
  Between-Within   Simple balanced designs                 Splits df into between/within subject
  Containment      Nested designs                          Classic ANOVA-like approach
  Residual         Large samples only                      Uses n-p df; OK when N huge

Type I / II / III SS Tests

  Type       Tests                                                        Use When
  Type I     Sequential — order matters                                   A priori hierarchical models
  Type II    Each effect adjusted for all others at same or lower level   No interaction present
  Type III   Each effect adjusted for ALL others, including interactions  Default; unbalanced data with interactions
Type III tests require correct sum-to-zero (or GLM) contrasts. Always verify coding scheme.

Kenward-Roger Adjustment

  • Adjusts both the F-statistic and the denominator df
  • Accounts for uncertainty in estimating variance components
  • Uses a scaled Wald statistic with inflated standard errors
  • Preferred when: small number of clusters, complex covariance, REML
  • Available in SAS PROC MIXED (DDFM=KR), R pbkrtest, lmerTest
KR adjusted test statistic F_KR = (Lβ̂)ᵀ Φ̂⁻¹ (Lβ̂) / r ~ F(r, ν_KR) [Φ̂ = KR-adjusted covariance of Lβ̂]

LRT: ML vs. REML Rules

  Comparison                          Method              Justification
  Fixed effects differ                ML                  REML ℓ not comparable across fixed effects
  Random effects differ, same fixed   REML                REML accounts for fixed-effect uncertainty
  Nested covariance structures        REML                Same fixed effects — valid comparison
  AIC/BIC comparison                  REML (same fixed)   Comparable on same scale
§8 · Contrasts & Estimable Functions

Types of Contrasts

  Type           Definition                          Example
  Pairwise       Compare two means: μ_i - μ_j        Treatment A vs. B
  Helmert        Each level vs. mean of subsequent   Time 1 vs. avg(T2, T3)
  Polynomial     Linear, quadratic, cubic trends     Dose-response curve
  Deviation      Each level vs. grand mean           Factor effect coding
  Simple         Each vs. reference category         Dummy coding
  User-defined   Custom L matrix                     Any specific hypothesis

Estimable Functions & L Matrix

Scalar contrast (L is a 1×p row vector):
  H₀: Lβ = 0
  Estimate: Lβ̂,   SE = √[L(XᵀV⁻¹X)⁻¹Lᵀ]
Multiple contrasts simultaneously (L is r×p, rank r):
  H₀: Lβ = 0
  F = (Lβ̂)ᵀ [L·Var(β̂)·Lᵀ]⁻¹ (Lβ̂) / r
A function Lβ is estimable if L lies in the row space of X (equivalently, L = L·H_X, where H_X is the hat matrix of X)

Planned vs. Post-hoc Comparisons

Planned (a priori) Contrasts
Specified before seeing data. No need to adjust for multiple comparisons if contrasts are orthogonal and pre-specified. Use t or F tests directly.
Post-hoc (exploratory) Comparisons
Adjust for multiple testing. Choose from: Bonferroni, Tukey HSD (all pairwise), Scheffé (any contrast), Holm (step-down), Benjamini-Hochberg (FDR).

Multiple Comparison Adjustments

  Method       Controls              Best For
  Bonferroni   FWER                  Few planned comparisons; conservative
  Holm         FWER                  More powerful than Bonferroni
  Tukey HSD    FWER (all pairs)      All pairwise means; balanced
  Scheffé      FWER (any contrast)   Exploratory; very conservative
  BH (FDR)     FDR                   Many comparisons; genomics, etc.
  Simulation   FWER (exact)          Complex designs; SAS ADJUST=SIMULATE

Least Squares Means (LSMeans) in LMMs

  • LSMeans = population marginal means — average over nuisance factors
  • = Lβ̂ for specific L matrix averaging over covariates at their means
  • Account for imbalance in the design (unlike simple cell means)
  • Default in SAS PROC MIXED (LSMEANS statement)
  • In R: use emmeans package
LSMean for group i:  μ̂_i = L_i β̂, where L_i averages over all other factors
Pairwise difference:  μ̂_i - μ̂_j = (L_i - L_j)β̂,   SE² = (L_i - L_j) Var(β̂) (L_i - L_j)ᵀ
LSMeans are preferred over raw means when the design is unbalanced or when covariates are present
SAS: adjust for multiple LSMean comparisons
  LSMEANS trt / ADJUST=TUKEY;
  LSMEANS trt / ADJUST=BON;
  LSMEANS trt / PDIFF;
§9 · Hierarchical Model Building Process

2-Level Model Building Steps

  1. Unconditional Means Model (Null) — No predictors. Fit with REML. Compute ICC = σ²_u / (σ²_u + σ²_e). Justifies multilevel approach if ICC > 0.
  2. Unconditional Growth / Level-1 Predictors — Add level-1 predictors (e.g., time) with fixed effects. Keep random intercepts. REML. Compare AIC/BIC to Step 1.
  3. Random Slopes — Allow slopes of key level-1 predictors to vary randomly. Test variance τ_11 using LRT (REML). Keep if significant or theoretically motivated.
  4. Level-2 Predictors — Add group-level (level-2) predictors to explain intercept (and slope) variance. Switch to ML for LRT comparing fixed effects.
  5. Cross-Level Interactions — Test whether level-2 variables moderate level-1 effects (slope-as-outcome). Use ML for model comparison.
  6. Final Model — Refit with REML for final variance component estimates. Report fixed effects, variance components, ICC, fit indices.

3-Level Model Building Steps

  1. Unconditional Means (3-level null) — Fit with three random intercepts (level 1, 2, 3). Compute ICC₂ and ICC₃. REML.
  2. Level-1 Predictors — Add individual-level (L1) predictors. Test fixed slopes. Consider centering. REML.
  3. Random Slopes at Level 2 — Allow L1 slopes to vary across L2 units. Test τ²_π11 with LRT (REML). Half the χ² p-value.
  4. Random Slopes at Level 3 — Allow L1 slopes to vary across L3 units. Test τ²_β10 with LRT (REML).
  5. Level-2 Predictors — Add classroom-level predictors (L2) to explain L2 variance in intercepts and slopes. ML for LRT.
  6. Level-3 Predictors — Add school-level predictors (L3). ML for comparison.
  7. Cross-Level Interactions — L1×L2, L1×L3, L2×L3. ML for comparison.
  8. Final Refit with REML — Final estimates and inference. Report all variance components at each level.

Model Comparison Toolkit

  Tool        Formula                   Use
  AIC         -2ℓ + 2k                  Non-nested; smaller = better
  BIC         -2ℓ + k·ln(n)             Penalizes parameters more; smaller = better
  LRT         -2Δℓ ~ χ²_Δk              Nested models only
  Pseudo-R²   1 - σ²_reduced/σ²_null    Variance explained at each level

  ΔAIC < 2    Models essentially equivalent
  ΔAIC 2–7    Some support for better model
  ΔAIC > 10   Clear preference for better model

Pseudo-R² at Each Level

Snijders & Bosker (1994) — Level-1 R²:
  R²₁ = 1 - (σ²_new + τ²_new) / (σ²_null + τ²_null)
Level-2 R² (variance in intercepts explained):
  R²₂ = 1 - τ²_new / τ²_null
Proportion of variance explained at each level:
  PVE_L1 = (σ²_null - σ²_model) / σ²_null
  PVE_L2 = (τ²_null - τ²_model) / τ²_null
Pseudo-R² can decrease when adding predictors in LMMs. Use cautiously and do not interpret as OLS R².
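The pseudo-R² formulas as Python helpers (illustrative; the plugged-in variance components are made up):

```python
def pve(var_null, var_model):
    """Proportion of variance explained at one level."""
    return (var_null - var_model) / var_null

def r2_level1_sb(sigma2_null, tau2_null, sigma2_new, tau2_new):
    """Snijders & Bosker (1994) level-1 R-squared."""
    return 1.0 - (sigma2_new + tau2_new) / (sigma2_null + tau2_null)

# Hypothetical: residual variance drops from 1.0 to 0.8 after adding a predictor.
explained = pve(1.0, 0.8)  # 20% of level-1 variance explained
```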
§10 · Longitudinal Data Analysis

Longitudinal LMM Framework

Individual growth curve model:
  y_ij = (β_0 + u_0i) + (β_1 + u_1i)·t_ij + β_2·X_i + β_3·X_i·t_ij + ε_ij

  Term          Meaning
  β_0 + u_0i    Individual baseline (random intercept)
  β_1 + u_1i    Individual growth rate (random slope)
  β_2           Between-person covariate effect at baseline
  β_3           Covariate × time interaction (moderates growth)
  ε_ij          Within-person residual (serial correlation)
Center time at baseline (t=0) so the intercept = baseline mean, and random intercept variance = between-person baseline variability

Time Specification Strategies

  Time Model    Code                     When
  Linear        time                     Constant rate of change
  Quadratic     time + time²             Accelerating/decelerating change
  Cubic         time + time² + time³     More complex nonlinearity
  Piecewise     Splines / dummy slopes   Change point exists
  Categorical   factor(time)             Unstructured change pattern
  Log time      log(time)                Rapid early change, plateau

Piecewise linear (change point at t*):
  y_ij = β_0 + β_1·t_ij + β_2·(t_ij - t*)₊ + u_0i + u_1i·t_ij + ε_ij
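The spline term (t - t*)₊ is just the positive part of the distance past the knot; a one-line Python sketch (helper name is my own):

```python
def piecewise_terms(t, t_star):
    """Basis for piecewise-linear time: (t, (t - t_star)_+).
    The second column is zero before the change point t_star, so its
    coefficient measures the change in slope after t_star."""
    return t, max(t - t_star, 0.0)

before = piecewise_terms(2.0, 3.0)  # (2.0, 0.0): knot not yet reached
after = piecewise_terms(5.0, 3.0)   # (5.0, 2.0): two units past the knot
```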

Choosing R Matrix for Longitudinal Data

Start with
Random intercept + slope (models within-person correlation implicitly)
↓ if residuals still correlated
Add serial correlation to R
AR(1) for equally-spaced time; SP(POW) for unequal spacing
↓ if variances change over time
Allow heterogeneous variance
ARH(1) or UN structure; or model variance as function of time
↓ fully unstructured
Unstructured (UN)
Separate variance at each time + all covariances; most general

Handling Missing Data in Longitudinal LMMs

  Missing Type   Definition                                LMM Valid?
  MCAR           Missing completely at random              Yes ✓
  MAR            Missing at random (given observed data)   Yes ✓ (likelihood-based ML/REML)
  MNAR           Missing not at random (informative)       Biased — needs selection / pattern-mixture model
  • LMMs use all available data — no listwise deletion needed
  • Handles unbalanced time points naturally
  • Each person can have different observation times
  • Valid under MAR; check dropout patterns
Test MAR assumption: Does time of dropout predict outcomes? Use pattern mixture models if MNAR suspected.

Time-Varying vs. Time-Invariant Covariates

  Type             Varies?   Level                 Example
  Time-invariant   No        Between-person (L2)   Sex, baseline age, group
  Time-varying     Yes       Within-person (L1)    Current BMI, mood, medication
  Time itself      Yes       L1 (in 2-level)       Week, day, visit number

Time-varying covariate decomposition (Curran & Bauer):
  X_ij = X̄_i + (X_ij - X̄_i)  =  between-person + within-person component
Center time-varying covariates within-person to separate within from between effects

Autoregressive / Transition Models

Lagged response as predictor y_ij = β_0 + β_1·y_i(j-1) + β_2·X_ij + u_0i + ε_ij
  • Regress current outcome on previous value
  • β_1 = autoregressive coefficient (stability)
  • Accounts for serial dependence explicitly
  • Useful for studying short-term change
  • Biased if u_0i correlated with lagged y — use system GMM or within-person centering
Including lagged y as predictor and random intercept simultaneously can cause Nickell bias in short panels
§11 · Software Reference (SAS, R, Stata, SPSS)

SAS PROC MIXED — Key Syntax

Basic 2-level random intercept with AR(1) residuals:
  PROC MIXED DATA=mydata METHOD=REML;
    CLASS school;
    MODEL score = time group time*group / SOLUTION DDFM=KR;
    RANDOM INTERCEPT / SUBJECT=school TYPE=VC;
    REPEATED / SUBJECT=school TYPE=AR(1);
    LSMEANS group / PDIFF ADJUST=TUKEY;
    ESTIMATE 'Trt vs Ctrl at T1' group 1 -1 time*group 1 0 -1 0;
  RUN;

Random slope model:
  RANDOM INTERCEPT time / SUBJECT=id TYPE=UN;
  [TYPE=UN allows covariance between intercept and slope]

3-level: students in classes in schools:
  RANDOM INTERCEPT / SUBJECT=school;
  RANDOM INTERCEPT / SUBJECT=class(school);

R — lme4 & nlme Syntax

lme4 — random intercept:
  library(lme4); library(lmerTest)
  m1 <- lmer(score ~ time*group + (1|school), data=d, REML=TRUE)
  summary(m1)   # Satterthwaite df via lmerTest

lme4 — random intercept + slope:
  m2  <- lmer(score ~ time + (1 + time|id), data=d)
  m2b <- lmer(score ~ time + (1|id) + (0 + time|id), data=d)   # m2b = uncorrelated random effects

nlme — custom covariance structure:
  library(nlme)
  m3 <- lme(score ~ time, random = ~1|school,
            correlation = corAR1(form = ~time|id), data=d, method="REML")

emmeans for LSMeans / contrasts:
  library(emmeans)
  emmeans(m1, ~ group | time)
  pairs(emmeans(m1, ~ group), adjust="tukey")

Key R Functions Cheatsheet

  Function        Package       Purpose
  lmer()          lme4          Fit LMM (no p-values by default)
  lme()           nlme          LMM with flexible covariance
  summary()       base          Fixed effects, variance components
  anova(m1, m2)   base          LRT between nested models
  AIC(), BIC()    base          Information criteria
  emmeans()       emmeans       LSMeans, contrasts
  icc()           performance   ICC calculation
  r2()            performance   Conditional/marginal R²
  check_model()   performance   Assumption diagnostics
  VarCorr()       lme4          Extract variance components
  ranef()         lme4          Extract BLUPs
  fixef()         lme4          Extract fixed effects
  simulate()      lme4          Parametric bootstrap
  PBmodcomp()     pbkrtest      Parametric bootstrap LRT
  KRmodcomp()     pbkrtest      Kenward-Roger test

Stata & SPSS Commands

Stata — mixed:
  mixed score time##group || school: || id: time, reml
  estat icc
  margins group, at(time=(0 1 2))
  test 2.group = 3.group

SPSS — MIXED:
  MIXED score BY group WITH time
    /FIXED = group time group*time | SSTYPE(3)
    /RANDOM INTERCEPT | SUBJECT(school) COVTYPE(VC)
    /REPEATED = time | SUBJECT(id) COVTYPE(AR1)
    /EMMEANS = TABLES(group) COMPARE ADJ(BONFERRONI).

Python — statsmodels:
  import statsmodels.formula.api as smf
  model = smf.mixedlm("score ~ time*group", data=df,
                      groups=df["school"], re_formula="~time")
  result = model.fit(reml=True)
§12 · Assumptions, Diagnostics & Remedies

Normality of Residuals (ε)

  • QQ-plot of marginal residuals (y - Xβ̂)
  • QQ-plot of conditional residuals (y - Xβ̂ - Zb̂)
  • Shapiro-Wilk test (small n only; over-powered for large n)
  • Histogram of residuals
Remedies: transform y (log, sqrt, Box-Cox); use robust SE estimation; or use a nonparametric bootstrap

Normality of Random Effects (b)

  • QQ-plot of BLUPs (empirical Bayes estimates)
  • Check for outliers in BLUP distribution
  • Difficult to assess with few clusters (<30)
  • Fixed effects estimates robust to this violation
R: QQ-plot of BLUPs
  qqnorm(ranef(m)$school[,1])
  qqline(ranef(m)$school[,1])

Homoscedasticity

  • Plot residuals vs. fitted values
  • Plot residuals vs. each predictor
  • Levene's test within groups
  • If violated: model heterogeneous R (CSH, ARH)
  • Or use weighted LMM
  • Or model variance as function of a covariate

Influential Observations

  Measure               Detects
  Cook's D              Influential observations on fixed effects
  DFFITS                Influence on fitted values
  Leverage (h_ii)       High-leverage points
  MDFFITS               Multivariate cluster-level influence
  COVTRACE / COVRATIO   Influence on covariance of β̂
In LMMs, influence can be at the observation OR cluster level — check both

Linearity & Functional Form

  • Plot residuals vs. each continuous predictor
  • Added Variable Plots (partial regression plots)
  • Component + Residual plots (CERES)
  • Add polynomial terms if nonlinearity detected
  • Consider splines for flexible modeling
  • Use GAMMs if strong nonlinearity

Multicollinearity

  • Variance Inflation Factor (VIF) for fixed effects
  • VIF > 10 (or 5): potential problem
  • Condition number of X'X: >30 concerning
  • Remedies: centering, orthogonalization, ridge regression, PCA
  • Cross-level interactions are prone to multicollinearity — center predictors
R: check VIF car::vif(lm(y ~ X1 + X2 + ..., data=d)) # VIF for fixed effects part
§13 · Full Reporting Checklist & Quick Reference

Reporting Checklist for LMM

  1. State model equation clearly (HLM or matrix notation)
  2. Report all fixed effects: estimate, SE, t/F, df (method), p-value, CI
  3. Report all variance components: σ², τ_00, τ_11, τ_01 with SE or CI
  4. Report ICC (and ICC at each level for 3-level)
  5. State estimation method (ML vs REML) and software
  6. Report df approximation method (KR, Satterthwaite, etc.)
  7. Report sample sizes at all levels (N subjects, J clusters, K schools)
  8. Report covariance structure of R and G matrices with justification
  9. Report model selection criteria (AIC/BIC for final choice)
  10. Report assumption checks and any violations
  11. Report LSMeans or contrasts with multiplicity adjustment
  12. Report pseudo-R² at each level if possible

Common Pitfalls & Remedies

  Pitfall                                            Remedy
  Using ML for variance component inference          Switch to REML
  Using REML for fixed-effects LRT                   Switch to ML
  Using standard χ² for testing σ² = 0               Halve the p-value (boundary test)
  Ignoring clustering (OLS on clustered data)        Use LMM or cluster-robust SE
  Too few clusters for random effects (<5)           Treat as fixed effect instead
  Not centering in cross-level interactions          Center level-1 predictors
  Convergence failure                                Scale predictors; simpler G; different optimizer
  Singular fit (boundary estimate)                   Simplify random-effects structure
  Treating time as factor unnecessarily              Model as continuous if monotone trend
  Ignoring serial correlation in longitudinal data   Model R with AR(1) or add R structure

Effect Size Measures

  Measure                 Formula                           Interpretation
  Marginal R² (R²_m)      Var(Xβ̂) / Var(Xβ̂ + Zb̂ + ε̂)    Variance explained by fixed effects
  Conditional R² (R²_c)   Var(Xβ̂ + Zb̂) / total var        Variance explained by entire model
  Cohen's d (LMM)         γ_fixed / √(σ²_total)             Standardized fixed effect
  PVE at L1               (σ²_null - σ²_model) / σ²_null    Within-cluster variance explained
  PVE at L2               (τ²_null - τ²_model) / τ²_null    Between-cluster variance explained

Nakagawa & Schielzeth (2013) — in R (lme4):
  performance::r2(model)   # returns R²_marginal and R²_conditional
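The Nakagawa & Schielzeth decomposition as a Python sketch (illustrative helper with made-up variance components; in practice performance::r2() in R computes these from a fitted model):

```python
def nakagawa_r2(var_fixed, var_random, var_resid):
    """Marginal and conditional R-squared from the three variance pieces:
    marginal = fixed / total; conditional = (fixed + random) / total."""
    total = var_fixed + var_random + var_resid
    return var_fixed / total, (var_fixed + var_random) / total

# Hypothetical: fixed effects explain 2.0, random effects 1.0, residual 1.0.
r2_marginal, r2_conditional = nakagawa_r2(2.0, 1.0, 1.0)
```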

Minimum Cluster Requirements

  Purpose                    Min. Clusters (L2)   Notes
  Estimate ICC reliably      ≥ 30                 More is always better
  Random intercept only      ≥ 10–20              Minimum for stable estimates
  Random intercept + slope   ≥ 30–50              More variance components to estimate
  Cross-level interaction    ≥ 50                 Need adequate power
  3-level (L3 units)         ≥ 20 L3 units        With ≥ 3–5 L2 per L3
Rule of thumb: 30/30 rule — at least 30 clusters with at least 30 obs each for stable estimates (Maas & Hox)

Master Formula Reference

LMM general:
  y = Xβ + Zb + ε,   b ~ N(0, G),   ε ~ N(0, R),   y ~ N(Xβ, V),   V = ZGZᵀ + R
GLS estimator:
  β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y,   Var(β̂) = (XᵀV⁻¹X)⁻¹
BLUP:
  b̂ = GZᵀV⁻¹(y - Xβ̂)
ICC (2-level):  ρ = σ²_u / (σ²_u + σ²_ε)
Design effect:  Deff = 1 + (n̄ - 1)·ρ
Information criteria:
  AIC = -2ℓ + 2k,   BIC = -2ℓ + k·ln(n),   LRT: G² = -2Δℓ ~ χ²_Δk
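The GLS estimator can be checked on a toy example. A Python sketch under stated assumptions (diagonal V and a single predictor, so the 2×2 normal equations (XᵀV⁻¹X)β = XᵀV⁻¹y solve in closed form; function name is my own):

```python
def gls_line(x, y, v):
    """GLS fit of y = b0 + b1*x with diagonal V = diag(v):
    weighted normal equations with weights w_i = 1 / v_i."""
    w = [1.0 / vi for vi in v]                     # V^-1 is diag(1/v_i)
    sw   = sum(w)
    swx  = sum(wi * xi for wi, xi in zip(w, x))
    swy  = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx                    # det of X'V^-1 X
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det
    return b0, b1
```

On data lying exactly on y = 2 + 3x, any choice of positive v returns (2, 3), as the GLS estimator is unbiased regardless of the weighting.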
Based on: West, Welch & Gałecki — Linear Mixed Models: A Practical Guide Using Statistical Software, 3rd Ed. · With contributions from Gillespie
