Linear Mixed Models — Comprehensive Reference Cheatsheet
West, Welch & Gałecki · 3rd Edition · With contributions from Gillespie
§1 · Core Concepts & Motivation
Why LMM?
- Data are clustered / nested / grouped — violating OLS independence assumption
- Handles unbalanced designs (unequal group sizes, missing data) gracefully
- Accounts for within-group correlation explicitly
- Allows both subject-specific and population-average inference
- Handles repeated measures and longitudinal data correctly
- Flexible modeling of heterogeneous variance
General LMM Equation
Matrix form (Laird & Ware, 1982)
y = Xβ + Zb + ε
| Symbol | Meaning |
|---|---|
| y | n×1 response vector |
| X | n×p design matrix (fixed effects) |
| β | p×1 fixed-effect parameter vector |
| Z | n×q design matrix (random effects) |
| b | q×1 random-effect vector, b ~ N(0, G) |
| ε | n×1 residual vector, ε ~ N(0, R) |
Marginal distribution
y ~ N(Xβ, ZGZᵀ + R)
≡ N(Xβ, V)
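As a quick numerical check, the marginal covariance V = ZGZᵀ + R can be assembled directly for a toy random-intercept design (the numbers below are illustrative, not from the text):

```python
import numpy as np

# Toy random-intercept design: 2 clusters x 3 observations each.
# Z maps each observation to its cluster's random intercept.
Z = np.kron(np.eye(2), np.ones((3, 1)))   # 6x2 indicator matrix
G = 1.5 * np.eye(2)                        # Var(b): sigma2_u = 1.5
R = 0.5 * np.eye(6)                        # Var(eps): sigma2_e = 0.5

V = Z @ G @ Z.T + R                        # marginal covariance of y

print(V[0, 0])   # 2.0  = sigma2_u + sigma2_e (total variance)
print(V[0, 1])   # 1.5  = sigma2_u (same-cluster covariance)
print(V[0, 3])   # 0.0  (different clusters are independent)
```

The same-cluster covariance σ²_u over the total σ²_u + σ²_e is exactly the ICC of §4 (here 1.5/2.0 = 0.75).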
Key Assumptions
- Normality: b ~ N(0, G) and ε ~ N(0, R)
- Independence: b ⊥ ε (random effects independent of residuals)
- Linearity: E[y|X, Z] = Xβ + Zb
- Homoscedasticity: constant residual variance, unless modeled otherwise via the R matrix
- Correct covariance structure for R and G
- No measurement error in X and Z
Violations of normality are less critical for fixed effects (CLT applies) but matter more for BLUP accuracy
Estimation Methods
| Method | Full Name | Use When |
|---|---|---|
ML | Maximum Likelihood | Comparing models with different fixed effects |
REML | Restricted ML | Comparing models with same fixed effects; default for variance estimation |
BLUP | Best Linear Unbiased Predictor | Predicting random effects (E[b|y]) |
BLUE | Best Linear Unbiased Estimator | Fixed effects: β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y |
REML cannot be used to compare models differing in fixed effects — use ML for that
Marginal vs. Conditional
Marginal (Population-Average) Model
Describes the average response in the population; random effects are integrated out. Focus on Xβ. Useful for policy, group comparisons.
Conditional (Subject-Specific) Model
Describes the response for a specific subject/cluster given their random effects b. Focus on Xβ + Zb. Useful for individual predictions.
LMMs are naturally both — β gives population-level inference, BLUPs give individual predictions
Terminology Map
| Context | = LMM Term |
|---|---|
| HLM Level-1 residuals | ε (within-cluster) |
| HLM Level-2 residuals | b (between-cluster) |
| Random intercept model | Z = 1, G = σ²_b (scalar) |
| Growth curve model | Z includes time; random slope |
| Repeated measures ANOVA | Special case of LMM with compound symmetry |
| Split-plot design | 2-level LMM with whole-plot & subplot error |
§2 · Fixed vs. Random Effects
Fixed Effects (β)
- Parameters that are constant across the population
- Levels are exhaustive — we care about those specific levels
- Inference is to those specific levels only
- Estimated by GLS: β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y
- Examples: treatment group, time point, age, sex, dose
- Can include interactions between fixed predictors
Var(β̂) = (XᵀV⁻¹X)⁻¹
[estimated by plugging in V̂]
Random Effects (b)
- Factor levels are a random sample from a larger population
- Inference extends to the entire population of levels
- Not "estimated" — predicted via BLUPs (shrinkage estimators)
- Characterized by their variance components (G matrix)
- Examples: subjects, schools, clinics, batches, farms
- Random effects shrink toward 0 — borrow strength
BLUP formula
b̂ = GZᵀV⁻¹(y - Xβ̂)
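The GLS and BLUP formulas can be verified numerically. A minimal sketch with made-up variance components (treated as known) and a simulated response, checking that the BLUP equals the familiar shrunken cluster-mean deviation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4 clusters x 5 obs, intercept-only fixed part, variance
# components (sigma2_u = 2, sigma2_e = 1) treated as known for the sketch.
J, n_j = 4, 5
n = J * n_j
X = np.ones((n, 1))
Z = np.kron(np.eye(J), np.ones((n_j, 1)))
G = 2.0 * np.eye(J)
R = 1.0 * np.eye(n)
V = Z @ G @ Z.T + R

y = 10.0 + Z @ rng.normal(0, np.sqrt(2.0), J) + rng.normal(0, 1.0, n)

Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # GLS / BLUE
b_hat = G @ Z.T @ Vinv @ (y - X @ beta_hat)                  # BLUP

# In the balanced random-intercept case the BLUP is the cluster-mean
# deviation shrunk by lambda = sigma2_u / (sigma2_u + sigma2_e / n_j).
lam = 2.0 / (2.0 + 1.0 / n_j)
dev = np.array([y[Z[:, j] == 1].mean() for j in range(J)]) - beta_hat[0]
print(np.allclose(b_hat, lam * dev))   # True
```

With σ²_u = 2, σ²_e = 1, n_j = 5 the shrinkage factor is 2/(2 + 0.2) ≈ 0.91 — this is the "borrow strength" behavior described above.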
Decision Guide: Fixed or Random?
| Question | Fixed | Random |
|---|---|---|
| Levels exhaustive? | Yes ✓ | No (sample) |
| Inference to other levels? | No | Yes ✓ |
| Number of levels | Any | Prefer ≥ 5–6 |
| Interest in variance? | No | Yes ✓ |
| Predict individual effects? | No | Yes ✓ |
| Treatment conditions | Yes ✓ | Rare |
| Blocking factor | Sometimes | Preferred ✓ |
G Matrix — Random Effect Covariance
Random intercept only
G = σ²_u · I
Random intercept + slope (unstructured)
G = | σ²_u₀ σ_u₀u₁ |
| σ_u₀u₁ σ²_u₁ |
Multiple random effects (general)
G = Cov(b) [q × q positive definite matrix]
σ²_u₀: variance of random intercepts
σ²_u₁: variance of random slopes
σ_u₀u₁: covariance between intercepts and slopes
§3 · Covariance Structures (R matrix)
Independence (ID)
R = σ²·I
- Default; residuals uncorrelated
- Valid for simple random intercept models
- 1 parameter: σ²
Compound Symmetry (CS)
Var(y_ij) = σ² + σ²_b
Cov(y_ij, y_ik) = σ²_b [j ≠ k]
- All pairs equally correlated: ρ = σ²_b/(σ²_b + σ²)
- Implied by pure random intercept model
- Equivalent to repeated measures ANOVA
- 2 parameters: σ², σ²_b
Unstructured (UN)
R = Σ [free n×n pos.def. matrix]
- Most flexible; separate variance + all covariances
- Parameters: n(n+1)/2 — grows rapidly
- Risk of non-convergence with small samples
- Use when no theoretical structure assumed
AR(1) — Autoregressive
Corr(y_ij, y_ik) = ρ^|j-k|
Cov = σ² · ρ^|t_j - t_k|
- Correlation decays with time lag
- Appropriate for equally-spaced longitudinal data
- 2 parameters: σ², ρ (|ρ| < 1)
- Parsimonious; popular in practice
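A sketch of building the AR(1)/power correlation matrix from a vector of (possibly unequally spaced) times — `ar1_corr` is an illustrative helper, not a library function:

```python
import numpy as np

def ar1_corr(times, rho):
    """AR(1)-type correlation: rho**|t_j - t_k| (power form, so it also
    handles unequally spaced measurement times)."""
    t = np.asarray(times, dtype=float)
    return rho ** np.abs(t[:, None] - t[None, :])

C = ar1_corr([0, 1, 2, 3], 0.6)
print(C[0, 1])   # 0.6    (lag 1)
print(C[0, 3])   # 0.216  (lag 3: 0.6**3 — correlation decays with lag)
```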
Toeplitz (TOEP)
Corr(y_ij, y_ik) = ρ_|j-k|
[separate ρ per lag]
- Correlation depends on lag only
- More flexible than AR(1)
- Parameters: σ² + (n−1) correlation params
- Between CS and UN in complexity
Heterogeneous Structures
CSH: Var(y_ij) = σ²_j [heterogeneous CS]
ARH(1): separate σ²_j + AR(1) correlation
- Allow variance to differ by time point / group
- Suffix H = heterogeneous version
- More parameters but handle unequal variances
Covariance Structure Selection Strategy
AIC — Akaike Information Criterion
AIC = -2ℓ + 2p
BIC — Bayesian (Schwarz) Information Criterion
BIC = -2ℓ + p·ln(n)
- Fit candidate structures with same fixed effects using REML
- Compare AIC/BIC — smaller is better
- Use LRT only for nested covariance structures
- For non-nested structures: use AIC/BIC only
- Prefer parsimony when AIC differences are small (<2–4)
| Structure | Params | Nested in UN? |
|---|---|---|
| ID | 1 | Yes |
| CS | 2 | Yes |
| AR(1) | 2 | Yes |
| TOEP | n | Yes |
| UN | n(n+1)/2 | — |
| CSH | n+1 | Yes |
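The AIC/BIC arithmetic behind the selection strategy is worth seeing once; the log-likelihoods below are invented for illustration (CS with 2 covariance parameters vs. UN with 10, n = 200):

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2*loglik + 2*k."""
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    """Schwarz criterion: -2*loglik + k*ln(n)."""
    return -2 * loglik + k * math.log(n)

# Hypothetical REML log-likelihoods for two covariance structures
print(aic(-512.3, 2), round(bic(-512.3, 2, 200), 1))    # CS: 1028.6 1035.2
print(aic(-508.9, 10), round(bic(-508.9, 10, 200), 1))  # UN: 1037.8 1070.8
```

Here UN fits slightly better in raw likelihood but loses on both criteria once its extra parameters are penalized — the parsimony principle from the bullets above.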
§4 · Intraclass Correlation Coefficient (ICC)
ICC Definition & Interpretation
2-level random intercept model
ICC = ρ = σ²_u / (σ²_u + σ²_ε)
- Proportion of total variance due to between-cluster differences
- = correlation between two observations in the same cluster
- Range: 0 ≤ ICC ≤ 1
- ICC = 0 → no clustering; OLS is fine
- ICC = 1 → all variance between clusters
Typical ICC values: education 0.05–0.30; clinical 0.01–0.05; psychology 0.10–0.50
ICC for 3-Level Models
3-level: students in classes in schools
ICC₂ = σ²_u / (σ²_v + σ²_u + σ²_ε)
ICC₃ = σ²_v / (σ²_v + σ²_u + σ²_ε)
| ICC | Numerator | Interpretation |
|---|---|---|
| ICC₂ | σ²_u | % variance at level 2 (class) |
| ICC₃ | σ²_v | % variance at level 3 (school) |
Correlation between level-1 units sharing only the level-3 cluster
ρ₃ = σ²_v / (σ²_v + σ²_u + σ²_ε) [same school, different class]
Same level-2 and level-3 cluster
ρ₂₃ = (σ²_v + σ²_u) / (σ²_v + σ²_u + σ²_ε) [same class, same school]
Design Effect (Deff)
Effective sample size reduction
Deff = 1 + (n_j - 1) · ICC
Effective N
n_eff = N / Deff
- n_j = average cluster size
- Deff > 1 → clustered sample less efficient than simple random
- Ignoring ICC inflates Type I error
- Used for sample size planning
Even ICC = 0.05 with cluster size 20 → Deff = 1.95 (halves effective N)
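The design-effect formulas translate directly into code (helper names are ours); this reproduces the 0.05/20 example above:

```python
def design_effect(icc, cluster_size):
    """Deff = 1 + (n_j - 1) * ICC for average cluster size n_j."""
    return 1 + (cluster_size - 1) * icc

def effective_n(total_n, icc, cluster_size):
    """Effective sample size after accounting for clustering."""
    return total_n / design_effect(icc, cluster_size)

print(design_effect(0.05, 20))              # 1.95
print(round(effective_n(1000, 0.05, 20)))   # 513 — roughly half of N = 1000
```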
§5 · Two-Level Models (Hierarchical / Nested)
Model Taxonomy — 2 Levels
MODEL 1 — Unconditional Means (Null)
y_ij = γ_00 + u_0j + e_ij
No predictors. Partitions variance. Compute ICC.
↓
MODEL 2 — Random Intercept, Level-1 predictor
y_ij = γ_00 + γ_10·X_ij + u_0j + e_ij
Fixed slope for X; random intercept across clusters
↓
MODEL 3 — Random Intercept + Level-2 predictor
y_ij = γ_00 + γ_01·W_j + γ_10·X_ij + u_0j + e_ij
W_j explains variance in intercepts across clusters
↓
MODEL 4 — Random Slopes
y_ij = γ_00 + γ_10·X_ij + u_0j + u_1j·X_ij + e_ij
Slope of X varies across clusters; test cov(u_0j, u_1j)
↓
MODEL 5 — Cross-Level Interaction
y_ij = γ_00 + γ_01·W_j + γ_10·X_ij + γ_11·W_j·X_ij + u_0j + u_1j·X_ij + e_ij
W_j moderates effect of X on y (slope-as-outcome)
HLM Notation — 2 Levels
Level 1 (within cluster — observations)
y_ij = β_0j + β_1j·X_ij + e_ij
e_ij ~ N(0, σ²)
Level 2 (between clusters)
β_0j = γ_00 + γ_01·W_j + u_0j
β_1j = γ_10 + γ_11·W_j + u_1j
⎡u_0j⎤ ~ N(0, G) = N(0, ⎡τ_00 τ_01⎤)
⎣u_1j⎦ ⎣τ_01 τ_11⎦
Combined (reduced-form) equation
y_ij = γ_00 + γ_01·W_j + γ_10·X_ij
+ γ_11·W_j·X_ij
+ u_0j + u_1j·X_ij + e_ij
HLM notation and LMM matrix notation are equivalent — same model, different presentation
KEY VARIANCE COMPONENTS
σ² = level-1 residual variance (within clusters)
τ_00 = variance of random intercepts (between clusters)
τ_11 = variance of random slopes
τ_01 = covariance of intercepts and slopes
Centering Strategies
Grand Mean Centering (GMC)
X*_ij = X_ij - X̄..
- Interpret intercept as group mean at avg predictor
- Fixed-effect estimate = total effect (within + between)
- Use for: macro-level questions, cross-level interactions
- Random intercept now = deviation from grand mean
Group Mean Centering (CWC)
X*_ij = X_ij - X̄_j
- Removes between-group variation from level-1 predictor
- Within-group effect only
- Add X̄_j (the group mean of X) as a level-2 predictor to recover the between effect
- Separates within from between effects cleanly
- Preferred for individual-level questions
No Centering
X*_ij = X_ij
- Intercept = predicted y when X = 0 (meaningful only if 0 in range)
- Estimate = total effect conflated
- OK if 0 is natural/meaningful (e.g., time = 0 at baseline)
- Avoid for cross-level interactions — creates multicollinearity
Centering choice affects interpretation of intercept variance (τ_00)!
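A small numeric illustration of the two centering schemes (made-up data, two clusters of three observations):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])   # level-1 predictor
g = np.array([0, 0, 0, 1, 1, 1])                   # cluster id

grand = x.mean()                                    # 6.5
group_means = np.array([x[g == j].mean() for j in (0, 1)])[g]

x_gmc = x - grand          # grand-mean centered (GMC)
x_cwc = x - group_means    # centered within cluster (CWC)

# CWC removes ALL between-cluster variation: each cluster mean becomes 0
print(np.allclose([x_cwc[g == j].mean() for j in (0, 1)], 0.0))   # True
# GMC = CWC + (group mean deviation): the between part GMC keeps
print(np.allclose(x_gmc, x_cwc + (group_means - grand)))          # True
```

The second identity is why adding X̄_j at level 2 alongside a CWC predictor cleanly separates within- from between-cluster effects.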
§6 · Three-Level Models
3-Level HLM Notation
Level 1 — within clusters (e.g., students)
y_ijk = π_0jk + π_1jk·a_ijk + e_ijk
e_ijk ~ N(0, σ²)
Level 2 — classroom (j in school k)
π_0jk = β_00k + β_01k·X_jk + r_0jk
π_1jk = β_10k + β_11k·X_jk + r_1jk
⎡r_0jk⎤ ~ N(0, T_π)
⎣r_1jk⎦
Level 3 — school (k)
β_00k = γ_000 + γ_001·W_k + u_00k
β_10k = γ_100 + γ_101·W_k + u_10k
⎡u_00k⎤ ~ N(0, T_β)
⎣u_10k⎦
3-Level Variance Components
| Component | Symbol | Represents |
|---|---|---|
| Level-1 variance | σ² | Within-classroom variation |
| Level-2 intercept var | τ²_π00 | Between-classroom, within-school |
| Level-2 slope var | τ²_π11 | Slope variance at classroom level |
| Level-3 intercept var | τ²_β00 | Between-school variation in means |
| Level-3 slope var | τ²_β10 | Slope variance at school level |
ICC at level 2 (classroom)
ICC₂ = τ²_π00 / (τ²_π00 + τ²_β00 + σ²)
ICC at level 3 (school)
ICC₃ = τ²_β00 / (τ²_π00 + τ²_β00 + σ²)
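Plugging illustrative variance components into the two formulas above (function and argument names are ours):

```python
def iccs_3level(tau2_school, tau2_class, sigma2):
    """ICC2 (classroom share) and ICC3 (school share) of total variance."""
    tot = tau2_school + tau2_class + sigma2
    return tau2_class / tot, tau2_school / tot

# Hypothetical components: school 2.0, classroom 3.0, student 5.0
icc2, icc3 = iccs_3level(2.0, 3.0, 5.0)
print(icc2, icc3)   # 0.3 0.2
```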
Cross-Level Interactions — 3 Levels
- L1 × L2: Level-1 predictor moderated by level-2 variable (most common)
- L1 × L3: Level-1 predictor moderated by level-3 variable
- L2 × L3: Level-2 predictor moderated by level-3 variable
- L1 × L2 × L3: Three-way cross-level interaction (rare)
Example: L1 × L3 cross-level interaction
y_ijk = γ_000 + γ_100·a_ijk + γ_001·W_k
+ γ_101·W_k·a_ijk + ...
Always include lower-order terms when modeling cross-level interactions
3-Level vs. 2-Level: When to Use
| Condition | Action |
|---|---|
| Data has 3 natural levels | Use 3-level model |
| ICC₃ ≈ 0 (negligible school variance) | Collapse to 2-level (ignore school) |
| Very few level-3 units (<5) | Treat level 3 as fixed effect |
| Predictors at all 3 levels | 3-level needed |
| Cross-level interactions involving L3 | 3-level needed |
| Research Q about L3 variance | 3-level needed |
§7 · Hypothesis Testing
Tests for Fixed Effects
t-test (single parameter)
t = β̂_p / SE(β̂_p) ~ t_df [Satterthwaite or KR df]
H₀: β_p = 0
F-test (joint test of multiple parameters)
F = (Lβ̂)ᵀ [L(XᵀV⁻¹X)⁻¹Lᵀ]⁻¹ (Lβ̂) / r
~ F(r, df) [r = rank of L]
H₀: Lβ = 0
Likelihood Ratio Test (nested models, ML)
LRT = -2(ℓ_reduced - ℓ_full) ~ χ²_df
df = difference in # fixed effect parameters
Use Kenward-Roger (KR) or Satterthwaite df approximation for small samples — avoid asymptotic df
Tests for Random Effects / Variance Components
LRT for variance components (REML)
LRT = -2(ℓ_REML_reduced - ℓ_REML_full)
H₀: σ²_u = 0 (e.g., no random intercept needed)
- Null hypothesis on the boundary of parameter space (σ² ≥ 0)
- Standard χ² is too conservative — p-value should be halved
- For single variance: use mixture χ² = ½χ²₀ + ½χ²₁
- For multiple df: corrections more complex (Self & Liang)
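The boundary correction for a single variance component is a one-liner. A stdlib-only sketch (it uses the identity that the χ²₁ survival function equals erfc(√(x/2)), avoiding a scipy dependency):

```python
import math

def p_boundary_chi2_1(lrt):
    """p-value from the 0.5*chi2(0) + 0.5*chi2(1) mixture for H0: sigma2_u = 0.
    For lrt > 0 the point mass at zero contributes nothing, so this is
    simply half the standard chi2(1) p-value."""
    if lrt <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(lrt / 2))

naive_p = math.erfc(math.sqrt(3.84 / 2))                      # unhalved chi2(1) p
print(round(naive_p, 3), round(p_boundary_chi2_1(3.84), 3))   # 0.05 0.025
```

At the usual 3.84 critical value, the naive test reports p ≈ 0.05 while the correct boundary p-value is ≈ 0.025 — halving the p-value doubles the power of the test.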
Wald test for variance (less preferred)
Z = σ̂²_u / SE(σ̂²_u) ~ N(0,1) [approximate]
Always use REML (not ML) when comparing models that differ only in random effects structure
Degrees of Freedom Approximations
| Method | When to Use | Notes |
|---|---|---|
| Satterthwaite | Default in R lmerTest; DDFM=SAT in SAS PROC MIXED | Fast; adequate for most cases |
| Kenward-Roger | Small samples, complex designs | More accurate; adjusts SE too |
| Between-Within | Simple balanced designs | Splits df into between/within subject |
| Containment | Nested designs | Classic ANOVA-like approach |
| Residual | Large samples only | Uses n-p df; OK when N huge |
Type I / II / III SS Tests
| Type | Tests | Use When |
|---|---|---|
| Type I | Sequential — order matters | A priori hierarchical models |
| Type II | Each effect adjusted for all others at same or lower level | No interaction present |
| Type III | Each effect adjusted for ALL others including interactions | Default; unbalanced data with interactions |
Type III tests require sum-to-zero (effects) coding of factors. Always verify the contrast coding scheme.
Kenward-Roger Adjustment
- Adjusts both the F-statistic and the denominator df
- Accounts for uncertainty in estimating variance components
- Uses a scaled Wald statistic with inflated standard errors
- Preferred when: small number of clusters, complex covariance, REML
- Available in SAS PROC MIXED (DDFM=KR); in R via the pbkrtest and lmerTest packages
KR adjusted test statistic
F_KR = (Lβ̂)ᵀ Φ̂⁻¹ (Lβ̂) / r ~ F(r, ν_KR)
[Φ̂ = KR-adjusted covariance of Lβ̂]
LRT: ML vs. REML Rules
| Comparison | Method | Justification |
|---|---|---|
| Fixed effects differ | ML | REML ℓ not comparable across fixed effects |
| Random effects differ, same fixed | REML | REML accounts for fixed-effect uncertainty |
| Nested covariance structures | REML | Same fixed effects — valid comparison |
| AIC/BIC comparison | REML (same fixed) | Comparable on same scale |
§8 · Contrasts & Estimable Functions
Types of Contrasts
| Type | Definition | Example |
|---|---|---|
| Pairwise | Compare two means: μ_i - μ_j | Treatment A vs. B |
| Helmert | Each level vs. mean of subsequent | Time 1 vs. avg(T2,T3) |
| Polynomial | Linear, quadratic, cubic trends | Dose-response curve |
| Deviation | Each level vs. grand mean | Factor effect coding |
| Simple | Each vs. reference category | Dummy coding |
| User-defined | Custom L matrix | Any specific hypothesis |
Estimable Functions & L Matrix
Contrast: scalar hypothesis
Lβ = c where L is a 1×p row vector
H₀: Lβ = 0
Estimate: Lβ̂ = c̃
SE = √[L(XᵀV⁻¹X)⁻¹Lᵀ]
Multiple contrasts simultaneously
H₀: Lβ = 0 (L is r×p, rank r)
F = (Lβ̂)ᵀ[L·Var(β̂)·Lᵀ]⁻¹(Lβ̂) / r
A function Lβ is estimable if L lies in the row space of X — equivalently, L(XᵀX)⁻(XᵀX) = L
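The contrast algebra is easy to sandbox with numpy; β̂ and its covariance below are invented (three groups under reference coding: intercept, grpB, grpC):

```python
import numpy as np

# Hypothetical fitted quantities from an LMM
beta_hat = np.array([10.0, 2.0, 3.5])
vcov = np.array([[ 0.25, -0.10, -0.10],
                 [-0.10,  0.40,  0.05],
                 [-0.10,  0.05,  0.45]])

L = np.array([0.0, 1.0, -1.0])    # H0: grpB effect = grpC effect

est = L @ beta_hat                 # contrast estimate L*beta_hat
se = np.sqrt(L @ vcov @ L)         # SE = sqrt(L Var(beta_hat) L')

print(est, round(se, 4))           # -1.5 0.866
```

Note the SE uses the full covariance: Var(Lβ̂) = v₂₂ + v₃₃ − 2·v₂₃ = 0.40 + 0.45 − 0.10 = 0.75, not just the sum of the two variances.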
Planned vs. Post-hoc Comparisons
Planned (a priori) Contrasts
Specified before seeing data. No need to adjust for multiple comparisons if contrasts are orthogonal and pre-specified. Use t or F tests directly.
Post-hoc (exploratory) Comparisons
Adjust for multiple testing. Choose from: Bonferroni, Tukey HSD (all pairwise), Scheffé (any contrast), Holm (step-down), Benjamini-Hochberg (FDR).
Multiple Comparison Adjustments
| Method | Controls | Best For |
|---|---|---|
| Bonferroni | FWER | Few planned comparisons; conservative |
| Holm | FWER | More powerful than Bonferroni |
| Tukey HSD | FWER (all pairs) | All pairwise means; balanced |
| Scheffé | FWER (any contrast) | Exploratory; very conservative |
| BH (FDR) | FDR | Many comparisons; genomics, etc. |
| Simulation | FWER (exact) | Complex designs; SAS SIMULATE |
Least Squares Means (LSMeans) in LMMs
- LSMeans = population marginal means — average over nuisance factors
- = Lβ̂ for specific L matrix averaging over covariates at their means
- Account for imbalance in the design (unlike simple cell means)
- Default in SAS PROC MIXED (LSMEANS statement)
- In R: use the emmeans package
LSMean for group i
μ̂_i = L_i β̂
where L_i averages over all other factors
Pairwise difference
μ̂_i - μ̂_j = (L_i - L_j)β̂
SE² = (L_i-L_j) Var(β̂) (L_i-L_j)ᵀ
LSMeans are preferred over raw means when the design is unbalanced or when covariates are present
Adjust for multiple LSMean comparisons
LSMEANS trt / ADJUST=TUKEY;
LSMEANS trt / ADJUST=BON;
LSMEANS trt / PDIFF;
§9 · Hierarchical Model Building Process
2-Level Model Building Steps
- Unconditional Means Model (Null) — No predictors. Fit with REML. Compute ICC = σ²_u / (σ²_u + σ²_e). A non-negligible ICC justifies the multilevel approach.
- Unconditional Growth / Level-1 Predictors — Add level-1 predictors (e.g., time) with fixed effects. Keep random intercepts. REML. Compare AIC/BIC to Step 1.
- Random Slopes — Allow slopes of key level-1 predictors to vary randomly. Test variance τ_11 using LRT (REML). Keep if significant or theoretically motivated.
- Level-2 Predictors — Add group-level (level-2) predictors to explain intercept (and slope) variance. Switch to ML for LRT comparing fixed effects.
- Cross-Level Interactions — Test whether level-2 variables moderate level-1 effects (slope-as-outcome). Use ML for model comparison.
- Final Model — Refit with REML for final variance component estimates. Report fixed effects, variance components, ICC, fit indices.
3-Level Model Building Steps
- Unconditional Means (3-level null) — Fit with three random intercepts (level 1, 2, 3). Compute ICC₂ and ICC₃. REML.
- Level-1 Predictors — Add individual-level (L1) predictors. Test fixed slopes. Consider centering. REML.
- Random Slopes at Level 2 — Allow L1 slopes to vary across L2 units. Test τ²_π11 with LRT (REML). Half the χ² p-value.
- Random Slopes at Level 3 — Allow L1 slopes to vary across L3 units. Test τ²_β10 with LRT (REML).
- Level-2 Predictors — Add classroom-level predictors (L2) to explain L2 variance in intercepts and slopes. ML for LRT.
- Level-3 Predictors — Add school-level predictors (L3). ML for comparison.
- Cross-Level Interactions — L1×L2, L1×L3, L2×L3. ML for comparison.
- Final Refit with REML — Final estimates and inference. Report all variance components at each level.
Model Comparison Toolkit
| Tool | Formula | Use |
|---|---|---|
| AIC | -2ℓ + 2k | Non-nested; smaller = better |
| BIC | -2ℓ + k·ln(n) | Penalizes params more; smaller = better |
| LRT | -2Δℓ ~ χ²_Δk | Nested models only |
| Pseudo-R² | 1 - σ²_model/σ²_null | Variance explained at each level |
| ΔAIC < 2 | — | Models essentially equivalent |
| ΔAIC 2–7 | — | Some support for better model |
| ΔAIC > 10 | — | Clear preference for better model |
Pseudo-R² at Each Level
Snijders & Bosker (1994) — Level-1 R²
R²₁ = 1 - (σ²_new + τ²_new) / (σ²_null + τ²_null)
Level-2 R² (variance in intercepts explained)
R²₂ = 1 - τ²_new / τ²_null
Proportion of variance explained (each level)
PVE_L1 = (σ²_null - σ²_model) / σ²_null
PVE_L2 = (τ²_null - τ²_model) / τ²_null
Pseudo-R² can decrease when adding predictors in LMMs. Use cautiously and do not interpret as OLS R².
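In code (values invented; as the caveat above notes, the result can legitimately come out negative when adding a predictor shifts variance between levels):

```python
def pve(var_null, var_model):
    """Proportional reduction in a variance component vs. the null model."""
    return (var_null - var_model) / var_null

print(pve(4.0, 3.0))    # 0.25: 25% of that level's variance explained
print(pve(2.0, 2.3))    # negative: the component *grew* after adding predictors
```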
§10 · Longitudinal Data Analysis
Longitudinal LMM Framework
Individual growth curve model
y_ij = (β_0 + u_0i) + (β_1 + u_1i)·t_ij
+ β_2·X_i + β_3·X_i·t_ij + ε_ij
| Term | Meaning |
|---|---|
| β_0 + u_0i | Individual baseline (random intercept) |
| β_1 + u_1i | Individual growth rate (random slope) |
| β_2 | Between-person covariate effect at baseline |
| β_3 | Covariate × time interaction (moderates growth) |
| ε_ij | Within-person residual (serial correlation) |
Center time at baseline (t=0) so the intercept = baseline mean, and random intercept variance = between-person baseline variability
Time Specification Strategies
| Time Model | Code | When |
|---|---|---|
| Linear | time | Constant rate of change |
| Quadratic | time + time² | Accelerating/decelerating change |
| Cubic | time + time² + time³ | More complex nonlinearity |
| Piecewise | Splines / dummy slopes | Change point exists |
| Categorical | Factor(time) | Unstructured change pattern |
| Log time | log(time) | Rapid early change, plateau |
Piecewise linear (change at time t*)
y_ij = β_0 + β_1·t_ij + β_2·(t_ij - t*)_+ + u_0i + u_1i·t_ij + ε_ij
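The (t − t*)₊ hinge term is just a clipped column in the design matrix; a sketch (`piecewise_basis` is an illustrative helper):

```python
import numpy as np

def piecewise_basis(t, knot):
    """Fixed-effect design columns for y = b0 + b1*t + b2*(t - knot)_+ .
    The slope is b1 before the knot and b1 + b2 after it."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t), t, np.maximum(t - knot, 0.0)])

B = piecewise_basis([0, 1, 2, 3, 4], knot=2)
print(B[1].tolist())   # [1.0, 1.0, 0.0]  before the change point: hinge is 0
print(B[4].tolist())   # [1.0, 4.0, 2.0]  after: extra slope term is active
```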
Choosing R Matrix for Longitudinal Data
Start with
Random intercept + slope (models within-person correlation implicitly)
↓ if residuals still correlated
Add serial correlation to R
AR(1) for equally-spaced time; SP(POW) for unequal spacing
↓ if variances change over time
Allow heterogeneous variance
ARH(1) or UN structure; or model variance as function of time
↓ fully unstructured
Unstructured (UN)
Separate variance at each time + all covariances; most general
Handling Missing Data in Longitudinal LMMs
| Missing Type | Definition | LMM Valid? |
|---|---|---|
| MCAR | Missing completely at random | Yes ✓ |
| MAR | Missing at random (given observed data) | Yes ✓ (REML) |
| MNAR | Missing not at random (informative) | Biased — needs selection/pattern mixture model |
- LMMs use all available data — no listwise deletion needed
- Handles unbalanced time points naturally
- Each person can have different observation times
- Valid under MAR; check dropout patterns
Test MAR assumption: Does time of dropout predict outcomes? Use pattern mixture models if MNAR suspected.
Time-Varying vs. Time-Invariant Covariates
| Type | Varies? | Level | Example |
|---|---|---|---|
| Time-invariant | No | Between-person (L2) | Sex, baseline age, group |
| Time-varying | Yes | Within-person (L1) | Current BMI, mood, medication |
| Time itself | Yes | L1 (in 2-level) | Week, day, visit number |
Time-varying covariate decomposition (Curran & Bauer)
X_ij = X̄_i + (X_ij - X̄_i)
= between-person + within-person component
Center time-varying covariates within-person to separate within from between effects
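The Curran & Bauer decomposition in code (toy data, two persons with three visits each; both components would then enter the model as separate predictors):

```python
import numpy as np

# Hypothetical time-varying covariate (e.g., mood score)
x = np.array([3.0, 4.0, 5.0, 8.0, 9.0, 10.0])
pid = np.array([0, 0, 0, 1, 1, 1])              # person id

person_mean = np.array([x[pid == i].mean() for i in (0, 1)])[pid]
within = x - person_mean        # level-1 (within-person) component
between = person_mean           # level-2 (between-person) component

print(np.allclose(x, between + within))   # True: X_ij = Xbar_i + (X_ij - Xbar_i)
print(within.tolist())                    # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```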
Autoregressive / Transition Models
Lagged response as predictor
y_ij = β_0 + β_1·y_i(j-1) + β_2·X_ij + u_0i + ε_ij
- Regress current outcome on previous value
- β_1 = autoregressive coefficient (stability)
- Accounts for serial dependence explicitly
- Useful for studying short-term change
- Biased if u_0i correlated with lagged y — use system GMM or within-person centering
Including lagged y as predictor and random intercept simultaneously can cause Nickell bias in short panels
§11 · Software Reference (SAS, R, Stata, SPSS)
SAS PROC MIXED — Key Syntax
Basic 2-level random intercept
PROC MIXED DATA=mydata METHOD=REML;
CLASS school id group time;
MODEL score = time group time*group / SOLUTION DDFM=KR;
RANDOM INTERCEPT / SUBJECT=school TYPE=VC;
REPEATED / SUBJECT=id(school) TYPE=AR(1);
LSMEANS group / PDIFF ADJUST=TUKEY;
ESTIMATE 'Trt vs Ctrl at T1' group 1 -1 time*group 1 0 -1 0;
RUN;
Random slope model
RANDOM INTERCEPT time / SUBJECT=id TYPE=UN;
[TYPE=UN allows covariance between intercept and slope]
3-level: students in classes in schools
RANDOM INTERCEPT / SUBJECT=school;
RANDOM INTERCEPT / SUBJECT=class(school);
R — lme4 & nlme Syntax
lme4 — random intercept
library(lme4); library(lmerTest)
m1 <- lmer(score ~ time*group + (1|school),
data=d, REML=TRUE)
summary(m1) # Satterthwaite df via lmerTest
lme4 — random intercept + slope
m2 <- lmer(score ~ time + (1 + time|id), data=d)
m2b <- lmer(score ~ time + (1|id) + (0+time|id), data=d)
# m2b = uncorrelated random effects
nlme — custom covariance structure
library(nlme)
m3 <- lme(score ~ time, random = ~1|school,
correlation = corAR1(form=~time|id),
data=d, method="REML")
emmeans for LSMeans / contrasts
library(emmeans)
emmeans(m1, ~ group | time)
pairs(emmeans(m1, ~ group), adjust="tukey")
Key R Functions Cheatsheet
| Function | Package | Purpose |
|---|---|---|
| lmer() | lme4 | Fit LMM (no p-values by default) |
| lme() | nlme | LMM with flexible covariance |
| summary() | base | Fixed effects, variance components |
| anova(m1, m2) | base | LRT between nested models |
| AIC(), BIC() | base | Information criteria |
| emmeans() | emmeans | LSMeans, contrasts |
| icc() | performance | ICC calculation |
| r2() | performance | Conditional/marginal R² |
| check_model() | performance | Assumption diagnostics |
| VarCorr() | lme4 | Extract variance components |
| ranef() | lme4 | Extract BLUPs |
| fixef() | lme4 | Extract fixed effects |
| simulate() | lme4 | Parametric bootstrap |
| PBmodcomp() | pbkrtest | Parametric bootstrap LRT |
| KRmodcomp() | pbkrtest | Kenward-Roger test |
Stata & SPSS Commands
Stata — mixed
mixed score c.time##i.group || school: || id: time, reml
estat icc
margins group, at(time=(0 1 2))
test 2.group = 3.group
SPSS — MIXED
MIXED score BY group WITH time
/FIXED = group time group*time | SSTYPE(3)
/RANDOM INTERCEPT | SUBJECT(school) COVTYPE(VC)
/REPEATED = time | SUBJECT(id) COVTYPE(AR1)
/EMMEANS = TABLES(group) COMPARE ADJ(BONFERRONI).
Python — statsmodels
import statsmodels.formula.api as smf
model = smf.mixedlm("score ~ time*group",
data=df, groups=df["school"],
re_formula="~time")
result = model.fit(reml=True)
§12 · Assumptions, Diagnostics & Remedies
Normality of Residuals (ε)
- QQ-plot of marginal residuals (y - Xβ̂)
- QQ-plot of conditional residuals (y - Xβ̂ - Zb̂)
- Shapiro-Wilk test (small n only; over-powered for large n)
- Histogram of residuals
Remedy
Transform y (log, sqrt, Box-Cox)
Or use robust SE estimation
Or use nonparametric bootstrap
Normality of Random Effects (b)
- QQ-plot of BLUPs (empirical Bayes estimates)
- Check for outliers in BLUP distribution
- Difficult to assess with few clusters (<30)
- Fixed effects estimates robust to this violation
R: QQ-plot of BLUPs
qqnorm(ranef(m)$school[,1])
qqline(ranef(m)$school[,1])
Homoscedasticity
- Plot residuals vs. fitted values
- Plot residuals vs. each predictor
- Levene's test within groups
- If violated: model heterogeneous R (CSH, ARH)
- Or use weighted LMM
- Or model variance as function of a covariate
Influential Observations
| Measure | Detects |
|---|---|
| Cook's D | Influential observations on fixed effects |
| DFFITS | Influence on fitted values |
| Leverage (h_ii) | High-leverage points |
| MDFFITS | Multivariate cluster-level influence |
| COVTRACE/COVRATIO | Influence on covariance of β̂ |
In LMMs, influence can be at the observation OR cluster level — check both
Linearity & Functional Form
- Plot residuals vs. each continuous predictor
- Added Variable Plots (partial regression plots)
- Component + Residual plots (CERES)
- Add polynomial terms if nonlinearity detected
- Consider splines for flexible modeling
- Use GAMMs if strong nonlinearity
Multicollinearity
- Variance Inflation Factor (VIF) for fixed effects
- VIF > 10 (or 5): potential problem
- Condition number of X'X: >30 concerning
- Remedies: centering, orthogonalization, ridge regression, PCA
- Cross-level interactions are prone to multicollinearity — center predictors
R: check VIF
car::vif(lm(y ~ X1 + X2 + ..., data=d))
# VIF for fixed effects part
§13 · Full Reporting Checklist & Quick Reference
Reporting Checklist for LMM
- State model equation clearly (HLM or matrix notation)
- Report all fixed effects: estimate, SE, t/F, df (method), p-value, CI
- Report all variance components: σ², τ_00, τ_11, τ_01 with SE or CI
- Report ICC (and ICC at each level for 3-level)
- State estimation method (ML vs REML) and software
- Report df approximation method (KR, Satterthwaite, etc.)
- Report sample sizes at all levels (N subjects, J clusters, K schools)
- Report covariance structure of R and G matrices with justification
- Report model selection criteria (AIC/BIC for final choice)
- Report assumption checks and any violations
- Report LSMeans or contrasts with multiplicity adjustment
- Report pseudo-R² at each level if possible
Common Pitfalls & Remedies
| Pitfall | Remedy |
|---|---|
| Using ML for variance component inference | Switch to REML |
| Using REML for fixed effects LRT | Switch to ML |
| Using standard χ² for testing σ²=0 | Halve the p-value (boundary test) |
| Ignoring clustering (OLS on clustered data) | Use LMM or cluster-robust SE |
| Too few clusters for random effects (<5) | Treat as fixed effect instead |
| Not centering in cross-level interactions | Center level-1 predictors |
| Convergence failure | Scale predictors; simpler G; different optimizer |
| Singular fit (boundary estimate) | Simplify random effects structure |
| Treating time as factor unnecessarily | Model as continuous if monotone trend |
| Ignoring serial correlation in longitudinal data | Model R with AR(1) or add R structure |
Effect Size Measures
| Measure | Formula | Interpretation |
|---|---|---|
| Marginal R² (R²_m) | Var(Xβ̂) / Var(Xβ̂ + Zb̂ + ε̂) | Variance explained by fixed effects |
| Conditional R² (R²_c) | Var(Xβ̂ + Zb̂) / total var | Variance explained by entire model |
| Cohen's d (LMM) | γ_fixed / √(σ²_total) | Standardized fixed effect |
| PVE at L1 | (σ²_null - σ²_model) / σ²_null | Within-cluster variance explained |
| PVE at L2 | (τ²_null - τ²_model) / τ²_null | Between-cluster variance explained |
Nakagawa & Schielzeth (2013) — in R (performance package, lme4 models)
performance::r2(model)
# returns R²_marginal and R²_conditional
Minimum Cluster Requirements
| Purpose | Min. Clusters (L2) | Notes |
|---|---|---|
| Estimate ICC reliably | ≥ 30 | More is always better |
| Random intercept only | ≥ 10–20 | Minimum for stable estimates |
| Random intercept + slope | ≥ 30–50 | More variance components to estimate |
| Cross-level interaction | ≥ 50 | Need adequate power |
| 3-level (L3 units) | ≥ 20 L3 units | With ≥ 3–5 L2 per L3 |
Rule of thumb: the 30/30 rule — at least 30 clusters with at least 30 obs each for stable estimates (Kreft; see Maas & Hox for simulation evidence)
Master Formula Reference
LMM general
y = Xβ + Zb + ε
b ~ N(0, G)
ε ~ N(0, R)
y ~ N(Xβ, V=ZGZᵀ+R)
GLS estimator
β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y
Var(β̂) = (XᵀV⁻¹X)⁻¹
BLUP
b̂ = GZᵀV⁻¹(y - Xβ̂)
ICC (2-level)
ρ = σ²_u/(σ²_u + σ²_ε)
Design effect
Deff = 1 + (n̄-1)·ρ
Information criteria
AIC = -2ℓ + 2k
BIC = -2ℓ + k·ln(n)
LRT
G² = -2Δℓ ~ χ²_Δk