explanatory variables: he called these fixed effects and random effects. It will take a good deal of practice
before you are confident in deciding whether a particular categorical explanatory variable should be treated
as a fixed effect or a random effect, but in essence:
fixed effects influence only the mean of y;
random effects influence only the variance of y.
The important point is that because the levels of a random effect are drawn from a large population, there is not much
point in concentrating on estimating the means of our small subset of factor levels, and no point at all in comparing
individual pairs of means for different factor levels. It is much better to recognize them for what they are, random
samples from a much larger population, and to concentrate on their variance. This is the added variation
caused by differences between the levels of the random effects.
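Take the simplest case: a balanced one-way layout y_ij = μ + b_i + e_ij with a single random effect. Here the added variation can be estimated by the classical ANOVA (method-of-moments) route. Because E[MS_within] = σ² and E[MS_between] = σ² + nσ²_b, the between-level variance is estimated as (MS_between - MS_within)/n. The Python sketch below illustrates only that arithmetic. It is not how lme4 or nlme fit models, since they use REML or maximum likelihood. The function name and the toy data are hypothetical.

```python
from statistics import mean


def variance_components(groups):
    """ANOVA (method-of-moments) estimates for a balanced one-way
    random-effects layout y_ij = mu + b_i + e_ij.

    groups: list of equal-length lists, one per level of the random effect.
    Returns (sigma2_between, sigma2_within, percent_between).
    """
    a = len(groups)          # number of levels of the random effect
    n = len(groups[0])       # replicates per level
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    grand = mean(y for g in groups for y in g)
    means = [mean(g) for g in groups]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (n - 1))
    # E[MS_within] = sigma2 and E[MS_between] = sigma2 + n * sigma2_b
    sigma2_b = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates at zero
    sigma2 = ms_within
    pct = 100 * sigma2_b / (sigma2_b + sigma2)
    return sigma2_b, sigma2, pct


# Hypothetical toy data: three levels, three replicates each.
sb, s, pct = variance_components([[10, 12, 11], [14, 15, 16], [20, 19, 21]])
print(f"{sb:.2f} {s:.2f} {pct:.1f}%")  # 20.00 1.00 95.2%
```

Negative moment estimates are truncated at zero here, which is a common convention. REML fits avoid the problem by constraining the variance to be non-negative.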
Variance components analysis is all about estimating the size of this variance, and working out its percentage
contribution to the overall variation. There are five fundamental assumptions of linear mixed-effects models:
Within-group errors are independent with mean zero and variance σ².
Within-group errors are independent of the random effects.
The random effects are normally distributed with mean zero and covariance matrix Ψ.
The random effects are independent in different groups.
The covariance matrix Ψ does not depend on the group.
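The five assumptions correspond to the usual single-level statement of a linear mixed-effects model. It is written here with M groups, response vector y_i for group i, fixed-effects design matrix X_i with coefficients β, and random-effects design matrix Z_i with group-specific effects b_i. Ψ stands for the random-effects covariance matrix. These symbols are labels chosen for illustration. The last line shows why fixed effects act on the mean and random effects on the variance.

```latex
\begin{aligned}
y_i &= X_i\beta + Z_i b_i + \varepsilon_i, \qquad i = 1, \dots, M,\\
b_i &\sim \mathcal{N}(0, \Psi), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2 I), \qquad b_i \perp \varepsilon_i, \qquad b_i \perp b_j \ (i \neq j),\\
\operatorname{E}[y_i] &= X_i\beta, \qquad \operatorname{Var}(y_i) = Z_i \Psi Z_i^{\top} + \sigma^2 I.
\end{aligned}
```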
The tricks with mixed-effects models are:
learning which variables are random effects;
specifying the fixed and random effects in the model formula;
getting the nesting structure of the random effects right;
remembering to load the package with library(lme4) or library(nlme) at the outset.
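A common way to get the nesting wrong is to reuse labels. Suppose families are numbered 1, 2, 3 afresh within every parish. A model told only about "family" will then treat family 1 of one parish and family 1 of another as the same level. The sketch below uses a hypothetical helper and invented field names. It makes each label carry all of its enclosing levels, which is the same idea as writing the random-effects term as parish/family in lme4 or nlme, where it expands to parish plus parish:family.

```python
def nested_labels(records, levels):
    """Return, for each record, labels that include every enclosing level.

    records: list of dicts holding the raw (possibly reused) codes.
    levels:  factor names ordered from outermost to innermost.
    """
    out = []
    for rec in records:
        path, labels = [], {}
        for level in levels:
            path.append(str(rec[level]))
            labels[level] = "/".join(path)  # e.g. "A/2/1" = county A, parish 2, family 1
        out.append(labels)
    return out


# Hypothetical data: families numbered from 1 again inside every parish.
records = [
    {"county": "A", "parish": "1", "family": "1"},
    {"county": "A", "parish": "2", "family": "1"},
    {"county": "B", "parish": "1", "family": "1"},
]
labels = nested_labels(records, ["county", "parish", "family"])
print(len({r["family"] for r in records}))  # raw codes: 1 apparent family
print(len({l["family"] for l in labels}))   # nested labels: 3 distinct families
```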
The issues these models address fall into two broad categories: questions about experimental design and the management of
experimental error (e.g. where does most of the variation occur, and where would increased replication
be most profitable?); and questions about hierarchical structure, and the relative magnitude of variation at
different levels within the hierarchy (e.g. studies on the genetics of individuals within families, families
within parishes, and parishes within counties, to discover the relative importance of genetic and phenotypic
variation).
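The question of where increased replication is most profitable has a simple answer for a balanced two-level design with a groups and n observations per group. In that case the variance of the grand mean is σ²_b/a + σ²/(an). Only the first term shrinks when more groups are added, and extra replication within groups can never push the variance below σ²_b/a. The sketch below compares three ways of spending the same 60 observations. The variance components are hypothetical and sampling costs are ignored.

```python
def var_grand_mean(sigma2_between, sigma2_within, a, n):
    """Sampling variance of the grand mean for a balanced design with
    a groups (levels of the random effect) and n observations per group."""
    return sigma2_between / a + sigma2_within / (a * n)


# Hypothetical variance components, chosen only for illustration.
SB2, SW2 = 4.0, 16.0

# Three ways of spending 60 observations.
for a, n in [(6, 10), (20, 3), (60, 1)]:
    print(f"a={a:2d} n={n:2d}  Var(mean)={var_grand_mean(SB2, SW2, a, n):.4f}")

# However large n becomes, Var(mean) cannot fall below SB2 / a.
print(f"floor with a=6: {SB2 / 6:.4f}")
```

In practice, sampling a new group usually costs more than taking another measurement within a group. A real allocation would weigh these variances against those costs.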
Most ANOVA models are based on the assumption that there is a single error term. But in hierarchical
studies and nested experiments, where the data are gathered at two or more different spatial scales, there
is a different error variance for each different spatial scale. There are two reasonably clear-cut sets of
circumstances where your first choice would be to use a linear mixed-effects model: you want to do variance
components analysis because all your explanatory variables are categorical random effects and you do not
have any fixed effects; or you do have fixed effects, but you also have pseudoreplication of one sort or another
(e.g. temporal pseudoreplication resulting from repeated measurements on the same individuals; see p. 699).
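The price of pseudoreplication can be put in numbers. Suppose a individuals are each measured n times. Treating all an values as independent then understates the variance of their mean by a factor 1 + (n - 1)ρ, where ρ = σ²_b/(σ²_b + σ²) is the intraclass correlation. This standard result, often called the design effect, assumes a balanced design with one random effect. The helper names below are hypothetical.

```python
def design_effect(sigma2_between, sigma2_within, n):
    """Variance inflation from treating n repeated measurements per
    individual as if they were independent observations."""
    icc = sigma2_between / (sigma2_between + sigma2_within)  # intraclass correlation
    return 1 + (n - 1) * icc


def effective_sample_size(sigma2_between, sigma2_within, a, n):
    """Number of independent observations that the a*n correlated
    measurements are worth when estimating the overall mean."""
    return a * n / design_effect(sigma2_between, sigma2_within, n)


# Hypothetical values: 10 individuals, 5 measurements each.
print(design_effect(3.0, 1.0, 5))                # 4.0
print(effective_sample_size(3.0, 1.0, 10, 5))    # 12.5
```

With ρ = 0.75, the 50 measurements carry about as much information about the mean as 12.5 independent ones. This is far fewer than the 50 degrees of freedom a naive analysis would assume.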
On the question of whether to use a model with mixed effects or just a plain old linear model, Douglas Bates
wrote in the R help archive: ‘I would recommend the likelihood ratio test against a linear model fit by lm.
The p-value returned from this test will be conservative because you are testing on the boundary of the
parameter space.
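Bates's boundary remark can be made concrete. The sketch below takes the two maximized log-likelihoods as plain numbers. For instance, these could come from logLik() in R for the lm fit and for an lme4 fit with REML = FALSE. Obtaining them is assumed, not shown, and both must be computed on the same (maximum likelihood) basis. For a single random-intercept variance, the null value σ²_b = 0 lies on the edge of the parameter space. The usual result is that the statistic then follows a 50:50 mixture of χ²₀ and χ²₁, so the plain χ²₁ p-value is roughly twice the correct one, which is why it is conservative. The function name is hypothetical.

```python
import math


def lrt_random_effect(loglik_mixed, loglik_lm):
    """Likelihood ratio test of one random-intercept variance against zero.

    loglik_mixed: maximized log-likelihood of the mixed model (ML fit)
    loglik_lm:    maximized log-likelihood of the linear model with the
                  same fixed effects
    Returns (statistic, naive_p, boundary_p).
    """
    stat = max(2.0 * (loglik_mixed - loglik_lm), 0.0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    naive_p = math.erfc(math.sqrt(stat / 2.0))
    # 50:50 mixture of chi2_0 and chi2_1: halve the p-value when stat > 0
    boundary_p = 0.5 * naive_p if stat > 0 else 1.0
    return stat, naive_p, boundary_p


# Hypothetical log-likelihoods, for illustration only.
print(lrt_random_effect(-210.4, -213.9))
```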