11.3.1 Split-plot experiments
In a split-plot experiment, different treatments are applied to plots of different sizes.
Each different plot size is associated with its own error variance, so instead of having one error variance (as in all the ANOVA tables up to this point), we have as many error terms as there are different plot sizes.
The analysis is presented as a series of component ANOVA tables, one for each plot size, in a hierarchy from the largest plot size with the lowest replication at the top, down to the smallest plot size with the greatest replication at the bottom.
yields <- read.table("c:\\temp\\splityield.txt", header = TRUE)
attach(yields)
names(yields)
The example refers to a designed field experiment on crop yield with three treatments:
irrigation (with two levels, irrigated or not),
sowing density (with three levels, low, medium and high), and
fertilizer application (with three levels: N, P, or N and P together).
- The largest plots were the four whole fields (block), each of which was split in half, and irrigation was allocated at random to one half of the field.
- Each irrigation plot was split into three, and one of three different seed-sowing densities (low, medium or high) was allocated at random (independently for each level of irrigation and each block).
- Finally, each density plot was divided into three, and one of three fertilizer nutrient treatments (N, P, or N and P together) was allocated at random.
The issue with split-plot experiments is pseudoreplication.
Think about the irrigation experiment. There were four blocks, each split in half, with one half irrigated and the other as a control. The dataframe for an analysis of this experiment should therefore contain just 8 rows (not 72 rows as in the present case).
There would be seven degrees of freedom in total: three for blocks, one for irrigation, and just 7 − 3 − 1 = 3 d.f. for error.
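The same bookkeeping can be extended to every plot size. The sketch below rebuilds the layout described above with expand.grid and counts the plots and degrees of freedom in each stratum. The level labels (A to D, control/irrigated, and so on) are placeholders chosen for illustration and are not taken from the data file.

```r
# Rebuild the design: 4 blocks x 2 irrigation x 3 density x 3 fertilizer = 72 rows.
# All level labels below are illustrative placeholders.
design <- expand.grid(
  fertilizer = c("N", "NP", "P"),
  density    = c("low", "medium", "high"),
  irrigation = c("control", "irrigated"),
  block      = c("A", "B", "C", "D")
)

# Number of distinct plots at each level of the hierarchy (largest plots first).
plots <- c(
  block      = nrow(unique(design["block"])),
  irrigation = nrow(unique(design[c("block", "irrigation")])),
  density    = nrow(unique(design[c("block", "irrigation", "density")])),
  fertilizer = nrow(design)
)

# d.f. available in a stratum = plots at this level minus plots at the level above.
stratum_df <- diff(c(1, unname(plots)))
names(stratum_df) <- names(plots)

# d.f. used by treatment terms (main effects and interactions) tested in each stratum.
treatment_df <- c(
  block      = 0,
  irrigation = 1,                  # irrigation
  density    = 2 + 2,              # density, irrigation:density
  fertilizer = 2 + 2 + 4 + 4       # fertilizer and its interactions
)

# What is left over in each stratum is the error d.f. for that plot size.
error_df <- stratum_df - treatment_df
print(rbind(plots, stratum_df, treatment_df, error_df))
```

The irrigation column reproduces the arithmetic in the text: 8 half-fields, 4 d.f. within blocks, 1 for irrigation and 3 for error.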
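# Split-plot ANOVA: Error() names the nested plot sizes; the smallest plots form the residual stratum
model <- aov(yield ~ irrigation * density * fertilizer + Error(block/irrigation/density))
summary(model)
# Interaction plots of mean yield
interaction.plot(fertilizer, irrigation, yield)
interaction.plot(density, irrigation, yield)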
When there are one or more missing values (NA), factors have effects in more than one stratum and the same main effect turns up in more than one ANOVA table.
In such a case, use lme or lmer rather than aov: the output of aov is not to be trusted under these circumstances.
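As a hedged illustration of the lme alternative, the sketch below uses the Oats split-plot data that ships with the nlme package, because the splityield.txt file is not available here. The dataset, the choice of which value to blank out, and the variable names are therefore assumptions for illustration only, not the analysis of the yield experiment.

```r
library(nlme)  # recommended package, shipped with standard R installations

# Oats: split-plot trial bundled with nlme
# (6 blocks, 3 varieties on whole plots, 4 nitrogen levels on subplots).
oats <- as.data.frame(Oats)
oats$nitro <- factor(oats$nitro)

# Blank out one observation to mimic an unbalanced data set (arbitrary choice of row).
oats$yield[5] <- NA

# Nested random intercepts play the role of Error(Block/Variety) in aov.
# lme() stops on NA by default (na.action = na.fail), so incomplete rows
# are dropped explicitly here.
fit <- lme(yield ~ nitro * Variety,
           random = ~ 1 | Block/Variety,
           data = oats,
           na.action = na.omit)

anova(fit)    # a single table of tests for the fixed effects
VarCorr(fit)  # variance components: blocks, whole plots within blocks, residual
```

For the yield experiment itself, the analogous random term would be ~ 1 | block/irrigation/density, mirroring the Error term used with aov.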
11.3.2 Mixed-effects models
Mixed-effects models are so called because the explanatory variables are a mixture of fixed effects and random effects:
- fixed effects influence only the mean of y;
- random effects influence only the variance of y.
A random effect should be thought of as coming from a population of effects: the existence of this population
is an extra assumption.
We speak of prediction of random effects, rather than estimation:
we estimate fixed effects from data, but we intend to make predictions about the population from which our random effects were sampled.
Fixed effects are unknown constants to be estimated from the data.
Random effects govern the variance–covariance structure of the response variable.
The fixed effects are often experimental treatments that were applied under our direction, and the random effects are either categorical or continuous variables that are distinguished by the fact that we are typically not interested in the parameter values, but only in the variance they explain.
One or more of the explanatory variables might represent grouping in time or in space.
Random effects that come from the same group will be correlated, and this contravenes one of the fundamental assumptions of standard statistical models: independence of errors. Mixed-effects models take care of this non-independence of errors by modelling the covariance structure introduced by the grouping of the data.
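To see how a shared random effect induces correlation, consider the simplest case of a random intercept for each group. The sketch below builds the implied covariance matrix for the observations within one group. The two standard deviations and the group size are hypothetical values chosen only for illustration.

```r
# Hypothetical values, for illustration only.
sigma_b <- 2   # SD of the random group effect (between groups)
sigma_e <- 1   # SD of the residual error (within groups)
n       <- 3   # observations per group

# With y_ij = mu + b_i + e_ij, two observations in the same group share b_i,
# so their covariance is sigma_b^2; each variance is sigma_b^2 + sigma_e^2.
V <- sigma_b^2 * matrix(1, n, n) + sigma_e^2 * diag(n)

# Converting to correlations gives the intraclass correlation off the diagonal.
R   <- cov2cor(V)
icc <- sigma_b^2 / (sigma_b^2 + sigma_e^2)

print(V)
print(R)
print(icc)
```

Observations from different groups share no random effect, so their covariance is zero. The full covariance matrix is block-diagonal, with one block like V per group.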
Instead of estimating a mean for every single factor level, the random-effects model estimates the distribution of the means (usually as the standard deviation of the differences of the factor-level means around an overall mean).
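The contrast between the two views can be sketched with the Rail data bundled with nlme (travel times of ultrasonic waves measured three times on each of six rails). This dataset is not part of the example above. It is swapped in here only because it is small and readily available.

```r
library(nlme)

rail <- as.data.frame(Rail)

# Fixed-effects view: a separate mean for each of the six rails.
fixed_fit <- lm(travel ~ Rail, data = rail)
length(coef(fixed_fit))    # one coefficient per rail

# Random-effects view: one overall mean plus a standard deviation
# describing how rail means scatter around it.
random_fit <- lme(travel ~ 1, random = ~ 1 | Rail, data = rail)
fixef(random_fit)          # the single overall mean
VarCorr(random_fit)        # between-rail SD and residual SD
```

The fixed-effects fit spends a parameter on every rail, whereas the random-effects fit summarises the rails by a single variance component.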
Mixed-effects models are particularly useful in cases where there
is temporal pseudoreplication (repeated measurements) and/or spatial pseudoreplication (e.g. nested designs
or split-plot experiments). These models can allow for:
- spatial autocorrelation between neighbours;
- temporal autocorrelation across repeated measures on the same individuals;
- differences in the mean response between blocks in a field experiment;
- differences between subjects in a medical trial involving repeated measures.
We do not want to waste precious degrees of freedom in estimating parameters for each of the separate levels of the categorical random variables. On the other hand, we do want to make use of all the measurements we have taken, but because of the pseudoreplication we want to take account of both the
- correlation structure, used to model within-group correlation associated with temporal and spatial dependencies, using the correlation argument, and
- variance function, used to model non-constant variance in the within-group errors, using the weights argument.
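A minimal sketch of how the correlation and weights arguments are passed to lme follows. It uses two datasets bundled with nlme (Ovary and BodyWeight) rather than the yield experiment, and the particular choices of corAR1 and varPower are illustrative assumptions, not recommendations for any specific analysis.

```r
library(nlme)

# (1) Correlation structure: first-order autoregressive errors within each
#     mare for repeated follicle counts (Ovary is grouped by Mare).
fit_ar1 <- lme(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
               data = Ovary,
               random = pdDiag(~ sin(2 * pi * Time)),
               correlation = corAR1())

# Estimated AR(1) parameter on its natural scale (between -1 and 1).
phi <- coef(fit_ar1$modelStruct$corStruct, unconstrained = FALSE)
print(phi)

# (2) Variance function: within-group error variance modelled as a power of
#     the fitted values for rat body weights (BodyWeight is grouped by Rat).
fit_var <- lme(weight ~ Time * Diet,
               data = BodyWeight,
               random = ~ Time,
               weights = varPower())

# Estimated power parameter of the variance function.
power <- coef(fit_var$modelStruct$varStruct, unconstrained = FALSE)
print(power)
```

Both arguments can be supplied in the same call when a model needs within-group correlation and non-constant variance together.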