A factorial experiment has two or more factors, each with two or more levels, plus replication for each combination of factor levels.
This means that we can investigate statistical interactions, in which the response
to one factor depends on the level of another factor.
Our example comes from a farm-scale trial of animal diets.
There are two factors: diet and supplement.
Diet is a factor with three levels: barley, oats and wheat.
Supplement is a factor with four levels: agrimore, control, supergain and supersupp.
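The layout of such a design can be sketched with expand.grid. This is only an illustration: the number of replicates per combination (4) is an assumption, and the names design, n_reps and cell_counts are hypothetical.

```r
# Hypothetical layout: 3 diets x 4 supplements, with an ASSUMED 4 replicates per cell
diets <- c("barley", "oats", "wheat")
supplements <- c("agrimore", "control", "supergain", "supersupp")
n_reps <- 4  # assumption for illustration only

design <- expand.grid(replicate = seq_len(n_reps),
                      supplement = supplements,
                      diet = diets)

# Every diet-by-supplement combination appears n_reps times
cell_counts <- table(design$diet, design$supplement)
print(cell_counts)
```

Equal counts in every cell are what make it possible to estimate the interaction as well as the two main effects.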
The response variable is weight gain after 6 weeks.
weights <- read.table("c:\\temp\\growth.txt",header=T)
attach(weights)
barplot(tapply(gain,list(diet,supplement),mean),
beside=T,ylim=c(0,30),col=c("orange","yellow","cornsilk"))
labs <- c("Barley","Oats","Wheat")
legend(locator(1),labs,fill= c("orange","yellow","cornsilk"))
tapply(gain,list(diet,supplement),mean)
Note that the second factor in the list (supplement) appears as groups of bars from left to right in
alphabetical order by factor level, from agrimore to supersupp.
The first factor (diet) appears as three levels within each group of bars: orange = barley, yellow = oats, cornsilk = wheat, again in alphabetical order by factor level.
We should really add a key to explain the levels of diet. Use locator(1) to find the coordinates for the top left corner of the box around the legend.
You need to increase the default scale on the y axis to make enough room for the legend box.
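Where clicking with locator(1) is not convenient (in a script, for example), a keyword position can be passed to legend instead. The sketch below is one possible alternative, not the approach used above. The matrix of means is made up for illustration, and pdf(NULL) is used only so that the code runs without a display; omit it and dev.off() to draw on screen.

```r
# Made-up cell means (rows = diet, columns = supplement); NOT the trial data
means <- matrix(c(26, 23, 19,
                  23, 20, 17,
                  22, 19, 17,
                  25, 22, 19),
                nrow = 3,
                dimnames = list(c("barley", "oats", "wheat"),
                                c("agrimore", "control", "supergain", "supersupp")))
cols <- c("orange", "yellow", "cornsilk")

pdf(NULL)  # null device: nothing is written to disk
# ylim is set above the tallest bar to leave room for the key
mids <- barplot(means, beside = TRUE, ylim = c(0, 30), col = cols)
# A keyword position replaces the mouse click that locator(1) waits for
legend("top", legend = c("Barley", "Oats", "Wheat"), fill = cols,
       horiz = TRUE, bty = "n")
dev.off()
```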
Now we use aov or lm to fit a factorial analysis of variance (the choice affects only whether we
get an ANOVA table or a list of parameter estimates as the default output from summary).
model <- aov(gain~diet*supplement)
summary(model)
summary.lm(model)
There are 12 estimated parameters (the number of rows in the coefficients table printed by summary.lm):
an intercept, five main-effect differences (two for diet and three for supplement) and six interactions.
The parameter labelled Intercept is the mean with both factors set to their first level in the alphabet (diet=barley and supplement=agrimore).
All other rows are differences between means.
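The meaning of these rows can be checked on simulated data. The following is a minimal sketch, assuming a balanced layout with 4 replicates per cell and random gains. The data frame sim and all the numbers in it are hypothetical, so the output says nothing about the real trial.

```r
set.seed(1)  # simulated data, NOT the farm trial
sim <- expand.grid(rep = 1:4,
                   supplement = c("agrimore", "control", "supergain", "supersupp"),
                   diet = c("barley", "oats", "wheat"))
sim$gain <- rnorm(nrow(sim), mean = 20, sd = 2)

fit_aov <- aov(gain ~ diet * supplement, data = sim)
fit_lm  <- lm(gain ~ diet * supplement, data = sim)

# Same fit either way: 1 intercept + 5 main-effect differences + 6 interactions
n_par <- length(coef(fit_lm))
same_fit <- isTRUE(all.equal(coef(fit_aov), coef(fit_lm)))

# Under treatment contrasts the intercept is the barley/agrimore cell mean
cell_means <- tapply(sim$gain, list(sim$diet, sim$supplement), mean)
intercept <- unname(coef(fit_lm)["(Intercept)"])
baseline_mean <- cell_means["barley", "agrimore"]

# The "dietoats" row is the oats-minus-barley difference at the agrimore level
oats_diff <- unname(coef(fit_lm)["dietoats"])
print(c(intercept = intercept, baseline_mean = baseline_mean, oats_diff = oats_diff))
```

Because aov and lm give identical coefficients, summary.lm can be applied to an aov object to obtain the table of parameter estimates.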
The summary.lm output re-emphasizes that none of the interaction terms is even
close to significant, but it suggests that the minimal adequate model will require five parameters:
an intercept,
a difference due to oats,
a difference due to wheat,
a difference due to control and
a difference due to supergain (these are the five rows with significance stars).
This draws attention to the main shortcoming of using treatment contrasts as the default.
If you look carefully at the table, you will see that the effect sizes of two of the supplements, control and supergain, are not significantly different from one another.
You need lots of practice at doing t tests in your head, to be able to do this quickly.
Ignoring the signs (because the signs are negative for both of them), we have 3.05 vs. 3.88, a difference of 0.83. But look at the associated standard errors (both 0.927);
the difference is less than 1 standard error of a difference between two means.
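The mental arithmetic can be written out as follows. The three numbers are those quoted in the text. The threshold of 2 is the usual rule of thumb for a t test at the 5% level and is an assumption here, as are the variable names.

```r
# Values read from the text: effect sizes (signs dropped) and their standard error
control_effect   <- 3.05
supergain_effect <- 3.88
se_diff          <- 0.927  # standard error quoted for both rows

difference <- supergain_effect - control_effect
t_value <- difference / se_diff

# Rule of thumb (an assumption): |t| of roughly 2 or more is needed for significance
looks_significant <- abs(t_value) > 2
print(c(difference = difference, t = t_value))
```

The ratio comes out below 1, so the two supplements cannot be distinguished from one another.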
The rows get starred in the significance column because treatment contrasts
compare all the main effects in the rows with the intercept (where each factor is set to its first level in the alphabet, namely agrimore and barley in this case). When, as here, several factor levels are different from the intercept, but not different from one another, they all get significance stars.
This means that you cannot count up the number of rows with stars in order to determine the number of significantly different factor levels.
The disadvantage of the ANOVA table is
that it does not show us the effect sizes, and does not allow us to work out how many levels of each of the
two factors are significantly different.
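# Refit without the non-significant interaction term
model <- aov(gain~diet+supplement)
summary.lm(model)
# Copy supplement into a new factor so that its levels can be merged
supp2 <- factor(supplement)
levels(supp2)
# Levels 1 and 4 (agrimore and supersupp) become "best"
levels(supp2)[c(1,4)] <- "best"
# The levels are now best, control, supergain, so 2 and 3 become "worst"
levels(supp2)[c(2,3)] <- "worst"
levels(supp2)
# Compare the simpler two-level model with the four-level model
model2 <- aov(gain~diet+supp2)
anova(model,model2)
summary.lm(model2)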
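The level-merging step can be tried on a toy factor to see why the second assignment uses positions 2 and 3. This is a sketch: the five observations in supp are made up, and only the level names are taken from the trial.

```r
# Toy factor with the same four level names (the observations are made up)
supp <- factor(c("supersupp", "control", "agrimore", "supergain", "control"))
levels(supp)   # alphabetical: agrimore, control, supergain, supersupp

supp2 <- supp
levels(supp2)[c(1, 4)] <- "best"    # agrimore and supersupp now share one label
# After that merge the levels are best, control, supergain,
# so control and supergain sit at positions 2 and 3
levels(supp2)[c(2, 3)] <- "worst"

merged <- as.character(supp2)
print(data.frame(original = supp, merged = merged))
```

Assigning the same label to two positions collapses them into a single level, so the factor has one fewer level after each assignment.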