Lecture 12
Factorial ANOVA
Our initial
discussion of Analysis of Variance concerned the comparison of two or more
groups. Those groups can be considered different "levels" of one
independent variable. If we are comparing women to men, there are two levels of
gender represented. If the study is an experiment that compares a drug's effect
to a control group with no drug, there is one independent variable with two
levels: drug and no drug. If we add a third comparison group, say a placebo, we
still have one independent variable. We just have three different
representations of the independent variable. The drug, no drug, and the placebo
are considered different "levels" of the independent variable. A
factorial design (and hence, a factorial analysis) introduces another
independent variable to be studied simultaneously with the first independent
variable, and each independent variable can have two or more levels. Each of
the independent variables is now called a factor.
Example
We previously discussed an experiment that examined the effect of a violent
incident on memory for details of an eyewitness account. In that study (based
loosely on studies by Elizabeth Loftus and colleagues), two randomly assigned
groups of participants are shown nearly identical films of a bank robbery. In
the film, the robber is shown getting money from the teller, leaving the bank,
and running off down the street. The two versions differ in one brief clip: at
the point where the robber runs out of the bank, one group sees a violent
incident in which the robber turns and shoots a police officer. In the initial
experiment we discussed, the participants
who saw the version with the violent incident had poorer memory about the
characteristics of the robber. Let's assume that the dependent variable is a
memory test with 10 questions about the robber's identifying characteristics
(e.g., color of shirt, color of pants, approximate height, approximate age, race, etc.).
The cells below
represent rough approximations of the mean scores on the memory test in the two
groups.
Study 1
| | No Violence | Violence |
| --- | --- | --- |
| Memory performance | High | Low |
Let's say we
wanted to turn this study into a practical public program of some sort that
attempts to improve eyewitness memory for details. Because we could not
possibly train everyone who might be a potential witness to a crime, we might
pick out certain people who are more likely to witness a crime (e.g.,
convenience store owners, police officers). Because police officers frequently
witness crimes and arrests, and often have to testify about those incidents,
we might try to improve their memory for criminal events. We could
develop a training course that attempted to improve the memory of police
officers for crime details by informing them about various potential fallacies
of eyewitness accounts. We then might conduct a second study that evaluates the
effectiveness of the training. In that study, we randomly assign police
officers to a training condition or to a condition in which they receive no
training. At the end of the training period, the police officers are shown a
film of a crime and given a memory performance test. If the training was
effective, they should have better memory of the incident.
Study 2
| | No Training | Training |
| --- | --- | --- |
| Memory performance | Low | High |
Instead of
conducting two separate experiments, we could combine them both. This would be
beneficial for several reasons. First, it would be more economical; it is
cheaper and easier to run one experiment instead of two. Second, we could look
for any combined effects of the two independent variables. For instance, it
might be that the training program would be effective for improving memory, but
that the benefits of the training do not extend to violent incidents. When a
violent incident occurs, trained police officers are just as likely to miss the
details, because the emotional event interferes with memory storage for them
also. In other words, although memory is generally improved by the training,
the circumstances under which it is effective are limited. If this were the
case, we would have the following outcome of our combined study.
| | No Training | Training |
| --- | --- | --- |
| No Violence | Med | High |
| Violence | Low | Low |
In this
hypothetical outcome, the training has an effect when there is not a violent
incident. However, if there is violence in the film clip, training makes no
difference in memory performance. In this sense, the effect of one independent
variable (training) depends on the level of the other independent variable
(violence). This is an example of an interaction. An interaction implies
that the two independent variables combine to produce an effect on the
dependent variable beyond the sum of their separate effects. Mathematically,
their effects are multiplicative rather than additive. For contrast, take the
following hypothetical outcome, in which there is no interaction.
| | No Training | Training |
| --- | --- | --- |
| No Violence | Med | High |
| Violence | Low | Med |
Here, there is a
difference between the Training and No Training conditions, but the effect of
training is essentially the same in the Violence and the No Violence
conditions. Training improves memory performance in all groups equally. This
is an example of an additive effect, not a multiplicative one. If the effects
are simply additive, there is no interaction.
Main Effects
A "main effect" concerns the overall effect of one of the independent
variables. In examining the main effect of one independent variable, one
compares its levels after averaging over (collapsing across) the levels of the
other independent variable. In the above experiment, there are two possible main
effects: a main effect for training, and a main effect for violence. They are
"main" because they concern the overall or average effect of one of
the independent variables. Let's look at some hypothetical means that would
match our previous "additive" findings. Our scores on the dependent
variable may range from 0 to 10, so here are some possible means:
| | No Training | Training | (Marginal Means) |
| --- | --- | --- | --- |
| No Violence | 6.6 | 8.6 | (7.6) |
| Violence | 2.0 | 6.0 | (4.0) |
| (Marginal Means) | (4.3) | (7.3) | (5.8) |
In this example,
there is a main effect for the training variable, because, overall, training
improves memory. To see the main effect for training, we have to average over
the two Violence/No Violence cells. We do that separately for the Training and
No Training conditions. What we wind up with is two marginal means: 4.3 overall
for the No Training condition and 7.3 overall for the Training condition.
Because 7.3 is larger than 4.3, there seems to be a difference between the
Training and No Training conditions (of course, we would have to test this for
significance). The difference between these marginal means represents a main
effect. To look at the main effect for the Violence independent variable, we
would compare the Violence and No Violence conditions, combining all the
participants in the No Training and Training conditions. Overall, the Violence
condition had poorer memory than the No Violence condition (4.0 vs. 7.6). So,
in general, there seem to be two main effects in this study, and, because the
effects are additive, there is no interaction between the two independent
variables. Their effects on the dependent variable are separate or independent.
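To make the arithmetic concrete, here is a minimal Python sketch (an
illustration, not part of the original analysis) that computes the marginal
means and grand mean from the four cell means above, assuming equal cell
sizes so that simple averaging is appropriate:

```python
import numpy as np

# Cell means from the table above; rows are No Violence / Violence,
# columns are No Training / Training. Equal cell sizes are assumed.
cells = np.array([[6.6, 8.6],
                  [2.0, 6.0]])

violence_marginals = cells.mean(axis=1)  # [7.6, 4.0] -> violence main effect
training_marginals = cells.mean(axis=0)  # [4.3, 7.3] -> training main effect
grand_mean = cells.mean()                # 5.8
```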
Main effects
and interactions do not depend on one another. That is, there can be one or two
main effects, and an interaction can occur in combination with either main
effect, both main effects, or no main effects. An easy way to tell if there is
an interaction is to plot the four cell means on a graph. This can be done with
a line graph or a bar graph, although I believe the line graph offers a simpler
interpretation. If the two lines are parallel, there is no interaction. If they
are not parallel, there is an interaction.
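As an illustration, here is a minimal matplotlib sketch that draws such a line
graph; the cell means are the hypothetical values from the additive example
above:

```python
import matplotlib.pyplot as plt

# Hypothetical cell means from the additive example above
means = {"No Training": [6.6, 2.0], "Training": [8.6, 6.0]}
levels = ["No Violence", "Violence"]

for label, ys in means.items():
    plt.plot(levels, ys, marker="o", label=label)

plt.ylabel("Memory performance (0-10)")
plt.legend(title="Training")
plt.show()  # roughly parallel lines suggest no interaction
```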
Naturally, for any
of these examples, we do not really know if there is an interaction or a main
effect until we test for them. There will be more on that below, but first I
need to make one more distinction.
Simple Effects
If there is a significant interaction, it indicates that, overall, the effect
of one independent variable depends on the level of the other independent
variable. In other words, the effect of an independent variable is different
for different levels of the other independent variable. The overall test of
the interaction does not indicate which means are different from which other
means. To know that, we need to examine what are called simple effects.
Simple effects (sometimes called "simple main effects") occur
between individual cell means within the levels of one of the independent
variables. In the following example, for instance, there appears to be a
"simple effect for training within the No Violence condition":
training had a substantial effect within the No Violence condition (8.6 vs.
3.0). Because there was essentially no effect for training within the Violence
condition (3.2 vs. 3.1), there was no simple effect for training within
Violence. There also appears to be a simple effect for the violence factor
within Training (8.6 vs. 3.2).
| | No Training | Training |
| --- | --- | --- |
| No Violence | 3.0 | 8.6 |
| Violence | 3.1 | 3.2 |
Any combination of
simple effects may occur when there is an interaction. When a significant
interaction is found, simple effects are usually tested to discover where the
differences lie. With larger designs with more cells, this is even more
critical. One could do t-tests to compare means, but this is problematic
because of alpha inflation.
Testing Simple Effects
There are two main approaches to testing whether particular simple effects are
significant. The first is to conduct a post hoc test, such as the Tukey HSD
test, in which all possible pairs of means are tested. Another approach is to
use a simple effects analysis. This is essentially a focused F-test that
compares all the cells within a level of one of the independent variables.
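As a sketch of the first approach, the Tukey HSD test can be run on all four
cells at once by giving each participant a combined cell label. The data below
are simulated around the hypothetical cell means from the table above; the
cell size (10 per cell) and error standard deviation (1.0) are assumptions
made purely for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
cell_means = {"NoTrain/NoViol": 3.0, "Train/NoViol": 8.6,
              "NoTrain/Viol": 3.1, "Train/Viol": 3.2}

# Simulate 10 participants per cell around the hypothetical means
rows = [(label, mean + rng.normal(0, 1.0))
        for label, mean in cell_means.items() for _ in range(10)]
df = pd.DataFrame(rows, columns=["cell", "score"])

# Tukey HSD tests all six pairwise cell comparisons while
# controlling the familywise error rate (no alpha inflation)
print(pairwise_tukeyhsd(df["score"], df["cell"], alpha=0.05))
```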
Factorial ANOVA
To test for main effects and interactions in a factorial design, we (or, more
likely, the computer) need to conduct a factorial ANOVA. The rationale is
similar to that of the between-groups ANOVA discussed previously: an F-ratio is
formed for each of the main effects and for the interaction. For each main
effect, the variation of the marginal means around the grand mean (the mean of
the total sample) is compared to the variation within groups. For the
interaction, the variation among the cell means that remains after the main
effects are removed is examined. Although I will not go into the details of the
computation, you should understand the general rationale for each component.
| Effect | Full Name | Interpretation | Mean Square | F test |
| --- | --- | --- | --- | --- |
| SSA | Sum of Squares for the A factor | Tests the A main effect. Represents variation of the marginal means for the levels of A around the grand mean. | MSA = SSA/(a-1) | F = MSA/MSE |
| SSB | Sum of Squares for the B factor | Tests the B main effect. Represents variation of the marginal means for the levels of B around the grand mean. | MSB = SSB/(b-1) | F = MSB/MSE |
| SSAB | Sum of Squares for the A X B interaction | Tests the interaction. Is there a multiplicative effect? | MSAB = SSAB/((a-1)(b-1)) | F = MSAB/MSE |
| SSE | Sum of Squares Error | Represents the overall variation of scores within the cells. Analogous to the SSW used in the one-way ANOVA. | MSE = SSE/(ab(n-1)) | |
| SST | Sum of Squares Total | Represents total variation. | | |
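To show how the components in the table fit together, here is a from-scratch
sketch of a balanced two-way ANOVA in Python. An equal cell size n is assumed,
and the data are simulated around the hypothetical cell means used earlier,
so the numbers are for illustration only:

```python
import numpy as np

# scores[i, j, k] is the k-th participant in level i of factor A
# (violence) and level j of factor B (training); balanced design.
rng = np.random.default_rng(0)
a, b, n = 2, 2, 10
cell_means = np.array([[6.6, 8.6],   # No Violence: No Training, Training
                       [2.0, 6.0]])  # Violence:    No Training, Training
scores = cell_means[:, :, None] + rng.normal(0, 1.5, size=(a, b, n))

grand  = scores.mean()
A_marg = scores.mean(axis=(1, 2))    # marginal means for factor A
B_marg = scores.mean(axis=(0, 2))    # marginal means for factor B
cells  = scores.mean(axis=2)         # cell means

SSA  = n * b * np.sum((A_marg - grand) ** 2)
SSB  = n * a * np.sum((B_marg - grand) ** 2)
SSAB = n * np.sum((cells - A_marg[:, None] - B_marg[None, :] + grand) ** 2)
SSE  = np.sum((scores - cells[:, :, None]) ** 2)

MSE  = SSE / (a * b * (n - 1))       # error mean square
F_A  = (SSA / (a - 1)) / MSE         # F for the A main effect
F_B  = (SSB / (b - 1)) / MSE         # F for the B main effect
F_AB = (SSAB / ((a - 1) * (b - 1))) / MSE  # F for the interaction
print(F_A, F_B, F_AB)
```

In practice, you would let software build this table; statsmodels, for
example, can fit an ols model such as score ~ C(A) * C(B) and produce the
same sums of squares for a balanced design via anova_lm.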
Planned Contrasts
Another approach to interactions is to skip the omnibus interaction test and
conduct a planned contrast. A planned contrast (sometimes called an "a
priori" contrast) compares particular cells within a design. It is an F-test
that uses the full MSE as the denominator of the F-ratio, but instead of
testing main effects and interactions, a particular pair (or weighted
combination) of means is compared. Note that this is different from a t-test
between groups, because the t-test uses a standard error estimate and d.f.
based only on the two cells (groups) concerned. The planned contrast uses the
full sample for estimating the error variation.
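Here is a minimal sketch of that computation for a balanced design, using the
hypothetical cell means from the simple-effects example; the cell size, MSE,
and error d.f. are assumed values that would come from the full factorial
ANOVA:

```python
import numpy as np
from scipy import stats

# Cell order: NoTrain/NoViol, Train/NoViol, NoTrain/Viol, Train/Viol
cell_means = np.array([3.0, 8.6, 3.1, 3.2])
weights    = np.array([-1.0, 1.0, 0.0, 0.0])  # must sum to zero
# (weights of [1, -1, -1, 1] would instead test a cross-over
#  pattern; see the interaction contrasts discussed below)

n, MSE, df_error = 10, 1.0, 36  # assumed: cell size, error mean square, error d.f.

psi = weights @ cell_means                    # value of the contrast
SS_contrast = n * psi**2 / np.sum(weights**2)
F = SS_contrast / MSE                         # F with (1, df_error) d.f.
print(F, stats.f.sf(F, 1, df_error))          # F-ratio and p-value
```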
From a
statistician's point of view (and yours too), planned contrasts are better than
separate t-tests. The biggest difference is that the power is greater for a
planned contrast. If you think about it, the use of a larger d.f. and a larger
sample size to estimate the standard error should lead to greater power.
Planned contrasts can also be formulated to test for a particular interaction
pattern (e.g., a cross-over or X pattern), and these are referred to as
interaction contrasts. Interaction contrasts are not used very often, but they
are certainly legitimate. They are also more powerful than the omnibus
factorial ANOVA, because they require the researcher to predict a particular
pattern of results.
More Complex Factorial Designs
Our example above is of a 2 X 2 factorial design. This terminology refers to
two levels of the first factor and two levels of the second factor. We can also
have more complex designs, such as a 2 X 3 design. This design still has two
independent variables, but there are 2 levels of the first factor and 3 levels
of the second factor. A 4 X 2 design has four levels of the first independent
variable and 2 levels of the second independent variable. A 2 X 2 X 3 design
has 3 factors: 2 levels of the first independent variable, 2 levels of the
second independent variable, and 3 levels of the third independent variable.
Multiplying the numbers of levels gives the number of cells in the design, so a
2 X 2 X 3 design has 12 cells.
Within-Subjects and Mixed Designs
As with the one-way ANOVA, one can conduct a within-subjects test when there
are repeated measures or matched cases. Either or both of the two factors can
be within-subjects factors. If one factor is between-subjects and one is
within-subjects, the design is referred to as a mixed design. These designs
are sometimes also described as "nested," because subjects are nested within
the levels of the between-subjects factor. Of course, any degree of complexity
is possible, and there can be as many factors and levels as desired.
The computations
and details of the within-subjects factorial are a bit more complex, so we will
not be able to cover them in this class. If you ever need to analyze a study
with these complex features, there are a couple of excellent references on
ANOVA that might be consulted:
Keppel, G. (1991). Design and analysis: A researcher's handbook. Englewood
Cliffs, NJ: Prentice Hall.
Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in
experimental design. New York: McGraw-Hill.