Lecture 12
Factorial ANOVA

Our initial discussion of Analysis of Variance concerned the comparison of two or more groups. Those groups can be considered different "levels" of one independent variable. If we are comparing women to men, there are two levels of gender represented. If the study is an experiment that compares a drug's effect to a control group with no drug, there is one independent variable with two levels: drug and no drug. If we add a third comparison group, say a placebo, we still have one independent variable; it simply has three levels. The drug, no drug, and placebo conditions are considered different "levels" of the independent variable. A factorial design (and hence, a factorial analysis) introduces another independent variable to be studied simultaneously with the first, and each independent variable can have two or more levels. Each independent variable is now called a factor.

Example
We previously discussed an experiment that examined the effect of a violent incident on memory for details of an eyewitness account. In that study (based loosely on studies by Elizabeth Loftus and colleagues), two randomly assigned groups of participants are shown a film of a bank robbery. In the film, the robber is shown getting money from the teller, leaving the bank, and running off down the street. The two versions of the film are identical except for one very brief clip: in one group's version, as the robber runs out of the bank, the robber turns and shoots a police officer. In the initial experiment we discussed, the participants who saw the version with the violent incident had poorer memory for the characteristics of the robber. Let's assume that the dependent variable is a memory test with 10 questions about the robber's identifying characteristics (e.g., color of shirt, color of pants, approximate height, approximate age, race, etc.).

The cells below represent rough approximations of the mean scores on the memory test in the two groups.

Study 1

                      No Violence   Violence
Memory performance    High          Low

Let's say we wanted to turn this study into a practical public program of some sort that attempts to improve eyewitness memory for details. Because we could not possibly train everyone who might be a potential witness to a crime, we might pick out certain people who are more likely to witness a crime (e.g., convenience store owners, police officers). Because police officers frequently witness crimes and arrests, and they often have to testify about these incidents, we might try to improve their memory for criminal events. We could develop a training course that attempts to improve police officers' memory for crime details by informing them about common sources of error in eyewitness accounts. We might then conduct a second study that evaluates the effectiveness of the training. In that study, we randomly assign police officers to a training condition or to a condition in which they receive no training. At the end of the training period, the police officers are shown a film of a crime and given a memory performance test. If the training is effective, the trained officers should have better memory of the incident than the untrained officers.

Study 2

                      No Training   Training
Memory performance    Low           High

Instead of conducting two separate experiments, we could combine them. This would be beneficial for several reasons. First, it would be more economical; it is cheaper and easier to run one experiment instead of two. Second, we could look for any combined effects of the two independent variables. For instance, it might be that the training program is effective for improving memory, but that the benefits of the training do not extend to violent incidents. When a violent incident occurs, trained police officers are just as likely to miss the details, because the emotional event interferes with memory storage for them as well. In other words, although memory is generally improved by the training, the circumstances under which it is effective are limited. If this were the case, we would have the following outcome of our combined study.

                No Training   Training
No Violence     Med           High
Violence        Low           Low

In this hypothetical outcome, the training has an effect when there is no violent incident. However, if there is violence in the film clip, training makes no difference in memory performance. In this sense, the effect of one independent variable (training) depends on the level of the other independent variable (violence). This is an example of an interaction. An interaction implies that the two independent variables combine to produce an effect on the dependent variable that differs from the sum of their separate effects. Mathematically, their effects are not simply additive; they are often described as multiplicative rather than additive. Contrast this with the following hypothetical outcome.

                No Training   Training
No Violence     Med           High
Violence        Low           Med

Here, training improves memory performance, and the effect of training is essentially the same in both the Violence and No Violence conditions: in each, training raises performance by one step (Med to High, or Low to Med). This is an example of an additive effect, not a multiplicative effect. If the effect is simply additive, there is no interaction.
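One way to make the distinction between additive effects and an interaction concrete is to compute a "difference of differences": the training effect within No Violence minus the training effect within Violence. The sketch below is a minimal illustration that assumes the labels Low, Med, and High can be coded as 1, 2, and 3; that coding, and the helper names, are hypothetical and introduced only for this example, since the tables give labels rather than scores.

```python
# Hypothetical ordinal coding of the qualitative table entries (an assumption
# made only for illustration; the tables give labels, not scores).
LEVEL = {"Low": 1, "Med": 2, "High": 3}


def training_effects(table):
    """Training minus No Training within each violence condition.

    `table` maps a violence condition to a (no_training, training) pair.
    """
    return {row: LEVEL[trained] - LEVEL[untrained]
            for row, (untrained, trained) in table.items()}


def difference_of_differences(table):
    """Training effect under No Violence minus training effect under Violence.

    Zero means the two factors combine additively (parallel lines);
    a nonzero value signals an interaction pattern.
    """
    effects = training_effects(table)
    return effects["No Violence"] - effects["Violence"]


# Training helps only when there is no violence.
interaction_pattern = {"No Violence": ("Med", "High"), "Violence": ("Low", "Low")}
# Training helps equally in both violence conditions.
additive_pattern = {"No Violence": ("Med", "High"), "Violence": ("Low", "Med")}

print(difference_of_differences(interaction_pattern))  # 1
print(difference_of_differences(additive_pattern))     # 0
```

A result of zero corresponds to parallel lines when the cell means are plotted; a nonzero result is the numeric counterpart of lines that are not parallel.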

Main Effects
A "main effect" concerns the overall effect of one of the independent variables. In examining the main effect of one independent variable, one compares its levels after averaging over the levels of the other independent variable. In the above experiment, there are two possible main effects: a main effect for training and a main effect for violence. They are "main" because they concern the overall or average effect of one of the independent variables. Let's look at some hypothetical means that would match our previous "additive" findings. Scores on the dependent variable can range from 0 to 10, so here are some possible means:

                    No Training   Training   (Marginal Means)
No Violence         6.1           9.1        (7.6)
Violence            2.5           5.5        (4.0)
(Marginal Means)    (4.3)         (7.3)      (5.8)

In this example, there is a main effect for the training variable, because, overall, training improves memory. To see the main effect for training, we have to average over the Violence and No Violence cells. We do that separately for the Training and No Training conditions. What we wind up with is two marginal means: 4.3 overall for the No Training condition and 7.3 overall for the Training condition. Because 7.3 is larger than 4.3, there seems to be a difference between the Training and No Training conditions (of course, we would have to test this for significance). The difference between these marginal means represents a main effect. To look at the main effect for the Violence independent variable, we would compare the Violence and No Violence conditions, combining all the participants in the No Training and Training conditions. Overall, the Violence conditions had poorer memory than the No Violence conditions (4.0 vs. 7.6). So, in general, there seem to be two main effects in this study, and, because the effects are additive, there is no interaction between the two independent variables. Their effects on the dependent variable are separate, or independent.
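The averaging described above can be sketched in a few lines of Python. This is a minimal illustration that assumes equal numbers of participants in every cell (with unequal cell sizes, marginal means would need to be weighted). The four cell means are hypothetical values chosen to be exactly additive (training adds 3.0 points under both violence conditions) and to reproduce the marginal means discussed above: 7.6 and 4.0 for the violence conditions, 4.3 and 7.3 for the training conditions.

```python
from statistics import mean

# Hypothetical cell means on the 0-10 memory test, keyed by (violence, training).
cells = {
    ("No Violence", "No Training"): 6.1,
    ("No Violence", "Training"): 9.1,
    ("Violence", "No Training"): 2.5,
    ("Violence", "Training"): 5.5,
}
violence_levels = ["No Violence", "Violence"]
training_levels = ["No Training", "Training"]

# Row (violence) marginal means: average over the training levels.
violence_marginals = {v: mean(cells[(v, t)] for t in training_levels)
                      for v in violence_levels}
# Column (training) marginal means: average over the violence levels.
training_marginals = {t: mean(cells[(v, t)] for v in violence_levels)
                      for t in training_levels}
# Grand mean of the four cells (equal n assumed).
grand_mean = mean(cells.values())

# Each main effect is a difference between marginal means.
training_main_effect = training_marginals["Training"] - training_marginals["No Training"]
violence_main_effect = violence_marginals["No Violence"] - violence_marginals["Violence"]

print({k: round(m, 2) for k, m in violence_marginals.items()})
print({k: round(m, 2) for k, m in training_marginals.items()})
print(round(grand_mean, 2), round(training_main_effect, 2), round(violence_main_effect, 2))
```

Because the training effect is the same 3.0 points in both rows, the two main effects fully describe these means and there is nothing left over for an interaction.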

Main effects and interactions do not depend on one another. That is, there can be no main effects, one, or two, and an interaction can occur in combination with either main effect, both main effects, or no main effects. An easy way to tell whether there is an interaction is to plot the four cell means on a graph. This can be done with a line graph or a bar graph, although I believe the line graph offers a simpler interpretation. If the two lines are parallel, there is no interaction. If they are not parallel, there is an interaction.


Naturally, for any of these examples, we do not really know if there is an interaction or a main effect until we test for them. There will be more on that below, but first I need to make one more distinction.

Simple Effects
If there is a significant interaction, it indicates that, overall, the effect of one independent variable depends on the level of the other independent variable. In other words, the effect of an independent variable is different at different levels of the other independent variable. The overall test of the interaction does not indicate which means differ from which other means. To know that, we need to examine what are called simple effects. Simple effects (sometimes called "simple main effects") are differences among the cell means within a single level of one of the independent variables. In the following example, for instance, there appears to be a "simple effect for training within the No Violence condition": training had an effect within the No Violence condition (3.0 vs. 8.6). Because training had essentially no effect within the Violence condition (3.1 vs. 3.2), there was no simple effect for training within Violence. There also appears to be a simple effect for the violence factor within Training (8.6 vs. 3.2).

                No Training   Training
No Violence     3.0           8.6
Violence        3.1           3.2
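Descriptively, each simple effect is just a difference between cell means within one level of the other factor. The minimal Python sketch below computes these differences from the cell means in the table; the function names are hypothetical, and the differences by themselves say nothing about significance, which still has to be tested.

```python
# Cell means from the simple-effects example, keyed by (violence, training).
cells = {
    ("No Violence", "No Training"): 3.0,
    ("No Violence", "Training"): 8.6,
    ("Violence", "No Training"): 3.1,
    ("Violence", "Training"): 3.2,
}


def simple_effect_of_training(cells, violence):
    """Training minus No Training, holding the violence condition fixed."""
    return cells[(violence, "Training")] - cells[(violence, "No Training")]


def simple_effect_of_violence(cells, training):
    """No Violence minus Violence, holding the training condition fixed."""
    return cells[("No Violence", training)] - cells[("Violence", training)]


for v in ("No Violence", "Violence"):
    print(f"training within {v}: {simple_effect_of_training(cells, v):.1f}")
for t in ("No Training", "Training"):
    print(f"violence within {t}: {simple_effect_of_violence(cells, t):.1f}")
```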

Any combination of simple effects may occur when there is an interaction. When a significant interaction is found, simple effects are usually tested to discover where the differences lie. In larger designs with more cells, this is even more critical. One could do t-tests to compare the means, but this is problematic because of alpha inflation: each additional test adds another chance of a Type I error, so the probability of at least one false positive across the whole set of tests rises well above .05.
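To see how quickly alpha inflation grows, the sketch below uses the familywise error formula 1 - (1 - alpha)^k for k tests, each run at alpha = .05, where k is the number of distinct pairs of cell means. It assumes the tests are independent, which pairwise comparisons among cell means are not, so the figures are an approximation that shows the trend rather than an exact error rate. The cell counts (4, 6, and 12, as in 2 X 2, 2 X 3, and 2 X 2 X 3 designs) are illustrative.

```python
from math import comb


def familywise_error(alpha, k):
    """Chance of at least one Type I error across k tests at level alpha.

    Assumes the k tests are independent; pairwise comparisons among cell
    means are not fully independent, so this is only an approximation.
    """
    return 1 - (1 - alpha) ** k


for n_cells in (4, 6, 12):  # e.g., 2 X 2, 2 X 3, and 2 X 2 X 3 designs
    k = comb(n_cells, 2)  # number of distinct pairs of cell means
    print(f"{n_cells} cells, {k} pairwise tests: {familywise_error(0.05, k):.3f}")
```

Even for a 2 X 2 design, running all six pairwise t-tests at .05 gives a familywise rate of roughly .26 under this approximation.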

Testing Simple Effects
There are really two approaches to testing whether particular simple effects are significant. The first is to conduct a post hoc test, such as the Tukey HSD test, in which all possible pairs of means are tested. The second is a simple effects analysis. This is essentially a focused F-test that compares all the cells within one level of one of the independent variables, typically using the MSE from the full factorial ANOVA as the error term.
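A simple effects analysis can be sketched as follows. This is a minimal illustration assuming a balanced between-subjects 2 X 2 design; the raw scores and the names `pooled_mse` and `simple_effect_f` are invented for the example. The numerator of each F uses only the cells within one violence condition, while the denominator is the MSE pooled over every cell. A p-value would additionally require the F distribution (for example, `scipy.stats.f.sf(F, df1, df2)` in SciPy), which is omitted to keep the sketch dependency-free.

```python
from statistics import mean

# Invented raw memory scores (0-10), 4 participants per cell,
# keyed by (violence condition, training condition).
scores = {
    ("No Violence", "No Training"): [3, 2, 4, 3],
    ("No Violence", "Training"): [8, 9, 9, 8],
    ("Violence", "No Training"): [2, 3, 1, 2],
    ("Violence", "Training"): [4, 3, 4, 5],
}
TRAINING = ("No Training", "Training")


def pooled_mse(scores):
    """Within-cell mean square (MSE) pooled over every cell of the design."""
    ss_error = sum(sum((x - mean(xs)) ** 2 for x in xs) for xs in scores.values())
    df_error = sum(len(xs) - 1 for xs in scores.values())
    return ss_error / df_error, df_error


def simple_effect_f(scores, violence):
    """F for the simple effect of training within one violence condition.

    The numerator uses only the cells at that violence level; the
    denominator is the MSE from the full design. Equal n is assumed.
    """
    cell_means = [mean(scores[(violence, t)]) for t in TRAINING]
    n = len(scores[(violence, TRAINING[0])])
    level_mean = mean(cell_means)
    ss_simple = n * sum((m - level_mean) ** 2 for m in cell_means)
    df_simple = len(TRAINING) - 1
    ms_error, df_error = pooled_mse(scores)
    return (ss_simple / df_simple) / ms_error, df_simple, df_error


for violence in ("No Violence", "Violence"):
    f, df1, df2 = simple_effect_f(scores, violence)
    print(f"training within {violence}: F({df1}, {df2}) = {f:.2f}")
```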

Factorial ANOVA
To test for main effects and interactions in a factorial design, we (or, more likely, a computer program) must conduct a factorial ANOVA. The rationale is similar to that of the between-groups ANOVA discussed previously. An F-ratio is formed for each of the main effects and for the interaction. For each main effect, the variation of the marginal means around the grand mean (the mean for the total sample) is compared to the variation within groups. For the interaction, the variation of the cell means around the grand mean that remains after both main effects have been removed is compared to the variation within groups. Although I will not go into the details of the computation, you should understand the general rationale for each component.

Effect: SSA (Sum of Squares for the A factor)
  Interpretation: Tests the A main effect. Represents variation of the marginal means for the levels of A around the grand mean.
  Mean square: MSA = SSA/(a - 1), where a is the number of levels of the A factor
  F test: F = MSA/MSE

Effect: SSB (Sum of Squares for the B factor)
  Interpretation: Tests the B main effect. Represents variation of the marginal means for the levels of B around the grand mean.
  Mean square: MSB = SSB/(b - 1), where b is the number of levels of the B factor
  F test: F = MSB/MSE

Effect: SSAB (Sum of Squares for A X B)
  Interpretation: Tests the interaction. Is there a multiplicative (non-additive) effect?
  Mean square: MSAB = SSAB/[(a - 1)(b - 1)]
  F test: F = MSAB/MSE

Effect: SSE (Sum of Squares Error)
  Interpretation: Represents the overall variation of scores within the cells. Analogous to the SSW used in the one-way ANOVA.
  Mean square: MSE = SSE/[ab(n - 1)], where n is the number of participants in each cell (equal cell sizes assumed)
  F test: none (MSE is the denominator of every F test)

Effect: SST (Sum of Squares Total)
  Interpretation: Represents total variation. With equal cell sizes, SST = SSA + SSB + SSAB + SSE.
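The components listed above can be computed directly in a small, self-contained sketch of a balanced 2 X 2 between-subjects ANOVA. The raw scores are invented for illustration (4 participants per cell), A is taken to be violence and B training, and the function name `two_way_anova` is hypothetical. p-values are omitted because they require the F distribution; in practice a statistics package would supply them.

```python
from statistics import mean

# Invented raw scores, 4 per cell. Factor A = violence, factor B = training.
scores = {
    ("No Violence", "No Training"): [3, 2, 4, 3],
    ("No Violence", "Training"): [8, 9, 9, 8],
    ("Violence", "No Training"): [2, 3, 1, 2],
    ("Violence", "Training"): [4, 3, 4, 5],
}
A_LEVELS = ("No Violence", "Violence")
B_LEVELS = ("No Training", "Training")


def two_way_anova(scores):
    """Balanced two-way between-subjects ANOVA (equal n in every cell)."""
    a, b = len(A_LEVELS), len(B_LEVELS)
    n = len(next(iter(scores.values())))
    cell = {key: mean(xs) for key, xs in scores.items()}
    grand = mean(cell.values())  # equals the mean of all scores when n is equal
    mean_a = {i: mean(cell[(i, j)] for j in B_LEVELS) for i in A_LEVELS}
    mean_b = {j: mean(cell[(i, j)] for i in A_LEVELS) for j in B_LEVELS}

    # Main effects: marginal means around the grand mean.
    ss_a = b * n * sum((mean_a[i] - grand) ** 2 for i in A_LEVELS)
    ss_b = a * n * sum((mean_b[j] - grand) ** 2 for j in B_LEVELS)
    # Interaction: cell-mean variation left after removing both main effects.
    ss_ab = n * sum((cell[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i in A_LEVELS for j in B_LEVELS)
    # Error: scores around their own cell mean; total: scores around grand mean.
    ss_e = sum((x - cell[key]) ** 2 for key, xs in scores.items() for x in xs)
    ss_t = sum((x - grand) ** 2 for xs in scores.values() for x in xs)

    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    ms_e = ss_e / df["E"]
    table = {}
    for source, ss in (("A", ss_a), ("B", ss_b), ("AB", ss_ab)):
        ms = ss / df[source]
        table[source] = {"SS": ss, "df": df[source], "MS": ms, "F": ms / ms_e}
    table["E"] = {"SS": ss_e, "df": df["E"], "MS": ms_e}
    table["T"] = {"SS": ss_t, "df": a * b * n - 1}
    return table


for source, row in two_way_anova(scores).items():
    print(source, {k: round(v, 3) for k, v in row.items()})
```

With these invented data the sketch gives SSA = 30.25, SSB = 56.25, SSAB = 12.25, and SSE = 7.0, which add up to SST = 105.75, as they should in a balanced design.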

Planned Contrasts
Another approach to interactions is to skip the omnibus interaction test and conduct a planned contrast. A planned contrast (sometimes called an "a priori" contrast) compares particular cells within a design. It is an F-test that uses the full MSE as the denominator of the F-ratio, but instead of testing main effects and interactions, it compares a pair of means. Note that this differs from a t-test between groups, because the t-test uses a standard error estimate and degrees of freedom based only on the two cells (groups) concerned. The planned contrast uses the full sample to estimate the error variation.
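The difference between the two error terms can be seen in a short sketch. It assumes a balanced 2 X 2 design with invented raw scores, and the helper names are hypothetical. The contrast F is computed as n times psi squared over the sum of squared weights, divided by the full-design MSE, where psi is the weighted sum of cell means; the comparison t-test pools variance from the two compared cells only.

```python
from statistics import mean, variance

# Invented raw scores, 4 per cell, keyed by (violence, training).
scores = {
    ("No Violence", "No Training"): [3, 2, 4, 3],
    ("No Violence", "Training"): [8, 9, 9, 8],
    ("Violence", "No Training"): [2, 3, 1, 2],
    ("Violence", "Training"): [4, 3, 4, 5],
}


def pooled_mse(scores):
    """MSE pooled over all cells, with its error degrees of freedom."""
    ss = sum(sum((x - mean(xs)) ** 2 for x in xs) for xs in scores.values())
    df = sum(len(xs) - 1 for xs in scores.values())
    return ss / df, df


def planned_contrast_f(scores, weights):
    """F(1, df_error) for the contrast psi = sum(c * cell mean).

    `weights` maps cell keys to coefficients that sum to zero; cells not
    listed get weight 0. Equal n per cell is assumed.
    """
    n = len(next(iter(scores.values())))
    psi = sum(c * mean(scores[key]) for key, c in weights.items())
    ss_contrast = n * psi ** 2 / sum(c ** 2 for c in weights.values())
    ms_error, df_error = pooled_mse(scores)
    return ss_contrast / ms_error, df_error


def two_cell_t(xs, ys):
    """Pooled-variance t-test that uses only the two cells being compared."""
    df = len(xs) + len(ys) - 2
    sp2 = ((len(xs) - 1) * variance(xs) + (len(ys) - 1) * variance(ys)) / df
    se = (sp2 * (1 / len(xs) + 1 / len(ys))) ** 0.5
    return (mean(xs) - mean(ys)) / se, df


# Compare the two Training cells: No Violence versus Violence.
weights = {("No Violence", "Training"): 1, ("Violence", "Training"): -1}
f, df_contrast = planned_contrast_f(scores, weights)
t, df_t = two_cell_t(scores[("No Violence", "Training")], scores[("Violence", "Training")])
print(f"planned contrast: F(1, {df_contrast}) = {f:.2f}")
print(f"two-cell t-test:  t({df_t}) = {t:.2f}")
```

The contrast is evaluated on 12 error degrees of freedom, the t-test on only 6. In this particular invented sample the two-cell t happens to be larger (t squared is 81 versus F of about 69.4), because those two cells have less spread than the design as a whole; the general power advantage of the contrast comes from its larger error df and relies on the assumption that variances are similar across cells.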

From a statistician's point of view (and yours too), planned contrasts are better than separate t-tests. The main advantage is greater power: using the full sample, and therefore more degrees of freedom, to estimate the standard error should lead to greater power (provided the variances are similar across cells). Planned contrasts can also be formulated to test for a certain interaction pattern (e.g., a cross-over X pattern); these are referred to as interaction contrasts. Interaction contrasts are not used very often, but they are certainly legitimate. In designs larger than 2 X 2, they can also be more powerful than the omnibus interaction test when the predicted pattern is correct, because they concentrate on one particular pattern of results rather than on every possible departure from additivity.
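In a 2 X 2 design the interaction has only (a - 1)(b - 1) = 1 degree of freedom, so an interaction contrast with weights +1, -1, -1, +1 (the training effect under No Violence minus the training effect under Violence) reproduces the omnibus A X B test exactly. The sketch below checks this numerically on invented raw scores; the helper names are hypothetical. Any power advantage of interaction contrasts therefore shows up in designs larger than 2 X 2, where the omnibus interaction has more than one degree of freedom.

```python
from statistics import mean

# Invented raw scores, 4 per cell, keyed by (violence, training).
scores = {
    ("No Violence", "No Training"): [3, 2, 4, 3],
    ("No Violence", "Training"): [8, 9, 9, 8],
    ("Violence", "No Training"): [2, 3, 1, 2],
    ("Violence", "Training"): [4, 3, 4, 5],
}
A_LEVELS = ("No Violence", "Violence")
B_LEVELS = ("No Training", "Training")


def pooled_mse(scores):
    """Within-cell mean square pooled over all cells."""
    ss = sum(sum((x - mean(xs)) ** 2 for x in xs) for xs in scores.values())
    df = sum(len(xs) - 1 for xs in scores.values())
    return ss / df


def contrast_ss(scores, weights):
    """SS for a one-df contrast on cell means (equal n assumed)."""
    n = len(next(iter(scores.values())))
    psi = sum(c * mean(scores[key]) for key, c in weights.items())
    return n * psi ** 2 / sum(c ** 2 for c in weights.values())


def interaction_ss(scores):
    """Omnibus A X B sum of squares: cell means minus both main effects."""
    n = len(next(iter(scores.values())))
    cell = {key: mean(xs) for key, xs in scores.items()}
    grand = mean(cell.values())
    row = {i: mean(cell[(i, j)] for j in B_LEVELS) for i in A_LEVELS}
    col = {j: mean(cell[(i, j)] for i in A_LEVELS) for j in B_LEVELS}
    return n * sum((cell[(i, j)] - row[i] - col[j] + grand) ** 2
                   for i in A_LEVELS for j in B_LEVELS)


# Difference of differences: training effect under No Violence
# minus training effect under Violence.
interaction_weights = {
    ("No Violence", "Training"): 1,
    ("No Violence", "No Training"): -1,
    ("Violence", "Training"): -1,
    ("Violence", "No Training"): 1,
}
ms_error = pooled_mse(scores)
df_ab = (len(A_LEVELS) - 1) * (len(B_LEVELS) - 1)  # 1 in a 2 X 2 design
f_contrast = contrast_ss(scores, interaction_weights) / ms_error
f_omnibus = interaction_ss(scores) / df_ab / ms_error
print(round(f_contrast, 3), round(f_omnibus, 3))
```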

More Complex Factorial Designs
Our example above is a 2 X 2 factorial design. This terminology refers to two levels of the first factor and two levels of the second factor. We can also have more complex designs, such as a 2 X 3 design. This design still has two independent variables, but there are 2 levels of the first factor and 3 levels of the second factor. A 4 X 2 design has four levels of the first independent variable and 2 levels of the second independent variable. A 2 X 2 X 3 design has 3 factors: 2 levels of the first independent variable, 2 levels of the second independent variable, and 3 levels of the third independent variable.
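The number of cells in any of these designs is the product of the numbers of levels, and each main effect or interaction has degrees of freedom equal to the product of (levels - 1) over the factors it involves, which generalizes the (a - 1)(b - 1) formula for a two-way interaction. The sketch below is a hypothetical helper for balanced between-subjects designs; the value of 10 participants per cell is an arbitrary assumption.

```python
from itertools import combinations
from math import prod


def design_summary(levels, n_per_cell):
    """Cells, effect df, and error df for a balanced between-subjects design.

    `levels` lists the number of levels of each factor, e.g. (2, 3) for a
    2 X 3 design. Factors are labeled A, B, C, ... in order.
    """
    k = len(levels)
    cells = prod(levels)
    effects = {}
    for size in range(1, k + 1):
        for combo in combinations(range(k), size):
            name = " X ".join(chr(ord("A") + i) for i in combo)
            effects[name] = prod(levels[i] - 1 for i in combo)
    df_error = cells * (n_per_cell - 1)
    return cells, effects, df_error


for design in [(2, 2), (2, 3), (4, 2), (2, 2, 3)]:
    cells, effects, df_error = design_summary(design, n_per_cell=10)
    label = " X ".join(map(str, design))
    print(f"{label}: {cells} cells, effect df {effects}, error df {df_error}")
```

As a check, the effect df always sum to the number of cells minus one (for example, 11 for the 12 cells of a 2 X 2 X 3 design).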

Within-Subjects and Mixed Designs
As with the one-way ANOVA, one can conduct a within-subjects test when there are repeated measures or matched cases. Either or both of the two factors can be within-subjects factors. If one factor is between-subjects and one is within-subjects, it is referred to as a mixed design. Because participants are nested within the levels of the between-subjects factor, these designs are sometimes also called "nested" or "split-plot" designs. Of course, any degree of complexity is possible, and there can be as many factors and levels as desired.

The computations and details of the within-subjects factorial are a bit more complex, so we will not be able to cover them in this class. If you ever need to analyze a study with these complex features, there are a couple of excellent references on ANOVA that might be consulted:

Keppel, G. (1991). Design and analysis: A researcher's handbook. Englewood Cliffs, NJ: Prentice Hall.

Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design. New York: McGraw-Hill.