Threats to validity of Research Design

Barbara Ohlund and Chong-ho Yu

The books by Campbell and Stanley (1963) and Cook and Campbell (1979) are considered classics in the field of experimental design. The following is a summary of their books, with our own examples inserted.

Problem and Background

Experimental method and essay-writing

Campbell and Stanley point out that adherence to experimentation dominated the field of education through the 1920s (the Thorndike era) but gave way to great pessimism and rejection by the late 1930s. It should be noted, however, that the departure from experimentation to essay writing (from Thorndike to Gestalt psychology) was made most often by people already adept in the experimental tradition. We must therefore be aware of the past so that we avoid totally rejecting any method, and instead take a serious look at the effectiveness and applicability of current and past methods without making false assumptions.


Multiple experimentation is more typical of science than a once-and-for-all definitive experiment. Experiments need replication and cross-validation at various times and under various conditions before the results can be theoretically interpreted with confidence.

Cumulative wisdom

An interesting point is that experiments pitting opposing theories against each other probably will not have clear-cut outcomes; in fact, both researchers have likely observed something valid that represents part of the truth. Adopting experimentation in education should not imply advocating a position incompatible with traditional wisdom; rather, experimentation may be seen as a process of refining this wisdom. Therefore these two areas, cumulative wisdom and science, need not be opposing forces.

Factors Jeopardizing Internal and External Validity

Please note that the validity discussed here is validity in the context of experimental design, not in the context of measurement.

Factors which jeopardize internal validity

Factors which jeopardize external validity

Three Experimental Designs

To make things easier, the following will act as representations within particular designs:

The three experimental designs discussed in this section are:

The One Shot Case Study

This is a single group studied only once. A group is introduced to a treatment or condition and then observed for changes, which are attributed to the treatment.


The problems with this design are:

One Group Pre-Posttest Design

This design presents a pretest, then a treatment, and then a posttest; the difference between O1 and O2 is attributed to X:

O1 X O2

However, there exist threats to the validity of the above assertion:

The Static Group Comparison

This is a two-group design: one group is exposed to a treatment and then tested, while a control group is not exposed to the treatment but is tested in the same way, so that the effects of the treatment can be compared.

X O1
  O2

Threats to validity include:

Three True Experimental Designs

The next three designs discussed are the most strongly recommended designs:

The Pretest-Posttest Control Group Design

This design takes this form:

R O1 X O2
R O3   O4

This design controls for all of the seven threats to validity described in detail so far. An explanation of how this design controls for these threats is below.

The factors described so far affect internal validity. They could produce changes that might be mistaken for the result of the treatment. These are called main effects, and because this design controls for them, it has internal validity.

However, this design is subject to threats to external validity (also called interaction effects, because the threat arises from the interaction of the treatment with some other variable). It is important to note that external validity, or generalizability, always turns out to involve extrapolation into a realm not represented in one's sample.

In contrast, problems of internal validity are solvable within the limits of the logic of probability statistics. That is, we can control threats to internal validity within the experiment itself on the basis of probability statistics. External validity, or generalizability, cannot be established logically in the same way, because we cannot logically extrapolate to conditions that were not sampled (Hume's truism that induction or generalization is never fully justified logically).

External threats include:

Research in schools should be conducted in this manner: ideas for research should originate with teachers or other school personnel. The design should be worked out with an expert in research methodology, and the research itself carried out by those who came up with the idea. Results should be analyzed by the expert, and the final interpretation delivered by an intermediary.

Tests of significance for this design: although the design may be set up and conducted appropriately, statistical tests of significance are not always applied appropriately.

The Solomon Four-Group Design

The design is as follows:

R O1 X O2
R O3   O4
R    X O5
R      O6


In this design, subjects are randomly assigned to four different groups: experimental with both pre-posttests, experimental with no pretest, control with pre-posttests, and control without pretests. By using experimental and control groups with and without pretests, both the main effects of testing and the interaction of testing and the treatment are controlled. Therefore generalizability increases and the effect of X is replicated in four different ways.
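The random assignment step can be illustrated with a short sketch. This is only an illustration, not a procedure taken from Campbell and Stanley: the function name assign_solomon_groups, the group labels, and the round-robin dealing that keeps group sizes equal are all assumptions made for this example.

```python
import random

# Hypothetical labels for the four Solomon groups (not from the source text).
LABELS = [
    "pretest + treatment",
    "pretest, no treatment",
    "treatment, no pretest",
    "no pretest, no treatment",
]


def assign_solomon_groups(subjects, seed=None):
    """Randomly assign subjects to the four groups (illustrative helper)."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)          # random order, so group membership is random
    groups = {label: [] for label in LABELS}
    # Deal the shuffled subjects round-robin so group sizes differ by at most one.
    for i, subject in enumerate(shuffled):
        groups[LABELS[i % len(LABELS)]].append(subject)
    return groups


# Forty made-up subject IDs, ten per group.
groups = assign_solomon_groups(range(1, 41), seed=0)
for label, members in groups.items():
    print(label, len(members))
```

Shuffling first and then dealing the subjects out is one simple way to ensure that chance alone, and not the researcher or the subjects, decides who lands in which group.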

Statistical tests for this design: a good way to test the results is to set the pretest scores aside, regard pretesting itself as a second "treatment," and analyze the posttest scores with a 2 x 2 analysis of variance (pretested versus unpretested, crossed with X versus no X).
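As a minimal sketch of such a 2 x 2 analysis of variance on posttest scores, the code below computes the F ratios for treatment, pretesting, and their interaction. It assumes equal group sizes, the scores are invented for illustration, and the function name anova_2x2 is hypothetical. It returns F ratios only; each would still have to be compared against an F table with 1 and 4(n - 1) degrees of freedom.

```python
from statistics import mean


def anova_2x2(cells):
    """Balanced 2 x 2 ANOVA on posttest scores (illustrative sketch).

    cells maps (treated, pretested) -> list of posttest scores.
    All four lists must have the same length n.
    """
    levels = (False, True)
    n = len(cells[(False, False)])
    if any(len(cells[(t, p)]) != n for t in levels for p in levels):
        raise ValueError("this sketch assumes equal cell sizes")

    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    grand = mean(list(cell_mean.values()))
    treat_mean = {t: mean([cell_mean[(t, p)] for p in levels]) for t in levels}
    pre_mean = {p: mean([cell_mean[(t, p)] for t in levels]) for p in levels}

    # Sums of squares for the two main effects and the interaction.
    ss_treat = 2 * n * sum((treat_mean[t] - grand) ** 2 for t in levels)
    ss_pre = 2 * n * sum((pre_mean[p] - grand) ** 2 for p in levels)
    ss_inter = n * sum(
        (cell_mean[(t, p)] - treat_mean[t] - pre_mean[p] + grand) ** 2
        for t in levels
        for p in levels
    )
    # Within-cell (error) sum of squares and its degrees of freedom.
    ss_within = sum(
        (x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores
    )
    df_within = 4 * (n - 1)
    ms_within = ss_within / df_within

    # Each effect has 1 degree of freedom, so its mean square equals its SS.
    return {
        "F_treatment": ss_treat / ms_within,
        "F_pretesting": ss_pre / ms_within,
        "F_interaction": ss_inter / ms_within,
        "df_within": df_within,
    }


# Made-up posttest scores, keyed by (treated, pretested).
scores = {
    (True, True): [12, 14, 16],
    (True, False): [11, 13, 15],
    (False, True): [8, 10, 12],
    (False, False): [7, 9, 11],
}
result = anova_2x2(scores)
print(result)
```

A large F for treatment together with small F ratios for pretesting and for the interaction would suggest that X had an effect and that pretesting neither changed the scores by itself nor altered the effect of X.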

The Posttest-Only Control Group Design

This design is as follows:

R X O1
R   O2

This design can be thought of as the last two groups in the Solomon four-group design. It can be seen as controlling for testing both as a main effect and as an interaction, but unlike the Solomon design, it does not measure them. Measuring these effects, however, is not necessary to the central question of whether or not X had an effect. This design is appropriate when pretests are not acceptable.

Statistical tests for this design: the simplest form would be the t-test. However, covariance analysis and blocking on subject variables (prior grades, test scores, etc.) can also be used; these increase the power of the significance test in much the same way a pretest would.
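For illustration, the t statistic for two independent posttest groups can be computed by hand as sketched below. The scores are invented, the function name pooled_t is hypothetical, and the sketch assumes the pooled-variance (equal-variance) form of the test. It returns only the statistic and its degrees of freedom; the p-value would still come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(treated, control):
    """Two-sample t statistic with pooled variance (illustrative sketch)."""
    n1, n2 = len(treated), len(control)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / df
    # Standard error of the difference between the two group means.
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(treated) - mean(control)) / std_err, df


# Made-up posttest scores for the treated (R X O1) and control (R O2) groups.
treated_scores = [12, 14, 16, 13, 15]
control_scores = [9, 11, 10, 12, 8]
t, df = pooled_t(treated_scores, control_scores)
print(t, df)
```

Because subjects were randomly assigned, a t value beyond the critical value for the given degrees of freedom can be attributed to X rather than to pre-existing group differences.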

Discussion on causal inference and generalization

As illustrated above, Cook and Campbell devoted much effort to avoiding or reducing the threats to internal validity (cause and effect) and external validity (generalization). However, some widespread concepts may introduce other threats to internal and external validity.

Some researchers downplay the importance of causal inference and assert the worth of understanding. This understanding includes "what," "how," and "why." However, is "why" a "cause and effect" relationship? If the question "why does X happen" is asked and the answer is "Y happens," does it imply that "Y causes X"? If X and Y are merely correlated, the correlation does not answer the question "why." Replacing "cause and effect" with "understanding" makes the conclusion confusing and misdirects researchers away from the issue of "internal validity."

Some researchers apply a phenomenological approach to "explanation." In this view, an explanation applies only to a particular case in a particular time and place, and thus generalization is considered inappropriate. In fact, a particular explanation does not explain anything. For example, if one asks, "Why does Alex Yu behave in that way?" the answer could be "because he is Alex Yu. He is a unique human being. He has a particular family background and a specific social circle." These "particular" statements are always right, and thereby misguide researchers away from the issue of external validity.