
Lecture 6
Between Groups t-test

Single Group Tests: The z-test and the t-test
There are two tests that are used to see if a single sample mean is different from the population mean, the z-test and the t-test. We've already discussed the rationale for these under significance testing. The book provides more detail on these tests and when they are used (Chapter 6). Frankly, they are very rarely used, and I don't want to spend too much time on them.

One comment I do want to make is on when to use which test. On page 165 of the Daniel text, there is a complex flow chart that one can follow to make a decision about which to use. Actually, it is much simpler than this. If the population standard deviation is known, use the z-test. If it is not known, use the t-test. Because of the central limit theorem, one doesn't have to worry about the normality of the distribution of the population. In the flow chart, Daniel is pointing out, in part, that one has to be cautious about using the single group t-test when there is a very small sample size and the population is not normally distributed. Most of the time, in real research, we don't know about the shape of the distribution of the population, so people use the t-test regardless of whether the population distribution is normal.
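If you want to see the two tests side by side, here is a minimal sketch in Python. (This is my own illustration, not something from the Daniel text; the sample scores and the population values are invented.) The z-test is computed by hand with an assumed known population standard deviation, while the t-test uses scipy's one-sample function:

# Single group tests on one made-up sample against a hypothesized
# population mean of 100. All values here are for illustration only.
import numpy as np
from scipy import stats

sample = np.array([102, 98, 110, 105, 95, 101, 108, 99])
mu = 100         # hypothesized population mean

# z-test: appropriate when the population standard deviation is known.
sigma = 10       # assumed known population SD
z = (sample.mean() - mu) / (sigma / np.sqrt(len(sample)))
p_z = 2 * stats.norm.sf(abs(z))    # two-tailed p-value

# t-test: appropriate when sigma is unknown and estimated from the sample.
t_stat, p_t = stats.ttest_1samp(sample, mu)

print(f"z = {z:.3f}, p = {p_z:.3f}")
print(f"t = {t_stat:.3f}, p = {p_t:.3f}")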

Between Groups t-test
The between groups t-test is used when we have a continuous dependent variable and we are interested in comparing two groups. An example might be an experiment with an experimental group and a control group, or perhaps a comparison between two non-experimental groups, like women and men. Another way to describe the situation in which the between groups t-test is appropriate is to state that the t-test involves a dichotomous independent variable and a continuous dependent variable. Remember that the independent variable is the one assumed to be the cause and the dependent variable is the result or the outcome. The independent variable in this case is group membership (e.g., gender is the independent variable). We compare the means of the dependent variable for participants in the two groups. So, the t-test is used to see if the independent, grouping variable produces a large difference between the means on the dependent variable.

I've already given you the conceptual explanation of the two groups t-test in the previous lectures. To reiterate a bit, it is a matter of checking to see if the difference between two groups is one that is likely to occur by chance or not. If the difference between the means of the two groups is large relative to what we would expect to occur from sample to sample, we consider the difference to be significant. If the difference between the group means is small relative to the amount of sampling variability, the difference will not be significant.

We can think of the formula conceptually as a ratio like this:

t = (difference between groups) / (sampling variability)

When the value on the top of the equation is large, or the value on the bottom of the equation is small, the overall ratio will be large. The larger the value of t, the farther out on the sampling curve it will be, and, thus, the more likely it will be significant.

The Standard Error of the Estimate
The value that will be used on the top of the t formula is simple. It's just the difference between the two group means (x̄1 − x̄2). Here x̄1 denotes the mean of the first group and x̄2 denotes the mean of the second group (of course, the subscript used has nothing to do with the actual values used in the data set for the grouping variable).

The value on the bottom of the equation is designed to gauge sampling variability. As in the previous lectures, we estimate sampling variability with the standard error. The standard error in this case can be more formally called the "standard error of the estimate," because we are gauging the sampling variability of an estimate of the difference between means. The standard error of the estimate represents the standard deviation of the sampling distribution when the sampling distribution is based on a large number of mean differences. You can imagine a sampling distribution created by taking repeated samples of two groups and computing the difference between the means in each.

Because we are not going to actually construct a sampling distribution, we have to estimate the standard error. To estimate the standard error, we use the standard deviation of the sample we have. The standard deviation represents the variability of scores in the sample. If we know something about how much scores in the sample tend to vary, we can make a guess about how much variation we would expect in the sampling distribution. Imagine drawing scores from a population with very low variability. All the scores would be similar to one another, so the samples drawn from that population would not vary much.
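To make that idea concrete, here is a small simulation sketch in Python. (This is my own illustration, not something from the Daniel text; the population mean of 50, the standard deviation of 10, and the group sizes are arbitrary.) It builds a sampling distribution of mean differences by repeatedly drawing two groups from the same population, then compares its spread to the pooled standard error estimated from a single pair of samples:

# Simulation sketch: the SD of simulated mean differences should be close
# to the pooled standard error estimated from one pair of samples.
# All population values here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 20, 20

# Sampling distribution of (mean1 - mean2) under the null (same population).
diffs = np.array([
    rng.normal(50, 10, n1).mean() - rng.normal(50, 10, n2).mean()
    for _ in range(10_000)
])
print("SD of simulated mean differences:", diffs.std().round(3))

# Pooled standard error estimated from just one pair of samples.
g1, g2 = rng.normal(50, 10, n1), rng.normal(50, 10, n2)
sp2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 / n1 + sp2 / n2)
print("Pooled standard error estimate:   ", se.round(3))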

To make our guess at the standard error of mean differences, we need to use the variability of scores within the two groups. Because we have two groups, we need to find their variances and "pool" them together. The variances of the two groups will be symbolized by s1² and s2², where a 1 and a 2 are used as subscripts to denote the first or second group. If we have those, we can pool them together in the pooled variance estimate. Here's the formula, where sp² represents the pooled variance estimate:

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

n1 and n2 are the sample sizes for each of the two groups. With the pooled variance, we can then estimate the standard error. The standard error is symbolized by s(x̄1−x̄2) (like the symbol for the standard error of the mean, but a subscript with the difference between the two means is used):

s(x̄1−x̄2) = √(sp²/n1 + sp²/n2)

Now, all we need to do is to use that value in our ratio, and presto, we have a t-test:

t = (x̄1 − x̄2) / s(x̄1−x̄2)
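Here is a short worked sketch of the whole calculation in Python, using two made-up groups of scores (my example, not one from the text); the by-hand result is checked against scipy's pooled, equal-variance t-test:

# Between groups t-test computed by hand, then verified with scipy.
# The two groups of scores are invented for illustration.
import numpy as np
from scipy import stats

group1 = np.array([23, 27, 31, 25, 29, 30])
group2 = np.array([19, 22, 25, 20, 24, 21])
n1, n2 = len(group1), len(group2)

# Pooled variance estimate.
sp2 = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
se = np.sqrt(sp2 / n1 + sp2 / n2)

# The t ratio: mean difference over sampling variability.
t = (group1.mean() - group2.mean()) / se
print(f"by hand: t = {t:.3f}")

t_scipy, p = stats.ttest_ind(group1, group2, equal_var=True)
print(f"scipy:   t = {t_scipy:.3f}, p = {p:.3f}")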
The Statistical Hypotheses for the t-test
At the beginning of the class, I stated that we are interested in inferential statistics. We want to make decisions about the state of the population, not just our puny little sample. So, that means our real hypothesis concerns the population. The null hypothesis with the between groups t-test is that the two groups are equal in the population (either μ1 = μ2 or μ1 − μ2 = 0, where μ1 and μ2 are the population means for the two groups). In other words, if we could conduct our experiment on the entire population, would we find a difference between the two groups? If we find significance with our t-test, we conclude that there is a difference between the two population means.

How to Conduct the Between Groups t-test
In our search for significance, we need to calculate our t-value and then see if it falls out on the end of the sampling distribution curve. The general procedure we will follow for calculating the between groups t-test is very similar to the procedure we will follow for every other statistical test we conduct. Let me briefly summarize the steps in a table, then I'll explain (a worked sketch in code follows the table):

Step 1. Know the hypothesis you are testing.
   Example: H0: μ1 = μ2 (the population means of the two groups are equal).

Step 2. Use the formula, finding the value of the standard error estimate first. In this case, the standard error is the pooled standard error that represents the sampling variation of the difference between two group means.
   Example: t = (x̄1 − x̄2) / s(x̄1−x̄2)

Step 3. Check to see if the calculated value indicates significance. To do this: determine the degrees of freedom, and look up the critical value in the table in the back of the book for alpha = .05 (in Daniel's tables, the subscript value .975 is used). If the value you calculated exceeds the value in the table, it is significant (i.e., the null hypothesis is rejected).
   Example: d.f. = n1 + n2 − 2.

Step 4. Calculate the 95% confidence interval. Two values are used: (1) the low value, which subtracts the product of the critical value times the standard error, and (2) the high value, which adds that product.
   Example: LCL = (x̄1 − x̄2) − t.975 × s(x̄1−x̄2); UCL = (x̄1 − x̄2) + t.975 × s(x̄1−x̄2)
As with all tests we conduct, we will compute a value and then compare it to a value in the table in the back of the book. Note that Daniel calls the tabled value the reliability coefficient. (Other books refer to it as the critical value or the tabled value.) When we are making this comparison, we are looking to see how far out on the sampling curve our sample is likely to be. Remember this picture?

[Figure: the sampling distribution curve, with the rejection regions shaded in the two tails]
If our calculated value exceeds the one in the table (i.e., the reliability coefficient), we have significance: we reject the null hypothesis that the two groups are equal and accept the alternative hypothesis, which states that they are different. The reliability coefficient is denoted by t.975 when alpha is equal to .05 and we are using a two-tailed test; more generally, it is referred to as t(1−α/2). Significance means that we can be 95% sure that the difference between the two groups is not simply due to chance.

The Confidence Interval
Some researchers also like to compute a "confidence interval." The confidence interval is an estimate of where 95% of the mean differences in the sampling distribution should fall. It can also be stated as a representation of 95% confidence that the population mean difference will fall within these two points. Remember that our null hypothesis was that there is no difference between the population means (μ1 = μ2, or μ1 − μ2 = 0). In the case of the between groups t-test, when the interval does not include 0, we have significance. For example, if we are 95% confident that the difference between the two groups in the population falls between 2.3 and 5.4, the difference between μ1 and μ2 must be greater than 0, and thus there is a real (significant) difference between the groups.

To calculate the confidence interval, we find two values: the lower confidence limit (LCL) and the upper confidence limit (UCL). These are obtained from values we already have: the difference between our sample means, the tabled t-value (reliability coefficient), and the standard error.

LCL = (x̄1 − x̄2) − t.975 × s(x̄1−x̄2)
UCL = (x̄1 − x̄2) + t.975 × s(x̄1−x̄2)
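And here is a brief sketch of the confidence limit calculation (the mean difference, standard error, and degrees of freedom are made-up values of my own, not an example from the text):

# 95% confidence interval for the mean difference; significance is checked
# by seeing whether the interval includes 0. Input values are hypothetical.
from scipy import stats

mean_diff = 3.85    # hypothetical difference between the two sample means
se = 1.20           # hypothetical standard error of the difference
df = 10             # n1 + n2 - 2

t_crit = stats.t.ppf(0.975, df)    # tabled value (reliability coefficient)
lcl = mean_diff - t_crit * se      # lower confidence limit
ucl = mean_diff + t_crit * se      # upper confidence limit

print(f"95% CI: ({lcl:.2f}, {ucl:.2f})")
print("Significant" if lcl > 0 or ucl < 0 else "Not significant")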
 The next "lecture" is an example with numbers.
