Lecture 15
Point-biserial correlation, Phi, & Cramer's V

Differences and Relationships
The correlation coefficient is a measure of how two variables are related. t-tests examine how two groups are different. What if I told you these two types of questions are really the same question?

Examine the following histogram. This histogram shows a large difference in the dependent variable, y, between the groups No X and X. The mean is much higher for the X group than for the No X group. This is an example in which X has two groups. Now imagine that there are several values (levels or groups) for variable X: No X, a little X, some X, a lot of X, all of X. For instance, what if instead of having a study with no drug vs. drug, there is a study in which several levels of dosage are used (e.g., 0mg, 5mg, 10mg, etc.)? Then there would be several intermediate values of X. The following graph shows one potential outcome of the study in which there are several levels of X. In this second graph, we see that as the value of X increases, the value of Y increases. That is the definition of a relationship. In other words, there is a correlation between X and Y.

The same holds true for the first figure in which there were only two values of X. As X increases, Y increases. So both graphs demonstrate that there is a relationship between X and Y. At the same time, we can see that both graphs demonstrate that the values of Y differ for the values of X. So, a difference between values of X is the same as a relationship between X and Y.

Point-biserial Correlation
A point-biserial correlation is simply the correlation between one dichotomous variable and one continuous variable. It turns out that this is a special case of the Pearson correlation. So computing the special point-biserial correlation is equivalent to computing the Pearson correlation when one variable is dichotomous and the other is continuous.

Because we now know that differences are the same as correlations, we can show that the t-test for two groups is the same as the correlation between the grouping or independent variable (X) and the dependent variable (Y). Let's go back to our last t-test example and check this out. In this example, we were comparing first and last siblings to see who was more introverted.

First or Last Sibling    Introversion
1                        65
1                        48
1                        63
1                        52
1                        61
1                        53
1                        63
1                        70
1                        65
1                        66
2                        61
2                        42
2                        66
2                        52
2                        47
2                        58
2                        65
2                        62
2                        64
2                        69

When we conducted the t-test on these data we got t(18) = 0.56. This t-value was non-significant, indicating no significant difference between the groups.
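The t-test output from the original handout is not reproduced here, but it can be re-derived from the table above. A minimal sketch in pure Python (the variable names are mine, not from the lecture):

```python
import math

# Introversion scores from the sibling table above
first = [65, 48, 63, 52, 61, 53, 63, 70, 65, 66]  # first-born siblings (group 1)
last = [61, 42, 66, 52, 47, 58, 65, 62, 64, 69]   # last-born siblings (group 2)

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(first), len(last)
pooled_var = (ss(first) + ss(last)) / (n1 + n2 - 2)  # pooled variance
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))       # standard error of the difference
t = (mean(first) - mean(last)) / se

print(round(t, 3))  # 0.556, with df = 18 — well short of significance
```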

If we compute a correlation between the grouping variable (sibling) and the introversion score for this same data, r = -.13 (I magically arrived at this figure with some assistance from my friend Hal). If we plug this into the t-test formula used for testing the significance of a correlation, t = r√(N − 2)/√(1 − r²), we get t = −.13√18/√(1 − .0169) ≈ −0.56, which (apart from the sign, which just reflects how the two groups are coded) is the very same t-value from the difference test above. The correlation test (also nonsignificant) indicates that there is no relationship between the sibling group and the introversion score.
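The correlation route can be sketched on the same data. Below is a pure-Python check (names are mine) that computes the Pearson r between the group codes (1 = first sibling, 2 = last) and the introversion scores, then converts r to a t-value with t = r√(N − 2)/√(1 − r²):

```python
import math

# Group codes and introversion scores from the sibling table
x = [1] * 10 + [2] * 10
y = [65, 48, 63, 52, 61, 53, 63, 70, 65, 66,
     61, 42, 66, 52, 47, 58, 65, 62, 64, 69]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Pearson correlation: covariance over the product of the SDs
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))

# t-test for the significance of a correlation
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r, 2), round(t, 3))  # -0.13 and -0.556: same magnitude as the group t-test
```

The sign of r flips if you swap which group is coded 1 and which is coded 2; the magnitude, and the conclusion, do not change.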

So, a difference is a correlation.

Phi
There is another special case of correlation called "phi" (φ, the Greek letter phi). Phi represents the correlation between two dichotomous variables. As with the point-biserial, computing the Pearson correlation for two dichotomous variables is the same as computing phi.

Similar to the t-test/correlation equivalence, the relationship between two dichotomous variables is the same as the difference between two groups when the dependent variable is dichotomous. The appropriate test to compare group differences with a dichotomous outcome is the chi-square statistic. And, we can also show that the test of the phi coefficient is equivalent to the chi-square test.

Remember, one of the ways chi-square is interpreted is as a test of independence. The test of independence refers to whether or not two variables are related. If two variables are related, they are correlated.

So, when we conduct a chi-square test, and we want a rough estimate of how strongly related the two variables are, we can examine phi. Squaring phi gives the approximate amount of shared variance between the two variables, just as r-square does.
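To illustrate the equivalence, here is a sketch using a made-up 2 × 2 table (not data from the lecture): the Pearson correlation computed on the raw 0/1 scores comes out equal to phi = √(χ²/N).

```python
import math

# Hypothetical 2 x 2 contingency table: rows = variable A (0/1), cols = variable B (0/1)
a, b = 10, 5   # A = 0 row: counts of B = 0 and B = 1
c, d = 5, 10   # A = 1 row
n = a + b + c + d

# Chi-square for a 2 x 2 table (shortcut formula), then phi
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
phi = math.sqrt(chi2 / n)

# Pearson r computed on the corresponding raw 0/1 data
xs = [0] * (a + b) + [1] * (c + d)
ys = [0] * a + [1] * b + [0] * c + [1] * d
mx, my = sum(xs) / n, sum(ys) / n
r = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
     math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)))

print(round(phi, 4), round(abs(r), 4))  # both 0.3333; phi^2 = .11 shared variance
```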

Cramer's V
Cramer's V is used to examine the association between two categorical variables when the contingency table is larger than 2 X 2 (e.g., 2 X 3). In these more complicated designs, phi is not appropriate, but Cramer's statistic is. Cramer's V represents the association or correlation between two variables. I've also seen this statistic referred to as "Cramer's phi".
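The computation can be sketched with a made-up 2 × 3 table (not data from the lecture), using the standard formula V = √(χ² / (N · k)), where k = min(rows − 1, columns − 1):

```python
import math

# Hypothetical 2 x 3 contingency table
table = [[10, 20, 30],
         [20, 20, 20]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Chi-square from observed vs. expected counts
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

k = min(len(table) - 1, len(table[0]) - 1)  # min(rows - 1, cols - 1)
v = math.sqrt(chi2 / (n * k))

print(round(chi2, 3), round(v, 3))  # chi-square 5.333, Cramer's V 0.211
```

Note that for a 2 × 2 table, k = 1 and V reduces to phi, which is why the two statistics are sometimes conflated.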