Lecture 1
Types of scales & levels of measurement
Discrete and continuous variables
Daniel's text distinguishes between discrete and continuous variables. These
are technical distinctions that will not be all that important to us in this
class. According to the text, discrete variables are variables for which no
intermediate values are possible. An example is the number of phone calls
you receive per day: you cannot receive 6.3 phone calls. Continuous variables
are everything else: any variable that can theoretically take values in between
two points (e.g., a weight between 153 and 154 lbs.). It turns out that this is
not all that useful a distinction for our purposes. What is really more
important for statistical considerations is the level of measurement
used. When I say it is more important, I've really understated this.
Understanding the level of measurement of a variable (or scale or measure) is
the first and most important distinction one must make about a variable when
doing statistics!
Levels of measurement
Statisticians often refer to the "levels of measurement" of a
variable, a measure, or a scale to distinguish between measured variables that
have different properties. There are four basic levels: nominal, ordinal,
interval, and ratio.
Nominal
A variable measured on a "nominal" scale is
a variable that does not really have any evaluative distinction. One value is
really not any greater than another. A good example of a nominal variable is
sex (or gender). Information in a data set on sex is usually coded as 0 or 1, 1
indicating male and 0 indicating female (or the other way around--0 for male, 1
for female). 1 in this case is an arbitrary value and it is not any greater or
better than 0. There is only a nominal difference between 0 and 1. With nominal
variables, there is a qualitative difference between values, not a quantitative
one.
Ordinal
Something measured on an "ordinal" scale
does have an evaluative connotation. One value is greater or larger or better
than the other. Product A is preferred over product B, and therefore A receives
a value of 1 and B receives a value of 2. Another example might be rating your
job satisfaction on a scale from 1 to 10, with 10 representing complete
satisfaction. With ordinal scales, we only know that 2 is
better than 1 or 10 is better than 9; we do not know by how much. It may vary. The distance between 1 and 2 may be shorter than the distance between 9 and 10.
Interval
A variable measured on an interval scale gives
information about more or better, as ordinal
scales do, but interval variables also have an equal distance between each value.
The distance between 1 and 2 is equal to the distance between 9 and 10.
Temperature in Celsius or Fahrenheit is a good example: there is exactly the
same difference between 100 degrees and 90 as there is between 42 and 32.
Ratio
Something measured on a ratio scale has the same
properties that an interval scale has except that, with ratio scaling, there is an
absolute zero point. Temperature measured in kelvins is an example: no
value is possible below 0 kelvins, which is absolute zero. Weight is another
example: 0 lbs. is a meaningful absence of weight. Your bank account balance is
another. Although you can have a negative or positive account balance, there is
a definite and nonarbitrary meaning of an account
balance of 0.
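To see why the absolute zero point matters, here is a small Python sketch. It is only an illustration, not something from the text: the function name celsius_to_kelvin and the two example temperatures are my own choices. It shows that differences come out the same on the Celsius and Kelvin scales, but ratios only make sense on the scale with an absolute zero.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvins (0 K is absolute zero)."""
    return celsius + 273.15

# Two example temperatures (arbitrary values chosen for illustration).
warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences are meaningful on both scales (the interval property).
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are only meaningful on the scale with an absolute zero.
ratio_c = warm_c / cool_c  # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = warm_k / cool_k  # roughly 1.035

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The Celsius ratio of 2.0 is an accident of where that scale puts its zero; the Kelvin ratio is the one that can be interpreted.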
One can think of nominal, ordinal,
interval, and ratio as being ranked in their relation to one another. Ratio is
more sophisticated than interval, interval is more sophisticated than ordinal,
and ordinal is more sophisticated than nominal. I don't know if the ranks are
equidistant or not, probably not. So what kind of measurement level is this
ranking of measurement levels?? I'd say ordinal. In statistics, it's best to be
a little conservative when in doubt.
Two General Classes of Variables (Who Cares?)
Ok, remember I stated that this is the first and most important distinction
when using statistics? Here's why. For the most part, statisticians or
researchers wind up only caring about the difference between nominal and all
the others. There are generally two classes of statistics: those that deal with
nominal dependent variables and those that deal with ordinal, interval,
or ratio variables. (Right now we will focus on the dependent variable, and
later we will discuss the independent variable.) When I describe these
two general classes of variables, I (and many others) usually refer to them as
"categorical" and "continuous." (Sometimes I'll use "dichotomous"
instead of "categorical.") Note also that
"continuous" in this sense is not exactly the same as
"continuous" used in Chapter 1 of the text when distinguishing
between discrete and continuous. It’s a much looser term. Categorical and
dichotomous usually mean that a scale is nominal. "Continuous"
variables are usually those that are ordinal or better.
Ordinal scales with few categories
(2, 3, or possibly 4) and nominal measures are often classified as categorical
and are analyzed using the binomial class of statistical tests, whereas ordinal
scales with many categories (5 or more), interval scales, and ratio scales are usually
analyzed with the normal theory class of statistical tests. Although the distinction is a somewhat fuzzy
one, it is often very useful for choosing the correct statistical
test. A number of special
statistics have been developed to deal with ordinal variables with just a
few possible values, but we are not going to cover them in this class (see Agresti, 1984, 1990; O’Connell, 2006; Wickens,
1989 for more information on analysis of ordinal variables).
General Classes of Statistics (Oh, I Guess I Do Care)
Ok, so we have these two general categories (i.e., continuous and categorical),
what next…? Well this distinction (as fuzzy as it may sound) has very important
implications for the type of statistical procedure used and we will be making
decisions based on this distinction all through the course. There are
two general classes of statistics: those based on binomial theory and
those based on normal theory. Chi-square and logistic regression deal
with binomial theory or binomial distributions, and t-tests,
ANOVA, correlation, and regression deal with normal theory. So here's a table
to summarize.
Type of Dependent Variable (or Scale) | Level of Measurement | General Class of Statistic | Examples of Statistical Procedures
Categorical (or dichotomous) | nominal, ordinal with 2, 3, or 4 levels | binomial | chi-square, logistic regression
Continuous | ordinal with more than 4 categories, interval, ratio | normal | ANOVA, regression, correlation, t-tests
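The rule of thumb in the table can also be written as a short Python function. This is just a sketch of the lecture's rough guideline; the function name statistic_class and its arguments are hypothetical, and the cutoff of 4 categories is the fuzzy one described above, not a hard rule.

```python
def statistic_class(level, n_categories=None):
    """Return 'binomial' or 'normal' following the lecture's rule of thumb.

    level: 'nominal', 'ordinal', 'interval', or 'ratio'
    n_categories: number of possible values (only needed for ordinal scales)
    """
    if level == "nominal":
        return "binomial"
    if level == "ordinal":
        if n_categories is None:
            raise ValueError("ordinal variables need n_categories")
        # 2 to 4 categories: treat as categorical; 5 or more: treat as continuous.
        return "binomial" if n_categories <= 4 else "normal"
    if level in ("interval", "ratio"):
        return "normal"
    raise ValueError(f"unknown level of measurement: {level!r}")

print(statistic_class("nominal"))      # a yes/no question
print(statistic_class("ordinal", 3))   # a 3-point rating
print(statistic_class("ordinal", 7))   # a 7-point rating
print(statistic_class("ratio"))        # weight in lbs.
```

The first two calls return "binomial" and the last two return "normal", matching the two rows of the table.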
Survey Questions and Measures: Some Common Examples
In actual practice, researchers
and real life research problems do not tell you how the dependent variable
should be categorized, so I will outline a few types of survey questions or
other measures that are commonly used.
Yes/No Questions
Any question on a survey that has yes or no as a possible response is nominal,
and so binomial statistics will be applied whenever a single yes/no question
serves as the dependent variable or one of the dependent variables in an
analysis.
Likert Scales
A special kind of survey question uses a set of
responses that are ordered so that one response is greater than another. The
Likert scale is named after its inventor,
Rensis Likert, whose name
is pronounced "Lickert." Generally, the
term is used for any question that has about 5 or more possible options. An
example might be: "How would you rate your department administrator?"
1=very incompetent, 2=somewhat incompetent, 3=neither competent nor incompetent, 4=somewhat
competent, or 5=very competent. Likert scales are
either ordinal or interval, and many psychometricians
would argue that they are interval scales because, when well constructed, there
is equal distance between each value. So if a Likert
scale is used as a dependent variable in an analysis, normal theory statistics
such as ANOVA or regression would be used.
Physical Measures
Most physical measures, such as
height, weight, systolic blood pressure, distance, etc., are interval or ratio
scales, so they fall into the general "continuous"
category. Therefore, normal theory type statistics are also used when such a measure serves as the dependent variable in an
analysis.
Counts
Counts are tricky. If a variable is measured by counting, as when a
researcher counts the number of days a patient has been
hospitalized, the variable is on a ratio scale and is treated as a continuous
variable. Special statistics are often recommended, however, because count
variables often have a very skewed distribution with a
large number of cases with a zero count (see Agresti,
1990, p. 125; Cohen, Cohen, West, & Aiken, 2003, Chapter 13). If a researcher is counting the number of
subjects in an experiment (or number of cases in the data set), a continuous
type measure is not really being used. Counting in this instance is really
examining the frequency with which some value of a variable occurs. For example,
counting the number of subjects in the data set who report having been
hospitalized in the last year relies on a dichotomous variable in the data set
that stands for being hospitalized or not being hospitalized (e.g., from a
question such as "have you been hospitalized in the last year?").
Even if one were to count the number of cases based on the question "how
many days in the past year have you been hospitalized," which is a
continuous measure, the variable being used in the analysis is really not this
continuous variable. Instead, the researcher would actually be analyzing a
dichotomous variable by counting the number of people who had not been
hospitalized in the past year (0 days) vs. those that had been (1 or more
days).
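A short Python sketch may make this last point concrete. The data below are made up for illustration, and the variable names are my own. The count of days is recoded into a dichotomous variable, and "counting cases" then amounts to a frequency table of that dichotomous variable.

```python
from collections import Counter

# Hypothetical answers to "how many days in the past year have you been
# hospitalized?" for ten subjects (invented data for illustration only).
days_hospitalized = [0, 0, 3, 0, 1, 12, 0, 0, 2, 0]

# Recode the count into a dichotomous variable:
# 0 = not hospitalized (0 days), 1 = hospitalized (1 or more days).
hospitalized = [1 if days > 0 else 0 for days in days_hospitalized]

# Counting cases is just a frequency table of the dichotomous variable.
frequencies = Counter(hospitalized)
print(frequencies[0], frequencies[1])  # prints: 6 4
```

The variable actually being analyzed here is hospitalized (categorical), not days_hospitalized (continuous), so a binomial-type statistic would apply.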