
Lecture 1
Types of scales & levels of measurement

Discrete and continuous variables
Daniel's text distinguishes between discrete and continuous variables. These are technical distinctions that will not be all that important to us in this class. According to the text, discrete variables are variables for which no intermediate values are possible, for instance, the number of phone calls you receive per day: you cannot receive 6.3 phone calls. Continuous variables are everything else, that is, any variable that can theoretically take values between any two points (e.g., between 153 and 154 lbs.). It turns out that this is not all that useful a distinction for our purposes. What is much more important for statistical considerations is the level of measurement used. In fact, "more important" is an understatement: understanding the level of measurement of a variable (or scale or measure) is the first and most important distinction one must make about a variable when doing statistics!

Levels of measurement
Statisticians often refer to the "levels of measurement" of a variable, a measure, or a scale to distinguish between measured variables that have different properties. There are four basic levels: nominal, ordinal, interval, and ratio.

Nominal
A variable measured on a "nominal" scale is a variable that does not really have any evaluative distinction: one value is not any greater than another. A good example of a nominal variable is sex (or gender). Information on sex in a data set is usually coded as 0 or 1, with 1 indicating male and 0 indicating female (or the other way around: 0 for male, 1 for female). The value 1 in this case is arbitrary, and it is not any greater or better than 0. There is only a nominal difference between 0 and 1. With nominal variables, there is a qualitative difference between values, not a quantitative one.
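To see why the 0/1 codes carry no quantitative meaning, here is a minimal Python sketch using made-up responses and two hypothetical coding schemes (`coding_a` and `coding_b`, names chosen for illustration). Swapping which category gets the 1 leaves the number of cases in each category unchanged; only the arbitrary number attached to each category moves.

```python
from collections import Counter

# Made-up responses for illustration only
responses = ["female", "male", "male", "female", "male"]

# Two equally valid (and equally arbitrary) coding schemes
coding_a = {"female": 0, "male": 1}
coding_b = {"female": 1, "male": 0}


def encode(values, coding):
    """Replace each category label with its numeric code."""
    return [coding[v] for v in values]


def counts_by_label(codes, coding):
    """Count how many cases fall in each category, reported by label."""
    decode = {code: label for label, code in coding.items()}
    return dict(Counter(decode[c] for c in codes))


counts_a = counts_by_label(encode(responses, coding_a), coding_a)
counts_b = counts_by_label(encode(responses, coding_b), coding_b)
print(counts_a == counts_b)  # True: the frequencies do not depend on the coding
```

The frequency of each category is the substantive information in a nominal variable; the codes themselves are just labels.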

Ordinal
Something measured on an "ordinal" scale does have an evaluative connotation: one value is greater, larger, or better than another. For example, if product A is preferred over product B, A might receive a rank of 1 and B a rank of 2. Another example might be rating your job satisfaction on a scale from 1 to 10, with 10 representing complete satisfaction. With ordinal scales such as this one, we only know that 2 is better than 1 or that 10 is better than 9; we do not know by how much, and the amount may vary. The distance between 1 and 2 may be shorter than the distance between 9 and 10.
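One way to see what "we do not know by how much" implies: any recoding that keeps the order of the values is just as faithful to an ordinal scale as the original numbers. The minimal Python sketch below squares made-up 1 to 10 satisfaction ratings (squaring is an arbitrary order-preserving choice, for illustration only) and shows that the median follows the recoding while the mean does not.

```python
import statistics

# Made-up job satisfaction ratings on a 1-10 scale
ratings = [2, 3, 3, 8, 9]

# An order-preserving recoding: larger ratings stay larger
recoded = [r ** 2 for r in ratings]

# The median picks out the same case under both codings...
median_original = statistics.median(ratings)   # 3
median_recoded = statistics.median(recoded)    # 9, which is 3 squared

# ...but the mean does not: the mean of the squares is not the square of the mean
mean_original = statistics.mean(ratings)       # 5
mean_recoded = statistics.mean(recoded)        # 33.4, not 25

print(median_recoded == median_original ** 2)  # True
print(mean_recoded == mean_original ** 2)      # False
```

Because the ordinal scale only guarantees order, summaries that depend only on order (ranks, medians) are the safe ones.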

Interval
A variable measured on an interval scale gives the same information about order (more or better) that an ordinal scale does, but interval variables also have an equal distance between adjacent values. The distance between 1 and 2 is equal to the distance between 9 and 10. Temperature in Celsius or Fahrenheit is a good example: the difference between 100 and 90 degrees is exactly the same as the difference between 42 and 32 degrees.
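The equal-distance property can be checked with a short Python sketch: converting the Fahrenheit readings from the example to Celsius keeps the two 10-degree gaps equal to each other, but a ratio such as "100 degrees is twice 50 degrees" does not survive the conversion, because the zero point of each scale is arbitrary. The function name `fahrenheit_to_celsius` is just illustrative.

```python
import math


def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# Differences: both pairs are 10 degrees apart in Fahrenheit
gap_warm = fahrenheit_to_celsius(100) - fahrenheit_to_celsius(90)
gap_cold = fahrenheit_to_celsius(42) - fahrenheit_to_celsius(32)
print(math.isclose(gap_warm, gap_cold))  # True: equal gaps stay equal (about 5.56 C each)

# Ratios: 100 F / 50 F is 2, but the same two temperatures in Celsius give about 3.78
ratio_f = 100 / 50
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)
print(ratio_f, round(ratio_c, 2))
```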

Ratio
Something measured on a ratio scale has all the properties of an interval scale, plus an absolute zero point. Because zero means a true absence of the quantity, ratios of values are meaningful: 200 lbs. is twice as heavy as 100 lbs., whereas 100 degrees Fahrenheit is not "twice as hot" as 50 degrees Fahrenheit. Temperature measured in Kelvin is an example. No value below 0 Kelvin is possible; it is absolute zero. Weight is another example: 0 lbs. is a meaningful absence of weight. Your bank account balance is another. Although your account balance can be negative or positive, a balance of 0 has a definite, nonarbitrary meaning.
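Because 0 lbs. means no weight at all, converting weight to another unit only rescales it (multiplies by a constant), so ratios survive the change of units. A minimal Python sketch, using the exact international definition of the pound (0.45359237 kg) and made-up weights:

```python
import math

KG_PER_LB = 0.45359237  # exact, by the international definition of the pound


def pounds_to_kilograms(lbs):
    """Convert pounds to kilograms; 0 lbs maps to 0 kg."""
    return lbs * KG_PER_LB


heavy_lbs, light_lbs = 200, 100
ratio_lbs = heavy_lbs / light_lbs
ratio_kg = pounds_to_kilograms(heavy_lbs) / pounds_to_kilograms(light_lbs)

print(pounds_to_kilograms(0))             # 0.0: both units share the same zero point
print(math.isclose(ratio_lbs, ratio_kg))  # True: "twice as heavy" in either unit
```

Contrast this with the Fahrenheit-to-Celsius conversion, which subtracts 32 before rescaling; that shift of the zero point is what makes ratios meaningless on an interval scale.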

One can think of nominal, ordinal, interval, and ratio as being ranked in relation to one another. Ratio is more sophisticated than interval, interval is more sophisticated than ordinal, and ordinal is more sophisticated than nominal. I don't know whether the ranks are equidistant; probably not. So what level of measurement is this ranking of measurement levels? I'd say ordinal. In statistics, it's best to be a little conservative when in doubt.
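The ranking of levels, and what each level adds, can be written down as a small Python sketch. The `Level` enum and the `meaningful_comparisons` function are hypothetical names for illustration; fittingly, the enum's integer values only encode order, so they should be read as ordinal, not as equal steps.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ranked. The numbers encode order only."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def meaningful_comparisons(level):
    """List the comparisons between two values that are meaningful at a level.

    Each level keeps everything the levels below it allow and adds one more.
    """
    comparisons = ["same or different"]          # nominal and above
    if level >= Level.ORDINAL:
        comparisons.append("greater or less")    # order
    if level >= Level.INTERVAL:
        comparisons.append("differences")        # equal distances
    if level >= Level.RATIO:
        comparisons.append("ratios")             # absolute zero
    return comparisons


print(meaningful_comparisons(Level.INTERVAL))
# ['same or different', 'greater or less', 'differences']
```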

Two General Classes of Variables (Who Cares?)
OK, remember I stated that this is the first and most important distinction when using statistics? Here's why. For the most part, statisticians and researchers end up caring only about the difference between nominal and all the others. There are generally two classes of statistics: those that deal with nominal dependent variables and those that deal with ordinal, interval, or ratio dependent variables. (Right now we will focus on the dependent variable; later we will discuss the independent variable.) When I describe these two general classes of variables, I (and many others) usually refer to them as "categorical" and "continuous." (Sometimes I'll use "dichotomous" instead of "categorical.") Note also that "continuous" in this sense is not exactly the same as "continuous" as used in Chapter 1 of the text to distinguish between discrete and continuous variables; it's a much looser term. "Categorical" and "dichotomous" usually mean that a scale is nominal. "Continuous" variables are usually those that are ordinal or better.

Nominal measures and ordinal scales with few categories (2, 3, or possibly 4) are often classified as categorical and are analyzed using the binomial class of statistical tests, whereas ordinal scales with many categories (5 or more), interval scales, and ratio scales are usually analyzed with the normal theory class of statistical tests. Although the distinction is a somewhat fuzzy one, it is often very useful for choosing the correct statistical test. A number of special statistics have been developed to deal with ordinal variables with just a few possible values, but we are not going to cover them in this class (see Agresti, 1984, 1990; O’Connell, 2006; Wickens, 1989 for more information on the analysis of ordinal variables).

General Classes of Statistics (Oh, I Guess I Do Care)
OK, so we have these two general categories (continuous and categorical). What next? Well, this distinction (as fuzzy as it may sound) has very important implications for the type of statistical procedure used, and we will be making decisions based on it throughout the course. There are two general classes of statistics: those based on binomial theory and those based on normal theory. Chi-square and logistic regression are based on binomial theory (binomial distributions), and t-tests, ANOVA, correlation, and regression are based on normal theory. Here's a table to summarize.

Type of Dependent Variable (or Scale)	Level of Measurement	General Class of Statistic (Binomial or Normal Theory)	Examples of Statistical Procedures
Categorical (or dichotomous)	nominal, ordinal with 2, 3, or 4 levels	binomial	chi-square, logistic regression
Continuous	ordinal with more than 4 categories, interval, ratio	normal	ANOVA, regression, correlation, t-tests
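The decision rule in the table can be sketched as a small Python function. The cutoff of 4 categories for ordinal variables follows the table and is a rule of thumb, not a hard boundary (the distinction is fuzzy); the names `general_class` and `PROCEDURES` are hypothetical.

```python
# Example procedures for each general class, from the table
PROCEDURES = {
    "binomial": ["chi-square", "logistic regression"],
    "normal": ["t-tests", "ANOVA", "correlation", "regression"],
}


def general_class(level, n_categories=None):
    """Return 'binomial' or 'normal' for a dependent variable.

    level: one of 'nominal', 'ordinal', 'interval', 'ratio'.
    n_categories: number of possible values; needed only for ordinal variables.
    """
    level = level.lower()
    if level == "nominal":
        return "binomial"
    if level == "ordinal":
        if n_categories is None:
            raise ValueError("ordinal variables need n_categories")
        # Rule of thumb from the table: 2-4 categories -> categorical
        return "binomial" if n_categories <= 4 else "normal"
    if level in ("interval", "ratio"):
        return "normal"
    raise ValueError(f"unknown level of measurement: {level!r}")


# A 5-point satisfaction item is treated as continuous
cls = general_class("ordinal", n_categories=5)
print(cls, PROCEDURES[cls])  # normal ['t-tests', 'ANOVA', 'correlation', 'regression']
```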

Survey Questions and Measures: Some Common Examples
In actual practice, real-life research problems do not come labeled with how the dependent variable should be classified, so I will outline a few types of survey questions and other measures that are commonly used.

Yes/No Questions
Any survey question whose only possible responses are yes or no is nominal, so binomial statistics are applied whenever a single yes/no question serves as the dependent variable (or as one of the dependent variables) in an analysis.
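When a yes/no question is the dependent variable and the independent variable is a grouping (say, two hypothetical groups A and B), the data reduce to a 2 x 2 table of counts, which is the input to a chi-square test. Below is a minimal Python sketch of the Pearson chi-square statistic for such a table, computed from observed and expected counts with no continuity correction; the counts are made up, and in practice a statistics package would also supply the p-value.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a 2-D table of observed counts.

    Expected count for each cell = row total * column total / grand total.
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up counts: rows = group A, group B; columns = answered yes, answered no
counts = [[10, 20],
          [30, 40]]
print(round(pearson_chi_square(counts), 4))  # 0.7937
```

Note that the analysis works entirely with frequencies of yes and no answers, not with any numeric code attached to them.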

Likert Scales
A special kind of survey question uses a set of responses that are ordered so that one response is greater than another. The Likert scale is named after its inventor, Rensis Likert, whose name is pronounced "Lickert." Generally, the term is used for any question that has about 5 or more possible options. An example might be: "How would you rate your department administrator?" 1=very incompetent, 2=somewhat incompetent, 3=neither competent nor incompetent, 4=somewhat competent, or 5=very competent. Likert scales are either ordinal or interval, and many psychometricians would argue that they are interval scales because, when well constructed, there is an equal distance between adjacent values. So if a Likert scale is used as the dependent variable in an analysis, normal theory statistics such as ANOVA or regression would be used.
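A minimal Python sketch of coding the administrator-rating item from its verbal labels to 1 to 5 and summarizing made-up responses two ways: the median, which needs only ordinal assumptions, and the mean, which treats the item as interval, as many psychometricians would. The label-to-code mapping follows the example item (with the label for 3 written out in full); the name `LIKERT_CODES` is hypothetical.

```python
import statistics

# Codes for the example item, keyed by response label
LIKERT_CODES = {
    "very incompetent": 1,
    "somewhat incompetent": 2,
    "neither competent nor incompetent": 3,
    "somewhat competent": 4,
    "very competent": 5,
}

# Made-up responses for illustration
answers = [
    "somewhat competent", "very competent", "neither competent nor incompetent",
    "somewhat competent", "somewhat incompetent", "very competent",
    "somewhat competent",
]
scores = [LIKERT_CODES[a] for a in answers]

print(statistics.median(scores))          # 4: safe under ordinal assumptions
print(round(statistics.mean(scores), 2))  # 3.86: assumes equal spacing (interval)
```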

Physical Measures
Most physical measures, such as height, weight, systolic blood pressure, and distance, are interval or ratio scales, so they fall into the general "continuous" category. Therefore, normal theory statistics are also used when such a measure serves as the dependent variable in an analysis.
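As an example of a normal theory statistic applied to a physical measure, here is a minimal Python sketch of a t statistic comparing mean systolic blood pressure in two groups. It uses Welch's form (a separate variance for each group), which is one of several t-test variants, and made-up readings; a statistics package would also report the degrees of freedom and p-value.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means.

    Uses each group's sample variance (n - 1 denominator) separately.
    """
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return mean_diff / se


# Made-up systolic blood pressure readings (mmHg)
treated = [120, 125, 130]
control = [130, 135, 140]
print(round(welch_t(treated, control), 3))  # -2.449
```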

Counts
Counts are tricky. If a variable is measured by counting, as is the case when a researcher counts the number of days a hospital patient has been hospitalized, the variable is on a ratio scale and is treated as a continuous variable. Special statistics are often recommended, however, because count variables often have a very skewed distribution, with a large number of cases having a count of zero (see Agresti, 1990, p. 125; Cohen, Cohen, West, & Aiken, 2003, Chapter 13). If a researcher is counting the number of subjects in an experiment (or the number of cases in the data set), a continuous measure is not really being used. Counting in this instance really examines the frequency with which some value of a variable occurs. For example, counting the number of subjects in the data set who report having been hospitalized in the last year relies on a dichotomous variable that stands for being hospitalized or not being hospitalized (e.g., from a question such as "Have you been hospitalized in the last year?"). Even if one were to count cases based on the question "How many days in the past year have you been hospitalized?", which is a continuous measure, the variable being used in the analysis is not really this continuous variable. Instead, the researcher would actually be analyzing a dichotomous variable by counting the number of people who had not been hospitalized in the past year (0 days) vs. those who had been (1 or more days).
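The two uses of counting can be contrasted in a short Python sketch with made-up data: a count variable (days hospitalized) that piles up at zero and is skewed, so its mean sits well above its median, and the dichotomous variable obtained by splitting the same data into 0 days vs. 1 or more days, whose frequencies are what a binomial-class analysis would work with.

```python
import statistics

# Made-up days hospitalized in the past year, one value per respondent
days = [0, 0, 0, 0, 0, 1, 2, 0, 0, 14, 0, 3]

# Treated as a count (ratio-scale) variable: many zeros and a long right tail
print(days.count(0), "of", len(days), "respondents have a count of zero")  # 8 of 12
# Mean is about 1.67 while the median is 0, a sign of skew
print(statistics.mean(days), statistics.median(days))

# Treated as a dichotomy: hospitalized at all (1+ days) vs. not (0 days)
hospitalized = [d > 0 for d in days]
n_yes = sum(hospitalized)
n_no = len(hospitalized) - n_yes
print(n_yes, n_no)  # 4 8
```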