Data analysis involves the following attributes of a variable.
Variable name: A relatively short name by which the variable is referred to in the analysis.
Specify the exact spelling and capitalization of each variable name, because the analysis identifies each variable by its name.
Variable type: Continuous or categorical variable.
Continuous variables have numeric values, ordered along the real number line. Categorical variables have a relatively small number of fixed categories (or levels).
Variable storage type: How the data values for the variable are stored in the computer.
Continuous variables can be stored as integers (type integer) or with decimal digits (type double, for double precision). Categorical variables can be stored as integers or as non-numeric characters (type character).
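As a minimal sketch of these storage types, the base R function typeof() reports how a vector is stored. The variable names and data values below are made up for illustration and are not taken from any particular data set.

```r
# Continuous variable stored as integers (the L suffix requests integer storage)
years <- c(7L, 12L, 3L)
# Continuous variable stored with decimal digits (double precision)
salary <- c(53788.26, 94494.58, 61036.85)
# Categorical variable stored as integer codes
plan <- c(1L, 3L, 2L)
# Categorical variable stored as non-numeric characters
dept <- c("ADMN", "SALE", "FINC")

# Report the storage type of each variable
storage <- sapply(list(years=years, salary=salary, plan=plan, dept=dept), typeof)
print(storage)
```

Note that R stores a number typed without the L suffix, such as 7, as a double even when it has no decimal digits.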
Variable label: Describes the meaning of the variable in a sentence or less.
For a survey that consists of a sequence of items for which respondents provide answers, such as the level of agreement, the variable labels are typically the content of each item. In the analysis, each item is a separate variable.
Unlike standard R, lessR provides for variable labels, which then augment the variable name in both text output and data visualizations. To read variable labels, create an Excel or csv file with two columns. The first column is the variable name. The second column is the corresponding variable label. The file has no header row of column names, so the first line holds the information for the first variable. See an example at http://lessRstats.com/data/employee_lbl.xlsx.
Variable label file for the Employee data set.
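A label file with this two-column layout could be written from R itself. The following is only a sketch in base R: the variable names, the labels, the column names name and label, and the file name my_lbl.csv are all hypothetical.

```r
# Hypothetical variable names and labels, for illustration only
labels <- data.frame(
  name  = c("Years", "Salary"),
  label = c("Years employed at the company", "Annual salary in USD")
)

# Write to a temporary csv file
lbl_file <- file.path(tempdir(), "my_lbl.csv")
# col.names=FALSE omits the header row, so line 1 describes the first variable
# row.names=FALSE omits the row index that write.table() would otherwise add
write.table(labels, lbl_file, sep=",", row.names=FALSE, col.names=FALSE)

# Inspect the resulting file: one line per variable, name then label
lines <- readLines(lbl_file)
print(lines)
```

A file written this way has the same shape as the label file described here: one line per variable and no header line.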
Read the variable label file into the l data frame. Add the
parameter var_labels set to TRUE in the call
to Read().
library("lessR")  # provides Read() and BarChart()
d <- Read("Employee", quiet=TRUE)
l <- Read("http://lessRstats.com/data/employee_lbl.xlsx", var_labels=TRUE, quiet=TRUE)
The output then displays the more explanatory variable labels.
BarChart(Dept, quiet=TRUE)
Value label: Describes a category of a categorical variable.
For example, the Gender category Woman is preferably recorded with a self-explanatory value such as W or Woman. If the category is instead coded numerically, such as with a 1, then the meaning of that 1 needs to be defined. (With R, create a variable of type factor, explained separately.) A value label may have an associated code that is used to record the response in the data table.
Response code: The response that defines a recorded data value.
For example, if the Gender category Woman is recorded as a 1, then the response code is 1. The corresponding value label is Woman.
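The link between response codes and value labels can be sketched with the base R function factor(). The recorded responses below are made up, and the second category, 2 for Man, is an assumption added only to complete the example.

```r
# Response codes as recorded in the data table (hypothetical responses)
gender_code <- c(1, 2, 2, 1, 1)

# Attach a value label to each response code
# 1 = Woman follows the text; 2 = Man is an assumed second category
gender <- factor(gender_code, levels=c(1, 2), labels=c("Woman", "Man"))

# The factor displays the value labels in place of the numeric codes
print(gender)
print(table(gender))
```

After the conversion, output such as frequency tables shows Woman and Man in place of 1 and 2.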
The owner of a data set provides a codebook that accompanies the data set. All analysts who work with the data need access to the codebook.
Codebook: A dictionary of the variables in a data set that provides their names, labels, and any relevant value labels.
The codebook is the documentation that explains the meaning of the variables in the data table.
An example for the Employee data set follows, conveniently constructed as a spreadsheet.
Codebook for the Employee data set.
The codebook clarifies the meaning of all the variables in the corresponding data set. It also provides information about the data set as a whole: at minimum the title of the data set, the date the codebook was prepared, and the person or organization responsible for maintaining and supplying the data.
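A codebook of this kind could also be held as a small data frame in R. The sketch below is an assumption about one possible layout, not the format of the Employee codebook: the column names, the labels, the value labels, and the entries in info are all placeholders.

```r
# Hypothetical codebook entries: one row per variable
codebook <- data.frame(
  name   = c("Years", "Gender", "Salary"),
  label  = c("Years employed", "Gender of employee", "Annual salary (USD)"),
  values = c(NA, "W = Woman, M = Man", NA),  # value labels, categorical variables only
  stringsAsFactors = FALSE
)

# Information about the data set as a whole (placeholder values)
info <- list(
  title      = "Employee data",
  prepared   = Sys.Date(),
  maintainer = "(person or organization)"
)

# List the variables that carry value labels
labeled <- codebook$name[!is.na(codebook$values)]
print(labeled)
```

Keeping the value labels in their own column makes it straightforward to find the categorical variables that need them.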