Excel Data Preparation and Counting
data analysis: calculations to convert data values (raw data) into useful information
Excel is the premier worksheet application, though not the first such program and not the only one available
- LibreOffice Calc: free and open source, now Version 5, http://www.libreoffice.org
- excellent MS Office compatibility; runs on Windows, Mac and Linux
- Apple Numbers, and others
worksheet: a rectangular table defined by columns, rows, cells
- cell: intersection of a row and a column, with an address
- where data and transformed data are stored, one data value per cell
purpose of a worksheet for data analysts:
- primarily a place to enter and store data, perhaps clean data,
- and for some, a place to transform data
workbook: a single file, a collection of one or more worksheets
- native Excel file type: .xlsx
- native LibreOffice Calc file type: .ods for open document spreadsheet
- each application can read the other’s file format, and can be set
by default to write and read each other’s file type
Data Preparation
Sample Data: employee data
data types
- text, align left, results if any alphabetical character is entered into a cell
- numbers, align right, results if only numbers are entered into a cell
- logical (Boolean), align center, TRUE or FALSE
- missing data, empty cell
data table
- variable name (or field name) in first row
- after the first row, each cell contains a data value
- case or observation in each row, called a record in database terminology
- often the first column are unique ID’s (primary keys)
- nothing else, all other cells empty
- use TAB and ENTER (or RETURN) keys to enter data in rows
format of the value vs the value
- format is the display of the value,
which can differ from the value, e.g., rounding
- formulas apply to the actual value
- show the value: formula bar, or general format
- date formats: number of days since a fixed date (such as Jan, 1, 1900)
allows data math
- time formats: e.g., 8 am (need space), represents the proportion of a day
divide by 24
- % formats: e.g., .05 is 5%
cell formatting
- bold, fill, color, border
- wrap text
excel data table
- optional Excel construct to formally define a table
- recommended if working with the data within Excel
- DO: select a cell in the data, Home tab —> Format as Table
- add new row by adding a row after the bottom of the table
press tab from last cell in the data
- row headers remain visible when scrolling down
- data table
data validation
- potential problem is the free form of Excel data entry leads to inconsistencies
- DO: Data tab —> Data Validation
- DO: apply to existing values, drop-down on Data Validation, choose Circle Invalid Values
- data validation
Data Analysis: Counting
Sample Data: employee data
most basic analysis is counting values of the variables
always the first analysis in an analysis project
Categorical Variable
frequency table
- DO: select Table by selecting the first line
- Insert tab —> PivotTable
- drag variable (field name) to Columns or Rows
- pivot table
bar chart or pie chart
- DO: select headings and counts
- Insert tab —> select Chart type
- select format
- bar chart
Note that Excel calls a bar chart a bar chart (horizontal bars) or a column chart (vertical bars)
Continuous Variable
Analysis ToolPak
- first install the Analysis ToolPak to access the histogram program
- DO: File tab --—> Options —> Add-ins —> Excel Add-ins, Click Go
- Click Add-Ins, and then in the Manage box, select Excel Add-ins, then Click Go
- Windows: Install Analysis Toolpak
- Mac: Install Analysis Toolpak
construct the histogram
- DO: Construct a row or column of bins, then Data tab, Data Analysis, choose Histogram
- DO: Tools —> Excel Ad-ins… select Analysis ToolPak
- Adjust the histogram so that the bars are adjacent
- Histogram