Data Analysis with R/lessR

Overview

This week introduces the R language for data analysis, one of the primary languages of data science. This week also introduces the two fundamental operations of any data analysis project:

Why R? This class is about forecasting unknown values of interest, particularly values that lie in the future. R provides powerful forecasting tools that, once you understand how the system works, are easy to use. R is also free to use and works identically on Windows, Mac, and Linux/Unix computers.

Each future week introduces concepts related to forecasting, accompanied by brief R instructions for implementing that week's concepts. As with this week, take the example R instructions given to you and simply change the data set and variable names to apply them to a specific homework problem. You would adapt those instructions the same way to do real data analysis and forecasting on the job. We do not need to do any R programming. We only copy function calls presented either in the online R textbook or in the weekly Canvas content.

Introduction to R

lessR

I provide all the information about R needed for doing the homework and tests. There is never any need to obtain more information about R beyond the provided instructions and information. Of course, although there is no need, you can always search the web or ask ChatGPT (or similar) about any R topic we cover in this course. Be aware, however, that there are many ways to accomplish each task, and those ways may differ considerably from my presentation of how to do a specific data analysis.

Nor is there any need to memorize any R instructions (function calls). No test requires memorizing these instructions. Moreover, memorization is discouraged. When doing homework, it is best to have the R instructions open or printed. Memorization is more than a waste of time: it is ultimately somewhat futile given how literally the computer interprets your instructions.

Much of the data analysis in this course uses lessR, an extension to the standard R functions that I have developed. This R package of functions greatly simplifies the use of R. Analysts and students across the world now use lessR, with about 150 downloads a day from the R servers the last time I checked. The American Statistical Association recently recognized lessR with my 2021 publication in their Journal of Statistics and Data Science Education.

In early 2023, Taylor and Francis published the second edition of my book on lessR, R Data Analysis without Programming: Explanation and Interpretation. The book explains how to do a wide range of data analyses with simple lessR instructions, including the motivation for each analysis, with emphasis on the explanation and interpretation of the results. This book is not required for this course, but if you become interested in increasing your data analysis skills, you may find it useful.

Online Text

Find much of this week's content in my developing online, interactive instructional module with embedded videos at:

      READ/WATCH: Introduction to R for Data Analysis

Read this online text on any device with a web browser, including phones, tablets, Chromebooks, and computers.

Videos

The above link about learning R contains embedded videos. Each video shows an actual R analysis. As you read the text, view each associated video to better understand how to accomplish each step of the analysis.

A New Way to Write Almost Anything

You do not need to do your computer analysis and writing with Quarto for this course. You are free to completely skip this content. However, you may be interested to learn this new writing system, not just for general writing but also as a modern way of obtaining and organizing your computer output. You can extend what you already have learned regarding RStudio to Quarto.

Quarto directions: web page

Quarto demonstration: video [23:25].

Some Stat Review

The online R instructional material reviews material from the first week of your prerequisite stat class, the two basic counting analyses: bar charts for categorical variables and histograms for continuous variables. These analyses are among the first done for any data analysis project. They were introduced in the Introduction to R reading linked above, but the following materials present more detail if you are interested.

The following pdf and videos contain the same content. If you read the pdf and understand the material, there is no need to also watch the videos, which cover the pdf bullet point by bullet point with verbal explanation.

Note: The default name of a data frame (a data table within a running R session) for lessR recently changed from mydata to d. Some videos still reference mydata, which continues to work but requires more typing. It is easier, and more current, to use d as the name of an R data frame. You can use any name you wish, but because d is the default for lessR functions, using d means you do not need to name the data frame in a function call with the data parameter.
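As a minimal sketch of the point in this note, assuming the lessR package is already installed: the example reads the Employee data set that ships with lessR and analyzes two of its variables, Dept and Salary. The name emp is hypothetical, chosen only to show when the data parameter is needed.

```r
library(lessR)  # assumes lessR is already installed

# Read the Employee data set included with lessR into the default name d
d <- Read("Employee")

# With the data frame named d, no data parameter is needed
BarChart(Dept)      # bar chart of a categorical variable
Histogram(Salary)   # histogram of a continuous variable

# With any other name, identify the data frame with the data parameter
emp <- d            # emp is a hypothetical name used only in this sketch
Histogram(Salary, data=emp)
```

To adapt calls like these to a homework problem, change only the data set and variable names.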

Topic: 1.3 Bar Chart (a), Histogram (b, c)
pdf: pdf [32]
Videos: 1.3a [9:59], 1.3b [7:54], 1.3c [12:20]
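The counting behind these two analyses can be sketched in a few lines. This sketch uses base R functions rather than the lessR functions used in the course materials, and the dept and salary values are invented for illustration only.

```r
# Hypothetical data, invented for this illustration
dept   <- c("SALE", "ACCT", "SALE", "MKTG", "ACCT", "SALE")
salary <- c(42, 55, 61, 48, 73, 58)   # in $1000s

# Categorical variable: count how often each category occurs
dept_counts <- table(dept)
print(dept_counts)
barplot(dept_counts)                  # bar chart of the counts

# Continuous variable: count how many values fall into each bin
bins <- hist(salary, breaks = seq(40, 80, by = 10))
print(bins$counts)                    # one count per bin of width 10
```

A bar chart plots one bar per category, whereas a histogram first groups the numeric values into adjacent bins and then plots the count in each bin.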

Remember from the course syllabus: If you can do the homework, including being able to answer the short-answer homework questions, then you understand what is required in this course to do well.