Seminar on Data Analysis and R

Of Interest

Packages

Packages and General Info

R Markdown

It is much better to construct your R analysis files as R Markdown files, file type .Rmd, than as plain R scripts, file type .r. That way you can document your analysis, including not just the R code, but also explanations of the code and the corresponding results. The output contains both the R output and the included documentation.

Like an R script file, the Rmd file is a simple text file, with simple Markdown conventions to indicate formatting, such as paragraphs, headers, and tables. Use the Knit button at the top of the file window in RStudio to generate HTML output, or Word output if MS Word is installed, or PDF output if LaTeX is installed. The syntax of these files is simple and is easily explained by example in a few pages.
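As a rough illustration of those conventions, here is a minimal sketch of an Rmd file. The title, headings, and choice of the built-in PlantGrowth data set are assumptions made for this example only and are not taken from the seminar materials.

````markdown
---
title: "Example Analysis"
output: html_document
---

## Summary of the data

Ordinary text like this sentence is documentation. The chunk below is R code
that runs when the file is knit, and its output appears in the document.

```{r}
summary(PlantGrowth)
```

## A plot without the code shown

```{r, echo=FALSE}
boxplot(weight ~ group, data = PlantGrowth)
```
````

Outside of RStudio, the same file can be processed from the R console with rmarkdown::render("analysis.Rmd"), where analysis.Rmd is a hypothetical file name.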

R Markdown Basics

ANOVA

ANOVA and Regression

This discussion expands and complements the material in Section 10.3, Indicator Variables, from my book R Data Analysis without Programming.

R Markdown file to generate the output

The Document as Web Output

data

ANOVA and Effect Size

Cohen's effect size index for multiple levels of a treatment (X) variable is the f-statistic, which is a straightforward generalization of his d-statistic for comparing two means (or a single mean against a fixed alternative). The d-statistic is based on a single mean difference, divided by the standard deviation (not the standard error, so it does not grow with sample size). For a one-way design, the f-statistic is based on the difference of each group mean from the grand mean, again relative to the within-group standard deviation. For more complex designs, what is subtracted depends on the effect that is evaluated, such as a main effect or a two-way interaction effect.
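The relationship between d and f can be sketched in a few lines of base R. This is only an illustration under stated assumptions: it uses the PlantGrowth data set that ships with R (three groups of 10 plants each), it plugs sample values into Cohen's population definitions, and all variable names are invented for the example.

```r
# Sample versions of Cohen's d and f from the built-in PlantGrowth data.
# Assumption: equal group sizes, so the grand mean is the mean of group means.
data(PlantGrowth)
weight <- PlantGrowth$weight
group  <- PlantGrowth$group

# Pooled within-group standard deviation = sqrt(mean square within)
tab  <- anova(lm(weight ~ group))
s_within <- sqrt(tab["Residuals", "Mean Sq"])

# d: one mean difference (trt2 vs. ctrl) divided by the standard deviation
means <- tapply(weight, group, mean)
d <- unname((means["trt2"] - means["ctrl"]) / s_within)

# f: standard deviation of the group means around the grand mean,
#    divided by the same within-group standard deviation
grand   <- mean(weight)
sigma_m <- sqrt(mean((means - grand)^2))
f <- sigma_m / s_within

round(c(d = d, f = f), 3)
```

With only two groups of equal size, each mean lies half the mean difference from the grand mean, so f reduces to d / 2. That is the sense in which f generalizes d.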

If you use SPSS, Levine and Hullett provide a thorough discussion of the eta-squared statistic reported by SPSS, which is actually a partial eta-squared statistic (Levine is from the MSU communications department).

In general, however, as we discussed, omega squared is preferred to eta squared. Eta squared, comparable to R squared, is just a description of explained variance in the sample. Omega squared, comparable to adjusted R squared, provides an estimate of this variance in the population, so it does not suffer from overfitting in the same way and is a more conservative estimate. In my lessR output, R squared is eta squared (but not the eta squared reported by SPSS, as explained in the above citation).
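To make the comparison concrete, the following sketch computes both indices from a one-way ANOVA table in base R. It is an illustration rather than the lessR computation: the PlantGrowth data set and the variable names are assumptions for the example, and the omega-squared formula is the usual estimator for a fixed-effects one-way design.

```r
# Eta squared vs. omega squared for a one-way design, using base R only.
data(PlantGrowth)
fit <- lm(weight ~ group, data = PlantGrowth)
tab <- anova(fit)

ss_between <- tab["group", "Sum Sq"]
ss_within  <- tab["Residuals", "Sum Sq"]
df_between <- tab["group", "Df"]
ms_within  <- tab["Residuals", "Mean Sq"]
ss_total   <- ss_between + ss_within

# Sample description of explained variance (same value as R squared)
eta_sq <- ss_between / ss_total

# Population estimate: numerator shrinks, denominator grows
omega_sq <- (ss_between - df_between * ms_within) / (ss_total + ms_within)

# For comparison, the regression versions of the same ideas
r_sq     <- summary(fit)$r.squared
adj_r_sq <- summary(fit)$adj.r.squared

round(c(eta_sq = eta_sq, r_sq = r_sq,
        omega_sq = omega_sq, adj_r_sq = adj_r_sq), 3)
```

Adjusted R squared and omega squared share the same numerator; omega squared only adds the mean square within to the denominator, which is why the two values are close but not identical.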

According to Kirk (1995, 3rd ed., p. 177), omega squared and the intraclass correlation have the same definition for completely randomized designs. The term omega squared is applied to the more common fixed treatment designs. The term intraclass correlation is applied to random treatment effects. Their formulas differ (p. 178), but both operationalizations reflect the same concept: strength of association between a categorical variable and a continuous variable. The strength of association is defined as the proportion of population variance in the continuous variable accounted for by the categorical variable.
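For reference, the shared definition and the two estimators can be written out as follows. This is a sketch in standard textbook notation, not a transcription of Kirk's page: it assumes a balanced one-way design with p groups and n observations per group, and the symbols are chosen for this example.

```latex
% Shared concept: proportion of population variance due to the treatment
\text{strength of association}
  = \frac{\sigma^2_{\text{treatment}}}
         {\sigma^2_{\text{treatment}} + \sigma^2_{\varepsilon}}

% Fixed effects: omega squared
\hat{\omega}^2
  = \frac{SS_{\text{between}} - (p - 1)\, MS_{\text{within}}}
         {SS_{\text{total}} + MS_{\text{within}}}

% Random effects: intraclass correlation
\hat{\rho}_I
  = \frac{MS_{\text{between}} - MS_{\text{within}}}
         {MS_{\text{between}} + (n - 1)\, MS_{\text{within}}}
```

Both estimators subtract the mean square within from a between-groups quantity, which removes the part of the between-groups variation that is expected from sampling error alone.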