# R

## Descriptive Statistics for One Variable

Getting descriptive statistics in RStudio is quick for one or multiple variables. Descriptive statistics are measures that describe the distribution of observations in a variable; we use them when analyzing data, transforming variables, and reporting results. Each descriptive statistic has its own formula, which this guide does not cover, but we will walk through the interpretation of each.

Below is the code for calculating the descriptive statistics of the variable wages.

Code

summary(SLID$wages)

We are conducting a summary of the variable wages from the dataset SLID.

Output

Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
2.300   9.235  14.090  15.553  19.800  49.920    3278

The output shows us descriptive statistics and the number of missing values. We are going to focus on a couple of descriptive statistics in this output. Moving from left to right, we can see the Min. (minimum), 1st Qu. (first quartile), Median, Mean, 3rd Qu. (third quartile), Max. (maximum), and NA's (missing values).
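Each value in the summary can also be computed on its own, which is handy when only one statistic is needed. The sketch below uses a small made-up vector named `wages` (hypothetical values, not the SLID data). The argument `na.rm = TRUE` tells each function to drop missing values first; without it, `mean()` returns `NA` when any value is missing.

```r
# Hypothetical wage values (made up for illustration), including one missing value
wages <- c(8.5, 12.0, NA, 15.25, 30.0)

min(wages, na.rm = TRUE)                              # minimum
quantile(wages, probs = c(0.25, 0.75), na.rm = TRUE)  # first and third quartiles
median(wages, na.rm = TRUE)                           # median
mean(wages, na.rm = TRUE)                             # mean
max(wages, na.rm = TRUE)                              # maximum
sum(is.na(wages))                                     # number of missing values
```

With the real data, the same calls would take `SLID$wages` in place of the toy vector.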

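The average wage in this dataset is 15.553, which is well below the midpoint of the range, 26.11 ((49.92 + 2.30)/2). The mean (15.553) is also higher than the median (14.090). Together these suggest that most observations sit at lower wage values, with a tail of higher wages pulling the mean up (a right-skewed distribution).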
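The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that copies the minimum, maximum, mean, and median from the summary output; the variable names (`wage_min`, `midrange`, and so on) are chosen here for illustration.

```r
# Values copied from the summary(SLID$wages) output
wage_min    <- 2.300
wage_max    <- 49.920
wage_mean   <- 15.553
wage_median <- 14.090

# Midpoint of the range: halfway between the minimum and the maximum
midrange <- (wage_min + wage_max) / 2
midrange

# TRUE if the mean falls below the midpoint of the range
wage_mean < midrange

# TRUE if the mean exceeds the median, a common sign of right skew
wage_mean > wage_median
```

Both comparisons return `TRUE` for these values, which matches the interpretation that wages are concentrated at the lower end.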

## Descriptive Statistics for Multiple Variables

We can also calculate descriptive statistics for all the variables in a dataset with a single command.

Code

summary(SLID)

We are conducting a summary on the SLID dataset.

Output

wages          education          age            sex          language
Min.   : 2.300   Min.   : 0.00   Min.   :16.00   Female:3880   English:5716
1st Qu.: 9.235   1st Qu.:10.30   1st Qu.:30.00   Male  :3545   French : 497
Median :14.090   Median :12.10   Median :41.00                 Other  :1091
Mean   :15.553   Mean   :12.50   Mean   :43.98                 NA's   : 121
3rd Qu.:19.800   3rd Qu.:14.53   3rd Qu.:57.00
Max.   :49.920   Max.   :20.00   Max.   :95.00
NA's   :3278     NA's   :249

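In this output, RStudio provides each variable name along with the Min. (minimum), 1st Qu. (first quartile), Median, Mean, 3rd Qu. (third quartile), Max. (maximum), and NA's (missing values) for the numeric variables. For the categorical variables sex and language, it reports the number of observations in each category instead.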
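When only some variables are of interest, `summary()` can be applied to a subset of columns, and `table()` gives the category counts for a single categorical variable. The sketch below uses a small made-up data frame named `toy` (a hypothetical stand-in, not the SLID data) so that it runs on its own.

```r
# Hypothetical toy data frame standing in for SLID (values are made up)
toy <- data.frame(
  wages = c(10.5, NA, 14.0, 22.3),
  age   = c(25, 40, 31, 58),
  sex   = factor(c("Female", "Male", "Female", "Male"))
)

# Summarize only the selected numeric columns
summary(toy[, c("wages", "age")])

# Count observations per category, listing missing values if there are any
table(toy$sex, useNA = "ifany")
```

With the real data, the same pattern would be `summary(SLID[, c("wages", "education")])` and `table(SLID$language, useNA = "ifany")`.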