Getting descriptive statistics in RStudio is quick, whether for one variable or several. Descriptive statistics are measures that describe the distribution of observations in a variable; they are useful for analysis, transforming variables, and reporting. Each descriptive statistic has its own formula, which this guide does not cover, but we will walk through the interpretation of each.
Below is the code for calculating the descriptive statistics of the variable wages.
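The command itself does not appear in this copy of the guide, so the following is a minimal sketch of what it most likely looks like. It assumes the SLID dataset is the one shipped with the carData package (an assumption; load it from wherever your copy lives) and uses base R's summary() function on the wages column.

```r
# Assumption: SLID comes from the carData package.
# Run install.packages("carData") first if it is not installed.
library(carData)

# Summarize the single variable wages from the SLID data frame
summary(SLID$wages)
```

The `$` operator selects one column from the data frame, so summary() only reports on wages.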
We are conducting a summary of the variable wages from the dataset SLID.
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  2.300   9.235  14.090  15.553  19.800  49.920    3278
The output shows us the descriptive statistics and the number of missing values. We are going to focus on a couple of the descriptive statistics in this output. Moving from left to right, we can see the Min. (minimum), 1st Qu. (first quartile), Median, Mean, 3rd Qu. (third quartile), Max. (maximum), and NA's (missing values).
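The average wage in this dataset is 15.553, which is well below the midpoint of the range, 26.11 ((49.92 + 2.30)/2). The mean (15.553) is also higher than the median (14.090). Together, these indicate that most observations sit at the lower end of the wage range, with a long tail of higher wages (a right-skewed distribution).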
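The arithmetic behind this comparison can be checked directly in R. The sketch below copies the values from the summary output above; the variable names are made up for illustration.

```r
# Values copied from the summary output for wages
wage_min    <- 2.300
wage_max    <- 49.920
wage_mean   <- 15.553
wage_median <- 14.090

# Midpoint of the range: halfway between the smallest and largest wage
midrange <- (wage_min + wage_max) / 2
midrange

# Is the mean below the midpoint of the range?
wage_mean < midrange

# Is the mean above the median? Both checks point to a long tail of high wages.
wage_mean > wage_median
```

Note that the midpoint is found by adding the minimum and maximum before halving; subtracting them would give half the width of the range instead.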
We can also calculate the descriptive statistics for all the variables with a single command.
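As with the single-variable case, the command is not shown in this copy of the guide. A minimal sketch, again assuming SLID is loaded from the carData package, passes the whole data frame to summary():

```r
# Assumption: SLID comes from the carData package
library(carData)

# Summarize every variable in the data frame at once
summary(SLID)
```

When given a data frame, summary() produces one column of output per variable.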
We are conducting a summary on the SLID dataset.
     wages          education          age            sex          language
 Min.   : 2.300   Min.   : 0.00   Min.   :16.00   Female:3880   English:5716
 1st Qu.: 9.235   1st Qu.:10.30   1st Qu.:30.00   Male  :3545   French : 497
 Median :14.090   Median :12.10   Median :41.00                 Other  :1091
 Mean   :15.553   Mean   :12.50   Mean   :43.98                 NA's   : 121
 3rd Qu.:19.800   3rd Qu.:14.53   3rd Qu.:57.00
 Max.   :49.920   Max.   :20.00   Max.   :95.00
 NA's   :3278     NA's   :249
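In this output, RStudio provides us with each variable name as a column heading. For the numeric variables (wages, education, and age), it reports the Min. (minimum), 1st Qu. (first quartile), Median, Mean, 3rd Qu. (third quartile), Max. (maximum), and, where any exist, NA's (missing values). For the categorical variables (sex and language), it instead reports the number of observations in each category.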
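The sex and language columns in the output above show counts per category rather than quartiles, because those variables are stored as factors. The sketch below, assuming SLID is loaded from the carData package, shows one way to get the same counts for a single categorical variable.

```r
# Assumption: SLID comes from the carData package
library(carData)

# For a factor, summary() returns the count in each category,
# including missing values if there are any
summary(SLID$language)

# table() gives the same counts; useNA = "ifany" adds a missing-value
# count only when missing values are present
table(SLID$sex, useNA = "ifany")
```

Without the useNA argument, table() silently leaves missing values out, so it is worth including when you want the counts to add up to the number of rows.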