`base R` and `ggplot2`, a popular package from the `tidyverse`. Charts and graphs are effective ways to represent data to an audience, and `ggplot2` offers powerful and flexible tools for creating visually appealing plots.

Histograms are best for plotting continuous variables because, as the name suggests, the values lie on a continuum. Histograms are very helpful for investigating the distribution of a continuous variable, which is important for determining whether the variable needs to be recoded.

**Code**

We can create histograms with either base R or the `ggplot2` package.

- In base R, we use the `hist()` function to plot the distribution of the `expenditure` variable.
- In the `tidyverse`, we use the `ggplot()` and `geom_histogram()` functions to create the same graph.
- Compared to base R, the `ggplot()` function makes it straightforward to customize our plots. For instance, we changed the number of bins, added a theme (the `theme_bw()` function), and changed the labels of the x-axis and y-axis using the `labs()` function.
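The three steps above can be sketched as follows. This is a minimal illustration, not the guide's original code: the `schools` data frame is simulated stand-in data, and the bin count and label text are assumptions chosen for the example.

```r
library(ggplot2)

# Hypothetical stand-in data: 200 simulated, right-skewed expenditure values.
set.seed(42)
schools <- data.frame(expenditure = 4000 + rgamma(200, shape = 4, scale = 350))

# Base R: hist() draws the distribution of a numeric vector.
base_hist <- hist(schools$expenditure,
                  main = "Histogram of expenditure",
                  xlab = "Expenditure")

# ggplot2: map the variable to x, then add a histogram layer.
p_basic <- ggplot(schools, aes(x = expenditure)) +
  geom_histogram()

# Improved version: custom bin count, black-and-white theme, axis labels.
# The value 20 and the label text are assumptions for this sketch.
p_improved <- ggplot(schools, aes(x = expenditure)) +
  geom_histogram(bins = 20) +
  theme_bw() +
  labs(x = "Expenditure", y = "Number of observations")

print(p_basic)
print(p_improved)
```

Storing each plot in an object before printing makes it easy to add further layers later.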

**Output from Base R**

**Output from ggplot()**

**Output from ggplot() - improved version**

The histogram shows us the range of `expenditure` values among the observations and how frequently each range occurs. We can also see that the distribution of `expenditure` does not follow a normal curve (it is close to normal, but not exactly) and is skewed to the right. This may affect the results of our earlier statistical tests.
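One informal way to back up a visual impression of right skew is to compare the mean with the median and to draw a normal Q-Q plot. The sketch below uses a made-up, deterministic vector named `expenditure` (an assumption, not the guide's data). The mean-above-median comparison is a rule of thumb rather than a formal test.

```r
# Hypothetical right-skewed values: evenly spaced on the log scale.
expenditure <- exp(seq(8, 9.2, length.out = 200))

# In a right-skewed distribution the long upper tail pulls the mean
# above the median.
mean_exp <- mean(expenditure)
median_exp <- median(expenditure)
right_skewed <- mean_exp > median_exp

# Normal Q-Q plot: points that bend away from the reference line
# suggest a departure from normality.
qqnorm(expenditure)
qqline(expenditure)
```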

Boxplots, often called box-and-whisker plots, are used to represent the quartiles of continuous variables. Boxplots display the variation in the sample with a box that spans the lower and upper quartiles and 'whiskers' that reach out to observations beyond those quartiles. These plots can be done with a single variable or multiple variables, as we will see below.

**Code**

We can create boxplots with either base R or the `ggplot2` package.

- In base R, we use the `boxplot()` function to plot the distribution of the `expenditure` variable.
- In the `tidyverse`, we use the `ggplot()` and `geom_boxplot()` functions to create the same graph.
- Compared to base R, the `ggplot()` function makes it straightforward to customize our plots. For instance, we added a theme (the `theme_bw()` function) and changed the labels of the x-axis and y-axis using the `labs()` function.
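A minimal sketch of these three steps is shown here. It assumes a simulated `schools` data frame with an `expenditure` column, and the label text is invented for the example.

```r
library(ggplot2)

# Hypothetical stand-in data: 200 simulated expenditure values.
set.seed(42)
schools <- data.frame(expenditure = 4000 + rgamma(200, shape = 4, scale = 350))

# Base R: boxplot() draws the plot and invisibly returns its summary statistics.
base_box <- boxplot(schools$expenditure, ylab = "Expenditure")

# ggplot2: map the variable to y, then add a boxplot layer.
p_basic <- ggplot(schools, aes(y = expenditure)) +
  geom_boxplot()

# Improved version: black-and-white theme and custom axis labels
# (the label text is an assumption for this sketch).
p_improved <- ggplot(schools, aes(y = expenditure)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = "All observations", y = "Expenditure")

print(p_basic)
print(p_improved)
```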

The boxplot marks the median of `expenditure` with a horizontal line inside the gray box. The bottom and top edges of the gray box are the 25th (Q1) and 75th (Q3) percentiles of the distribution. Next, the whiskers extend to the smallest and largest values of `expenditure` that are not flagged as outliers (by default, values within 1.5 times the interquartile range of the box). Dots are outliers.
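To see the numbers behind a boxplot, base R's `boxplot.stats()` returns the whisker ends, the box edges, the median, and any outliers. The sketch below uses simulated values with one extreme observation added by hand. The data are an assumption for illustration only.

```r
# Hypothetical data: 200 simulated values plus one hand-made extreme value.
set.seed(42)
expenditure <- c(4000 + rgamma(200, shape = 4, scale = 350), 12000)

stats <- boxplot.stats(expenditure)

# stats$stats holds, in order: lower whisker end, lower hinge (about Q1),
# median, upper hinge (about Q3), upper whisker end.
five <- stats$stats

# stats$out holds the points drawn as dots, i.e. values lying beyond
# 1.5 times the box length from the box edges (the default coef = 1.5).
outliers <- stats$out

print(five)
print(outliers)
```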

**Output from ggplot()**

**Output from ggplot() - improved version**

We can also create a boxplot of the `expenditure` variable grouped by another variable. For instance, we can graph `expenditure` for two counties in the `county` variable.

This code might look intimidating at first. However, each step helps us to configure a specific aspect of the plot:

- The `filter()` function keeps only two values of the `county` variable: Sonoma and Merced
- The `geom_boxplot()` function creates a boxplot of expenditure by county
- The `theme_bw()` function applies a black-and-white theme to the plot
- The `labs()` function changes the x-axis and y-axis names
- The `coord_flip()` function flips the x and y coordinates
- The `scale_x_continuous()` function changes how the x-axis scale looks
  - The `breaks` argument, together with the `seq()` function, alters the x-axis ticks
  - The `limits` argument sets the lower and upper limits of the x-axis
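Put together, the steps listed above might look like the sketch below. The `schools` data frame is simulated, the third county ("Kern") exists only to show the filter at work, and the tick positions, limits, and label text are assumptions rather than the guide's original values.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data with three counties.
set.seed(42)
schools <- data.frame(
  county = rep(c("Sonoma", "Merced", "Kern"), each = 30),
  expenditure = c(rnorm(30, 5400, 500),
                  rnorm(30, 5000, 400),
                  rnorm(30, 5200, 450))
)

# Keep only the two counties of interest.
two_counties <- schools %>%
  filter(county %in% c("Sonoma", "Merced"))

p <- ggplot(two_counties, aes(x = expenditure, y = county)) +
  geom_boxplot() +                                   # boxplot of expenditure by county
  theme_bw() +                                       # black-and-white theme
  labs(x = "Expenditure", y = "County") +            # axis names
  coord_flip() +                                     # swap the displayed axes
  scale_x_continuous(breaks = seq(3000, 8000, by = 1000),  # tick positions
                     limits = c(3000, 8000))               # lower and upper limits

print(p)
```

Because `coord_flip()` swaps the displayed axes, the `x` aesthetic (here `expenditure`) ends up on the vertical axis even though it is still styled through `scale_x_continuous()`.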

This boxplot is separated by the two counties (Merced and Sonoma), and `expenditure` is represented on the y-axis. This helps us see the distribution of `expenditure` by `county`.

Bar plots are best used to represent ordinal variables and show how the observations are distributed across the categories. We can graph a bar plot of a single variable or of multiple variables for a direct comparison.

We can create bar plots with either base R or the `ggplot2` package.

- In base R, we use the `barplot()` function to plot the distribution of `grades`.
- In the `tidyverse`, we use the `ggplot()` and `geom_bar()` functions to create the same graph.
- Compared to base R, the `ggplot()` function makes it straightforward to customize our plots. For instance, we added a theme (the `theme_bw()` function) and changed the labels of the x-axis and y-axis using the `labs()` function.
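A minimal sketch of these steps follows, assuming a made-up `schools` data frame with a `grades` column. The counts and label text are invented for illustration. Note that base R's `barplot()` expects counts, so the variable is tabulated first with `table()`, whereas `geom_bar()` counts the rows itself.

```r
library(ggplot2)

# Hypothetical stand-in data: 15 KK-06 and 85 KK-08 observations.
schools <- data.frame(
  grades = factor(rep(c("KK-06", "KK-08"), times = c(15, 85)))
)

# Base R: tabulate the categories, then plot the counts.
grade_counts <- table(schools$grades)
barplot(grade_counts, xlab = "Grades", ylab = "Count")

# ggplot2: geom_bar() counts the observations in each category.
p_basic <- ggplot(schools, aes(x = grades)) +
  geom_bar()

# Improved version: black-and-white theme and custom axis labels.
p_improved <- ggplot(schools, aes(x = grades)) +
  geom_bar() +
  theme_bw() +
  labs(x = "Grades", y = "Number of observations")

print(p_basic)
print(p_improved)
```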

**Output from ggplot()**

**Output from ggplot() - improved version**

The bar plots above show the raw count of observations for each value of the variable `grades`. We can clearly see that there are more KK-08 observations than KK-06 observations in the dataset.

This code might look intimidating at first. However, each step helps us to configure a specific aspect of the plot:

- The `filter()` function keeps only two values of the `county` variable: Sonoma and Merced
- The `geom_bar()` function creates a bar plot of grades by county
  - The `fill` and `color` arguments fill and outline the bars according to the `county` variable
- The `theme_minimal()` function applies a minimal theme to the plot
- The `labs()` function changes the x-axis and y-axis names
- The `coord_flip()` function flips the x and y coordinates
- The `scale_y_continuous()` function changes how the y-axis scale looks
  - The `breaks` argument, together with the `seq()` function, alters the y-axis ticks
  - The `limits` argument sets the lower and upper limits of the y-axis
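The steps above could be assembled as in the sketch below. The data are made up, the third county ("Kern") is there only to demonstrate the filter, and `position = "dodge"` (bars side by side) as well as the tick positions and limits are assumptions, not settings confirmed by the guide.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data: grade spans for observations in three counties.
schools <- data.frame(
  county = rep(c("Sonoma", "Merced", "Kern"), times = c(29, 11, 27)),
  grades = c(rep(c("KK-06", "KK-08"), times = c(9, 20)),   # Sonoma
             rep(c("KK-06", "KK-08"), times = c(3, 8)),    # Merced
             rep(c("KK-06", "KK-08"), times = c(5, 22)))   # Kern
)

# Keep only the two counties of interest.
two_counties <- schools %>%
  filter(county %in% c("Sonoma", "Merced"))

p <- ggplot(two_counties, aes(x = grades)) +
  geom_bar(aes(fill = county, color = county),   # fill and outline by county
           position = "dodge") +                 # assumption: side-by-side bars
  theme_minimal() +                              # minimal theme
  labs(x = "Grades", y = "Number of observations") +
  coord_flip() +                                 # swap the displayed axes
  scale_y_continuous(breaks = seq(0, 25, by = 5),  # tick positions
                     limits = c(0, 25))            # lower and upper limits

print(p)
```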

We have broken down the observations by grades (KK-06 and KK-08) and by county (Merced and Sonoma).

Scatter plots are best used to graphically show if there is a relationship between two variables and what that relationship may look like.

We can create scatter plots with either base R or the `ggplot2` package.

- In base R, we use the `plot()` function to plot `students` against `teachers`.
- In the `tidyverse`, we use the `ggplot()` and `geom_point()` functions to create the same graph.
- Compared to base R, the `ggplot()` function makes it straightforward to customize our plots. For instance, we added a theme (the `theme_bw()` function), changed the labels of the x-axis and y-axis using the `labs()` function, and even added a regression line using the `geom_smooth()` function.
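As a rough sketch of these steps, the example below simulates a `schools` data frame with `teachers` and `students` columns. The data, the label text, and the choice of `method = "lm"` (a straight-line fit) for `geom_smooth()` are assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical stand-in data: students roughly proportional to teachers.
set.seed(42)
teachers <- round(runif(100, min = 20, max = 500))
schools <- data.frame(
  teachers = teachers,
  students = round(teachers * 20 + rnorm(100, mean = 0, sd = 100))
)

# Base R: plot() with two numeric vectors draws a scatter plot.
plot(schools$teachers, schools$students,
     xlab = "Teachers", ylab = "Students")

# ggplot2: map the two variables to x and y, then add a point layer.
p_basic <- ggplot(schools, aes(x = teachers, y = students)) +
  geom_point()

# Improved version: linear regression line, theme, and axis labels.
p_improved <- ggplot(schools, aes(x = teachers, y = students)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +  # assumption: linear fit
  theme_bw() +
  labs(x = "Number of teachers", y = "Number of students")

print(p_basic)
print(p_improved)
```

By default `geom_smooth()` also shades a confidence band around the fitted line. Passing `se = FALSE` turns it off.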

**Output from ggplot()**

**Output from ggplot() - improved version**

Above are scatter plots of the variables `students` by `teachers`. Scatter plots are very helpful for examining continuous variables and checking whether a graphical relationship exists. We can see in this scatter plot that there is a positive, linear relationship between the number of students and the number of teachers. After looking at this graph, we would next want to conduct statistical tests to see if the relationship is statistically significant.

- Last Updated: Aug 7, 2024 2:15 PM
- URL: https://research.library.gsu.edu/R