R

Charts and Graphs in RStudio

Making charts and graphs in RStudio is easy! We will provide examples using both base R and ggplot2, a popular package from the tidyverse. Charts and graphs are effective ways to present data to an audience, and ggplot2 offers powerful and flexible tools for creating visually appealing plots.

Histograms are best suited to continuous variables because, as the name suggests, their values fall on a continuum. Histograms are very helpful for investigating the distribution of a continuous variable, which is important for determining whether the variable needs to be recoded.

Code

We can create histograms with either base R or the ggplot2 package.

  • In base R, we use the hist() function to plot the distribution of the expenditure variable.
  • In the tidyverse, we use the ggplot() and geom_histogram() functions to create the same graph.
  • Compared with base R, ggplot2 makes it straightforward to customize a plot layer by layer. For instance, we were able to change the number of bins, add a theme (the theme_bw() function), and change the labels of the x-axis and y-axis using the labs() function.
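The three versions described above can be sketched as follows. This is a minimal illustration rather than the guide's original code: the `schools` data frame is simulated stand-in data, and the bin count and axis labels are assumptions chosen for the example.

```r
library(ggplot2)

# Hypothetical stand-in data: 200 simulated, right-skewed expenditure values.
set.seed(42)
schools <- data.frame(
  expenditure = rlnorm(200, meanlog = log(5200), sdlog = 0.12)
)

# Base R: hist() draws the histogram and invisibly returns the bin counts.
h <- hist(schools$expenditure)

# ggplot2: the same histogram with default settings.
p_basic <- ggplot(schools, aes(x = expenditure)) +
  geom_histogram()
print(p_basic)

# ggplot2, improved: set the number of bins, add a theme, relabel the axes.
p <- ggplot(schools, aes(x = expenditure)) +
  geom_histogram(bins = 20) +          # bin count is an illustrative choice
  theme_bw() +
  labs(x = "Expenditure", y = "Count") # labels are illustrative
print(p)
```

Changing `bins` is usually the first adjustment worth trying, since too few bins can hide skew and too many can make the shape look noisy.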

Output from Base R

Output from ggplot()

Output from ggplot() - improved version

The histogram shows us the range of expenditure values among the observations and how frequently each range of values occurs. We can also see that the distribution of expenditure does not follow a normal curve (it is roughly bell-shaped, but not normal) and is skewed to the right. This may affect the results of our earlier statistical tests.

Boxplots, often called box-and-whisker plots, are used to represent the quartiles of continuous variables. Boxplots display the variation in the sample with a box that spans the lower and upper quartiles and 'whiskers' that cover the observations beyond those quartiles. These plots can be made with a single variable or multiple variables, as we will see below.

Code

We can create boxplots with either base R or the ggplot2 package.

  • In base R, we use the boxplot() function to plot the distribution of the expenditure variable.
  • In the tidyverse, we use the ggplot() and geom_boxplot() functions to create the same graph.
  • Compared with base R, ggplot2 makes it straightforward to customize a plot layer by layer. For instance, we were able to add a theme (the theme_bw() function) and change the labels of the x-axis and y-axis using the labs() function.
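A minimal sketch of the boxplot versions listed above, again using simulated stand-in data in a hypothetical `schools` data frame. The axis labels are assumptions for illustration, not the guide's original choices.

```r
library(ggplot2)

# Hypothetical stand-in data: 200 simulated expenditure values.
set.seed(42)
schools <- data.frame(
  expenditure = rlnorm(200, meanlog = log(5200), sdlog = 0.12)
)

# Base R: boxplot() draws the plot and invisibly returns its summary statistics.
b <- boxplot(schools$expenditure)

# ggplot2: a single-variable boxplot maps the variable to the y aesthetic.
p_basic <- ggplot(schools, aes(y = expenditure)) +
  geom_boxplot()
print(p_basic)

# ggplot2, improved: add a theme and relabel the axes.
p <- ggplot(schools, aes(y = expenditure)) +
  geom_boxplot() +
  theme_bw() +
  labs(x = NULL, y = "Expenditure")  # labels are illustrative
print(p)
```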

The boxplots below show the median of the variable expenditure (just above 5,000) as a horizontal line inside the gray box. The bottom and top edges of the gray box are the 25th (Q1) and 75th (Q3) percentiles of the distribution. The whiskers extend to the smallest and largest values of expenditure that are not classified as outliers (by default, values within 1.5 times the interquartile range of the box edges). Dots beyond the whiskers are outliers.
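To make the parts of a boxplot concrete, the numbers behind one can be inspected with base R's boxplot.stats(). The eight values below are made up for illustration. Note that base R uses "hinges" for the box edges, which are close to, but not always identical to, Q1 and Q3.

```r
# Made-up expenditure values, including one unusually large observation.
x <- c(4800, 5000, 5100, 5200, 5300, 5500, 5600, 7500)

s <- boxplot.stats(x)

# s$stats holds, in order: lower whisker, lower hinge, median,
# upper hinge, upper whisker.
print(s$stats)  # 4800 5050 5250 5550 5600

# s$out holds the points drawn as dots: values more than 1.5 times the
# box height (5550 - 5050 = 500) beyond the box edges.
print(s$out)    # 7500
```

Here the upper whisker stops at 5,600 rather than at the maximum of 7,500, because 7,500 lies more than 750 above the upper hinge.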

Output from base R

Output from ggplot()

Output from ggplot() - improved version

Code

We can also create a boxplot of the expenditure variable grouped by another variable. For instance, we can graph expenditure for two of the counties in the county variable.

This code might look intimidating at first. However, each step configures a specific aspect of the plot:

  • The filter() function keeps only the rows for two counties in the county variable: Sonoma and Merced
  • The geom_boxplot() function creates a boxplot of expenditure by county
  • The theme_bw() function applies a black-and-white theme to the plot
  • The labs() function changes the x-axis and y-axis labels
  • The coord_flip() function swaps the x and y coordinates
  • The scale_x_continuous() function changes how the x-axis scale looks
    • The breaks argument, combined with the seq() function, sets the x-axis ticks
    • The limits argument sets the lower and upper limits of the x-axis
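The steps above can be sketched as a single pipeline. This is an illustration under stated assumptions, not the guide's original code: the data are simulated, the third county (Kern) is included only so that filter() has something to remove, and the break and limit values are made up to fit the simulated data.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in data: simulated expenditure for three counties.
set.seed(42)
schools <- data.frame(
  county = rep(c("Sonoma", "Merced", "Kern"), each = 30),
  expenditure = c(rnorm(30, 5400, 400),
                  rnorm(30, 5100, 300),
                  rnorm(30, 5000, 350))
)

# Keep only the two counties of interest.
two_counties <- schools %>%
  filter(county %in% c("Sonoma", "Merced"))

p <- ggplot(two_counties, aes(x = expenditure, y = county)) +
  geom_boxplot() +                          # one box per county
  theme_bw() +                              # black-and-white theme
  labs(x = "Expenditure", y = "County") +   # illustrative labels
  coord_flip() +                            # expenditure is drawn vertically
  scale_x_continuous(
    breaks = seq(4000, 7000, by = 500),     # tick marks every 500
    limits = c(4000, 7000)                  # lower and upper axis limits
  )
print(p)
```

Because expenditure is mapped to the x aesthetic, scale_x_continuous() controls its ticks and limits even though coord_flip() displays it on the vertical axis.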

Output

This boxplot is separated by the two counties (Merced and Sonoma), and expenditure is represented on the y-axis. This helps us to compare the distribution of expenditure by county.

Bar plots are best used to represent categorical variables (nominal or ordinal) and show how the observations are distributed across the categories. We can graph a bar plot of a single variable or of multiple variables for a direct comparison.

Code

We can create bar plots with either base R or the ggplot2 package.

  • In base R, we use the barplot() function to plot the distribution of the grades variable.
  • In the tidyverse, we use the ggplot() and geom_bar() functions to create the same graph.
  • Compared with base R, ggplot2 makes it straightforward to customize a plot layer by layer. For instance, we were able to add a theme (the theme_bw() function) and change the labels of the x-axis and y-axis using the labs() function.
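A minimal sketch of the bar plot versions listed above. The `schools` data frame and its category counts are made up for illustration, as are the axis labels. One detail worth noting: barplot() expects counts, so the raw variable is tabulated with table() first, whereas geom_bar() counts the rows itself.

```r
library(ggplot2)

# Hypothetical stand-in data: 12 KK-06 and 30 KK-08 observations.
schools <- data.frame(
  grades = rep(c("KK-06", "KK-08"), times = c(12, 30))
)

# Base R: tabulate the categories, then plot the counts.
counts <- table(schools$grades)
barplot(counts)

# ggplot2: geom_bar() counts the rows in each category automatically.
p_basic <- ggplot(schools, aes(x = grades)) +
  geom_bar()
print(p_basic)

# ggplot2, improved: add a theme and relabel the axes.
p <- ggplot(schools, aes(x = grades)) +
  geom_bar() +
  theme_bw() +
  labs(x = "Grades", y = "Count")  # labels are illustrative
print(p)
```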

Output from base R

Output from ggplot()

Output from ggplot() - improved version

The bar plots above show the raw count of observations in each category of the variable grades. We can clearly see that there are more KK-08 grades than KK-06 grades in the dataset.

Code

This code might look intimidating at first. However, each step configures a specific aspect of the plot:

  • The filter() function keeps only the rows for two counties in the county variable: Sonoma and Merced
  • The geom_bar() function creates a bar plot of grades by county
    • The fill and color arguments fill and outline the bars according to the county variable
  • The theme_minimal() function applies a minimal theme to the plot
  • The labs() function changes the x-axis and y-axis labels
  • The coord_flip() function swaps the x and y coordinates
  • The scale_y_continuous() function changes how the y-axis scale looks
    • The breaks argument, combined with the seq() function, sets the y-axis ticks
    • The limits argument sets the lower and upper limits of the y-axis
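One way these steps could fit together is sketched below. It is an illustration under stated assumptions rather than the guide's original code: the data and counts are made up, the third county (Kern) exists only so that filter() has something to remove, and position = "dodge" (side-by-side bars) is an assumption, since the description above does not say how the bars are arranged.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in data: grade spans for districts in three counties.
schools <- data.frame(
  county = rep(c("Sonoma", "Merced", "Kern"), times = c(25, 18, 10)),
  grades = c(rep(c("KK-06", "KK-08"), times = c(5, 20)),   # Sonoma
             rep(c("KK-06", "KK-08"), times = c(3, 15)),   # Merced
             rep(c("KK-06", "KK-08"), times = c(4, 6)))    # Kern
)

# Keep only the two counties of interest.
two_counties <- schools %>%
  filter(county %in% c("Sonoma", "Merced"))

p <- ggplot(two_counties, aes(x = grades, fill = county, color = county)) +
  geom_bar(position = "dodge") +      # assumption: bars side by side
  theme_minimal() +
  labs(x = "Grades", y = "Count") +   # illustrative labels
  coord_flip() +                      # bars are drawn horizontally
  scale_y_continuous(
    breaks = seq(0, 30, by = 5),      # tick marks every 5
    limits = c(0, 30)                 # must cover the tallest bar
  )
print(p)
```

If the limits are set below the height of the tallest bar, ggplot2 drops that bar with a warning, so it helps to check the largest count before fixing the limits.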

Output

We have broken the observations down by grades (KK-06 and KK-08) and by county (Merced and Sonoma).

Scatter plots are best used to graphically show if there is a relationship between two variables and what that relationship may look like. 

Code

We can create scatter plots with either base R or the ggplot2 package.

  • In base R, we use the plot() function to plot the students variable against the teachers variable.
  • In the tidyverse, we use the ggplot() and geom_point() functions to create the same graph.
  • Compared with base R, ggplot2 makes it straightforward to customize a plot layer by layer. For instance, we were able to add a theme (the theme_bw() function), change the labels of the x-axis and y-axis using the labs() function, and even add a regression line using the geom_smooth() function.
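A minimal sketch of the scatter plot versions listed above. The `schools` data frame is simulated stand-in data, with students generated as roughly 20 per teacher plus noise. The axis labels and the choice of method = "lm" for a straight regression line are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical stand-in data: 100 simulated districts.
set.seed(42)
teachers <- round(runif(100, min = 5, max = 200))
schools <- data.frame(
  teachers = teachers,
  students = round(teachers * 20 + rnorm(100, mean = 0, sd = 150))
)

# Base R: plot(x, y) draws a scatter plot.
plot(schools$teachers, schools$students)

# ggplot2: the same scatter plot with default settings.
p_basic <- ggplot(schools, aes(x = teachers, y = students)) +
  geom_point()
print(p_basic)

# ggplot2, improved: add a regression line, a theme, and axis labels.
p <- ggplot(schools, aes(x = teachers, y = students)) +
  geom_point() +
  geom_smooth(method = "lm") +   # fitted straight line with confidence band
  theme_bw() +
  labs(x = "Number of teachers", y = "Number of students")
print(p)
```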

Output from base R

Output from ggplot()

Output from ggplot() - improved version

Above are scatter plots of the variable students plotted against the variable teachers. Scatter plots are very helpful for examining continuous variables and checking whether a relationship is visible between them. We can see in this scatter plot that there is a positive, linear relationship between the number of students and the number of teachers. After looking at this graph, we would next want to conduct statistical tests to see if the relationship is statistically significant.