

R and R Studio

This guide provides general information about R and R Studio for data manipulation, analysis, and visualization.

Methods for Creating and Transforming Variables

Generating variables in R is quite simple, especially when the new variable is created from an existing one. Researchers often make a copy of a current variable when they want to change or recode the data while keeping the original values intact so nothing is lost. No formula is needed for such a copy. It works much like “copy” and “paste”: the existing column is assigned to a new column name.

Below is the code for generating the variable students_new from the existing variable students.

Screenshot of creating a new variable from an existing variable
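The screenshot is not reproduced as text. A minimal base-R sketch of the same idea follows. It assumes my_data is a data frame with a numeric column students. The toy values are made up purely so the example runs on its own, and the code in the original screenshot may differ.

```r
# Hypothetical toy data standing in for my_data; the values are invented for illustration
my_data <- data.frame(students = c(12, 30, 25, 8))

# Copy the existing column into a new column; the original students column is left untouched
my_data$students_new <- my_data$students
```

Any later recoding can then be applied to students_new while students keeps the original values.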

Running this code produces no output other than the command itself, echoed in the console. When R runs an assignment like this without printing an error or warning, the code worked. Another way to check is to open the my_data dataset in RStudio and look for the new variable students_new as an added column.
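The check can also be done from code instead of the dataset viewer. The sketch below rebuilds the same hypothetical toy version of my_data so that it runs on its own. It lists the column names and confirms that the copy matches the original.

```r
# Hypothetical toy data standing in for my_data; the values are invented for illustration
my_data <- data.frame(students = c(12, 30, 25, 8))
my_data$students_new <- my_data$students

# List the column names; students_new should appear after students
print(names(my_data))

# Confirm the new column matches the original value for value (should print TRUE)
print(identical(my_data$students, my_data$students_new))

# In RStudio, View(my_data) opens the same spreadsheet-style viewer as clicking the dataset
```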

We can also generate new variables that are transformed versions of other variables in the dataset. This is helpful when we want to collapse a variable from a higher level of measurement to a lower one, such as from continuous to categorical.

Below is the code for generating the new variable high_income by recoding the continuous variable income into a categorical (binary) variable.

Screenshot of ifelse function in R

The ifelse() function in R is a vectorized conditional statement. It evaluates a logical test for every element and returns one value where the test is TRUE and another where it is FALSE. Here, we used it to create a new variable called high_income: when income is 10 or above, high_income equals 1, and when income is below 10, high_income equals 0. If income is missing (NA), the test is neither TRUE nor FALSE, so high_income is NA as well.
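A minimal base-R sketch of this recode follows. It assumes my_data has a numeric income column, and the toy values are invented for illustration. One value is deliberately missing to show that ifelse() passes NA through rather than assigning 0 or 1. The exact code in the screenshot may be written differently.

```r
# Hypothetical toy data; the income values are invented for illustration
my_data <- data.frame(income = c(4.5, 10, 22, NA, 9.9))

# 1 when income is 10 or above, 0 when it is below 10; a missing income stays NA
my_data$high_income <- ifelse(my_data$income >= 10, 1, 0)

print(my_data)
```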

We can quickly check what R did with the ifelse() function. Below, using the tidyverse pipe operator (%>%), we selected the income and high_income variables and printed the first 6 observations with the head() function. Wherever income is equal to or above 10, high_income is 1.

Screenshot of output using select function
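A sketch of that check is below, assuming the dplyr package is installed. dplyr provides select() and re-exports the %>% pipe from magrittr. The toy data, including the extra region column, is hypothetical and only shows that select() drops the columns we do not ask for. In R 4.1 and later, the native pipe |> can be used in place of %>% here.

```r
library(dplyr)  # provides select() and re-exports the %>% pipe

# Hypothetical toy data; region is an invented extra column for illustration
my_data <- data.frame(
  income = c(4.5, 10, 22, 9.9, 15, 3, 11),
  region = c("N", "S", "E", "W", "N", "S", "E")
)
my_data$high_income <- ifelse(my_data$income >= 10, 1, 0)

# Keep only the two columns of interest, then take the first 6 rows
first_rows <- my_data %>%
  select(income, high_income) %>%
  head()

print(first_rows)
```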

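Standardizing a variable converts its raw values into standard scores (Z scores), which have a mean of 0 and a standard deviation of 1. This is useful for comparing variables measured on different scales. Standardizing does not change the shape of a distribution, however, so it will not make a non-normal variable normal. Below is the code that creates a new variable called students_std containing the standardized Z scores of students.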

Screenshot of scale function in R
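A minimal sketch using base R's scale() follows, with invented toy values for students. scale() returns a one-column matrix, so the sketch wraps it in as.numeric() to store a plain vector. It also computes the same Z scores by hand as (x - mean) / sd to show what scale() does with its default settings. The code in the screenshot may store the result differently.

```r
# Hypothetical toy data; the values are invented for illustration
my_data <- data.frame(students = c(12, 30, 25, 8, 15))

# scale() subtracts the mean and divides by the standard deviation;
# as.numeric() drops the matrix structure scale() returns
my_data$students_std <- as.numeric(scale(my_data$students))

# The same Z scores computed by hand: (x - mean) / sd
manual_z <- (my_data$students - mean(my_data$students)) / sd(my_data$students)

print(all.equal(my_data$students_std, manual_z))  # should be TRUE
print(mean(my_data$students_std))                 # approximately 0
print(sd(my_data$students_std))                   # approximately 1
```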