
R

Methods for Creating and Transforming Variables

Generating variables in R is quite simple, especially if you want to generate a new variable from an already existing variable. Researchers often create a copy of an existing variable when they want to change or recode the data while keeping the original values so they are not lost. Creating such a copy requires no formula; it works much like "copy" and "paste".

Below is the code for generating the variable students_new from the already existing variable students.
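A minimal base-R sketch of this step. It assumes the data frame is named my_data and already has a numeric column students. The values are made up so that the example runs on its own:

```r
# Hypothetical example data; replace with your own data frame
my_data <- data.frame(students = c(12, 25, 18, 30))

# Copy the existing column into a new column; the original column is unchanged
my_data$students_new <- my_data$students

# Both columns now hold the same values
my_data
```

An assignment with <- prints nothing by itself. Only the final my_data line above displays the data frame.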

Running this code prints nothing except the command itself, which RStudio echoes in the console. An assignment produces no output, so the absence of an error message tells us the code ran without problems. Another way to check is to open the my_data dataset in RStudio and look for the new variable students_new as an added column.

We can also generate new variables that are transformed from other variables in the dataset. This is helpful if we want to collapse a variable from a higher level of measurement to a lower level of measurement, such as continuous to categorical.

Below is the code for generating the new variable high_income from the variable income that is recoded from a continuous level to a categorical level.
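A minimal base-R sketch of this recode. It assumes my_data contains a numeric column income and uses the cutoff of 10 described in the next paragraph. The income values are hypothetical:

```r
# Hypothetical income values; replace with your own data frame
my_data <- data.frame(income = c(4.5, 10, 12.3, 8, 25))

# high_income is 1 when income is 10 or above and 0 otherwise
my_data$high_income <- ifelse(my_data$income >= 10, 1, 0)

my_data
```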

The ifelse() function in R is a vectorized conditional statement: it evaluates a logical test for every element and returns one value where the test is TRUE and another value where the test is FALSE. Here, we used it to create the new variable high_income. When income is 10 or above, high_income equals 1; when income is below 10, high_income equals 0.
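To illustrate what "vectorized" means, here is a small sketch that applies the same test to a plain vector of made-up values, including the boundary value 10. The test is checked element by element, and ifelse() returns a vector of the same length:

```r
# Hypothetical income values, including values just below, at, and above 10
income <- c(3, 9.99, 10, 10.01, 40)

# The logical test gives one TRUE/FALSE per element: FALSE FALSE TRUE TRUE TRUE
test <- income >= 10

# ifelse() picks 1 where the test is TRUE and 0 where it is FALSE: 0 0 1 1 1
high <- ifelse(test, 1, 0)
high
```

Because the test uses >=, a value of exactly 10 is coded as 1.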

We can quickly check what R did with the ifelse() function. Below, using the tidyverse pipe operator (%>%), we selected the income and high_income variables and printed the first 6 observations with the head() function. Wherever income is equal to or above 10, high_income is 1.
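A sketch of this check. It assumes the dplyr package is installed; dplyr is part of the tidyverse, so library(tidyverse) would load it as well. The small data frame is hypothetical and stands in for my_data:

```r
library(dplyr)  # provides select() and re-exports the %>% pipe

# Hypothetical data standing in for my_data
my_data <- data.frame(income = c(4.5, 10, 12.3, 8, 25, 9.9, 31))
my_data$high_income <- ifelse(my_data$income >= 10, 1, 0)

# Keep only the two columns of interest, then show the first 6 rows (head() default)
check <- my_data %>%
  select(income, high_income) %>%
  head()
check
```

In R 4.1 or later, the native pipe |> can be used in place of %>% here.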

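Standardizing a variable converts its raw values into standard (Z) scores. Each score is the number of standard deviations a value lies above or below the mean, which puts variables measured on different scales onto a common scale. Standardizing does not change the shape of a distribution, so it will not make a non-normal variable normal. Below is the code that will create a new variable called students_std containing the standardized Z scores of students.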
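A minimal base-R sketch of this step. It assumes my_data has a numeric column students with no missing values and uses made-up values. It uses the built-in scale() function and checks the result against the formula z = (x - mean(x)) / sd(x):

```r
# Hypothetical data; replace with your own data frame
my_data <- data.frame(students = c(12, 25, 18, 30, 15))

# scale() subtracts the mean and divides by the standard deviation.
# It returns a one-column matrix, so as.numeric() turns it back into a plain vector.
my_data$students_std <- as.numeric(scale(my_data$students))

# The same Z scores computed directly from the formula
z_manual <- (my_data$students - mean(my_data$students)) / sd(my_data$students)

my_data
```

The standardized column has a mean of 0 (up to rounding error) and a standard deviation of 1.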