
R

Methods for Creating and Transforming Variables

Generating variables in R is quite simple, especially if you want to generate a new variable from an already existing variable. Researchers often create a copy of an existing variable when they want to change or recode the data while keeping the original values intact. No formula is needed to generate the new variable; the process works like “copy” and “paste”.

Below is the code for generating the variable age1 from an already existing variable age:

Code

SLID$age1 <- SLID$age

 

We are creating the new variable age1 in the SLID dataset and using the assignment operator <- to give it the values of the already existing variable age.

Below is the output for generating a new variable that is a copy of already existing data. 

Output

> SLID$age1 <- SLID$age

The output is simply the code above. RStudio only gives us this output to tell us the code ran correctly and there are no issues. Another way to check is to go to the SLID dataset in RStudio and look for our new variable age1 as an added column.
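The check described above can also be done in code rather than by looking at the data viewer. The following is a minimal sketch that assumes a small made-up data frame standing in for SLID (the values are invented for illustration and are not the real data):

```r
# Toy stand-in for the SLID dataset; these values are made up for illustration
SLID <- data.frame(age = c(25, 40, 63), education = c(10, 12, 16))

# Copy the existing variable age into the new variable age1
SLID$age1 <- SLID$age

# List the column names to confirm that age1 was added
names(SLID)

# Confirm that the copy holds exactly the same values as the original
identical(SLID$age1, SLID$age)
```

If identical() returns TRUE, the new variable is an exact copy, and any later recoding of age1 leaves age untouched.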

We can also generate new variables that are transformed from other variables in the dataset. This is helpful if we want to collapse a variable from a higher level of measurement to a lower level of measurement, such as continuous to categorical.

Below is the code for generating the new variable highschool from the variable education that is recoded from a continuous level to a categorical level.

Code

SLID$highschool <- ifelse(SLID$education<=12, 1, 0)

 

We are creating the variable highschool and using the assignment operator <- to give it the result of ifelse(): the value 1 if education is less than or equal to (<=) 12, and the value 0 if education is greater than 12.

Output

The output for this is similar to the previous example. RStudio only gives us this output to tell us the code ran correctly and there are no issues. Another way to check is to go to the SLID dataset in RStudio and look for our new variable highschool as an added column.

> SLID$highschool <- ifelse(SLID$education<=12, 1, 0)
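To see how the recode behaves, we can cross-tabulate the original variable against the new one. This is a minimal sketch that assumes a small made-up data frame standing in for SLID; the education values, including the missing one, are invented for illustration:

```r
# Toy stand-in for the SLID dataset; these values are made up for illustration
SLID <- data.frame(education = c(8, 12, 12.5, 16, NA))

# Recode: 1 when education is 12 or less, 0 when it is greater than 12
SLID$highschool <- ifelse(SLID$education <= 12, 1, 0)

# Cross-tabulate the original and recoded values to check the cut point;
# useNA = "ifany" also shows any missing values
table(SLID$education, SLID$highschool, useNA = "ifany")
```

Note that ifelse() returns NA wherever the condition itself is NA, so cases with missing education stay missing in highschool rather than being coded as 0.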

Standardizing a variable converts its raw values to standard (Z) scores, which have a mean of 0 and a standard deviation of 1. This is often done so that variables measured on different scales can be compared. In this case, we are standardizing the variable age in years to Z scores.

Below is the code that will create a new variable called age_std, which will hold the standardized Z scores of age:

Code

SLID$age_std<-scale(SLID$age)

 

We are creating the new variable age_std and using the assignment operator <- to give it the result of scale(), which returns the standardized values of age.

Output

> SLID$age_std<-scale(SLID$age)

 

RStudio only gives us this output to tell us the code ran correctly and there are no issues. Another way to check is to go to the SLID dataset in RStudio and look for our new variable age_std as an added column.
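We can also verify the standardization numerically. The sketch below assumes a small made-up data frame standing in for SLID (the ages are invented for illustration). It wraps scale() in as.numeric(), which is our own addition and not part of the code above: scale() returns a one-column matrix, and as.numeric() stores the result as an ordinary numeric column instead.

```r
# Toy stand-in for the SLID dataset; these values are made up for illustration
SLID <- data.frame(age = c(20, 30, 40, 50, 60))

# scale() returns a one-column matrix; as.numeric() turns it into a plain vector
SLID$age_std <- as.numeric(scale(SLID$age))

# Z scores should have a mean of (approximately) 0 and a standard deviation of 1
mean(SLID$age_std)
sd(SLID$age_std)

# The same Z scores computed by hand: (value - mean) / standard deviation
manual <- (SLID$age - mean(SLID$age)) / sd(SLID$age)
all.equal(SLID$age_std, manual)
```

Because of floating-point rounding, the mean may print as a very small number rather than exactly 0, which is why all.equal() is used for the comparison instead of ==.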