Generating variables in SAS is quite simple, especially if you want to generate a new variable from an already existing variable. Researchers often create a copy of an existing variable when they want to change or recode the data while keeping the original values intact. No formula is needed to generate the copy; it works much like “copy” and “paste”.
Below is the code for generating the variable age1 from an already existing variable age.
Code
DATA slid;
SET slid;
age1 = age;
RUN;
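The DATA statement names the dataset being created (slid), and the SET statement reads in the existing slid dataset, so the dataset is rewritten with one extra column. The assignment sets our new variable age1 equal to the already existing variable age. We then end the step with the RUN statement.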
Below is the output for generating a new variable that is a copy of already existing data.
Output
The output is simply a copy of the code above. SAS gives us this output only to tell us that the code ran correctly and that there are no issues. Another way to check is to go to the dataset tab in SAS and look for our new variable age1 as an added column.
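The copy can also be checked with code rather than by browsing the dataset. The following is a minimal sketch, assuming the dataset is still named slid and sits in the default WORK library; the 10-row limit is an arbitrary choice for illustration.

```sas
/* Print the first 10 rows so age and age1 can be compared side by side */
PROC PRINT DATA=slid (OBS=10);
    VAR age age1;
RUN;

/* List every variable in slid; age1 should appear in the variable table */
PROC CONTENTS DATA=slid;
RUN;
```

If the copy worked, the two columns printed by PROC PRINT hold identical values in every row.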
We can also generate new variables that are transformed from other variables in the dataset. This is helpful if we want to collapse a variable from a higher level of measurement to a lower level of measurement, such as continuous to categorical.
Below is the code for generating the new variable highschool from the variable education that is recoded from a continuous level to a categorical level.
Code
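We are using the dataset slid to create the variable highschool, which is equal to 1 IF the value of education is equal to or greater than 12. IF education is less than 12 THEN highschool is equal to 0. The last line of code makes sure all missing values in education remain missing in highschool. This line is needed because SAS treats a missing numeric value as smaller than any number, so without it every missing education value would satisfy "less than 12" and be coded as 0.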
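The original listing for this step is not shown here. The description above can be sketched as follows; this is a reconstruction that assumes the same slid dataset and a numeric education variable, not necessarily the exact code that was run.

```sas
DATA slid;
    SET slid;
    IF education >= 12 THEN highschool = 1;   /* 12 or more years of education */
    IF education < 12 THEN highschool = 0;    /* fewer than 12 years; also true for missing values */
    IF education = . THEN highschool = .;     /* set missing values back to missing */
RUN;
```

The order of the statements matters: the missing-value line comes last so that it overrides the 0 assigned by the line before it.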
Output
The output for this is similar to the previous example. It shows a copy of the code along with the number of observations read and written by the step that contains the IF and THEN statements. SAS gives us this output only to tell us that the code ran correctly and that there are no issues. Another way to check is to go to the dataset tab in SAS and look for our new variable highschool as an added column.
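A frequency table is a quick way to confirm the recode. The sketch below assumes the highschool variable was created in slid as described; the MISSING option asks PROC FREQ to include missing values as a row in the table.

```sas
/* Tabulate the recoded variable, counting missing values as their own category */
PROC FREQ DATA=slid;
    TABLES highschool / MISSING;
RUN;
```

The table should contain only the values 0 and 1, plus a missing row if education has any missing values.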
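Standardizing a variable converts its raw values to standard values with a mean of 0 and a standard deviation of 1. This is often done to put variables measured in different units on a common scale; it does not change the shape of the distribution. In this case, we are standardizing the variable age in years to Z scores.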
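As a reminder of what a Z score is, each value has the mean subtracted and is then divided by the standard deviation. The notation below is an assumption for illustration: x_i is one person's age, x-bar is the sample mean of age, and s is the sample standard deviation.

```latex
z_i = \frac{x_i - \bar{x}}{s}
```

A Z score of 1 therefore means an age one standard deviation above the mean, and a Z score of 0 means an age exactly at the mean.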
Below is the code that will create a new variable called age_std which will have the standardized Z scores of age.
Code
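/* Copy age into age_std so the original variable is kept */
DATA slid;
SET slid;
age_std = age;
RUN;
/* Rescale age_std to mean 0 and standard deviation 1 */
PROC STANDARD DATA=slid MEAN=0 STD=1 OUT=slid;
VAR age_std;
RUN;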
Output
SAS gives us this output only to tell us that the code ran correctly and that there are no issues. Another way to check is to go to the dataset tab in SAS and look for our new variable age_std as an added column.
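The standardization itself can be checked with summary statistics. This is a minimal sketch, assuming age_std was created in slid by the code above; the choice of statistics is for illustration only.

```sas
/* Compare the raw and standardized variables */
PROC MEANS DATA=slid N MEAN STD MIN MAX;
    VAR age age_std;
RUN;
```

For age_std the mean should be 0 (or a value extremely close to it because of rounding) and the standard deviation should be 1, while age keeps its original mean and standard deviation.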