A brief introduction to Stata

Throughout this research guide, GSS 2016 data are used for all analyses so that you can check each step of your own analysis.

The General Social Survey (GSS) is a broad set of social indicators, which makes it well suited to practicing analysis techniques while looking at topics of interest to scientists.

"Since 1972, the General Social Survey (GSS) has provided politicians, policymakers, and scholars with a clear and unbiased perspective on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions." General Social Survey, 2018 Website

Website for more information: http://gss.norc.org/

Brief presentation about Stata's abilities and links of places to go for Stata help.

The final output from the workshop today!

Do file of all code from the workshop.

Note: You should download this file and save it to the desktop. You can only open it within Stata. Do not try to open it like a regular document as it will not work!

*******************************************

*******************************************

*******************************************

****** Stata 1: Introduction to Stata *****

*******************************************

*******************************************

*******************************************

***** Overview *****

* This do file will walk you through opening data, cleaning data, and basic

* analysis within Stata. The methods shown here are my preference and there are

* multiple approaches to accomplish the same goals.

* To run code, highlight the code you wish to run and press Ctrl+D

* NOTE: This do file should not be distributed without the written

* permission of Raeda Anderson, Ph.D

*******************************************

******** Getting to Know Your Data ********

*******************************************

***** Open Data File *****

* Opening a data file is a rather simple process. You simply use the command

* use with the data file location in " ". Be sure to include the .DTA extension.

* NOTE: the following code will not work until you update the path location

use "C:\Users\randerson39\Documents\Stata Crash Course\GSS2016.DTA"

* use "data path file"
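
* A minimal sketch of an alternative, not part of the original workshop code:

* set the working directory once with cd, then open the file by name. The

* folder path below is a hypothetical placeholder, so update it before

* running. The clear option drops any data already in memory, and

* describe, short reports the number of observations and variables so you

* can confirm the file loaded.

cd "C:/Users/yourname/Documents/Stata Crash Course"

use "GSS2016.DTA", clear

describe, short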

***** Looking at the Codebook Information for a Variable *****

* If the data within Stata is complete, there will be information on the

* variable properties (similar to the information that would be contained

* within a codebook)

codebook cappun degree

*codebook VariableName1 VariableName2
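
* As a possible complement (a sketch, assuming the GSS file is already open):

* describe lists each variable's storage type, value label name, and variable

* label, and the compact option of codebook gives a one-line summary per

* variable instead of the full listing.

describe cappun degree

codebook cappun degree, compact

* describe VariableName1 VariableName2

* codebook VariableName1 VariableName2, compact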

*******************************************

* Freq. Distributions & Crosstabulations **

*******************************************

***** Running a Frequency Distribution *****

* A frequency distribution reports the frequency, percent, and cumulative

* percent for each category. Missing values are excluded by default, so the

* percents shown are valid percents.

* SINGLE VARIABLE:

tab cappun

* tab VariableName

* MULTIPLE VARIABLES:

tab1 cappun degree

* tab1 VariableName1 VariableName2 VariableNameN
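
* A short sketch of three tab options that may help when checking a variable

* (these lines are an illustration, not part of the original workshop code):

* missing adds the missing categories to the table, nolabel shows the

* numeric codes instead of the value labels, and sort orders the rows from

* most to least frequent.

tab cappun, missing

tab cappun, nolabel

tab degree, sort

* tab VariableName, missing

* tab VariableName, nolabel

* tab VariableName, sort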

***** Running a Crosstabulation *****

* Crosstabulation is a basic analysis generally conducted with two variables

* to roughly estimate the pattern between given variables

* JUST A CROSSTABULATION:

tab cappun degree

* tab VariableName1 VariableName2

* ADDING A CHI-SQUARE TEST TO THE CROSSTABULATION:

tab cappun degree, chi

*tab VariableName1 VariableName2, chi

* ADDING COLUMN PERCENTS TO THE CROSSTABULATION:

tab cappun degree, col

*tab VariableName1 VariableName2, col

* ADDING A CHI-SQUARE TEST AND COLUMN PERCENTS TO THE CROSSTABULATION:

tab cappun degree, chi col

* tab VariableName1 VariableName2, chi col
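
* A sketch of two variations, offered as an illustration rather than as part

* of the original workshop code. In tab cappun degree, cappun forms the rows

* and degree forms the columns, so col gives percents within each degree

* group and row gives percents within each cappun response. The nofreq

* option hides the raw counts so only the percents are shown. After a

* chi-square test, Stata keeps the statistic and p-value in r(chi2) and

* r(p), which display can print.

tab cappun degree, row nofreq

tab cappun degree, chi2 col nofreq

display "chi2 = " r(chi2) ", p = " r(p)

* tab VariableName1 VariableName2, row nofreq

* tab VariableName1 VariableName2, chi2 col nofreq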

*******************************************

****** Generating/Labeling Variables ******

*******************************************

***** Generating a Variable - No pre-existing variable *****

* In your analysis you may need to generate a variable that is constant across

* all respondents. I most commonly generate a variable like this to indicate

* which wave of data this data file contains before merging databases

gen wave1=1

* gen VariableName=Value

***** Generating a Variable - Equal to a pre-existing variable *****

* If you need to make a copy of a variable that is an exact duplicate of an

* existing variable use the following code. I often use this option to generate

* a variable that I can later manipulate (collapse, take the average, etc)

gen overallhappy = happy

* gen NewVariableName = OldVariableName
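
* Note that gen copies only the values, not the variable label or value

* labels. A minimal sketch, assuming you want the labels carried over as

* well: clonevar makes a copy with the same labels and storage type. The

* name happycopy is a hypothetical example. The assert line stops with an

* error if any observation differs between the two variables.

clonevar happycopy = happy

describe happy happycopy

assert happycopy == happy

* clonevar NewVariableName = OldVariableName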

***** Generating a Variable - Changing a pre-existing variable *****

* The form a variable takes in a survey/database is often different from the

* form we need for our analysis. One of the easiest ways to make these

* changes is with a gen or egen command.

* CALCULATED VARIABLE- I use this most frequently with age. As someone

* who studies older adults I find it important to discuss their age as

* 'one year older' or something similar. So, I am going to walk you

* through how to do just that.

* First we need to find out the minimum age of respondents in the data.

tab age

* Second we need to generate a new variable of age where the youngest

* person is 0 years old.

gen newage = age-18

* gen NewVariableName = OldVariable - amount
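
* A sketch of a way to avoid typing the minimum by hand (an illustration,

* not the original workshop code): summarize stores the smallest value in

* r(min), which can be used directly in gen. The name agefrommin is a

* hypothetical example. The second summarize should show a minimum of 0.

summarize age

gen agefrommin = age - r(min)

summarize agefrommin

* summarize OldVariable

* gen NewVariableName = OldVariable - r(min)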

* Note: Stata will allow you to use common mathematical symbols

* such as the following:

* addition: gen NewVar1 = OldVar1+OldVar2

* subtraction: gen NewVar1 = OldVar1-OldVar2

* multiplication: gen NewVar1 = OldVar1*OldVar2

* division: gen NewVar1 = OldVar1/OldVar2

* VARIABLE USING SPECIFIC SUBGROUP- I use this most frequently when I

* need to analyze a group of people within a study. For this example we

* are going to generate variables that represent (1) females, (2) black

* people, and (3) black females

* Generating a variable for female

* First we need to look at the codebook for gender so we know how

* the variable is coded so it can be edited

codebook sex

* From the codebook we see that females are '2' and males are '1'

* to generate the female variable we are going to use "if" coding.

* We will say if sex is equal to 2, then we want female to be equal

* to 1. If sex is equal to 1, then we want female to be equal to 0.

* The result will be a dummy variable where female=1 and male=0.

gen female=1 if sex==2

replace female=0 if sex==1

* gen NewVariableName = value if OldVariableName == value

* replace NewVariableName = value if OldVariableName == value

tab female
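
* The same dummy can be sketched in one line (an illustration; female2 is a

* hypothetical name). The expression (sex == 2) is 1 when true and 0 when

* false. The if condition matters: without it, respondents with a missing

* value on sex would be coded 0 rather than left missing. The crosstab

* checks the coding, and assert stops with an error if the one-line version

* ever differs from female.

gen female2 = (sex == 2) if inlist(sex, 1, 2)

tab sex female2, missing

assert female2 == female

* gen NewVariableName = (OldVariableName == value) if inlist(OldVariableName, value1, value2)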

* Generating a variable for black people

* First we need to look at the codebook for race so we know how

* the variable is coded so it can be edited

codebook race

* From the codebook we see that black is equal to 2, white is

* equal to 1, and other races are equal to 3. Thus we need to

* generate a variable where 1= black and 0= all other races.

gen black=1 if race==2

replace black=0 if race==1

replace black=0 if race==3

* gen NewVariableName = value if OldVariableName == value

* replace NewVariableName = value if OldVariableName == value

* replace NewVariableName = value if OldVariableName == value

tab black
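
* As an alternative sketch (not the original workshop code), recode can do

* the three steps at once and store the result in a new variable. Here

* race 2 becomes 1 and race 1 and 3 become 0. Any value not listed in a rule

* is left unchanged, so missing values stay missing. The name black2 is a

* hypothetical example, and assert confirms it matches black.

recode race (2 = 1) (1 3 = 0), generate(black2)

tab race black2, missing

assert black2 == black

* recode OldVariableName (oldvalue = newvalue) (oldvalues = newvalue), generate(NewVariableName)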

* Generating a variable for black women

* This is done by generating an interaction variable of women and

* black respondents.

gen blackfemale = black*female

* gen NewVariableName = OldVariableName1*OldVariableName2

tab blackfemale
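
* A sketch of a way to check the interaction (an illustration; the name

* blackfemale2 is hypothetical). The crosstab of black by female shows how

* many respondents are 1 on both variables, which should equal the number

* of 1s in blackfemale. The same variable can also be written as a logical

* condition, left missing whenever black or female is missing, and assert

* confirms the two versions agree.

tab black female, missing

gen blackfemale2 = (black == 1 & female == 1) if !missing(black, female)

assert blackfemale2 == blackfemale

* gen NewVariableName = (OldVariableName1 == 1 & OldVariableName2 == 1) if !missing(OldVariableName1, OldVariableName2)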

***** Labeling Variables *****

* We have generated the following new variables that need labels

* newage - respondent age, 0=18, 1=19, 2=20, etc.

* female - female=1, male=0

* black - black=1, other race=0

* blackfemale - black female=1, all other respondents=0

* new age

* for data where the response is a number we only need to

* label the variable

label variable newage "age with 0=18 years old"

* label variable Variable1 "label for new variable"

tab newage

* female, black, black female

* for data where the responses are words we need to add value labels

* so when we run analysis we know what each value represents

* this is a two-step process: (1) label the variable (2) label the values

label variable female "Female- Dummy Variable"

label variable black "Black- Dummy Variable"

label variable blackfemale "Black Female- Dummy Variable"

* label variable Variable1 "label for new variable"

label define female1 0 "male" 1 "female"

label values female female1

label define black1 0 "minority or white" 1 "black"

label values black black1

label define blackfemale1 0 "non-black minority, white, and/or male" 1 "black female"

label values blackfemale blackfemale1

* label define VariableForLabel 0 "what 0 represents" 1 "what 1

* represents"

* label values Variable1 VariableForLabel

tab1 female black blackfemale
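
* A short sketch for checking the labels (an illustration, not part of the

* original workshop code): label list prints the contents of each value

* label, describe shows the variable label and which value label is

* attached to each variable, and nolabel shows the numeric codes that sit

* behind the labels.

label list female1 black1 blackfemale1

describe newage female black blackfemale

tab female, nolabel

* label list VariableForLabel

* describe Variable1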