
# Stata

A brief introduction to Stata

## Charts and Graphs in Stata

Making charts and graphs in Stata is easy!

Charts and graphs are effective ways to represent data to an audience.

## Select Desired Graph

A pie chart is best for showing the proportion of observations that fall into each category of a nominal-level variable.

Code

```stata
graph pie, over(language)
```

The `graph pie` command tells Stata to draw a pie chart, and the `over(language)` option creates one slice for each category of the variable language.

Output

The pie chart shows that the overwhelming majority of observations reported speaking English; the next largest group speaks other languages, and the smallest group speaks French. A pie chart is a quick and effective way to show what share of the observations falls into each category. Many publications do not accept color charts, so charts that distinguish the slices with patterns or shades of gray instead of color are also effective.
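A minimal sketch of both ideas follows, assuming the dataset used in this guide (with the variable language) is already in memory. The `plabel()` option prints each slice's percentage, and `scheme(s2mono)` switches to one of Stata's built-in monochrome schemes. Note that this scheme shades slices in gray rather than drawing hatch patterns.

```stata
* Pie chart of language with each slice labeled by its percentage
* (assumes the dataset used in this guide is already in memory)
graph pie, over(language) plabel(_all percent, format(%4.1f))

* Grayscale version for publications that do not accept color:
* the built-in s2mono scheme shades the slices in gray
graph pie, over(language) plabel(_all percent, format(%4.1f)) scheme(s2mono)

* Exact counts and percentages behind the slices
tabulate language
```

Pairing the chart with `tabulate` is useful because readers often want the exact percentages, which are hard to judge from slice angles alone.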

Histograms are best for plotting continuous-level variables, whose values, as the name suggests, lie on a continuum. A histogram groups these values into bins, which makes it very helpful for investigating the distribution of a continuous variable. This is an important step in deciding whether a variable needs to be recoded.

Code

```stata
histogram age
```

The `histogram` command plots the distribution of the variable age.

Output

The histogram shows the range of ages among the observations and how often ages in each range occur. We can also see that the distribution of age does not follow a normal curve and is skewed to the right. This may affect the results of our earlier statistical tests, many of which assume normally distributed data. Note that, by default, Stata's y-axis shows density rather than raw counts: the bar heights are scaled so that the areas of all the bars sum to 1.
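The sketch below shows a few standard `histogram` options that follow from this discussion. It assumes the same dataset with the variable age is in memory. The `frequency` option switches the y-axis to raw counts, and `normal` overlays a normal curve. Separately, `summarize, detail` reports a skewness statistic, which is positive for right-skewed data.

```stata
* Same histogram, but with the y-axis in raw counts instead of density
histogram age, frequency

* Overlay a normal curve to make departures from normality easier to see
histogram age, normal

* Numerical check on the shape: positive skewness indicates a right skew
summarize age, detail
display "Skewness of age: " r(skewness)
```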

Box plots, often called box-and-whisker plots, represent the quartiles of continuous-level variables. They display the variation in the sample with a box that spans the middle 50 percent of the observations and "whiskers" that extend to observations outside the upper and lower quartiles. These plots can be drawn for a single variable or for multiple variables, as we will see below.

Code

```stata
graph box age
```

We use the `graph box` command to draw a box plot of the variable age.

Output

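The box plot marks the median of age (just above 40) with a horizontal line inside the blue box. The bottom and top edges of the box are the 25th percentile (Q1) and the 75th percentile (Q3) of the distribution. The whiskers extend to the most extreme ages that lie within 1.5 times the interquartile range (Q3 minus Q1) of the box. Stata draws any observations beyond that range as individual dots, so the whiskers reach the minimum and maximum ages only when there are no such outliers.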
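To check the numbers behind the box, a minimal sketch like the following can be used. It assumes the dataset with age is in memory. The quartiles come from `summarize, detail`, and the whisker ends are found by summarizing only the ages within 1.5 × IQR of the box. The results should match the plot under Stata's default percentile definition.

```stata
* Quartiles of age (assumes the dataset used in this guide is in memory)
summarize age, detail
local q1  = r(p25)
local q3  = r(p75)
local iqr = `q3' - `q1'
display "Q1 = `q1'   median = " r(p50) "   Q3 = `q3'"

* Whiskers stop at the most extreme ages within 1.5 x IQR of the box;
* ages beyond these limits are plotted as individual dots
quietly summarize age if age >= `q1' - 1.5*`iqr' & age <= `q3' + 1.5*`iqr'
display "Lower whisker = " r(min) "   Upper whisker = " r(max)
```

If the whisker ends printed here equal the overall minimum and maximum from the first `summarize`, the plot contains no outlying dots.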

Code

```stata
graph box age, over(sex)
```

We again use `graph box`, this time with the `over(sex)` option, to draw a separate box for age within each category of the variable sex.

Output

This box plot is split by the sex of the observations (Female and Male), which lets us compare the distribution of age between the two groups.
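The values that each box represents can also be printed as a table. A minimal sketch, assuming the dataset with age and sex is in memory, uses `tabstat` to list the minimum, quartiles, and maximum of age for each sex. It then runs `summarize, detail` separately within each group.

```stata
* Five-number summary of age within each category of sex
tabstat age, by(sex) statistics(min p25 p50 p75 max)

* Fuller summary (percentiles, skewness) for each sex in turn
bysort sex: summarize age, detail
```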

Bar charts are best used to represent ordinal-level (and other categorical) variables, showing how the observations are distributed across the categories. We can graph a bar chart of a single variable or of multiple variables for a direct comparison.

Code

```stata
graph bar, over(sex)
```

We use the `graph bar` command with the `over(sex)` option to draw one bar for each category of the variable sex.

Output

The bar chart shows how the observations are divided between the categories of sex. We can clearly see that there are more females than males in the dataset, but the difference is not large. Using the results from this bar chart, we could ask ourselves: "Is there a statistically significant difference between females and males in language, education, or wages?"
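To control exactly what the bar heights represent, the statistic can be named explicitly. The sketch below assumes the dataset with sex is in memory. It shows `graph bar` with the `(count)` and `(percent)` statistics, followed by `tabulate` for the exact figures.

```stata
* Bars showing the raw number of observations in each category of sex
graph bar (count), over(sex)

* Bars showing each category's share of all observations
graph bar (percent), over(sex)

* Exact counts and percentages behind the bars
tabulate sex
```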

Code

```stata
graph bar age, by(sex)
```

We use the `graph bar` command to plot the variable age, and the `by(sex)` option draws a separate panel for each sex. When a variable is listed without a statistic, `graph bar` plots its mean.

Output

The chart splits the observations by sex (female and male) and shows the mean age within each group.
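The same comparison can be drawn in a single panel, with the exact means reported alongside it. This is a minimal sketch that assumes the dataset with age and sex is in memory. With `over(sex)` instead of `by(sex)`, the bars sit side by side on one shared axis. The `mean` command reports each group's mean with a standard error and confidence interval.

```stata
* Mean age for each sex as bars side by side in one panel
graph bar (mean) age, over(sex)

* Exact group means with standard errors and 95% confidence intervals
mean age, over(sex)
```

Putting both bars on one axis makes small differences easier to judge than comparing two separate panels.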

Scatter plots are best used to show graphically whether there is a relationship between two variables and what that relationship may look like.

Code

```stata
twoway scatter wage education
```

We use `twoway scatter` to plot wage (listed first, on the y-axis) against education (on the x-axis).

Output

The scatter plot shows wages against education. Each point is one observation, placed according to its value of education (x-axis) and its wage (y-axis). Scatter plots are very helpful for examining continuous-level variables and checking whether a graphical relationship exists. In this plot, observations cluster around an education of 15 and a wage of 10, which suggests that a relationship may exist. After looking at this graph, we would next conduct statistical tests to see whether the relationship is statistically significant.
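As a possible next step, the sketch below overlays a fitted line on the scatter plot and then runs two common tests of the relationship. It assumes the dataset with wage and education is in memory. The `lfit` plot type draws the ordinary least squares line, `pwcorr ..., sig` reports the Pearson correlation with its p-value, and `regress` fits a simple linear regression. Which test is appropriate depends on the research question.

```stata
* Scatter plot with a fitted linear regression line overlaid
twoway (scatter wage education) (lfit wage education)

* Pearson correlation between wage and education, with p-value and N
pwcorr wage education, sig obs

* Simple linear regression of wage on education
regress wage education
```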