
A brief introduction to Stata

Making charts and graphs in Stata is easy!

Charts and graphs are effective ways to represent data to an audience.

A pie chart is best for showing the proportions of occurrence of the options in a nominal level variable.

Code

`graph pie, over(language)`

We use the command `graph` to tell Stata we want to produce a `pie` chart `over` the options in the variable `language`.

Output

The pie chart shows us that the overwhelming majority of observations reported speaking English, the next largest group is other languages, and the smallest is French. This is a quick and effective way to show each group's share of the observations. Publications often do not accept color charts, so using patterns instead is also effective.
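As a hedged sketch of how the slices could be labeled and prepared for black-and-white printing, the commands below assume the same dataset is in memory with the variable `language`. Note that the `s1mono` scheme uses shades of gray rather than fill patterns, so it is a substitute for the pattern approach mentioned above, not an implementation of it.

```stata
* Assumes the dataset containing the variable language is already loaded.

* Label every slice with its percentage of observations.
graph pie, over(language) plabel(_all percent, format(%4.1f))

* Same chart drawn with a monochrome scheme (gray shades, not patterns).
graph pie, over(language) plabel(_all percent, format(%4.1f)) scheme(s1mono)
```

The percentage labels make the exact shares readable even when neighboring gray shades are hard to tell apart.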

Histograms are best for plotting continuous variables because, as the name suggests, the values lie on a continuum. Histograms are very helpful for investigating the distribution of a continuous variable, which is important for determining whether the variable needs to be recoded.

Code

`histogram age`

We use the command `histogram` followed by the variable `age`.

Output

The histogram shows us the range of ages among the observations and how often each range of values occurs. We can also see that the distribution of `age` does not follow a normal curve and is skewed to the right. This may affect the results of our earlier statistical tests. By default, Stata plots the density on the vertical axis, scaled so that the areas of the bars sum to 1, rather than raw counts.
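A minimal sketch of how the vertical axis and the normality check could be adjusted, assuming the variable `age` is in memory. The `frequency` and `normal` options and `summarize, detail` are standard Stata; which of them suits this dataset is a judgment call.

```stata
* Assumes the dataset containing the variable age is already loaded.

* Plot raw counts instead of the default density, and overlay a
* normal curve with the same mean and standard deviation as age.
histogram age, frequency normal

* Numeric check on the shape: the detailed summary reports skewness
* (positive values indicate a longer right tail) and kurtosis.
summarize age, detail
display "skewness of age = " r(skewness)
```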

Boxplots, often called box-and-whisker plots, are used to represent the quartiles of continuous variables. Boxplots display the variation in the sample with a box that spans the lower to upper quartile and 'whiskers' that extend to observations beyond those quartiles. These plots can be done with a single variable or multiple variables, as we will see below.

Code

`graph box age`

We are using the command `graph`, specifically a `box` plot, to graph the variable `age`.

Output

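The box plot shows us the median of the variable `age` (just above 40) as a horizontal line inside the blue box. The bottom and top edges of the blue box are the 25th (Q1) and 75th (Q3) percentiles of the distribution. The whiskers extend to the lower and upper adjacent values, the most extreme observations within 1.5 times the interquartile range of the box edges; when no observations lie beyond that range, these are the minimum and maximum values recorded for `age`.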
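The numbers behind the box plot can be checked directly. The sketch below assumes `age` is in memory; the scalar names (`age_q1`, `age_iqr`, and so on) are made up for illustration. It computes the quartiles and the 1.5 × IQR fences that Stata's box plots use to decide which observations are drawn as individual outside points.

```stata
* Assumes the dataset containing the variable age is already loaded.
quietly summarize age, detail

* Store the quartiles and the interquartile range (illustrative names).
scalar age_q1  = r(p25)
scalar age_med = r(p50)
scalar age_q3  = r(p75)
scalar age_iqr = age_q3 - age_q1

display "Q1 = " age_q1 ", median = " age_med ", Q3 = " age_q3
display "lower fence = " age_q1 - 1.5*age_iqr
display "upper fence = " age_q3 + 1.5*age_iqr

* Count the observations that fall beyond the fences.
count if !missing(age) & (age < age_q1 - 1.5*age_iqr | age > age_q3 + 1.5*age_iqr)
```

If the count is zero, the whiskers end at the minimum and maximum of `age`.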

Code

`graph box age, over(sex)`

We are using the command `graph`, specifically a `box` plot, to graph the variable `age` `over` the options in the variable `sex`.

Output

This box plot is separated by the sex of the observations (Female and Male). This helps us to see the distribution of age by sex.

Bar charts are best used to represent ordinal variables and show the distribution of their options. We can graph a bar chart of a single variable or multiple variables for a direct comparison.

Code

`graph bar, over(sex)`

We use the `graph` command and specify we want a `bar` graph `over` the options of the variable `sex`.

Output

The bar chart above shows how the observations are distributed across the categories of the variable `sex`. When no variable is listed before the comma, recent versions of Stata plot the percentage of observations in each category by default; `graph bar (count), over(sex)` plots raw counts instead. We can clearly see that there are more females than males in the dataset, but this difference is not great. Using the results from this bar chart, we could ask ourselves, "Is there a statistically significant difference between females and males across language, education, or wages?"
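Rather than relying on the default statistic, the statistic can be requested explicitly. This is a sketch assuming the variable `sex` is in memory and a version of Stata recent enough to accept the `(count)` and `(percent)` statistics without a y-variable.

```stata
* Assumes the dataset containing the variable sex is already loaded.

* Frequency table to compare against the charts.
tabulate sex

* Bar heights are the number of observations in each category.
graph bar (count), over(sex)

* Bar heights are the percentage of observations in each category.
graph bar (percent), over(sex)
```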

Code

`graph bar age, by(sex)`

We use the `graph` command and specify we want a `bar` graph of the variable `age`, drawn in a separate panel for each option of the variable `sex` with `by`.

Output

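We have split the observations by sex (female and male), and each panel shows the mean of `age` within that group. The mean is the statistic `graph bar` plots by default when a variable is listed.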
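A short sketch, assuming `age` and `sex` are in memory, of how the plotted means could be verified and how the same comparison could be drawn in a single panel with `over()` instead of `by()`.

```stata
* Assumes the dataset containing the variables age and sex is already loaded.

* Mean age and group size for each sex, to check against the bars.
tabstat age, by(sex) statistics(mean n)

* Same comparison with both bars on one set of axes.
graph bar (mean) age, over(sex)

* Another statistic can be requested the same way, for example the median.
graph bar (median) age, over(sex)
```

Using `over()` puts the bars side by side on a shared axis, which can make small differences between groups easier to read than separate panels.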

Scatter plots are best used to graphically show if there is a relationship between two variables and what that relationship may look like.

Code

`twoway scatter wage education`

We are plotting a `scatter` plot with `twoway` axes of the variables `wage` and `education`.

Output

Above is a scatter plot of the variables `wage` and `education`, with `wage` on the vertical axis and `education` on the horizontal axis. That is, each point in this graph is one observation's wage plotted against its education. Scatter plots are very helpful for examining continuous variables and checking whether a relationship is visible. We can see in this scatter plot that there is some clustering of observations where education is 15 and wage is 10. This suggests that some relationship may exist. After looking at this graph, we would next want to conduct statistical tests to see if the relationship is statistically significant.
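One possible follow-up, sketched under the assumption that `wage` and `education` are in memory and that a linear relationship is a reasonable first guess, is to overlay a fitted line on the scatter plot and then run a simple correlation and regression.

```stata
* Assumes the dataset containing the variables wage and education is already loaded.

* Scatter plot with a linear fit line drawn over the points.
twoway (scatter wage education) (lfit wage education)

* Strength and direction of the linear association.
correlate wage education

* Simple linear regression; the p-value on education tests whether
* the slope differs from zero.
regress wage education
```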

- Last Updated: Feb 26, 2021 12:20 PM
- URL: https://research.library.gsu.edu/Stata