
Stata

A brief introduction to Stata

What is a Frequency Distribution?

A frequency table shows how observations are distributed across the options (categories) of a variable. It reveals which options occur more or less often in the dataset, which helps you understand each variable and decide whether it needs to be recoded. No formula is needed to build the table: it reports the count of each option, and the percentages are simply each count divided by the total.

Below is the code for the frequency distribution for the variable language in the SLID dataset. 

Code

tab language

 

The command tab (an abbreviation of tabulate) produces a frequency table of the variable language.
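Because tab is an abbreviation, the full command name gives the same table. The short sketch below assumes the SLID dataset is already loaded. It also shows the sort option, which lists options from most to least frequent instead of by numeric code.

```stata
* Assumes the SLID dataset is already in memory
* "tab" abbreviates "tabulate", so these two lines produce the same table
tab language
tabulate language

* The sort option orders rows by frequency (most common first)
* instead of by the variable's numeric codes
tab language, sort
```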

Output


The output table shows the frequency distribution of the variable language.

Each row in the table is an option that respondents could have selected during data collection. We can see the options English, French, and Other, along with the Total number of observations in this analysis (note: this total excludes all missing observations). By default, Stata lists options from the smallest numeric code to the largest, so the row order tells us that English has the lowest code and Other the highest. Here, English is coded as “1”, French as “2”, and Other as “3”, which is helpful to know when recoding.
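You can also check the numeric codes directly instead of inferring them from the row order. The nolabel option prints the codes in place of the labels, and label list shows every value label defined in the dataset. The recode step is an illustrative sketch that assumes the 1/2/3 coding described above. The new variable name nonenglish is hypothetical.

```stata
* Assumes the SLID dataset is already in memory
* Show the numeric codes instead of the value labels
tab language, nolabel

* Show how each value label maps codes to text (e.g., 1 = English)
label list

* Hypothetical recode: English (1) becomes 0; French (2) and Other (3) become 1.
* generate() leaves the original variable unchanged; missing values stay missing.
recode language (1 = 0) (2 3 = 1), generate(nonenglish)
tab nonenglish
```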

The Freq. column shows that most observations (5,716) selected English as their language. French is the least commonly reported option, with 497 observations, and 1,091 observations selected Other.
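To practice without the SLID file, you can reproduce the counts above in a small toy dataset. This sketch uses expand to rebuild one row per observation from the reported counts. The value-label name language_lbl is a hypothetical choice. Because the sketch starts with clear, it replaces whatever data are in memory, so save your work first.

```stata
* Toy reconstruction from the counts reported above (not the SLID file itself)
clear
input byte language int n
1 5716
2 497
3 1091
end

* expand turns each row into n identical rows (one per observation)
expand n
drop n

* Attach the option names as value labels
label define language_lbl 1 "English" 2 "French" 3 "Other"
label values language language_lbl

* Should show 5,716 / 497 / 1,091 with a Total of 7,304
tab language
```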

In the next column to the right, Percent, Stata shows each option's share of all non-missing observations. For example, 78.26% of observations selected English.
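The Percent column uses only non-missing observations, so it can be useful to see how many observations are missing. The missing option adds missing values as an extra row. They then count toward the Total, and the percentages change accordingly. This sketch assumes the SLID dataset is already loaded.

```stata
* Assumes the SLID dataset is already in memory
* Include missing values (.) as their own row; Percent now uses all observations
tab language, missing

* Count the missing observations directly
count if missing(language)
```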

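The column furthest to the right, Cumulative Percent (labeled Cum. in Stata's output), adds each option's percentage to the percentages of all options listed above it. For example, 85.06% of observations speak either English or French (78.26% + 6.80%). Cumulative percentages are most informative when a variable's options have a natural order (for example, levels of education), where they show the cutoffs for quartiles or other percentiles. For an unordered variable such as language, they only reflect the order of the codes.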
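You can recover the Percent and Cumulative Percent values from the counts reported above. This worked arithmetic assumes that the three options make up all non-missing observations, which agrees with the reported 78.26% and 85.06%.

```latex
\begin{aligned}
N &= 5716 + 497 + 1091 = 7304 \\
\text{Percent}_{\text{English}} &= 100 \times \frac{5716}{7304} \approx 78.26 \\
\text{Percent}_{\text{French}} &= 100 \times \frac{497}{7304} \approx 6.80 \\
\text{Percent}_{\text{Other}} &= 100 \times \frac{1091}{7304} \approx 14.94 \\
\text{Cum}_{\text{French}} &= 100 \times \frac{5716 + 497}{7304} = 100 \times \frac{6213}{7304} \approx 85.06
\end{aligned}
```

The last option's cumulative percent is always 100, because by then every non-missing observation has been counted.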