GSU Library Research Guides: SAS: Crosstabs (Contingency Table)

Cross Tabulation

A crosstabulation, or contingency table, shows the relationship between two or more categorical variables by recording how many observations fall into each combination of categories. From one table we can read the counts, the row and column distributions, and how the distribution of one variable changes across the categories of the other. No formula is needed for the crosstabulation itself, since at its core it consists of counts and percentages of observations.

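The Chi-squared test often accompanies a crosstabulation to test whether a statistically significant relationship exists between the variables. The test itself does not measure the strength of the relationship; measures of association such as Cramer's V are used for that.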

As a general rule, the dependent variable in a crosstabulation and Chi-squared test is represented in the columns while the independent variable is represented in the rows. In this example, our two variables are sex, the independent variable, and language, the dependent variable. To analyze other variables, replace sex and language in the code with the names of other variables in your dataset.

Formula

$$ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} $$

Above is the formula for the Chi-squared test statistic. Here $\chi^2$, the Greek letter chi squared, equals the sum ($\sum$) over $i$, where $i$ indexes the cells of the crosstabulation. $O_i$ is the observed value, the count that actually exists in the dataset for that cell, and $E_i$ is the expected value, the count predicted for that cell if the two variables were unrelated. For each cell, $E_i$ is subtracted from $O_i$ and the difference (the residual) is squared. The squared residual is then divided by $E_i$, and these terms are added up over all cells to give our final chi-squared ($\chi^2$) value.
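The formula above takes the expected values as given. As a sketch of how they are usually obtained, under the standard assumption that the row and column variables are unrelated, the expected count for the cell at row $r$ and column $c$, and the degrees of freedom for a table with $R$ rows and $C$ columns, can be written as follows. The symbols $n_{r\cdot}$ (total for row $r$), $n_{\cdot c}$ (total for column $c$), and $n$ (overall total) are notation introduced here for illustration.

$$ E_{rc} = \frac{n_{r\cdot} \, n_{\cdot c}}{n}, \qquad df = (R-1)(C-1) $$

For example, a table with 2 rows and 3 columns has $(2-1)(3-1) = 2$ degrees of freedom.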

Below is the code for conducting a crosstabulation and calculating the Chi-squared test.

Code

/* Crosstabulate sex (rows) by language (columns) and request the Chi-squared test */
PROC FREQ DATA = SLID;
    TABLES sex*language / CHISQ;
RUN;

Our code calls the PROC (procedure) FREQ (frequency) on the DATA from the dataset SLID. Specifically, we request a TABLES of the variables sex by (*) language, and after the slash (/) we add the chisq (Chi-squared) option to request the test. We then end with the RUN command.
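To see the expected counts that the test compares with the observed ones, further options can be added after the slash in the TABLES statement. The following is a minimal sketch, assuming the same SLID dataset and variable names as above. The particular combination of options is chosen here for illustration and is not part of the original example.

```sas
/* Same crosstabulation, with expected counts and each cell's   */
/* contribution to the Chi-squared statistic added to the table */
PROC FREQ DATA = SLID;
    TABLES sex*language / CHISQ      /* request the Chi-squared test          */
                          EXPECTED   /* print expected count in each cell     */
                          CELLCHI2   /* print each cell's share of chi-square */
                          NOROW NOCOL NOPERCENT;  /* hide the percentages     */
RUN;
```

NOROW, NOCOL, and NOPERCENT only suppress the percentages to keep the cells readable; leave them out to keep the default display.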

Output

In the first output table, SAS shows the crosstabulation of sex by language. We can see that sex is written first in the code and appears in the rows, while language is written second and appears in the columns. SAS automatically provides the Frequency, Percent, Row Pct (row percent), and Col Pct (column percent) in each cell of the table. Percent is the cell's share of all observations in the table, Row Pct is its share of the observations in that row, and Col Pct is its share of the observations in that column.
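The three percentages in each cell can also be written out explicitly. As a sketch, using notation introduced here for illustration only: $n_{rc}$ is the frequency in the cell at row $r$ and column $c$, $n_{r\cdot}$ is the total for row $r$, $n_{\cdot c}$ is the total for column $c$, and $n$ is the number of observations in the whole table.

$$ \text{Percent} = 100 \times \frac{n_{rc}}{n}, \qquad \text{Row Pct} = 100 \times \frac{n_{rc}}{n_{r\cdot}}, \qquad \text{Col Pct} = 100 \times \frac{n_{rc}}{n_{\cdot c}} $$

It follows that the Row Pct values within any one row sum to 100, and the Col Pct values within any one column sum to 100.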

In the second output table, Statistics for Table of sex by language, SAS gives us several statistics in addition to the Chi-squared test that we requested in the code. We will focus on the first row, Chi-Square, which includes the columns DF (the degrees of freedom), Value (the calculated chi-squared statistic), and Prob (the p-value, or significance level, of the test).
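If the statistics are needed for later steps rather than just read off the screen, PROC FREQ can also write them to a dataset with an OUTPUT statement. A minimal sketch, assuming the same SLID dataset; the dataset name chisq_stats is a made-up name for this illustration.

```sas
/* Save the Chi-squared statistics to a dataset (chisq_stats is a hypothetical name) */
PROC FREQ DATA = SLID;
    TABLES sex*language / CHISQ;
    OUTPUT OUT = chisq_stats CHISQ;  /* write the chi-squared statistics to the dataset */
RUN;

/* Display the saved statistics */
PROC PRINT DATA = chisq_stats;
RUN;
```

The printed dataset should contain the same degrees of freedom, value, and probability shown in the Chi-Square row, alongside the other statistics from the table.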

We can see that the Chi-squared value is 0.2442, the degrees of freedom are 2, and the significance level is 0.8851. Since we use the standard cutoff of 0.05 for the significance level, and 0.8851 is well above 0.05, we conclude that the chi-squared test is not statistically significant. In other words, we find no statistically significant relationship between the variables sex and language in this dataset.
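As a quick check on the reported numbers, the significance level can be recomputed from the chi-squared value and its degrees of freedom. A minimal sketch, assuming only the two values read off the output (0.2442 and 2); the variable names chisq_value, df, and p_value are made up for this illustration.

```sas
/* Recompute the p-value from the reported statistic and degrees of freedom */
DATA _NULL_;
    chisq_value = 0.2442;  /* Value column of the Chi-Square row */
    df          = 2;       /* DF column of the Chi-Square row    */
    /* PROBCHI gives the probability below chisq_value, so 1 minus it is the upper tail */
    p_value = 1 - PROBCHI(chisq_value, df);
    PUT p_value= 6.4;      /* write the result to the SAS log    */
RUN;
```

The value written to the log should agree, up to rounding, with the Prob column, since Prob is the probability of obtaining a chi-squared value at least this large when the two variables are unrelated.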