SPSS

Cross Tabulation

A crosstabulation, or contingency table, shows the relationship between two or more variables by recording how many observations fall into each combination of categories. Crosstabulation tables give us a wealth of information about the relationship between the included variables. No formula is needed for a crosstabulation itself, since at its core it consists of counts and percentages of observations.

The Chi-squared test is often used alongside a crosstabulation to test whether a statistically significant relationship exists between the variables. The test itself does not measure how strong that relationship is.

The first box below contains the SPSS SYNTAX for a crosstabulation and Chi-squared test. As a general rule, the dependent variable in a crosstabulation and Chi-squared test is placed in the columns and the independent variable in the rows. In this example, our two variables are "sex", the independent variable, and "language", the dependent variable. To analyze other variables, simply replace "sex" and "language" with other variables in the dataset.

In the third box, we can see the original SPSS output: a general overview of our analysis (A), the crosstabulation (B), and the Chi-squared test (C). We will go into more detail on the output in the "Output" section below.

Formula

$$ \chi^2 = {\sum{{(O_i-E_i)^2}\over E_i}} $$

 

Above is the formula for the Chi-squared test. Here $\chi^2$ (the Greek letter chi, squared) is the test statistic. The sum ($\sum$) runs over $i$, which indexes the cells of the crosstabulation. $O_i$ is the observed count in cell $i$, that is, the count that actually exists in the dataset, and $E_i$ is the expected count, the count we would predict for that cell if the two variables were unrelated. For each cell, the expected count is subtracted from the observed count, the resulting residual is squared, and the result is divided by $E_i$. Adding these terms across all cells gives the final chi-squared ($\chi^2$) value.
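
To make the formula concrete, here is a small worked example. The counts are hypothetical and chosen only for easy arithmetic; they are not taken from the dataset used in this guide. The example assumes the usual way of obtaining expected counts when the variables are unrelated: each cell's expected count is its row total times its column total, divided by the overall sample size $N$.

$$ E_i = {{\text{row total} \times \text{column total}} \over N} $$

Suppose a table with two rows and two columns has observed counts of 30 and 20 in the first row and 20 and 30 in the second row. Every row total and column total is 50 and $N$ is 100, so each expected count is $50 \times 50 / 100 = 25$.

$$ \chi^2 = {(30-25)^2 \over 25} + {(20-25)^2 \over 25} + {(20-25)^2 \over 25} + {(30-25)^2 \over 25} = 1 + 1 + 1 + 1 = 4 $$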

Below is the SYNTAX for conducting a crosstabulation and calculating the Chi-squared test.

SYNTAX

CROSSTABS
  /TABLES=sex BY language
  /CELLS=COUNT TOTAL
  /STATISTICS=CHISQ.

 

The CROSSTABS command produces a contingency table of the variables sex and language, which are specified with the subcommand /TABLES. The variable named before BY (sex) defines the rows and the variable named after BY (language) defines the columns. The subcommand /CELLS=COUNT TOTAL includes the raw counts and the percentages of the total sample size in each cell. Lastly, the subcommand /STATISTICS=CHISQ adds the Chi-squared test.
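
As a variation on the SYNTAX above, the sketch below requests row percentages and expected counts in each cell and adds the PHI keyword, which reports Phi and Cramér's V as measures of how strong the association is. This is an optional extension and not part of the analysis shown in this guide, so the output discussed below does not include these extra values.

CROSSTABS
  /TABLES=sex BY language
  /CELLS=COUNT ROW EXPECTED
  /STATISTICS=CHISQ PHI.

ROW expresses each count as a percentage of its row total, which makes it easier to compare the distribution of language across the categories of sex. EXPECTED displays the $E_i$ values used in the Chi-squared formula.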

 

Output

A

In the first output chart, “Case Processing Summary”, SPSS gives us a general overview of our crosstabulation analysis and Chi-squared test. The analyzed sample size appears in the first column, under “Valid” and “N”. This tells us that there are 7304 observations in the dataset with non-missing values for both “sex” and “language”. The accompanying percentage, 98.4%, shows that nearly all observations in the dataset are included in the analysis.

Moving to the right of the chart, we see SPSS gives us the number of missing observations and the percent under the heading “Cases Missing”. This tells us how many observations are excluded from this analysis. Finally, there is the “Total” column that adds the “Valid” and “Cases Missing” columns together.

 

B

In the second output chart, “sex*language Crosstabulation”, SPSS shows the actual crosstabulation of “sex” by “language”. We can see that “sex” is first in the SYNTAX and appears in the rows, while “language” is second in the SYNTAX and appears in the columns. In the SYNTAX, we also specified COUNT and TOTAL for the cells, so each cell shows the count of observations and that count as a percentage of the total valid sample size for this analysis.

 

C

In the third output chart, “Chi-Square Tests”, SPSS reports several related test statistics. We will focus on the first row, “Pearson Chi-Square”, which includes the columns “Value”, the calculated chi-squared value; “df”, the degrees of freedom for the test; and “Asymptotic Significance (2-sided)”, the two-tailed significance level. We can see that the chi-squared value is 0.244, the degrees of freedom are 2, and the significance level is 0.885. Since we are using the standard cutoff of 0.05 or below for the significance level, and 0.885 is well above 0.05, we conclude that the chi-squared test is not statistically significant. This means that there is no statistically significant relationship between the variables “sex” and “language” in this dataset.
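
The degrees of freedom and the significance level reported above can be checked by hand. The sketch below assumes that “sex” has 2 categories and “language” has 3, which is consistent with the reported df of 2 but is not stated explicitly in the description of the output. For a table with $r$ rows and $c$ columns:

$$ df = (r-1)(c-1) = (2-1)(3-1) = 2 $$

In the special case of 2 degrees of freedom, the significance level has a simple closed form:

$$ p = e^{-\chi^2/2} = e^{-0.244/2} = e^{-0.122} \approx 0.885 $$

This agrees with the significance level in the SPSS output.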