
R

Cross Tabulation

A crosstabulation, or contingency table, shows the relationship between two or more variables by recording how many observations fall into each combination of the variables' categories. A crosstabulation shows at a glance how the observations are distributed across the categories of the included variables. No formula is needed to build one, because at its core a crosstabulation consists only of counts and percentages of observations.
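
A minimal R sketch of the counts-and-percentages idea. To keep it self-contained, it assumes you type in the sex-by-language counts shown in Output A instead of loading SLID; with the real data, `table(SLID$sex, SLID$language)` produces these same counts. `addmargins()` appends row and column totals, and `prop.table()` converts counts to proportions.

```r
# Counts copied from Output A (sex by language in SLID), typed in by hand
# (an assumption for illustration) so the example runs without the dataset.
tab <- as.table(matrix(
  c(2999, 262, 564,
    2717, 235, 527),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("Female", "Male"),
                  language = c("English", "French", "Other"))
))

# Counts with row totals, column totals, and the grand total (labelled "Sum")
tab_totals <- addmargins(tab)
print(tab_totals)

# Each cell as a percentage of all observations
pct_all <- 100 * prop.table(tab)
print(round(pct_all, 1))
```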

The Chi-squared test often accompanies a crosstabulation to test whether a statistically significant relationship exists between the variables. On its own, the test indicates whether an association is present, not how strong it is.

As a general rule, the dependent variable in a crosstabulation and Chi-squared test is placed in the columns, and the independent variable is placed in the rows. In this example, the two variables are sex (the independent variable) and language (the dependent variable). To analyze other variables, replace sex and language with other variables from the dataset.
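
With the independent variable in the rows, row percentages are a natural companion to the counts: they show how the dependent variable is distributed within each group. A minimal sketch, again assuming the Output A counts are typed in by hand; `margin = 1` tells `prop.table()` to divide each count by its row total.

```r
# Counts from Output A, entered by hand (assumption for illustration)
tab <- as.table(matrix(
  c(2999, 262, 564,
    2717, 235, 527),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("Female", "Male"),
                  language = c("English", "French", "Other"))
))

# Row percentages: within each sex (independent variable, rows), the share
# of each language (dependent variable, columns). Each row sums to 100.
row_pct <- 100 * prop.table(tab, margin = 1)
print(round(row_pct, 1))
```

For these counts, about 78% of both females and males are English speakers, so the two rows look very similar. The Chi-squared test checks formally whether differences like these are larger than would be expected by chance.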

Formula

$$ \chi^2 = {\sum{{(O_i-E_i)^2}\over E_i}} $$


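Above is the formula for the Chi-squared test statistic. $\chi^2$ (the Greek letter chi, squared) equals the sum ($\sum$) over every cell $i$ of the crosstabulation. $O_i$ is the observed count in cell $i$: the number of observations in the dataset that actually have that combination of categories. $E_i$ is the expected count in that cell if the two variables were unrelated, calculated as the cell's row total multiplied by its column total, divided by the total number of observations. For each cell, the expected count is subtracted from the observed count, the difference is squared, and the squared difference is divided by $E_i$. Adding these values across all cells gives the final Chi-squared ($\chi^2$) value; the further the observed counts are from the expected counts, the larger $\chi^2$ becomes.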
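
As a worked illustration using the counts in Output A (3825 females, 5716 English speakers, 7304 observations in total), the expected count and the Chi-squared contribution for the Female/English cell are:

$$ E_{\text{Female, English}} = \frac{3825 \times 5716}{7304} \approx 2993.39 $$

$$ \frac{(O - E)^2}{E} = \frac{(2999 - 2993.39)^2}{2993.39} \approx 0.0105 $$

Repeating this for all six cells and summing the results gives approximately 0.244, which matches the X-squared value reported in Output B.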

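Below is the code for producing a crosstabulation and running the Chi-squared test. It uses the SLID dataset, which is included in the carData package; if the dataset is not already available, load it first with library(carData).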

Code

table(SLID$sex, SLID$language)

chisq.test(table(SLID$sex, SLID$language))


There are two lines of code above. The first line uses table() to crosstabulate the variables sex and language from the SLID dataset.

The second line runs chisq.test() (the Chi-squared test) on that crosstabulation of sex and language.
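
table() works on observation-level data: it counts how many rows share each combination of values. The sketch below imitates that by expanding the Output A counts into one row per person, a hypothetical stand-in for SLID that keeps only sex and language, and then runs the same two steps. Saving the test result in an object (named `res` here, an arbitrary choice) lets you pull out the statistic, degrees of freedom, and p-value individually.

```r
# Counts from Output A, entered by hand (assumption for illustration)
counts <- as.table(matrix(
  c(2999, 262, 564,
    2717, 235, 527),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("Female", "Male"),
                  language = c("English", "French", "Other"))
))

# Hypothetical stand-in data: one row per observation, built by repeating
# each sex/language combination as many times as its count (Freq)
cells <- as.data.frame(counts)   # columns: sex, language, Freq
obs <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("sex", "language")]

# The same two steps as the guide's code, applied to the stand-in data
tab <- table(obs$sex, obs$language)
print(tab)

res <- chisq.test(tab)
print(res)

# Individual components of the stored test result
print(res$statistic)   # X-squared
print(res$parameter)   # degrees of freedom
print(res$p.value)     # p-value
```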

Output

A   

         English French Other
  Female    2999    262   564
  Male      2717    235   527


B

Pearson's Chi-squared test

data:  table(SLID$sex, SLID$language)
X-squared = 0.24422, df = 2, p-value = 0.8851

A

The first output table is the crosstabulation of sex by language as RStudio displays it. Because sex is the first argument in the code, it appears in the rows; language is the second argument, so it appears in the columns.
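
A short sketch of how argument order controls the layout, using a hypothetical stand-in dataset rebuilt from the Output A counts (an assumption, so the example runs without SLID). Swapping the arguments to table() transposes the table, and the Chi-squared statistic stays the same, because the test treats both variables symmetrically.

```r
# Counts from Output A, entered by hand (assumption for illustration)
counts <- as.table(matrix(
  c(2999, 262, 564,
    2717, 235, 527),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("Female", "Male"),
                  language = c("English", "French", "Other"))
))

# Hypothetical stand-in data: one row per observation
cells <- as.data.frame(counts)
obs <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("sex", "language")]

# First argument -> rows, second argument -> columns
sex_rows  <- table(obs$sex, obs$language)   # 2 x 3: sex in rows
lang_rows <- table(obs$language, obs$sex)   # 3 x 2: language in rows
print(sex_rows)
print(lang_rows)

# Swapping the arguments only transposes the table...
print(all(t(sex_rows) == lang_rows))

# ...and leaves the Chi-squared statistic unchanged
print(chisq.test(sex_rows)$statistic)
print(chisq.test(lang_rows)$statistic)
```

The rows-versus-columns convention is therefore about readability and interpretation; it does not change the test result.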

B

The second output, Pearson's Chi-squared test, shows that the X-squared (Chi-squared) value is 0.24422, the degrees of freedom are 2, and the p-value is 0.8851. Using the standard cutoff of 0.05 or below for significance, a p-value of 0.8851 is well above 0.05, so the Chi-squared test is not statistically significant. This means that there is no statistically significant relationship between the variables sex and language in this dataset.
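
To connect the Formula section to this output, the sketch below computes X-squared, the degrees of freedom, and the p-value by hand and then applies the 0.05 cutoff. It assumes the Output A counts are typed in by hand; the variable names are illustrative, and `chisq.test()` is used only as a cross-check. For a table larger than 2 x 2, `chisq.test()` applies no continuity correction, so the hand calculation and the built-in result should agree.

```r
# Counts from Output A, entered by hand (assumption for illustration)
tab <- as.table(matrix(
  c(2999, 262, 564,
    2717, 235, 527),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("Female", "Male"),
                  language = c("English", "French", "Other"))
))

# Expected counts if sex and language were independent:
# (row total * column total) / grand total, for every cell at once
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# The formula from this guide: sum over cells of (O - E)^2 / E
chi_sq <- sum((tab - expected)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)

# p-value: upper-tail probability of the chi-squared distribution
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

# Decision at the 0.05 cutoff used in this guide
alpha <- 0.05
significant <- p_value <= alpha

cat("X-squared =", round(chi_sq, 5), " df =", dof,
    " p-value =", round(p_value, 4), "\n")
cat(if (significant) "Statistically significant at 0.05"
    else "Not statistically significant at 0.05", "\n")

# Cross-check against R's built-in test
builtin <- chisq.test(tab)
print(builtin)
```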