Sunday, August 7, 2011

15.3 – Tests for Independence

Sometimes data are displayed in a contingency table, which is a table showing data classified according to 2 different factors (attributes). For example, consider the table below.

This is a 2 by 3 table, which shows how the students of two different schools performed in an exam. We use a χ² test to determine whether the two factors are independent, or whether there is an association between them. For the table above, we want to know whether the school affects exam performance. In other words, since the numbers of students in school A and school B are different (80 and 70 respectively), we compare proportions rather than raw counts: if both schools have the same ratios of credit, pass and fail, then the school does not affect the grades.

This kind of test is known as the test for independence. As usual, we find the expected frequencies, the number of degrees of freedom ν, and the test statistic X², which has the same formula as in the previous section.
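As a reminder of how the statistic is computed, here is a small Python sketch. It assumes the formula from the previous section is the usual X² = Σ (O – E)² ÷ E (that section is not reproduced here), and the function name is my own. The observed and expected counts are made-up illustration values, not data from this post.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up observed (O) and expected (E) counts for six cells, listed in the
# same order. Illustration only: these are NOT the values from the post.
observed = [50, 20, 10, 40, 20, 10]
expected = [48.0, 64 / 3, 32 / 3, 42.0, 56 / 3, 28 / 3]

x2 = chi_squared_statistic(observed, expected)
print(round(x2, 4))  # 0.4464
```

A small X² like this one means the observed counts sit close to what independence would predict.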

Let’s take the above example. The number of degrees of freedom for an h × k contingency table can be found using the formula
ν = (h – 1)(k – 1)

and so the above table has ν = (2 – 1)(3 – 1) = 2. The expected frequency E of each cell can be found through the formula
E = (row total × column total) ÷ grand total

To use this, we first need to find the total of each row and each column. We add the totals to the table above, colour it a little, and get

The black numbers in the middle are known as the observed frequencies. To find the expected frequencies, we construct another table with the same totals, but with all the data in the middle cleared.

Next, we use the above formula to fill in the expected frequencies. For the top left cell, we have 90 × 80 ÷ 150 = 48.0
We proceed to fill up the rest:
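The same row total × column total ÷ grand total rule can be sketched in Python. The function names are my own, and the observed counts below are hypothetical: only the row totals (80 and 70), the first column total (90) and the grand total (150) are taken from the text, so the remaining cells are invented to match them.

```python
def expected_frequencies(observed):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """nu = (h - 1)(k - 1) for an h x k contingency table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical observed counts (credit, pass, fail). Only the totals
# 80, 70, 90 and 150 come from the text; the cell values are invented.
observed = [
    [50, 20, 10],  # school A, row total 80
    [40, 20, 10],  # school B, row total 70
]

expected = expected_frequencies(observed)
print(degrees_of_freedom(observed))  # 2
print(expected[0][0])                # 48.0
```

Note that the expected frequencies keep the same row and column totals as the observed table, which is a handy check when filling in the cells by hand.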

From here, we proceed to find X² using the 6 pairs of values of O and E that we have just calculated. Now let me give you an example:

A research worker studying the ages of adults and the number of credit cards they possess obtained the results shown in the table.

Use the χ² statistic and a significance test at the 5% level to decide whether or not there’s an association between age and number of credit cards possessed.

H0: There’s no association between age and number of credit cards possessed.
H1: There’s an association between age and number of credit cards possessed.
Expected frequency,

ν = (2 – 1)(2 – 1) = 1, so Yates’ correction is used.
Use the χ²(1) distribution and perform the test at the 5% level.
Since the 5% critical value of χ²(1) is 3.841, reject H0 if X² > 3.841.

Since X² > 3.841, H0 is rejected. There is evidence, at the 5% level, of an association between age and number of credit cards possessed.
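The whole 2 × 2 procedure can be sketched in Python as follows. This assumes the usual form of Yates’ correction, X² = Σ (|O – E| – 0.5)² ÷ E, and uses the critical value 3.841 quoted above. The function name and the counts are made up for illustration; they are not the research worker’s data.

```python
def yates_chi_squared(observed):
    """X^2 with Yates' continuity correction for a 2 x 2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency: row total * column total / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            x2 += (abs(o - e) - 0.5) ** 2 / e
    return x2


CRITICAL_5_PERCENT_1_DF = 3.841  # chi-squared(1) critical value from the text

# Hypothetical 2 x 2 counts, invented for this sketch.
observed = [
    [50, 30],
    [20, 40],
]

x2 = yates_chi_squared(observed)
reject_h0 = x2 > CRITICAL_5_PERCENT_1_DF
print(round(x2, 3), reject_h0)  # 10.529 True
```

With these invented counts X² exceeds 3.841, so H0 would be rejected, which mirrors the conclusion reached in the worked example.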

Easy? That’s all for this chapter.