A **χ^{2} Goodness-of-Fit test** is used when you have some practical data and you want to know how well a particular statistical distribution, such as a binomial or a normal, models that data. The **null hypothesis H_{0}** is that the particular distribution does provide a model for the data; the **alternative hypothesis H_{1}** is that it doesn’t.

Just like Hypothesis Tests, Goodness-of-fit Tests also follow a general guideline. You need to write all these 6 steps in your answer sheet:

**1. State the null & alternative hypothesis**

H_{0}: x is uniformly, B, P_{0}, N distributed / distributed in a ratio of ?

H_{1}: x is not distributed this way

**2. Calculate the expected frequency E in the table**

**3. State the degree of freedom**

There are __?__ classes and __?__ restrictions

Consider a χ^{2}(__n – ?__) distribution

**4. State the significance level**

Perform at __?%__ level

From the tables, χ^{2}_{(?%)}(ν) = __?__, so reject H_{0} if X^{2} > __?__

**5. Calculate X^{2} using the tables**

**6. Make your conclusion**

Since X^{2} > / < ?, H_{0} is rejected in favour of H_{1} / not rejected. There is evidence, at __?%__ level, that __________ .

Now we shall proceed to learn how to solve 5 kinds of **χ^{2}** tests through examples. Questions are in blue and answers are in red.
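For reference, the steps never spell out the X^{2} of step 5. The usual textbook definition is sketched below; the symbols O_i, E_i and k are notation assumed here rather than taken from the post: O_i and E_i are the observed and expected frequencies of class i, and k is the number of classes after any combining.

```latex
X^{2} = \sum_{i=1}^{k} \frac{(O_i - E_i)^{2}}{E_i},
\qquad
\nu = k - (\text{number of restrictions})
```

The value of X^{2} is then compared with the tabulated χ^{2}(ν) value from step 4.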

**1. Uniform Distribution (Random)**

*A tetrahedral die is thrown 120 times and the number on which it lands is noted.*

*Test at the 5% level whether the die is fair.*

H_{0}: The die is fair [I can also write, “the die follows a uniform distribution”. But this is better.]

H_{1}: The die is not fair

There are 4 classes and 1 restriction (ΣE = 120) [Remember that ΣE = ? is always one of the restrictions]

Consider a χ^{2}(3) distribution, perform at 5% level.

From the tables, χ^{2}_{(5%)}(3) = 7.815, so reject H_{0} if X^{2} > 7.815.

Since X^{2} < 7.815, H_{0} is not rejected. There is insufficient evidence, at 5% level, that the die is not fair.

**2. Distributed in Given Ratio**

*The outcomes A, B & C of a certain experiment are thought to occur in the ratio 1 : 2 : 1. The experiment is performed 200 times and the observed frequencies of A, B & C are 36, 115 & 49 respectively. Is the difference in the observed and expected results significant? Test at the 5% level.*

H_{0}: The outcomes A, B & C are in the ratio 1 : 2 : 1

H_{1}: The outcomes A, B & C are not in the ratio 1 : 2 : 1

There are 3 classes and 1 restriction (ΣE = 200)

Consider a χ^{2}(2) distribution, perform at 5% level.

From the tables, χ^{2}_{(5%)}(2) = 5.991, so reject H_{0} if X^{2} > 5.991

[To save time, you could just construct one table instead of 2. You find the **E** and the test statistic in one table.]

Since X^{2} > 5.991, H_{0} is rejected in favour of H_{1}. The difference between the observed & expected results is significant, at 5% level.
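The arithmetic for this example can be checked with a short Python sketch. The helper names `expected_from_ratio` and `chi_square_statistic` are made up for illustration; the observed frequencies, the ratio and the critical value 5.991 come from the question and the working above. With these numbers X^{2} comes out at about 6.19.

```python
def expected_from_ratio(ratio, total):
    """Share `total` observations among the classes in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def chi_square_statistic(observed, expected):
    """X^2 = sum over classes of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [36, 115, 49]                                   # A, B, C from the question
expected = expected_from_ratio([1, 2, 1], sum(observed))   # [50.0, 100.0, 50.0]
x2 = chi_square_statistic(observed, expected)

critical = 5.991            # chi-squared 5% point for 2 degrees of freedom (from the tables)
reject_h0 = x2 > critical

print(f"E = {expected}")
print(f"X^2 = {x2:.2f}, reject H0: {reject_h0}")
```

A uniform test such as the tetrahedral die is the same calculation with the ratio 1 : 1 : 1 : 1, which gives E = 30 for each of the 4 faces.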

**3. Binomial Distribution**

Not much is different here from the above two, except that you need more involved calculations to find your **E**. Once again, remember your binomial and Poisson formulae, and **combine expected frequencies less than 5**. You do that because the χ^{2} approximation is poor when expected frequencies are small, and of course, combining classes changes the degrees of freedom used.

*Perform a χ^{2} test to investigate whether the following data are drawn from a binomial distribution with p = 0.3. Use a 5% level of significance.*

H_{0}: X ~ B(5, 0.3) [Writing the short form is good enough.]

H_{1}: X is not distributed this way.

The expected frequency for a Binomial distribution is **E = P(X = x) × 100 = ^{5}C_{x} 0.3^{x} 0.7^{5–x} × 100**, where **ΣO = 100**. We tabulate the table below:

Since the expected frequencies for x = 4 and x = 5 are < 5, the last 3 classes (x = 3, 4, 5) are combined. [Please take note of this piece of information.]

There are now 4 classes and 1 restriction (ΣE = 100)

Consider a χ^{2}(3) distribution, perform at 5% level.

From the tables, χ^{2}_{(5%)}(3) = 7.815, so reject H_{0} if X^{2} > 7.815

Since X^{2} < 7.815, H_{0} is not rejected. ∴ It is reasonable to model X as binomially distributed, X ~ B(5, 0.3).
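The expected frequencies and the combining of small classes in this example can be reproduced with a short Python sketch. The helper names `binomial_expected` and `combine_small_tail` are made up for illustration, and the merging rule is a simplifying assumption: it only merges from the upper end, which is all this example needs.

```python
from math import comb


def binomial_expected(n, p, total):
    """Expected frequency of each x = 0..n under B(n, p) for `total` observations."""
    return [total * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def combine_small_tail(expected, minimum=5):
    """Merge classes from the upper end until the last class has E >= minimum.

    Assumption: only the upper tail is small, as in this example.
    """
    merged = list(expected)
    while len(merged) > 1 and merged[-1] < minimum:
        tail = merged.pop()
        merged[-1] += tail
    return merged


expected = binomial_expected(5, 0.3, 100)
merged = combine_small_tail(expected)

for x, e in enumerate(expected):
    print(f"x = {x}: E = {e:.3f}")
print("after combining:", [round(e, 3) for e in merged])
```

The observed frequencies have to be combined in exactly the same way before X^{2} is calculated, so that both rows of the table describe the same 4 classes.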

Notice that the number of restrictions can increase if the **population proportion** is not known. You use **x̅ = np** to find the value of **p**. For example, a random sample of size 50 is taken, and you are given this table

You don’t know the mean, but you know that

You can find the value of **p** by using the equation **x̅ = np**, where **n = 50**. That gives the question 2 restrictions, so your degrees of freedom are **n – 2**.

**4. Poisson Distribution**

This one is very similar to the Binomial one. If the Poisson population mean **λ** is unknown, the number of restrictions increases by 1, and you use the sample mean **x̅** as the estimate of **λ**. Just take a look at the example.

*A local council has records of the number of children and the number of households in its area. It is therefore known that the average number of children per household is 1.4. It’s suggested that the number of children per household can be modelled by a Poisson distribution with parameter 1.40. In order to test this, a random sample of 1000 households is taken, giving the following data.*

*Carry out a χ^{2} test, at the 5% level of significance, to determine whether or not the proposed model should be accepted.*

Let X be the number of children per household.

[Notice that in this case, I define X properly. You should do it when you know what X is.]

H_{0}: X ~ P_{0}(1.4)

H_{1}: X is not distributed this way.

There are 6 classes and 1 restriction (ΣE = 1000).

Consider a χ^{2} (5) distribution, perform at 5% level.

From the tables, χ^{2 }_{(5%)} (5) = 11.070, so reject H_{0} if X^{2} > 11.070.

I suppose you can related that

Since X^{2} > 11.070, H_{0} is rejected in favour of H_{1}. There is evidence, at 5% level, that X doesn’t follow a Poisson distribution with parameter 1.4, so the proposed model shouldn’t be accepted.
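The expected frequencies for this model can be sketched in Python as follows. The helper name `poisson_expected` is made up for illustration, and the class layout (0, 1, 2, 3, 4 and "5 or more") is an assumption chosen to match the 6 classes stated above.

```python
from math import exp, factorial


def poisson_expected(lam, total, last_class):
    """Expected frequencies for x = 0..last_class-1 plus one open class 'last_class or more'."""
    probs = [exp(-lam) * lam**x / factorial(x) for x in range(last_class)]
    probs.append(1 - sum(probs))   # P(X >= last_class), so the E values add up to `total`
    return [total * p for p in probs]


# Assumed class layout: 0, 1, 2, 3, 4 and "5 or more" children per household.
expected = poisson_expected(1.4, 1000, last_class=5)
labels = ["0", "1", "2", "3", "4", "5 or more"]

for label, e in zip(labels, expected):
    print(f"{label:>9}: E = {e:.1f}")
```

Under that assumed layout every expected frequency is above 5, so no classes need combining and the 6 classes with 1 restriction give the 5 degrees of freedom used above.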

**5. Normal Distribution**

As for the normal distribution, either you know both the population mean **μ** and population variance **σ^{2}**, or you know neither **μ** nor **σ^{2}**. In this case, you either have degrees of freedom **n – 1**, or **n – 3**. See the example below:

*The following data gives the heights in cm of 100 male students.*

*Find the expected frequencies of a normal distribution having the same mean and variance as the data given, and test the goodness of fit, using a 5% level of significance.*

To start, we need to find the values of **μ** and **σ^{2}** first.

Let X be the height (cm) of a male student.

H_{0}: X ~ N(171.54, 50.56)

H_{1}: X is not distributed this way.

Now this one needs a lot of calculations. The expected frequency of each class can be found by using **E = 100 × P(a < X < b)**, where **a** and **b** are the lower and upper boundaries of each class (remember the **±0.5** adjustment). The work for a continuous variable takes some time. Remember that the bell curve goes all the way to infinity, so the first and last classes must be open-ended. I believe you know that your calculator can help you do tricks, right?

Remember to combine the small classes.

There are 5 classes and 3 restrictions (ΣE = 100, **μ** and **σ^{2}** estimated from the sample).

Consider a χ^{2}(2) distribution, perform at 5% level.

From the tables, χ^{2}_{(5%)}(2) = 5.991, so reject H_{0} if X^{2} > 5.991.

Since X^{2} < 5.991, H_{0} is not rejected. It is reasonable to model X as normally distributed, X ~ N(171.54, 50.56).
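The expected frequencies for a normal model can be sketched in Python as follows. The mean, variance and sample size are taken from the working above, but the class boundaries are hypothetical, since the data table is not reproduced here. The helper names `normal_cdf` and `normal_expected` are also made up for illustration.

```python
from math import erf, inf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def normal_expected(boundaries, mu, variance, total):
    """Expected frequency of each class lying between consecutive boundaries."""
    sigma = sqrt(variance)
    cdf = [normal_cdf(b, mu, sigma) for b in boundaries]
    return [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


# HYPOTHETICAL class boundaries in cm (already adjusted by 0.5); the first and
# last classes are open-ended so that the whole bell curve is covered.
boundaries = [-inf, 159.5, 164.5, 169.5, 174.5, 179.5, inf]
expected = normal_expected(boundaries, mu=171.54, variance=50.56, total=100)

for (a, b), e in zip(zip(boundaries, boundaries[1:]), expected):
    print(f"{a} to {b}: E = {e:.2f}")
```

Because the outer classes run to infinity the expected frequencies add up to 100, which is the ΣE = 100 restriction. Any class with E < 5 would still need combining with its neighbour before the test statistic is calculated.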

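Before I end this section, let me give you a summary of degrees of freedom used throughout this post (n is the number of classes after combining):

Uniform or given ratio: n – 1

Binomial: n – 1 if p is known, n – 2 if p is estimated from the sample

Poisson: n – 1 if λ is known, n – 2 if λ is estimated from the sample

Normal: n – 1 if μ and σ^{2} are known, n – 3 if both are estimated from the sample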

This section is really not hard, but a lot of tedious calculations are required. Be very careful not to make mistakes, and score! ☺

Hi, must the table of expected and observed frequencies be drawn? Or can I just state expected frequency of A = x?

It is best to draw it out, makes you see it clearer. I'm not sure whether marks will be deducted for less workings though.

Hi!

Thank you very much for this useful post! It will help me loads in the preparation of my exams.

I have a problem which I am encountering while trying to solve a problem. I am asked to check, using 3 classes, if 15 values are drawn from a Binomial distribution with n=10. However, the data values that I am given are 15 and in 5 classes, so this confuses me! Where do I start? And also, how can I calculate the expected values if I am not given a probability for the Binomial, and just that n=10. The data I have is (2,3,2,4,6,3,5,7,1,3,2,3,1,3,3).

Please note that I am NOT trying to ask you to solve this for me, but just trying to get some guidelines to where to start.

Thank you again for your blog!

Alex

Hi Alex,

Sorry, I'm not sure how to solve your problem. The fact is, I'm currently reading Physics, and I've forgotten a lot about these statistics stuff... Maybe you can ask a teacher or so?