STPM Further Mathematics T: 15.2 – Tests for Goodness of Fit

A χ² goodness-of-fit test is used when you have some practical data and you want to know how well a particular statistical distribution, such as a binomial or a normal, models that data. The null hypothesis H₀ is that the particular distribution does provide a model for the data; the alternative hypothesis H₁ is that it doesn’t.

Just like other hypothesis tests, goodness-of-fit tests follow a general procedure. You need to write all 6 of these steps on your answer sheet:

1. State the null & alternative hypotheses
H₀: X is uniformly / B / Po / N distributed, or distributed in a ratio of ?
H₁: X is not distributed this way

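2. Calculate the expected frequency E of each class in the table
E = (total observed frequency) × P(class) under H₀

3. State the degrees of freedom
There are ? classes and ? restrictions
Consider a χ² (n – ?) distribution, where n is the number of classes and ? is the number of restrictions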

4. State the significance level
Perform at ?% level
From the tables, χ²_(?%) (ν) = ?, so reject H₀ if X² > ?

5. Calculate the test statistic X² = Σ (O – E)² / E using the table

6. Make your conclusion
Since X² > / < ?, H₀ is rejected in favour of H₁ / not rejected. There is sufficient / insufficient evidence, at the ?% level, that __________ .
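Steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are made up for this sketch, and the critical value 5.991 is the 5% value for 2 degrees of freedom read from the tables.

```python
def chi_square_statistic(observed, expected):
    """Return X² = Σ (O − E)² / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit_decision(observed, expected, critical_value):
    """Compute X² and compare it with the tabulated critical value."""
    x2 = chi_square_statistic(observed, expected)
    return x2, x2 > critical_value  # True means "reject H0"


# Hypothetical data: 60 trials over 3 equally likely classes, so E = 20 each.
# 3 classes and 1 restriction give 2 degrees of freedom.
x2, reject = goodness_of_fit_decision([18, 25, 17], [20, 20, 20], 5.991)
print(round(x2, 3), reject)
```

Here X² = (4 + 25 + 9) / 20 = 1.9, which is below 5.991, so H₀ would not be rejected for this made-up data.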

Now we shall learn how to solve 5 kinds of χ² tests through examples.

1. Uniform Distribution (Random)
A tetrahedral die is thrown 120 times and the number on which it lands is noted.

Test at the 5% level whether the die is fair.

H₀: The die is fair [I can also write, “the die follows a uniform distribution”. But this is better.]
H₁: The die is not fair

There are 4 classes and 1 restriction (ΣE = 120) [Remember that ΣE = ? is always one of the restrictions]
Consider a χ² (3) distribution, perform at 5% level.
From the tables, χ²_(5%) (3) = 7.815, so reject H₀ if X² > 7.815.

Since X² < 7.815, H₀ is not rejected. There is insufficient evidence, at the 5% level, that the die is not fair.

2. Distributed in Given Ratio
The outcomes A, B & C of a certain experiment are thought to occur in the ratio 1 : 2 : 1. The experiment is performed 200 times and the observed frequencies of A, B & C are 36, 115 & 49 respectively. Is the difference in the observed and expected results significant? Test at the 5% level.

H₀: The outcomes A, B & C are in the ratio 1 : 2 : 1
H₁: The outcomes A, B & C are not in the ratio 1 : 2 : 1
There are 3 classes and 1 restriction (ΣE = 200)
Consider a χ² (2) distribution, perform at 5% level.
From the tables, χ²_(5%) (2) = 5.991, so reject H₀ if X² > 5.991

[To save time, you could just construct one table instead of 2. You find the E and the test statistic in one table.]
Since X² = 6.19 > 5.991, H₀ is rejected in favour of H₁. The difference between the observed & expected results is significant at the 5% level.
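If you want to check the arithmetic of this example on a computer, here is a minimal sketch. It uses only the figures given in the question (the observed frequencies, the ratio and the tabulated critical value); the variable names are my own.

```python
observed = [36, 115, 49]  # observed frequencies of A, B, C
ratio = [1, 2, 1]         # ratio claimed under H0

total = sum(observed)     # 200 trials
# Expected frequency of each outcome: share of the ratio times the total
expected = [total * r / sum(ratio) for r in ratio]

# Test statistic X² = Σ (O − E)² / E
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 5.991    # 5% point of χ²(2), from the tables
print(expected, round(x2, 2), x2 > critical_value)
```

The expected frequencies come out as 50, 100 and 50, and X² = 3.92 + 2.25 + 0.02 = 6.19, which exceeds 5.991.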

3. Binomial Distribution
Not much differs from the two examples above, except that finding E takes more laborious calculation. Once again, remember your binomial and Poisson formulae, and combine classes whose expected frequencies are less than 5. You do that because the χ² approximation is unreliable when expected frequencies are small, so combining reduces the error. It also reduces the number of classes, so a different number of degrees of freedom will be used.

Perform a χ² test to investigate whether the following data are drawn from a binomial distribution with p = 0.3. Use a 5% level of significance.

H₀: X ~ B(5, 0.3) [Writing the short form is good enough.]
H₁: X is not distributed this way.
The expected frequency for a binomial distribution is
E = P(X = x) × 100 = ⁵Cₓ (0.3)^x (0.7)^(5 – x) × 100,
where ΣO = 100. We tabulate the values below:

Since the expected frequencies for x = 4 and x = 5 are both less than 5 (and are still less than 5 when added together), the last 3 classes are combined. [Please take note of this piece of information.]
There are now 4 classes and 1 restriction (ΣE = 100)
Consider a χ² (3) distribution, perform at 5% level.
From the tables, χ²_(5%) (3) = 7.815, so reject H₀ if X² > 7.815

Since X² < 7.815, H₀ is not rejected. There is insufficient evidence, at the 5% level, that X does not follow B(5, 0.3).
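The expected frequencies for B(5, 0.3) with 100 observations, and the combining of the small classes, can be reproduced with the sketch below. The helper `combine_tail` is a hypothetical name of my own, and it only merges small classes at the upper end, which is all this example needs.

```python
from math import comb

n, p, total = 5, 0.3, 100

# E = total × nCx × p^x × (1 − p)^(n − x) for x = 0, 1, ..., n
expected = [total * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def combine_tail(freqs, minimum=5):
    """Merge classes from the right until the last expected frequency is >= minimum."""
    freqs = list(freqs)
    while len(freqs) > 1 and freqs[-1] < minimum:
        last = freqs.pop()
        freqs[-1] += last
    return freqs


combined = combine_tail(expected)
print([round(e, 3) for e in expected])
print([round(e, 3) for e in combined])
```

The classes x = 4 and x = 5 have expected frequencies 2.835 and 0.243, so they are folded into x = 3, leaving 4 classes with the last one at 16.308. Remember to combine the observed frequencies in exactly the same way before computing X².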

Notice that the number of restrictions increases if the population proportion p is not known. You then estimate p from the sample mean, using x̅ = np. For example, a random sample of size 50 is taken, and you are given this table

You don’t know the mean, but you know that

You can find the value of p by using the equation x̅ = np, where n = 50. The question then has 2 restrictions (ΣE and the estimated p), so the degrees of freedom are (number of classes) – 2.

4. Poisson Distribution
This one is very similar to the binomial one. If the Poisson population mean λ is unknown, the number of restrictions increases by 1, and you estimate λ by the sample mean x̅. Just take a look at the example.

A local council has records of the number of children and the number of households in its area. It is therefore known that the average number of children per household is 1.4. It is suggested that the number of children per household can be modelled by a Poisson distribution with parameter 1.4. In order to test this, a random sample of 1000 households is taken, giving the following data.

Carry out a χ² test, at the 5% level of significance, to determine whether or not the proposed model should be accepted.

Let X be the number of children per household.
[Notice that in this case, I define X properly. You should do so whenever you know what X is.]
H₀: X ~ Po(1.4)
H₁: X is not distributed this way.
There are 6 classes and 1 restriction (ΣE = 1000).
Consider a χ² (5) distribution, perform at 5% level.
From the tables, χ²_(5%) (5) = 11.070, so reject H₀ if X² > 11.070.

I suppose you can relate that

Since X² > 11.070, H₀ is rejected in favour of H₁. The proposed model shouldn’t be accepted: there is evidence, at the 5% level, that X doesn’t follow a Poisson distribution with mean 1.4.
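The expected frequencies under Po(1.4) for 1000 households can be sketched as follows. I am assuming here that the 6 classes are 0, 1, 2, 3, 4 and "5 or more" children; that grouping is my assumption and should be adjusted to match the table you are given.

```python
from math import exp, factorial

lam, total = 1.4, 1000


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


# Classes 0, 1, 2, 3, 4 (assumed grouping)
expected = [total * poisson_pmf(k, lam) for k in range(5)]
# Last class "5 or more" takes whatever is left, so that ΣE = 1000
expected.append(total - sum(expected))

print([round(e, 1) for e in expected])
```

Making the last class open-ended is what guarantees the restriction ΣE = 1000. With this grouping every expected frequency is above 5, so no classes need to be combined.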

5. Normal Distribution
For the normal distribution, either you know both the population mean μ and the population variance σ², or you know neither and estimate both from the sample. Correspondingly, the degrees of freedom are either n – 1 or n – 3, where n is the number of classes. See the example below:

The following data gives the heights in cm of 100 male students.

Find the expected frequencies of a normal distribution having the same mean and variance as the data given, and test the goodness of fit, using a 5% level of significance.

To start, we need to find the values of μ and σ² first.

Let X be the height (cm) of a male student.
H₀: X ~ N(171.54, 50.56)
H₁: X is not distributed this way.

Now this one needs a lot of calculation. The expected frequency of each class can be found by using

E = 100 × P(a < X < b) = 100 × [Φ((b – μ)/σ) – Φ((a – μ)/σ)]

where a and b are the lower and upper boundaries of each class (remember to adjust the class limits by ±0.5). The work for a continuous variable takes some time. Remember that the bell curve goes all the way to infinity in both directions, so the first and last classes are open-ended. I believe you know that your calculator can help you with these, right?
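As a rough sketch, the expected frequency of a class with boundaries a and b is 100 × P(a < X < b) under N(171.54, 50.56). The class boundaries in the code below are hypothetical ones chosen only for illustration, not the ones from the data in this example, and `expected_frequency` is a name of my own.

```python
from math import sqrt
from statistics import NormalDist

mu, variance, total = 171.54, 50.56, 100
dist = NormalDist(mu, sqrt(variance))


def expected_frequency(a, b):
    """E = total × P(a < X < b); a=None or b=None means an open-ended tail."""
    lower = 0.0 if a is None else dist.cdf(a)
    upper = 1.0 if b is None else dist.cdf(b)
    return total * (upper - lower)


# Hypothetical class boundaries in cm (already adjusted by 0.5)
classes = [(None, 159.5), (159.5, 164.5), (164.5, 169.5),
           (169.5, 174.5), (174.5, 179.5), (179.5, None)]
expected = [expected_frequency(a, b) for a, b in classes]

print([round(e, 2) for e in expected])
```

Leaving the first and last classes open-ended is what makes the expected frequencies add up to 100.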

Remember to combine the small classes.
There are 5 classes and 3 restrictions (ΣE = 100, μ and σ² estimated from the sample).
Consider a χ² (2) distribution, perform at 5% level.
From the tables, χ²_(5%) (2) = 5.991, so reject H₀ if X² > 5.991.

Since X² < 5.991, H₀ is not rejected. There is insufficient evidence, at the 5% level, that X does not follow N(171.54, 50.56).

Before I end this section, let me give you a summary of degrees of freedom used throughout this post:
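The rule behind the summary can be sketched as: degrees of freedom = (number of classes after combining) – (number of restrictions), where ΣE = ΣO always counts as one restriction and each parameter estimated from the sample adds one more. The function name below is my own; the class counts are the ones used in the five examples of this post.

```python
def degrees_of_freedom(classes, estimated_parameters=0):
    """ν = classes − restrictions; ΣE = ΣO is always one restriction."""
    restrictions = 1 + estimated_parameters
    return classes - restrictions


print(degrees_of_freedom(4))     # uniform (die), 4 classes
print(degrees_of_freedom(3))     # given ratio, 3 classes
print(degrees_of_freedom(4))     # binomial with p known, 4 classes after combining
print(degrees_of_freedom(6))     # Poisson with λ known, 6 classes
print(degrees_of_freedom(5, 2))  # normal with μ and σ² estimated, 5 classes
```

For a binomial with p estimated, or a Poisson with λ estimated, pass `estimated_parameters=1`.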

This section is really not hard, but it requires a lot of laborious calculation. Be very careful not to make mistakes, and score! ☺

4 comments:

Likit, April 20, 2012 at 12:41 AM
Hi, must the table of expected and observed frequencies be drawn? Or can I just state the expected frequency of A = x?
Johnivan, May 11, 2012 at 9:27 AM
It is best to draw it out; it makes things clearer. I'm not sure whether marks will be deducted for showing less working, though.
Alex, May 17, 2012 at 6:51 AM
Hi!

Thank you very much for this useful post! It will help me loads in the preparation of my exams.

I have a problem which I am encountering while trying to solve a problem. I am asked to check, using 3 classes, if 15 values are drawn from a Binomial distribution with n=10. However, the data values that I am given are 15 and in 5 classes, so this confuses me! Where do I start? And also, how can I calculate the expected values if I am not given a probability for the Binomial, and just that n=10. The data I have is (2,3,2,4,6,3,5,7,1,3,2,3,1,3,3).

Please note that I am NOT trying to ask you to solve this for me, but just trying to get some guidelines to where to start.

Thank you again for your blog!

Alex

Sunday, August 7, 2011
