Sunday, August 7, 2011

15.2 – Tests for Goodness of Fit

A χ² Goodness-of-Fit test is used when you have some practical data and you want to know how well a particular statistical distribution, such as a binomial or a normal, models that data. The null hypothesis H0 is that the particular distribution does provide a model for the data; the alternative hypothesis H1 is that it doesn’t.

Just like other hypothesis tests, goodness-of-fit tests follow a general guideline. You need to write all 6 of these steps on your answer sheet:

1. State the null & alternative hypotheses
H0: X is uniformly / B / Po / N distributed, or distributed in a ratio of ?
H1: X is not distributed this way

2. Calculate the expected frequency E in the table
image

3. State the degrees of freedom
There are n = ? classes and ? restrictions
Consider a χ²(n – ?) distribution

4. State the significance level
Perform at ?% level
From the tables, χ²(?%)(ν) = ?, so reject H0 if X² > ?

5. Calculate X² using the tables
image

6. Make your conclusion
Since X² > / < ?, H0 is rejected in favour of H1 / not rejected. There is / is not sufficient evidence, at the ?% level, that __________ .
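The six steps above can be sketched in a few lines of Python. This is only a minimal sketch, assuming the usual test statistic X² = Σ(O – E)²/E; the function names and the sample counts are my own illustration, and the lookup table holds only the three 5% critical values quoted later in this post.

```python
def chi_squared_statistic(observed, expected):
    """Return X^2 = sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(num_classes, num_restrictions):
    """nu = classes - restrictions (sum of E = total always counts as one)."""
    return num_classes - num_restrictions


# 5% critical values quoted in this post, keyed by degrees of freedom.
CRITICAL_5_PERCENT = {2: 5.991, 3: 7.815, 5: 11.070}


def reject_h0(observed, expected, num_restrictions=1):
    """True if X^2 exceeds the 5% critical value for the relevant nu."""
    nu = degrees_of_freedom(len(observed), num_restrictions)
    return chi_squared_statistic(observed, expected) > CRITICAL_5_PERCENT[nu]


# Hypothetical counts, not taken from any question in this post.
print(reject_h0([10, 20, 30], [20, 20, 20]))
```

For any other number of degrees of freedom you still need your χ² tables; the dictionary is deliberately not a full table.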


Now we shall proceed to learn how to solve 5 kinds of χ² tests through examples. Questions are in blue and answers are in red:

1. Uniform Distribution (Random)
A tetrahedral die is thrown 120 times and the number on which it lands is noted.
image
Test at the 5% level whether the die is fair.

H0: The die is fair [I can also write, “the die follows a uniform distribution”. But this is better.]
H1: The die is not fair
image
There are 4 classes and 1 restriction (ΣE = 120) [Remember that ΣE = ? is always one of the restrictions]
Consider a χ²(3) distribution, perform at 5% level.
From the tables, χ²(5%)(3) = 7.815, so reject H0 if X² > 7.815.

image
Since X² < 7.815, H0 is not rejected. There is no significant evidence, at the 5% level, that the die is not fair.


2. Distributed in Given Ratio
The outcomes A, B & C of a certain experiment are thought to occur in the ratio 1 : 2 : 1. The experiment is performed 200 times and the observed frequencies of A, B & C are 36, 115 & 49 respectively. Is the difference in the observed and expected results significant? Test at the 5% level.

H0: The outcomes A, B & C are in the ratio 1 : 2 : 1
H1: The outcomes A, B & C are not in the ratio 1 : 2 : 1
There are 3 classes and 1 restriction (ΣE = 200)
Consider a χ²(2) distribution, perform at 5% level.
From the tables, χ²(5%)(2) = 5.991, so reject H0 if X² > 5.991

image
[To save time, you could just construct one table instead of 2. You find the E and the test statistic in one table.]
Since X² > 5.991, H0 is rejected in favour of H1. The difference between the observed & expected results is significant at the 5% level.
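If you want to check the arithmetic for this question, here is a small sketch using the numbers given in the question (ratio 1 : 2 : 1, 200 trials, observed 36, 115 and 49). The helper name `expected_from_ratio` is just my own label, and I am assuming the usual statistic X² = Σ(O – E)²/E.

```python
def expected_from_ratio(ratio, total):
    """Split the total number of trials in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


observed = [36, 115, 49]
expected = expected_from_ratio([1, 2, 1], 200)

# X^2 = sum of (O - E)^2 / E over the three classes
x_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(expected)             # [50.0, 100.0, 50.0]
print(round(x_squared, 2))  # 6.19
```

The value 6.19 is greater than 5.991, which agrees with the conclusion above.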


3. Binomial Distribution
Nothing much changes from the two cases above, except that you need more laborious calculations to find your E. Once again, remember your binomial and Poisson formulae, and combine classes whose expected frequencies are less than 5. You do that because the χ² approximation is unreliable when expected frequencies are small, and of course, a different number of degrees of freedom will then be used.

Perform a χ² test to investigate whether the following is drawn from a binomial distribution with p = 0.3. Use a 5% level of significance.
image

H0: X ~ B(5, 0.3) [Writing the short form is good enough.]
H1: X is not distributed this way.
The expected frequency for a binomial distribution is
E = P(X = x) × 100 = 5Cx × 0.3^x × 0.7^(5 – x) × 100,
where ΣO = 100. We tabulate the values below:
image

Since the expected frequencies for x = 4 and x = 5 are < 5 (and are still < 5 when added together), the last 3 classes are combined. [Please take note of this piece of information.]
There are now 4 classes and 1 restriction (ΣE = 100)
Consider a χ²(3) distribution, perform at 5% level.
From the tables, χ²(5%)(3) = 7.815, so reject H0 if X² > 7.815
image
Since X² < 7.815, H0 is not rejected. ∴ There is no significant evidence, at the 5% level, that X does not follow B(5, 0.3).
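The expected frequencies and the combining of small classes in this example can be sketched as below. It uses only the figures from the question, B(5, 0.3) with 100 observations; the function names and the "merge from the right until E is at least 5" rule are my own way of writing down what was done by hand above.

```python
from math import comb


def binomial_expected(n, p, total):
    """E for x = 0..n under B(n, p), scaled to the total frequency."""
    return [total * comb(n, x) * p ** x * (1 - p) ** (n - x)
            for x in range(n + 1)]


def pool_small_tail(expected, minimum=5.0):
    """Merge classes from the right until the last class has E >= minimum."""
    pooled = list(expected)
    while len(pooled) > 1 and pooled[-1] < minimum:
        last = pooled.pop()
        pooled[-1] += last
    return pooled


expected = binomial_expected(5, 0.3, 100)
pooled = pool_small_tail(expected)

print([round(e, 3) for e in expected])
print([round(e, 3) for e in pooled])
print(len(pooled))  # 4 classes remain
```

Whatever classes you merge for E, merge the same classes for O before calculating X².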

Notice that the number of restrictions increases if the population proportion p is not known, because p then has to be estimated from the sample. You use x̅ = np to find the value of p. For example, a random sample of size 50 is taken, and you are given this table
image
You don’t know the mean, but you know that
image
You can find the value of p by using the equation x̅ = np, where n = 50. That gives the question 2 restrictions, so the degrees of freedom become (number of classes) – 2.


4. Poisson Distribution
This one is very similar to the binomial one. If the Poisson population mean λ is unknown, you estimate it with the sample mean (λ = x̅), and the number of restrictions increases by 1. Just take a look at the example.

A local council has records of the number of children and the number of households in its area. It is therefore known that the average number of children per household is 1.4. It’s suggested that the number of children per household can be modelled by a Poisson distribution with parameter 1.40. In order to test this, a random sample of 1000 households is taken, giving the following data.
image
Carry out a χ² test, at the 5% level of significance, to determine whether or not the proposed model should be accepted.

Let X be the number of children per household.
[Notice that in this case, I define X properly. You should do it when you know what X is.]
H0: X ~ Po(1.4)
H1: X is not distributed this way.
There are 6 classes and 1 restriction (ΣE = 1000).
Consider a χ²(5) distribution, perform at 5% level.
From the tables, χ²(5%)(5) = 11.070, so reject H0 if X² > 11.070.

I suppose you can work out that
image

image
Since X² > 11.070, H0 is rejected in favour of H1. The proposed model shouldn’t be accepted: there is evidence, at the 5% level, that X doesn’t follow a Po(1.4) distribution.
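Here is a rough sketch of how the expected frequencies for this Poisson model could be computed. The question gives λ = 1.4, 1000 households and 6 classes; that the classes are 0, 1, 2, 3, 4 and "5 or more" is my assumption, and `poisson_expected` is only an illustrative name.

```python
from math import exp, factorial


def poisson_expected(lam, total, max_class):
    """E for x = 0..max_class-1, plus one open class 'max_class or more'."""
    probs = [exp(-lam) * lam ** x / factorial(x) for x in range(max_class)]
    probs.append(1 - sum(probs))  # the open-ended last class takes the rest
    return [total * p for p in probs]


# Assumed classes: 0, 1, 2, 3, 4 and "5 or more" (6 classes in total).
expected = poisson_expected(1.4, 1000, 5)

for x, e in enumerate(expected):
    print(x, round(e, 2))
```

Making the last class open-ended ensures that ΣE = 1000, which is the one restriction used above.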


5. Normal Distribution
As for the normal distribution, either you know both the population mean μ and the population variance σ², or you know neither and have to estimate both from the sample. The degrees of freedom are then n – 1 or n – 3 respectively, where n is the number of classes. See the example below:

The following data gives the heights in cm of 100 male students.
image
Find the expected frequencies of a normal distribution having the same mean and variance as the data given, and test the goodness of fit, using a 5% level of significance.

To start, we need to find the values of μ and σ² first.
image
Let X be the height (cm) of a male student.
H0: X ~ N(171.54, 50.56)
H1: X is not distributed this way.

Now this one needs a lot of calculation. The expected frequency of each class can be found by using
image
where a and b are the lower and upper boundaries of each class (remember the ±0.5 adjustment to the class limits). The work for a continuous variable takes some time. Remember that the bell curve goes all the way to infinity in both directions, so the first and last classes extend to –∞ and +∞. I believe you know that your calculator can help you do tricks, right?

image
Remember to combine the small classes.
There are 5 classes and 3 restrictions (ΣE = 100, μ and σ² estimated from the sample).
Consider a χ²(2) distribution, perform at 5% level.
From the tables, χ²(5%)(2) = 5.991, so reject H0 if X² > 5.991.

image
Since X² < 5.991, H0 is not rejected. There is no significant evidence, at the 5% level, against X being normally distributed, X ~ N(171.54, 50.56).
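If your calculator cannot do the normal probabilities for you, the expected frequencies can be sketched like this. The mean 171.54 and variance 50.56 come from the example; the class boundaries are hypothetical (the original table isn't reproduced here), and I am assuming E = 100 × P(a < X < b) with open-ended first and last classes.

```python
from math import erf, inf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), using the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def normal_expected(boundaries, mu, variance, total):
    """E for each class; the first and last classes run to -inf and +inf."""
    sigma = sqrt(variance)
    edges = [-inf] + list(boundaries) + [inf]
    return [total * (normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma))
            for a, b in zip(edges, edges[1:])]


# Hypothetical class boundaries in cm, already adjusted by 0.5.
boundaries = [164.5, 169.5, 174.5, 179.5]
expected = normal_expected(boundaries, 171.54, 50.56, 100)

print([round(e, 2) for e in expected])
print(round(sum(expected), 2))  # 100.0
```

Four boundaries give 5 classes, which matches the number of classes left after combining in the example.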


Before I end this section, let me give you a summary of degrees of freedom used throughout this post:

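(n = number of classes after combining)
Uniform: 1 restriction (ΣE), degrees of freedom n – 1
Given ratio: 1 restriction (ΣE), degrees of freedom n – 1
Binomial, p known: 1 restriction (ΣE), degrees of freedom n – 1
Binomial, p estimated from x̅ = np: 2 restrictions, degrees of freedom n – 2
Poisson, λ known: 1 restriction (ΣE), degrees of freedom n – 1
Poisson, λ estimated from x̅: 2 restrictions, degrees of freedom n – 2
Normal, μ and σ² known: 1 restriction (ΣE), degrees of freedom n – 1
Normal, μ and σ² estimated from the sample: 3 restrictions, degrees of freedom n – 3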

This section is really not hard, but it requires a lot of laborious calculation. Be very careful not to make mistakes, and score!

4 comments:

  1. Hi,is the table of expected and observed frequency must be drawn? or i can just state expected frequency of A = x?

  2. It is best to draw it out, makes you see it clearer. I'm not sure whether marks will be deducted for less workings though.

  3. Hi!

    Thank you very much for this useful post! It will help me loads in the preparation of my exams.

    I have a problem which I am encountering while trying to solve a problem. I am asked to check, using 3 classes, if 15 values are drawn from a Binomial distribution with n=10. However, the data values that I am given are 15 and in 5 classes, so this confuses me! Where do I start? And also, how can I calculate the expected values if I am not given a probability for the Binomial, and just that n=10. The data I have is (2,3,2,4,6,3,5,7,1,3,2,3,1,3,3).

    Please note that I am NOT trying to ask you to solve this for me, but just trying to get some guidelines to where to start.

    Thank you again for your blog!

    Alex

    1. Hi Alex,

      Sorry I'm not sure how to solve your problem. The fact is, I'm currently reading Physics, and I've forgotten a lot about these statistics stuff... Maybe you can ask a teacher or so?
