Tuesday, August 2, 2011

13.2 – Sampling Distributions

When we are in the process of finding sample means or standard deviations, we might also want to know how these sample statistics are distributed. So, after the distributions we have already learnt (the binomial, Poisson and normal distributions), we are learning a new one here: the sampling distribution of the mean.


Before we start, we need to recall some expectation algebra. For a population, the expected value E(X) is the population mean μ, while the variance Var(X) is the population variance σ². Now we are going to find the expected value of the sample mean, E(X̅).

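We all know that the mean of a sample of size n can be represented by the equation

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$

where X1, X2, …, Xn are independent observations from the population, each with mean μ and variance σ². So we find that the expected value of the sample mean is

$$E(\bar{X}) = \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] = \frac{1}{n}(n\mu) = \mu$$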

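which is the same as the population mean. This means that, on average, the sample mean equals the population mean, even though any single sample mean will usually differ from μ. The variance of the sample mean, however, is different from the population variance. Using the facts that, for a constant a and independent X1, …, Xn,

$$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X), \qquad \mathrm{Var}(X_1 + X_2 + \dots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)$$

we find the variance of the sample mean to be

$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \dots + \mathrm{Var}(X_n)\right] = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$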

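So the standard deviation of the sampling distribution is

$$\frac{\sigma}{\sqrt{n}}$$

which we call the standard error of the mean. However, remember that this standard error is for samples taken with replacement (or from an infinite population). For samples taken without replacement from a finite population, the variance becomes

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$$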

where N is the size of the finite population and n is the sample size. I do not know how to derive this, and I don’t think it will appear in exams; I put it here for your reference.
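Even without a derivation, the finite-population formula can be checked numerically. The Python sketch below takes a tiny population (the values 1 to 5 and the sample size n = 2 are hypothetical, chosen only for illustration), lists every equally likely sample with and without replacement, and compares the exact variance of X̅ with σ²/n and with σ²/n × (N − n)/(N − 1). Fractions are used so the comparison is exact rather than approximate.

```python
from fractions import Fraction
from itertools import combinations, product
from statistics import pvariance

# A hypothetical finite population, chosen only for illustration.
population = [1, 2, 3, 4, 5]
N = len(population)
n = 2  # sample size

sigma2 = Fraction(pvariance(population))  # population variance (divisor N)

def variance_of_means(samples):
    """Variance of X-bar when every sample in `samples` is equally likely."""
    means = [Fraction(sum(s), n) for s in samples]
    return pvariance(means)

# With replacement: every ordered sample of size n is equally likely.
var_with = variance_of_means(product(population, repeat=n))
theory_with = sigma2 / n

# Without replacement: every subset of size n is equally likely.
var_without = variance_of_means(combinations(population, n))
theory_without = sigma2 / n * Fraction(N - n, N - 1)

print("with replacement:   ", var_with, "vs sigma^2/n =", theory_with)
print("without replacement:", var_without, "vs formula =", theory_without)
```

Both pairs should agree exactly for this small example. This is only a check on one population, not a proof of the formula.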

So whenever we have a normal distribution X ~ N(μ, σ²), the sampling distribution of the mean is

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Consider the distribution X ~ N(100, 64)

and consider the following:
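To see how n changes the sampling distribution for X ~ N(100, 64), here is a small Python sketch that applies X̅ ~ N(μ, σ²/n) at a few sample sizes. The values of n (1, 4, 16, 64) are chosen here for illustration. The helper name `sampling_distribution` is also hypothetical.

```python
from math import sqrt

def sampling_distribution(mu, sigma2, n):
    """Return (mean, variance) of X-bar for X ~ N(mu, sigma2) and sample size n."""
    return mu, sigma2 / n

mu, sigma2 = 100, 64  # X ~ N(100, 64), from the text

# These sample sizes are chosen here for illustration only.
for n in (1, 4, 16, 64):
    m, v = sampling_distribution(mu, sigma2, n)
    print(f"n = {n:2d}: X-bar ~ N({m}, {v:g}), standard error = {sqrt(v):g}")
```

The mean stays at 100, while the variance shrinks from 64 to 16, 4 and 1, so the standard error halves each time n is multiplied by 4.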

Notice that the sample size affects the sampling distribution: the larger n is, the smaller the variance σ²/n, so the sample means cluster more tightly around μ. So when answering questions, unlike in Maths T, you have to be very careful about whether the question refers to a population or a sample. Let me give you an example:

The volume of wine in bottles is normally distributed with a mean of 758ml and a standard deviation of 12ml. A random sample of 10 bottles is taken and the mean volume found. Calculate the probability that the sample mean is less than 750ml.

Let X be the volume of wine in bottles.
X ~ N(758, 12²)
Since X is normally distributed, the sampling distribution of the mean with n = 10 is
X̅ ~ N(758, 12² / 10)
X̅ ~ N(758, 14.4)

P(X̅ < 750) = P[Z < (750 – 758) / √14.4]
                   = P(Z < –2.108)
                   = 0.0175

I assume that you have fully studied the chapters Discrete Probability Distributions & Continuous Probability Distributions in Maths T. Now that you know the difference between samples and populations, be careful: the final answer will be different if you use the wrong distribution.
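To make the point concrete, here is a short Python sketch using the standard library's `statistics.NormalDist` (which takes the mean and the standard deviation, not the variance). It computes the wine answer with the correct distribution X̅ ~ N(758, 12²/10), and then with the wrong one, X ~ N(758, 12²), which answers a different question: the probability that a single bottle holds less than 750ml.

```python
from statistics import NormalDist

mu, sigma, n = 758, 12, 10

# Correct: the question is about the mean of a sample of 10 bottles.
xbar = NormalDist(mu, sigma / n ** 0.5)   # X-bar ~ N(758, 12^2 / 10)
p_sample_mean = xbar.cdf(750)

# Wrong distribution: this is P(a single bottle < 750ml) instead.
x = NormalDist(mu, sigma)                 # X ~ N(758, 12^2)
p_single_bottle = x.cdf(750)

print(round(p_sample_mean, 4), round(p_single_bottle, 4))
```

The first value is about 0.0175, matching the worked answer, while the second is about 0.25. Using the population distribution by mistake changes the answer by more than a factor of ten.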

We have been assuming that the sample was taken from a population that follows the normal distribution. So what if it doesn’t? For example, what if the sample was taken from a binomial, Poisson or even a uniform distribution?

Let’s do a little experiment. Suppose you have an unfair coin that has a 25% chance of landing heads every time you toss it. If you toss it 10 times, the number of heads X follows a binomial distribution, X ~ B(10, 0.25). We plot the probability graph below. The red bars are the binomial probabilities, while the blue line is the normal approximation.
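The numbers behind such a graph can be computed directly. This Python sketch lists the exact B(10, 0.25) probabilities (the bars) next to the density of a normal curve with the same mean np = 2.5 and variance npq = 1.875 (the curve). Matching the mean and variance like this is the usual normal approximation. It is an assumption that the original graph was drawn the same way.

```python
from math import comb
from statistics import NormalDist

n, p = 10, 0.25
q = 1 - p

# Exact binomial probabilities P(X = x), x = 0..10 (the bars).
pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

# Normal curve with the same mean np and variance npq (the line).
approx = NormalDist(n * p, (n * p * q) ** 0.5)

for x, prob in pmf.items():
    print(f"x = {x:2d}  binomial = {prob:.4f}  normal density = {approx.pdf(x):.4f}")
```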

Now we find the sampling distribution of X̅. That means we take a sample of n observations of X, compute its mean, repeat this many times to get many different means, and tabulate them as a distribution. With a sample size of n = 30, we get a graph like the one below:

Then, with a sample size of n = 50, we get

It gets closer to a normal distribution, doesn’t it?

Now we try a Poisson distribution. Suppose the average number of monkeys seen along the road every day is 4, so X ~ Po(4). The probability of seeing x monkeys in a day can be tabulated as follows:
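The Po(4) probabilities can be computed from P(X = x) = e^(−λ) λ^x / x!. Here is a minimal Python sketch of that. Stopping the table at x = 12 is an arbitrary cut-off chosen here, since the Poisson distribution has no upper limit.

```python
from math import exp, factorial

lam = 4  # average number of monkeys seen per day (from the text)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

# x = 0..12 is an arbitrary cut-off; the probabilities beyond it are tiny.
for x in range(13):
    print(f"P(X = {x:2d}) = {poisson_pmf(x, lam):.4f}")
```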

Again, we carry out a serious investigation of how many monkeys appear each day: we take the mean of a sample of n = 30 days, repeat this many times, and find the sampling distribution of X̅ to be as follows:

Once again it is close to the blue normal curve. Remember that the y-axis stands for probability. So this sampling distribution simply tells us “the probability distribution of the mean number of monkeys seen on the road daily, for a sample of n days”.

Now we try a uniform distribution. X ~ R(a, b) means that X is uniformly distributed over the range a ≤ x ≤ b. It has the following expectation and variance:

$$E(X) = \frac{a + b}{2}, \qquad \mathrm{Var}(X) = \frac{(b - a)^2}{12}$$

Assume X ~ R(0, 27), representing a lucky draw in which every number between 0 and 27 is equally likely. We can plot its distribution as:

Then again, we find the sampling distribution of X̅. With a sample size of n = 30, we find that it actually looks like a normal distribution!
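If you don’t have the applet, the same three experiments can be imitated with a simulation. This Python sketch draws 2000 samples of size n = 30 from B(10, 0.25), Po(4) and R(0, 27). For each distribution it compares the mean and variance of the resulting sample means with μ and σ²/n. The seed, the 2000 repetitions and the hand-written Poisson generator (Knuth’s multiplication method, used here so that only the standard library is needed) are choices made for this sketch. They are not necessarily how the original graphs were produced.

```python
import random
from math import exp
from statistics import mean, pvariance

random.seed(1)      # fixed seed so the run is reproducible
N_SAMPLE = 30       # sample size n
REPEATS = 2000      # number of samples drawn (an arbitrary choice)

def binomial_draw(trials=10, p=0.25):
    # Number of heads in 10 tosses of the unfair coin, X ~ B(10, 0.25).
    return sum(random.random() < p for _ in range(trials))

def poisson_draw(lam=4):
    # Knuth's multiplication method for X ~ Po(lam); fine for small lam.
    limit, k, prod = exp(-lam), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

def uniform_draw(a=0, b=27):
    # X ~ R(0, 27).
    return random.uniform(a, b)

def sample_means(draw):
    """Draw REPEATS samples of size N_SAMPLE and return the mean of each."""
    return [sum(draw() for _ in range(N_SAMPLE)) / N_SAMPLE for _ in range(REPEATS)]

# Population mean and variance of each distribution, from the formulas in the text.
cases = {
    "B(10, 0.25)": (binomial_draw, 10 * 0.25, 10 * 0.25 * 0.75),
    "Po(4)": (poisson_draw, 4, 4),
    "R(0, 27)": (uniform_draw, 27 / 2, 27**2 / 12),
}

results = {}
for name, (draw, mu, sigma2) in cases.items():
    means = sample_means(draw)
    results[name] = (mean(means), pvariance(means), mu, sigma2 / N_SAMPLE)
    print(f"{name}: mean of X-bar = {results[name][0]:.3f} (theory {mu}), "
          f"variance of X-bar = {results[name][1]:.4f} (theory {sigma2 / N_SAMPLE:.4f})")
```

Plotting a histogram of `means` for each case (for example with a plotting library of your choice) should give the bell shapes described above.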

All these graphs were made with this applet. So after all these experiments, we find that even when the population is not normally distributed, the sampling distribution of X̅ takes on a normal shape as the sample size increases. In other words, for a large sample size n, it is approximately normal. And here, we introduce the central limit theorem:

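When samples are taken from a non-normal population with mean μ and known variance σ², then for large values of n, the sampling distribution of X̅ is approximately normal such that

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately}$$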

By convention, we regard a sample as large when n ≥ 30. You will be using this convention for the rest of the chapters. Let me show you an example of the use of the central limit theorem:

The average number of telephone calls made in an evening to a counselling service is 4.5. If 30 random observations are taken, find the probability that the sample mean exceeds 5.

X ~ Po(4.5), so μ = σ² = 4.5
Since n ≥ 30, by the central limit theorem, X̅ is approximately normal, so
X̅ ~ N(4.5, 4.5 / 30)
X̅ ~ N(4.5, 0.15)
P(X̅ > 5) = P[Z > (5 – 4.5) / √0.15] = P(Z > 1.291) = 0.098
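The same calculation in Python, as a check on the table lookup. It uses `statistics.NormalDist` with mean 4.5 and standard deviation √(4.5/30), which is the central limit theorem approximation from the working above.

```python
from statistics import NormalDist

lam, n = 4.5, 30                         # X ~ Po(4.5), sample of 30 evenings
xbar = NormalDist(lam, (lam / n) ** 0.5) # CLT: X-bar ~ N(4.5, 4.5/30) approximately
p = 1 - xbar.cdf(5)                      # P(X-bar > 5)
print(round(p, 3))
```

This gives about 0.098, in agreement with the answer above.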


Suppose a random sample of n observations is taken from a population in which the proportion of successes is p and the proportion of failures is q = 1 – p.

If X is the number of successes in the sample, then X follows a binomial distribution, X ~ B(n, p). You should recall that E(X) = np and Var(X) = npq. Using the same method we used to find the expectation of the sample mean X̅, we now find the expectation of the sample proportion Ps.

We know that

$$P_s = \frac{X}{n}$$

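So, finding E(Ps) and Var(Ps), we get

$$E(P_s) = \frac{E(X)}{n} = \frac{np}{n} = p, \qquad \mathrm{Var}(P_s) = \frac{\mathrm{Var}(X)}{n^2} = \frac{npq}{n^2} = \frac{pq}{n}$$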

This in turn gives us the distribution of the sample proportion, which for large n is approximately

$$P_s \sim N\left(p, \frac{pq}{n}\right)$$

and we define the term

$$\sqrt{\frac{pq}{n}}$$

as the standard error of proportion.
When using a distribution of sample proportions, we need to take continuity corrections into account (try recalling what you learned in Maths T). For this case, the continuity correction is

$$\frac{1}{2n}$$

I’ll show you an example:

It is known that 3% of frozen pies delivered to a canteen are broken. What is the probability that, on a morning when 500 pies are delivered, 5% or more are broken?

Let p be the probability that a pie is broken, p = 0.03.
Let Ps be the proportion of pies in the sample that are broken.
q = 0.97, n = 500, so pq/n = 0.0000582 and we have
Ps ~ N(0.03, 0.0000582)
P(Ps ≥ 0.05) = P(Ps > 0.05 – 0.001)
[continuity correction, 1/(2 × 500) = 0.001]
= P(Ps > 0.049) = P[Z > (0.049 – 0.03) / √0.0000582] = P(Z > 2.491) = 0.0064
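Here is the sample-proportion method as a short Python sketch using `statistics.NormalDist`. The standard error is √(pq/n) and the continuity correction is 1/(2n), exactly as in the working above.

```python
from statistics import NormalDist

p, n = 0.03, 500
q = 1 - p
se = (p * q / n) ** 0.5          # standard error of proportion, sqrt(pq/n)
cc = 1 / (2 * n)                 # continuity correction, 1/(2n) = 0.001
ps = NormalDist(p, se)           # Ps ~ N(0.03, 0.0000582) approximately
prob = 1 - ps.cdf(0.05 - cc)     # P(Ps >= 0.05) approximated by P(Ps > 0.049)
print(round(prob, 4))
```

It prints about 0.0064, the same as the hand calculation.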

As you may have noticed, there is another way of solving this problem, using the binomial distribution alone.

Let X be the number of broken pies in the sample.
X ~ B(500, 0.03)
Since n ≥ 30 and np, nq > 5, X is approximately normal.
X ~ N(np, npq)
X ~ N(15, 14.55)
500 × 5% = 25
P(X ≥ 25) = P(X > 24.5) = P[Z > (24.5 – 15) / √14.55] = P(Z > 2.491) = 0.0064
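Because B(500, 0.03) is a discrete distribution, we can also add up the exact binomial tail and compare it with the normal approximation. The Python sketch below does both. The exact sum is not part of the method taught here; it is included only so you can see how good the approximation is.

```python
from math import comb
from statistics import NormalDist

n, p = 500, 0.03
q = 1 - p

# Normal approximation: X ~ N(np, npq) with continuity correction, P(X > 24.5).
approx = 1 - NormalDist(n * p, (n * p * q) ** 0.5).cdf(24.5)

# Exact binomial tail P(X >= 25), for comparison.
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(25, n + 1))

print(f"normal approximation: {approx:.4f}, exact binomial: {exact:.4f}")
```

The approximation reproduces the 0.0064 above. The exact value shows how much accuracy the normal shortcut gives up in the far tail.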

If I were you, I would choose the second method. However, in exam questions, if you are asked to find the proportion, you had better use the first method to avoid losing marks. Note that in either case, this normal approximation can only be used for a large sample size n.

You need to master this section well, because it forms the basis of what you will be doing in the next few chapters. Understanding the sampling distribution without confusing yourself may take some time.
