Monday, August 8, 2011

16.2 – Pearson Correlation Coefficient

Before we start, let us revise standard deviation a little. We all know that the standard deviation s is given by the formula
image

In this chapter, we will be dealing with 2 variables, and thus we need to specify whether the standard deviation is for the values of x or y. To distinguish the two, we add a subscript x or y to indicate which variable it refers to. So over here, we have

image  image
the standard deviations of x and y respectively. We denote the variances of variables x and y as

image   image
Note that sxx and sx² mean the same thing; some books simply use a different notation. With this in mind, we shall now introduce the covariance, which is defined by the formula

image
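
For reference, here is a sketch of how these definitions are usually written out. It assumes the divisor-n convention (the one matching the calculator's xσn key used later in this post); the original formula images may have been laid out differently.

```latex
s_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
s_y = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}

s_{xx} = s_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2, \qquad
s_{yy} = s_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2

s_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})
       = \frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y}
```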


PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT

The correlation coefficient is a statistic that tells us how strong the relationship between 2 variables is. Pearson’s product-moment correlation coefficient, also known as the Pearson correlation coefficient or the product-moment correlation coefficient, is a numerical value between –1 and 1 inclusive, which measures how closely the scattered points follow a straight line. It is given by the formula

image
where –1 ≤ r ≤ 1.

When r → 1, it indicates strong positive correlation, which means the regression line has a positive gradient, or y increases as x increases. Similarly, as r → –1, it indicates strong negative correlation. If r = 1 or r = –1, the points lie exactly on a straight line, and we say that they have perfect positive or perfect negative correlation respectively.
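
To see the two extreme cases in action, here is a minimal Python sketch. It assumes r is computed as the covariance divided by the product of the two standard deviations, all with divisor n; the function name pearson_r and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as s_xy / (s_x * s_y), using divisor-n formulas throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)

xs = [1, 2, 3, 4, 5]                               # hypothetical x-values
r_up = pearson_r(xs, [2 * x + 1 for x in xs])      # points exactly on y = 2x + 1
r_down = pearson_r(xs, [10 - 3 * x for x in xs])   # points exactly on y = 10 - 3x
print(r_up, r_down)  # 1 and -1, up to floating-point rounding
```

Note that the steepness of the line does not matter: any set of points lying exactly on a line with non-zero gradient gives r = 1 or r = –1.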

However, when r = 0, it does not necessarily mean that there is no relationship between the variables. It might indicate that x and y are independent of each other, but it might also indicate that x and y have a non-linear relationship. Take a look at the diagram below:

Scatter example

Sorry but the dots are ugly. This diagram represents a quadratic function. The variables do have a quadratic relationship, yet the correlation coefficient is r = 0. This is just an example of how r = 0 fails to tell you whether a relationship exists. On the other hand, having r close to 1 only suggests that the data is positively linearly correlated. Take a look at the diagram below.
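
The quadratic case is easy to check numerically. The sketch below uses made-up points on y = x², placed symmetrically about x = 0 (that symmetry is what makes r vanish), and the same divisor-n definition of r assumed throughout; pearson_r is just an illustrative helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as s_xy / (s_x * s_y), using divisor-n formulas throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)

xs = [-3, -2, -1, 0, 1, 2, 3]   # hypothetical x-values, symmetric about 0
ys = [x ** 2 for x in xs]       # y is completely determined by x
r = pearson_r(xs, ys)
print(r)  # zero: the positive and negative products cancel in the covariance
```

Here y depends entirely on x, yet r is zero, because r only measures the linear part of a relationship.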

Scatter example2

This diagram has a fairly high r, about 0.7 to 0.8. However, that doesn’t necessarily mean the data has a strong positive linear correlation. It might be that there isn’t a real relationship after all.

r is independent of the units used for the variables, and is very useful in determining the correlation between 2 variables. Evaluating r can be tedious if you work directly from the definitions of sx and sy. So here is the best way to calculate r:
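
The claim about units can be checked with a quick sketch. The heights and weights below are invented for illustration, pearson_r is an illustrative helper using the divisor-n definitions, and the conversion factor 2.20462 lb per kg is the usual approximate value.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as s_xy / (s_x * s_y), using divisor-n formulas throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)

heights_cm = [150, 160, 170, 180, 190]   # hypothetical data
weights_kg = [52, 58, 69, 74, 88]        # hypothetical data

heights_m = [h / 100 for h in heights_cm]        # same heights in metres
weights_lb = [w * 2.20462 for w in weights_kg]   # same weights in pounds

r_metric = pearson_r(heights_cm, weights_kg)
r_other = pearson_r(heights_m, weights_lb)
print(r_metric, r_other)  # the two values agree up to rounding
```

Changing units multiplies the covariance and the standard deviations by the same positive factors, so they cancel in the ratio.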

image

Some other common formulas to find r are:

image

Besides, there is also this Big S format, whereby
image

and using this convention, the formula for r is
image 
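
As a rough illustration of why the two formats agree, here is a sketch assuming the usual textbook meaning of the Big S quantities: Sxx = Σx² – (Σx)²/n, Syy = Σy² – (Σy)²/n and Sxy = Σxy – (Σx)(Σy)/n, with r = Sxy / √(Sxx·Syy). The function name big_s and the data are made up.

```python
import math

def big_s(xs, ys):
    """Return (S_xx, S_yy, S_xy): sums of squares and products about the means."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    S_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    S_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    S_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return S_xx, S_yy, S_xy

xs = [1, 2, 3, 4, 5]               # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical data
n = len(xs)

S_xx, S_yy, S_xy = big_s(xs, ys)
r_big = S_xy / math.sqrt(S_xx * S_yy)

# Small s format: each big S divided by n gives the matching small s quantity.
r_small = (S_xy / n) / math.sqrt((S_xx / n) * (S_yy / n))
print(r_big, r_small)  # the factors of n cancel, so both give the same r
```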

I would suggest that you keep to the ‘small s format’. In order to teach you how to find r efficiently using the calculator, consider the example below.

Calculate the value of the product-moment correlation coefficient for the data in the following table. Comment on your answer.
image

Let’s make use of the calculator’s functions. Using your CASIO fx-570MS, press the mode button and select REG mode. There will be many kinds of REG mode, so press ‘1’ for Lin mode (which means ‘linear’).

Now, to input the data, you press [x-value] [, button] [y-value] [DT button]. So you should type in 5, 4.3 and then press the DT button for the first pair of readings. Now the screen should display

[n=                     ]
[                       1]

Continue entering the rest of the data, and press the AC button when you are done. Now press SHIFT + S-SUM. You will be able to get lots of values from here: Σx², Σx, n, Σy², Σy and Σxy. These are the values you need for r (you need them to show your working). But there’s a better one: press SHIFT + S-VAR. You get the values of x̅, xσn (sx), y̅, yσn (sy), and in fact, r itself! The only thing you can’t get is sxy (what a pity). So using your calculator, you find that the answer is
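
Although the calculator doesn’t display sxy, it can be worked out from the S-SUM values. Here is a minimal sketch, assuming the divisor-n identities sxy = Σxy/n – x̅·y̅ and sx² = Σx²/n – x̅². The function name from_sums is made up, and the sums are for a small invented data set, not the table in this example.

```python
import math

def from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (s_xy, r) from the six sums, using divisor-n formulas."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    s_xy = sum_xy / n - mean_x * mean_y
    s_x = math.sqrt(sum_x2 / n - mean_x ** 2)
    s_y = math.sqrt(sum_y2 / n - mean_y ** 2)
    return s_xy, s_xy / (s_x * s_y)

# Hypothetical sums for the points (1,2), (2,4), (3,5), (4,4), (5,5):
# n = 5, Σx = 15, Σy = 20, Σx² = 55, Σy² = 86, Σxy = 66
s_xy, r = from_sums(5, 15, 20, 55, 86, 66)
print(s_xy, r)  # about 1.2 and 0.77
```

This is also a handy way to show your working: write down the six sums, then substitute them into the formula.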

r = 0.93, which indicates a strong positive correlation.


That’s all for this section. With enough knowledge, we will go into the next and very last section, which will be on Regression Lines.
