Saturday, May 12, 2012

Content Page

Welcome to my Blog!

This blog contains mathematical resources based on the Sijil Tinggi Pelajaran Malaysia (STPM) old syllabus for the paper Further Mathematics T. These resources are very suitable for A-level Further Mathematics, or even University level mathematics. Feel free to browse through and grab whatever you need.

1. Logic & Proof
    1.1 Logic (propositions, quantifiers)
    1.2 Proof (direct, indirect, induction)

2. Complex Numbers
    2.1 Polar Form (geometrical effects, exponential form)
    2.2 de Moivre’s Theorem
    2.3 Equations (roots of unity, loci, transformation)

3. Matrices
    3.1 Row & Column Operations (properties of determinants)
    3.2 System of Linear Equations (consistency, uniqueness, Gaussian elimination,
          Cramer’s rule)
    3.3 Eigenvalues & Eigenvectors (diagonalization, Cayley-Hamilton theorem)

4. Recurrence Relations
    4.1 Recurrence Relations (problem models)
    4.2 Homogeneous Linear Recurrence Relations (2nd order, constant coefficients)
    4.3 Non-homogeneous Linear Recurrence Relations (2nd order, constant coefficients)

5. Functions
    5.1 Inverse Trigonometric Functions (graphs, identities)
    5.2 Hyperbolic Functions (graphs, identities, Osborn’s rule)
    5.3 Inverse Hyperbolic Functions (graphs, identities, logarithmic form)

6. Differentiation & Integration
    6.1 Differentiability of a Function (continuity)
    6.2 Derivatives of a Function Defined Implicitly or Parametrically (2nd derivatives)
    6.3 Derivatives & Integrals of Trigonometric & Inverse Trigonometric Functions
    6.4 Derivatives & Integrals of Hyperbolic & Inverse Hyperbolic Functions
    6.5 Reduction Formulae
    6.6 Applications of Integration (length of arc, surface area of revolution)

7. Power Series
    7.1 Taylor Polynomial (remainder theorem)
    7.2 Taylor Series (Maclaurin series, limits)

8. Differential Equations
    8.1 1st Order Linear Differential Equations (integrating factor)
    8.2 2nd Order Linear Differential Equations (complementary function, particular integral,
           general & particular solution, problem models)

9. Number Theory
    9.1 Divisibility (prime & composite numbers, unique factorisation, gcd & lcm, Euclid’s
           algorithm)
    9.2 Modular Arithmetic (linear congruences, Chinese Remainder Theorem)

10. Graph Theory
      10.1 Graphs (simple, complete, bipartite)
      10.2 Paths & Cycles (walk, trail, circuit, cycle, Eulerian, Hamiltonian)
      10.3 Matrix Representation (adjacency & incidence, problem models)

11. Transformation Geometry
      11.1 Transformation (isometries, similarity transformation, stretch & shears)
      11.2 Matrix Representation (images, scale-factor, operations)

12. Coordinate Geometry
      12.1 3D Vectors (scalar & vector product, properties)
      12.2 Straight Lines (equation, skew, parallel, intersect)
      12.3 Planes (equation, intersection, distance, angle)

13. Sampling & Estimation
      13.1 Random Samples (population, parameter, statistic)
      13.2 Sampling Distributions (sample proportion & mean, central limit theorem)
      13.3 Point Estimates (unbiased estimates, t-distribution, standard error)
      13.4 Interval Estimates (confidence intervals, large & small samples, sample size)

14. Hypothesis Testing
      14.1 Hypotheses (null & alternative hypotheses, test statistic, significance level)
      14.2 Critical Regions
      14.3 Tests of Significance (population proportion & mean, Type I & Type II errors)

15. χ² Tests
      15.1 χ² Distribution
      15.2 Tests for Goodness of Fit
      15.3 Tests for Independence (contingency table)

16. Correlation & Regression
      16.1 Scatter Diagrams
      16.2 Pearson Correlation Coefficient
      16.3 Linear Regression Lines (method of least squares, correlation & regression
               coefficient, coefficient of determination)

Tuesday, August 9, 2011

16.3 – Linear Regression Lines

Regression analysis is a statistical technique used to obtain an equation relating 2 variables. A regression line lets you estimate one of the variables when the corresponding value of the other variable is known.

In this section, we are going to learn how to draw regression lines (lines of best fit). There are actually 3 methods that I know of:

1. By eye method
You look at the bunch of dots, estimate using your eye, and start drawing the line. Not a good idea though. You probably used this method for your STPM Physics paper 3.
2. L & R method
We first find the average values of x and y, and draw a horizontal and a vertical line through this mid-point. Then we find the mid-point of the data on the left of the vertical line and the mid-point of the data on the right, and we connect these 3 mid-points to obtain a line.
3. Least squares regression line
This is probably the best method of all, and we will be learning how to do it below.
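
To make the L & R method concrete, here is a minimal Python sketch of how the 3 mid-points could be computed. The function name `left_right_midpoints` and the data are made up for illustration, and I am assuming that a point lying exactly on the vertical line counts as being on the right.

```python
from statistics import mean

def left_right_midpoints(points):
    """Return the left, overall and right mid-points used by the L & R method.

    Assumes there is at least one point on each side of the vertical line.
    """
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    # Split the data by the vertical line x = x_bar.
    left = [(x, y) for x, y in points if x < x_bar]
    right = [(x, y) for x, y in points if x >= x_bar]
    left_mid = (mean(x for x, _ in left), mean(y for _, y in left))
    right_mid = (mean(x for x, _ in right), mean(y for _, y in right))
    return left_mid, (x_bar, y_bar), right_mid

# Made-up data, for illustration only.
data = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 7)]
for point in left_right_midpoints(data):
    print(point)
```

The 3 mid-points will not always fall exactly on one straight line, so in practice you draw the line that fits them best.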


METHOD OF LEAST SQUARES

The term ‘least squares’ tells us that the sum of the squares of the distances between the points and the line is minimized. For a least squares regression line of y on x, the distance taken into account is the vertical distance. This line always passes through the mean point of the data, (x̅, y̅). Take a look at the graph below.

The red dots are the scatters, while the blue line is the least squares regression line. The line is drawn in such a way that the sum of squares of the vertical distances between the red dots and the blue line (green lines) is minimized. So to form a least squares regression line, we have 2 equations of lines, namely

y = a + bx
x = c + dy

The line y = a + bx is known as the regression line of y on x, while the line x = c + dy is the regression line of x on y. Note that they are 2 different lines: one is not simply a rearrangement of the other. The line of y on x is used when x is the independent variable and y is the dependent one. The line of x on y is used only under 2 conditions:

1. when neither variable is controlled and you want to estimate x for a given value of y.
2. when y is the independent variable, and x the dependent variable.

The line of x on y, when plotted on the usual axes, has its gradient and y-intercept as follows. Rearranging x = c + dy gives

y = −c/d + (1/d)x

so the gradient is 1/d and the y-intercept is −c/d.

Notice another thing. In this chapter, the lines are not written as y = mx + c. The gradient is b, and by the usual convention it is put after the constant a, so we write y = a + bx and not y = bx + a. The constant b is known as the regression coefficient of y on x, and d is the regression coefficient of x on y. They are both calculated using the formulas

b = sxy / sxx
d = sxy / syy

where sxx = Σ(x − x̅)², syy = Σ(y − y̅)² and sxy = Σ(x − x̅)(y − y̅). Expanding the sums, in the end you find b to be

b = (Σxy − n x̅ y̅) / (Σx² − n x̅²)

If you look closely,

bd = sxy² / (sxx syy) = r²

where r is the product-moment correlation coefficient you learned in the previous section. The term r² has a name too: it is called the coefficient of determination.

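r² tells you the proportion of the variation in y that can be explained by x. Or in other words,

r² = explained variation / total variation

Or mathematically,

r² = Σ(ŷ − y̅)² / Σ(y − y̅)²

where ŷ is the value of y predicted by the regression line of y on x.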

You don’t really need to understand what it means, but memorize it in case they ask you to define it in exams. Take note that 0 ≤ r² ≤ 1.
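
If you would like to convince yourself that the two descriptions of r² agree, here is a small Python sketch that computes it both ways: as the product of the two regression coefficients, and as explained variation over total variation. The function name and the data are made up for illustration.

```python
def r_squared_two_ways(xs, ys):
    """Return r^2 computed as b*d and as explained/total variation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx          # regression coefficient of y on x
    d = sxy / syy          # regression coefficient of x on y
    a = y_bar - b * x_bar  # intercept of the line of y on x

    # Explained variation uses the fitted values a + b*x.
    explained = sum((a + b * x - y_bar) ** 2 for x in xs)
    total = syy
    return b * d, explained / total

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 7]
print(r_squared_two_ways(xs, ys))
```

Both numbers should come out the same (up to rounding in the last decimal places), and both lie between 0 and 1.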

Let us come back to the relationship between the correlation coefficient and the regression coefficients. We can see that if
* b and d are positive, then r is positive.
* b and d are negative, then r is negative.

Finding b is not enough to plot the regression line of y on x. The equation of the line, in the end, will be

y − y̅ = b(x − x̅)

and from there, a can be found, since a = y̅ − b x̅. Note that x̅ and y̅ can be replaced by the coordinates of any other point that lies on the regression line, and you get the same line.
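
Putting the pieces together, here is a minimal Python sketch that finds both regression lines from a list of data. The function name `regression_lines` and the data are my own, made up for illustration; it simply follows the steps above (find the means, then sxx, syy and sxy, then the coefficients).

```python
def regression_lines(xs, ys):
    """Return ((a, b), (c, d)) for y = a + bx and x = c + dy."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx          # regression coefficient of y on x
    a = y_bar - b * x_bar  # from y - y_bar = b(x - x_bar)
    d = sxy / syy          # regression coefficient of x on y
    c = x_bar - d * y_bar  # from x - x_bar = d(y - y_bar)
    return (a, b), (c, d)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 7]
(a, b), (c, d) = regression_lines(xs, ys)
print(f"y on x: y = {a:.4f} + {b:.4f}x")
print(f"x on y: x = {c:.4f} + {d:.4f}y")
```

Notice that the two lines are different, but both pass through the mean point of the data.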

By the way, sometimes the relationship is not that straightforward. You might be asked to make use of coding, in the form Y = a + bX, to transform a relationship that is not linear into a linear one that can be analysed using regression lines. Common examples are
image
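
As an illustration of coding, here is a Python sketch for one typical case, a power law y = A·x^B, which I am assuming as an example (it may not be one of the cases shown above). Taking logarithms gives ln y = ln A + B ln x, which is a straight line in X = ln x and Y = ln y. The function names and the data are made up for illustration.

```python
import math

def fit_line(us, vs):
    """Least squares line v = intercept + slope*u; returns (intercept, slope)."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    suu = sum((u - u_bar) ** 2 for u in us)
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    slope = suv / suu
    return v_bar - slope * u_bar, slope

def fit_power_law(xs, ys):
    """Fit y = A * x**B by regressing ln y on ln x; returns (A, B)."""
    X = [math.log(x) for x in xs]  # coded variable X = ln x
    Y = [math.log(y) for y in ys]  # coded variable Y = ln y
    a, b = fit_line(X, Y)          # Y = a + bX, with a = ln A and b = B
    return math.exp(a), b

# Made-up data generated from y = 2x^3, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 16, 54, 128]
A, B = fit_power_law(xs, ys)
print(A, B)
```

Since the made-up data follow y = 2x³ exactly, the fit should recover A close to 2 and B close to 3.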


Most statistical questions on this chapter mainly ask you to do these few things:

1. Plot scatter diagrams, and draw a regression line on it
All you need to do is use the table of data given, plot the scatter diagram (on graph paper), and find the respective values using your calculator to get the values of a and b.

2. Make predictions and estimations
Sometimes you are asked to extrapolate the line to find a particular value of y, given x, and tell whether the result is sensible. Remember: extrapolation of a regression line is unreliable, and you should understand that such predictions carry uncertainty. In the case of a graph of age against running speed, you know that it doesn’t mean the older you are, the faster you run!

3. Calculator estimation
Within the scatter data, sometimes you are given a value of x and asked to find the value of y, using the regression line you formulated. The estimated value of y is denoted by ŷ. It is not hard: with your regression line in hand, just substitute the value of x into it, and you get ŷ. On your calculator, you can press
[number] [x̂] [=] to find x̂, and
[number] [ŷ] [=] to find ŷ.

However, do take note that you find x̂ using the equation x = c + dy, and you find ŷ using the equation y = a + bx. Remember which is the dependent and which is the independent variable; it makes a lot of difference.

4. Find the correlation / regression coefficient or the coefficient of determination
This is quite obvious. That was why we learned them in the first place.
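
The estimation in items 2 and 3 can be sketched in Python as follows. The function names, coefficients and data ranges here are hypothetical, chosen only to show the idea: each estimate comes with a flag telling you whether the given value lies outside the range of the data, in which case you are extrapolating and the estimate is unreliable.

```python
def estimate_y(a, b, x, x_min, x_max):
    """Estimate y from the line of y on x, y = a + bx.

    Returns (y_hat, is_extrapolation).
    """
    y_hat = a + b * x
    return y_hat, not (x_min <= x <= x_max)

def estimate_x(c, d, y, y_min, y_max):
    """Estimate x from the line of x on y, x = c + dy.

    Returns (x_hat, is_extrapolation).
    """
    x_hat = c + d * y
    return x_hat, not (y_min <= y <= y_max)

# Hypothetical line y = 1 + 2x fitted on data with x between 0 and 5.
print(estimate_y(1.0, 2.0, 3.0, 0, 5))   # inside the data range
print(estimate_y(1.0, 2.0, 10.0, 0, 5))  # outside the data range
```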


Before I end this chapter, let us take a look at an example, and we will learn how to use your calculator to find the regression line too.

The following table shows the marks (x) obtained in a mid-year examination and the marks (y) obtained in the year-end exam by a group of 9 students.
image
a) Plot the scatter diagram.
b) Find the equation of the estimated least squares regression line of y on x, and x on y, and plot them.
c) A 10th student obtained a mark of 70 in the mid-year exam but was absent from the year-end exam. Estimate the mark that this student would have obtained in the year-end exam.

I think you shouldn’t have any problem plotting the diagram, right? It looks something like this:
image

So now, we are to find the regression lines. Firstly, key in all your data into your calculator. Remember to clear your previous data by pressing SHIFT + CLR, press ‘1’, then ‘=’ (refer to the previous post on how to key in data in REG mode). Now press SHIFT + S-VAR. Press the right button until you see A B r. Guess what, the A and B given are the coefficients a and b of the line that you wanted. So you have immediately found the regression line of y on x,

y = 15.83 + 0.72x
Remember to show your workings though. You need to show how you calculate sxx, sxy, and syy, and . For the equation x = c + dy, there’s no shortcut, so you have to calculate yourself, which gives you

x = 22.63 + 0.66y
We shall plot them on the graph:
image
with the red line being y = a + bx, and green line being x = c + dy. Remember to label them in exams though.

As for the estimate, you can use your calculator again. From the SHIFT + S-VAR function, and typing the formula I posted above, you should get 66.38.
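
As a quick check by hand, you can also substitute x = 70 into the rounded equation y = 15.83 + 0.72x. A short Python sketch of that arithmetic:

```python
# Coefficients of the line of y on x, as rounded in this post.
a, b = 15.83, 0.72
x = 70  # mid-year mark of the 10th student

y_hat = a + b * x
print(round(y_hat, 2))
```

This gives about 66.23 rather than 66.38. The small gap is consistent with a and b having been rounded to 2 decimal places here, while the calculator presumably keeps more digits.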


Regression analysis will be very useful in the future, especially when you collect a lot of data for your company, and you want to see the relationships between variables. Master it, and it will help you.