Statistics 9
Correlation and Regression
Sherril M. Stone, Ph.D.
Department of Family Medicine
OSU-College of Osteopathic Medicine
Basic Terms
Correlation - statistical technique that describes the degree of relationship between 2 variables
Regression - technique that uses data to write an equation for a straight line.
The equation is then used to predict scores on one variable from scores on the other.
Draw a line of best fit (straight line)
Make prediction of 2nd score when 1st score is known
Linear regression - the name comes from the phenomenon of regression toward the mean, a consequence of the laws of probability when correlation is less than perfect (as the correlation gets lower, the predicted score gets closer to z = 0).
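Regression toward the mean can be seen in the standardized form of the regression equation, where the predicted z score on Y is r times the z score on X. The Python sketch below is illustrative only; the function name `predicted_z` and the sample values of r are assumptions, not part of the original notes.

```python
def predicted_z(r, z_x):
    """Predicted z score on Y for a given z score on X (standardized regression)."""
    return r * z_x

# A score 2 SDs above the mean on X, under weaker and weaker correlations.
for r in (1.0, 0.5, 0.1, 0.0):
    print(f"r = {r:.1f}  predicted z_Y = {predicted_z(r, 2.0):.1f}")
```

With r = 1 the prediction stays at z = 2; as r shrinks, the prediction moves toward z = 0, the mean.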
Line of Best Fit
Slope-intercept formula for a straight line is
Y = mX + b
Y and X = variables (scores) on the Y and X axes
m = slope of the line (a constant)
b = intercept (intersection) of line with Y axis
Positive slope - highest point on the line is to the right of the lowest point
Negative slope - highest point on the line is to the left of the lowest point
Zero slope - horizontal line (all points have equal Y values)
Large slope (+ or -) - steep, nearly vertical line
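The slope-intercept terms above can be sketched in a few lines of Python. The helper names `slope` and `line_y` and the sample points are hypothetical, chosen only to show a positive, a negative, and a zero slope.

```python
def slope(p1, p2):
    """Slope m of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A truly vertical line has no finite slope.
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)

def line_y(m, b, x):
    """Y = mX + b."""
    return m * x + b

print(slope((0, 1), (4, 3)))   # line rises to the right: positive slope
print(slope((0, 3), (4, 1)))   # line falls to the right: negative slope
print(slope((0, 2), (4, 2)))   # horizontal line: zero slope
print(line_y(0.5, 1, 4))       # Y on the line with m = 0.5, b = 1 at X = 4
```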
Line of Best Fit
Regression Equation Formula - finds the line of best fit
Least squares method (LSM) - Pearson's mathematics to create a straight line. LSM produces a value for the slope and the intercept. With the slope and intercept, write an equation for a straight line (the one that best fits your data).
Ŷ = a + bX
Ŷ = predicted value of Y from X
a = point where line intersects Y axis (intercept)
b = slope of the regression line
a, b = regression coefficients
X = X score (data) used to predict Y
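As a brief note on what "least squares" refers to: the method picks a and b so that the sum of squared differences between each observed Y and its predicted value Ŷ is as small as possible. In the notation above, with the sum taken over all N pairs of scores:

```latex
\min_{a,\,b} \; \sum (Y - \hat{Y})^2 \;=\; \min_{a,\,b} \; \sum \bigl(Y - (a + bX)\bigr)^2
```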
Correlation Coefficient
Correlation coefficient (r), the Pearson product-moment correlation - measures the degree and direction of the linear relationship between 2 variables
Coefficient of determination (r²) - proportion of shared variance (s²) between 2 variables
Effect size for r:
Small    .10
Medium   .30
Large    .50
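A small Python sketch of how r, r² and the effect-size labels fit together. The function name `describe_r` is hypothetical, and treating .10, .30 and .50 as lower cutoffs for each label is an assumption; the notes list the values without saying how to handle in-between cases.

```python
def describe_r(r):
    """Return (r squared, rough effect-size label) for a correlation r."""
    size = abs(r)            # direction (+ or -) does not affect effect size
    if size >= 0.50:
        label = "large"
    elif size >= 0.30:
        label = "medium"
    elif size >= 0.10:
        label = "small"
    else:
        label = "below small"
    return r ** 2, label

r_squared, label = describe_r(-0.60)
print(f"r2 = {r_squared:.2f}, effect size = {label}")
```

Here r = -.60 would mean that about 36% of the variance is shared between the 2 variables.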
Correlation Coefficient Formula r
Blanched formula

$$ r = \frac{\dfrac{\sum XY}{N} - (\bar{X})(\bar{Y})}{(SD_X)(SD_Y)} $$

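The blanched formula translates fairly directly into Python. This is a sketch under one stated assumption: the SDs are computed with N (not N - 1) in the denominator, which is what makes the formula come out to r with ΣXY/N in the numerator. The function name `blanched_r` and the sample data are hypothetical.

```python
from math import sqrt

def blanched_r(xs, ys):
    """Pearson r from means, SDs and the mean cross-product."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SDs with N in the denominator (assumption, see lead-in).
    sd_x = sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sd_y = sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy / n - mean_x * mean_y) / (sd_x * sd_y)

# Made-up data in which Y = 2X exactly, so r should be (about) 1.
print(round(blanched_r([1, 2, 3, 4], [2, 4, 6, 8]), 4))
```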
Raw score formula

$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} $$
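The raw score formula needs only N and the five column sums, so it can be sketched as follows. The function name `raw_score_r` and the sample scores are hypothetical.

```python
from math import sqrt

def raw_score_r(xs, ys):
    """Pearson r computed from N and the sums of X, Y, X^2, Y^2 and XY."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up scores: sums are 10, 10, 30, 30 and 29, so r = 16 / 20.
print(raw_score_r([1, 2, 3, 4], [1, 3, 2, 4]))
```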
Regression Coefficient Formulas
$$ b = r\,\frac{SD_Y}{SD_X} $$

- OR -

$$ b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} $$
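Both routes to b should give the same slope. The sketch below checks this on made-up scores; the function names are hypothetical, and `statistics.pstdev` (SD with N in the denominator) is used for the SDs. Because only the ratio of the two SDs enters the formula, using N - 1 for both would give the same b.

```python
from statistics import pstdev

def slope_from_r(r, xs, ys):
    """b = r * (SD of Y / SD of X)."""
    return r * pstdev(ys) / pstdev(xs)

def slope_from_sums(xs, ys):
    """b from N and the sums of X, Y, X^2 and XY."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum(xs) * sum(ys)) / (n * sum_x2 - sum(xs) ** 2)

xs, ys = [1, 2, 3, 4], [1, 3, 2, 4]   # made-up scores with r = 0.8
print(slope_from_r(0.8, xs, ys))
print(slope_from_sums(xs, ys))
```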
Steps to Calculate Regression
Calculate mean and SD
Calculate r and r2
Calculate b
Calculate a
 Predict Y
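The notes give formulas for r and b but not for a. A standard least squares result, stated here as a supplement rather than taken from the notes, is that the regression line passes through the point of means, which gives the intercept once b is known:

```latex
a = \bar{Y} - b\,\bar{X}
\qquad\text{so that}\qquad
\hat{Y} = a + bX
```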
Example 1
X      X²     Y      Y²     XY
5             3
7             4
1             1
6             3
9             5
9             5
10            7
4             3
3             2
2             2
ΣX =          ΣY =
N =
SDX =         r =
SDY =         r² =
b =
a =
Ŷ = a + bX
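One way to check a hand-worked Example 1 is to fill in the worksheet columns in Python using the raw score formulas. This is a sketch, not an answer key from the notes: the SDs use N in the denominator and the intercept uses a = (mean of Y) - b(mean of X), both of which are assumptions about conventions the notes do not spell out.

```python
from math import sqrt

# X and Y scores from the Example 1 worksheet.
X = [5, 7, 1, 6, 9, 9, 10, 4, 3, 2]
Y = [3, 4, 1, 3, 5, 5, 7, 3, 2, 2]

N = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)              # total of the X^2 column
sum_y2 = sum(y * y for y in Y)              # total of the Y^2 column
sum_xy = sum(x * y for x, y in zip(X, Y))   # total of the XY column

# SDs with N in the denominator (assumption).
sd_x = sqrt(sum_x2 / N - (sum_x / N) ** 2)
sd_y = sqrt(sum_y2 / N - (sum_y / N) ** 2)

# Raw score formulas for r and b, then the intercept.
numerator = N * sum_xy - sum_x * sum_y
r = numerator / sqrt((N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2))
b = numerator / (N * sum_x2 - sum_x ** 2)
a = sum_y / N - b * (sum_x / N)

print(f"N = {N}  sum X = {sum_x}  sum Y = {sum_y}")
print(f"sum X2 = {sum_x2}  sum Y2 = {sum_y2}  sum XY = {sum_xy}")
print(f"SDX = {sd_x:.3f}  SDY = {sd_y:.3f}")
print(f"r = {r:.3f}  r2 = {r * r:.3f}")
print(f"b = {b:.3f}  a = {a:.3f}")
```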
Example 2
X      X²     Y      Y²     XY
21            8
14            9
16            10
11            15
15            11
10            16
9             14
8             21
ΣX =          ΣY =
N =
SDX =         r =
SDY =         r² =
b =
a =
Ŷ = a + bX
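Example 2 can be checked by following the five calculation steps in order (mean and SD, r and r², b, a, prediction). The sketch below uses the blanched formula for r and b = r(SDY/SDX). As assumptions not stated in the notes, the SDs use N in the denominator, the intercept is taken as a = (mean of Y) - b(mean of X), and the X score of 12 used for the prediction is hypothetical.

```python
from math import sqrt

# X and Y scores from the Example 2 worksheet.
X = [21, 14, 16, 11, 15, 10, 9, 8]
Y = [8, 9, 10, 15, 11, 16, 14, 21]
N = len(X)

# Step 1: mean and SD (N in the denominator, an assumption).
mean_x, mean_y = sum(X) / N, sum(Y) / N
sd_x = sqrt(sum(x * x for x in X) / N - mean_x ** 2)
sd_y = sqrt(sum(y * y for y in Y) / N - mean_y ** 2)

# Step 2: r from the blanched formula, then r squared.
r = (sum(x * y for x, y in zip(X, Y)) / N - mean_x * mean_y) / (sd_x * sd_y)
r2 = r ** 2

# Step 3: slope. Step 4: intercept.
b = r * sd_y / sd_x
a = mean_y - b * mean_x

# Step 5: predict Y for a hypothetical X score of 12.
y_hat = a + b * 12

print(f"r = {r:.3f}  r2 = {r2:.3f}  b = {b:.3f}  a = {a:.3f}")
print(f"Predicted Y for X = 12: {y_hat:.2f}")
```

The negative r here means that high X scores go with low Y scores, so the fitted line slopes downward.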