Suppose that the average lifespan (y, in years) for people who smoke at level x is:
x          1    2    3    4
Lifespan   72   70   69   68
We can calculate the least squares regression line:
y = 73 - 1.3x
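As a check on the line above, here is a minimal sketch of the least squares calculation. It assumes the four observations are x = 1, 2, 3, 4 with lifespans 72, 70, 69, 68, the values that reproduce this regression line; the function name `least_squares_line` is only illustrative.

```python
# Assumed data: x = 1..4 with lifespans 72, 70, 69, 68.
xs = [1, 2, 3, 4]
ys = [72, 70, 69, 68]


def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the squared error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    # Intercept: the line passes through the point of means (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares_line(xs, ys)
print(f"y = {a:.1f} + ({b:.1f})x")  # prints: y = 73.0 + (-1.3)x
```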
We define the first residual to be the difference between the first lifespan and the first estimated lifespan:
72 - (73 - 1.3(1)) = 0.3
the second residual as:
70 - (73 - 1.3(2)) = -0.4
the third as:
69 - (73 - 1.3(3)) = -0.1
and the fourth as:
68 - (73 - 1.3(4)) = 0.2
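In general, the residual for the i-th data point is the observed value minus the value predicted by the regression line:
residual_i = y_i - (a + b x_i)
where y = a + bx is the least squares regression line (here a = 73 and b = -1.3).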
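The residual calculation can be sketched in a few lines. This assumes the four observations are x = 1, 2, 3, 4 with lifespans 72, 70, 69, 68, and takes the intercept 73 and slope -1.3 from the regression line y = 73 - 1.3x; the helper name `residuals` is hypothetical.

```python
# Assumed data: x = 1..4 with lifespans 72, 70, 69, 68.
xs = [1, 2, 3, 4]
ys = [72, 70, 69, 68]
a, b = 73.0, -1.3  # intercept and slope of y = 73 - 1.3x


def residuals(xs, ys, a, b):
    """Observed y minus predicted y for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


res = residuals(xs, ys, a, b)
print([round(r, 1) for r in res])  # prints: [0.3, -0.4, -0.1, 0.2]
# For a least squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding).
print(abs(sum(res)) < 1e-9)  # prints: True
```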
Coefficient of Determination: r^2
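We define the coefficient of determination, r^2, as a measure of how close the data are to a straight line. r^2 has the following properties:
Properties of the Coefficient of Determination
1. 0 <= r^2 <= 1.
2. The closer r^2 is to 1, the closer the data points lie to the regression line.
3. r^2 = 1 exactly when every data point lies on the regression line.
To compute r^2, do the following:
1. Compute the total variation SST, the sum of the squared differences between each y value and the mean of the y values.
2. Compute the unexplained variation SSE, the sum of the squared residuals.
3. Compute r^2 = (SST - SSE) / SST.
If we multiply r^2 by 100%, we obtain the percent of the observed variation in y that is attributable to the linear relationship.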
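A short sketch of this calculation for the smoking example follows. It assumes the data are x = 1, 2, 3, 4 with lifespans 72, 70, 69, 68, and it uses one standard way of computing r^2: the total variation (called `sst` here) minus the sum of squared residuals (`sse`), divided by the total variation. The names are illustrative, not taken from the original notes.

```python
# Assumed data: x = 1..4 with lifespans 72, 70, 69, 68.
xs = [1, 2, 3, 4]
ys = [72, 70, 69, 68]
a, b = 73.0, -1.3  # intercept and slope of y = 73 - 1.3x


def coefficient_of_determination(xs, ys, a, b):
    """Fraction of the variation in y explained by the line y = a + b*x."""
    y_bar = sum(ys) / len(ys)
    # Total variation of y about its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    # Variation left unexplained by the line (sum of squared residuals).
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return (sst - sse) / sst


r_squared = coefficient_of_determination(xs, ys, a, b)
print(f"r^2 = {r_squared:.4f} ({100 * r_squared:.1f}% of variation)")
# prints: r^2 = 0.9657 (96.6% of variation)
```

Under these assumptions, about 96.6% of the observed variation in lifespan is attributable to the linear relationship.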
If we want to know not only whether x and y are linearly related, but also whether the relationship is positive or negative (b > 0 or b < 0), and we want a unitless measure, we compute Pearson's correlation coefficient r:
r = sum (x_i - xbar)(y_i - ybar) / sqrt( sum (x_i - xbar)^2 * sum (y_i - ybar)^2 )
where xbar and ybar are the means of the x and y values. The sign of r matches the sign of the slope b, and
(r)^2 = r^2
that is, the square of the correlation coefficient is equal to the coefficient of determination.
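To illustrate, here is a minimal sketch that computes Pearson's r from its usual definition (the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations) and squares it. The data, x = 1, 2, 3, 4 with lifespans 72, 70, 69, 68, are an assumption about the smoking example, and `pearson_r` is an illustrative name.

```python
import math

# Assumed data: x = 1..4 with lifespans 72, 70, 69, 68.
xs = [1, 2, 3, 4]
ys = [72, 70, 69, 68]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient: unitless, between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r squared = {r * r:.4f}")
# prints: r = -0.9827, r squared = 0.9657
```

The negative sign of r agrees with the negative slope of the line y = 73 - 1.3x.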
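We say that the correlation is positive if r > 0 and negative if r < 0, and that it is strong if |r| is close to 1 and weak if |r| is close to 0.
Correlation does not imply causation. For example, there may be a strong correlation between grayness in hair and wrinkles, but having gray hair does not cause one to have wrinkles.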