Regression

I.  Midterm I

II.  Least Squares Regression Line

Example:  Suppose that you want to find a linear relationship between advertising and revenue.  You experiment with three different levels of advertising and come up with the following data:

Amount spent on ads (in $1,000)    0    1    3
Revenue (in $100,000)              1    5    12

If you graph the data, you will see that the points do not lie on a line.  What is the best linear fit?  We define "best" to mean that the sum of the squares of the errors is minimized.  If (x_i, y_i) lies on the line and the true revenue for x_i spent on ads is y, then the error is y - y_i.

We compute the sum of squares of the error for a line y = a + bx:

f(a,b) = (1 - (a + b(0)))^2 + (5 - (a + b(1)))^2 + (12 - (a + b(3)))^2
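To get a feel for this quantity before minimizing it, here is a minimal Python sketch that evaluates the sum of squared errors for any candidate line.  The names DATA and sum_squared_error are made up for this illustration and are not part of the notes.

```python
# Data from the advertising example:
# x = amount spent on ads (in $1,000), y = revenue (in $100,000).
DATA = [(0, 1), (1, 5), (3, 12)]

def sum_squared_error(a, b, data=DATA):
    """Return the sum of squared errors of the line y = a + b*x over the data."""
    return sum((y - (a + b * x)) ** 2 for x, y in data)

print(sum_squared_error(0, 4))  # the line y = 4x      -> 2
print(sum_squared_error(1, 4))  # the line y = 1 + 4x  -> 1
```

Trying a few values of a and b by hand shows that some lines fit better than others, which is why we look for the pair (a, b) that makes this sum as small as possible.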

Since we want to minimize the error (over all possible choices of a and b),

we set the partial derivatives of this sum with respect to a and b equal to zero:  f_a = 0 and f_b = 0

0 = f_a = -2(1 - a) - 2(5 - a - b) - 2(12 - a - 3b)

=  6a + 8b - 36

or 3a + 4b - 18 = 0

0 = f_b = -2(5 - a - b) - 6(12 - a - 3b)

= 8a + 20b - 82 

or 4a + 10b - 41 = 0

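Solving, multiply the first equation by 4 and the second by 3:

12a + 16b - 72 = 0

12a + 30b - 123 = 0

Subtracting the first equation from the second gives

14b - 51 = 0

14b = 51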

b = 51/14 ≈ 3.64

a = (18 - 4b)/3 = 8/7 ≈ 1.14
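The same two equations can be built and solved for any data set.  The sketch below is one way to do it in Python, assuming integer data so that exact fractions can be used.  The function name fit_line is invented for this example.  It uses the fact that setting the two partial derivatives to zero gives n·a + (Σx)·b = Σy and (Σx)·a + (Σx²)·b = Σxy, which for our data are 3a + 4b = 18 and 4a + 10b = 41.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Least squares line y = a + b*x, found from the two normal equations.

    Assumes integer data (so exact fractions work) and at least two
    distinct x values (otherwise the determinant below is zero).
    """
    n = len(xs)
    sx = sum(xs)                              # sum of x
    sy = sum(ys)                              # sum of y
    sxx = sum(x * x for x in xs)              # sum of x^2
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x*y
    # Normal equations:  n*a  + sx*b  = sy
    #                    sx*a + sxx*b = sxy
    # Solve the 2x2 system with Cramer's rule.
    det = n * sxx - sx * sx
    a = Fraction(sy * sxx - sx * sxy, det)
    b = Fraction(n * sxy - sx * sy, det)
    return a, b

a, b = fit_line([0, 1, 3], [1, 5, 12])
print(a, b)                                  # 8/7 51/14
print(round(float(a), 2), round(float(b), 2))  # 1.14 3.64
```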

The equation of the linear regression line is

y = 1.14 + 3.64x

so if you want to forecast the revenue when $3,000 is spent on ads (x = 3), we compute

y = 1.14 + 3.64(3) = 12.06, or about $1.2 million
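The forecast can also be done with the unrounded coefficients a = 8/7 and b = 51/14, which avoids the small rounding error in 1.14 and 3.64.  The following Python sketch does the unit conversions explicitly.  The function name forecast_revenue_dollars is hypothetical and chosen only for this example.

```python
from fractions import Fraction

# Exact coefficients of the regression line y = a + b*x.
A = Fraction(8, 7)
B = Fraction(51, 14)

def forecast_revenue_dollars(ad_spend_dollars):
    """Forecast revenue in dollars for a whole-dollar amount spent on ads."""
    x = Fraction(ad_spend_dollars, 1000)  # ads are measured in $1,000
    y = A + B * x                         # revenue in units of $100,000
    return float(y * 100_000)

print(round(forecast_revenue_dollars(3000)))  # 1207143
```

With exact coefficients y = 169/14 ≈ 12.07 rather than 12.06, but either way the forecast is about $1.2 million.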

Challenge:  Suppose that you collect data on growth over time of a pine tree and come up with the following data:

Age       0    1    2    3
Height    0    2    3    3.5

You expect that the graph is parabolic.  Find the best fitting parabola.
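Once you have worked the challenge by hand, you can check your answer with a short program.  The sketch below assumes the parabola is written y = a + bx + cx^2 and that "best" again means least squares.  Setting f_a, f_b and f_c to zero then gives three linear equations in a, b and c.  The function names solve_linear_system and fit_parabola are made up for this illustration.

```python
from fractions import Fraction

def solve_linear_system(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs] with exact fractions.
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        # (Raises StopIteration if the system has no unique solution.)
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry equals 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def fit_parabola(xs, ys):
    """Least squares parabola y = a + b*x + c*x^2; returns [a, b, c]."""
    xs = [Fraction(str(x)) for x in xs]
    ys = [Fraction(str(y)) for y in ys]
    # S[k] = sum of x^k,  T[k] = sum of x^k * y
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]
    # Row i of the normal equations: S[i]*a + S[i+1]*b + S[i+2]*c = T[i]
    matrix = [[S[i + j] for j in range(3)] for i in range(3)]
    return solve_linear_system(matrix, T)

a, b, c = fit_parabola([0, 1, 2, 3], [0, 2, 3, 3.5])
print(a, b, c)
```

The printed values are exact fractions, so they can be compared directly with a hand computation.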