Scatter Diagrams and Regression Lines

Scatter Diagrams

If data is given in pairs then the scatter diagram of the data is just the points plotted on the xy-plane.  The scatter plot is used to visually identify relationships between the first and the second entries of paired data.

Example

The scatter plot above represents the age vs. size of a plant.  It is clear from the scatter plot that as the plant ages, its size tends to increase.  If it seems to be the case that the points follow a linear pattern well, then we say that there is a high linear correlation, while if it seems that the data do not follow a linear pattern, we say that there is no linear correlation.  If the data somewhat follow a linear path, then we say that there is a moderate linear correlation.

 

A bivariate sample consists of pairs of data (x,y).  If we plot these pairs on the xy-plane then we have a scatter diagram.  

 


 

The Linear Regression Line  

Given a scatter plot, we can draw the line that best fits the data.

Recall that to find the equation of a line, we need the slope and the y-intercept.  We will write the equation of the line as


y = a + bx

 

where a is the y-intercept and b is the slope.  Here x is the independent or predictor variable and y is the dependent or response variable.  To find a and b we follow these steps:

 

  1. List

     

    1. The sum of the x-values:  Σx

    2. The sum of the y-values:  Σy

    3. The sum of the squares of the x-values:  Σx²

    4. The sum of the products of x and y:  Σxy

     

  2. Calculate: 

        SSxy  =  Σxy - (Σx)(Σy)/n

        SSx  =  Σx² - (Σx)²/n

        b  =  SSxy / SSx

     where n is the number of data pairs.


  3. Calculate

    a = ȳ - b x̄

    where x̄ and ȳ are the means of the x-values and the y-values.
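
The steps above can be sketched in Python.  This is a minimal sketch, assuming the standard least-squares formulas b = SSxy/SSx with SSxy = Σxy - (Σx)(Σy)/n and SSx = Σx² - (Σx)²/n, and a = ȳ - b x̄.  The function name fit_line and the sample data are hypothetical and do not come from the text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Step 1: the four sums
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Step 2: the slope (assumed standard formula)
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_x = sum_x2 - sum_x ** 2 / n
    if ss_x == 0:
        raise ValueError("all x-values are equal, so the slope is undefined")
    b = ss_xy / ss_x

    # Step 3: the intercept, a = mean(y) - b * mean(x)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical data: plant age (x) and plant size (y)
ages = [1, 2, 3, 4, 5]
sizes = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(ages, sizes)
print(f"y = {a:.2f} + {b:.2f}x")
```

For data that lie exactly on a line, such as (0,1), (1,3), (2,5), the function recovers that line's intercept and slope.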

     


Interpretations

We can interpret a as the predicted value of y when x is zero, and we can interpret b as the amount that y changes, on average, when x increases by one.  The change is an increase if b is positive and a decrease if b is negative.

 

Example

Suppose that a study was done to determine the weight loss after taking various amounts of a diet pill in combination with exercise.  If the regression line was

        y = 3 + 2x

where x denotes the grams of the pill taken per day and y represents the weight loss in pounds, then we can say that with only the exercise and no pill the average weight loss is 3 pounds.  We can also say that if a person takes an additional gram of the pill, then on average the person should expect to lose an additional 2 pounds.  If a person takes 5 grams, then that person can expect to lose an average of 3 + 2(5) = 13 pounds.  

 


Example 

Data was collected to compare the length of time x (in months) couples have been in a relationship to the amount of money y that is spent when they go out.  The equation of the regression line was found to be 

        y  =  70 - 5x

The y-intercept tells us that at the beginning of the relationship, the average date costs $70.  The slope tells us that for each additional month the relationship lasts, the average date costs $5 less.  We can use the regression line to predict the amount of money that a date costs when the relationship has lasted, for example, six months.  We have

        y(6)  =  70 - 5(6)  =  40
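
Once a and b are known, a prediction is just an evaluation of the line.  The short Python sketch below repeats the calculation for the two examples above.  The function name predict is hypothetical.

```python
def predict(a, b, x):
    """Evaluate the regression line y = a + b*x at the given x."""
    return a + b * x


# Date-cost example: y = 70 - 5x at x = 6 months
date_cost = predict(70, -5, 6)

# Diet-pill example: y = 3 + 2x at x = 5 grams
weight_loss = predict(3, 2, 5)

print(date_cost, weight_loss)
```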

 


Estimating the Mean Value of y for a Particular Value of x

Suppose that you own a pizza restaurant and are interested in sending out menus to local residents.  You research what your 8 competitors have done to find the relationship between the number of mailings x and the number of pizzas y bought per week.  You find that the equation of the regression line is 

        y = 100 + 0.2x

You calculate Se to be 4, the mean number of mailings to be 990, and SSx  =  73.

Next week you plan an advertising blitz of 1000 mailings.  How many pizzas do you expect to sell, and what is a 95% confidence interval for this estimate?

Solution

We will use the main theorem that states that an unbiased estimate for the value of y given a fixed value of x is 

        a + bx

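 The standard deviation is 

        Se · sqrt(1 + 1/n + (x - x̄)² / SSx)

where n is the number of data pairs and x̄ is the mean of the x-values.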

Hence we predict that we will sell about 

        100 + .2(1000) = 300 pizzas.  

We find the standard deviation

        4 · sqrt(1 + 1/8 + (1000 - 990)² / 73)  ≈  6.32

From the table, with n - 2 = 6 degrees of freedom, we have

        tc  =  2.447

so that a 95% confidence interval is 

        300 ± 2.447(6.32)

or approximately

        [285, 315]

Hence we expect between 285 and 315 pizzas to be sold.
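
The pizza calculation can be checked with a short Python sketch.  It assumes the standard deviation formula Se · sqrt(1 + 1/n + (x - x̄)² / SSx), which gives the value of about 6.3 used above.  It also assumes a critical value of 2.447, the two-sided 95% table value for n - 2 = 6 degrees of freedom.  The function name interval_for_y is hypothetical.

```python
import math


def interval_for_y(a, b, x, s_e, n, x_mean, ss_x, t_crit):
    """Return (estimate, standard deviation, lower, upper) for y at x."""
    y_hat = a + b * x
    # Assumed formula: Se * sqrt(1 + 1/n + (x - mean of x)^2 / SSx)
    s = s_e * math.sqrt(1 + 1 / n + (x - x_mean) ** 2 / ss_x)
    margin = t_crit * s
    return y_hat, s, y_hat - margin, y_hat + margin


# Assumption: t table value for 95% confidence with n - 2 = 6 degrees of freedom
T_CRIT = 2.447

y_hat, s, low, high = interval_for_y(
    a=100, b=0.2, x=1000, s_e=4, n=8, x_mean=990, ss_x=73, t_crit=T_CRIT
)
print(f"estimate {y_hat:.0f}, sd {s:.2f}, interval [{low:.1f}, {high:.1f}]")
```

The critical value is passed in as a parameter so that a different confidence level or sample size only requires a different table lookup.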

 

