Scatter Diagrams and Regression Lines
If data is given in pairs then the scatter diagram of the data is just the points plotted on the xy-plane. The scatter plot is used to visually identify relationships between the first and the second entries of paired data.
The scatter plot above represents the age vs. size of a plant. It is clear from the scatter plot that as the plant ages, its size tends to increase. If the points follow a linear pattern closely, we say that there is a high linear correlation. If the data only somewhat follow a linear path, we say that there is a moderate linear correlation. If the data do not follow a linear pattern at all, we say that there is no linear correlation.
A bivariate sample consists of pairs of data (x,y). If we plot these pairs on the xy-plane then we have a scatter diagram.
Given a scatter plot, we can draw the line that best fits the data.
Recall that to find the equation of a line, we need the slope and the y-intercept. We will write the equation of the line as
y = a + bx
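where a is the y-intercept and b is the slope. x is the independent or predictor variable and y is the dependent or response variable. To find a and b we follow the standard least-squares steps:
1. Compute the mean of the x values and the mean of the y values.
2. Compute SSx, the sum of the squared deviations of the x values from their mean, and SSxy, the sum of the products of each x deviation with the matching y deviation.
3. The slope is b = SSxy / SSx.
4. The y-intercept is a = (mean of y) - b (mean of x).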
We can interpret a as the value of y when x is zero and we can interpret b as the amount that y increases when x increases by one.
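As a rough illustration of how a and b can be computed from paired data, here is a minimal Python sketch. It assumes the usual least-squares formulas b = SSxy / SSx and a = (mean of y) - b (mean of x). The function name fit_line and the plant measurements are hypothetical, invented only for this example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # SSx: squared deviations of x; SSxy: products of x and y deviations.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if ss_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = ss_xy / ss_x          # slope
    a = y_bar - b * x_bar     # y-intercept
    return a, b


# Hypothetical data (not from the text): plant age in weeks, size in cm.
ages = [1, 2, 3, 4, 5]
sizes = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = fit_line(ages, sizes)
print(f"y = {a:.2f} + {b:.2f}x")
```

Points that lie exactly on a line, such as (0, 3), (1, 5), (2, 7), should give back that line's intercept and slope, which is a quick way to check the sketch.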
Suppose that a study was done to determine the weight loss after taking various amounts of a diet pill in combination with exercise. If the regression line was
y = 3 + 2x
where x denotes the grams of the pill per day and y represents the weight loss in pounds, then we can say that with only the exercise and no pill the average weight loss is 3 pounds. We can also say that for each additional gram of the pill, a person should expect to lose an additional 2 pounds on average. If a person takes 5 grams, then that person can expect to lose an average of 3 + 2(5) = 13 pounds.
Data was collected to compare the length of time x (in months) couples have been in a relationship to the amount of money y that is spent when they go out. The equation of the regression line was found to be
y = 70 - 5x
The y-intercept tells us that at the beginning of the relationship, the average date costs $70. The slope tells us that for each additional month the relationship lasts, the average date costs $5 less. We can use the regression line to predict the amount of money that a date costs when the relationship has lasted, for example, six months. We have
y(6) = 70 - 5(6) = 40
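The predictions in the two examples above can be reproduced with a few lines of Python. This is only a sketch. The helper name predict is hypothetical, and the coefficients are the ones quoted in the examples.

```python
def predict(a, b, x):
    """Evaluate the regression line y = a + b*x at a given x."""
    return a + b * x


# Diet pill example: y = 3 + 2x (x in grams per day, y in pounds lost).
print(predict(3, 2, 0))    # exercise only, no pill -> 3
print(predict(3, 2, 5))    # 5 grams per day -> 13

# Dating example: y = 70 - 5x (x in months, y in dollars per date).
print(predict(70, -5, 6))  # six months in -> 40
```

Note that predict(70, -5, 15) returns -5, a negative cost. A regression line is only trustworthy for x values near the range of the data used to fit it.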
Estimating the Mean Value of y for a Particular Value of x
Suppose that you own a pizza restaurant and are interested in sending out menus to local residents. You research what your 8 competitors have done to find the relationship between the number of mailings x and the number of pizzas y bought per week. You find that the equation of the regression line is
y = 100 + .2x.
You calculate Se to be 4, the mean number of mailings to be 990, and SSx = 73.
Next week you plan an advertising blitz of 1000 mailings. How many pizzas do you expect to sell, and what is a 95% confidence interval for this estimate?
We will use the main theorem that states that an unbiased estimate for the value of y given a fixed value of x is
a + bx
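The standard deviation of this estimate is
Se sqrt(1/n + (x - x̄)^2 / SSx)
where n is the number of data pairs and x̄ is the mean of the x values.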
Hence we predict that we will sell about
100 + .2(1000) = 300 pizzas.
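With n = 8, Se = 4, a mean of 990 mailings, and SSx = 73, we find the standard deviation
4 sqrt(1/8 + (1000 - 990)^2 / 73) = 4.89
From the t table with n - 2 = 6 degrees of freedom, we have
tc = 2.447
so that a 95% confidence interval is
300 ± 2.447(4.89) = 300 ± 11.97
or roughly 288 to 312 pizzas.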
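The pizza calculation can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. The function name mean_response_interval is hypothetical. The sketch assumes that the mean of 990 refers to the mean number of mailings, and it assumes a critical value of 2.447, the two-sided 95% t value for n - 2 = 6 degrees of freedom. If your table or degrees of freedom differ, pass a different t_c.

```python
import math


def mean_response_interval(a, b, x0, n, s_e, x_bar, ss_x, t_c):
    """Estimate the mean of y at x0 and a confidence interval around it.

    Returns (estimate, standard deviation of the estimate, (low, high)).
    """
    y_hat = a + b * x0
    # Standard deviation of the estimated mean response at x0.
    std_dev = s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)
    margin = t_c * std_dev
    return y_hat, std_dev, (y_hat - margin, y_hat + margin)


# Values from the pizza example; t_c = 2.447 is an assumption (df = 6).
y_hat, sd, (low, high) = mean_response_interval(
    a=100, b=0.2, x0=1000, n=8, s_e=4, x_bar=990, ss_x=73, t_c=2.447
)
print(f"estimate = {y_hat:.0f} pizzas, standard deviation = {sd:.2f}")
print(f"95% interval: {low:.1f} to {high:.1f}")
```

Keeping t_c as a parameter avoids depending on a statistics library and makes the table lookup an explicit, visible input.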