Scatter Diagrams and Regression Lines

Scatter Diagrams
If data are given in pairs, then the scatter diagram of the data is just the points plotted in the xy-plane. The scatter plot is used to visually identify relationships between the first and the second entries of paired data.

Example
The scatter plot above represents the age vs. size of a plant. It is clear from the scatter plot that as the plant ages, its size tends to increase. If the points lie close to a straight line, we say that there is a high linear correlation; if they only somewhat follow a linear path, we say that there is a moderate linear correlation; and if they show no linear pattern, we say that there is no linear correlation.
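The notes judge linear correlation by eye. One standard numeric companion, not used elsewhere in this section, is Pearson's correlation coefficient r, which is close to +1 or -1 when the points lie near a line and close to 0 when there is no linear pattern. Below is a minimal plain-Python sketch; the plant age/size numbers are invented for illustration, since the data behind the plot are not given, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r of paired data (x, y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_x * ss_y)

# Hypothetical plant data (illustrative only): age in weeks, size in cm.
age = [1, 2, 3, 4, 5, 6]
size = [2.0, 3.1, 4.2, 4.8, 6.1, 6.9]
print(round(pearson_r(age, size), 3))
```

A value of r near 1 matches the "high linear correlation" reading of a plot like the one above.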
A bivariate sample consists of pairs of data (x, y). If we plot these pairs in the xy-plane, then we have a scatter diagram.
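Given a scatter plot, we can draw the line that best fits the data (the least-squares regression line). Recall that to find the equation of a line, we need the slope and the y-intercept. We will write the equation of the line as

y = a + bx

where a is the y-intercept and b is the slope. Here x is the independent, or predictor, variable and y is the dependent, or response, variable. To find a and b, we follow these steps:
1. Compute the sample means x̄ and ȳ.
2. Compute SS_{x} = Σ(x - x̄)² and SS_{xy} = Σ(x - x̄)(y - ȳ).
3. The slope is b = SS_{xy}/SS_{x}.
4. The y-intercept is a = ȳ - b x̄.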
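The least-squares slope and intercept can be computed directly from the paired data as b = SS_{xy}/SS_{x} and a = ȳ - b x̄. Below is a minimal plain-Python sketch of that computation; `fit_line` is a hypothetical helper name, and the sample points are chosen to lie exactly on the diet-pill line y = 3 + 2x from the first example below, so the fit should recover a = 3 and b = 2.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    # Sample means of x and y
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # SS_x and SS_xy about the means
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if ss_x == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = ss_xy / ss_x        # slope
    a = y_bar - b * x_bar   # y-intercept
    return a, b

a, b = fit_line([0, 1, 2, 5], [3, 5, 7, 13])
print(f"y = {a:g} + {b:g}x")  # prints: y = 3 + 2x
```

In Python 3.10 and later, the standard library's `statistics.linear_regression(x, y)` returns the same slope and intercept, so the hand-written version is mainly useful for seeing the steps.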
Interpretations
We can interpret a as the predicted average value of y when x is zero, and b as the average amount by which y changes when x increases by one unit (if b is negative, y decreases).
Example
Suppose that a study was done to determine the weight loss after taking various amounts of a diet pill in combination with exercise. If the regression line was y = 3 + 2x, where x denotes the grams of the pill taken per day and y represents the weight loss in pounds, then we can say that with only the exercise and no pill, the average weight loss is 3 pounds. We can also say that for each additional gram of the pill a person takes, that person should expect to lose, on average, an additional 2 pounds. If a person takes 5 grams, then that person can expect to lose an average of 3 + 2(5) = 13 pounds.
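To make the two interpretations concrete, the sketch below evaluates the example's line y = 3 + 2x in Python: the prediction at x = 0 equals the intercept a, and raising x by one gram raises the prediction by exactly the slope b. The function name `predicted_loss` is purely illustrative.

```python
def predicted_loss(grams_per_day):
    """Predicted average weight loss (pounds) from the line y = 3 + 2x."""
    a, b = 3, 2  # intercept and slope taken from the example
    return a + b * grams_per_day

print(predicted_loss(0))                      # 3  (exercise only: the intercept)
print(predicted_loss(5))                      # 13 (5 grams per day)
print(predicted_loss(3) - predicted_loss(2))  # 2  (one extra gram: the slope)
```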
Example
Data were collected to compare the length of time x (in months) that couples have been in a relationship to the amount of money y that is spent when they go out. The equation of the regression line was found to be y = 70 - 5x. The y-intercept tells us that at the beginning of the relationship, the average date costs $70. The slope tells us that for each additional month the relationship lasts, the average date costs $5 less. We can use the regression line to predict the amount of money that a date costs when the relationship has lasted, for example, six months:

y(6) = 70 - 5(6) = 40

so at six months the average date is predicted to cost $40.
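The same idea with a negative slope: a short Python sketch that tabulates y = 70 - 5x for the first six months of a relationship (the name `predicted_cost` is illustrative). Each month the prediction drops by $5, and month 6 gives the $40 computed above.

```python
def predicted_cost(months):
    """Predicted average cost of a date (dollars) from the line y = 70 - 5x."""
    return 70 - 5 * months

# Print month and predicted cost for months 0 through 6
for m in range(7):
    print(m, predicted_cost(m))
```

This formula returns negative costs once x exceeds 14 months, a reminder that a regression line should only be trusted near the range of x values in the data.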
Estimating the Mean Value of y for a Particular Value of x
Suppose that you own a pizza restaurant and are interested in sending out menus to local residents. You research what your 8 competitors have done to find the relationship between the number of mailings x and the number of pizzas y bought per week. You find that the equation of the regression line is y = 100 + 0.2x. You calculate S_{e} to be 4, the mean number of mailings x̄ to be 990, and SS_{x} = 73. Next week you plan an advertising blitz of 1000 mailings. How many pizzas do you expect to sell, and what is a 95% confidence interval for this estimate?
Solution
We will use the main theorem, which states that an unbiased estimate for the value of y at a fixed value x_{0} of x is a + bx_{0}. Because we are predicting the sales for one particular week (a single value of y, not the long-run mean), the standard deviation of the prediction is

s = S_{e} √(1 + 1/n + (x_{0} - x̄)²/SS_{x})

where n is the number of data pairs.
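Using a + bx_{0} with x_{0} = 1000, we predict that we will sell about 100 + 0.2(1000) = 300 pizzas. Substituting S_{e} = 4, n = 8, the mean x̄ = 990, and SS_{x} = 73, we find the standard deviation

s = 4 √(1 + 1/8 + (1000 - 990)²/73) ≈ 6.32

With n - 2 = 6 degrees of freedom, the t table gives t_{c} = 2.447, so a 95% interval for next week's sales is 300 ± 2.447(6.32), or about 300 ± 15.5:

[284.5, 315.5]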
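The whole pizza calculation fits in a few lines of plain Python. This is a sketch under stated assumptions: `prediction_interval` is a hypothetical helper name, the standard deviation uses the single-prediction formula s = S_{e} √(1 + 1/n + (x_{0} - x̄)²/SS_{x}) (which reproduces the 6.31 to 6.32 figure for these inputs), and t_{c} = 2.447 is hardcoded as the 95% table value for n - 2 = 6 degrees of freedom, the usual choice in simple linear regression. If SciPy is available, `scipy.stats.t.ppf(0.975, 6)` returns the same critical value.

```python
from math import sqrt

def prediction_interval(a, b, x0, s_e, n, x_bar, ss_x, t_c):
    """Return (y_hat, s, (lower, upper)) for a new y at x = x0."""
    y_hat = a + b * x0  # point prediction from the regression line
    # Standard deviation for predicting a single new y at x0
    s = s_e * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
    margin = t_c * s
    return y_hat, s, (y_hat - margin, y_hat + margin)

# Pizza example: y = 100 + 0.2x, S_e = 4, n = 8, x_bar = 990, SS_x = 73.
# t_c = 2.447: 95% t-table value with n - 2 = 6 degrees of freedom (assumed df).
y_hat, s, (lo, hi) = prediction_interval(100, 0.2, 1000, 4, 8, 990, 73, 2.447)
print(f"predicted sales: {y_hat:.0f} pizzas")
print(f"standard deviation: {s:.2f}")
print(f"95% interval: [{lo:.1f}, {hi:.1f}]")
```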
