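The sample mean is the average of the observed values: the sum of all the outcomes in the sample divided by the number of outcomes.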
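Suppose you randomly sampled six acres in the Desolation Wilderness for a non-indigenous weed and came up with the following counts of this weed: 34, 43, 81, 106, 106, and 115. We compute the sample mean by adding the counts and dividing by the number of samples, 6:

\[ \bar{x} = \frac{34 + 43 + 81 + 106 + 106 + 115}{6} = \frac{485}{6} \approx 80.83 \]

We can say that the sample mean of the non-indigenous weed count is 80.83 per acre.

The mode of a data set is the value that occurs most often. In this sample the mode is 106, since it occurs twice and every other count occurs once.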
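As a quick check of the arithmetic, the weed counts can be run through Python's standard `statistics` module. This is only a sketch: the variable names are illustrative, and the mode here means the most frequent value in the sample.

```python
from statistics import mean, mode

# Weed counts from the six sampled acres
weed_counts = [34, 43, 81, 106, 106, 115]

sample_mean = mean(weed_counts)   # sum of the counts divided by 6
most_common = mode(weed_counts)   # the value with the highest frequency

print(round(sample_mean, 2))  # 80.83
print(most_common)            # 106
```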
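One problem with using the mean is that it often does not depict the typical
outcome. If there is one outcome that is very far from the rest of the
data, then the mean will be strongly affected by this outcome. Such an
outcome is called an outlier. An alternative measure is the median, the middle value of the sorted data. If there is an even number of outcomes, the median is the average of the two middle values.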
Suppose you randomly selected 10 house prices in the South Lake Tahoe area. You are interested in the typical house price. In units of $100,000, the prices were 2.7, 2.9, 3.1, 3.4, 3.7, 4.1, 4.3, 4.7, 4.7, and 40.8. If we computed the mean, we would say that the average house price is $744,000. Although this number is correct, it does not reflect the price of available housing in South Lake Tahoe. A closer look at the data shows that the house valued at 40.8 x $100,000 = $4.08 million skews the data. Instead, we use the median. Since there is an even number of outcomes, we take the average of the middle two:

\[ \frac{3.7 + 4.1}{2} = 3.9 \]

The median house price is $390,000. This better reflects what house shoppers should expect to spend.
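The effect of the single expensive house can be seen by computing both measures directly. The sketch below uses Python's standard `statistics` module; the variable names are illustrative, and the last line drops the outlier only to show how far it pulls the mean.

```python
from statistics import mean, median

# House prices in units of $100,000 (already sorted; the last one is the outlier)
prices = [2.7, 2.9, 3.1, 3.4, 3.7, 4.1, 4.3, 4.7, 4.7, 40.8]

mean_price = mean(prices)
median_price = median(prices)            # average of the two middle values
mean_without_outlier = mean(prices[:-1])

print(round(mean_price, 2))            # 7.44
print(round(median_price, 2))          # 3.9
print(round(mean_without_outlier, 2))  # 3.73
```

Removing one house moves the mean from 7.44 to 3.73, which is close to the median of the full sample.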
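There is an alternative value that is also resistant to outliers. This is
called the trimmed mean: the mean computed after discarding a fixed percentage of the smallest and largest values.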
At a ski rental shop, data was collected on the number of rentals on each of ten consecutive Saturdays: 44, 50, 38, 96, 42, 47, 40, 39, 46, 50.

To find the sample mean, add them and divide by 10:

\[ \bar{x} = \frac{44 + 50 + 38 + 96 + 42 + 47 + 40 + 39 + 46 + 50}{10} = \frac{492}{10} = 49.2 \]

Notice that the mean value is not a value of the sample. To find the median, first sort the data: 38, 39, 40, 42, 44, 46, 47, 50, 50, 96. Notice that there are two middle numbers, 44 and 46. To find the median we take the average of the two:

\[ \frac{44 + 46}{2} = 45 \]

Notice also that the mean is larger than all but three of
the data points. The mean is influenced by outliers while the median is
robust.
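A trimmed mean, which drops a share of the smallest and largest values before averaging, can be compared with the mean and median on the same rental counts. In the sketch below, `trimmed_mean` is a hypothetical helper written for this illustration, and trimming 10% from each end (one value per end here) is an assumed choice, not one made in the text.

```python
from statistics import mean, median

def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the values from each end (illustrative helper)."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)      # number of values to drop per end
    trimmed = ordered[k:len(ordered) - k]
    return mean(trimmed)

rentals = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]

print(mean(rentals))                # 49.2
print(median(rentals))              # 45.0
print(trimmed_mean(rentals, 0.10))  # 44.75
```

With the 38 and the 96 removed, the trimmed mean lands close to the median.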
The mean, mode, median, and trimmed mean do a nice job of telling where the center of the data set is, but often we are interested in more. For example, a pharmaceutical engineer develops a new drug that regulates sugar in the blood. Suppose she finds out that the average sugar content after taking the medication is the optimal level. This does not mean that the drug is effective. There is a possibility that half of the patients have dangerously low sugar content while the other half have dangerously high content. Instead of being an effective regulator, the drug is a deadly poison.

What the engineer needs is a measure of how far the data is spread apart. This is what the variance and standard deviation do. First we show the formulas for these measurements. Then we will go through the steps on how to use the formulas.
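We define the sample variance to be

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

and the sample standard deviation to be

\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]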
1. Calculate the mean, \(\bar{x}\).
2. Write a table that subtracts the mean from each observed value.
3. Square each of the differences.
4. Add this column.
5. Divide by n - 1, where n is the number of items in the sample. This is the *variance*.
6. To get the *standard deviation*, take the square root of the variance.
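The steps above can be sketched in a few lines of Python. The function names `sample_variance` and `sample_std_dev` are illustrative, and the weed counts from the first example are reused as input only to have something to run.

```python
from math import sqrt

def sample_variance(values):
    n = len(values)
    x_bar = sum(values) / n                    # step 1: the mean
    deviations = [x - x_bar for x in values]   # step 2: subtract the mean
    squared = [d ** 2 for d in deviations]     # step 3: square each difference
    total = sum(squared)                       # step 4: add the column
    return total / (n - 1)                     # step 5: divide by n - 1

def sample_std_dev(values):
    return sqrt(sample_variance(values))       # step 6: square root of the variance

counts = [34, 43, 81, 106, 106, 115]
print(round(sample_variance(counts), 2))  # 1211.77
print(round(sample_std_dev(counts), 2))   # 34.81
```

Python's standard `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they can serve as a cross-check.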
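The owner of the Ches Tahoe restaurant is interested in how much people spend at the restaurant. He examines 10 randomly selected receipts for parties of four and writes down the following data: 44, 50, 38, 96, 42, 47, 40, 39, 46, 50. He calculated the mean by adding and dividing by 10 to get \(\bar{x} = 49.2\). Below is the table for getting the standard deviation:

| x | x - 49.2 | (x - 49.2)^2 |
|---|---|---|
| 44 | -5.2 | 27.04 |
| 50 | 0.8 | 0.64 |
| 38 | -11.2 | 125.44 |
| 96 | 46.8 | 2190.24 |
| 42 | -7.2 | 51.84 |
| 47 | -2.2 | 4.84 |
| 40 | -9.2 | 84.64 |
| 39 | -10.2 | 104.04 |
| 46 | -3.2 | 10.24 |
| 50 | 0.8 | 0.64 |
| Total | | 2599.6 |

Now

\[ s^2 = \frac{2599.6}{10 - 1} \approx 288.8 \]

Hence the variance is approximately 289 and the standard deviation is approximately \(\sqrt{289} = 17\).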
Since the standard deviation can be thought of as measuring how far the data
values lie from the mean, we take the mean and move one standard deviation
in either direction. The mean for this example was 49.2 and the standard deviation was about 17. We
have:
49.2 - 17 = 32.2
and
49.2 + 17 = 66.2
What this means is that most of the patrons probably spend between $32.20 and $66.20.
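To see how well "most" holds for this particular sample, one can count the receipts that fall inside the one-standard-deviation interval. This is a sketch using Python's standard `statistics` module with illustrative variable names; it uses the unrounded standard deviation, so the endpoints match the text only after rounding.

```python
from statistics import mean, stdev

receipts = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]

x_bar = mean(receipts)
s = stdev(receipts)                  # sample standard deviation (n - 1 divisor)
low, high = x_bar - s, x_bar + s

inside = [x for x in receipts if low <= x <= high]

print(round(low, 1), round(high, 1))     # 32.2 66.2
print(len(inside), "of", len(receipts))  # 9 of 10
```

Only the $96 receipt falls outside the interval.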
The sample standard deviation will be denoted by s and the population standard deviation will be denoted by the Greek letter \(\sigma\). The sample variance will be denoted by \(s^2\).

The variance and standard deviation describe how spread out the data is. If the data all lies close to the mean, then the standard deviation will be small, while if the data is spread out over a large range of values, s will be large. Having outliers will increase the standard deviation.

One of the flaws of the standard deviation is that it depends on the
units that are used. One way of handling this difficulty is the coefficient of variation, which is the standard deviation divided by the mean, expressed as a percentage:

\[ CV = \frac{s}{\bar{x}} \cdot 100\% \]

In the above example, it is

\[ \frac{17}{49.2} \cdot 100\% \approx 34.6\% \]

This tells us that the standard deviation of the restaurant bills is 34.6% of the mean.
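The point of dividing the standard deviation by the mean is that the ratio does not change with the units. The sketch below checks this by expressing the same receipts in dollars and in cents; `coefficient_of_variation` is an illustrative helper name. Because it uses the unrounded standard deviation (about 16.995 rather than 17), it prints 34.5 rather than the 34.6 obtained by hand.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (illustrative helper)."""
    return stdev(values) / mean(values) * 100

receipts_dollars = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]
receipts_cents = [x * 100 for x in receipts_dollars]

cv_dollars = coefficient_of_variation(receipts_dollars)
cv_cents = coefficient_of_variation(receipts_cents)

print(round(cv_dollars, 1))  # 34.5
print(round(cv_cents, 1))    # 34.5
```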
A mathematician named Chebyshev came up with bounds on how much of the data must lie close to the mean. In particular, for any positive k, the proportion of the data that lies within k standard deviations of the mean is at least

\[ 1 - \frac{1}{k^2} \]

For example, if k = 2 this number is

\[ 1 - \frac{1}{2^2} = 0.75 \]

This tells us that at least 75% of the data lies within two standard deviations of the mean. In the above example, we can say that at least 75% of the diners spent between 49.2 - 2(17) = 15.2 and 49.2 + 2(17) = 83.2 dollars.
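Chebyshev's bound is a guaranteed minimum, so the actual share of data near the mean is often larger. As a rough illustration, the sketch below compares the bound with the observed share for the restaurant receipts; `chebyshev_bound` and `fraction_within` are illustrative helper names, and the unrounded sample standard deviation is used.

```python
from statistics import mean, stdev

def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed proportion of values within k sample standard deviations of the mean."""
    x_bar, s = mean(values), stdev(values)
    count = sum(1 for x in values if abs(x - x_bar) <= k * s)
    return count / len(values)

receipts = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 3), fraction_within(receipts, k))
# 2 0.75 0.9
# 3 0.889 1.0
```

For k = 2 the bound promises at least 75%, and 90% of the receipts actually fall in that range.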