Population, Sample and Data
Statistics is the science of collecting, organizing and summarizing data so that valid conclusions can be drawn from them. Collecting, organizing and summarizing the data is called “descriptive statistics”, while drawing valid conclusions from them is called “inferential statistics”.
Population Data vs. Sample Data
Population: the universal set of all objects under study.
Sample: Any subset of the population.
Collecting data from every member of a large population may be impractical and costly. A sample is more manageable and easier to study.
After the data are collected and organized, they are summarized, for example by computing average values. The hope is that valid conclusions about the whole population can be drawn from the sample data, so it is important that the sample be representative of the population; otherwise the conclusions may be invalid. Conclusions are only as reliable as the sampling process, and the information obtained can change from sample to sample.
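To illustrate how information can change from sample to sample, here is a small Python sketch that draws several random samples from a made-up population and compares each sample average with the population average. The population values, the sample size of 30 and the use of simple random sampling (`random.Random.sample`) are assumptions chosen for illustration only.

```python
import random
import statistics

# Hypothetical population: 1,000 whole-number ages between 16 and 60,
# generated only for illustration (not real data).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(16, 60) for _ in range(1000)]
population_mean = statistics.mean(population)

# Draw five simple random samples of 30 people each (without replacement)
# and summarize each sample by its average.
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(5)]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:  {m:.2f}")
```

The same seed always gives the same numbers; changing the seed or the sample size shows how much the sample averages move around the population average.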
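Collecting: gathering data points, where a data point is each element in a set of data.
Organizing: building a frequency distribution, a chart that lists each distinct data value with the number of times it occurs (its frequency).
Relative Frequency: the frequency of a value expressed as a percent of the total number of data points, rf = (frequency / n) × 100%.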
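As a minimal sketch of building a frequency distribution with relative frequencies, the Python below tallies a list of survey answers with `collections.Counter`. The answers are hypothetical, since the Example 1 data are not reproduced here, and the helper name `frequency_distribution` is made up for this illustration.

```python
from collections import Counter

# Hypothetical answers to "What is your major?" (made-up data for illustration).
majors = ["Biology", "Business", "Nursing", "Business", "Biology",
          "Undecided", "Nursing", "Business", "Biology", "Business"]

def frequency_distribution(data):
    """Return (value, frequency, relative frequency in percent) for each distinct value."""
    counts = Counter(data)   # frequency of each distinct value
    n = len(data)            # total number of data points
    return [(value, f, 100 * f / n) for value, f in counts.most_common()]

for value, f, rf in frequency_distribution(majors):
    print(f"{value:<10} {f:>3} {rf:6.1f}%")
```

For this made-up list, Business appears 4 times out of 10 (40%), Biology 3 times (30%), Nursing 2 times (20%) and Undecided once (10%); the relative frequencies always add up to 100%.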
Example 1: What is your major?
Here only a few distinct values occur, and each is repeated, so charting the data makes it easy to compute each value's frequency.
When the data points take many different values, group them into intervals. First find the range: range = largest value - smallest value. Then decide how many groups to form, usually 4 to 8. Divide the range by the number of groups to get the width of each interval (rounding up as needed), and use that width to set the endpoints of the intervals.
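The grouping procedure above can be sketched in Python as follows. This is a minimal sketch assuming whole-number data; the function name `make_intervals` is hypothetical, and rounding the width up (as with 19/5 = 3.8 ≈ 4 in Example 2) is one reasonable choice among several.

```python
import math

def make_intervals(data, groups):
    """Group whole-number data into equal-width intervals (hypothetical helper).

    width = range / groups, rounded up (e.g. 19 / 5 = 3.8 -> 4).
    Intervals are added until the largest value is covered, so an extra
    interval can appear when the range divides evenly by `groups`.
    """
    low, high = min(data), max(data)
    data_range = high - low                          # largest value - smallest value
    width = max(1, math.ceil(data_range / groups))   # at least 1 so the loop always advances
    intervals = []
    start = low
    while start <= high:
        intervals.append((start, start + width - 1))  # inclusive endpoints
        start += width
    return width, intervals

width, intervals = make_intervals([3, 7, 12, 15, 20], groups=4)
print(width, intervals)  # 5 [(3, 7), (8, 12), (13, 17), (18, 22)]
```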
Example 2: Ages of a class (hypothetical data).
16 21 18 22 19 20 19 21 26 30 27 25 20
24 21 20 29 19 32 35 20 19 18 21 23 25
Range = 35 - 16 = 19, n = 26, 5 groups, interval width = 19/5 = 3.8 ≈ 4
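The arithmetic for Example 2 can be checked with the short Python sketch below, which also tallies the 26 ages into five width-4 intervals (16-19, 20-23, 24-27, 28-31, 32-35). Starting the first interval at the smallest value, 16, is an assumption; a smaller starting value would also work.

```python
import math

# Example 2 data: hypothetical ages of a class (from the notes above).
ages = [16, 21, 18, 22, 19, 20, 19, 21, 26, 30, 27, 25, 20,
        24, 21, 20, 29, 19, 32, 35, 20, 19, 18, 21, 23, 25]

n = len(ages)                          # 26
data_range = max(ages) - min(ages)     # 35 - 16 = 19
width = math.ceil(data_range / 5)      # 19 / 5 = 3.8, rounded up to 4

# Five intervals of width 4 starting at the smallest value.
table = []
for i in range(5):
    lo = min(ages) + i * width
    hi = lo + width - 1
    f = sum(lo <= a <= hi for a in ages)    # frequency: ages falling in [lo, hi]
    table.append((lo, hi, f, 100 * f / n))  # relative frequency as a percent

for lo, hi, f, rf in table:
    print(f"{lo}-{hi}  f = {f:2d}  rf = {rf:5.1f}%")
```

With these 26 ages the frequencies come out to 7, 10, 5, 2 and 2, which add up to n = 26.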
When raw data contain many different values, create intervals and work with grouped data. The first interval does not have to begin at the smallest data value; a convenient smaller value can be used instead.
Histograms: bar charts of grouped data, with one bar per interval whose height shows that interval's frequency or relative frequency.
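A histogram is normally drawn with graphing software, but as a dependency-free sketch the bars can be printed sideways as rows of characters. The frequencies below are the Example 2 ages tallied into width-4 intervals; the helper name `text_histogram` and the `#` symbol are illustrative choices, not a standard.

```python
# Grouped frequencies for the Example 2 ages using intervals of width 4
# (counts obtained by tallying the 26 ages into each interval).
grouped = [("16-19", 7), ("20-23", 10), ("24-27", 5), ("28-31", 2), ("32-35", 2)]

def text_histogram(groups, symbol="#"):
    """Return one line per interval; the bar length equals the interval's frequency."""
    return [f"{label:>6} | {symbol * freq} ({freq})" for label, freq in groups]

for line in text_histogram(grouped):
    print(line)
```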
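If the intervals are not all the same width, bars drawn with relative frequency as their height give a visually misleading picture: a wide interval gets a wide bar, so it looks larger than its share of the data.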
Relative frequency density: RFD = relative frequency / interval width.
RFD is needed only when the interval widths are unequal. With RFD on the vertical axis, each bar's area (RFD × width) equals the interval's relative frequency, which gives a more truthful picture of the distribution.
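To see RFD at work, the sketch below regroups the Example 2 ages into unequal intervals (16-19, 20-23 and a wide 24-35). This regrouping is hypothetical and chosen only to show the effect; widths are counted in whole years, assuming whole-number ages.

```python
import math

# Example 2 ages (from the notes), regrouped into UNEQUAL intervals.
# This regrouping is hypothetical, chosen only to illustrate RFD.
ages = [16, 21, 18, 22, 19, 20, 19, 21, 26, 30, 27, 25, 20,
        24, 21, 20, 29, 19, 32, 35, 20, 19, 18, 21, 23, 25]
intervals = [(16, 19), (20, 23), (24, 35)]  # the last interval is 3 times as wide

n = len(ages)
rows = []
for lo, hi in intervals:
    f = sum(lo <= a <= hi for a in ages)  # frequency in [lo, hi]
    rf = f / n                            # relative frequency as a proportion
    width = hi - lo + 1                   # whole-number ages lo..hi cover this many years
    rfd = rf / width                      # relative frequency density
    rows.append((lo, hi, f, rf, width, rfd))
    print(f"{lo}-{hi}: f={f:2d}  rf={rf:.3f}  width={width:2d}  RFD={rfd:.4f}")

# Each bar's area is RFD * width = rf, so the total area is 1 (100%).
total_area = sum(row[5] * row[4] for row in rows)
print(math.isclose(total_area, 1.0))  # True
```

The 24-35 interval holds about 35% of the ages, so a relative-frequency bar would look almost as tall as the 20-23 bar; its RFD is the smallest of the three because those ages are spread over 12 years.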
Pie Charts: used for categorical data, that is, data grouped by a common feature or quality (financial expenditures by category are a common example). A pie chart makes it easy to see the whole and its parts.
The size of each slice is proportional to its share of the data. The whole pie represents 360°, and each slice is a central angle. To get a slice's angle, write its relative frequency as a decimal and multiply by 360° (for example, 25% × 360° = 90°).
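The slice-angle rule can be sketched in Python as below. The categories and percentages are hypothetical placeholders, not the Example 1 results, and the function name `slice_angles` is made up for illustration.

```python
# Hypothetical relative frequencies (percent) for a categorical variable;
# placeholders only, not the Example 1 results.
relative_freq = {"Business": 40, "Biology": 30, "Nursing": 20, "Undecided": 10}

def slice_angles(rf_percent):
    """Central angle in degrees for each slice: (rf% / 100) * 360."""
    return {category: pct * 360 / 100 for category, pct in rf_percent.items()}

angles = slice_angles(relative_freq)
for category, angle in angles.items():
    print(f"{category:<10} {angle:6.1f} degrees")
```

With these placeholder percentages the angles are 144°, 108°, 72° and 36°, which add up to 360°.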
Using the Example 1 data, create a pie chart.