STAT 201 PROJECT: CUSTOMER COUNT IN A CONVENIENCE STORE
FOR OCTOBER 2002 AND OCTOBER 2003
BY: KINJAL PATEL
FALL 2003
Introduction:
This paper discusses the differences in daily customer count
in a convenience store in October 2003 versus October 2002.
Because the business is in Lake Tahoe and depends on tourists, the comparison
gives us an idea of what to expect in a particular month. It also helps with
staffing, that is, deciding whether to have more staff on a particular day, and
it shows whether the business has grown or shrunk compared to last year.
Data:
The data was collected from a convenience store in Lake
Tahoe for October 2002 and October 2003 by cluster sampling, as all the days
of October are included. The data is quantitative, bivariate and discrete,
since each value is a count of customers.
This data is at the ratio level of measurement.
The following tables show the statistics for customer
count in Oct 02 and Oct 03. The trimmed mean for both data sets is the same as
their mean, as there are no significant outliers.
Summary statistics for Oct 02
| Column | n | Mean | Variance | Std. Dev. | Std. Err. | Median | Range | Min | Max | Q1 | Q3 |
| C.C October 02 | 31 | 776.871 | 8361.649 | 91.442055 | 16.423477 | 759 | 412 | 600 | 1012 | 707 | 838 |
Summary statistics for Oct 03
| Column | n | Mean | Variance | Std. Dev. | Std. Err. | Median | Range | Min | Max | Q1 | Q3 |
| C.C October 03 | 31 | 770.1613 | 14321.606 | 119.67291 | 21.49389 | 733 | 466 | 629 | 1095 | 683 | 845 |
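The values in the two summary tables can be re-derived from the appendix data. The following is a minimal sketch using only Python's standard library; the function name `summarize` is ours, and we assume the software placed the quartiles at position (n + 1)p, which is what `statistics.quantiles` does by default and which matches the reported Q1 and Q3.

```python
import statistics

# Daily customer counts as listed in the appendix (each column as printed there).
oct02 = [600, 681, 683, 696, 697, 697, 701, 707, 707, 722, 731, 740, 741, 743,
         743, 759, 759, 767, 769, 779, 813, 816, 827, 838, 846, 853, 864, 895,
         933, 964, 1012]
oct03 = [629, 630, 635, 639, 652, 656, 674, 683, 700, 701, 702, 706, 719, 722,
         724, 733, 743, 759, 782, 783, 788, 809, 843, 845, 853, 874, 885, 896,
         957, 1058, 1095]


def summarize(counts):
    """Return the statistics shown in the summary tables for one month."""
    n = len(counts)
    std_dev = statistics.stdev(counts)  # sample standard deviation (divides by n - 1)
    # The default "exclusive" method places quartiles at position (n + 1) * p.
    q1, median, q3 = statistics.quantiles(counts, n=4)
    return {
        "n": n,
        "mean": statistics.mean(counts),
        "variance": statistics.variance(counts),
        "std_dev": std_dev,
        "std_err": std_dev / n ** 0.5,
        "median": median,
        "range": max(counts) - min(counts),
        "min": min(counts),
        "max": max(counts),
        "q1": q1,
        "q3": q3,
    }


for label, counts in (("Oct 02", oct02), ("Oct 03", oct03)):
    print(label, summarize(counts))
```

Other quartile conventions can give slightly different Q1 and Q3 for the same data, so a mismatch with other software would not necessarily be an error.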
Histograms
The histogram for the customer count in Oct ’02 shows that
the data is unimodal and skewed right.

The histogram for the customer count in Oct ’03 shows
that the data is multimodal, with gaps in between.
Stem and Leaf Diagram:
Variable: C.COct02          Variable: C.COct03
Low: 10                      6: 3344
 6: 0                        6: 5678
 6: 88                       7: 000122234
 7: 000011234444             7: 6889
 7: 66778                    8: 14
 8: 1234                     8: 5579
 8: 556                      9: 0
 9: 03                       9: 6
 9: 6                       10:
10: 1                       10: 6
                            High: 1095
Box and Whisker Plot:
| | Oct 02 | Oct 03 |
| Median | 759 | 733 |
| Highest value | 1012 | 1095 |
| Lowest value | 600 | 629 |
| Q1 | 707 | 683 |
| Q3 | 838 | 845 |
It may be important for the store owners to know the customer count for a particular month in order to predict sales. The box and whisker plots help with this.
The box and whisker plot for the customer count in Oct ’02 has the median close to the center of the box. The whiskers are of similar length at the top and bottom. There is no outlier in this plot. IQR = Q3 - Q1 = 838 - 707 = 131.

The box and whisker plot for the customer count in Oct ’03
has the median toward the lower part of the box, indicating that the lower half of the
daily counts is more tightly clustered than the upper half. There is also one high outlier, the maximum of 1095, which pulls the mean (770.16) above the median (733). IQR = Q3 - Q1 = 845 - 683 = 162.
Central Limit Theorem: If x possesses any
distribution with mean μ and standard deviation σ, then the sample mean x̄ based
on a random sample of size n will have a distribution that approaches the
distribution of a normal random variable with mean μ and standard deviation σ/√n
as n increases without limit.
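The theorem can be illustrated with a small simulation. This is only a sketch under assumptions of our own: the population is an exponential distribution with mean 1 and standard deviation 1 (chosen because it is strongly skewed), the sample size is 30, and the function name `simulate_sample_means` is hypothetical. None of this comes from the store data.

```python
import random
import statistics


def simulate_sample_means(n, reps=2000, seed=0):
    """Draw `reps` samples of size n from a skewed population; return their means."""
    rng = random.Random(seed)
    # Exponential(1) population: mean 1, standard deviation 1, right-skewed.
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


means = simulate_sample_means(n=30)
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / 30 ** 0.5, 3))
```

The spread of the simulated sample means should come out close to σ/√n, even though the population itself is far from normal.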
Hypothesis Test for Paired Data
Two Sample Z-test results:
μ1 - mean of C.COct02
μ2 - mean of C.COct03
H0: μ1 - μ2 = 0
HA: μ1 - μ2 < 0
| Difference | n1 | n2 | Sample Mean | Std. Err. | Z-Stat | P-value |
| μ1 - μ2 | 31 | 31 | 6.709677 | 27.050285 | 0.24804461 | 0.598 |
The alternative hypothesis makes this a left-tailed test. The critical z-score for a
left-tailed test at the 5% level is -1.645.
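For a 95% confidence interval, the
range for the difference of the population means μ1 - μ2 is
x̄ - E < μ1 - μ2 < x̄ + E, where x̄ is the sample mean difference and
E = 1.96 × 27.05 = 53.02, so the interval is
[6.71 - 53.02, 6.71 + 53.02] = [-46.31, 59.73].
This means that we are 95%
confident that the difference in average daily customer count between the two
Octobers lies in this range. Because the range contains both positive and
negative values, and therefore zero, it does not show that the two averages are
different.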
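For reference, a 95% z-interval uses a margin of error equal to the critical value (about 1.96) times the standard error. The following is a minimal sketch with Python's standard library; the function name `z_interval` is ours, and the two inputs are the sample mean difference and standard error from the z-test table.

```python
from statistics import NormalDist


def z_interval(estimate, std_err, confidence=0.95):
    """Return (low, high) for a two-sided z confidence interval."""
    # For 95% confidence the critical value is the 97.5th percentile, about 1.96.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * std_err
    return estimate - margin, estimate + margin


low, high = z_interval(6.709677, 27.050285)
print(round(low, 2), round(high, 2))
```

An interval for a difference of means that contains zero is consistent with no difference between the two populations.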

The z-score for the difference
of means is 0.25; therefore, for the left-tailed test,
P(z < 0.25) = 0.5987 (from the
z-score table for the left tail), and z = 0.25 does not fall in the critical region z < -1.645.
Also, p = 0.598 > α = 0.05.
Therefore, we fail to reject H0.
We do not have sufficient
evidence at the 5% level to indicate that the average customer count for
October 2002 is less than that of October 2003.
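The standard error, z-statistic and p-value in the test table follow from the two summary tables. This is a minimal sketch assuming the unpooled two-sample formula, which reproduces the reported standard error; the function name `two_sample_z` is ours.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Return (difference, standard error, z, left-tail p-value)."""
    diff = mean1 - mean2
    std_err = sqrt(var1 / n1 + var2 / n2)  # unpooled standard error
    z = diff / std_err
    p_left = NormalDist().cdf(z)  # P(Z < z) for HA: mu1 - mu2 < 0
    return diff, std_err, z, p_left


diff, std_err, z, p_left = two_sample_z(776.871, 8361.649, 31,
                                        770.1613, 14321.606, 31)
z_crit = NormalDist().inv_cdf(0.05)  # left-tail critical value, about -1.645
print("z =", round(z, 4), "p =", round(p_left, 3))
print("reject H0:", z < z_crit)
```

Because the observed z is positive while the critical region lies below -1.645, the decision is to fail to reject H0, in agreement with the p-value.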
Hypothesis Test on the Slope
In order to better understand the relationship between the
customer count in October ’02 and the count in October ’03,
a hypothesis test on the slope is used.
H0: ρ = 0;
H1: ρ ≠ 0.
First we make a scatter diagram for these pairs. The x
value is the daily customer count in October ’02 and the y value is the
daily customer count in October ’03, paired by date.

In general, as the customer count in October ’02 goes up, the
customer count in October ’03 goes up. The progression is roughly
linear, and a line fits reasonably well.
Simple linear regression results:
Dependent Variable: C.C October 03
Independent Variable: C.C October 02
Sample size: 31
Correlation coefficient: 0.6703
Estimate of sigma: 90.32325
| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
| Intercept | 88.62975 | 141.0373 | 29 | 0.6284136 | 0.5347 |
| C.C October 02 | 0.8772777 | 0.18034036 | 29 | 4.8645663 | <0.0001 |

| X value | Pred. Y | s.e.(Pred. y) | 95% C.I. | 95% P.I. |
| 700.0 | 702.7241 | 21.338972 | (659.081, 746.3672) | (512.9069, 892.54126) |

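The correlation coefficient is r = 0.6703, so r² ≈ 0.45. Therefore, about 45% of the variation in y can be explained by the variation in x,
and about 55% of the variation is unexplained by x.
Because the p-value for the slope is less than 0.0001, which is below α = 0.05, we reject H0.
There is a significant positive correlation between the customer count for
October 2002 and October 2003.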
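The regression output can be cross-checked from the reported estimates alone. This is a minimal sketch: the variable names are ours, and the critical value 2.0452 for 29 degrees of freedom is an assumption taken from a t-table rather than from the report.

```python
from math import sqrt

# Values copied from the regression output above.
intercept, slope, slope_se = 88.62975, 0.8772777, 0.18034036
r, n, sigma_hat = 0.6703, 31, 90.32325
x_new, se_fit = 700.0, 21.338972
t_crit = 2.0452  # assumed t-table value for 95% confidence, 29 df

predicted = intercept + slope * x_new         # fitted count at x = 700
r_squared = r ** 2                            # share of variation explained
t_slope = slope / slope_se                    # test statistic for the slope
t_from_r = r * sqrt(n - 2) / sqrt(1 - r ** 2)  # same test written in terms of r

# Interval for the mean response, then for a single new day.
ci = (predicted - t_crit * se_fit, predicted + t_crit * se_fit)
se_new = sqrt(sigma_hat ** 2 + se_fit ** 2)
pi = (predicted - t_crit * se_new, predicted + t_crit * se_new)

print(round(predicted, 4), round(r_squared, 4), round(t_slope, 4))
print(ci, pi)
```

The two forms of the t-statistic agree up to the rounding of r, which is why a test on the slope and a test on the correlation lead to the same decision in simple linear regression.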
Chi-Square Test for Independence of
C.C October 02 and C.C October 03:
| Statistic | DF | Value | P-value |
| Chi-square | 780 | 806 | 0.2521 |
H0: The Customer Count for October 2002 and 2003
are independent
H1: The Customer Count for October 2002 and 2003
are not independent
χ² = 806
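χ²α ≈ 846 (for 780 degrees of freedom at the 5% level; printed chi-square
tables do not go this high, so the value comes from the normal approximation)

As the chi-square value of 806 is below 846, it does not fall in the critical region,
which agrees with p = 0.2521 > α = 0.05. We fail to reject H0.
We do not have sufficient evidence at the 5% level that the Customer
Count for October 2002 and 2003 are dependent. With 780 degrees of freedom,
almost every cell of the table holds at most one day, so this test has very
little power here; the regression above is the better guide to how the two
years move together.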
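Printed chi-square tables usually stop well short of 780 degrees of freedom, so the critical value has to be approximated. The following is a minimal sketch assuming the Wilson-Hilferty normal approximation, which is our choice and not necessarily what the software used; the function names are ours.

```python
from statistics import NormalDist


def chi2_upper_tail(x, df):
    """Approximate P(chi-square with df degrees of freedom > x), Wilson-Hilferty."""
    spread = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - spread)) / spread ** 0.5
    return 1.0 - NormalDist().cdf(z)


def chi2_critical(alpha, df):
    """Approximate the value that cuts off an upper-tail area of alpha."""
    spread = 2.0 / (9.0 * df)
    z = NormalDist().inv_cdf(1.0 - alpha)
    return df * (1.0 - spread + z * spread ** 0.5) ** 3


print("critical value:", round(chi2_critical(0.05, 780), 1))
print("p-value for 806:", round(chi2_upper_tail(806, 780), 4))
```

For the reported statistic of 806 the approximate p-value comes out close to the 0.2521 in the table, and the approximate 5% critical value is about 846.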
Conclusion:
We performed various tests, and from the results we can
conclude the following:
The sample means for October 2002 and October 2003 differ by only about 6.7
customers per day, and the difference-of-means test does not give sufficient
evidence at the 5% level that the 2002 average is less than the 2003 average.
The business has therefore neither clearly grown nor shrunk since last year,
which gives an idea of how to budget the expenses for the business.
As the count increases in October ’02, we can expect
an increase in October ’03 too (regression line). This information gives us an
idea of how to staff on the slow days and busy days.
The chi-square test (p = 0.2521) does not by itself give sufficient evidence at
the 5% level that the Customer Count in October ’03 depends on that of October ’02.
This data and the results can be used by the store
owners for future interpretation of the customer count, expected sales and
staffing issues. Also, a potential buyer of the business can look at the statistics to
see how the store is performing from the profit point of view.
Appendix:
The data was collected from a convenience store
located in South Lake Tahoe. Attached is the data that was collected:
| C.COct02 | C.COct03 |
| 600 | 629 |
| 681 | 630 |
| 683 | 635 |
| 696 | 639 |
| 697 | 652 |
| 697 | 656 |
| 701 | 674 |
| 707 | 683 |
| 707 | 700 |
| 722 | 701 |
| 731 | 702 |
| 740 | 706 |
| 741 | 719 |
| 743 | 722 |
| 743 | 724 |
| 759 | 733 |
| 759 | 743 |
| 767 | 759 |
| 769 | 782 |
| 779 | 783 |
| 813 | 788 |
| 816 | 809 |
| 827 | 843 |
| 838 | 845 |
| 846 | 853 |
| 853 | 874 |
| 864 | 885 |
| 895 | 896 |
| 933 | 957 |
| 964 | 1058 |
| 1012 | 1095 |