Chi Square and Two Way Tables

Two Way Tables

Is there any relationship between political affiliation and the type of car a person drives?

A survey was done and the following data was collected:

                          Democrat   Republican   Other     Row marginal total
American                  32 (35)    21 (24)      15 (9)    68
Foreign                   55 (53)    40 (37)      8 (13)    103
Column marginal total     87         61           23        171
Column percent of total   51         36           13

The numbers in parentheses are the expected counts: the counts we would expect to see if political affiliation and car type were unrelated.  Each one is the row marginal total multiplied by the column proportion; for example, 68 × 0.51 ≈ 35.  Equivalently, we can compute

$$\text{Expected Count} = \frac{(\text{Row Marginal Total})(\text{Column Marginal Total})}{\text{Grand Total}}$$
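The formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original survey analysis; the function name `expected_counts` and the nested-list layout are assumptions chosen for the example. Note that the unrounded values (roughly 34.6, 24.3, 9.1 and 52.4, 36.7, 13.9) differ slightly from the whole numbers shown in parentheses in the table.

```python
# Observed counts from the survey table: rows are car type, columns are party.
observed = [
    [32, 21, 15],  # American: Democrat, Republican, Other
    [55, 40, 8],   # Foreign:  Democrat, Republican, Other
]


def expected_counts(table):
    """Return (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```

The expected counts in each row add up to that row's marginal total, which is a quick check that the table was entered correctly.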

We have the hypotheses:

H0: The true proportions are the same for all of the populations

H1:  The true proportions are not the same for all of the populations.

We compute:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(32 - 35)^2}{35} + \frac{(21 - 24)^2}{24} + \cdots + \frac{(8 - 13)^2}{13} = 6.87$$
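As a check on the arithmetic, the sum can be reproduced with a short Python sketch. The helper name `chi_square` is an assumption made for this example. With the whole-number expected counts from the table the result matches 6.87; with unrounded expected counts the statistic comes out a little larger, about 7.27, so the rounding affects the second digit.

```python
def chi_square(observed, expected):
    """Sum (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[32, 21, 15], [55, 40, 8]]
rounded_expected = [[35, 24, 9], [53, 37, 13]]  # whole-number values from the table

stat_rounded = chi_square(observed, rounded_expected)

# Unrounded expected counts: (row total * column total) / grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
exact_expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
stat_exact = chi_square(observed, exact_expected)

print(round(stat_rounded, 2), round(stat_exact, 2))
```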

The degrees of freedom is

(num of rows - 1)(num of columns - 1) = (2 - 1)(3 - 1) = 2

Now the critical value of χ² that corresponds to 2 degrees of freedom and α = .05 is 5.99.

Since 6.87 > 5.99, we reject H0 and conclude that there is evidence of an association between political affiliation and the type of car a person drives.
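For 2 degrees of freedom the table lookup can be replaced by a closed form, because the upper-tail probability of a chi square variable with 2 degrees of freedom is exp(-x/2). The sketch below uses that fact to recover the critical value and an approximate p-value; the function name `chi2_sf_2df` is an assumption for this example, and the shortcut applies only when the degrees of freedom equal 2.

```python
import math


def chi2_sf_2df(x):
    """P(chi-square with 2 degrees of freedom > x) = exp(-x / 2)."""
    return math.exp(-x / 2)


alpha = 0.05
statistic = 6.87  # value computed in the text

critical_value = -2 * math.log(alpha)  # solves exp(-x / 2) = alpha
p_value = chi2_sf_2df(statistic)

print(round(critical_value, 2))  # 5.99, matching the table value
print(p_value < alpha)           # True, so H0 is rejected at the 5% level
```

The p-value is roughly 0.03, which agrees with the comparison of 6.87 against 5.99.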

Chi Square For Univariate Data

Recall that we use a z-statistic to test for a difference between two proportions.  If there are three or more categories, then we need a different approach.

Example

Suppose that we run a lunch special in our restaurant and want to determine whether it makes a difference which day of the week we close.  In other words, are all days equally frequented by customers?  We tally the customers for each day and find:

            Mon   Tue   Wed   Thur   Fri   Sat   Sun
Customers   30    33    20    22     35    40    30

We have the following hypotheses:

H0:  p1  =  1/7,  p2  =  1/7,  p3  =  1/7,  p4  =  1/7,  p5  =  1/7,  p6  =  1/7,  p7  =  1/7

H1:  At least one of the p's is not 1/7

Let α = .05

The test statistic that we will use is also called the chi square statistic and is also denoted by χ² (the Greek letter chi, squared).  It is computed as follows:

Notice that the total sample size is 210, hence if H0 is true, then the expected count for each day is

$$\frac{210}{7} = 30$$

For each day, we compute one term of the sum

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

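$$\frac{(30 - 30)^2}{30} = 0 \qquad \frac{(33 - 30)^2}{30} = 0.3$$

$$\frac{(20 - 30)^2}{30} = 3.3 \qquad \frac{(22 - 30)^2}{30} = 2.1 \qquad \frac{(35 - 30)^2}{30} = 0.83$$

$$\frac{(40 - 30)^2}{30} = 3.3 \qquad \frac{(30 - 30)^2}{30} = 0$$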

Now we add these numbers to get:

0 + 0.3 + 3.3 + 2.1 + 0.83 + 3.3 + 0 = 9.8

Hence we have

χ² = 9.8
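The same calculation can be sketched in Python as a check. The dictionary layout and variable names are assumptions made for this example. Because the code adds the unrounded terms, it gives about 9.93 rather than 9.8; the small gap comes from rounding each term before adding, and it does not change the conclusion of the test.

```python
customers = {"Mon": 30, "Tue": 33, "Wed": 20, "Thur": 22, "Fri": 35, "Sat": 40, "Sun": 30}

total = sum(customers.values())
expected = total / len(customers)  # 210 / 7 = 30 under H0

# One (observed - expected)^2 / expected term per day.
terms = {day: (count - expected) ** 2 / expected for day, count in customers.items()}
statistic = sum(terms.values())

print(round(statistic, 2))
```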

The degrees of freedom is

k - 1 = 7 - 1 = 6      (k is the number of categories, here the seven days)

Now go to the chi square table.  The critical value for χ² with 6 degrees of freedom and α = .05 is 12.59.  Since

9.8  <  12.59

we fail to reject H0: there is not enough evidence to conclude that the day of the week is a factor in lunch attendance.
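When a chi square table is not at hand, the tail probability for an even number of degrees of freedom has a simple closed form: for 2m degrees of freedom it is exp(-x/2) times the sum of (x/2)^j / j! for j from 0 to m - 1. The sketch below applies it to this example; the function name `chi2_sf_even_df` is an assumption, and the formula does not cover odd degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even df.

    For df = 2m the tail is exp(-x/2) * sum_{j<m} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


alpha = 0.05
df = 7 - 1  # seven days, so six degrees of freedom

p_value = chi2_sf_even_df(9.8, df)            # statistic from the text
tail_at_critical = chi2_sf_even_df(12.59, df)  # should be close to alpha

print(round(p_value, 3), p_value > alpha)
```

The p-value is about 0.13, well above .05, and the tail probability at 12.59 is about .05, which agrees with the table-based decision.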