Chi Square and Two Way Tables


Two Way Tables

Is there any relationship between political affiliation and the type of car a person drives?

A survey was conducted, and the following data were collected:

Observed counts, with expected counts in parentheses:

                          Democrat    Republican    Other     Row marginal total
American                  32 (35)     21 (24)       15 (9)     68
Foreign                   55 (53)     40 (37)        8 (13)   103
Column marginal total     87          61            23        171
Column percent of total   51%         36%           13%

 

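The numbers in parentheses are the expected counts, that is, the counts we would expect if political affiliation and car type were unrelated. We computed them by multiplying each row marginal total by the (rounded) column percent; for example, 68 × 0.51 ≈ 35. We could also compute each expected count directly, without rounding the percents, as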


$$\text{Expected count} = \frac{(\text{Row marginal total})(\text{Column marginal total})}{\text{Grand total}}$$
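As a check on the arithmetic, here is a minimal Python sketch of this formula (the names `OBSERVED` and `expected_counts` are our own, chosen for illustration). It computes every expected count without rounding. The results (34.6, 24.3, 9.1 for American cars and 52.4, 36.7, 13.9 for foreign cars) differ slightly from the numbers in parentheses in the table, which were obtained from the rounded column percents.

```python
# Observed counts from the survey (rows: American, Foreign;
# columns: Democrat, Republican, Other).
OBSERVED = [
    [32, 21, 15],  # American
    [55, 40, 8],   # Foreign
]


def expected_counts(observed):
    """Return (row total)(column total)/(grand total) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


if __name__ == "__main__":
    for row in expected_counts(OBSERVED):
        print([round(e, 1) for e in row])
    # [34.6, 24.3, 9.1]
    # [52.4, 36.7, 13.9]
```

Note that the expected counts, like the observed ones, add up to the grand total of 171.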

 

We have the hypotheses:

        H0: The true proportions are the same for all of the populations (car type and political affiliation are not associated).

        H1: The true proportions are not the same for all of the populations.

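We compute the chi-square statistic by summing over all six cells of the table:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(32-35)^2}{35} + \frac{(21-24)^2}{24} + \cdots + \frac{(8-13)^2}{13} \approx 6.87$$

(With the unrounded expected counts given by the formula above, χ² ≈ 7.27; the conclusion below does not change.)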
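The sum can be reproduced with a short Python sketch (function and variable names are ours, for illustration). It evaluates the statistic twice: once with the rounded expected counts shown in parentheses in the table, which gives 6.87, and once with unrounded expected counts from the row-total-times-column-total formula, which gives about 7.27.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


OBSERVED = [[32, 21, 15], [55, 40, 8]]
# Expected counts as printed (in parentheses) in the table, i.e. rounded.
ROUNDED_EXPECTED = [[35, 24, 9], [53, 37, 13]]

# Unrounded expected counts: (row total)(column total)/(grand total).
row_totals = [sum(r) for r in OBSERVED]
col_totals = [sum(c) for c in zip(*OBSERVED)]
grand = sum(row_totals)
EXACT_EXPECTED = [[r * c / grand for c in col_totals] for r in row_totals]

print(round(chi_square_statistic(OBSERVED, ROUNDED_EXPECTED), 2))  # 6.87
print(round(chi_square_statistic(OBSERVED, EXACT_EXPECTED), 2))    # 7.27
```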

The number of degrees of freedom is

        (number of rows - 1)(number of columns - 1) = (2 - 1)(3 - 1) = 2

The critical value of χ² with 2 degrees of freedom at α = 0.05 is 5.99.

Since 6.87 > 5.99, we reject H0 in favor of H1: the data provide evidence of an association between political affiliation and the type of car a person drives.
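The table lookup can be checked without a table in this special case: a chi-square distribution with 2 degrees of freedom is an exponential distribution with mean 2, so P(χ² > x) = e^(-x/2), and the critical value at level α is -2 ln α. The Python sketch below (function names are ours; the shortcut applies only to 2 degrees of freedom) reproduces the critical value 5.99 and gives a p-value of about 0.032 for χ² = 6.87.

```python
import math


def chi2_survival_df2(x):
    """P(chi-square with 2 degrees of freedom > x) = exp(-x/2)."""
    return math.exp(-x / 2)


def chi2_critical_df2(alpha):
    """Critical value c with P(chi-square_2 > c) = alpha, i.e. c = -2 ln(alpha)."""
    return -2 * math.log(alpha)


ALPHA = 0.05
statistic = 6.87  # value computed in the text from the rounded expected counts

critical = chi2_critical_df2(ALPHA)
print(f"critical value: {critical:.2f}")               # critical value: 5.99
print(f"p-value: {chi2_survival_df2(statistic):.3f}")  # p-value: 0.032
print("reject H0" if statistic > critical else "fail to reject H0")  # reject H0
```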


 


Chi Square For Univariate Data

Recall that we use a z-statistic to test a difference between two proportions. When there are three or more categories (proportions) to compare, we need a different test: the chi-square goodness-of-fit test.

 

Example  

Suppose that we run a lunch special in our restaurant and want to determine whether it makes a difference which day of the week we close. In other words, are all days equally frequented by customers? We tally the customers on each day and find:

Day          Mon   Tue   Wed   Thur   Fri   Sat   Sun
Customers     30    33    20    22     35    40    30

We have the following hypotheses, where pi is the proportion of customers who come on day i (i = 1 for Monday, ..., i = 7 for Sunday):

        H0: p1 = p2 = p3 = p4 = p5 = p6 = p7 = 1/7

        H1: At least one of the p's is not 1/7

Let α = 0.05.

The test statistic is again the chi-square statistic, denoted by the Greek letter χ². To compute it, we first need the expected count for each day under H0.

Notice that the total sample size is 210, so if H0 is true, the expected count for each day is

$$\frac{210}{7} = 30$$

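For each day, we compute (observed - expected)²/expected, and the test statistic is the sum of these terms:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

From Monday through Sunday, the terms are

$$\frac{(30-30)^2}{30} = 0, \quad \frac{(33-30)^2}{30} = 0.30, \quad \frac{(20-30)^2}{30} \approx 3.33, \quad \frac{(22-30)^2}{30} \approx 2.13,$$

$$\frac{(35-30)^2}{30} \approx 0.83, \quad \frac{(40-30)^2}{30} \approx 3.33, \quad \frac{(30-30)^2}{30} = 0$$

Adding these numbers gives

$$\chi^2 = \frac{0 + 9 + 100 + 64 + 25 + 100 + 0}{30} = \frac{298}{30} \approx 9.93$$

(Adding the terms after rounding each one first gives about 9.8; the conclusion below is the same either way.)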
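A short Python sketch of the same computation, using exact fractions so that no intermediate rounding occurs (the names `OBSERVED` and `goodness_of_fit_statistic` are illustrative, not from any library). It gives χ² = 298/30 = 149/15 ≈ 9.93; summing terms that were rounded first gives a slightly smaller value.

```python
from fractions import Fraction

# Customer tallies for Mon, Tue, Wed, Thur, Fri, Sat, Sun.
OBSERVED = [30, 33, 20, 22, 35, 40, 30]


def goodness_of_fit_statistic(observed, probabilities):
    """Sum of (O - E)^2 / E, where E = n * p for each category.

    Exact fractions are used so that nothing is rounded until the end.
    """
    n = sum(observed)
    total = Fraction(0)
    for o, p in zip(observed, probabilities):
        e = n * p  # expected count under H0
        total += (o - e) ** 2 / e
    return total


# Under H0 every day has probability 1/7, so each expected count is 210/7 = 30.
uniform = [Fraction(1, 7)] * len(OBSERVED)
chi_sq = goodness_of_fit_statistic(OBSERVED, uniform)
print(chi_sq, f"{float(chi_sq):.2f}")  # 149/15 9.93
```

Because the hypothesized probabilities are a parameter, the same function would also handle a non-uniform H0.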

The number of degrees of freedom is

        k - 1 = 7 - 1 = 6      (k is the number of categories, here the seven days)

From the chi-square table, the critical value of χ² with 6 degrees of freedom at α = 0.05 is 12.59. Since our computed statistic is below this value,

        χ²  <  12.59

we fail to reject H0: there is not enough evidence to conclude that the day of the week is a factor in lunch attendance.
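For even degrees of freedom the chi-square tail probability has a closed form: with df = 2m, P(χ² > x) = e^(-x/2) Σ_{j=0}^{m-1} (x/2)^j / j!. The Python sketch below uses it (our own helper, valid only for even df; a library routine such as SciPy's `scipy.stats.chi2.sf` would be the usual choice) to confirm that 12.59 is the 5% critical value for 6 degrees of freedom and that the lunch data give a p-value of about 0.13, well above α = 0.05.

```python
import math


def chi2_survival_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid only for even df.

    For df = 2m this equals exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


statistic = 298 / 30  # chi-square for the lunch data, about 9.93
print(f"{chi2_survival_even_df(12.59, 6):.2f}")      # 0.05: 12.59 is the 5% critical value
print(f"{chi2_survival_even_df(statistic, 6):.2f}")  # 0.13: larger than 0.05
```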


 

