Sunday, March 15, 2015

Things to check when using the chi-square test to evaluate independence of variables


  • Chi-square Test 
    • tests the null hypothesis that the variables are independent
    • is designed to analyze categorical data (i.e., data that has been counted and divided into categories)
    • only tests whether the observed counts are consistent with the variables being independent. It can't tell you anything about the nature or strength of the relationship between them. If the chi-square test indicates that the two variables are related, you can use other methods, such as the odds ratio, to explore their interaction in more detail.
    • the categories of each variable must be mutually exclusive (no item should be counted twice, and the counts in all of your cells should add up to the total count)
    • never exclude part of the data set. For example, if your study examined males and females registered as Republican, Democrat, and Independent, then excluding one category from the grid might conceal critical information about the distribution of your data.
    • the expected count in any given cell should not be below 5. If it is, consider other techniques...
                Example: We have a complete data set on the distribution of 1000 individuals into categories of professional level (pro/non-pro) and country category (OECD/Non-OECD). A chi-square test lets you check whether the observed counts are consistent with professional level and country category being independent.
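
The checks above can be sketched in a few lines of plain Python. This is only an illustration: the counts in `observed` are made up for the pro/non-pro by OECD/Non-OECD example (they are not real data), the function names are my own, and the p-value shortcut is valid only for a 2x2 table (1 degree of freedom) with no continuity correction applied.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def p_value_2x2(statistic):
    """Upper-tail p-value for 1 degree of freedom only (i.e., a 2x2 table)."""
    return math.erfc(math.sqrt(statistic / 2))


def odds_ratio_2x2(table):
    """Odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts for 1000 individuals (assumed for illustration only).
# Rows: OECD, Non-OECD. Columns: pro, non-pro.
observed = [[300, 200],
            [200, 300]]

expected = expected_counts(observed)

# Rule of thumb from the list above: no expected count below 5.
if min(min(row) for row in expected) < 5:
    print("Some expected counts are below 5; consider other techniques.")

stat = chi_square_statistic(observed)
print("chi-square statistic:", stat)
print("p-value (df = 1):", p_value_2x2(stat))
print("odds ratio:", odds_ratio_2x2(observed))
```

A small p-value means the counts are unlikely under independence; the odds ratio then gives a first idea of how strong the association is, which the chi-square test alone does not.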






  • References
    1. http://math.hws.edu/javamath/ryan/ChiSquare.html
    2. http://www.ling.upenn.edu/~clight/chisquared.htm