Chi2

Chi squared is the other statistical test you will come across in your first year...

I have already written a post on the t-test, and how it can be used to see if means from different data sets are significantly different. But the t-test only works with measurement data (bummer!). These are data that are continuously variable - i.e. they can take any numerical value within a range. For example, between 1mm and 2mm you can have any value, e.g. 1.5mm or 1.0000003mm or 1.8585mm...

So what about when you have data that are discontinuous? For example eye colour, motile or non-motile cells, Marmite lover or Marmite detester. In these cases we are dealing with numbers of individuals in defined classes - therefore these data can only have particular values with nothing in-between.

In these cases, the questions you might have are: Do my results reflect what I expected? Or, are my results significantly different from what I expected, suggesting that my expectation was wrong and something else is happening? If you have these questions, Chi squared comes to the rescue!

Chi squared can be used to measure how big the difference is between what you have got and what you expected.


The Chi squared formula goes like this:

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

image taken from: http://course1.winona.edu/sberg/Equation/chi-squa.gif
χ is the Greek letter Chi,
Σ means "add up over all of the categories",
o is the number of observed individuals in a category,
e is the number of individuals expected to be in a category.

The best way to explain how Chi squared works is with examples...

Example 1

Suppose you threw a dice 600 times and counted how many times you got a 1, 2, 3, 4, 5 and 6. You might expect to get each number 100 times assuming your dice is fair. In reality you get different values from what you expect - these are your observed values:






The question is, do your observed values agree with what you expected, or are they significantly different from what you expected, in which case you have an unfair/weighted dice? Let's put the Chi squared equation to some use to find out...



So, in the above table I have figured out a Chi squared value for my data. If you do this in a table like I have, you can easily break the formula at the top of this post down into nice manageable chunks. So, first you take all of the expected values away from your actual observed values (o - e), then you square the value you get, and then you divide this value by your expected value. For the last stage you add up all the values in the last row to get your Chi squared value of 26.
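The steps described above can be sketched in a few lines of Python. This is only an illustration: the observed counts below are made up for the sketch (they are not the values from the table in this post), and the function name chi_squared is my own choice.

```python
# Hypothetical counts for 600 throws of a dice.
# These are invented for illustration and are NOT the data from the post.
observed = [90, 110, 95, 105, 120, 80]
expected = [100] * 6  # a fair dice: 600 throws shared equally over 6 faces


def chi_squared(observed, expected):
    """Add up (o - e)^2 / e over all categories."""
    total = 0.0
    for o, e in zip(observed, expected):
        difference = o - e          # step 1: observed minus expected
        squared = difference ** 2   # step 2: square the difference
        total += squared / e        # step 3: divide by expected, keep a running sum
    return total


chi2 = chi_squared(observed, expected)
print(f"Chi squared = {chi2:.2f}")  # prints "Chi squared = 10.50"
```

Each pass through the loop corresponds to one column of the working table, and the running total is the sum of the last row.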

.... but what does a Chi squared value of 26 actually mean??? Well this is where you need to consult a table that shows values of Chi squared you could have from data with different degrees of freedom and with different associated probabilities:


Degrees of     Probability, p
Freedom        0.99     0.95     0.05     0.01     0.001
1              0.000    0.004    3.84     6.64     10.83
2              0.020    0.103    5.99     9.21     13.82
3              0.115    0.352    7.82     11.35    16.27
4              0.297    0.711    9.49     13.28    18.47
5              0.554    1.145    11.07    15.09    20.52
6              0.872    1.635    12.59    16.81    22.46
7              1.239    2.167    14.07    18.48    24.32
8              1.646    2.733    15.51    20.09    26.13


To use this table you need to know how many degrees of freedom there were in your experiment. This is simply found by taking 1 from the number of categories you had - so for our example we had 6 categories, so 5 degrees of freedom. Once you know that, you will only look in the row of Chi squared values next to the 5 degrees of freedom category:

Degrees of     Probability, p
Freedom        0.99     0.95     0.05     0.01     0.001
5              0.554    1.145    11.07    15.09    20.52


In Biology we often set the p (probability) value we are going to use before we start an experiment, and most often the value of 0.05 is chosen. If you look in the table, you will see that at a p-value of 0.05 and 5 degrees of freedom, you have a value of 11.07. If you compare the Chi squared value we have calculated (26) to 11.07, you can see that it is a bigger number, and so would be placed to the right of 11.07 in the table (in fact it is bigger than 20.52, the value in the 0.001 column). This indicates a significant result, i.e. the results were significantly different from what we expected, indicating our dice was biased.
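If you want to see where a Chi squared value "sits" in the row for 5 degrees of freedom, the lookup can be sketched in Python like this. The numbers in the row are copied from the table in this post; the function name smallest_p_exceeded and the test value 12.0 are my own assumptions for illustration.

```python
# Column headings and the row for 5 degrees of freedom, copied from the table.
P_VALUES = [0.99, 0.95, 0.05, 0.01, 0.001]
ROW_DF5 = [0.554, 1.145, 11.07, 15.09, 20.52]


def smallest_p_exceeded(chi2, p_values=P_VALUES, row=ROW_DF5):
    """Return the smallest tabulated p whose table value chi2 is bigger than.

    Returns None if chi2 is not bigger than any value in the row.
    """
    result = None
    for p, table_value in zip(p_values, row):
        if chi2 > table_value:
            result = p  # keep moving right along the row
    return result


print(smallest_p_exceeded(26))    # 0.001 -> to the right of every value in the row
print(smallest_p_exceeded(12.0))  # 0.05  -> hypothetical value, just past 11.07
```

With a cut off of 0.05, any answer of 0.05 or smaller means the result is significant.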

Example 2

Let's look at this again, with data from a different dice:


This time our Chi squared value is 1.96...

Again you would compare this value to 11.07 in the table above. Now the Chi squared value we have calculated is smaller than 11.07, and so we conclude that what we observed does not differ significantly from what we expected, i.e. we have no evidence that our dice was unfair this time.

Tips

In Biology we tend to use 5% as our cut off probability, and so we mainly focus on the 0.05 column in statistical tables:

Degrees of     Probability, p
Freedom        0.05
1              3.84
2              5.99
3              7.82
4              9.49
5              11.07
6              12.59
7              14.07
8              15.51


Are you finding this all very confusing? Just remember:

  1. Find the row with the correct number of degrees of freedom.
  2. Find the value in this row that corresponds to a probability of 0.05.
  3. If your Chi squared value is higher than the value in the table, you know your results are deviating significantly from what you expected - i.e. what you expected is likely to be wrong.
  4. If your Chi squared value is lower than the value in the table, then your results do not deviate significantly from what you expected them to be - i.e. your expectation is likely to have been correct.
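The four steps above can also be written as a small Python sketch. The values in the dictionary are the 0.05 column of the table in this post; the names CRITICAL_AT_0_05 and is_significant are just ones I have picked for this illustration.

```python
# The 0.05 column of the table: degrees of freedom -> table value.
CRITICAL_AT_0_05 = {
    1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49,
    5: 11.07, 6: 12.59, 7: 14.07, 8: 15.51,
}


def is_significant(chi2, n_categories):
    """Follow the four steps for a cut off probability of 0.05."""
    degrees_of_freedom = n_categories - 1                  # step 1: pick the row
    table_value = CRITICAL_AT_0_05[degrees_of_freedom]     # step 2: 0.05 column
    return chi2 > table_value                              # steps 3 and 4: compare


print(is_significant(26, 6))    # True  (Example 1: 26 is higher than 11.07)
print(is_significant(1.96, 6))  # False (Example 2: 1.96 is lower than 11.07)
```

The sketch only covers 1 to 8 degrees of freedom because that is all the table here lists; more categories would need a bigger table.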
