The result of the test is (i) a ‘chi-squared’ statistic; (ii) a number called the degrees of freedom (usually one less than the number of studies, but it can be lower if some of the studies have no events, as in the example above); and (iii) a ‘p-value’ obtained by referring the first two numbers to statistical tables. A small p-value is often taken as evidence of heterogeneity.
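To make the three outputs concrete, here is a minimal sketch in Python. It assumes the inverse-variance form of the statistic (often called Cochran's Q), which is one common way of calculating it; the text above does not say which formula is used. The effect estimates and standard errors are invented for illustration, and the helper names `cochran_q` and `chi2_sf` are hypothetical. The sketch also assumes every study contributes an estimate, so the degrees of freedom are simply the number of studies minus one.

```python
import math


def cochran_q(effects, std_errors):
    """Return (statistic, degrees of freedom) using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution, integer df >= 1.

    This replaces the lookup in statistical tables. It starts from the
    closed form for df = 1 or df = 2 and steps up two degrees of freedom
    at a time.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, sf = 2, math.exp(-half)
    else:
        k, sf = 1, math.erfc(math.sqrt(half))
    while k < df:
        sf += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return sf


# Invented data: four studies reporting log odds ratios and standard errors.
effects = [-0.40, -0.10, -0.65, 0.20]
std_errors = [0.20, 0.25, 0.30, 0.35]

q, df = cochran_q(effects, std_errors)
p = chi2_sf(q, df)
print(f"chi-squared = {q:.2f}, df = {df}, p-value = {p:.3f}")
```

In practice, review software reports all three numbers, so a calculation like this is only needed to see where they come from.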
As it applies to Cochrane reviews, this test is of somewhat limited value, because most meta-analyses in Cochrane reviews contain very few studies. When there are few studies, the test is not very good at detecting heterogeneity even when it is present (it has 'low power'). For this reason, a p-value of less than 0.10 is often used to indicate heterogeneity rather than the conventional cut-point of p = 0.05.
Conversely, if there are a lot of studies in a meta-analysis, the test can be too good at detecting heterogeneity. Since we have established that some heterogeneity is almost certain to be present, as studies are rarely identical, the test will detect statistically significant heterogeneity even if it is clinically trivial (the test has too much power). The basic problem, however, is that the test does not answer a useful question. It asks 'Is there heterogeneity?' whereas we want to know 'How much heterogeneity is there?'
A useful way to identify heterogeneity without having to use statistical tables to look up p-values is to compare the chi-squared statistic with its degrees of freedom. If there were no heterogeneity, the statistic would be expected to be about equal to its degrees of freedom, so a statistic bigger than its degrees of freedom gives some evidence of heterogeneity. A visual inspection of the confidence intervals will also help you get an idea of the amount of statistical heterogeneity, and guide you to think about whether it is reasonable to combine the results of these studies.
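The comparison with the degrees of freedom works because, when the studies all estimate the same underlying effect, the statistic averages out at its degrees of freedom. The following sketch checks this by simulation. It assumes the inverse-variance form of the statistic and uses invented standard errors; the function names `q_statistic` and `mean_q_under_homogeneity` are hypothetical.

```python
import random


def q_statistic(effects, std_errors):
    """Inverse-variance chi-squared statistic for heterogeneity."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def mean_q_under_homogeneity(std_errors, true_effect=0.0, n_sim=5000, seed=1):
    """Average the statistic over simulated meta-analyses in which every
    study estimates the same true effect (no heterogeneity)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        # Each study's estimate differs from the truth by sampling error only.
        effects = [rng.gauss(true_effect, se) for se in std_errors]
        total += q_statistic(effects, std_errors)
    return total / n_sim


# Invented standard errors for six studies.
std_errors = [0.20, 0.25, 0.30, 0.35, 0.15, 0.40]
df = len(std_errors) - 1
mean_q = mean_q_under_homogeneity(std_errors)
print(f"df = {df}, average statistic with no heterogeneity = {mean_q:.2f}")
```

Because chance alone produces values scattered around the degrees of freedom, a statistic only slightly above them is weak evidence, and the comparison is best treated as a rough guide.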
