# Meta-analysis of continuous data

## Measuring the effect of treatment

### Combining continuous outcomes

Meta-analyses involving continuous outcomes are based on comparing means. The basic way of comparing outcomes from two treatment groups is to look at the difference between the means of the two groups. This difference in means, and its standard error, can be calculated from the six numbers listed above. Given this standard error we can assign each trial a weight and use the inverse-variance method of meta-analysis to obtain a summary (combined) mean difference and its confidence interval. Fixed-effect and random-effects methods for doing this are available in RevMan, where you need only enter the six basic numbers from each study.
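The calculation behind this can be sketched in a few lines. This is a minimal illustration of the fixed-effect inverse-variance method, not RevMan's implementation; the two trials' summary numbers are made up for the example:

```python
import math

def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference and its standard error from the six summary numbers."""
    md = m1 - m2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return md, se

def inverse_variance_pool(results):
    """Fixed-effect pooling: weight each trial by 1/SE^2."""
    weights = [1 / se**2 for _, se in results]
    pooled = sum(w * md for (md, _), w in zip(results, weights)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, se_pooled, ci

# Hypothetical trials: (mean, SD, n) for treatment arm, then control arm.
trials = [(12.0, 4.0, 50, 10.0, 4.0, 50),
          (11.5, 5.0, 80, 10.5, 5.0, 80)]
results = [mean_difference(*t) for t in trials]
pooled, se_pooled, ci = inverse_variance_pool(results)
```

Larger trials, and trials with less variable outcomes, have smaller standard errors and therefore receive more weight in the pooled estimate.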

### Combining outcomes measured on different scales

The meta-analysis of differences between means from different trials relies on the outcome being measured in the same units in every trial: we cannot combine a difference in mean weight loss measured in kilograms with one measured in pounds. If you know the multiplication factor to convert from one scale to another (for example, how many pounds there are in a kilogram), you should convert all the data directly to the same units. However, we cannot combine two different psychometric scales, even if both measure depression, because no such conversion factor is known. A way around this is to compare standardized mean differences rather than actual means.

The standardized mean difference is the difference in means divided by a standard deviation. This standard deviation is the pooled standard deviation of participants’ outcomes across the whole trial. Note that it is not the standard error of the difference in means (a common confusion).
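For concreteness, here is a minimal sketch of the calculation from the six summary numbers, assuming the usual pooling of the two arms' variances weighted by their degrees of freedom (function names are illustrative, not from any package):

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation across the two trial arms,
    weighting each arm's variance by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def smd(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference: difference in means / pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)
```

Note that the denominator is the pooled SD of participants' outcomes, not the standard error of the difference in means; with equal arm sizes the standard error is smaller by roughly a factor of the square root of half the total sample size.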

The standardized mean difference has the important property that its value does not depend on the measurement scale. For example, consider a trial evaluating an intervention to increase birth weight. The mean birth weights in intervention and control groups were 2700g and 2600g with an average SD of 500g. The SMD will be

(2700 – 2600)/500 = 0.2

If the trial had measured birth weight in ounces, the results would be means of 95oz and 92oz with an average SD of 15oz. The SMD will be

(95 – 92)/15 = 0.2

– the same value as the analysis based on grams.
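This invariance is easy to check directly: rescaling every measurement by a factor k (here, grams to ounces) multiplies both the difference in means and the SD by k, so k cancels:

```python
def smd_simple(m1, m2, sd):
    """SMD when a single average SD is reported for the trial."""
    return (m1 - m2) / sd

# Birth-weight example from the text: grams and ounces give the same SMD,
# because (k * difference) / (k * SD) = difference / SD.
smd_grams = smd_simple(2700, 2600, 500)
smd_ounces = smd_simple(95, 92, 15)
```

Both evaluate to 0.2 standard deviations.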

So, if we have several trials assessing the same outcome but using different scales, we use the standardized mean difference to convert all outcomes to a common scale, measured in units of standard deviations. But what is the interpretation of the standardized mean difference? That is a good question, and one that troubles statisticians and health care decision makers. What it actually measures is the number of standard deviations between the means, a quantity that is not directly usable in clinical practice.

### Interpreting the standardized mean difference

Let us consider the birth weight example. We can view the number of standard deviations' difference as a 'standardized', or dimensionless, form of the actual findings. The value of 0.2 is the number of SDs by which the intervention changes the outcome: if the outcome is measured in grams (where the SD is 500g) it changes by 0.2 × 500 = 100g; if it is measured in ounces (where the SD is 15oz) it changes by 0.2 × 15 = 3oz.

In practice, of course, we would not want to use the SMD method to analyse birth weight as we are able to convert between units of measurement using an exact conversion factor. However, we commonly have to use it when different measurement tools (e.g. scales) are used to measure the same clinical outcome.

For example, suppose a potential treatment for depression in the elderly achieves an average improvement of 2 points on the Hamilton Rating Scale for Depression (HAMD), and suppose that the pooled standard deviation of HAMD scores is 8. The standardized mean difference is then 2/8 = 0.25. If a similar treatment effect were observed on an alternative depression scale, say the Geriatric Depression Scale (GDS), which has a standard deviation of 5 points, then a standardized mean difference of 0.25 would be equivalent to an improvement of 1.25 points on the GDS.
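The two-step conversion above can be sketched as follows, using the numbers from the text (`smd_to_scale` is an illustrative helper, not a function from RevMan or any package):

```python
def smd_to_scale(smd_value, scale_sd):
    """Re-express a standardized mean difference in a given scale's own units."""
    return smd_value * scale_sd

hamd_effect = 2.0                      # average improvement in HAMD points
hamd_sd = 8.0                          # pooled SD of HAMD scores
smd_hamd = hamd_effect / hamd_sd       # 2 / 8 = 0.25 SDs

gds_sd = 5.0                           # assumed SD of GDS scores
gds_effect = smd_to_scale(smd_hamd, gds_sd)  # 0.25 * 5 = 1.25 GDS points
```

Dividing by one scale's SD and multiplying by the other's is what lets the SD act as the common currency between scales.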

We must be careful when using the standardized mean difference, however. First, we must be sure that the different measurement scales are indeed measuring the same clinical outcome. Second, problems can arise from the use of the pooled standard deviation for standardization. To illustrate the latter, let us return to our study with a 2-point improvement in HAMD score (pooled SD = 8). Imagine a second study in the same meta-analysis that also used the HAMD but had more restrictive inclusion criteria. The tight inclusion criteria meant that participants were more similar to each other, and their pooled standard deviation in HAMD scores was only 5. Imagine further that the drug was equally effective in this study, also achieving a 2-point average improvement in HAMD score. The standardized mean difference for this study is 2/5 = 0.4. The same treatment effect therefore gives a different standardized mean difference purely because of the tighter inclusion criteria. This is an unfortunate implication of using standardized mean differences. Nevertheless, if studies use different scales, there are usually few alternatives to the standardized mean difference for combining results in a meta-analysis.

Finally, we should point out that in RevMan and commonly in The Cochrane Library, the mean difference method is referred to as ‘WMD’ (weighted mean difference) and the standardized mean difference method as ‘SMD’.