Additional Module 1 gives further information on continuous outcome data
|
Other types of data
At the beginning of Module 11 we listed a range of different types of outcome data. We dealt with dichotomous data in that module and in Module 12. Continuous data (including long ordinal scales) are addressed in Additional Module 1. Here we address the remaining types of data that you may plan as outcomes, or might come across in your included studies.
|
You can't treat count data directly as dichotomous data…
|
Counts of events
Count data are counts of occurrences measured on individuals. Examples include number of lesions, number of pregnancies, number of cigarettes smoked, number of strokes, or number of days in hospital. In Additional Module 2 on unit of analysis issues, you will see that one source of errors in meta-analysis is treating count data directly as dichotomous data. We cannot enter '12 strokes out of 28 people' as dichotomous data if any of the 28 people had more than one stroke.
How you might deal with count data depends on how common the events are, in two respects:
- Do most participants have events? (Will most people have at least one?)
- Do participants have lots of events? (Will individual people tend to have high counts?)
Answering each question with a 'yes' or a 'no' gives us four approximate classifications. Two of them have fairly obvious solutions; the other two do not.
|
|
…but you can treat them as continuous data…
… or dichotomize people to turn them into dichotomous data…
|
- Most people have counts that are mostly high.
An example of this might be number of days on which a person with arthritis has pain, or pulse rate (number of pulses in a given time period). These count data can usually be treated as continuous data, although the distribution may be skewed. See Additional Module 1.
- Only some people have the event and counts are mostly low.
Admissions to hospital and strokes may come into this class. A convenient way to analyze these data is to dichotomize people into those who have at least one event and those who have no events. The data can then be entered into RevMan. Alternatively, you might use the data as rate data (see below).
- Most people have counts that are mostly low.
Number of days off work with 'flu might fall into this class. These data can be awkward. They are most like ordinal data, so look in the section below for ideas.
- Only some people have counts that are mostly high.
The number of cigarettes smoked in a smoking cessation trial will likely be of this sort. This is another awkward type of count data. It may be wise to dichotomize people, for example, as smokers or non-smokers. If you are tempted to treat the high counts as continuous data, then remember that if substantial numbers of people have counts of zero, then the distribution of the outcome will be severely skewed.
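As a rough illustration of the dichotomizing option described above, the sketch below turns per-person counts into 'people with at least one event' out of 'people in the group'. The function name and the stroke counts are hypothetical, chosen only so that they add up to the '12 strokes out of 28 people' example; they are not data from any study.

```python
def dichotomize_counts(counts):
    """Return (people with at least one event, total people)."""
    with_event = sum(1 for c in counts if c >= 1)
    return with_event, len(counts)


# Hypothetical stroke counts for 28 people (illustrative only):
# 19 people with no stroke, 7 with one, 1 with two, 1 with three.
stroke_counts = [0] * 19 + [1] * 7 + [2] * 1 + [3] * 1

total_strokes = sum(stroke_counts)
events, n = dichotomize_counts(stroke_counts)

print(total_strokes)  # 12 strokes in total
print(events, n)      # 9 28
```

In this made-up group, entering '12 out of 28' would overstate the number of people affected; the dichotomous data are 9 out of 28.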
|
|
…or analyze them as rate data
|
When counts per unit time are of interest, then counts should be treated as rate data. A rate is simply a count per unit time, for example, 2 relapses per year. Rate data are extracted for a whole treatment group. To allow for the different times individual participants are followed up for, we come up with a total follow up time for all participants, by adding up the time for each participant. Data corresponding to the rate of 2 relapses per year might be extracted from a placebo group as '1042 relapses during 6247 person-months of follow-up'. These may be compared to '983 relapses during 6229 person-months' in an intervention group. The rate ratio from these numbers is
rate ratio = (983/6229) / (1042/6247) = 0.158 / 0.167 = 0.946

Activity: Convert the person-months in this rate ratio calculation to person-years. What happens?
A standard error for the (log) rate ratio is available, and the generic inverse-variance method for meta-analysis (not available in RevMan 4.1) may be used to combine rate ratios across studies. The facility for combining rate data in meta-analysis is planned for RevMan 4.2, due for release in 2003, but in earlier versions the best solution is to include the results as individual studies in an additional table.
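The calculation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used by RevMan: the standard error uses the usual large-sample approximation sqrt(1/E1 + 1/E2), where E1 and E2 are the event counts in the two groups (a formula not given in this module), the pooling function is a plain fixed-effect inverse-variance average, and the second study is entirely hypothetical. All function names are illustrative.

```python
import math


def rate_ratio(events_int, time_int, events_ctl, time_ctl):
    """Rate ratio (intervention vs control) and the SE of its logarithm."""
    rr = (events_int / time_int) / (events_ctl / time_ctl)
    # Assumed large-sample approximation for the SE of the log rate ratio.
    se_log_rr = math.sqrt(1 / events_int + 1 / events_ctl)
    return rr, se_log_rr


def pool_fixed_effect(log_estimates, standard_errors):
    """Generic inverse-variance (fixed-effect) pooled estimate and its SE."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# The example from the text: 983 relapses in 6229 person-months
# versus 1042 relapses in 6247 person-months.
rr, se = rate_ratio(983, 6229, 1042, 6247)
print(round(rr, 3))  # 0.946

# 95% confidence interval, computed on the log scale.
ci_low = math.exp(math.log(rr) - 1.96 * se)
ci_high = math.exp(math.log(rr) + 1.96 * se)

# A second, purely hypothetical study, added only to show the pooling step.
rr2, se2 = rate_ratio(50, 400, 60, 410)
pooled_log, pooled_se = pool_fixed_effect(
    [math.log(rr), math.log(rr2)], [se, se2]
)
pooled_rr = math.exp(pooled_log)
```

Working on the log scale keeps the confidence interval and the pooled estimate symmetric around the log rate ratio before converting back with `math.exp`.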
|
|
You can dichotomize ordinal data…
|
Short ordinal scales
Disease severity is commonly classified as 'none', 'mild', 'moderate' or 'severe'. Many assessment scales have only a few categories, say a score between 1 and 5. We often refer to such data as ordinal data. There are numerous approaches to their analysis. The simplest and usually the best is to try to find a cut-point in the scale and to create a dichotomous outcome, for example, 'none or mild' versus 'moderate or severe'. Sometimes, reviewers present more than one cut-point, such as also giving 'none' versus 'mild or moderate or severe' in a sensitivity analysis to investigate whether the choice of cut-point affects conclusions.
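The cut-point approach can be sketched as follows. This is only an illustration: the function name, the category counts and the choice of cut-points are hypothetical, and the 'event' is taken to be a severity at or above the cut-point.

```python
SEVERITY_ORDER = ["none", "mild", "moderate", "severe"]


def dichotomize_ordinal(category_counts, cut_point):
    """Count participants at or above cut_point; return (events, total)."""
    threshold = SEVERITY_ORDER.index(cut_point)
    events = sum(
        n for category, n in category_counts.items()
        if SEVERITY_ORDER.index(category) >= threshold
    )
    total = sum(category_counts.values())
    return events, total


# Hypothetical numbers of participants in each severity category.
treatment = {"none": 20, "mild": 15, "moderate": 10, "severe": 5}
control = {"none": 12, "mild": 14, "moderate": 14, "severe": 10}

# Main analysis: 'none or mild' versus 'moderate or severe'.
print(dichotomize_ordinal(treatment, "moderate"))  # (15, 50)
print(dichotomize_ordinal(control, "moderate"))    # (24, 50)

# Sensitivity analysis: 'none' versus 'mild or moderate or severe'.
print(dichotomize_ordinal(treatment, "mild"))      # (30, 50)
print(dichotomize_ordinal(control, "mild"))        # (38, 50)
```

Each pair of results can then be entered as ordinary dichotomous data, once for the main cut-point and once for the sensitivity analysis.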
|
|
…or treat them as continuous data…
|
Another approach is to treat ordinal data as continuous data. Thus we could assign 'none' = 1, 'mild' = 2, 'moderate' = 3 and 'severe' = 4 and take the mean and standard deviation. This is rarely a reasonable approach because it assumes that the assigned numbers represent a real measure of the outcome (i.e. that the difference between mild and moderate is exactly the same as the difference between moderate and severe), when in fact they are arbitrary.
There are more sophisticated methods for analysing these data, which avoid these assumptions, but they are not widely used, and not available in RevMan.
Censored data or survival data
In many situations, health care interventions aim to affect the time until an event happens. For example, we may aim to prolong disease-free survival in cancer, or extend the time to the next fit in epilepsy, or time to heart attack or stroke in people who have just had their first heart attack. The outcome that is measured on each patient in studies of such treatments may be a time until the event. When interest is focussed on time to the event rather than simply whether the event happens, we have survival data. Although time is a continuous outcome, survival data cannot be analysed in the same way as continuous data because we usually have some patients who have not yet experienced the event by the end of the study. For example, although everybody dies eventually, many patients will survive beyond the end of follow-up in a randomized trial. Patients who do not experience the event during follow-up have survival times that are censored. All we know about these patients is that they survived at least until the time when they were last observed.
|
|
You can dichotomize survival data…
One way to deal with survival data is to select particular points in time and determine whether each participant had experienced the event by each time. The resulting data are dichotomous and can be analyzed as such. For example, many infectious diseases lead to fever, and one marker of a successful treatment is a reduction in the length of fever. Trials of such treatments tend to assess fever at a specific time point, for example after five days. This avoids the problem of censoring and the analysis is straightforward, as long as you have all the data. In longer-term trials, one might create a series of dichotomous outcomes for, say, mortality such as (i) death within 3 months; (ii) death within 1 year; (iii) death within 3 years, and so on. However, this approach can only be used when all participants have been followed up to or beyond the time point used for the analysis, i.e. all participants have been in the study for at least as long as the time point.
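A minimal sketch of this approach is given below, assuming each participant is recorded as a hypothetical (time, had_event) pair, where the time is either the time of the event or the time last observed. The function refuses to produce a result if anyone was censored before the chosen time point, which mirrors the restriction described above. The data and names are illustrative only.

```python
def dichotomize_survival(participants, time_point):
    """Return (events by time_point, total participants).

    participants: list of (time, had_event) pairs, where time is the
    event time if had_event is True, otherwise the time last observed.
    Raises ValueError if anyone was censored before time_point.
    """
    events = 0
    for time, had_event in participants:
        if had_event and time <= time_point:
            events += 1
        elif not had_event and time < time_point:
            raise ValueError(
                "participant censored before the time point; "
                "a dichotomous analysis at this time is not valid"
            )
    return events, len(participants)


# Hypothetical follow-up data, with times in months.
group = [(2, True), (14, True), (36, False), (40, True), (36, False), (8, True)]

print(dichotomize_survival(group, 3))   # (1, 6)  death within 3 months
print(dichotomize_survival(group, 12))  # (2, 6)  death within 1 year
print(dichotomize_survival(group, 36))  # (3, 6)  death within 3 years
```

Asking for a time point of 48 months with these data raises an error, because two participants were last observed at 36 months and their status at 48 months is unknown.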
|
|
…or analyze them 'properly' using survival data techniques
|
In many specialties, the tradition is to analyze survival data using special methods that account for censoring. This is especially the case in cancer research. These methods include 'log-rank' tests, and 'proportional hazards regression' (or 'Cox regression'). These results can be used in meta-analyses. If the only data you can obtain are of this sort, then statistical expertise will be needed.