Different types of data
There are several different types of data you may come across in your included trials. Some of the more common data types are:
- Dichotomous data, which are data from outcomes that can be divided into two categories (e.g. dead or alive, pregnant or not pregnant), where each participant must be in one or the other category and cannot be in both
- Counts of events (for example the number of epileptic fits)
- Short ordinal scales, that is, scales with a small number of categories that have a natural order (for example a pain scale of “none/mild/moderate/severe”)
- Long ordinal scales, that is, scales with a large number of categories that have a natural order (for example the Short Form-36 scale for assessing quality of life, or a depression index)
- Continuous data, which are data from outcomes measured on a continuous scale (for example blood pressure or the range of motion of a knee joint)
- Censored data or survival data (such as time to recurrence of cancer, where we will not have a measurement on everyone at the end of the study because some participants have not had a recurrence)
This module and the next discuss issues relating to dichotomous data, as most Cochrane reviews contain data in this form. If you have continuous outcomes in your review, you will need to complete Additional Module 1 after you have completed Modules 11 and 12.
We often create our own dichotomous data from outcomes that are not truly dichotomous, so that they are easier to manage and understand. For example, blood cholesterol (measured on a continuous scale) can be converted to ‘high cholesterol’ or ‘not high cholesterol’ by dichotomising around a clinical threshold above which you would consider the cholesterol to be high. Similarly, pain measured on a short ordinal scale can be converted to ‘absent or mild’ or ‘not absent or mild’ (by which we mean moderate or severe). Long ordinal scales, or scales with a large number of discrete categories, are generally treated as continuous data for the purpose of analysis.
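The two conversions described above can be sketched in a few lines of Python. This is only an illustration: the cholesterol readings, the cut point of 6.2 and the function names are assumptions made for the example, not values recommended by this module.

```python
def dichotomise_continuous(values, threshold):
    """Label each measurement 'high' if it is above the threshold, else 'not high'."""
    return ["high" if v > threshold else "not high" for v in values]


def dichotomise_pain(scores):
    """Collapse a four-category pain scale into two groups."""
    grouping = {
        "none": "absent or mild",
        "mild": "absent or mild",
        "moderate": "moderate or severe",
        "severe": "moderate or severe",
    }
    return [grouping[s] for s in scores]


# Hypothetical cholesterol readings and an assumed cut point (illustration only)
cholesterol = [4.2, 5.8, 6.9, 5.0, 7.3]
THRESHOLD = 6.2

cholesterol_groups = dichotomise_continuous(cholesterol, THRESHOLD)

# Hypothetical pain scores on the short ordinal scale
pain = ["none", "severe", "mild", "moderate"]
pain_groups = dichotomise_pain(pain)

print(cholesterol_groups)
print(pain_groups)
```

In both cases the result places every participant in exactly one of two categories, which is what makes the converted outcome dichotomous.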
Sometimes censored data are converted into dichotomous data by counting the number of people who have had the event by a particular time (such as the number of people who have a recurrence of cancer within 5 years of an operation). This should only be done when all participants have been followed up to that time point, because a participant whose follow-up ended earlier without the event cannot be placed in either category.
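As a rough sketch of this conversion, the hypothetical function below counts events by a chosen time point and refuses to return a count if any participant was censored before that time. The record layout (a time in years paired with a flag saying whether the event occurred) and the example data are assumptions made for illustration.

```python
def events_by_time(participants, time_point):
    """Count participants who had the event by time_point.

    participants: list of (time, had_event) tuples, where time is the time
    to the event if had_event is True, otherwise the time of last follow-up.
    Returns (number_with_event, total), or raises ValueError if anyone was
    censored before time_point.
    """
    events = 0
    for time, had_event in participants:
        if had_event and time <= time_point:
            events += 1
        elif not had_event and time < time_point:
            # Follow-up ended early without the event: status at time_point unknown
            raise ValueError("participant censored before the time point")
        # Otherwise the participant was event-free at time_point
    return events, len(participants)


# Hypothetical data: years to recurrence, or years of follow-up without recurrence
trial_arm = [(2.0, True), (5.0, False), (6.5, True), (7.0, False)]
result = events_by_time(trial_arm, 5)
print(result)
```

Note that the participant with a recurrence at 6.5 years counts as event-free at 5 years, whereas a participant lost to follow-up at, say, 3 years would make the conversion invalid.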
The benefits of converting non-dichotomous data into dichotomous data relate to ease of analysis and interpretation. Of these, the more important is ease of interpretation. Dichotomous outcomes may be easier for decision makers to understand and make judgements about.
The downside of converting other forms of data to a dichotomous form is that information about the size of the effect may be lost. For example, a participant’s blood pressure may have fallen when measured on a continuous scale (mmHg), but if it has not fallen below the cut point they will still be in the ‘high blood pressure’ group and you will not see this improvement. In addition, dichotomising continuous data requires an appropriate clinical cut point about which to ‘split’ the data, and this may not be easy to determine.
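The loss of information can be seen in a small numerical sketch. The blood pressure readings and the cut point of 140 mmHg below are invented for illustration and are not taken from any trial.

```python
CUT_POINT = 140  # mmHg, assumed cut point for 'high blood pressure'


def is_high(pressure, cut_point=CUT_POINT):
    """A reading counts as 'high' unless it is below the cut point."""
    return pressure >= cut_point


# Hypothetical systolic readings (mmHg) for three participants
before = [170, 165, 150]
after = [150, 138, 145]

# Continuous view: every participant improved
changes = [a - b for b, a in zip(before, after)]

# Dichotomous view: only one participant leaves the 'high' group
still_high = [is_high(a) for a in after]

print(changes)
print(still_high)
```

All three participants show a fall in blood pressure on the continuous scale, but two of them remain in the ‘high blood pressure’ group, so the dichotomous outcome records an improvement for only one.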