Senate House, London - 9th September 2015
The aim of this workshop was to provide an overview of the opportunities and challenges inherent in cross-cohort research, with a particular focus on data harmonisation.
The rationale behind this event was the belief in the value of cross-cohort comparisons – that is, the ability to compare findings from different cohort studies. Such comparisons allow the findings from one study to be tested and replicated, and more robust conclusions to be reached. Comparisons of longitudinal studies that differ by birth cohort or country provide opportunities for understanding the influence of different contexts. Harmonisation of data facilitates pooling of data across multiple studies to increase statistical power and allows cross-cohort comparisons of results in different contexts.
Harmonising data in order to make valid comparisons between studies is challenging. The same can be true of harmonising data across different waves of the same study (for example, when measurement approaches and instruments change). There is no well-established standard procedure for the retrospective harmonisation of data. There are also different approaches to the analysis of cross-cohort data, ranging from pooling the data in a single dataset, through a two-step meta-analysis, to coordinated independent analyses of the different datasets.
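As a rough illustration of the two-step approach mentioned above, the sketch below pools study-specific estimates with inverse-variance (fixed-effect) weights. It is a minimal example under stated assumptions, not a method presented at the workshop: the function name `fixed_effect_meta` and the estimates and standard errors are hypothetical, and step 1 (fitting the same model separately in each cohort) is assumed to have been done already.

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Step 2 of a two-step meta-analysis: pool study-specific estimates
    using inverse-variance (fixed-effect) weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical step-1 output: one regression coefficient and standard
# error per cohort, each estimated separately on that cohort's own data.
estimates = [0.10, 0.20, 0.30]
std_errors = [0.1, 0.1, 0.1]

pooled, pooled_se = fixed_effect_meta(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

Because only summary estimates leave each study, this approach needs the exposure and outcome to be harmonised well enough for the coefficients to be comparable, but it does not require a single pooled dataset.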
William Johnson (MRC HNR) and Claire Crawford (Institute for Fiscal Studies) presented work carried out within CLOSER to harmonise body size and socioeconomic variables, respectively, across multiple UK cohorts. They demonstrated how these variables can be used to address relevant research questions about the development of the obesity epidemic and generational changes in social mobility. Researchers from successful external cross-cohort initiatives were invited to provide insight into projects across multiple disciplines. Jenny Head (UCL) talked about her work on a cross-country comparison of ageing cohorts (IDEAR) and Extending Working Life (renEWL). Susan Hodgson (Imperial College London) described her experience of work within BioSHaRE linking environmental variables such as noise and pollution to health, and the use of DataSHIELD for analysis. Both speakers were asked to outline their approaches to data harmonisation and cross-cohort analysis and to highlight how any challenges were overcome. To finish the day, Graciela Muniz-Terrera (MRC LHA) outlined the different approaches to analysis in cross-cohort studies and the implications of each approach for data harmonisation.
Several key themes emerged from the workshop and are listed below.
- The level to which data need to be harmonised depends on the scientific question under investigation and the studies included. Different levels of harmonisation will be appropriate depending on whether the aim is to compare means or prevalence estimates, or whether it is to investigate associations between a risk factor and an outcome.
- The documentation and meta-data of harmonised datasets are vital. Even where harmonised variables or datasets exist, researchers need to be able to consider whether the data are acceptable for their specific scientific question.
- Harmonisation is an iterative process. The initial conceptual model of harmonisation often has to be modified once the data are obtained. The input of someone with expertise in the specific variables being harmonised is very useful.
- External data sources can be helpful. Existing data from surveys can be used to check harmonised variables, and results from calibration studies comparing measurements using different machines or tests can be very useful.
- Checking the robustness of scientific findings is important. Sensitivity analyses should be used where possible to check how far the conclusions depend on decisions made during the harmonisation process.
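The last theme can be illustrated with a small sketch of a sensitivity analysis on a harmonisation decision. Everything here is hypothetical and for illustration only: the education codes, the two mappings (`MAIN_MAP`, `ALT_MAP`) and the toy data are assumptions, not variables from any of the studies discussed. The idea is simply to recode each cohort's own categories to a common variable under two defensible mappings and compare the resulting estimates.

```python
# Hypothetical mappings from each cohort's own education codes to a common
# two-level variable. "diploma" is treated as ambiguous, so an alternative
# mapping is defined that classifies it differently.
MAIN_MAP = {
    "none": "low", "o_level": "low", "a_level": "low", "degree": "high",
    "primary": "low", "secondary": "low", "diploma": "low", "university": "high",
}
ALT_MAP = {**MAIN_MAP, "diploma": "high"}

def harmonise(values, mapping):
    """Recode source categories to the common coding."""
    return [mapping[v] for v in values]

def share_high(values, mapping):
    """Proportion classified as 'high' after harmonisation."""
    harmonised = harmonise(values, mapping)
    return harmonised.count("high") / len(harmonised)

# Toy data standing in for two cohorts with different source codings.
cohort_a = ["none", "o_level", "a_level", "degree"]
cohort_b = ["primary", "diploma", "diploma", "university"]

for name, data in [("A", cohort_a), ("B", cohort_b)]:
    main = share_high(data, MAIN_MAP)
    alt = share_high(data, ALT_MAP)
    print(f"cohort {name}: main = {main:.2f}, alternative = {alt:.2f}")
```

In this toy example the estimate for cohort A is unchanged while the estimate for cohort B shifts substantially, which is the kind of dependence on a harmonisation decision that a sensitivity analysis is meant to expose.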
Download and view the event presentations
Session 1: CLOSER data harmonisation
Harmonisation of socio-economic resources (slides only)
Claire Crawford, Institute for Fiscal Studies – video
Principles of data harmonisation (small group work)
Discussion facilitators: Graciela Muniz-Terrera and Rebecca Hardy, MRC Unit for Lifelong Health and Ageing at UCL
Session 2: Cross-cohort research: examples
Approaches to cross-national research on health and employment in later life: the renEWL and IDEAR studies
Jenny Head, Department of Epidemiology and Public Health (UCL)
BioSHaRE Environmental Determinants of Health project; opportunities and challenges of cross-cohort working (slides only)
Susan Hodgson, MRC-PHE Centre for Environment & Health (Imperial College London) – video
Graciela Muniz-Terrera, MRC Unit for Lifelong Health and Ageing at UCL – video
Publications referred to in the talks
How Has the Age-Related Process of Overweight or Obesity Development Changed over Time? Co-ordinated Analyses of Individual Participant Data from Five United Kingdom Birth Cohorts – William Johnson, Leah Li, Diana Kuh, Rebecca Hardy
Development of NO2 and NOx land use regression models for estimating air pollution exposure in 36 study areas in Europe – The ESCAPE project – Beelen R, Hoek G, Vienneau D et al.
Development of Land Use Regression Models for PM2.5, PM2.5 Absorbance, PM10 and PMcoarse in 20 European Study Areas; Results of the ESCAPE Project – Eeftens M, Beelen R, de Hoogh K et al.
Western European Land Use Regression Incorporating Satellite- and Ground-Based Measurements of NO2 and PM10 – Vienneau D, de Hoogh K, Bechle MJ et al.
International scale implementation of the CNOSSOS-EU road traffic noise prediction model for epidemiological studies – Morley DW, de Hoogh K, Fecht D et al.
Data harmonization and federated analysis of population-based studies: the BioSHaRE project – Doiron D, Burton P, Marcon Y, et al.
Long-term exposure to air pollution and cardiovascular mortality: an analysis of 22 European cohorts – Beelen R, Stafoggia M, Raaschou-Nielsen O et al.
DataSHIELD: taking the analysis to the data, not the data to the analysis – Gaye A, Marcon Y, Isaeva J et al.
Coordinated analysis of age, sex and education on change in MMSE scores – Piccinin et al.
Advantages of Integrative Data Analysis for Developmental Research – Bainter & Curran