What does linking data involve and why is it important?
Government departments routinely collect data on various aspects of life in the UK: children’s progress through the education system, information about benefits claimed and taxes paid, and individuals’ experiences of hospital treatment.
It is widely recognised that these data have immense potential value for research across a wide range of subject areas. The value of ‘administrative’ data for research can be even greater when securely linked, with participant consent, to longitudinal survey data.
What are the challenges?
Administrative data potentially provide a valuable resource for researchers, but they are often very complex and detailed. Work will be needed to construct simple, useful measures from administrative records that a wide range of researchers can use alongside survey data. Longitudinal data and administrative data will only be linked where study members have given explicit consent. It is therefore important to understand which factors influence consent, so that the linked data can be analysed in a way that takes account of potential biases.
What is CLOSER doing about it?
CLOSER is working on a coordinated and strategic approach to improving the links between these different types of data and making them more accessible to researchers. The Data Linkage work stream comprises the following work packages:
Work Package 5: Linkage to Administrative and Educational Data
This work package is now jointly led by Alissa Goodman, who leads the linkage to economic indicators, and Lorraine Dearden, who leads the linkage to education data and ensures alignment with the Administrative Data Research Centre for England, which is undertaking similar data linkage work. As part of the education data linkage project, the ‘consistent schools database’ (CSD) has been created to enable researchers to track the performance of schools more accurately and efficiently than ever before.
Work Package 6: Linkage to geographic data
This work package explores geographical contextual data of contemporary relevance to the survey data, e.g. indicators such as area deprivation, geo-demographics, unemployment, weather, pollution, access to services and migration distance measures based on residential addresses. It seeks to geocode cohort members’ addresses in a non-identifiable way so that cohort data can be linked with geo-contextual data. It is led by Chris Dibben at Edinburgh University, who is working with Zhiqiang Feng at St Andrews University.
Work Package 7: Linkage to health data – Hospital Episode Statistics
This work package is based at the Institute for Social and Economic Research (ISER), University of Essex, and is led by Michaela Benzeval with a team of researchers from ISER. Its aims are to examine issues around consent to linkage to health data among people in a general population survey, and to carry out a pilot linking survey data from consenting participants to Hospital Episode Statistics. In November 2014 the team published a paper in BMC Medical Research Methodology outlining some of the findings.
Work Package 8: Enabling data linkage in CLOSER studies
This project, led by Andy Boyd at ALSPAC, seeks to help cohort and longitudinal studies overcome barriers to data linkage. It will conduct an exemplar data linkage investigation as a case study of cross-cohort analysis, using primary care and education records linked to ALSPAC and Born in Bradford data. It will also develop and disseminate linkage methodology and guidance.
Work Package 14: Setting standards to maximise the scientific potential of primary care record linkage in longitudinal studies
This work package, led by Andy Boyd at ALSPAC, seeks to pool the experience, understanding and contacts gained through previous primary care record linkage work to help CLOSER and other UK longitudinal studies overcome technical barriers and optimise their research potential. The project will run a series of workshops and write a position paper focussing on data science issues rather than governance issues.
Work Package 21: A framework for linking and sharing social media data for high-resolution longitudinal measurement of mental health across CLOSER cohorts
Oliver Davis, University of Bristol, will lead a project to develop an open-source software framework for securely linking Twitter data. The project will then develop a software framework for archiving and sharing the information derived from these data. A proof-of-principle collection will share data in ALSPAC with Understanding Society and TEDS, and the results will be communicated at a workshop.