The discoverability of data collected by the UK’s longitudinal studies has come a long way since 2015, argues CLOSER’s Senior Metadata Officer Hayley Mills, who charts the past, present and future of finding study data.
Previously, trying to find out what data had been collected by the eight longitudinal studies in CLOSER involved accessing and searching through a variety of different forms of documentation, often varying by study. It was an involved and time-consuming process for researchers and those working in policy. For those new to the studies this was even more daunting, as understanding the vast amount of data collected could be overwhelming.
If you ran a study, you may have had to handle incoming requests from researchers who wanted to browse the available data, and invest time in trying to find the most appropriate variables for each request. Managing longitudinal data is not easy, with studies having to maintain complex and diverse data, as well as make these available for several decades – over 70 years in some cases. This longevity brings with it potential risks: for example, loss of knowledge over time through staff changes or relocation. In addition, many of the systems may not have been built for interoperability, potentially resulting in duplication of work and information when working with others.
With the digital age beginning decades ago, and new researchers growing up in the age of Google, expectations about how we find and manage data have changed. So we set out to create CLOSER Discovery on behalf of the longitudinal community, to bring the discovery of longitudinal data in line with current technologies and to address some of these challenges.
Our first and fundamental goal was to make it easier – in some cases even possible – to search the data collected from the UK’s longitudinal studies, as well as the related survey questions and measurements. We also wanted to provide sufficient information to understand those underlying questions and measurements, and the context within which information was collected. Finally, we wanted it to be as future-proof as possible, adhering to the best metadata standards for interoperability with other infrastructures, both in the UK and internationally.
Three ways CLOSER Discovery makes your life easier
In addition to the benefits of discoverability, CLOSER Discovery and its infrastructure have the potential to be a powerful tool and resource for researchers and data managers.
- It facilitates cooperation
Using CLOSER Discovery, researchers can now find all the information in one place. Discovery provides detailed information about variables, data collection instruments and the data collection process, and shows where this metadata is comparable or equivalent between studies. All of this helps researchers to harmonise data and draw comparisons between studies, an area of increasing interest both from funders (to maximise the value of publicly funded data sets) and policymakers (to help them understand change across generations).
The metadata in CLOSER Discovery can be queried by search engines, archives and data analysis tools through either the API (Application Programming Interface) or OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), eliminating the need for individual studies to provide this metadata themselves and avoiding duplication.
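For readers curious about what harvesting over OAI-PMH looks like in practice, here is a minimal Python sketch. It only builds a standard `ListRecords` request and reads record identifiers out of a response. The endpoint `https://example.org/oai` and the sample response are placeholders invented for illustration, not CLOSER Discovery's actual address or output, and the function names are our own. `oai_dc` is the metadata format the OAI-PMH specification requires every repository to support; other formats a given repository offers would need to be checked with its `ListMetadataFormats` request.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the real harvesting URL is not given in this article.
BASE_URL = "https://example.org/oai"

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_identifiers(response_xml):
    """Return the record identifiers found in an OAI-PMH response document."""
    root = ET.fromstring(response_xml)
    path = ".//oai:header/oai:identifier"
    return [element.text for element in root.iterfind(path, OAI_NS)]


# Made-up response, used only to exercise the parser without a network call.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:variable-1</identifier></header></record>
    <record><header><identifier>oai:example.org:variable-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(build_list_records_url(BASE_URL))
    print(extract_identifiers(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL, then keep requesting further pages using the `resumptionToken` the repository returns until none is left. That step is omitted here to keep the sketch short.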
- It is transparent
CLOSER Discovery facilitates openness of metadata by providing detailed provenance of a study’s data in a safe and secure way for studies that cannot make data openly accessible. In addition, longitudinal studies can be complex to understand, and getting to know a study can take a long time. Often, useful information is held in several different places or with a number of individuals. CLOSER Discovery provides a more centralised, standardised and simplified method of learning about the studies.
- It is sustainable
Running longitudinal studies is a costly process and requires lots of manual work. CLOSER Discovery brings the studies’ infrastructure and information more in line with current technologies, like automation. It uses the same metadata standard (Data Documentation Initiative) as many other data centres, research organisations, statistical agencies and international organisations such as the World Bank. This guarantees the robustness and long-term validity of the metadata now being generated. As technology develops further, these standardised, sustainable metadata are easily moved to new platforms and systems, without any loss of information. This helps studies to protect against the risk of losing metadata about their data as well as losing the knowledge about data held by staff.
CLOSER Discovery will never be a finished product. Firstly, we are improving transparency further by allowing researchers to create lists of key variables that can be shared or referenced in journal articles. Secondly, we are encouraging more collaboration by identifying equivalent variables within and across studies, as well as working with other metadata platforms. And lastly, we are continually adding content from current and additional studies to safeguard more data and metadata for a sustainable future.