Academic researchers spend a lot of time looking for data. Whether we want to compare two existing explanations for the same phenomenon, or put our own novel theories to the test, finding the appropriate dataset with which to do this is a crucial step in our work. Data are the key ingredient we are all after.
A shift towards applied research has driven up supply and demand for data
This wasn’t always the case. Take economics, for example: back in 1963, fewer than half of the papers published in the top three American general economics journals (AER, QJE, and JPE) included any kind of data, according to a paper by Professor Daniel S. Hamermesh of the University of Texas. The majority (51 per cent) of the papers published that year were exclusively theoretical. In 2011, 72 per cent of the papers published were empirical in nature, while fewer than 20 per cent were theoretical. This shift towards applied research has meant a significant change in our work, and in the tools we use every day.
Figure 1 – Methodology of published articles in three top academic journals of economics (American Economic Review, Quarterly Journal of Economics, Journal of Political Economy). Own calculations using data presented in Table 4 of Hamermesh (2013). ‘Theoretical’ category includes both theory only and theory with simulation papers.
Of course, researchers are not the only ones interested in data. Micro-level data are key inputs into policy design, and both national and multinational aid agencies invest heavily in generating and analysing these data. As a result, more and more studies are being produced each year, with different characteristics and in different parts of the world. At the time of writing, the World Bank Central Microdata Catalog held 2,434 studies from 177 countries, and the UK Data Service provided access to 7,345 studies. The data from most of these studies are publicly accessible. So researchers face a unique challenge: how to search for, and (hopefully) find, the data they are looking for.
Data are not valuable if they’re not discoverable
The plethora of data available for research is at once exciting and daunting. An academic seeking to work with existing data first has to find the data they are looking for. This means searching through a host of public data archives such as the World Bank Central Microdata Catalog, the UK Data Service, the Harvard Dataverse, the Inter-University Consortium for Political and Social Research (ICPSR) archive, and the statistical agencies of each individual country of interest. This is extremely time-consuming, and researchers may miss datasets hosted in archives they are unaware of.
One of CLOSER’s main objectives is to make data more easily discoverable. Our unique search engine, CLOSER Discovery, enables researchers to search and browse questionnaires and data from eight of the UK’s leading longitudinal studies, and to find out what data are available.
Improving discoverability across borders
Now CLOSER is taking up a new challenge. With funding from the Economic and Social Research Council (ESRC) through the Global Challenges Research Fund (GCRF), we have set up a new project, CLOSER International. Our objective is to bring to light opportunities for longitudinal research in the developing world, and for international comparisons between the UK and low- and middle-income countries.
Figure 2 – Snapshot of The IFS Directory website. Countries in white have at least one study in the Directory. Clicking on a country opens up the menu with existing survey modules available.
We will build on previous ESRC-funded efforts in this direction, such as the Directory of Longitudinal Population Studies from Low and Middle-Income Countries (Figure 2), by the Institute for Fiscal Studies. This directory, co-funded by the Medical Research Council and the Wellcome Trust, includes 175 social science and biomedical studies from 51 countries. These studies are a unique source of evidence for researchers and policymakers alike. We aim to combine this existing resource with CLOSER’s established track record in metadata creation and data discoverability to create a detailed catalogue of the existing studies and of the gaps that persist. Our aim will be to promote high-quality, open-access studies, and to provide a platform for researchers in this area.
We will also build on the content available in CLOSER’s Learning Hub to provide relevant information and guidance for those conducting research in developing countries, and we will continue to provide technical assistance in the development and use of longitudinal data.
Francisco Oteiza is a Senior Research Associate for CLOSER, the home of longitudinal studies. You can follow him on Twitter (@franoteiza_econ).
Oteiza F (2018) “CLOSER international: data discoverability across borders”, CLOSER blog.