Building the CLOSER metadata search platform


Opinion

The CLOSER metadata search platform has three main goals:

  1. To enable discovery of the data collected – and the related survey questions and measurements – across the 9 participating studies.
  2. To provide sufficient information to understand those underlying questions and measurements, and the context within which information was collected.
  3. To adhere to the best metadata standards for interoperability with other longitudinal studies, both in Britain and internationally.

The project is using several technologies in combination to harvest metadata where it already exists, and to input metadata where it does not. We have chosen to structure the metadata to be compatible with the most recent version of the Data Documentation Initiative’s standard (DDI 3.2 [1]).

This standard supports high-level metadata describing the study, survey and datasets but also lower-level metadata describing the data variables, instruments and questions, permitting comparison between questions and variables, between and within studies. The use of DDI allows this comparison to be undertaken systematically and at any level of the metadata.

The project will not only provide searchable metadata for a wide range of users, but it will also provide a resource for the studies themselves to continue to update and correct their back catalogue of metadata and add new content as future surveys are fielded.

Questionnaires

The majority of questionnaires from the studies were collected on paper and will be input into a questionnaire capture tool (CADDIES). The tool was originally developed at CLS and has been further developed under the auspices of CLOSER to support DDI 3.2, in particular to manage the capture of question grids [2].

Although the capture tool is an efficient mechanism for inputting the questionnaire contents, doing so in a consistent manner for the wide variety of instruments used in over 60 years of data collection in these studies is a non-trivial problem.

To this end, the metadata officers responsible for questionnaire entry have developed a set of protocols and guidance which underpin our quality standards [3]. Some questionnaire text is available from the studies themselves or from the UK Data Service, and where this is available, it has been imported into CADDIES to reduce the data entry burden.

A controlled vocabulary is then applied to the questions to broadly classify them to assist in search.
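As a rough illustration of how a controlled vocabulary can support search, the sketch below tags questions with topic terms and rejects any term that is not in the agreed list. The vocabulary terms, question identifiers and function name are all hypothetical; this is not how CADDIES or the CLOSER vocabulary is implemented.

```python
# Hypothetical vocabulary: the real CLOSER topic terms are not reproduced here.
CONTROLLED_VOCABULARY = {"health behaviour", "physical measurements", "education"}


def classify(question_id, topics, index):
    """Tag a question with topics, refusing terms outside the vocabulary."""
    unknown = set(topics) - CONTROLLED_VOCABULARY
    if unknown:
        raise ValueError(f"Terms not in vocabulary: {sorted(unknown)}")
    for topic in topics:
        # index maps each topic term to the set of question ids tagged with it
        index.setdefault(topic, set()).add(question_id)


topic_index = {}
classify("q_12", ["health behaviour"], topic_index)
classify("q_30", ["physical measurements"], topic_index)
classify("q_31", ["physical measurements", "health behaviour"], topic_index)

# Searching by topic is then a simple lookup.
print(sorted(topic_index["health behaviour"]))
```

Restricting tags to a fixed list is what makes the classification useful for search: two metadata officers describing the same kind of question cannot end up using different words for it.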

Variables

For variables, metadata is extracted programmatically from the data files, including variable names, labels, code lists, descriptions and statistics. This information is then mapped to the questions.
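The extraction and mapping steps might look something like the following minimal sketch. It assumes the data file has already been read into memory as a list of rows with separate label and code-list dictionaries; the record layout, function names, variable names and question identifiers are invented for illustration and do not describe the project's actual tooling.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableMetadata:
    name: str
    label: str
    code_list: dict      # code -> category label
    frequencies: dict    # observed value -> number of cases
    valid_n: int         # number of non-missing cases
    question_id: Optional[str] = None


def extract_variable_metadata(rows, labels, code_lists):
    """Build one metadata record per variable from an in-memory dataset."""
    names = list(rows[0].keys()) if rows else []
    records = []
    for name in names:
        # Missing values (None) are excluded from the statistics.
        values = [row[name] for row in rows if row[name] is not None]
        records.append(VariableMetadata(
            name=name,
            label=labels.get(name, ""),
            code_list=code_lists.get(name, {}),
            frequencies=dict(Counter(values)),
            valid_n=len(values),
        ))
    return records


def map_to_questions(records, variable_to_question):
    """Attach the source question id to each variable, where one is known."""
    for record in records:
        record.question_id = variable_to_question.get(record.name)
    return records


# Hypothetical example data standing in for a real data file.
rows = [
    {"smoke": 1, "height_cm": 170},
    {"smoke": 2, "height_cm": None},
    {"smoke": 1, "height_cm": 182},
]
labels = {"smoke": "Currently smokes", "height_cm": "Height in cm"}
code_lists = {"smoke": {1: "Yes", 2: "No"}}
variable_to_question = {"smoke": "q_12", "height_cm": "q_30"}

records = map_to_questions(
    extract_variable_metadata(rows, labels, code_lists),
    variable_to_question,
)
for r in records:
    print(r.name, r.label, r.valid_n, r.question_id)
```

Keeping the variable-to-question mapping as a separate step means the extracted metadata can be regenerated from the data file without losing links that were established by hand.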

All of this metadata is imported into a metadata repository. CLOSER is using an off-the-shelf software product, Colectica [4]. This is a mature product, which has allowed the project to focus on developing the metadata collection rather than investing time and effort in software development.

Comparison and Provenance

Once in the Colectica Repository, relationships between metadata items are identified, allowing us to manage the metadata in a systematic way. We will be utilising the Software Developer’s Kit to add more value to the metadata; for example, by identifying those questions and variables which have been validated and reused by more than one of the CLOSER studies. This will allow those commissioning or directing other longitudinal studies to use these tried and tested measures and enable further comparative research.
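To give a feel for the reuse check, here is a minimal sketch that flags question text appearing in more than one study. It does not use the Colectica SDK; the study names, question wording and the choice to match on normalised text are assumptions made purely for illustration, and a real comparison would more likely rely on the relationships recorded between DDI items.

```python
import re
from collections import defaultdict


def normalise(text):
    """Lower-case, strip punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def find_reused_questions(questions):
    """Return {normalised text: sorted studies} for text used by 2+ studies."""
    studies_by_text = defaultdict(set)
    for study, text in questions:
        studies_by_text[normalise(text)].add(study)
    return {
        text: sorted(studies)
        for text, studies in studies_by_text.items()
        if len(studies) > 1
    }


# Hypothetical (study, question text) pairs.
questions = [
    ("Study A", "Do you smoke cigarettes at all nowadays?"),
    ("Study B", "Do you smoke cigarettes at all nowadays"),
    ("Study B", "How tall are you?"),
    ("Study C", "do you smoke  cigarettes at all nowadays?"),
]

reused = find_reused_questions(questions)
for text, studies in reused.items():
    print(text, "->", ", ".join(studies))
```

Exact matching after normalisation will miss questions that were reworded between sweeps, so a sketch like this would only ever give a lower bound on reuse.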

Extending this, the use of DDI will allow the discovery and comparison of similar questions and variables from studies outside CLOSER that also use DDI, such as Midlife in the United States (MIDUS) [5] and the studies involved in DASISH [6].


Notes

[1] DDI-Lifecycle 3.2 http://www.ddialliance.org/Specification/DDI-Lifecycle/3.2/

[2] How Do We Manage Complex Questions in the Context of the Large-Scale Ingest of Legacy Paper Questionnaires into DDI-Lifecycle? (2013) Gierl, C., Johnson, J. http://www.eddi-conferences.eu/ocs/index.php/eddi/EDDI13/paper/view/92

[3] Protocol Development for Large-Scale Metadata Archiving using DDI-Lifecycle (2014) Poynter, W., Spiegel, J. http://www.eddi-conferences.eu/ocs/index.php/eddi/eddi14/paper/view/168

[4] Colectica Repository http://www.colectica.com/software/repository

[5] MIDUS http://midus.colectica.org

[6] DASISH http://dasish.eu/activities/