Discussion & illustration of dietary data harmonisation

Harmonisation potential

All eight of the original CLOSER studies include some form of diet-related questions; however, the dietary assessment methods used and the number of repeat assessments over time vary greatly between the studies. This heterogeneity makes it difficult to create harmonised dietary variables for cross-cohort analyses.

Harmonisation aims to create comparable measures from various types of data across different studies. Harmonisation involves converting variables that capture the same latent construct across studies into a common format and it can be approached in different ways. Maelstrom Research developed guidelines for retrospective data harmonisation that can be found at https://www.maelstrom-research.org/about-harmonization/maelstrom-guidelines.

The DAPA toolkit described elsewhere in this guide also provides harmonisation principles from a dietary perspective (https://dapa-toolkit.mrc.ac.uk/) with these general steps: 1) Define the target variable; 2) Assess harmonisation potential; 3) Derive common format data. The section below outlines these steps using the harmonisation of fish intake across 12 studies as an exemplar.


Exemplar study from InterConnect consortium

The InterConnect consortium (http://www.interconnect-diabetes.eu/) was established to examine the causes of diabetes and obesity using existing data. As part of this aim, researchers used exemplar projects to understand challenges and approaches to harmonisation. The DAPA toolkit outlines the approach they took to harmonise fish consumption (https://dapa-toolkit.mrc.ac.uk/diet/harmonisation):

  1. Define target variable

The target variable is derived by harmonising the raw data from the different studies and should be specified with explicit units. It should be appropriate for answering the research question, but its definition also depends on the data and methods available in the different datasets.

In InterConnect, the researchers aimed to harmonise eight variables (total fish, fatty/oily fish, lean fish, shellfish, saltwater fish, freshwater fish, fried fish, smoked/salted fish), all in g/d, across 12 studies.

  2. Assess the harmonisation potential

It is important to know whether the existing data can capture the same latent constructs. It is essential to understand the specific methods and instruments used in each study, the format of the data, the overall study design and any assumptions made when the data were processed within each study.

In InterConnect, ten studies assessed fish intake using food frequency questionnaires (FFQs) and two used diet history (a retrospective structured interview method consisting of questions about habitual intake of foods from the core food group). While all studies could derive total fish, not all could contribute to the seven other variables.
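As an illustration of how an assessment of harmonisation potential might be recorded, the sketch below tabulates which studies can contribute to each target variable. The eight target variable names come from the exemplar above; the study names and their coverage are hypothetical and are not taken from InterConnect.

```python
# The eight target variables named in the InterConnect exemplar, all in g/d.
TARGET_VARIABLES = [
    "total fish", "fatty/oily fish", "lean fish", "shellfish",
    "saltwater fish", "freshwater fish", "fried fish", "smoked/salted fish",
]

# Hypothetical inventory: study names and coverage are invented for illustration.
study_inventory = {
    "study_A": {"method": "FFQ",
                "can_derive": {"total fish", "lean fish", "fatty/oily fish"}},
    "study_B": {"method": "diet history",
                "can_derive": set(TARGET_VARIABLES)},
    "study_C": {"method": "FFQ",
                "can_derive": {"total fish"}},
}


def harmonisation_potential(inventory, targets):
    """For each target variable, list the studies able to derive it."""
    return {
        target: sorted(
            name for name, info in inventory.items()
            if target in info["can_derive"]
        )
        for target in targets
    }


potential = harmonisation_potential(study_inventory, TARGET_VARIABLES)
for target, studies in potential.items():
    print(f"{target}: {len(studies)} of {len(study_inventory)} studies")
```

A summary of this kind makes it easy to see which target variables are supported by enough studies to be worth analysing across cohorts.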

  3. Deriving a common format

A number of different methods can be applied to derive a common format for the target variable within each study, for example, using a conversion factor or collapsing to the least common denominator. Applying a conversion factor can be straightforward when the relationship between two units is known, as is the case for converting kilocalories per day to kilojoules per day. Collapsing to the least common denominator can include recoding or transforming existing data and would involve applying an agreed set of rules or algorithms depending on within-study data availability. External data can also be used to support deriving a common format. For example, data on average portion sizes could be used in combination with frequency and food type to derive food quantities. However, this should be applied with caution as the degree to which these values can be generalised depends on the specific study population.
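The two approaches named above can be sketched briefly. The energy conversion uses the standard factor of 4.184 kJ per kcal. The recoding rule that collapses a detailed frequency scale onto three coarser categories is hypothetical and serves only to show what an agreed rule set might look like.

```python
KJ_PER_KCAL = 4.184  # standard (thermochemical) conversion factor


def kcal_to_kj(kcal_per_day):
    """Conversion factor approach: the relationship between units is known."""
    return kcal_per_day * KJ_PER_KCAL


# Least common denominator approach: a detailed response scale is recoded
# onto the coarsest scale shared by all studies. These rules are hypothetical.
COLLAPSE_RULES = {
    "never/almost never": "less than weekly",
    "1-3 times/month": "less than weekly",
    "once a week": "weekly, not daily",
    "2-4 times/week": "weekly, not daily",
    "5-6 times/week": "weekly, not daily",
    "once per day": "daily or more",
    "2-3 times/day": "daily or more",
}


def collapse_frequency(response):
    """Recode a detailed frequency response to the common coarse category."""
    return COLLAPSE_RULES[response]


print(kcal_to_kj(2000))
print(collapse_frequency("2-4 times/week"))
```

Collapsing loses detail in the studies with finer scales, which is the price of comparability with the studies that only recorded coarse categories.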

When considering the harmonisation of dietary patterns (DPs), the food groups within each study and the items within these groups should be as similar as possible between the studies. If principal component analysis (PCA) is used to determine a DP, the coefficients from one study will need to be applied to the other to ensure the same DP is being compared.
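A minimal sketch of applying PCA coefficients from one study to another is given below. The food groups, loadings and intakes are hypothetical, and standardising intakes within the second study before applying the loadings is an assumption made here, not a rule stated in this guide.

```python
from statistics import mean, pstdev

# Hypothetical loadings (coefficients) from a PCA fitted in a reference study.
reference_loadings = {"fish": 0.6, "vegetables": 0.7, "processed meat": -0.4}


def standardise(values):
    """Convert intakes to z-scores (assumes the values are not all equal)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pattern_scores(intakes, loadings):
    """Score each participant on the reference study's dietary pattern.

    intakes maps each food group to a list of intakes, one per participant.
    """
    z = {group: standardise(intakes[group]) for group in loadings}
    n = len(next(iter(z.values())))
    return [sum(loadings[g] * z[g][i] for g in loadings) for i in range(n)]


# Hypothetical intakes (g/d) for three participants in a second study.
other_study = {
    "fish": [10, 20, 30],
    "vegetables": [100, 200, 300],
    "processed meat": [30, 20, 10],
}
scores = pattern_scores(other_study, reference_loadings)
print(scores)
```

Because both studies are scored with the same loadings, a higher score reflects closer adherence to the same pattern in each of them, provided the food groups are defined comparably.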

All of these approaches have limitations that can make it difficult to compare absolute levels of dietary intake across studies. However, by ranking individuals into quartiles according to intake or adherence to a DP within each study, associations between diet and health outcomes can still be compared between studies.
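The ranking idea can be sketched as follows. The intake values are hypothetical, and ties are broken by order of appearance, which is a simplification.

```python
def assign_quartiles(intakes):
    """Return a quartile (1 = lowest, 4 = highest) for each participant,
    based on rank within the study rather than on absolute intake."""
    n = len(intakes)
    order = sorted(range(n), key=lambda i: intakes[i])
    quartiles = [0] * n
    for rank, idx in enumerate(order):
        quartiles[idx] = rank * 4 // n + 1
    return quartiles


# Hypothetical intakes in two studies whose instruments give different
# absolute levels for the same underlying ordering of participants.
study_a = [5, 40, 12, 30, 22, 8, 55, 17]
study_b = [value * 3 for value in study_a]

print(assign_quartiles(study_a))
print(assign_quartiles(study_b))
```

The two studies yield identical quartile assignments even though their absolute values differ, which is why relative ranking can support cross-study comparison of associations when absolute intakes are not comparable.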

For the InterConnect consortium, methods to transform variables from each study to the common target variable were created and agreed with each study. The two tables below outline the harmonisation approach taken. There were some specific challenges related to this study. For instance, it was unclear whether some types of fish should be classified as lean or fatty. Furthermore, the fat content of certain fish and portion sizes can vary depending on location, so local knowledge was required to make these decisions.

Example of pre-existing data used to derive target variables (FFQ)

Table adapted from https://dapa-toolkit.mrc.ac.uk/diet/harmonisation [145]

Columns 1-3 describe the fish items in the original cohort; columns 4-6 describe the harmonised items.

| Fish types | Assumption of g/portion | Frequency and quantity | Target variable (g/d) | Harmonisation - categorisation of fish | Harmonisation - frequency and quantity |
|---|---|---|---|---|---|
| White fish (hake, whiting, bream, grouper, sole) | 150 g | Frequency: never/almost never; 1-3 times/month; once a week; 2-4 times/week; 5-6 times/week; once per day; 2-3 times/day; 4-6 times/day; more than 6 times/day | Lean fish | White fish/day + Cod/day | Lean fish: multiply portion/day*150 g |
| Cod | 150 g | | | | |
| Blue fish (sardines, tuna, bonito, mackerel, salmon) | 150 g | | Fatty fish | Blue fish/day | Fatty fish: multiply portion/day*150 g |
| Salted or smoked fish | 50 g | | Salted/smoked/dried | Salted or smoked fish/day | Salted/smoked/dried fish: multiply portion/day*50 g |
| Clam, oyster, mussels | 60 g | | Seafood other than fish | Total seafood per day | Source data already in g/d |
| Prawn, king prawn, crayfish | 100 g | | | | |
| Octopus, squid, cuttlefish | 150 g | | | | |
| Total fish and seafood per day (derived) | | g/d | Total fish | Total fish and seafood per day | Source data already in g/d |
| Total seafood per day (derived) | | g/d | | | |
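The FFQ derivation in the table can be sketched as below. The portion sizes and the mapping of source items to target variables follow the table. The conversion of each frequency category to portions per day is an assumption (roughly the midpoint of each category, with an arbitrary value for the open-ended top category) and is not taken from the source.

```python
# ASSUMED portions/day for each response category (approximate midpoints).
PORTIONS_PER_DAY = {
    "never/almost never": 0.0,
    "1-3 times/month": 2 / 30,
    "once a week": 1 / 7,
    "2-4 times/week": 3 / 7,
    "5-6 times/week": 5.5 / 7,
    "once per day": 1.0,
    "2-3 times/day": 2.5,
    "4-6 times/day": 5.0,
    "more than 6 times/day": 7.0,  # open-ended category: arbitrary placeholder
}

# Grams per portion, as given in the table.
GRAMS_PER_PORTION = {
    "white fish": 150,
    "cod": 150,
    "blue fish": 150,
    "salted or smoked fish": 50,
}

# Source items contributing to each target variable, as given in the table.
TARGET_SOURCES = {
    "lean fish": ["white fish", "cod"],
    "fatty fish": ["blue fish"],
    "salted/smoked/dried fish": ["salted or smoked fish"],
}


def derive_targets(responses):
    """Convert one participant's FFQ responses to target variables in g/d."""
    return {
        target: sum(
            PORTIONS_PER_DAY[responses[item]] * GRAMS_PER_PORTION[item]
            for item in items
        )
        for target, items in TARGET_SOURCES.items()
    }


# Hypothetical participant.
participant = {
    "white fish": "once a week",
    "cod": "1-3 times/month",
    "blue fish": "2-4 times/week",
    "salted or smoked fish": "never/almost never",
}
targets = derive_targets(participant)
print({name: round(value, 1) for name, value in targets.items()})
```

The choice of value for each frequency category has a direct effect on the derived g/d, so it would need to be agreed and documented alongside the portion-size assumptions.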


Example of pre-existing data used to derive target variables (diet history)

Table adapted from https://dapa-toolkit.mrc.ac.uk/diet/harmonisation [145]

Columns 1-2 describe the fish items in the original cohort; columns 3-5 describe the harmonised items.

| Fish types | Frequency and quantity | Target variable | Harmonised categorisation of fish | Harmonised frequency and quantity |
|---|---|---|---|---|
| Total fish | g/d | Total fish | Total fish (sum of all available variables) - variables are mutually exclusive | Source data already in g/d |
| Cod; Baltic herring with bones; Baltic herring; Salmon; Salmon salted; Baltic herring salted with bones; Herring salted; Smoked Baltic herring with bones; Sardine; Smoked redfish; Perch; Pike; Flounder; Bream; Vendace with bones; Fresh frozen saithe; Whitefish; Fish, average; Fish in soup, average; Roe; Stockfish; Vendace, salted with bones; Smoked vendace with bones; Smoked lamprey; Smoked whitefish; Smoked fish average; Tuna; Shrimp | g/d | Lean fish | Cod; Stockfish; Fresh frozen saithe; Perch; Pike; Flounder; Fish, average; Fish in soup, average | Source data already in g/d |
| | | Fatty fish | Baltic herring with bones; Baltic herring; Salmon; Salmon salted; Baltic herring salted with bones; Herring salted; Smoked Baltic herring with bones; Sardine; Smoked redfish; Whitefish; Vendace, salted with bones; Smoked vendace with bones; Vendace with bones; Smoked fish average | Source data already in g/d |
| | | Salted/smoked/dried | Salmon salted; Baltic herring salted with bones; Herring salted; Smoked Baltic herring with bones; Smoked redfish; Vendace, salted with bones; Smoked vendace with bones; Smoked lamprey; Smoked whitefish; Smoked fish average: mean of four species; Baltic herring smoked; Vendace smoked; Whitefish smoked; Bream smoked | Source data already in g/d |
| | | Seafood other than fish | Shrimp | Source data already in g/d |
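For diet history data already expressed in g/d, harmonisation amounts to summing item-level intakes into the target categories. The sketch below uses a small subset of the items in the table, and the participant's intakes are hypothetical. Some items (for example salted salmon) count towards more than one category, so total fish is summed from the mutually exclusive source items rather than from the categories.

```python
# Subset of the item-to-category assignments listed in the table.
CATEGORY_ITEMS = {
    "lean fish": {"Cod", "Perch", "Pike", "Flounder"},
    "fatty fish": {"Salmon", "Salmon salted", "Baltic herring", "Sardine",
                   "Smoked redfish"},
    "salted/smoked/dried": {"Salmon salted", "Smoked redfish",
                            "Smoked lamprey"},
    "seafood other than fish": {"Shrimp"},
}


def harmonise_diet_history(item_intakes):
    """Sum item-level intakes (g/d) into the harmonised target variables."""
    result = {
        category: sum(
            grams for item, grams in item_intakes.items() if item in items
        )
        for category, items in CATEGORY_ITEMS.items()
    }
    # Categories overlap, so the total is taken over the source items,
    # each of which is counted once.
    result["total fish"] = sum(item_intakes.values())
    return result


# Hypothetical participant, intakes in g/d.
participant = {
    "Cod": 12.0,
    "Salmon": 8.0,
    "Salmon salted": 3.0,
    "Smoked lamprey": 1.5,
    "Shrimp": 2.0,
}
harmonised = harmonise_diet_history(participant)
print(harmonised)
```

Summing the categories instead would double count the salted salmon, which is why the distinction between overlapping categories and mutually exclusive source items matters.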



There are no specific rules for harmonising dietary data across studies; the approach taken depends on the research question and the data available. Compiling a metadata inventory that documents methods, data formats and the nuances of data processing is the most time-consuming aspect of harmonisation. With this guide, we have completed this key step for the original CLOSER partner studies, so that researchers can focus on how best to answer their specific diet-related questions.


This page is part of the CLOSER resource: ‘A guide to the dietary data in eight CLOSER studies’.