The GDD Prediction Model

Overview

The GDD prediction model estimates mean intake of 55 dietary factors by country, year, age, sex, urbanicity, education, and pregnancy status in 185 countries by synthesizing survey mean intake data from heterogeneous sources. A Bayesian multilevel framework suits this aim: it pools information across countries and superregions, accommodates non-sampling variation between surveys, and yields full predictive distributions rather than point estimates alone. A summary of the model is provided below.

Description of the Model

Fundamentally, the model is a Bayesian model of the log-means of intake with a nested hierarchical structure, assuming exchangeability between countries and superregions after accounting for covariates. To this structure, we add sex, urban/rural, education, and non-linear age effects (also within a nested hierarchical structure), survey- and country-level covariates, and overdispersion on study-level variance to account for non-sampling variation. The model borrows heavily from those presented in Finucane et al. (Lancet 2011) and Flaxman et al. (An Integrative Metaregression Framework for Descriptive Epidemiology, 2015).

Hierarchical Nature of the Data

The model uses a hierarchical structure in which countries are nested in superregions, which are nested in the globe. The model assumes that the superregion means are distributed log-normally around the global mean, and that country means are distributed log-normally around their respective superregion means. 

The model uses the following six superregions:

  • Asia
  • Former Soviet Union (FSU)
  • High-Income Countries (HIC)
  • Latin America and the Caribbean (LAC)
  • Middle East, North Africa, and South Asia (MENA)
  • Sub-Saharan Africa (SSA)
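
The nesting described above can be illustrated with a small simulation. This is a minimal sketch rather than the fitted model: the function name, the standard deviations, and the number of countries per superregion are hypothetical values chosen for illustration. It draws superregion log-means around a global log-mean, then country log-means around each superregion log-mean, so that the means themselves are log-normally distributed.

```python
import math
import random

SUPERREGIONS = ["Asia", "FSU", "HIC", "LAC", "MENA", "SSA"]


def simulate_hierarchy(global_log_mean, sd_superregion, sd_country,
                       countries_per_superregion, seed=0):
    """Draw country-level mean intakes from a nested log-normal hierarchy.

    All numeric inputs are illustrative assumptions, not GDD estimates.
    """
    rng = random.Random(seed)
    result = {}
    for name in SUPERREGIONS:
        # Superregion log-mean is normal around the global log-mean,
        # i.e. the superregion mean is log-normal around the global mean.
        sr_log_mean = rng.gauss(global_log_mean, sd_superregion)
        # Country log-means are normal around their superregion log-mean.
        # A single sd_country is shared by every superregion.
        country_log_means = [rng.gauss(sr_log_mean, sd_country)
                             for _ in range(countries_per_superregion)]
        # Back-transform to the intake scale.
        result[name] = [math.exp(m) for m in country_log_means]
    return result


means = simulate_hierarchy(math.log(100.0), 0.3, 0.15, 5)
```

Countries in the same superregion share a common draw, so their simulated means cluster together more tightly than countries from different superregions.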

Intercept, Age Trend, Sex Differences, Education Differences, and Urban/Rural Differences

We fit a multilevel model with three levels (countries nested in superregions nested in the globe) for intercepts and sex differences, and two levels (superregions nested in the globe) for age patterns, education differences, and urban/rural differences. For many surveys, intake is not linearly associated with age, so we model age using cubic splines with knots at ages 20 and 65, with both superregion-level and global effects. The model assumes that between-country variance is the same across all superregions, and that education differences, urban/rural differences, and age patterns are the same for all countries within a superregion.
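
To make the age term concrete, the sketch below builds a cubic spline basis with knots at ages 20 and 65 and combines global coefficients with superregion-level deviations. The text does not state which spline basis is used; the truncated-power basis here is an assumption chosen for simplicity, and the function names and coefficient values are hypothetical.

```python
KNOTS = (20.0, 65.0)


def cubic_spline_basis(age, knots=KNOTS):
    """Truncated-power cubic spline basis at one age (intercept excluded).

    Assumption: the source does not specify the basis; this is one common choice.
    """
    basis = [age, age ** 2, age ** 3]
    for k in knots:
        # Each knot adds a cubic term that is zero below the knot.
        basis.append(max(age - k, 0.0) ** 3)
    return basis


def age_effect(age, global_coefs, superregion_coefs):
    """Log-scale age effect: global coefficients plus superregion deviations."""
    basis = cubic_spline_basis(age)
    return sum((g + s) * b
               for g, s, b in zip(global_coefs, superregion_coefs, basis))


# Hypothetical coefficients, for illustration only.
global_coefs = [0.02, -0.0003, 0.0, 0.0, 0.0]
superregion_coefs = [0.005, 0.0, 0.0, 0.0, 0.0]
effect_at_40 = age_effect(40, global_coefs, superregion_coefs)
```

Because the deviations are indexed by superregion only, every country in a superregion shares the same age curve, matching the two-level structure for age patterns.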

Covariate Effects

We incorporate country- and year-specific covariate data to further inform our estimates. Covariates were selected separately for each dietary factor.

Overdispersion

An additional variance component is added to each study to allow the model to account for non-sampling variation due to survey-level error. Sources of this non-sampling variation include surveys that are not nationally representative, surveys that are not fully stratified by sex, urban/rural residence, or education, and surveys that use large age groups (spanning more than 10 years). We also impose a constraint to ensure that local surveys are treated as more variable than regional surveys.
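
One way to picture the extra variance component is as a term added to each survey's sampling variance on the log scale. The sketch below is an illustration under stated assumptions, not the model's actual parameterization: the function names are hypothetical, and expressing the local-survey term as the regional term plus a strictly positive increment is just one possible way to enforce the ordering constraint.

```python
import math


def study_sd(sampling_se, extra_sd):
    """Total SD of a survey's log-mean: sampling error plus a non-sampling term."""
    return math.sqrt(sampling_se ** 2 + extra_sd ** 2)


def coverage_extra_sds(regional_sd, local_increment):
    """Ordered non-sampling SDs by survey coverage.

    Assumption: local = regional + a positive increment, which guarantees
    that local surveys are always treated as more variable.
    """
    if regional_sd < 0 or local_increment <= 0:
        raise ValueError("regional_sd must be >= 0 and local_increment > 0")
    return {"regional": regional_sd, "local": regional_sd + local_increment}


# Hypothetical values, for illustration only.
extra = coverage_extra_sds(regional_sd=0.10, local_increment=0.05)
sd_regional = study_sd(0.08, extra["regional"])
sd_local = study_sd(0.08, extra["local"])
```

With the same sampling error, the local survey ends up with the larger total standard deviation and therefore carries less weight in the fit.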

Computation and Predictions

We fit the model in Stan using the No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo. We run 4 chains of 2000 iterations each, treating the first 1000 iterations of each chain as warm-up, for a total of 4000 retained Monte Carlo iterations to define our posterior distributions. The model described above is ultimately used to provide predictive distributions of mean intake for each dietary factor by country-year and subgroup. The outputs presented on this website are the medians and the 2.5th and 97.5th percentiles of these distributions.
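
The arithmetic behind the reported summaries can be sketched as follows. This example does not call Stan: the chains are filled with random stand-in values, and the helper names are hypothetical. It shows only how dropping 1000 warm-up iterations from each of 4 chains leaves 4000 draws, and how a median and a 95% interval are read off those draws.

```python
import random

N_CHAINS, N_ITER, N_WARMUP = 4, 2000, 1000


def retained_draws(chains, n_warmup=N_WARMUP):
    """Drop the warm-up iterations of each chain and pool the remainder."""
    return [x for chain in chains for x in chain[n_warmup:]]


def percentile(sorted_values, p):
    """Linear-interpolation percentile of a pre-sorted list, p in [0, 100]."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize(draws):
    """Median and 2.5th/97.5th percentiles of a set of draws."""
    s = sorted(draws)
    return {"median": percentile(s, 50),
            "lower": percentile(s, 2.5),
            "upper": percentile(s, 97.5)}


# Stand-in chains: random values in place of real sampler output.
rng = random.Random(1)
chains = [[rng.lognormvariate(4.0, 0.2) for _ in range(N_ITER)]
          for _ in range(N_CHAINS)]
draws = retained_draws(chains)
summary = summarize(draws)
```

Here `len(draws)` is 4 × (2000 − 1000) = 4000, and `summary` holds the three quantities reported for each country-year and subgroup.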