Type 2 Diabetes–Prevention Diet and All-Cause and Cause-Specific Mortality

A Prospective Study

Chun-Rui Wang; Tian-Yang Hu; Fa-Bao Hao; Nan Chen; Yang Peng; Jing-Jing Wu; Peng-Fei Yang; Guo-Chao Zhong


Am J Epidemiol. 2022;191(3):472-486. 


The results of the present study were reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology statement.[17]

Study Population

Our study population was drawn from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, a large randomized clinical trial with 10 enrollment centers (St. Louis, Missouri; Honolulu, Hawaii; Denver, Colorado; Pittsburgh, Pennsylvania; Marshfield, Wisconsin; Birmingham, Alabama; Salt Lake City, Utah; Washington, DC; Minneapolis, Minnesota; and Detroit, Michigan). This trial was designed to investigate the potential beneficial effects of selected screening exams on the risks of death from prostate, lung, colorectal, and ovarian cancers. The design of the PLCO Cancer Screening Trial has been reported elsewhere.[18] Briefly, between November 1993 and September 2001, individuals aged 55–74 years were invited to take part in this trial. A total of 154,887 individuals were eligible for enrollment and were individually randomized to the intervention group or the control group in equal proportions; individuals in the intervention group received selected screening exams, while those in the control group received usual care. All participants provided written informed consent. The PLCO Cancer Screening Trial was approved by the institutional review boards of the US National Cancer Institute and each enrollment center.

The following participants were excluded from our study: 1) 4,918 participants who failed to return the baseline risk-factor questionnaire, which collected participant-reported information (e.g., demographic characteristics and medical history); 2) 33,241 participants who failed to return a diet history questionnaire (DHQ); 3) 5,221 participants with an invalid DHQ (a valid DHQ was one with a completion date, a completion date prior to the death date, <8 missing frequency responses, and no extreme energy intake, i.e., neither the top 1% nor the bottom 1%); 4) 9,684 participants with a history of cancer at baseline; 5) 2,046 participants with a history of stroke at baseline; 6) 7,886 participants with a history of heart attack at baseline; and 7) 5,258 participants with a history of diabetes at baseline. After these exclusions, a total of 86,633 participants were included (Figure 1). Participants with a history of cancer, stroke, heart attack, or diabetes at baseline were excluded because they might have altered their dietary habits after diagnosis, which could introduce reverse causation.
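The exclusion cascade can be checked with simple arithmetic. The sketch below is illustrative only: the counts come from the text, while the variable and function names are hypothetical and not part of the PLCO data release.

```python
# Counts are taken from the text; the labels and names are illustrative.
ENROLLED = 154_887
EXCLUSIONS = [
    ("no baseline questionnaire", 4_918),
    ("no diet history questionnaire (DHQ)", 33_241),
    ("invalid DHQ", 5_221),
    ("history of cancer", 9_684),
    ("history of stroke", 2_046),
    ("history of heart attack", 7_886),
    ("history of diabetes", 5_258),
]


def apply_exclusions(start, exclusions):
    """Subtract each exclusion in order and record the running total."""
    remaining = start
    flow = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        flow.append((reason, n_excluded, remaining))
    return flow


flow = apply_exclusions(ENROLLED, EXCLUSIONS)
for reason, n_excluded, remaining in flow:
    print(f"{reason}: -{n_excluded:,} -> {remaining:,}")

final_n = flow[-1][2]  # participants left after all seven exclusions
```

The running total after the last step equals the analytic sample of 86,633 reported above.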

Figure 1.

Flow chart identifying subjects included in this study evaluating a type 2 diabetes–prevention diet and multiple causes of mortality, a post hoc analysis of the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, United States, 1993–2009. The total number of subjects for an exclusion category box was not available in the PLCO Cancer Screening Trial. DHQ, diet history questionnaire.

Calculation of Dietary Diabetes Risk-reduction Score

A dietary diabetes risk-reduction score was calculated to quantify adherence to a type 2 diabetes–prevention diet using the approach described in the literature.[4] Briefly, all participants were divided into 5 strata based on quintiles of dietary intake of each component. For favorable components (i.e., cereal fiber, ratio of polyunsaturated to saturated fatty acids, coffee, and nuts), participants in the highest stratum were awarded 5 points and those in the lowest stratum were awarded 1 point; in contrast, for unfavorable components (i.e., glycemic index, trans-fatty acids, red and processed meat, and sugar-sweetened beverages), participants in the highest stratum were awarded 1 point and those in the lowest stratum were awarded 5 points (Web Table 1, available at https://doi.org/10.1093/aje/kwab265). An individual's dietary diabetes risk-reduction score was calculated as the sum of points across the 8 dietary components, with a range of 8–40 points. Higher scores indicate greater adherence to the diet. Glycemic index was calculated as described previously.[19] Notably, in this study, sugar-sweetened beverages referred to soft drinks or fruit drinks, and cereal fiber referred to insoluble fiber. In addition, given that higher consumption of fruits and vegetables has been associated with a lower risk of type 2 diabetes,[20] we calculated a modified dietary diabetes risk-reduction score by regarding these 2 foods as favorable components (Web Table 2).
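The scoring rule can be sketched as follows. This is a minimal illustration, not the authors' code: the component names are hypothetical, the intermediate quintiles are assumed to receive 2, 3, and 4 points (consistent with the stated 8–40 range), and ties are broken by input order, which real software may handle differently.

```python
# Hypothetical component names; each maps to one intake value per participant.
FAVORABLE = ("cereal_fiber", "pufa_sfa_ratio", "coffee", "nuts")
UNFAVORABLE = ("glycemic_index", "trans_fat", "red_processed_meat", "ssb")


def quintile_of(values):
    """Return a quintile (1 = lowest, 5 = highest) for each value.

    Rank-based assignment; ties are broken by input order (an assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    quintiles = [0] * n
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // n + 1
    return quintiles


def risk_reduction_scores(intakes):
    """Sum the points over the 8 components for each participant."""
    n = len(intakes[FAVORABLE[0]])
    scores = [0] * n
    for component in FAVORABLE:
        # Highest quintile earns 5 points, lowest earns 1.
        for i, q in enumerate(quintile_of(intakes[component])):
            scores[i] += q
    for component in UNFAVORABLE:
        # Reverse scoring: highest quintile earns 1 point, lowest earns 5.
        for i, q in enumerate(quintile_of(intakes[component])):
            scores[i] += 6 - q
    return scores


# Toy data: participant 0 has the least favorable diet, participant 4 the most.
toy = {name: [0, 1, 2, 3, 4] for name in FAVORABLE}
toy.update({name: [4, 3, 2, 1, 0] for name in UNFAVORABLE})
toy_scores = risk_reduction_scores(toy)
```

With this toy input, the scores span the full 8–40 range, from the least adherent participant to the most adherent.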

In the PLCO Cancer Screening Trial, food and nutrient intakes, including those used to calculate the dietary diabetes risk-reduction score, were evaluated at the study baseline through the DHQ. The DHQ is a 137-item self-administered food frequency questionnaire designed to evaluate food and supplement consumption over the past year; its validity has been confirmed elsewhere.[21] Daily food consumption for each participant was estimated by multiplying food frequency by serving size; daily nutrient intake was calculated based on 2 nutrient databases, namely the US Department of Agriculture's 1994–1996 Continuing Survey of Food Intakes by Individuals[22] and the Nutrition Data Systems for Research.[23]

Outcome Assessment

Mortality status of each participant was confirmed predominantly through a mailed annual study update form. Participants failing to return this form were contacted repeatedly by telephone or e-mail. Moreover, mortality status was adjudicated by periodic linkage to the US National Death Index. The ninth revision of International Classification of Diseases was applied to define the underlying causes of death obtained from death certificates: cardiovascular disease (codes 390–459) and cancer (codes 140–209).

Covariate Assessment

Age at DHQ completion, alcohol consumption, single or multivitamin supplement use, and food consumption were collected with the above-mentioned DHQ. Of note, dietary intakes of foods and nutrients were adjusted for energy intake from diet with the residual approach[24] before data analysis. Physical activity level was defined as total time of moderate to vigorous activity per week, and was assessed through a self-administered supplemental questionnaire. Healthy Eating Index 2015 and the plant-based diet index were computed as described in the literature.[25,26] Sex, ethnic group, marital status, body weight, height, educational level, smoking status, history of hypertension, family history of cancer, and aspirin use were collected with a self-administered baseline questionnaire. Body mass index was calculated as body weight (kg) divided by height squared (m²).
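For readers unfamiliar with the residual approach, the sketch below shows one common formulation: the intake is regressed on total energy, and each residual is added back to the intake predicted at the mean energy level. Whether the study added this constant is an assumption, and the function names are hypothetical. The body mass index formula from the text is included for completeness.

```python
def energy_adjust(nutrient, energy):
    """Energy-adjust intakes by the residual method (simple linear regression).

    Returns residual + predicted intake at mean energy; adding that constant
    is an assumption about how the residuals were rescaled.
    """
    n = len(nutrient)
    mean_energy = sum(energy) / n
    mean_nutrient = sum(nutrient) / n
    sxx = sum((e - mean_energy) ** 2 for e in energy)
    sxy = sum((e - mean_energy) * (y - mean_nutrient)
              for e, y in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_nutrient - slope * mean_energy
    residuals = [y - (intercept + slope * e) for e, y in zip(energy, nutrient)]
    # The regression line passes through the means, so the predicted intake
    # at mean energy is simply the mean intake.
    return [r + mean_nutrient for r in residuals]


def body_mass_index(weight_kg, height_m):
    """Body weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2


# Toy example: intake exactly proportional to energy, so nothing remains
# after adjustment except the mean intake.
adjusted = energy_adjust([15.0, 20.0, 25.0], [1500.0, 2000.0, 2500.0])
```

In the toy example every participant ends up with the same adjusted intake, because all of the variation in intake is explained by energy.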

Statistical Analysis

To minimize potential biases and maximize statistical power, multiple imputation with chained equations was applied to impute missing data under the assumption that data were missing at random (25 imputations);[27] all variables involved in data analysis were used to generate the imputed data sets. Web Table 3 shows the distribution of covariates with missing values before and after multiple imputation. Main data analyses were repeated for participants with complete data to determine the potential influence of data imputation on our results.

To evaluate the associations of the dietary diabetes risk-reduction score with all-cause and cause-specific mortality, hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated using Cox proportional hazards regression, with follow-up time as the time metric. Follow-up time was calculated as the time from the DHQ completion date to the date of death, loss to follow-up, or the end of follow-up (December 31, 2015), whichever came first (Figure 2). In regression models, the dietary diabetes risk-reduction score was split into quintiles, with the first quintile as the reference group. To test for linear trends in risk estimates across quintiles, each participant was assigned the median score of his or her quintile, and the resulting variable was entered into regression models as a continuous variable. The Schoenfeld residuals method yielded no evidence of violation of the proportional hazards assumption (all P values for global test >0.05). Covariate selection for multivariable regression was based on the change-in-estimate approach[28] and our knowledge of the existing literature. Specifically, model 1 adjusted for age and sex; model 2 further adjusted for ethnic group, trial arm, educational level, marital status, history of hypertension, family history of cancer (only for all-cause and cancer mortality), aspirin use, single or multivitamin supplement use, smoking status, alcohol consumption, body mass index, physical activity, and energy intake from diet; and model 3 further adjusted for consumption of fruits, vegetables, tea, fish, and dairy. We also performed an analysis treating body mass index as a time-varying covariate (model 4). In addition, we calculated the absolute risk difference in mortality rate per 10,000 person-years for each HR from the above Cox regression analysis and the subgroup analysis below, using the method described in the literature.[29]
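Two of the data-preparation steps above, the follow-up time and the trend variable, can be sketched in a few lines. The function names and argument layout are hypothetical; only the end-of-follow-up date and the rules themselves come from the text.

```python
from datetime import date
from statistics import median

END_OF_FOLLOW_UP = date(2015, 12, 31)  # stated in the text


def follow_up(dhq_date, death_date=None, lost_date=None,
              end=END_OF_FOLLOW_UP):
    """Return (days of follow-up, died) using whichever exit comes first."""
    candidates = [d for d in (death_date, lost_date, end) if d is not None]
    exit_date = min(candidates)
    died = death_date is not None and exit_date == death_date
    return (exit_date - dhq_date).days, died


def trend_variable(scores, quintiles):
    """Replace each participant's score with the median score of the quintile."""
    by_quintile = {}
    for score, q in zip(scores, quintiles):
        by_quintile.setdefault(q, []).append(score)
    medians = {q: median(values) for q, values in by_quintile.items()}
    return [medians[q] for q in quintiles]


# A participant who died before the end of follow-up, and one who was censored.
days_dead, died = follow_up(date(2000, 1, 1), death_date=date(2010, 1, 1))
days_censored, censored_died = follow_up(date(2000, 1, 1))

# Toy scores in the first and fifth quintiles.
trend = trend_variable([10, 12, 14, 30, 32], [1, 1, 1, 5, 5])
```

The resulting trend variable would then enter the Cox model as a single continuous term, and its P value serves as the test for linear trend.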

Figure 2.

The timeline and follow-up scheme for this study evaluating a type 2 diabetes–prevention diet and multiple causes of mortality, a post hoc analysis of the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, United States, 1993–2009. Note that the time span between 2 events represents the average value of all subjects.

Prespecified subgroup analyses were performed with stratification by age (≥65 vs. <65 years), sex (male vs. female), trial group (intervention group vs. control group), history of hypertension (yes vs. no), body mass index (≥25 vs. <25), smoking status (current or past vs. never), and alcohol consumption (heavy vs. no, light, or moderate). For men, we defined light, moderate, and heavy alcohol consumption as ≤6 g/day, >6 and ≤28 g/day, and >28 g/day, respectively; for women, we defined light, moderate, and heavy alcohol consumption as ≤6 g/day, >6 and ≤14 g/day, and >14 g/day, respectively.[30] To guard against spurious subgroup differences, a P for interaction was estimated by comparing models with and without multiplicative interaction terms before the above-mentioned subgroup analyses were performed.
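The sex-specific alcohol categories translate directly into a small lookup. The sketch below uses the cutoffs from the text; treating an intake of 0 g/day as "none" is an assumption, since the text does not state how nondrinkers were identified, and the function name is hypothetical.

```python
LIGHT_CUTOFF = 6.0  # g/day, same for both sexes
HEAVY_CUTOFF = {"male": 28.0, "female": 14.0}  # g/day, sex-specific


def alcohol_category(grams_per_day, sex):
    """Classify alcohol consumption using the sex-specific cutoffs."""
    if grams_per_day <= 0:
        return "none"  # assumption: nondrinkers are those with zero intake
    if grams_per_day <= LIGHT_CUTOFF:
        return "light"
    if grams_per_day <= HEAVY_CUTOFF[sex]:
        return "moderate"
    return "heavy"


def is_heavy_drinker(grams_per_day, sex):
    """Binary grouping used for the subgroup analysis (heavy vs. all others)."""
    return alcohol_category(grams_per_day, sex) == "heavy"
```

Note that the same intake can fall in different categories by sex: 20 g/day is moderate for men but heavy for women.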

Sensitivity analyses were performed to determine the stability of our results: 1) including participants with a history of cancer, stroke, heart attack, or diabetes at baseline; 2) excluding deaths observed within the first 5 years of follow-up to assess whether the observed associations could have resulted from reverse causation; 3) excluding participants with implausible energy intake from diet, defined as <800 or >4,000 kcal/day for men and <500 or >3,500 kcal/day for women;[31] 4) repeating analyses with a competing risk regression model (only for cause-specific mortality) to evaluate the potential influence of competing risk bias; 5) adjusting the crude model for a propensity score (all covariates included in model 3 were used to calculate the propensity score with logistic regression); 6) additionally adjusting for Healthy Eating Index 2015 or the plant-based diet index in model 3 to test whether the observed associations were mediated by diet quality; and 7) additionally adjusting for intakes of polyunsaturated and saturated fatty acids, per a reviewer's suggestion.
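As an illustration of sensitivity analysis 3, the energy-intake filter can be written as a simple range check. The thresholds are those given in the text; the names and the inclusive boundary handling follow the stated definition (<800 or >4,000 kcal/day for men, <500 or >3,500 kcal/day for women) and are otherwise hypothetical.

```python
# Plausible energy-intake ranges in kcal/day (inclusive bounds, from the text).
PLAUSIBLE_KCAL = {"male": (800, 4000), "female": (500, 3500)}


def has_plausible_energy(kcal_per_day, sex):
    """Return True if the reported energy intake is within the plausible range."""
    low, high = PLAUSIBLE_KCAL[sex]
    return low <= kcal_per_day <= high


# Toy records: (energy intake in kcal/day, sex).
records = [(750, "male"), (2200, "male"), (3600, "female"), (1800, "female")]
kept = [r for r in records if has_plausible_energy(*r)]
```

Only the two records with plausible intakes remain in `kept` and would enter this sensitivity analysis.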

To determine the main contributor(s) to the association of the type 2 diabetes–prevention diet with mortality, we examined the association between each component of this dietary pattern and the risk of death separately. Statistical analyses were conducted with Stata software (version 12.0; StataCorp LP, College Station, Texas). The statistical significance level was set at P < 0.05 under a 2-tailed test.