Study Population and Design
We conducted 2 analyses by using electronic health records (EHRs) from Geisinger, a health system serving central and northeastern Pennsylvania. The goal of the first study (hereinafter, the hospitalization study) was to evaluate associations of diabetes and severe COVID-19 outcomes. We conducted a retrospective cohort study of all patients hospitalized with COVID-19 through December 31, 2020, in 10 Geisinger hospitals. The goal of the second study (hereinafter, the utilization study) was to measure the impact of the COVID-19 pandemic on diabetes care. We included all Geisinger patients with diabetes residing in a 37-county region who had at least 1 clinical encounter between 2017 and 2020 (Appendix Figure 1). We identified individuals with diabetes based on encounter diagnoses, diabetes-relevant medication orders, and laboratory test results, as described previously.
The hospitalization study included 5 COVID-19 outcomes: death during hospitalization (up to 120 days after admission) or after hospitalization (up to 220 days) (yes vs no); ICU admission (yes vs no); required mechanical ventilation (yes vs no); D-dimer, a biomarker for thromboembolism (≥0.5 μg/mL vs lower); and troponin level, a biomarker for myocardial damage (elevated vs lower; elevated defined as ≥22 ng/L in men and ≥14 ng/L in women). For the utilization study, we measured weekly rates per 1,000 patients of emergency department (ED) visits, outpatient encounters (including telehealth), hemoglobin A1c (HbA1c) tests, and antihyperglycemic medication orders from January 1, 2018, through December 31, 2020. The denominator for the rates included anyone who was alive at the beginning of the measurement year and met diabetes criteria by the end of the measurement year (2018, 2019, or 2020).
Geocoding and Community Measurement
We obtained patient addresses from the EHRs and geocoded them to the street level by using ArcGIS World Geocoding Service in ArcMap version 10.4 (Esri) and assigned each patient to an administrative community type based on the residential location. This previously described approach uses boundaries from Pennsylvania's minor civil divisions and city census tracts to create 3 community types: townships (rural areas to low-density suburbs), boroughs (small towns), and city census tracts, representing a continuum of lower to higher population density (see Appendix). We also classified residential addresses into the US Census Bureau's categories of urbanized areas, urban clusters, and rural.
For each community type, we measured CSD based on 6 indicators from the American Community Survey (2015–2019): percentage of unemployed, with less than a high school education, below poverty level, on public assistance, not in the workforce, and of households without a car. A previously described factor analysis demonstrated an adequate model fit of these indicators to a single factor and supported the use of an equally weighted scale based on the sum of the z transformed values of these indicators. We have previously reported an association between this CSD measure and diabetes onset. We quartiled the index, with the highest quartile representing the most socioeconomically deprived communities.
Statistical Analysis — Hospitalization Study
The analysis goals were to determine the association between diabetes and each of the 5 COVID-19 outcomes and evaluate whether administrative community type and CSD modified these associations. We first evaluated bivariate associations between individual-level characteristics, administrative community type, and CSD and each of the COVID-19 outcomes. Next, we used logistic regression models with a random intercept for community to estimate the odds ratios (ORs) and 95% CIs for the COVID-19 outcomes.
For each outcome, we evaluated a series of 4 models adjusted sequentially to evaluate potential confounders. We then evaluated effect modification of the association between diabetes and COVID-19 outcomes by CSD and community type by adding cross-products between diabetes and administrative community type or CSD to the models. Global test P values were calculated to compare each model with all cross-products to a model with none. The series of models and their covariates are presented below.
Model 1 included age (years; centered linear, quadratic, and cubic terms to allow for nonlinearity), sex (female vs male), race (Black, all other races [Asian, Native Hawaiian or other Pacific Islander, American Indian or Alaska Native] vs White), ethnicity (Hispanic vs non-Hispanic), Medical Assistance, also known as Medicaid in Pennsylvania, as a surrogate for family socioeconomic status (ever vs never), and time period in which hospitalization occurred (early: March to May; middle: June to September; late: October to December 2020). We collapsed races other than White and Black into a single category, all other races, because of small sample sizes. We included time period, based on our hypothesis that COVID-19 outcomes may have improved as the health system learned more about how to manage the disease. In model 2 we added the following comorbid diseases and community features one at a time to both evaluate their associations with COVID-19 outcomes and to determine whether they confounded diabetes associations: chronic kidney disease (vs none); chronic lung disease (vs none); resides in an institutional setting (eg, nursing home) (vs not); administrative community type (borough or city census tract vs township); and CSD (quartiles 2, 3, or 4 vs 1). In model 3 we added both chronic kidney disease and chronic lung disease to model 1 and in model 4 we added institutional setting (nursing homes) to model 3. We conducted sensitivity analyses examining death after discharge and repeated models replacing administrative community type with the urbanicity measure (urbanized areas or urban clusters vs rural).
Statistical Analysis — Utilization Study
The goal was to determine whether administrative community type or CSD modified the impact of the COVID-19 pandemic on diabetes care. We conducted an interrupted time series analysis for each of the 4 utilization outcomes: weekly rates per 1,000 patients of HbA1c tests, antihyperglycemic medications orders, ED visits, and outpatient (including telehealth) visits. An interrupted time series design measures data at multiple time points before and after the introduction of an intervention, in this case the start of the COVID-19 pandemic in Pennsylvania, to examine the effect of the intervention.
We first explored multiple iterations of generalized linear models. After observing high dispersion by using Poisson regression models, we used negative binomial models. We conducted diagnostic checks, including serial residual plots and correlograms, which showed nonnegligible serial autocorrelation. We then added harmonic terms to account for seasonal trends in utilization, but diagnostic checks still showed nonnegligible serial autocorrelation. Thus, we used autoregressive integrated moving average (ARIMA) time-series models of utilization rates to account for the autocorrelation.
The study period was January 1, 2018, through December 31, 2020, and the intervention period for the models was from March 16, 2020, through the end of 2020. On March 16, 2020, Geisinger implemented restrictions to elective and nonurgent procedures and Pennsylvania implemented statewide mitigation policies, including an initial stay-at-home order. We added linear splines at time points during the intervention period that we predicted could trigger a change in health care utilization: March 16, 2020; May 4, 2020, when Geisinger reinstated elective and nonurgent procedures; July 13, 2020, which marked an increase in state and national COVID-19 infection rates; and November 30, 2020, when Geisinger reinstated elective procedure restrictions, as ordered by the Pennsylvania Department of Health.
We fit ARIMA models that did and did not account for seasonal utilization trends and models with and without the splines after the start of the intervention period. Based on Bayesian Information Criterion, model fit was better for nonseasonal ARIMA models when the intervention period splines were included; thus, our final models were nonseasonal ARIMA models with 4 linear splines. Data preparation was done by using Stata version 16 (StataCorp LLC). Analyses were performed in R version 4.0.3 (R Core Team).
For each utilization outcome, we evaluated effect modification separately by administrative community type and CSD. Models included a main effect term for each level of the community variable and interaction terms with the following variables: an indicator for the intervention period, study week, and spline terms. ARIMA models allow for 3 parameters: "p," the number of autoregressive lags incorporated; "d," the number of past values subtracted ("differenced") from the current value; and "m," the number of lags over which errors from prior observations are incorporated in the current error. The auto.arima() function in R's forecast package (version 8.15) was used to determine the best ARIMA(p,d,q) order for the main effect models for each of the 4 utilization outcomes; d equaled 0 in all cases because no differencing was applied. We then applied the same ARIMA order to the models that evaluated effect modification by administrative community type and CSD. In sensitivity analyses, we repeated the models for each outcome, stratified by the urbanicity measure (urbanized areas or urban clusters vs rural).
Prev Chronic Dis. 2022;19(7):e44 © 2022 Centers for Disease Control and Prevention (CDC)