Estimate of Burden and Direct Healthcare Cost of Infectious Waterborne Disease in the United States

Sarah A. Collier; Li Deng; Elizabeth A. Adam; Katharine M. Benedict; Elizabeth M. Beshearse; Anna J. Blackstock; Beau B. Bruce; Gordana Derado; Chris Edens; Kathleen E. Fullerton; Julia W. Gargano; Aimee L. Geissler; Aron J. Hall; Arie H. Havelaar; Vincent R. Hill; Robert M. Hoekstra; Sujan C. Reddy; Elaine Scallan; Erin K. Stokes; Jonathan S. Yoder; Michael J. Beach


Emerging Infectious Diseases. 2021;27(1):140-149. 



We defined waterborne disease as disease in which water was the proximate vehicle for exposure to an infectious pathogen. Thus, diseases such as Legionnaires' disease (typically transmitted via inhaled water droplets containing Legionella bacteria) were considered waterborne. In contrast, mosquito-borne diseases like malaria, for which standing water can increase the population of mosquitoes that transmit the parasite that causes malaria, were not considered waterborne. Algal toxins and chemical exposures were not considered. We determined the proportion of disease totals that were attributed to domestic waterborne exposure.

For this estimate, we chose diseases for which surveillance data, administrative data, or literature reports indicated that waterborne transmission for the disease in the United States was plausible, the disease was likely to cause substantial illness or death, and data were available to quantify associated health outcomes. Diseases included in this analysis were campylobacteriosis, cryptosporidiosis, giardiasis, Legionnaires' disease, NTM infection, norovirus infection, acute otitis externa, Pseudomonas pneumonia and septicemia, Shiga toxin–producing Escherichia coli (STEC) infection serotype O157, non-O157 serotype STEC infection, salmonellosis, shigellosis, and vibriosis (including infection by Vibrio alginolyticus, V. parahaemolyticus, V. vulnificus, and other species). To aid in quantifying the burden of respiratory and enteric diseases separately, we considered Legionnaires' disease, NTM infection, and Pseudomonas pneumonia primarily respiratory diseases, whereas we considered campylobacteriosis, cryptosporidiosis, giardiasis, norovirus infection, salmonellosis, and shigellosis primarily enteric diseases.

We employed methods similar to those of Scallan et al.[14,15] to estimate the number of illnesses, treat-and-release emergency department (ED) visits (i.e., visits in which the person was not admitted to the hospital), hospitalizations, and deaths attributed to waterborne transmission in the United States. We also quantified the direct healthcare costs of treat-and-release ED visits and hospitalizations, as measured by insurer and out-of-pocket payments. Our overall methods are described here; detailed methods are described in Appendices 1–3.

Data were for 2000–2015. All estimates were based on the 2014 US population (318.6 million persons); 2014 was the most recent year for which data were available for all surveillance sources. Estimates were derived from statistical models; each model input had uncertainty represented by a distribution of plausible values. Inputs are described in Appendix 1 and more details on the modeling process are described in Appendix 2. All estimates were rounded to 3 significant figures.


The initial model input was the number of reported or documented cases of illness for each disease, selected hierarchically: data from active surveillance systems were preferred, passive surveillance data were used if active surveillance data were not available, and administrative data were used if no active or passive surveillance system for the disease existed (Table 1). Administrative data sources included the Health Care Utilization Project (HCUP) National Inpatient Sample (HCUP NIS) hospitalization database, the HCUP National Emergency Department Sample (HCUP NEDS) ED visit database, and, in the case of otitis externa, the National Ambulatory Medical Care Survey (NAMCS), which surveys visits to physicians' offices. These administrative data sources use complex sample survey weighting methods and are considered nationally representative. We multiplied the initial reported or documented number of cases for each disease by a series of multipliers that accounted for underreporting and underdiagnosis (including illness severity, medical care-seeking, likelihood of specimen submission, proportion of laboratories capable of performing a diagnostic test, and test sensitivity).
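The multiplier step described above can be sketched in a few lines. This is a minimal illustration, not the study's code: the function name, the dictionary layout, and the numeric values are hypothetical, and the sketch uses fixed point values where the study used distributions.

```python
from math import prod


def estimate_illnesses(reported_cases, multipliers):
    """Scale a reported case count by a chain of adjustment multipliers.

    `multipliers` maps a label (for example "underreporting") to a factor.
    """
    return reported_cases * prod(multipliers.values())


# Hypothetical values for illustration only; not taken from the study.
example = estimate_illnesses(
    reported_cases=1_000,
    multipliers={"underreporting": 1.5, "underdiagnosis": 4.0},
)
```

With these made-up inputs, 1,000 reported cases scale to 6,000 estimated illnesses.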

Emergency Department Visits

The surveillance systems used do not tally treat-and-release ED visits but do capture the proportion of patients hospitalized with a given disease; we combined this proportion with the ratio of treat-and-release ED visits for each disease (reported in HCUP NEDS) to hospitalizations for that disease (in HCUP NIS) to calculate the estimated proportion of reported cases with an ED visit. Although not all patients who visited the ED would have been reported or received a diagnosis, they were assumed to be more likely to receive a diagnosis than patients without an ED visit. Instead of applying the higher underdiagnosis factor used for illness, we used an underdiagnosis factor with a modal value of 2, consistent with previous estimates, and supported by a recent analysis comparing the incidence of bacterial gastroenteritis captured in surveillance and hospital discharge data.[14,22,23]
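One reading of this calculation is sketched below, with point values standing in for the distributions used in the study. The function and parameter names are hypothetical, and the sketch leaves out any underreporting adjustment; the default factor of 2 is the modal underdiagnosis value stated above.

```python
def estimate_ed_visits(
    reported_cases,
    prop_hospitalized,
    neds_ed_visits,
    nis_hospitalizations,
    underdiagnosis=2.0,
):
    """Estimate treat-and-release ED visits from surveillance and HCUP inputs."""
    # Ratio of treat-and-release ED visits (NEDS) to hospitalizations (NIS).
    ed_to_hosp_ratio = neds_ed_visits / nis_hospitalizations
    # Estimated proportion of reported cases with an ED visit.
    prop_with_ed = prop_hospitalized * ed_to_hosp_ratio
    return reported_cases * prop_with_ed * underdiagnosis
```

For example, with 1,000 reported cases, 20% hospitalized, and 1.5 ED visits per hospitalization in the administrative data, the sketch yields about 600 ED visits.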


We applied the proportion of patients hospitalized according to surveillance data to the estimated number of reported cases to calculate the estimated number of reported hospitalized patients. If surveillance data were not available, the number of hospitalizations reported in HCUP NIS for a particular disease was used. Hospitalized case-patients were assumed to be more likely to have received a diagnosis than nonhospitalized case-patients. Instead of applying the higher underdiagnosis factor used for illness, we used an underdiagnosis factor with a modal value of 2, consistent with previous estimates, and, for some bacterial enteric diseases, supported by recent work.[14,22,23]


We applied the proportion of case-patients who died, as reported by surveillance data, to the estimated number of reported cases to calculate the estimated number of reported deaths. If surveillance data were not available, we used the method of Gargano et al.[24] In brief, we combined the number of in-hospital deaths for each disease reported in HCUP NIS with the number of out-of-hospital deaths reported in death certificate records. We assumed that patients who died were more likely to have received a diagnosis than patients who did not die. Instead of applying the higher underdiagnosis factor used for illness, we used an underdiagnosis factor with a modal value of 2, consistent with previous estimates.[14,22]
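The two data paths for deaths can be sketched as a single function. The names are hypothetical, point values replace the study's distributions, and applying the factor of 2 to the administrative-data path as well is an assumption of this sketch.

```python
def estimate_deaths(
    reported_cases=None,
    prop_died=None,
    in_hospital_deaths=None,
    out_of_hospital_deaths=None,
    underdiagnosis=2.0,
):
    """Estimate deaths, preferring surveillance data over administrative data."""
    if reported_cases is not None and prop_died is not None:
        # Surveillance path: proportion of case-patients who died.
        documented = reported_cases * prop_died
    elif in_hospital_deaths is not None and out_of_hospital_deaths is not None:
        # Administrative path: HCUP NIS deaths plus death certificate records.
        documented = in_hospital_deaths + out_of_hospital_deaths
    else:
        raise ValueError("supply either surveillance or administrative inputs")
    # Assumption: the same underdiagnosis factor applies on both paths.
    return documented * underdiagnosis
```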

Domestically Acquired Waterborne Disease

We used surveillance data, when available, to determine the proportion of persons with a given disease who traveled outside the United States during the incubation period. The remaining proportion of cases was considered domestically acquired. When this information was not available, we used literature estimates and expert consultation. We used recent attribution estimates for each disease ([25]; E.M. Beshearse, unpub. data), derived through structured expert judgment (SEJ), a formal process that answers questions for which data are sparse using expert opinions,[26,27] to determine the proportion of disease attributable to waterborne transmission.
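The two attribution steps reduce to a pair of proportions applied in sequence. The sketch below uses hypothetical names and point values; in the study each proportion carried its own uncertainty distribution.

```python
def domestic_waterborne(total_estimate, prop_travel_related, prop_waterborne):
    """Restrict an estimate to domestically acquired, waterborne cases."""
    for p in (prop_travel_related, prop_waterborne):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    # Cases not linked to international travel are treated as domestic.
    domestic = total_estimate * (1.0 - prop_travel_related)
    # Apply the expert-derived waterborne attribution proportion.
    return domestic * prop_waterborne
```

With made-up inputs of 1,000 cases, 10% travel related, and 50% waterborne, the result is about 450 domestically acquired waterborne cases.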

Uncertainty Estimates

For each input and multiplier in the model, we used a distribution that accounted for low, high, and midpoint estimates. This distribution accounted for the uncertainty in each input and multiplier and facilitated calculation of uncertainty intervals for final estimates. For diseases with surveillance data available, we used the methods of Scallan et al. to produce model inputs.[14] For diseases with administrative data only (e.g., NTM infection and Pseudomonas pneumonia and septicemia), we used the mean hospitalization count from HCUP NIS and computed the illness count as the ratio of hospitalization count to hospitalization rate. We assumed the distribution of the hospitalization count to be normal, with the SD calculated from the reported 95% CI. As we did with surveillance data, we included the variation of hospitalization count over time in the model and assumed that the distribution for each multiplier followed the 4-parameter Program Evaluation and Review Technique (PERT) distribution,[28] with disease-specific parameter values based on available publications.
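As an illustration of the two distributional pieces mentioned above, the sketch below backs out a normal SD from a 95% CI and draws from a 4-parameter PERT distribution using the standard scaled-beta construction. The function names are hypothetical, and treating the fourth parameter as a shape value with a default of 4 is an assumption; the disease-specific parameter values are given in the study's appendices and are not reproduced here.

```python
import random


def sd_from_ci95(lower, upper):
    """Back out a normal SD from a symmetric 95% CI (half-width = 1.96 SD)."""
    return (upper - lower) / (2 * 1.96)


def sample_pert(rng, low, mode, high, shape=4.0):
    """Draw one value from a PERT(low, mode, high, shape) distribution."""
    if not low <= mode <= high or low == high:
        raise ValueError("require low <= mode <= high and low < high")
    span = high - low
    # Standard PERT parameterization as a beta distribution rescaled to [low, high].
    alpha = 1.0 + shape * (mode - low) / span
    beta = 1.0 + shape * (high - mode) / span
    return low + span * rng.betavariate(alpha, beta)


rng = random.Random(0)
draws = [sample_pert(rng, low=1.0, mode=2.0, high=4.0) for _ in range(20_000)]
```

Under this parameterization the mean is (low + shape × mode + high) / (shape + 2), so draws cluster near the mode while staying within the low and high bounds.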

Uncertainty in the final estimates is a cumulative effect of the uncertainty of each model input. Each multiplier was generated independently. Using 100,000 iterations, we obtained distributions of counts and used them to generate point estimates of means and the corresponding 95% credible interval (CrI, the 2.5th percentile through the 97.5th percentile of the empirical distribution). We generated all-disease totals for each outcome by sampling from the distributions generated for each individual disease, using SAS 9.4 and R 3.5.1.[29]
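The simulation loop can be sketched with the Python standard library, although the study itself used SAS and R. All names are hypothetical; each sampler is a function that takes a random generator and returns one draw. Summing per-disease draws iteration by iteration is one assumed way to form all-disease totals.

```python
import random
import statistics


def simulate_product(samplers, iterations=100_000, seed=None):
    """Multiply one independent draw from each sampler, per iteration."""
    rng = random.Random(seed)
    draws = []
    for _ in range(iterations):
        value = 1.0
        for sampler in samplers:
            value *= sampler(rng)
        draws.append(value)
    return draws


def summarize(draws):
    """Return the mean and the 2.5th and 97.5th empirical percentiles."""
    # n=40 yields cut points every 2.5%; the first and last bound the 95% CrI.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.fmean(draws), cuts[0], cuts[-1]


def combine_totals(per_disease_draws):
    """Sum draws across diseases, iteration by iteration (an assumption)."""
    return [sum(values) for values in zip(*per_disease_draws)]
```

A call such as `summarize(simulate_product([lambda r: r.gauss(1000, 50), lambda r: r.uniform(1, 3)], seed=0))` returns a mean with its lower and upper 95% bounds; the normal and uniform inputs here are placeholders, not study inputs.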

Direct Healthcare Cost per ED Visit and Hospitalization

We used methods described previously[30,31] to calculate the direct cost of healthcare for ED visits and hospitalizations, using the 2012–2013 MarketScan research databases (IBM Watson Health). These databases contain de-identified insurance billing data for tens of millions of persons covered by private, Medicare (which covers primarily persons ≥65 years of age), and Medicaid (which covers primarily persons with low incomes or disabilities) health insurance plans and contain information on insurance and out-of-pocket payments for hospitalizations, ED visits, doctors' office visits, laboratory testing, and outpatient drug prescriptions. We used these data to calculate the sum of insurer and out-of-pocket payments per hospitalization or visit, by insurance source. We calculated a weighted cost per hospitalization or visit by multiplying the mean total payments for each insurance source by the proportion of cases with the insurance source in HCUP NIS or HCUP NEDS. We assumed that persons with other sources of health insurance (e.g., Tricare, the US military health insurance plan) or no health insurance have the same costs as persons with private insurance. For ED visit costs, we used the data described by Adam et al.,[30] except for norovirus infection (not examined by Adam et al.) and STEC O157 and non-O157 (categorized differently by Adam et al.) (Appendix 1).
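The weighting step amounts to a share-weighted average of mean payments. In the sketch below the names and dollar values are hypothetical; the fallback to the private-insurance cost for other or no insurance mirrors the assumption stated above.

```python
def weighted_cost(mean_payment, case_share, fallback="private"):
    """Weight mean payments per insurance source by each source's case share."""
    if abs(sum(case_share.values()) - 1.0) > 1e-6:
        raise ValueError("case shares must sum to 1")
    # Sources without their own payment data use the fallback source's cost.
    return sum(
        share * mean_payment.get(source, mean_payment[fallback])
        for source, share in case_share.items()
    )


# Hypothetical payments (US dollars) and case shares; not taken from the study.
payments = {"private": 1000.0, "medicare": 800.0, "medicaid": 600.0}
shares = {"private": 0.5, "medicare": 0.3, "medicaid": 0.1, "other": 0.1}
example_cost = weighted_cost(payments, shares)
```

With these made-up inputs the weighted cost is about $900 per visit, because the 10% of cases with other insurance are priced at the private rate.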

Total Direct Healthcare Costs of Domestically Acquired Waterborne Hospitalizations and ED Visits

We estimated the total direct healthcare cost of ED visits and hospitalizations attributed to waterborne transmission in the United States by multiplying the estimated number of such ED visits and hospitalizations by the weighted average cost per ED visit or hospitalization, using 100,000 iterations, with uncertainty distributions as described (Appendix 1).