Accuracy and Efficiency of Deep-Learning–Based Automation of Dual Stain Cytology in Cervical Cancer Screening

Nicolas Wentzensen, MD; Bernd Lahrmann, PhD; Megan A. Clarke, PhD; Walter Kinney, MD; Diane Tokugawa, MD; Nancy Poitras, BS; Alex Locke, MD; Liam Bartels, BS; Alexandra Krauthoff, BS; Joan Walker, MD; Rosemary Zuna, MD; Kiranjit K. Grewal, MS; Patricia E. Goldhoff, MD; Julie D. Kingery, MD; Philip E. Castle, PhD; Mark Schiffman, MD; Thomas S. Lorey, MD; Niels Grabe, PhD


J Natl Cancer Inst. 2021;113(1):72-79. 


Using a rigorous study design, we developed a novel deep-learning–based image analysis platform for automated evaluation of DS cytology. In a large population of women undergoing HPV-based cervical cancer screening, we show that automated evaluation of DS slides dramatically increases the efficiency of cervical cancer screening by substantially reducing unnecessary colposcopies compared with current standards and similarly achieves excellent performance in a simulated fully vaccinated population. Thus, CYTOREADER exceeds human diagnostic accuracy and serves as an example of AI achieving improvements beyond the automation of a human standard.

Our results demonstrate how automation and machine learning can transform cervical cancer screening, which is currently undergoing major changes. HPV testing for cervical cancer screening is an objective and reliable approach directly linked to the carcinogenic process.[28] HPV-negative women are at very low risk of developing precancer or cancer over the next decade, and screening intervals can be extended.[8–10] Yet most HPV infections are transient, and HPV-positive women require additional tests to decide who needs further evaluation or treatment.[11,12] Pap cytology is recommended and approved for triage of HPV-positive women but suffers from subjectivity, lack of reproducibility, and relatively low sensitivity.[14] Our previous study comparing manual DS with cytology, together with the current results, demonstrates that automated DS evaluation can supplant and improve the role of Pap cytology for triage of HPV-positive women and should also be evaluated for postcolposcopy and posttreatment surveillance.[16] Compared with Pap cytology, manual DS has higher accuracy and can provide longer reassurance against disease when a test is negative, while the risks to patients do not differ from Pap cytology, because the same sample type is used.[17,21] We previously showed that the few DS-negative CIN3s are more likely to have no HPV16/18 and no high-grade cytology, suggesting that these cases are less likely to progress.[16] Automated DS evaluation can provide a completely objective cervical cancer–screening approach, improving efficiency and reducing harms and cost related to false-positive screening results. Furthermore, by demonstrating that AI-based DS detection works for anal cytology, we show the robustness of the imaging and analysis platform. Importantly, our approach is also suited for vaccinated populations, where it may achieve even higher specificity and counterbalance the lower disease prevalence in vaccinated women.[29]

Automated DS evaluation immediately quantifies the number of DS-positive cells on a slide, allowing positivity cutoffs to be tailored to specific clinical decisions. Current guidelines give an option for immediate treatment in women with HSIL cytology, who have a very high probability of having underlying CIN3+.[30] Similarly, a higher cutoff of DS-positive cells could be used to guide treatment decisions. Moving forward, additional criteria can be developed to expand slide assessment; for example, the presence of abnormal glandular cells could be used to identify adenocarcinoma precursors, which are a particular challenge for Pap cytology.[31]
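The idea of tailoring cutoffs to clinical decisions can be illustrated with a minimal sketch. It is not the CYTOREADER implementation: the function names, the per-cell score format, and every threshold value below are hypothetical placeholders chosen only to show how one cell count could feed two different decisions.

```python
from typing import Sequence


def count_ds_positive_cells(cell_scores: Sequence[float],
                            cell_threshold: float = 0.5) -> int:
    """Count cells whose classifier score meets the cell-level threshold.

    `cell_scores` is assumed to hold one score in [0, 1] per detected cell;
    the 0.5 default is an illustrative placeholder, not a validated value.
    """
    return sum(1 for score in cell_scores if score >= cell_threshold)


def slide_meets_cutoff(cell_scores: Sequence[float],
                       min_positive_cells: int,
                       cell_threshold: float = 0.5) -> bool:
    """Return True if the slide has at least `min_positive_cells` positive cells."""
    return count_ds_positive_cells(cell_scores, cell_threshold) >= min_positive_cells


# Hypothetical per-cell scores for one slide.
scores = [0.97, 0.10, 0.81, 0.42, 0.66]

# Two illustrative cutoffs: a lower one for referral to colposcopy and a
# higher one for considering immediate treatment. Both numbers are assumptions.
refer_to_colposcopy = slide_meets_cutoff(scores, min_positive_cells=2)
consider_treatment = slide_meets_cutoff(scores, min_positive_cells=5)
```

Because the count is computed once per slide, different clinical cutoffs can be applied to the same result without re-analyzing the slide.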

Digitization of glass slides paired with automated evaluation in the cloud can provide high-throughput triage of HPV-positive women with inherent objectivity. Furthermore, CYTOREADER can provide an assisted diagnostics mode for evaluating DS slides. To accelerate slide evaluation, the algorithm can present all DS-positive cells found on a slide, ranked by the likelihood that each cell is DS-positive. Similarly, CYTOREADER can be used for quality control of a program that is based on manual DS evaluation.
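A minimal sketch of such a ranked presentation follows. The `CellCandidate` record, its fields, and the `rank_candidates` helper are hypothetical and are not taken from CYTOREADER; the sketch only assumes that each detected cell carries a likelihood of being DS-positive.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class CellCandidate:
    """Hypothetical record for one detected cell on a scanned slide."""
    cell_id: str
    x: int                 # pixel coordinates of the cell on the slide image
    y: int
    ds_probability: float  # assumed classifier likelihood of DS positivity


def rank_candidates(candidates: Sequence[CellCandidate],
                    top_k: Optional[int] = None) -> List[CellCandidate]:
    """Sort candidates from most to least likely DS-positive.

    If `top_k` is given, only the first `top_k` candidates are returned,
    so that a reviewer sees the most suspicious cells first.
    """
    ranked = sorted(candidates, key=lambda c: c.ds_probability, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical detections for one slide.
detections = [
    CellCandidate("c1", 120, 340, 0.42),
    CellCandidate("c2", 980, 115, 0.97),
    CellCandidate("c3", 455, 702, 0.81),
]

review_queue = rank_candidates(detections, top_k=2)
```

In an assisted mode, a cytotechnologist would review the queue from the top and could stop once enough cells have been confirmed for the decision at hand.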

Successful implementation of CYTOREADER requires an infrastructure for high-quality staining, full-slide scanning, and running the machine-learning algorithm. However, slide preparation, scanning, and slide evaluation can be geographically separated, providing high-quality cervical cancer screening and triage in locations that currently do not have the infrastructure and training to achieve reliable DS evaluation, provided that a reliable courier system is available. Compared with manual evaluation of DS slides, automated evaluation requires access to scanning infrastructure but may require a smaller cytotechnology workforce. Scanners are increasingly available in pathology laboratories and can process large batches of slides with limited need for a skilled operator.[22,23] Studies are warranted to evaluate whether DS is amenable to self-collected specimens, a sampling strategy that is important for low-resource settings. Future efforts also need to evaluate how long a negative automated DS result provides reassurance against precancer and how automated DS can be used in women undergoing surveillance.

We conducted a large, well-powered study to evaluate the performance of automated DS for triage of HPV-positive women. However, some limitations should be noted. In contrast to the large KPNC study on HPV triage based on SurePath slides, the 2 studies using ThinPrep slides were comparatively small, and they were conducted in colposcopy/anoscopy populations. Future studies need to evaluate automated DS in larger HPV screening populations using ThinPrep slides. Also, the positivity and sensitivity of cytology at KPNC are much higher than in other settings, which may affect the comparison of clinical efficiency estimates.

Our approach of training and validating on both the tile level and the slide level with ground truth disease endpoints sets our work apart from other deep-learning approaches in digital pathology that focus on replicating a subjective evaluation. We recognize that there is substantial subjectivity underlying histologic endpoints of cervical disease.[15] In our study, we minimized its impact by relying on the most reproducible correlate of cervical precancer, CIN3, as our primary endpoint for evaluation of triage of HPV-positive women. Our work also emphasizes the importance of integrating epidemiology and AI with the availability of population-based studies to improve medical diagnostics beyond automation. It has long been proposed that "digital pathology" will become an important cornerstone of future health care. Despite this vision, image analysis currently does not contribute substantially to routine clinical practice or to the benefit of the patient. The automated evaluation of DS cytology slides has substantially improved accuracy and efficiency compared with Pap cytology and serves as an important example for introducing digital pathology and deep learning into clinical practice. This approach has the potential to substantially improve screening program performance, potentially affecting millions of women testing HPV-positive in cervical cancer screening each year.