Rapid Validation of Whole-Slide Imaging for Primary Histopathology Diagnosis

A Roadmap for the SARS-CoV-2 Pandemic Era

Megan I. Samuelson, MD; Stephanie J. Chen, MD; Sarag A. Boukhar, MBChB; Eric M. Schnieders; Mackenzie L. Walhof; Andrew M. Bellizzi, MD; Robert A. Robinson, MD, PhD; Anand Rajan KD, MBBS


Am J Clin Pathol. 2021;155(5):638-648. 


Materials and Methods

Case Inclusion and Scanning

We retrieved routine H&E slides from 180 surgical pathology cases from the files of the Department of Pathology at the University of Iowa Hospitals and Clinics. Cases with diagnoses rendered by each of the 5 participating study pathologists in the preceding 6-month to 2-week window were included (n = 36 per pathologist) such that 2 weeks or more had elapsed from the glass slide–based sign-out. Slides spanned the subspecialty disciplines of gastrointestinal (GI), gynecologic, head and neck, breast, genitourinary, and dermatologic pathology. Of the 36, two-thirds (n = 24) were "small" cases, including diagnostic biopsies and small resection specimens (eg, cholecystectomies), and the remainder (n = 12) were larger resections or multipart biopsies with a high number of slides per case. Within this framework, individual case selection was random and carried out such that the proportion of small and large cases would be roughly similar among pathologists and distributed evenly across subspecialties. During case search and retrieval, accession numbers for which slides were not on file or which consisted solely of frozen section slides were excluded. For each case, in line with CAP recommendations, frozen sections, special stains, and immunohistochemistry slides were removed by the adjudicating pathologist, and only those necessary and sufficient to arrive at the initially reported diagnosis were retained. Once assembled, slides were digitally scanned with the 20× objective (0.24 μm/pixel resolution) on a P1000 Pannoramic scanner (3DHistech) by an imaging technologist. Glass slide markings for key findings and annotations (eg, lymph node status and count) were removed either at the scanning preview stage or after scanning, with the slide export function creating a second digital slide containing only the manually selected area of interest.
Quality control was performed by multiple personnel who examined digital slides for blurred areas, artifacts arising from scanner focusing errors, and tissue exclusion due to improper scanner thresholding (Figure 1, supplemental data; all supplemental materials can be found at American Journal of Clinical Pathology online). Slides with scanning errors were rescanned. Owing to the lack of a digital case manager, WSI slides were manually placed in subfolders named with the corresponding accession number and made available to study pathologists from a secure departmental network location. Brief clinical histories (ranging from 1 or 2 lines to a page) and gross descriptions including part labels and slide designations were automatically generated from the laboratory information system EPIC AP Beaker (EPIC), checked for formatting, and provided to the participants in the form of mock-up pathology requisition sheets. This process was carried out to closely replicate the original sign-out environment and to avoid requiring participants to log into the pathology laboratory information system or electronic medical record to view details necessary for slide interpretation. The process is depicted in Figure 1.

Figure 1.

Case selection and exclusion and scanning quality control process used in the study. IHC, immunohistochemistry; WSI, whole-slide imaging.

Interpretation of Whole-Slide Images

Digital slides were viewed using CaseViewer 2.3.0 (3DHistech). The study pathologists recorded diagnoses, including major pertinent histopathologic findings such as the presence of lymphovascular or perineural invasion, tumor distances from inked margins, and the presence and size of metastatic tumor deposits.

Adjudication and Scoring

Standardized methods for case arbitration and discrepancy assessment in WSI have been described in the literature.[18] Given the smaller volume of cases to be analyzed, we adopted single-investigator classification and subject matter expert consultation as a method of adjudication. Post–digital review debriefing with the study pathologists was performed to inform participants of the review outcomes. Cases were classified as concordant or discordant with the original diagnosis, with discordant cases further categorized as exhibiting a major or minor discrepancy. A major discrepancy was defined as a change in diagnosis that would potentially affect either patient therapy or management after biopsy or surgery.[19,20] Intraobserver agreement was calculated for each study pathologist, counting both major and minor disagreements and counting major disagreements alone. Although concordance was assessed on a per-part basis, the full case was considered discrepant if any part was classified as such. In addition, disagreement rates were calculated for the full set of cases for each pathologist and for the whole group.
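The scoring logic above (per-part classification rolled up to the case level, with agreement counted either including or excluding minor disagreements) can be sketched as follows. This is a minimal illustration in Python with hypothetical outcome labels and example cases; it is not the study's actual adjudication script.

```python
# Part-level outcome labels mirroring the categories described above
CONCORDANT, MINOR, MAJOR = "concordant", "minor", "major"

def case_status(part_outcomes):
    """A case is discrepant if any part is discrepant;
    the worst part-level outcome sets the case-level category."""
    if MAJOR in part_outcomes:
        return MAJOR
    if MINOR in part_outcomes:
        return MINOR
    return CONCORDANT

def agreement_rate(cases, count_minor=True):
    """Intraobserver agreement: fraction of cases with no counted
    discrepancy. Set count_minor=False to count major disagreements alone."""
    def agrees(status):
        if status == MAJOR:
            return False
        if status == MINOR:
            return not count_minor
        return True
    statuses = [case_status(parts) for parts in cases]
    return sum(agrees(s) for s in statuses) / len(statuses)

# Hypothetical 4-case example: lists of per-part outcomes
cases = [[CONCORDANT, CONCORDANT], [MINOR], [MAJOR, CONCORDANT], [CONCORDANT]]
print(agreement_rate(cases))                     # counting major and minor -> 0.5
print(agreement_rate(cases, count_minor=False))  # major alone -> 0.75
```

Note that the roll-up mirrors the per-part rule in the text: a single majorly discrepant part makes the whole case a major discrepancy regardless of the other parts.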


Case sampling was implemented to achieve blinding and randomization. Overall, 1,000 random samples (n = 18) with replacement were drawn for each pathologist from that pathologist's total examined cases, and the mean percentage of agreement and 95% CIs were computed. Likewise, 1,000 random samples (n = 90) were drawn from the full data set to compute the mean percentage of agreement and 95% CIs for the whole sample set (see supplemental data for scripts used).
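The resampling procedure above is a standard bootstrap with percentile confidence bounds. The following is a minimal sketch in Python (the study's own scripts, per the Statistical Analysis description, were in R/SPSS and are in the supplemental data); the per-case agreement flags and the 33/36 split are hypothetical values chosen only for illustration.

```python
import random

def bootstrap_ci(agreement_flags, n_draw, n_boot=1000, seed=0):
    """Draw n_boot resamples of size n_draw with replacement and return
    (mean agreement %, approximate 2.5th and 97.5th percentile bounds)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(agreement_flags) for _ in range(n_draw)]
        means.append(100.0 * sum(sample) / n_draw)
    means.sort()
    lo = means[round(0.025 * n_boot)]      # ~2.5th percentile
    hi = means[round(0.975 * n_boot) - 1]  # ~97.5th percentile
    return sum(means) / n_boot, lo, hi

# Hypothetical per-case agreement flags (1 = concordant) for one
# pathologist: e.g., 33 of 36 cases concordant
flags = [1] * 33 + [0] * 3
mean_pct, lo, hi = bootstrap_ci(flags, n_draw=18)
```

For the whole-group estimate, the same function would be called with the pooled flags and n_draw=90, matching the sample sizes stated above.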

Validation Threshold Selection

As adopted by CAP, concordance (synonymous with intraobserver agreement) was the targeted metric, and the clinically reported diagnosis was set as the standard. CAP permits the medical director to arrive at an acceptable intraobserver agreement rate in conjunction with the published data. In comparing the study sample set (n = 90) with available CAP data, in which mean glass vs digital concordance rates range from 75% (n = 20 cases) to 91% (n = 200 cases), the upper end, 91%, was selected. Although familiar with many routine functions, the study pathologists had not previously been formally trained in the use of WSI, and the included cases encompassed those with diagnoses rendered well beyond the CAP-recommended washout period of 2 weeks; both situations could potentially affect intraobserver concordance. The CAP data review found that pathologists who were trained in using WSI showed greater concordance than those who were not (89% vs 84%; ie, a difference of 5%) based on a study of dermatopathology specimens.[21] Regarding the rate of disagreement with a gold standard diagnosis, manual (glass) slide review in a large-scale multicenter double-blind study of 2,045 cases[13] was found to exhibit a 3.20% rate of disagreement. Taking these 2 factors into account, a target concordance rate for the present validation protocol was prespecified as 81% to 91%, counting major discrepancies alone. The χ2 test was used to compare proportions of cases. Statistical analysis was performed in IBM SPSS version 26 and RStudio version 1.2.1335 running 64-bit R version 3.6.3.
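The χ2 comparison of proportions mentioned above can be sketched as a Pearson chi-square on a 2 × 2 table. This is an illustrative Python implementation with hypothetical concordance counts (82/90 vs 89/90), not the study's analysis, which was run in SPSS and R.

```python
def chi2_two_proportions(a_agree, a_total, b_agree, b_total):
    """Pearson chi-square statistic (1 df, no continuity correction)
    for comparing two concordance proportions via a 2x2 table."""
    table = [[a_agree, a_total - a_agree],
             [b_agree, b_total - b_agree]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical counts: 82/90 vs 89/90 concordant cases
stat = chi2_two_proportions(82, 90, 89, 90)
# Compare stat against the 1-df critical value (3.84 at alpha = 0.05)
```

In practice this would typically be delegated to a statistics package (eg, `chisq.test` in R or `scipy.stats.chi2_contingency`), which also returns the p value.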