Targeted Next-Generation Sequencing: The Clinician's Stethoscope for Genetic Disorders

Jan Haas; Ioana Barb; Hugo A Katus; Benjamin Meder


Personalized Medicine. 2014;11(6):581-592. 

Benchmarking of Selective Circularization Enrichment Against Hybridization

To evaluate the performance of one of the relatively new circularization methods, we conducted a comparative study in the clinical context of cardiomyopathies. We chose Agilent's SureSelect technology,[14] one of the most commonly used hybridization-based approaches, and the recently introduced Haloplex Custom Kit[15] as a representative of selective circularization.

Our study was approved by the local ethics committee, and all patients involved gave written informed consent. The design included split samples from 10 patients with idiopathic dilated cardiomyopathy. For enrichment, standard protocols were used, requiring 3 μg (SureSelect XT V.1.1.1. January 2011) or 225 ng (Haloplex V. D3, December 2012) of input DNA. Sequencing was performed on an in-house Illumina HiSeq 2000 system. Target region design in eArray for Haloplex was based on the RefSeq, Ensembl, CCDS, Gencode and VEGA databases and included a total of 2066 targets (508,105 bp), representing 120 genes. For SureSelect, we used the Ensembl database as well as the 'University of California Santa Cruz' (UCSC) genes prediction track, which is based on data from RefSeq, GenBank, CCDS, UniProt, Rfam and the tRNA Genes track. The resulting 1588 target regions (496,183 bp) designed by SureDesign covered 84 genes. Of those, 77 genes overlapped completely between both designs, with a total of 333,325 bases, representing 66% of the Haloplex and 67% of the SureSelect target region.

On average, 58,910,498 (Haloplex) and 57,007,123 (SureSelect) raw FASTQ reads were mapped against the human genome (hg19) using the Burrows-Wheeler Alignment tool (BWA v. 0.59-r16).[64] To improve mapping around INDELs, a local realignment was performed with the Genome Analysis Toolkit (GATK v. 1.5-21-g979a84a).[65] With this approach, we were able to map 46% (Haloplex) and 21% (SureSelect) of the raw sequence reads to the respective target regions. In the case of SureSelect, we marked duplicate reads (Picard-tools 1.56[66]). For variant calling, we relied on the HaplotypeCaller of the GATK package v. 2.7-2-g6bda569. Raw variant calls were filtered as suggested by the GATK best practices v.4 for small target regions.

For the comparison of variant detection between Haloplex and SureSelect, we considered only variants in the overlapping target region. Using BEDTools,[67] we extracted, for each patient sample, all genetic variants identified by both methods (shared variants) and those unique to Haloplex or SureSelect. SNPs and INDELs were analyzed separately. A more detailed overview of variant calling and interpretation can be found elsewhere.[68–70]
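The partition into shared and method-unique variants, and the reported overlap percentages, can be illustrated with a short sketch. This is not the BEDTools workflow used in the study; it is a simplified set-based stand-in written in Python. The `Variant` type, the helper names and the toy calls are hypothetical; only the base-pair counts are taken from the text above.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a BEDTools or GATK type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def partition_calls(
    haloplex: Set[Variant], sureselect: Set[Variant]
) -> Tuple[Set[Variant], Set[Variant], Set[Variant]]:
    """Return (shared, Haloplex-only, SureSelect-only) variant sets."""
    shared = haloplex & sureselect
    return shared, haloplex - sureselect, sureselect - haloplex


def is_indel(variant: Variant) -> bool:
    """Treat any call whose REF and ALT lengths differ as an INDEL."""
    return len(variant.ref) != len(variant.alt)


# Toy calls for one sample, invented purely for illustration.
haloplex_calls = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 200, "AT", "A"),
    Variant("chr3", 300, "C", "T"),
}
sureselect_calls = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 200, "AT", "A"),
    Variant("chr4", 400, "G", "GA"),
}

shared, only_haloplex, only_sureselect = partition_calls(
    haloplex_calls, sureselect_calls
)

# SNVs and INDELs are analyzed separately, as in the text.
shared_snvs = {v for v in shared if not is_indel(v)}
shared_indels = {v for v in shared if is_indel(v)}

# Overlap fractions, using the base-pair counts reported above.
overlap_bp = 333_325
haloplex_fraction = overlap_bp / 508_105    # about 0.656
sureselect_fraction = overlap_bp / 496_183  # about 0.672
```

Matching on exact chromosome, position and alleles is a deliberate simplification; differently left-aligned or normalized INDEL representations would need to be harmonized first.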

Both enrichment methods efficiently enriched the overlapping target region. Estimated enrichment factors averaged 4400-fold (SD ± 1613) for Haloplex and 2000-fold (SD ± 488) for SureSelect. The key measure, the percentage of the target region covered at least 20-fold, was 97.4% for Haloplex and 99.5% for SureSelect. Detailed analyses highlighted the less uniform enrichment of the Haloplex solution, which showed clusters of excess enrichment alongside regions without enrichment.
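As a minimal sketch of how such a coverage figure can be derived, assume a list of per-base read depths across the target region (for example, as exported by a coverage tool). The function name and the toy depths below are hypothetical and do not reproduce the study data.

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 20) -> float:
    """Fraction of target bases with read depth >= min_depth."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Toy per-base depths for ten target positions (illustrative only).
toy_depths = [0, 5, 19, 20, 35, 120, 400, 22, 18, 60]
toy_fraction = fraction_covered(toy_depths)  # 6 of 10 positions reach 20x
```

A high mean depth can coexist with a lower fraction at 20-fold when enrichment is uneven, which is why this measure is reported alongside the enrichment factor.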

Regarding the detection of variants, both methods showed considerable overlap. The results for all patients are given in Figure 2, and a summary is provided in Figure 3. We next focused on the nonoverlapping variants, which might best indicate problems with correct mutation calls in the investigated target region. For 30.8% of the unique SureSelect variants, there was no or insufficient coverage in the Haloplex data sets, indicating a capture deficiency across certain targeted genomic intervals. In contrast, none of the unique Haloplex variants lacked coverage in the SureSelect data sets. To further evaluate whether the variants found by only one of the methods were correctly called, we randomly selected unique SNVs and INDELs of various kinds (exonic, intronic, synonymous, nonsynonymous, stop, 3′ UTR) for each method and validated them by Sanger sequencing. Of the variants found only with the Haloplex method, a single SNV (of 10 that were Sanger sequenced; 10%) and two INDELs (of 5; 40%) were confirmed by Sanger sequencing. By contrast, five SureSelect-unique SNVs (of 8; 62.5%) and all five INDELs tested (100%) could be confirmed by Sanger sequencing. These results suggest that, at least for the selected target region and workflow and in a small cohort of n = 10 patients, SureSelect delivers more homogeneous target coverage and better base-call quality than Haloplex.
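The Sanger confirmation rates quoted above follow directly from the counts; the short Python sketch below recomputes them. The dictionary layout is an illustrative choice, and the denominator of five for the SureSelect INDELs is inferred from the reported 100%.

```python
# (confirmed, tested) counts per method and variant type, from the text.
# The SureSelect INDEL denominator (5) is inferred from "five ... (100%)".
sanger_counts = {
    ("Haloplex", "SNV"): (1, 10),
    ("Haloplex", "INDEL"): (2, 5),
    ("SureSelect", "SNV"): (5, 8),
    ("SureSelect", "INDEL"): (5, 5),
}

# Confirmation rate in percent for each method and variant type.
confirmation_rates = {
    key: 100 * confirmed / tested
    for key, (confirmed, tested) in sanger_counts.items()
}

for (method, variant_type), rate in confirmation_rates.items():
    print(f"{method} {variant_type}: {rate:.1f}% confirmed")
```

With so few variants tested per group, these percentages are best read as indicative rather than as precise estimates.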

Figure 2.

Target enrichment comparison. Illustrates the shared (green) and unique variants (SNVs and INDELs) detected using the SureSelect system (yellow) and the Haloplex method (blue) for 10 investigated dilated cardiomyopathy patients.

Figure 3.

Target enrichment comparison. Illustrates the mean number of shared (green) and unique variants (SNVs and INDELs) detected using the SureSelect system (yellow) and the Haloplex method (blue).

With SureSelect currently delivering the better quality with regard to correct genotype calls, we next compared the results of our workflow/protocol to a gold-standard reference sample with known genotypes (HapMap sample NA12878; National Institute of Standards and Technology, NIST).[71] We found the data sets to share 167 SNPs and 10 INDELs. Only two NIST SNPs could not be detected by the applied pipeline. On the other hand, one SNP and one INDEL were unique to our calls. Whereas the novel deletion could be detected by direct Sanger sequencing of the amplicon (Supplementary Figure 1; see online), the initial electropherogram for the SNV was not conclusive (Supplementary Figure 2). After subcloning of the NA12878 sample around this variant, we were able to confirm the variant by sequencing individual clones. Based on these results, we calculated for the variant calls (SNV and INDEL) an overall sensitivity of 98.9% and an overall specificity of 100% for the presented cardiomyopathy gene panel, underlining its excellent performance.
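The reported sensitivity can be reproduced from the counts given above. The following Python sketch assumes that the 167 shared SNPs and 10 shared INDELs are true positives, that the two undetected NIST SNPs are false negatives, and that the two pipeline-unique calls, both confirmed by Sanger sequencing, are not false positives. The function and variable names are illustrative.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity = TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


# Counts taken from the comparison with the NIST NA12878 reference.
true_positives = 167 + 10  # shared SNPs + shared INDELs
false_negatives = 2        # NIST SNPs missed by the pipeline
false_positives = 0        # both pipeline-unique calls were Sanger confirmed

overall_sensitivity = sensitivity(true_positives, false_negatives)
print(f"Sensitivity: {100 * overall_sensitivity:.1f}%")
```

Because no false-positive calls remained after Sanger confirmation, the specificity, TN / (TN + FP), equals 100% regardless of the number of true-negative positions, which is not stated in the text.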