Targeted Next-Generation Sequencing: The Clinician's Stethoscope for Genetic Disorders

Jan Haas; Ioana Barb; Hugo A Katus; Benjamin Meder


Personalized Medicine. 2014;11(6):581-592. 


Translation of NGS Into the Clinics

Next-generation sequencing (NGS) technologies have rapidly evolved over the past decade.[6] Today, a plethora of systems using different technological concepts are on the market, covering applications such as whole-genome sequencing, whole-exome sequencing, panel sequencing, RNA-seq, miRNA-seq, ChIP-seq or bisulfite sequencing.[7] The predominant platforms use emulsion PCR or bridge PCR prior to pyrosequencing, sequencing by ligation or sequencing by synthesis.[8]

NGS is still predominantly used in a research setting, owing to the lack of guidelines for sample processing and data analysis as well as missing or insufficient quality measures for the sequencing results. The American College of Medical Genetics and Genomics (ACMG) has made considerable efforts to establish such guidelines and benchmarks for clinical NGS testing, but it remains difficult to compare results across the different platforms and techniques.[8–12]

Only recently, the US FDA granted the first marketing authorization for an NGS instrument (Illumina's MiSeqDx), paving the way for routine clinical use of NGS by eliminating the need for post-NGS Sanger validation.[13] Given the instrument's limited sequencing capacity, target enrichment methods are indispensable for using it. One of the major problems is the underlying diversity of enrichment techniques, which is reflected in the variety of commercially available methods and predefined target enrichment gene panels currently offered by several vendors for different groups of genetically heterogeneous disorders. Table 1 lists examples of currently available enrichment technologies.[14–20]

In order to translate NGS technologies into the clinics, we were among the first groups to utilize microarray-based target gene enrichment in a study on patients with dilated and hypertrophic cardiomyopathy. We successfully enriched 47 genes (0.273 Mb) and subsequently used NGS on the SOLiD platform to reliably detect cardiomyopathy-causing mutations in a rapid and cost-efficient manner.[21] Several other groups have successfully evaluated similar approaches for cardiomyopathy patients using arrays,[22,23] filters,[24] in-solution capture[25] or PCR enrichment.[26] These and other studies suggest that targeted NGS is feasible as a powerful diagnostic test[27] in a clinical environment; at the same time, however, they again show the diversity of currently used approaches. In the following paragraphs we provide an overview of existing and emerging techniques that may be most suitable for clinical tests.

Technologies for Target Enrichment

Target enrichment refers to the selective capture or amplification of specific genomic regions of interest prior to massively parallel sequencing. Compared to whole-genome sequencing, such a targeted approach aims to screen a limited set of genomic loci at sufficiently high depth, which is a prerequisite for accurate variant identification.[28] In contrast to whole-exome or whole-genome sequencing, targeted sequencing is limited in its ability to discover new disease-relevant genes and functionally important intergenic or intronic variants. However, focusing on the relevant genes makes it possible to analyze large numbers of patients at comparatively low cost and with shorter processing time. The reduced amount of data per sample also facilitates downstream analysis and keeps data storage requirements manageable.[29] Moreover, the accidental discovery of additional mutations, of variants of unknown significance or of variants related to diseases outside the scope of the current clinical investigation is less likely to occur, ultimately avoiding the ethical dilemmas that arise when such findings need to be interpreted and communicated to the patient.[30] Various target enrichment techniques[31–41] have been developed and are suitable for a broad range of applications and for different sequencing platforms.[8] According to their basic principle, they can be divided into PCR-driven methods, hybrid capture of genomic fragments and targeted circularization-based techniques. Essential for almost all platforms is the introduction of multiplexing by incorporating indexing adaptors. After the sequencing run, these adaptors allow each sequencing read to be assigned to its individual sample. An overview of those technologies is given in Figure 1.
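The read-to-sample assignment enabled by indexing adaptors can be sketched in a few lines of Python. This is a minimal illustration, not the demultiplexing routine of any particular platform: the function names, the 6-nt index sequences, the sample labels and the mismatch tolerance are all hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, index_to_sample, max_mismatches=0):
    """Assign (index_sequence, read_sequence) pairs to samples.

    A read is assigned only if exactly one known index lies within
    `max_mismatches` of its index sequence; otherwise it is binned
    as 'undetermined'.
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        hits = [
            sample
            for idx, sample in index_to_sample.items()
            if len(idx) == len(index_seq)
            and hamming(idx, index_seq) <= max_mismatches
        ]
        key = hits[0] if len(hits) == 1 else "undetermined"
        bins[key].append(read_seq)
    return dict(bins)


# Hypothetical indexes and reads, for illustration only.
INDEXES = {"ACGTAC": "patient_1", "TGCATG": "patient_2"}
READS = [
    ("ACGTAC", "GATTACA"),
    ("TGCATG", "CCGGTTAA"),
    ("ACGTAA", "TTTT"),   # one mismatch to the patient_1 index
    ("GGGGGG", "AAAA"),   # matches no index
]

strict = demultiplex(READS, INDEXES)
relaxed = demultiplex(READS, INDEXES, max_mismatches=1)
```

Allowing one mismatch rescues reads with a single sequencing error in the index, at the cost of requiring index sets whose members differ from each other at several positions.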

Figure 1.

Target enrichment methods. Displayed are the hybridization-based approaches (A) performed on an array surface (left), in solution (middle) or on a filter (right). (B) Selective circularization techniques using selectors (left) and MIPs (right).
MIP: Molecular inversion probe

PCR-based Enrichment Methods. PCR has proven to be well suited as an amplification step prior to Sanger capillary electrophoresis,[42,43] since both techniques work on a similar scale of throughput. To meet the requirements of massive parallelization, however, various PCR-based methods were developed that adapt to the higher throughput.[16,17,44–47] When facing the challenge of large-scale amplicon production, two different PCR methods can be used: uniplex, long-range PCRs or multiplex, short-range PCRs. In both cases, the concentrations of the PCR products must be normalized before pooling in order to obtain uniform coverage during sequencing.[31] Long-range PCR produces very long, tiled amplicons (e.g., up to 20 kb for SeqTarget from Qiagen) across the targeted genomic intervals, which then undergo shotgun library preparation. The method is relatively easy to apply, requires no special equipment and reduces the number of necessary primer pairs.[16,48] A problem encountered when using long-range PCR is uneven coverage despite normalization and equimolar pooling of the amplicons.[49] Harismendy et al. argued that this phenomenon occurred due to the over-representation of the product ends and showed that the use of 5′-blocked primers alleviates this effect.[49] Groups from different fields have successfully used long-range PCR in combination with NGS to detect underlying mutations for different pathologies such as breast and ovarian cancer (BRCA1/2)[50] or autosomal-dominant retinitis pigmentosa (12 different genes).[51] For most clinical applications, however, hybridization-based methods or short-range PCR seem to be better suited, as they focus on the directly relevant regions.
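The normalization step before pooling comes down to simple arithmetic: amplicons of different lengths contain different numbers of molecules at the same mass concentration. The following sketch computes pooling volumes for equimolar input. It assumes the commonly used approximation of 660 g/mol per base pair of double-stranded DNA; the function names and the example amplicons are hypothetical.

```python
AVG_BP_MASS = 660.0  # assumed average molar mass of one dsDNA base pair (g/mol)


def nanomolar(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    1 nM equals 1 fmol/uL, which makes the pooling arithmetic direct.
    """
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(amplicons, fmol_each):
    """Return the volume (uL) of each amplicon that contains `fmol_each` fmol.

    amplicons: dict mapping name -> (concentration in ng/uL, length in bp)
    """
    return {
        name: fmol_each / nanomolar(conc, length)
        for name, (conc, length) in amplicons.items()
    }


# Hypothetical long-range PCR products with identical mass concentration.
AMPLICONS = {
    "amplicon_10kb": (66.0, 10_000),
    "amplicon_5kb": (66.0, 5_000),
}

volumes = equimolar_volumes(AMPLICONS, fmol_each=100.0)
```

With these inputs, twice the volume is needed of the 10 kb product than of the 5 kb product, because at equal mass concentration the longer amplicon holds half as many molecules.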

In contrast to long-range PCR, short-range multiplex PCRs can eliminate the need for DNA shearing, if the product sizes fulfill the requirements of the sequencing platform used. In this case, the platform's specific adapters can be incorporated in one of the PCR primers and amplified during PCR.[48] This approach requires significantly less time to produce the target DNA library needed for sequencing. It is noteworthy that in this instance, the target enrichment kit is usually specific for the sequencing platform used (e.g., Ion AmpliSeq from Life Technologies[18]). A multiplexed approach, however, introduces a different problem. Multiplexing a PCR reaction requires a multitude of primer pairs in a single reaction tube, and a large number of primer pairs can lead to numerous interactions between the primers. These interactions can result in undesired fragments being amplified at the expense of the target region, thereby decreasing the specificity of the PCR.[52] Newer concepts of highly multiplexed PCR have emerged, for example, Gene-Collector,[44] megaplex PCR[46] or nested patch PCR.[45] For instance, Fredriksson et al. succeeded in enriching all coding regions of 10 cancer genes in one single, 170-plex PCR reaction, by using collector oligonucleotides, which guided the self-circularization process of those amplicons carrying corresponding primer pairs.[44] All specifically amplified products could thereby be separated from the rest. A different approach aiming to avoid nonspecific products was developed by Meuzelaar et al., who immobilized primer pairs to a solid surface in order to physically separate them from each other.[46]
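Why primer interactions grow so quickly with multiplexing can be illustrated with a simple screen for 3′-end complementarity between primers. This is a deliberately crude heuristic under stated assumptions (exact complementarity of the last k bases, no thermodynamic model), not the design procedure of any of the cited methods; the primer names and sequences are hypothetical.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_clash(primer: str, other: str, k: int = 5) -> bool:
    """True if the last k bases of `primer` can base-pair with a k-mer in `other`."""
    return revcomp(primer[-k:]) in other


def risky_pairs(primers, k: int = 5):
    """List primer pairs (including self-pairs) with a 3'-end clash in either direction."""
    flagged = []
    for a, b in combinations_with_replacement(sorted(primers), 2):
        if three_prime_clash(primers[a], primers[b], k) or three_prime_clash(
            primers[b], primers[a], k
        ):
            flagged.append((a, b))
    return flagged


# Hypothetical primers, for illustration only.
PRIMERS = {
    "fwd_1": "ACCTGAAGTCTTCAGACTGG",
    "fwd_2": "GGATACCGTTCAAGCTTACA",
    "rev_1": "TTGCCAGTAACGGATCAGAC",
}

flagged = risky_pairs(PRIMERS)
```

The number of pairs to check grows quadratically with the number of primers, which is one way to see why physically separating primer pairs, on a surface or in droplets, sidesteps the problem rather than solving it by design.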

In 2009, Tewhey et al. described microdroplet-based PCR amplification.[32] This approach makes it possible to compartmentalize a highly multiplexed PCR using the microdroplets of a water-in-oil emulsion. To achieve this, the DNA sample is dispensed among multiple droplets, which then merge with a different droplet population containing the individual primer pairs. Each resulting droplet contains the template DNA, one primer pair and the necessary amount of reagents, and thus houses a singleplex PCR reaction that is isolated from the rest.[32] This approach overcomes critical problems associated with multiplexing, such as primer interactions, mispriming or competition for the same reagent pool. At present, the targeted DNA sequencing platform from RainDance Technologies offers the possibility to successfully use up to 20,000 primer pairs in one experiment.[17] Excellent results using microdroplet-PCR followed by NGS have been achieved for the diagnostics of several heterogeneous disorders, such as congenital muscular dystrophies,[53] hearing loss[54] or mitochondrial disorders.[55] Furthermore, compartmentalization of a highly multiplexed PCR can also be achieved by performing several thousand individual reactions in separate chambers inside a microfluidic chip[48] (e.g., Access Array System from Fluidigm[47]). Such an approach has also been successfully utilized in the context of clinical diagnostics, for instance in the case of familial hypercholesterolemia.[56]

Hybridization of Genomic Regions of Interest. In contrast to PCR-driven methods, where the enrichment process is performed on high-molecular-weight DNA, the hybridization approach relies on a shotgun library construction prior to target capture.[33–36] The library preparation usually starts with the random shearing of genomic DNA, followed by end repair and ligation of sequencing platform-specific adapters to each DNA fragment.[12] The resulting library is then hybridized to capture oligonucleotides, which are either in solution[36] or bound to the solid surface of an array plate.[33–35]

One of the first hybridization-based capture approaches used custom high-density oligonucleotide arrays and was developed as an alternative to PCR-based enrichment methods.[33–35] Okou et al. used a microarray-based genomic selection protocol and tested it by sequencing two X-chromosomal genomic regions of different sizes (50 and 304 kb).[35] Their experiments had a total base-calling rate of 99.1%, a reproducibility of 99.98% and an accuracy of 99.81%, thereby demonstrating the efficiency of this approach. Also, Hodges et al. established a high-density microarray-based method, capable of capturing all coding exons using 7 arrays (24,000–30,000 exons per array plus 37,000 exons representing alternative gene transcripts for the seventh array), yielding a specificity of 55–85%, a sensitivity of 40–78% and an average 237-fold enrichment for their method.[34] Two years later, Gnirke et al. first described the in-solution hybrid capture approach using long RNA baits.[36] Their method used programmable microarrays to synthesize 200-base-long oligonucleotides, which are subsequently released from the array and then PCR amplified. The probes undergo in vitro transcription to produce biotinylated, 170-base-long RNA capture probes, comprising target complementary sequences. After hybridization, streptavidin-decorated magnetic beads bind to the biotin-labeled RNA probes, thereby capturing the target DNA fragments, which can subsequently be isolated from the rest by using a magnetic field. Unbound DNA fragments are washed out and the captured DNA library is eluted from the beads and PCR-amplified to finalize the library preparation. Mamanova et al. compared the performance of in-solution and on-array capture and found similar results for the two approaches.[31] Only for larger target sizes (e.g., 3.5 Mb) did the in-solution technique outperform the array, yielding a higher specificity and uniformity.[31] Despite the similar enrichment performance, working with arrays on a high number of patient samples in parallel is difficult due to the required instrumentation, whereas in-solution enrichment can be parallelized more easily. Furthermore, the DNA amounts required per array exceed those needed for in-solution capture (e.g., for a 30 Mb long target: 15–20 vs 3 μg[31]). Several vendors offer target enrichment kits based on in-solution hybridization, for example, SureSelect from Agilent Technologies[14] or SeqCap EZ from Roche Nimblegen.[19]
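Performance figures such as specificity and fold enrichment can be made concrete with a short calculation. The sketch below uses one common set of definitions, which is an assumption here and not necessarily the one applied in the cited studies: specificity as the fraction of reads mapping on target, and fold enrichment as that fraction divided by the fraction of the genome the target occupies. The function names, the genome size constant and the read counts are hypothetical.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate size of the human genome in bp


def specificity(on_target_reads: int, total_reads: int) -> float:
    """Fraction of mapped reads that fall within the target region."""
    return on_target_reads / total_reads


def fold_enrichment(
    on_target_reads: int,
    total_reads: int,
    target_bp: float,
    genome_bp: float = HUMAN_GENOME_BP,
) -> float:
    """Observed on-target fraction relative to the fraction expected without enrichment."""
    expected_fraction = target_bp / genome_bp
    return specificity(on_target_reads, total_reads) / expected_fraction


# Toy numbers: a target covering 1% of a small genome, 50% of reads on target.
toy_fold = fold_enrichment(
    on_target_reads=50, total_reads=100, target_bp=1_000, genome_bp=100_000
)

# Hypothetical run on a 0.273 Mb panel with 60% of reads on target.
panel_fold = fold_enrichment(
    on_target_reads=600_000, total_reads=1_000_000, target_bp=273_000
)
```

Under these definitions, the same specificity translates into a much higher fold enrichment for a small panel than for a whole exome, so the two figures should be compared only between experiments with similar target sizes.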

A lesser-known hybridization-based enrichment strategy uses filters, marked with amplicons, to capture targeted DNA libraries.[37] To achieve this, the selected target regions must be PCR amplified, quantified and pooled in equimolar amounts, thereby generating target groups, which are subsequently ligated to DNA concatemers. In order to produce the capture filters, the resulting concatemers are bound to a nitrocellulose membrane. When the sample DNA library is added, the target fragments hybridize to their complements on the filter, thereby completing the capture process. Afterward, the enriched DNA library is eluted from the filter and amplified, and is then ready to be sequenced.

In summary, the great advantage offered by the hybridization approach is the possibility to define a target region of up to 50 Mb, making it possible to capture numerous genes in a single reaction more efficiently than with PCR.[15] Furthermore, the scalability of the hybrid capture techniques exceeds that of most other methods. Especially when performed in solution, they have proved to be very effective in enriching large target regions, up to the whole exome, for a large number of samples.[57]

Selective Circularization Technology. A new class of target enrichment methods relies on the selective circularization of the regions of interest prior to amplification, by using either selectors[41,58,59] or molecular inversion probes (MIPs), also known as gap-fill padlock probes.[38,39] Selectors are oligonucleotides, comprising a central linker and two target-specific ends, designed to hybridize to the genomic regions of interest.[41,58,59] In contrast to on-array or in-solution hybridization methods, restriction enzymes are used to digest the genomic DNA to make it suitable for selector probes. In principle, the DNA then binds to the complementary selector probes, leading to the selective circularization of the target regions, followed by filling of the remaining gap by so-called vectors, which are reverse-complementary to the central linker. Finally, two enzymes (a polymerase and a ligase) are used to cleave the remaining DNA overhangs and to close the circular complexes, which at this point comprise the targeted genomic region and the sequence motif of the 'backbone'. Next, exonucleases remove the nonspecific, uncircularized fragments, before the final PCR amplification of the circularized targets takes place.[40] One important advantage of this approach is the possibility of using one single primer pair in the subsequent multitemplate PCR reaction by binding each target region to a common linker.[52] In this way, previously described drawbacks of multiplex PCR, such as nonspecific amplification or primer dimerization, can be effectively overcome.

Another selector method uses multiple displacement amplification instead of PCR for amplification, thereby reducing the capture oligonucleotides' length, because primer incorporation in the selector probes is no longer necessary.[41] Notably, the selector technology offers the possibility to circumvent the time-consuming library construction process, if the generated amplicons meet the requirements (e.g., read length) of the dedicated sequencing platform. In this case, however, specific adapters and barcodes must be included in the vectors and amplified during PCR. For example, the Haloplex method from Agilent uses such an optimized approach, eliminating the need for library preparation and thereby increasing the processing speed of multiple samples.[15]

MIPs are circularizing, single-stranded oligonucleotides that are composed of an internal common sequence flanked by two target-specific ends.[60] When the probe's ends hybridize to the DNA segments flanking the target region, the probe takes a circular shape, leaving a gap over the DNA fragment, which is subsequently filled by a polymerase and closed by a ligase. This is followed by the enzymatic digestion of uncircularized fragments and amplification of the remaining products. Depending on the gap size, MIPs can be used for different applications, such as SNP genotyping[61] or the parallelized capture of numerous exons.[38,62] The latter shows the potential of MIPs as a target enrichment method compatible with NGS, due to their significant multiplexing capacity and high specificity. Several optimizations of the initial protocol, including longer hybridization and gap-fill times and higher ligase and MIP concentrations, resulted in an improved capture performance, as well as the elimination of one time-consuming step, namely library preparation.[39] Umbarger et al. successfully used MIPs to develop a carrier screening test, capable of analyzing 15 genes in order to determine a couple's risk of conceiving a child with a recessive genetic disease.[63] The main strength of the selective circularization approach is its very high specificity, which cannot be achieved using a conventional hybridization strategy, because of the nonspecific interactions that might occur.[48] Moreover, compared with the hybridization approach, the DNA input requirements are significantly lower, for example, 200 ng gDNA (Haloplex) compared with the 3 μg gDNA (SureSelect) needed for shotgun library preparation.[14,15,52] However, the multiplexing capacity lies between that of hybridization-based techniques and that of PCR-based enrichment.[52] The nonuniformity of enrichment encountered with MIPs is by no means random, but highly reproducible for individual targets, which allows adjustment for this effect.[52]
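Because the nonuniformity is reproducible per target, it can in principle be estimated once from earlier runs and then divided out. The following is a minimal sketch of one possible adjustment, assuming that a target's capture efficiency is well described by its depth relative to the run average; it is not the procedure of the cited work, and the function names, target names and depth values are hypothetical.

```python
from statistics import mean


def relative_efficiency(reference_runs):
    """Estimate per-target capture efficiency from reference runs.

    reference_runs: list of dicts mapping target -> read depth.
    Each depth is first scaled by the mean depth of its run, so runs
    with different sequencing yields become comparable.
    """
    targets = reference_runs[0].keys()
    return {
        t: mean(run[t] / mean(run.values()) for run in reference_runs)
        for t in targets
    }


def adjusted_depth(run, efficiency):
    """Divide observed depths by the expected relative efficiency of each target."""
    return {t: depth / efficiency[t] for t, depth in run.items()}


# Hypothetical depths: exon_b is consistently captured at half the average rate.
REFERENCE_RUNS = [
    {"exon_a": 200, "exon_b": 100, "exon_c": 300},
    {"exon_a": 400, "exon_b": 200, "exon_c": 600},
]
NEW_RUN = {"exon_a": 100, "exon_b": 50, "exon_c": 150}

efficiency = relative_efficiency(REFERENCE_RUNS)
adjusted = adjusted_depth(NEW_RUN, efficiency)
```

The same efficiency estimates could instead guide the wet-lab side, for instance by raising the relative amount of probes for poorly captured targets; which route is preferable depends on the assay.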
