Finding Missing Heritability Using Personalized Approaches in Chronic Disease
It has long been recognized that only a portion of heavy cigarette smokers ever develop COPD. This suggestion of a genetic component to the disease was verified by early epidemiological studies that demonstrated that abnormal pulmonary function was more common in relatives of COPD patients than in the general population. Twin studies further supported this genetic relationship and early segregation studies indicated that the most likely genetic model was one of many genes with small effects.[37,38] A single known monogenic determinant of COPD was first described in the early 1960s and is caused by mutations in the SERPINA1 gene, leading to deficiency in the protease inhibitor A1AT.[39,40] Patients with homozygous mutations in SERPINA1 have serum expression of A1AT at approximately 15% of the level seen in patients without mutations, while patients with heterozygous mutations have variable reductions in serum A1AT. Patients with homozygous mutations have a significantly increased risk of airway obstruction and emphysema, particularly in the presence of smoking. Patients with heterozygous mutations have also been shown to have an increased risk of airway obstruction compared with those without any mutations. Although these reductions in serum A1AT have profound effects in patients with mutations in SERPINA1, a recent study found that less than 1% of patients with fixed airflow obstruction have homozygous mutations and less than 10% have heterozygous mutations.
In the last 10 years, genome-wide association studies (GWASs) have demonstrated a number of regions to be strongly associated with phenotypes of COPD, including FEV1, forced vital capacity (FVC), FEV1/FVC and both clinically and radiologically defined emphysema.[44–51] The discoveries in these studies have been reviewed in much greater detail elsewhere,[52–54] but briefly, only a few of the regions found in these studies have been replicated. Of the candidate genes that have been identified by being the closest to each of the four most frequently replicated regions, FAM13A, HHIP, CHRNA3/5/IREB2 and CYP2A6, only HHIP has been validated at the molecular level. Furthermore, recent studies have indicated that there is significant heritability that remains unexplained by discoveries in GWASs. As hypothesized for other complex diseases, the 'common disease–common variant' hypothesis that is assumed in GWASs probably only explains a portion of the heritability of COPD, and biologically informative variants of multiple types remain to be found.
Even this brief history of the genetics of COPD demonstrates several challenges that we face in terms of better understanding the genetic basis of chronic disease. Monogenic variants leading to disease offer significant biological insight, but in chronic disease, tend to be responsible for only a small number of cases. GWASs have given us a number of candidate genes, but few of them have offered significant biological insight. This 'failure' of GWASs has been observed in multiple multifactorial diseases. As authors have pointed out in other reviews and commentaries, this 'failure' reflects not so much an inability of the studies to detect related variants or a lack of heredity in these diseases as inflated expectations that GWASs would identify the entire heritable spectrum of complex diseases.[57–59] Therefore, in order to improve our understanding of the genes involved in the development of chronic disease, it will be necessary to look for genetic variants that GWASs were not designed to detect. Rare variants and structural variation remain largely unstudied in most chronic diseases. Since the heritability of chronic disease has only been partially explained by GWAS results, these unstudied variants have the potential to offer insights into disease pathogenesis, as well as add depth to the discoveries made by GWASs by revealing 'causal' variants.
Next-generation sequencing (NGS) allows for either the targeted detection of all variants in specific regions of the genome, such as in whole-exome sequencing, or the detection of all variants in the genome, such as in whole-genome sequencing. Since NGS utilizes massively parallel sequencing, comparisons of read depth can also be used in order to detect structural variation in the genome. Successes in monogenic disorders and cancer therapeutics demonstrate the power of NGS to increase our understanding of the genetic contribution to disease, and in some cases have offered cures for previously untreatable diseases. Although more challenging in chronic disease, some success has already been achieved using NGS, notably in Type 2 diabetes. In a large study looking at Type 2 diabetes risk, whole-genome sequencing identified four previously unreported variants involved in disease risk. Another study used targeted sequencing of 78 known genes from GWASs in order to identify rare variants in the regions around these genes that are likely to illuminate the role they play in the disease. Thus, NGS has the ability to both identify previously unrecognized variation that plays a role in chronic disease, and also to further clarify the role of regions identified in GWASs whose biological relevance remains unclear.
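The read-depth comparison mentioned above can be illustrated with a minimal sketch. It assumes that reads have already been counted in fixed genomic windows for a test sample and for a reference sample, and it flags windows whose normalized depth departs from the reference. The function names, the median normalization and the gain/loss thresholds are illustrative assumptions, not part of any published pipeline; real tools additionally correct for GC content and mappability and segment adjacent windows.

```python
import math
from statistics import median


def log2_depth_ratios(sample_depths, reference_depths):
    """Per-window log2 ratio of median-normalized read depth (sample vs reference)."""
    if len(sample_depths) != len(reference_depths):
        raise ValueError("both inputs must cover the same windows")
    sample_median = median(sample_depths)
    reference_median = median(reference_depths)
    if sample_median <= 0 or reference_median <= 0:
        raise ValueError("median depth must be positive")
    ratios = []
    for s, r in zip(sample_depths, reference_depths):
        if s <= 0 or r <= 0:
            # No usable coverage in this window; leave it uncalled.
            ratios.append(None)
        else:
            # Dividing by the median removes differences in overall sequencing depth.
            ratios.append(math.log2((s / sample_median) / (r / reference_median)))
    return ratios


def call_windows(ratios, gain_threshold=0.4, loss_threshold=-0.6):
    """Label each window; the thresholds are hypothetical and chosen for illustration."""
    calls = []
    for ratio in ratios:
        if ratio is None:
            calls.append("no_call")
        elif ratio >= gain_threshold:
            calls.append("gain")
        elif ratio <= loss_threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls
```

For a diploid genome, a heterozygous deletion is expected to give a log2 ratio near -1 and a single-copy duplication a ratio near 0.58, which is why thresholds between zero and those values are a common starting point.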
The largest barrier to NGS studies has been the high cost of sequencing, which is prohibitive for all but the best-funded consortia. With recent advances in sequencing technologies, however, the sequencing of large portions or even the entire genome has become relatively affordable. For example, resequencing of a whole genome was estimated to cost approximately US$8000 in 2013. Illumina (CA, USA) recently announced the release of the Illumina HiSeq X™ Ten, which is predicted to move this price closer to US$1000. Of course, the storage and interpretation of the data obtained in these sequencing experiments have not yet seen a similar decrease in cost. A number of technological solutions, however, including analysis pipelines using cloud computing, are being more widely implemented and will help to solve the issue of costly analysis. Together, these rapid cost reductions are beginning to enable smaller laboratories to perform sequencing experiments and consortia to conduct larger studies that, in turn, will accelerate variant discovery and increase our understanding of the genetic factors leading to complex disease.
Importantly, improvements in our understanding of phenotype will also contribute greatly to our ability to recognize novel variants in chronic disease. Since pathophysiology differs between different phenotypes, it is reasonable to suspect that variants contributing to disease also differ between phenotypes. This has been demonstrated in a pair of GWASs looking at a refined phenotype of COPD – radiologic emphysema – in which both novel variants and variants previously reported in less-refined phenotypes were found to be associated with emphysema.[45,65] In fact, a recent study using both simulated and real GWAS data demonstrated that phenotypic heterogeneity of 50% increased the sample size that is necessary to achieve detection power by almost three times. Clearly, a well-defined phenotype not only offers the potential for detecting novel variants that would have been otherwise confounded, but also the ability to conduct such studies at a significantly reduced cost.
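The effect of phenotypic heterogeneity on sample size can be sketched with a toy calculation. The model below is an assumption made for illustration and is not the design of the cited study: a fraction of labelled cases is treated as having a different underlying phenotype and carrying the risk allele at the control frequency, which dilutes the observed case–control difference. Sample size is approximated with the standard two-proportion z-test formula; the function names and the allele frequencies in the usage note are hypothetical.

```python
from statistics import NormalDist


def alleles_needed(p_control, p_case, alpha=5e-8, power=0.8):
    """Approximate alleles per group for a two-sided two-proportion z-test.

    Divide by two for the number of diploid individuals per group.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_case * (1 - p_case)
    return (z_alpha + z_beta) ** 2 * variance / (p_case - p_control) ** 2


def diluted_case_frequency(p_control, p_case, heterogeneity):
    """Observed case allele frequency when a fraction of cases behaves like controls."""
    return (1 - heterogeneity) * p_case + heterogeneity * p_control


def sample_size_inflation(p_control, p_case, heterogeneity):
    """Ratio of required sample size with heterogeneity to that without it."""
    observed = diluted_case_frequency(p_control, p_case, heterogeneity)
    return alleles_needed(p_control, observed) / alleles_needed(p_control, p_case)
```

Calling `sample_size_inflation(0.30, 0.35, 0.5)` gives a factor close to four under this simple dilution model. That differs from the nearly threefold increase reported in the study cited above, which used its own simulated and real GWAS data, but both point in the same direction: the required sample size grows much faster than the fraction of misclassified cases.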
Personalized Medicine. 2014;11(7):669-679. © 2014 Future Medicine Ltd.