The role of genomics and genetics in pulmonary arterial hypertension



INTRODUCTION
Although pulmonary hypertension (PH) had been recognised for centuries, it was not until the invention of cardiac catheterisation in the 1950s that an accurate clinical diagnosis became possible 1 . The discovery of heterozygous germline mutations in BMPR2, the gene encoding bone morphogenetic protein receptor type 2, in patients with familial and idiopathic forms of pulmonary arterial hypertension (PAH) was another breakthrough in understanding the disease and initiated a new era in the care of patients with this condition 2 . It has been reported that around 70-80% of familial PAH and 10-20% of idiopathic PAH (IPAH) cases are caused by mutations in BMPR2 3 . Over the last 20 years, 20 further genes that increase the risk of PAH have been reported, which together contribute an additional ∼5% of PAH heritability. It is anticipated that the advent of multi-omic approaches can elucidate the aetiology of the remaining cases and explain the extent to which noncoding variation, variation at more than one locus, allelic and locus heterogeneity and common variation contribute to disease penetrance and expressivity.
In this review, we summarise the established and emerging knowledge about the genetic architecture of PAH, describe technological and statistical approaches to new risk gene discovery, and discuss the clinical utility and therapeutic implications of genetic testing.

Autosomal dominant forms of PAH

The TGF-β superfamily and PAH
In 2000, genetic analysis of families with PAH identified heterozygous germline mutations in BMPR2, the gene encoding bone morphogenetic protein receptor type 2, a member of the transforming growth factor-β superfamily 4 . Subsequently, BMPR2 mutations were also identified in patients with IPAH 2 . To date, over 600 individual mutations in BMPR2 have been identified in PAH patients. Mutation types include nonsense, frameshift, splice-site, missense, and copy number variants [5][6][7][8][9] .
BMPR2 is highly expressed on pulmonary vascular endothelium, where it couples with the type I receptor (ALK1) in response to the circulating bone morphogenetic protein (BMP) ligands, BMP9 and BMP10, with endoglin (ENG) serving as a coreceptor 10 . High expression of ALK1 and ENG in pulmonary endothelium contributes to the lung-specific phenotype of patients harbouring mutations in BMPR2.
In pulmonary artery endothelial cells (PAECs) BMPR2 knockout leads to endothelial dysfunction characterised by increased permeability, heightened proliferation and enhanced apoptosis. The downstream signalling depends on canonical SMAD signalling but also other non-canonical pathways 11 .
The loss of BMPR2 signalling also impacts on the secretion of vasodilatory and inflammatory molecules by endothelial cells such as IL-6, IL-8, E-selectin and eNOS 12 , facilitates a transition from mitochondrial oxidative phosphorylation to glycolysis 13 and promotes endothelial-to-mesenchymal transition 14 . Additionally, loss of BMPR2 in other cell types like smooth muscle cells (PASMC), fibroblasts, immune cells, cardiomyocytes and hematopoietic stem cells may contribute to disease development.
PASMCs with BMPR2 haploinsufficiency show a hyperproliferative phenotype due to the loss of antiproliferative Smad1/5 signalling 15 . Significantly worse right ventricle function indices in patients with BMPR2 mutations can be, at least partially, explained by the impact of a loss of BMPR2 function on cardiomyocyte metabolism.
In the transgenic mouse harbouring a heterozygous Bmpr2 C-terminal truncating mutation (R899X), the overexpression of a dominant-negative mutant Bmpr2 allele in cardiomyocytes led to the accumulation of long-chain fatty acids and failure to develop adaptive right ventricle hypertrophy 16 . There is mounting evidence that PAH is associated with myeloproliferative disorders 17 . Likewise, patients with PAH demonstrate abnormalities in the bone marrow and hematopoietic progenitor cells. It was previously demonstrated that low dose lipopolysaccharide caused PH in genetically susceptible mice 18 . A follow-up study showed that the hematopoietic stem cell compartment is involved in the susceptibility to PH in Bmpr2 +/− mice and that those mice can be rescued by hematopoietic stem cell transplantation 19 .
It is now established that around 70-80% of familial PAH and 10-20% of IPAH cases are caused by mutations in BMPR2 20 . Interestingly, BMPR2 expression and signalling are also decreased in PAH patients without BMPR2 germline mutations [21][22][23] , which suggests that impaired BMPR2 signalling might be a universal feature of PAH. This has been further supported by the identification of deleterious variants in the key members of the canonical BMPR2 signalling pathway.
PAH can co-occur with hereditary hemorrhagic telangiectasia (HHT), a disease characterized by arteriovenous malformations in the lung, brain, liver, skin, and mucous membranes, which implicates ACVRL1 (ALK1) and ENG mutations in the pathogenesis of PAH [24][25][26] . It is important to note that a much larger proportion of HHT patients will develop PH secondary to pulmonary arteriovenous fistulas 27 . In some instances, ALK1- and ENG-associated I/HPAH can occur without clinical features of HHT 25,27,28 , with the caveat that HHT is characterised by age-related penetrance, with clinical manifestations developing over the lifetime.
Sequencing of genes encoding BMP receptor signalling intermediaries led to the identification of rare sequence variants in SMAD1, SMAD4 29 and SMAD9 30 . The role of SMAD9 variants in the pathogenesis of PAH was further confirmed in larger cohorts 8 . Moreover, exome sequencing of individuals without deleterious variants in BMPR2, but with more than one family member diagnosed with PAH, revealed mutations in caveolin-1, encoded by CAV1, which participates in colocalisation of BMP receptors 31 . A de novo variant (c.473delC, p.P158Hfs*23) was also found in a patient with IPAH 31 . A separate study identified a third CAV1 frameshift mutation (c.471delC, p.D157fs) in an adult patient with PAH 32 . All three variants are located in the terminal exon and escape nonsense-mediated decay. This was corroborated by functional analysis that demonstrated retention of the mutant protein in the endoplasmic reticulum and sequestration of the wild-type protein, which, together, lead to the impairment of caveolae assembly 33 . In lung endothelial and mesenchymal cells, caveolin 1 is essential for the regulation of SMAD signalling 34,35 . The c.474delA CAV1 mutation leads to hyperphosphorylation of SMAD1, SMAD5 and SMAD8, consequently resulting in a reduction of the anti-proliferative function of caveolin 1, thereby supporting SMAD gain of function as the underlying molecular mechanism of disease in patients with this CAV1 variant 36 .
Also, a rare mutation in CAV1 has been linked to lipodystrophy and PAH in a young child 37 . The results of animal studies further supported the role of CAV1 in the pathogenesis of PH as CAV1 deficient mice developed changes in pulmonary vasculature consistent with those seen in human PAH [38][39][40] .
Finally, the NIHR BioResource - Rare Diseases (NBR) study identified associations between PAH and rare deleterious variants in the genes encoding the BMPR2 ligands, GDF2 and BMP10 8,41 . Further, in vitro analysis demonstrated that missense variants in GDF2 led to impaired cellular processing and secretion of BMP9. PAH patients carrying these mutations had reduced plasma levels of BMP9 and reduced BMP activity. Interestingly, plasma BMP10 levels were also markedly reduced in these individuals. Although overall BMP9 and BMP10 levels did not differ between PAH patients and controls, a subset of PAH patients had markedly reduced plasma levels of BMP9 and BMP10 in the absence of GDF2 mutations. These findings support therapeutic strategies to enhance BMP9 or BMP10 signalling in PAH 41 .

Channelopathies in PAH
Beyond TGF-β signalling, there is a growing body of evidence supporting the role of channelopathies in PAH. In 2013, six different mutations were identified in the KCNK3 gene (Potassium Channel, Subfamily K, Member 3) in PAH patients. Heterozygous KCNK3 mutations were found in sporadic and familial cases in which they segregated with the disease. Patch-clamp experiments demonstrated a loss of function in all identified mutations 42,43 .
The role of KCNK3 was further supported by a Spanish study, which found two KCNK3 variants in three individuals. Importantly, one individual was homozygous and had particularly aggressive disease diagnosed at birth 44 . Additionally, two more variants were identified in the US PAH cohort 45 . To date, ten different KCNK3 mutations have been described in PAH patients [45][46][47][48] . The Kcnk3-mutated rat model recapitulates the severe PAH phenotype reported in humans 49 . KCNK3 has been identified as a druggable target 50 .
A rare deleterious variant in ABCC8, encoding the ATP binding cassette subfamily C member 8, was found in a patient with childhood-onset IPAH. Further screening of the initial and validation cohorts identified deleterious, heterozygous, missense variants in patients with IPAH, familial PAH, and PAH associated with congenital heart disease. Functional studies confirmed the decreased activity of the ATP-sensitive potassium channel, adding evidence to the theory that a subset of PAH patients might be mechanistically described as having a potassium channelopathy 51 .
Further support for this hypothesis came from the NBR study in PAH, which reported an association between rare deleterious variants in ATP13A3 and AQP1 and PAH 8 . ATP13A3 encodes for cation-transporting ATPase 13A3, a member of the P-type ATPase family of proteins, highly expressed in a variety of vascular cell types. Identified mutations clustered within the catalytic phosphorylation domain, which is likely to have a functional impact.

PAH as a disorder of transcriptional regulation
In a proportion of cases, with accompanying syndromic features and early-onset disease, PAH can be described as a disorder of transcriptional regulation. To date, two transcription factors have been implicated in PAH, namely TBX4 (T-Box Transcription Factor 4) and SOX17 (SRY-Box Transcription Factor 17).
TBX4 is a member of a conserved family of genes that share a common DNA-binding domain, the T-box, and encode transcription factors involved in the regulation of developmental processes. TBX4 is expressed in lung, trachea, atria 61 and hindlimb 62 and was initially associated with small patella syndrome, characterised by hypoplasia or aplasia of the kneecap, ossification defects of the ischia and inferior pubic rami, as well as feet abnormalities 63 . Array comparative hybridisation and sequencing of a population of children with PAH and concurrent mental retardation and/or dysmorphic features implicated TBX4 in the pathogenesis of PAH 64 . It is now recognised that TBX4 mutations are the second most common mutations found in pediatric-onset PAH 32 . Pathogenic TBX4 variants have also been reported in adult-onset PAH, indicative of a bimodal age distribution 8 .
SOX17 encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of angiogenic processes, including arteriovenous differentiation and development of the lung microvasculature [65][66][67] . Rare deleterious variants in SOX17 have been found in a large cohort of whole-genome sequenced I/HPAH patients characterised by young age at diagnosis; some of these variants also segregated with the phenotype 8 . These findings were validated in a Japanese cohort of I/HPAH patients 68 . Another study involving whole-exome sequencing of 256 patients implicated SOX17 in the pathogenesis of PAH associated with congenital heart disease 69 .

Autosomal recessive forms of PAH
Pulmonary veno-occlusive disease (PVOD) and pulmonary capillary hemangiomatosis (PCH) are subphenotypes of Group 1 PAH. They are histologically characterised by extensive venous and capillary involvement with only occasional pulmonary arterial changes. The clinical course of PVOD/PCH is also distinct, with typically early-onset, rapidly progressive vasculopathy, low DLCO and lack of response to vasodilators. Biallelic (homozygous and compound heterozygous) EIF2AK4 variants have been detected in hereditary and sporadic forms of PVOD 70 and PCH 45 . Moreover, biallelic EIF2AK4 variants were found in patients with a clinical diagnosis of PAH and conferred poor prognosis; the radiological assessment was unable to distinguish reliably between these patients and patients with idiopathic PAH 45,71 .
These discoveries prompted the revision of the clinical classification of PAH and inclusion of PVOD/PCH as a subgroup of Group 1 PAH. EIF2AK4, also known as general control nonderepressible 2 (GCN2), encodes a serine/threonine-protein kinase that phosphorylates the alpha subunit of eukaryotic initiation factor 2, which plays a key role in modulating amino acid metabolism in response to nutrient deprivation.

THE NEW KIDS ON THE BLOCK
The identification of additional genes harbouring potentially causal rare variants in PAH with smaller effect size requires a large collaborative effort to ensure adequate study power. Over the last few years, three consortia have developed large scale genomic and multi-omics programs to uncover the missing genetic architecture of PAH and to gain in-depth insight into disease pathobiology (The US PAH Biobank http://www.pahbiobank.org, NBR 8,72 and PVDOMICS 73 ).
Through such efforts, the US PAH Biobank, which consists of 37 US PAH centres, has recently identified two new PAH risk genes, tissue kallikrein 1 (KLK1) and gamma-glutamyl carboxylase (GGCX), and has confirmed many previously reported genes using a variable threshold method 9 . In a mixed cohort of patients with Group 1 PAH, 12 cases harbouring KLK1 variants (10 IPAH, 2 APAH) and 28 cases harbouring GGCX variants (17 IPAH, 9 APAH, 1 FPAH, 1 unknown subclass) were found.
KLK1 74 is a component of the kallikrein-kinin system (KKS) implicated in the homeostasis of the cardiovascular, renal and central nervous system 75,76 . The KKS comprises kallikreins, kininogens, kinins, kinin receptors and kininases (angiotensin-converting enzyme, ACE, is the most important kinin-degrading enzyme in the cardiovascular system) 77 .
Fifteen tissue kallikreins have been discovered, of which only hK1 (encoded by KLK1) is a kininogenase, contributing to the formation of bradykinin (BK) and lys-bradykinin (lys-BK or kallidin), which exert their beneficial actions via nitric oxide and prostaglandins. Accordingly, animal and human studies found that administration of BK B1 and B2 receptor antagonists results in an increase in systemic blood pressure [78][79][80][81] . Similarly, inactivating mutations in Klk1 were associated with systemic hypertension in spontaneously hypertensive rats 82 .
In humans, polymorphism in the regulatory region of KLK1 is responsible for significant differences in hK1 expression and susceptibility to systemic hypertension 83,84 . These findings suggested that the KLK1 pathway might be a potential drug target; indeed, recombinant KLK1 treatment has been shown to improve recovery in acute ischemic stroke by augmenting penumbral blood flow and suppressing inflammation 74 .
GGCX, on the other hand, encodes a protein which carboxylates glutamate residues of vitamin K-dependent proteins, a critical modification required for their activity. Vitamin K-dependent proteins impact many physiologic processes including coagulation, prevention of vascular calcification, and inflammation. Variants in GGCX have been previously associated with combined deficiency of vitamin K dependent clotting factors 1 and pseudoxanthoma elasticum-like disorder with multiple coagulation factor deficiency 85,86 . Functional studies on the role of these genes in PH are still lacking.
A recent study by our group employed a novel Bayesian comparison method called BeviMed to discover new genotype-phenotype associations in a large cohort of deeply phenotyped patients with PAH on whom whole-genome sequencing was performed 87 . Using BeviMed, we identified another PAH candidate gene, KDR (kinase insert domain receptor), which encodes vascular endothelial growth factor receptor 2 (VEGFR2) 87 .
We found that protein-truncating variants (PTV) in KDR were strongly associated with significantly reduced KCO and older age of onset. In addition to statistical evidence and biological plausibility, we identified one case with a family history. Together with a recently published case report of two families in which PTVs in KDR segregated with the phenotype of PAH and significantly reduced KCO 88 , this amounts to three reported cases with familial segregation.
The role of VEGF signalling in the pathogenesis of PAH has been an area of intense research since reports of increased expression of VEGF, VEGFR1 and VEGFR2 in rat lung tissue in response to acute and chronic hypoxia 89 . An increase in lung VEGF has also been reported in rats with PH following monocrotaline exposure 90 . In humans, VEGF-A is highly expressed in plexiform lesions in patients with IPAH 91 , tracheal aspirates from neonates with persistent PH of the newborn 92 and small pulmonary arteries from infants with PH associated with congenital diaphragmatic hernia 93 .
Given these findings, it is surprising that the overexpression of VEGFA ameliorates hypoxia-induced PAH 94 . In contrast, inhibition of VEGF signalling by SU5416 (sugen) combined with chronic hypoxia triggers severe angioproliferative PH 95 .
Sugen, a small-molecule inhibitor of the tyrosine kinase segment of VEGF receptors, inhibits VEGFR1 96 and VEGFR2 97 , causing endothelial cell apoptosis, loss of lung capillaries and emphysema 98 . In combination with chronic hypoxia, sugen causes cell-death-dependent compensatory PAEC proliferation and severe PH 95 . Further evidence supporting the role of VEGF inhibition in the pathobiology of PAH comes from reports of PH in patients treated with bevacizumab 99 and the multi-tyrosine kinase inhibitors 100,101 .

FACTORS INFLUENCING DISEASE PENETRANCE AND EXPRESSIVITY
Penetrance is defined as the percentage of individuals with a particular mutation who exhibit a typical clinical phenotype. As such, penetrance is a measurement of the relationship between a genotype and phenotype. Understanding this relationship provides insight into the pathobiology of the disease and is fundamental for genetic counselling 102 .
Among well-established factors affecting penetrance are mutation type, individual variation in gene expression, epigenetic and environmental factors, as well as age and sex. Moreover, reduced penetrance may reflect digenic or oligogenic inheritance. In PAH, female sex is the single most important determinant of penetrance, which is estimated to be 42% in female carriers, whereas penetrance in male carriers is 14% 103 . This is most likely driven by oestrogen metabolism [104][105][106] .
Variable gene expression was also found to impact on PAH penetrance; the expression levels of wild-type BMPR2 from the unaffected allele transcript were higher in healthy carriers than in affected individuals 107 . Additionally, analysis of genetic variants affecting alternative splicing of BMPR2 showed that patients have a higher isoform B/A ratio than carriers 108 . Moreover, an in vitro study of endothelial cells derived from patient-specific induced pluripotent stem cells showed that downregulation of endogenous receptor antagonists and upregulation of receptor activators might compensate for impaired BMPR2 signalling in unaffected BMPR2 mutation carriers 109 . Somatic chromosomal abnormalities in lung tissue have been described as second-hit mutations affecting the BMPR2 pathway in PAH 110 .
Related to penetrance is disease expressivity, which refers to the range and severity of signs and symptoms that occur in individuals with the same genetic condition. In PAH, it was shown that patients with missense mutations that escape nonsense-mediated decay have more severe disease than those with truncating mutations, suggestive of a dominant-negative impact of mutated protein on downstream signalling 111 . However, missense variants in the cytoplasmic tail appear to confer a less severe phenotype than other BMPR2 variants, with a later age of onset, milder haemodynamics, and more vasoreactivity 112 .
Currently, the impact of environmental factors on PAH penetrance remains largely unknown, but studies show that BMPR2 expression and degradation can be affected by viral proteins and cocaine [113][114][115] .

GENE-SPECIFIC PHENOTYPES
The clinical characterisation of patients harbouring rare deleterious variants in risk genes allows a better understanding of the functional impact of the mutation and in many instances has clinical and prognostic implications. Conversely, deep clinical phenotyping of patients with rare disorders permits clustering of those patients into homogenous cohorts, which may share common genetic architecture. To date, the best characterised group of mutation carriers are patients harbouring deleterious variants in BMPR2.
A large individual participant data meta-analysis has found that patients with BMPR2 mutations have earlier disease onset, worse haemodynamics, are less likely to respond to NO challenge and have lower survival when compared to those without BMPR2 mutations 116 . The histological analysis of lungs explanted from those patients also revealed a higher degree of bronchial artery hypertrophy/dilatation, which correlated with the frequency of haemoptysis at presentation 117 . Distinct phenotypes were also described in patients with rare deleterious variants in TBX4, ENG, ALK1 and EIF2AK4.
Patients with mutations in TBX4, a gene known for its pivotal role in embryogenesis, present with severe PAH associated with bronchial and parenchymal changes, low DLCO, with or without skeletal abnormalities 118 , and a bimodal age of onset 9 . Interestingly, the penetrance of TBX4 mutations for skeletal abnormalities is much higher than for PAH 119 . Similarly, patients with variants in ENG and ALK1 can present with signs and symptoms of either HHT, PAH or both 24,120 .
Several studies have now shown that identification of biallelic mutations in EIF2AK4 is sufficient to diagnose a hereditary form of pulmonary veno-occlusive disease even in the absence of typical radiographic features 121 . This is of importance as these patients present with rapidly progressive disease and may develop life-threatening pulmonary oedema in response to PAH medication. Detection of biallelic EIF2AK4 mutation should trigger referral for lung transplantation 121 .
Recently, deep clinical phenotyping of patients with PAH combined with whole genome sequencing (WGS) revealed an association between protein-truncating KDR variants and PAH with reduced KCO and older age at diagnosis. Additionally, all patients were found to have mild interstitial lung disease. The frequency of systemic hypertension and thyroid dysfunction was also higher than in patients without the mutation in PAH risk genes 87 .

MONOGENIC VS. DIGENIC OR OLIGOGENIC INHERITANCE
HPAH has been historically considered a monogenic condition with an autosomal dominant mode of inheritance, meaning that a single deleterious variant in a PAH risk gene is sufficient to result in PAH phenotype. Nevertheless, 20 years of research into the genetic background of PAH indicates that none of the discovered genes is either sufficient or necessary for the disease to develop. Overall, low penetrance of PAH indicates that other environmental and genetic factors might be required for the disease to develop.
One of the possible explanations is that, at least in a proportion of cases, the inheritance is actually digenic or oligogenic. In the true digenic model, both genes are required to develop the disease. Conversely, in the composite class model, a variant in one gene is sufficient to produce the phenotype, but an additional variant in a second gene impacts the disease phenotype or alters the age of onset 122 . The latter model seems to be plausible in PAH, where co-occurrence of the variants in different PAH risk genes has been reported to impact on disease onset and penetrance 123 . Patients harbouring deleterious variants in more than one PAH risk gene were reported in small 124 and large cohorts of HPAH patients 8 .

COMMON GENETIC VARIATION
Although PAH is considered a Mendelian disorder, its low penetrance and heterogenous phenotype suggest a contribution of common sequence variation to disease susceptibility. In PAH, several common genetic variants were found to contribute to phenotypic heterogeneity among patients with rare causal mutations in BMPR2. Polymorphisms at the BMPR2 125 , TGF-β1 126 and sex hormone loci 104 were shown to contribute to variable gene expression and at least partially explain phenotypic variation between BMPR2 mutation carriers.
In non-hereditary forms of PAH, a common polymorphism at the CBLN2 (cerebellin) locus 127 as well as in the endostatin 128 and serotonin transporter genes 129 were discovered. To date, the largest PAH GWAS study of multiple international I/HPAH cohorts identified two novel loci associated with PAH: an enhancer near SOX17, and a locus within HLA-DPA1/DPB1 130 . These findings corroborate and extend the previous discovery of the association of rare variants in SOX17 with PAH 8 .
One of the most successful applications of GWAS has been in the area of pharmacology. Pharmacogenetics aims to pinpoint DNA sequence variations that are associated with drug metabolism and efficacy as well as adverse effects. Recent studies have advanced our understanding of the influence of genetic variation on response to PAH therapy.
A study by Benza 131 revealed that polymorphism in the endothelin-1 pathway was significantly associated with outcomes in patients treated with endothelin receptor antagonists (ERA). Likewise, common variants in SIRT3 (a mitochondrial deacetylase) and UCP2 (uncoupling protein 2), which regulates calcium entry to the cell, predicted response to dichloroacetate, a pyruvate dehydrogenase kinase inhibitor, in a phase 2 clinical trial in PAH 132 . Concurrently, genetic studies into mitochondrial DNA revealed that mitochondrial haplogroups influence the risk of PAH and that susceptibility to PAH emerged as a result of selective enrichment of specific haplogroups upon the migration of populations out of Africa 133 .

TECHNOLOGICAL AND STATISTICAL APPROACHES TO RARE VARIANT ANALYSIS
Rare variant analysis poses a number of challenges related to the sequencing, phenotyping, association testing and interpretation of rare variants. A common data processing and analysis pipeline for sequencing-based association studies is depicted in Figure 1.

Sequencing
The development of 'Sanger sequencing' in the 1970s initiated a new era of human genomics; indeed, the advanced form of this technology remains the gold standard for sequencing 134 . Although highly accurate, Sanger sequencing proved to be costly and low throughput, and only the advent of 'next-generation' sequencing (NGS) technology in the mid-2000s offered low-cost sequencing in both research and diagnostic settings.
Several sequencing modalities, such as targeted sequencing, whole-exome sequencing (WES) and low depth WGS, as well as sampling strategies (i.e. extreme phenotype sampling), have been developed to maximise resources and increase power.
WES, which enables sequencing of all protein-coding regions, gained traction, especially in Mendelian disorders. Although the protein-coding space occupies only 2% of the genome, it is estimated to harbour most of the disease causing variants 135,136 . WGS enables analyses of coding and non-coding space, allowing for more comprehensive genomic testing. Rapid technological progress contributes to the variability in sequencing data acquired over time (differences in method and pipeline used, variable read lengths).
The additional biological variation that needs to be accounted for may come from the source of DNA (although saliva samples are a valid alternative to the blood samples, they contain a lower percentage of amplifiable DNA) or age at specimen collection. The latter can contribute to inflated rates of qualifying variants in older patients with age-associated clonal haematopoiesis [137][138][139] . Downstream analysis might be affected by genetic relatedness in the study cohort, unequal representation of ancestry groups 140 , male to female ratio in cases and controls 141,142 or choice of the transcript 143 .

Phenotyping for genetics studies
No genetic study will lead to meaningful results without a comprehensive approach to characterising the phenotype of interest. Clinical diagnosis involves grouping patients based on observable traits, signs and symptoms, which are the product of genetic, epigenetic and environmental factors. As a result, clinical phenotypes can be dynamic and reactive, which is useful and desirable in the clinical setting but unsuitable for genetic studies.
There are many differences between clinical and research diagnosis (particularly diagnosis for genetic analyses) that need to be recognised and addressed. The former is obtained in three interrelated stages: history taking and physical examination, differential diagnosis and confirmation. These can be spread over time and involve multiple patientpractitioner encounters.
Research diagnosis is usually limited to a single encounter during which complete, reliable and valid phenotype description must be obtained. In research, setting completeness is assured by uniformed and standardised checklist (electronic case report forms, questionnaires) and reliability can be enhanced by the development of the standard operating procedure, diagnostic criteria for phenotypes, use of controlled vocabulary and regular staff training. Additionally, the phenotype description must be amenable to computational analysis. Finally, the validity of phenotypes is confirmed in test cohorts and through functional studies.
The precision with which the phenotype information is measured cannot be overestimated. In genetic studies mislabeling of participants or admixture of phenocopies can significantly affect power to detect an association 144 . Equally, categorising biologically continuous phenotypes (i.e. mPAP, DLCO) is prone to errors due to flaws in quantification methods and arbitrary thresholds.
Phenotype optimisation for genetic studies aims at finding homogenous groups of patients that likely share the same genetic architecture. This can be approached through various strategies: stratification based on family history, age of onset, sex, covariatesbased methods which jointly estimate the effect of multiple variables and data reduction techniques. Alternatively, intermediate phenotypes (endophenotypes) can be used. Intermediate phenotypes are features closer to underlying biology that are at least as heritable as the phenotype itself, stable over time, and are associated with the disease of interest 145 .
Although clinical phenotyping remains the most widely used method of patients' stratification both in clinical practice and research, it requires substantial domain knowledge and is very time-consuming. Deep computational phenotyping based on clinical and/or ''omics'' datasets using supervised or unsupervised machine learning might be an alternative due to unparalleled diagnostic precision and speed. At the heart of this approach are phenotype ontologies, like Human Phenotype Ontology, which allow standardised, highly granular and precise phenotyping across different disease domains 146 .
The use of ontologies to define phenotypes has already proven useful in identifying novel candidate genes for rare disorders 147 . Ontology-based analysis of phenotypes has been further facilitated by the implementation of methods for manipulation, visualisation and computation of semantic similarity between ontological terms and sets of terms 148 . Complementary to this method is a reverse phenotyping approach, in which genetic marker data are used to infer new phenotypes. This approach aims to cluster patients based on more deviant allele frequencies and to validate the findings in a separate sample or using resampling techniques 145 .

Study design and statistical methods
Several recent publications have addressed the issue of study design and statistical methods in rare variant association analysis 149,150 . In contrast to GWAS, single-variant testing is not suitable for rare variant analysis: the number of individuals carrying a particular variant might be very low and the effect size small, so unrealistically large sample sizes would be required. To circumvent this problem, several gene- and region-based aggregation strategies have been proposed.
A gene-based aggregation approach uses gene boundaries to test for differences in variant counts between cases and controls. This approach is useful when different variants confer an equivalent disease risk, for example when any loss-of-function (LOF) variant (nonsense, frameshift, essential splice site) would result in the same phenotype. In such cases, the difference in the presence or absence of LOF variants between cases and controls determines the association.
Several variant filters are routinely applied prior to association testing: firstly, sequencing-based quality metrics; secondly, MAF filters 151 (usually a MAF of 1:10 000 for autosomal dominant disorders and 1:1000 for autosomal recessive disorders); and finally, in silico deleteriousness scores for missense variants such as PolyPhen-2 152 , SIFT 152,153 and REVEL 154 , conservation scores such as GERP 155 , PhyloP 156 or PhastCons 157 , or, for genome-wide analysis, the ensemble score CADD 158 , which combines several metrics in one score.
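As a minimal sketch of the gene-based collapsing idea described above, the following Python example counts carriers of qualifying LOF variants in cases and controls for a single gene and compares the counts with a one-sided Fisher's exact test. The data layout, the function names (count_carriers, fisher_exact_greater), the consequence labels and the toy data are illustrative assumptions, not the method of any particular study; a real analysis would also apply quality and deleteriousness filters and account for covariates.

```python
from math import comb

# Hypothetical consequence labels treated as loss-of-function (assumption).
LOF_CONSEQUENCES = {"nonsense", "frameshift", "essential_splice"}


def count_carriers(individuals, variants, maf_max=1e-4):
    """Count individuals carrying at least one qualifying variant.

    individuals: list of dicts mapping variant id -> alternate allele count.
    variants: dict mapping variant id -> (consequence, reference-population MAF).
    maf_max: MAF filter; 1e-4 corresponds to 1:10 000 (dominant model).
    """
    qualifying = {
        vid
        for vid, (consequence, maf) in variants.items()
        if consequence in LOF_CONSEQUENCES and maf <= maf_max
    }
    return sum(
        1
        for genotype in individuals
        if any(genotype.get(vid, 0) > 0 for vid in qualifying)
    )


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    a, b = case carriers, case non-carriers;
    c, d = control carriers, control non-carriers.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    total = comb(n_cases + n_controls, n_carriers)
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / total


# Toy data for one gene (entirely made up for illustration).
variants = {
    "v1": ("nonsense", 0.0),
    "v2": ("frameshift", 0.00005),
    "v3": ("missense", 0.0),   # excluded: not LOF
    "v4": ("nonsense", 0.01),  # excluded: too common
}
cases = [{"v1": 1}] * 3 + [{"v2": 1}] * 2 + [{"v3": 1}] + [{}] * 4
controls = [{"v1": 1}, {"v4": 1}, {"v3": 1}] + [{}] * 7

a = count_carriers(cases, variants)
c = count_carriers(controls, variants)
p_value = fisher_exact_greater(a, len(cases) - a, c, len(controls) - c)
print(f"case carriers={a}, control carriers={c}, one-sided p={p_value:.4f}")
```

Collapsing to a carrier/non-carrier indicator is only one possible design choice; it assumes that every qualifying variant confers an equivalent risk, as stated above for LOF variants.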
Given the approximate number of genes being tested (around 20,000 protein-coding genes), the conventional adjustment for multiple testing within protein-coding space is calculated using the Bonferroni formula, α = 0.05/20,000 = 2.5 × 10⁻⁶. Importantly, if multiple models are applied, as is usually the case, the significance threshold needs to be further divided by the number of models tested.
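The arithmetic above can be written as a short helper. This is a sketch only: the function name bonferroni_threshold and its parameters are hypothetical, and the choice of four models in the usage line is an arbitrary illustration.

```python
def bonferroni_threshold(alpha=0.05, n_genes=20_000, n_models=1):
    """Exome-wide significance threshold after Bonferroni correction.

    The family-wise error rate alpha is divided by the number of genes
    tested and, if several genetic models are run, by the number of models.
    """
    if n_genes <= 0 or n_models <= 0:
        raise ValueError("n_genes and n_models must be positive")
    return alpha / (n_genes * n_models)


single_model = bonferroni_threshold()            # 0.05 / 20,000
four_models = bonferroni_threshold(n_models=4)   # 0.05 / 80,000
print(f"{single_model:.2e} {four_models:.2e}")
```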
Region-based collapsing approaches hinge on the notion that different regions within genes may vary in their tolerance to missense variation. An alternative approach, particularly useful in smaller studies, is collapsing variants that belong to the same gene set (i.e., genes that belong to the same pathway).
Complex genetic models, such as recessive or digenic/oligogenic modes of inheritance, pose additional challenges. In the recessive mode of inheritance, the MAF threshold must be relaxed because heterozygotes are unaffected (and the MAF in reference populations is therefore higher); in addition, variants in cis configuration might be wrongly counted as biallelic. Testing for digenic inheritance is particularly problematic because of the large number of possible combinations that need to be tested and adjusted for 150 .
A number of statistical methods have been developed to test for rare variant associations:
• Burden tests 159-161 aggregate the information found within a predefined genetic region into a summary dose variable. In weighted burden tests 162 , variants are weighted according to their frequency or functional significance.
• Adaptive burden tests 163 aim to account for bidirectional effects by selecting appropriate weights.
• Variance component (kernel) tests such as SKAT 164 allow risk and protective variants to be tested simultaneously, but are underpowered when most variants are causal and effects are unidirectional.
• Omnibus tests such as SKAT-O 165 , which combine a burden test with a variance-component test, might be particularly useful when there is little knowledge of the underlying disease architecture.
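To make the notion of a "summary dose variable" concrete, the sketch below computes an unweighted and a frequency-weighted burden score per individual. The function names and the toy genotype matrix are hypothetical, and the inverse-standard-deviation weights 1/sqrt(q(1 − q)) are only one possible frequency-based weighting chosen here for illustration; published weighted burden tests differ in their exact weighting schemes. In practice the resulting score would then be tested against the phenotype (for example by regression), which is not shown.

```python
from math import sqrt


def frequency_weights(mafs):
    """Up-weight rarer variants: w = 1 / sqrt(q * (1 - q)).

    Each q must lie strictly between 0 and 1 (illustrative weighting only).
    """
    return [1.0 / sqrt(q * (1.0 - q)) for q in mafs]


def burden_scores(genotype_matrix, weights=None):
    """Collapse a region into one score per individual.

    genotype_matrix: rows are individuals, columns are variants,
    entries are alternate allele counts (0, 1 or 2).
    weights: one weight per variant; defaults to 1.0 (simple burden test).
    """
    n_variants = len(genotype_matrix[0])
    if weights is None:
        weights = [1.0] * n_variants
    if len(weights) != n_variants:
        raise ValueError("one weight per variant is required")
    return [sum(w * g for w, g in zip(weights, row)) for row in genotype_matrix]


# Toy data: three individuals, three variants in one gene (made up).
genotypes = [
    [1, 0, 2],
    [0, 0, 0],
    [0, 1, 0],
]
mafs = [0.001, 0.0001, 0.01]

unweighted = burden_scores(genotypes)
weighted = burden_scores(genotypes, frequency_weights(mafs))
print(unweighted, weighted)
```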
In addition to frequentist approaches, the Bayesian statistical framework offers a robust alternative. Bayesian model comparison methods, such as BeviMed 166 , allow testing of the association between a rare Mendelian disease and a genomic locus by comparing support for a model in which disease risk depends on genotypes at rare variant sites in the locus with support for a genotype-independent "null" model.
The prior probability in such models can vary across variants (reflecting external biological information, e.g. MAF, conservation scores, gene ontologies or expression in the tissue of interest) or be constant for all genes/variants, reflecting the prior belief about the overall proportion of variants that are associated with a given phenotype. Population-based rare variant association studies can be complemented by family-based designs; these are particularly useful for dichotomous traits and are robust to population stratification 167 .
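The role of the prior in the Bayesian model comparison described above can be illustrated with the generic form of Bayes' theorem for two competing models. The notation is an assumption introduced here for illustration (D for the data, M_1 for the association model, M_0 for the null model, π for the prior probability of association) and is not taken from any specific software.

```latex
\mathrm{BF} = \frac{P(D \mid M_1)}{P(D \mid M_0)},
\qquad
P(M_1 \mid D) = \frac{\mathrm{BF}\,\pi}{\mathrm{BF}\,\pi + (1 - \pi)}
```

For example, with a prior of π = 0.01 and a Bayes factor of 1000 in favour of association, the posterior probability is 10/(10 + 0.99), or approximately 0.91, which shows how strongly a sceptical prior tempers even substantial evidence.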
Last but not least, an essential step in rare variant discovery is to ascertain the pathogenicity of a given variant and its causative role in the disease. Not all damaging variants are pathogenic, and in silico approaches alone are not enough to predict whether a variant is disease-causing 168 . To aid both research and clinical decision making, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) issued recommendations that combine and weight the computational, functional, population and clinical evidence to determine pathogenicity 169 . Other initiatives, such as ClinGen and ClinVar, aim to define the clinical relevance of genes and variants reported in the literature for use in precision medicine and research 170 .

Clinical utility of genetic diagnosis
The utility of genetic diagnosis cannot be overstated, since it explains aetiology, informs prognosis and treatment decisions, and allows risk stratification of family members. Genetic testing has the potential to mitigate the disease course and has been recommended by current practice guidelines 171-174 . The influence of the mutation on outcomes has been well described in BMPR2 and EIF2AK4 mutation carriers. While BMPR2 and ACVRL1 mutation carriers present at a younger age, with more severe haemodynamics and worse survival than patients without pathogenic PAH mutations 3,175,176 , EIF2AK4 mutation carriers present at a younger age than non-carriers but the mutation status does not seem to affect prognosis 177 . Identification of EIF2AK4 mutations allows the diagnosis to be confirmed not only in PVOD/PCH cases but also in patients who present clinically as I/HPAH, eliminating the need for a lung biopsy 178 . Because of the dismal prognosis associated with a PVOD/PCH diagnosis, early referral for transplantation is warranted 121 .

Pharmacogenetics
Knowledge of an individual's genetic makeup allows targeted treatment, enhancing efficacy and decreasing the risk of potential side effects 179 . Given the central role of BMP signalling in the pathogenesis of PAH, it is not surprising that therapies to enhance or rescue the BMPR2 pathway have gained the most traction (Figure 2). In preclinical studies, BMP9 administration in heterozygous BMPR2 knockout mice had a positive effect on haemodynamics, suggesting that BMPR2 haploinsufficiency might be overcome by up-titration of the ligand 10,132 . Likewise, ataluren reads through nonsense mutations in BMPR2 and SMAD9 and restores the full-length proteins 180 . Chloroquine prevents progression of experimental PH by inhibiting lysosomal degradation of BMPR2, leading to increased receptor density at the surface of endothelial cells 181 . Similarly, the TNFα inhibitor etanercept reduces inflammation, receptor shedding and proteasomal degradation of BMPR2 182 .
As reduced BMPR2 signalling is a known phenomenon even in patients without a BMPR2 mutation, targeting BMPR2 modifier genes might be an effective rescue mechanism. Enzastaurin was shown to rescue the BMPR2 modifier gene fragile histidine triad (FHIT) and reverse PH in animal models 183 . Ligand traps that inhibit negative regulation of BMPR2 are another therapeutic option.
Recently, Acceleron announced positive results from the Phase 2 PULSAR trial of sotatercept in patients with PAH 184 . Sotatercept is a novel recombinant fusion protein that inhibits TGF-β superfamily members, including GDF11 and activins A and B, and restores the balance in the BMPR2 signalling pathway. Both the EMA and the FDA granted sotatercept breakthrough therapy status, allowing an expedited development process.
In the future, patients with deleterious variants in CAV1 may benefit from elafin (peptidase inhibitor 3), encoded by the PI3 gene 185 . In Sugen/hypoxia rats, elafin reduced elastase activity and reversed PH, and improved endothelial function by increasing apelin 186 .
Tacrolimus (FK506), a potent immunosuppressant, was shown to increase BMP signalling by blocking FK-binding protein 12 in an animal PH model. The preclinical study confirmed the safety and tolerability of this agent in PAH 187 . Gremlin 1, secreted by the vascular endothelium, may inhibit BMPR2 signalling. Neutralising antibodies against gremlin 1 proved effective in ameliorating chronic hypoxia/Sugen-induced PAH in mice 188 .
Besides the BMP pathway, modulating ion channel function might be an effective therapeutic approach. There is evidence that reduced potassium channel conductance in some KCNK3 mutants can be recovered by the phospholipase A2 inhibitor ONO-RS-082 189 . Similarly, ABCC8 variants can be rescued by the SUR1 activator diazoxide 51 .

Knowledge, attitudes and barriers towards genetic testing and counselling
Despite recent developments in understanding the genetic background of PAH and the potential therapeutic implications, almost 80% of physicians caring for PAH patients never or rarely refer their patients to genetic counsellors or order genetic testing 190 . At least in the US, the most frequently reported reasons were lack of insurance coverage and limited access to genetic counsellors. Interestingly, the most important driver of genetic testing was patient inquiry 190 . The situation might be different in Europe, where healthcare is publicly funded 191 , but it varies from country to country. In any case, a forward-thinking and innovative regulatory environment is necessary for integrating research and technological advances into clinical practice. In this respect, NHS England's Genomic Medicine Service is likely to revolutionise routine patient care in the UK. Similar small- and large-scale initiatives are emerging around the world. For example, the use of rapid WGS pioneered by Dr Stephen Kingsmore and his team at Rady Children's Institute for Genomic Medicine enables precise diagnoses for critically ill newborns within 26 hours 192 .

SUMMARY AND FUTURE DIRECTIONS
In clinically diagnosed "idiopathic" PAH cohorts, up to 25% of patients have mutations in known PAH risk genes, leaving the remaining 75% without an explanation of the disease trigger and pathobiology. Although environmental factors may account for some of these idiopathic cases, it seems likely that additional, unknown, rare genetic variation is responsible for many more. Moreover, both common and rare genetic factors may influence the penetrance of the known genes and disease expressivity. Multi-omic and spatial technologies add a further layer to the understanding of disease pathobiology and are likely to enter clinical settings in the near future.
Only large-scale international collaborative efforts can collect the sample sizes powered to elucidate this missing heritability of PAH. Beyond that, pulmonary hypertension physicians' education in genetics and genomics is crucial in the delivery of precision diagnostics and therapeutics. Access to genetic counselling and testing must be addressed at the national level and account for healthcare financing models and patients' preferences.

FUNDING SOURCES
ES is a PhD student funded by the British Heart Foundation (BHF). SG is supported by the National Institute for Health Research (NIHR). NWM is a BHF Professor and an Emeritus NIHR Senior Investigator.