
Whole Genome Analysis of Venous Thromboembolism: the Trans-Omics for Precision Medicine Program

Originally published Genomic and Precision Medicine. 2023;16



Risk for venous thromboembolism has a strong genetic component. Whole genome sequencing from the TOPMed program (Trans-Omics for Precision Medicine) allowed us to look for new associations, particularly rare variants missed by standard genome-wide association studies.


The 3793 cases and 7834 controls (11.6% of cases were individuals of African, Hispanic/Latino, or Asian ancestry) were analyzed using a single variant approach and an aggregate gene-based approach, with a primary filter (which included only loss-of-function and missense variants predicted to be deleterious) and a secondary filter (which included all missense variants).


Single variant analyses identified associations at 5 known loci. Aggregate gene-based analyses identified only PROC (odds ratio, 6.2 for carriers of rare variants; P=7.4×10−14) when using our primary filter. Employing our secondary variant filter led to a smaller effect size at PROC (odds ratio, 3.8; P=1.6×10−14), while excluding variants found only in rare isoforms led to a larger one (odds ratio, 7.5). Different filtering strategies improved the signal for 2 other known genes: PROS1 became significant (minimum P=1.8×10−6 with the secondary filter), while SERPINC1 did not (minimum P=4.4×10−5 with minor allele frequency <0.0005). Results were largely the same when restricting the analyses to include only unprovoked cases; however, one novel gene, MS4A1, became significant (P=4.4×10−7 using all missense variants with minor allele frequency <0.0005).


Here, we have demonstrated the importance of using multiple variant filtering strategies, as we detected additional genes when filtering variants based on their predicted deleteriousness, frequency, and presence on the most expressed isoforms. Our primary analyses did not identify new candidate loci; thus larger follow-up studies are needed to replicate the novel MS4A1 locus and to identify additional rare variation associated with venous thromboembolism.

Venous thromboembolism (VTE), comprising deep vein thrombosis and pulmonary embolism, is a major public health issue, with an annual incidence in adults of ≈1 to 2 per 1000.1,2 Twin and family studies suggest a strong genetic influence on VTE risk, with heritability estimates ranging from 40% to 60%.3–6 Most of the genetic loci identified to date were found using a candidate-gene approach or a genome-wide association approach with imputed genotypes.7–14 Because these array-based discovery efforts rely on linkage disequilibrium to localize disease-associated regions, they can miss associations and are likely to fail to identify the underlying causal variants.15–17

With the exception of familial variation in the 3 major natural anticoagulant structural genes (protein C [PROC], protein S [PROS1], and antithrombin [SERPINC1]), most validated genetic variation associated with VTE has been common, with minor allele frequency (MAF) >0.01.7–14 The search for less common genetic variation associated with VTE has been constrained either by the size of the discovery population or by restricting discovery to the exonic regions of the genome.18–22

High-pass whole genome sequencing data, such as those available through the TOPMed program (Trans-Omics for Precision Medicine),23 provide an excellent opportunity to directly observe and interrogate essentially complete genomic variation and thereby improve our understanding of the genetic architecture of complex traits such as VTE.


This analysis included participants from 6 studies; anonymized data and materials for all studies have been made publicly available in the database of Genotypes and Phenotypes: the ARIC (Atherosclerosis Risk in Communities Study; the database of Genotypes and Phenotypes accession number phs001211.v4.p3),24 the CHS (Cardiovascular Health Study; phs001368.v3.p2),25 the FHS (Framingham Heart Study; phs000974.v4.p3),26–28 the HVH (Heart and Vascular Health Study; phs000993.v5.p2),7 the Mayo (Mayo Clinic Venous Thromboembolism Study; phs001402.v3.p1),29 and the WHI (Women’s Health Initiative; phs001237.v3.p1).30,31 All studies were approved by appropriate institutional review boards, and informed consent was obtained from all participants.

See the Supplemental Material for comprehensive information about the methods used in this work.


Study Population

A total of 11 627 participants (3793 cases and 7834 controls) from 6 studies contributed to this analysis, with an overall case:control ratio of 1:2.07 (Table 1). Participants were predominantly female (65%) and predominantly of European ancestry (EA; 83%); individuals of African ancestry (AA) represented 16%, and the remaining participants were a small number of individuals of Hispanic/Latino or Asian ancestry (Table 1). Results for single variant and gene-based tests have been uploaded to the database of Genotypes and Phenotypes under accession number phs001974.v3.p1.

Table 1. Study Population Characteristics of Participating Studies from the TOPMed Program

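Characteristic | ARIC | CHS | FHS | HVH | Mayo | WHI
Cases, N | 515 | 113 | 123 | 607 | 1333 | 1102
Controls, N | 4721 | 287 | 584 | 0 | 0 | 2242
Female, % | 53.4 | 55.8 | 49.1 | 35.1 | 51.8 | 100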
Ancestry, %
Median age, y*

ARIC indicates Atherosclerosis Risk in Communities Study; CHS, Cardiovascular Health Study; FHS, Framingham Heart Study; HVH, Heart and Vascular Health Study; NA, not available; TOPMed, Trans-Omics for Precision Medicine; VTE, venous thromboembolism; and WHI, Women’s Health Initiative.

* Age for cases is age at the first VTE event and age for controls is the age at censoring (ie, at death or at last follow-up contact).

Single Variant Analysis

We interrogated 25 740 878 variants that met inclusion criteria (minor allele count ≥20 and sufficient sequencing coverage; see the Supplemental Material for more detail) in the single variant analysis. A variant right at this threshold (minor allele count=20; MAF≈0.00086) would need a relative risk >5.3 to be detected with adequate power. We identified 6 VTE-associated loci, each with one or more variants whose P value reached genome-wide significance (Table 2; Figure 1; Table S1a; see Table S1b for variants with subthreshold P values). A QQ plot (Figure S1) suggests that the P value distribution was well controlled (λ=1.04). Five of the identified loci have previously been established as associated with VTE (F5, FGG, F11, ABO, SLC44A2)9–11; however, with the exception of factor V Leiden (rs6025), the variants with the smallest P values in these known regions all differed from those identified in previous studies (Table 2 versus Table S1a; summary statistics from this study for all variants identified in Lindström et al11 are listed in Table S2). Nevertheless, when we conditioned on all known loci, the index variants at these 5 known loci all became nonsignificant, suggesting that no new independent signals were present in these regions. The most extreme P value observed in a novel region was on chromosome 1 (index variant rs56040707 [MAF=0.03]; see Figure S2 for a LocusZoom plot): P=3.4×10−8 in the primary analysis and P=2.5×10−8 after conditioning on all of the known variants (Table 2). The estimated odds ratio (OR) for rs56040707 was 1.6 (95% CI, 1.3–1.8). Ancestry-specific analyses in individuals of EA (MAF=0.04) and AA (MAF=0.005) showed that the signal was driven by the EA samples (Table 2).
We attempted to replicate this finding in a meta-analysis from a multiancestry case-control sample from the UK Biobank (19 406 cases and 97 391 controls), which controlled for age at event, year of event, sex, self-reported ethnicity, and 10 principal components while modeling relatedness. A similar MAF was seen (0.037) for the A allele but the association did not replicate (P=0.58; OR=1.0). The variant was imputed well (Rsq info=0.94), and the sample size was adequate to provide 95% power even if the true OR was as small as 1.1.

Table 2. Genome-Wide Significant Results From Single Variant Tests in Whole Genome Sequence Analysis of VTE in TOPMed Populations (MAC ≥20)

Locus | Chr | Position* | Index SNV | CA | CAF trans-ethnic | CAF EA | CAF AA | Trans-ethnic P value | Conditional P value† | EA P value | AA P value | OR trans-ethnic (CI) | OR EA (CI) | OR AA (CI)
Intergenic | 1 | 159552202 | rs56040707 | G | 0.03 | 0.04 | 0.005 | 3.4×10−8 | 2.5×10−8 | 8.0×10−8 | 0.94 | 1.57 (1.3–1.8) | 1.57 (1.3–1.8) | 0.96 (0.3–2.7)
F5 | 1 | 169549811 | rs6025 | T | 0.04 | 0.04 | 0.007 | 5.0×10−36 | NA‡ | 1.4×10−36 | 0.42 | 2.58 (2.2–3) | 2.65 (2.3–3) | 1.49 (0.6–4)
FGG | 4 | 154596690 | rs35147053 | G | 0.27 | 0.27 | 0.32 | 2.3×10−10 | 0.014 | 4.1×10−9 | 0.066 | 1.23 (1.2–1.4) | 1.24 (1.2–1.4) | 1.18 (1–1.4)
F11 | 4 | 186292729 | rs4444878 | A | 0.40 | 0.42 | 0.29 | 3.0×10−15 | 0.025 | 2.0×10−14 | 0.045 | 1.27 (1.2–1.4) | 1.28 (1.2–1.4) | 1.19 (1–1.4)
ABO | 9 | 133261662 | rs687621 | A | 0.62 | 0.63 | 0.57 | 1.2×10−39 | 0.0018 | 2.7×10−41 | 0.021 | 0.67 (0.63–0.71) | 0.64 (0.6–0.68) | 0.83 (0.71–0.97)
SLC44A2 | 19 | 10631494 | rs2288904 | G | 0.81 | 0.79 | 0.94 | 1.3×10−9 | 0.29 | 4.7×10−10 | 0.64 | 1.26 (1.2–1.4) | 1.28 (1.2–1.4) | 1.08 (0.8–1.5)

AA indicates individuals of African ancestry; CA, coded allele; CAF, coded allele frequency; Chr, Chromosome; EA, individuals of European ancestry; MAC, minor allele count; NA, not applicable; OR, odds ratio; SNV, single nucleotide variant; TOPMed, Trans-Omics for Precision Medicine; and VTE, venous thromboembolism.

* Position is with respect to build hg38.

† Conditional P value reflective of single variant analysis in transethnic population conditioned on index variants from previous genome-wide association studies conducted by the INVENT consortium.10

‡ rs6025 was included in the list of conditional variants.


Figure 1. Manhattan plot of single variant tests in whole genome sequence analysis of venous thromboembolism (VTE) in Trans-Omics for Precision Medicine (TOPMed) participants. The horizontal axis represents the build 38 genomic position of each single nucleotide variant by chromosome, and the vertical axis records the −log10(P value) from the association analysis. The dashed horizontal line indicates the genome-wide significance threshold P=1×10−8. Only variants with a minor allele count ≥20 were included. Variants with a call rate <90% (samples with read depth <10 were set to missing) across all TOPMed freeze 8 samples were excluded. Labels indicate associated gene loci.

Power calculations assuming 3793 cases and 7834 controls, a disease prevalence of 10%, and a significance threshold of 1.0×10−8 indicated that we had 80% power to detect variants with MAF=0.001 and OR ≥4.7, and variants with MAF=0.01 and OR ≥2.1.
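Power figures of this kind can be approximated with a simple allelic (per-allele) comparison of case and control allele frequencies. The sketch below is a normal-approximation calculation that assumes a multiplicative allelic OR, 2 alleles per person, and that the stated MAF is the population frequency at 10% prevalence. The model and all function names are assumptions for illustration; this is not necessarily the tool or model the authors used, so its minimum detectable ORs may differ somewhat from the values reported above.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def case_allele_freq(p0, odds_ratio):
    """Case allele frequency implied by control frequency p0 and an allelic OR."""
    odds = odds_ratio * p0 / (1.0 - p0)
    return odds / (1.0 + odds)


def control_allele_freq(maf, odds_ratio, prevalence):
    """Solve prevalence*p1 + (1 - prevalence)*p0 = maf for p0 by bisection."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2.0
        pop = prevalence * case_allele_freq(mid, odds_ratio) + (1.0 - prevalence) * mid
        if pop < maf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def allelic_power(maf, odds_ratio, n_cases, n_controls, prevalence=0.10, alpha=1e-8):
    """Two-sided power of a case-control allele-frequency test (normal approximation)."""
    p0 = control_allele_freq(maf, odds_ratio, prevalence)
    p1 = case_allele_freq(p0, odds_ratio)
    a1, a0 = 2 * n_cases, 2 * n_controls  # allele counts for diploid genomes
    pbar = (a1 * p1 + a0 * p0) / (a1 + a0)
    se_null = sqrt(pbar * (1 - pbar) * (1 / a1 + 1 / a0))
    se_alt = sqrt(p1 * (1 - p1) / a1 + p0 * (1 - p0) / a0)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = abs(p1 - p0)
    return (_STD_NORMAL.cdf((delta - z_crit * se_null) / se_alt)
            + _STD_NORMAL.cdf((-delta - z_crit * se_null) / se_alt))


def min_detectable_or(maf, n_cases, n_controls, target_power=0.80, **kwargs):
    """Smallest OR > 1 that reaches target_power, found by bisection on the OR."""
    lo, hi = 1.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if allelic_power(maf, mid, n_cases, n_controls, **kwargs) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for maf in (0.001, 0.01):
        print(maf, round(min_detectable_or(maf, 3793, 7834), 2))
    # MAF implied by a minor allele count of 20 among 11 627 diploid genomes
    print(round(20 / (2 * 11627), 5))
```

The same function makes the trade-off explicit: holding sample size fixed, the minimum detectable OR rises steeply as MAF falls, which is why only large effects are detectable near the minor allele count cutoff.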

Gene-Based Rare Variant Analysis

Using our primary variant filtering strategy (see the Supplemental Material), we identified a single gene associated with VTE, PROC (Table 3; Figure 2), that reached a Bonferroni-corrected significance level for the number of genes tested (P<2.3×10−6). PROC is the structural gene for the natural anticoagulant protein C, and rare coding variants in PROC associated with VTE have previously been reported.37 In the TOPMed data, results were driven by individuals of EA (EA burden P=8.0×10−12; AA burden P=0.12; primary filter). Because no individual harbored more than one rare variant in PROC, the calculated OR represents the risk for carriers versus noncarriers. The primary filter, which included only missense and loss-of-function (LOF) variants predicted to be deleterious by at least one of 5 bioinformatics algorithms, yielded a larger OR (5.1 [95% CI, 3.2–8.1]) than the secondary filter (3.8 [95% CI, 2.7–5.5]; Table 3), which included all missense and LOF variants. The OR in individuals of EA using the primary filter was also 5.1 (95% CI, 3.2–8.2).
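Because no individual carried more than one qualifying PROC variant, the gene-level comparison reduces to carriers versus noncarriers. The sketch below illustrates that collapsing step and a crude carrier OR with a Woolf (log-scale) 95% CI. It assumes a hypothetical data layout (each person as a case flag plus a mapping from variant ID to alternate-allele count) and is not the study's mixed-model burden test, which also adjusts for covariates and relatedness, so its estimates would not reproduce Table 3.

```python
from math import exp, log, sqrt


def carrier_status(genotypes, qualifying):
    """True if the person carries at least one allele of any qualifying variant."""
    return any(genotypes.get(v, 0) > 0 for v in qualifying)


def carrier_odds_ratio(people, qualifying):
    """Carrier vs noncarrier OR with a Woolf 95% CI.

    `people` is a list of (is_case, genotypes) pairs, where `genotypes` maps a
    variant ID to its alternate-allele count (hypothetical data layout).
    """
    a = b = c = d = 0  # a/b: carrier cases/controls; c/d: noncarrier cases/controls
    for is_case, genotypes in people:
        carrier = carrier_status(genotypes, qualifying)
        if carrier and is_case:
            a += 1
        elif carrier:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    if 0 in (a, b, c, d):
        raise ValueError("empty cell: a continuity correction would be needed")
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log_or), exp(log_or - 1.96 * se), exp(log_or + 1.96 * se)


if __name__ == "__main__":
    # Toy data: 100 cases and 200 controls; variant IDs are hypothetical.
    people = ([(True, {"v1": 1})] * 6 + [(True, {})] * 94
              + [(False, {"v1": 1})] * 2 + [(False, {"v2": 1})] + [(False, {})] * 197)
    print(carrier_odds_ratio(people, {"v1"}))        # strict qualifying set
    print(carrier_odds_ratio(people, {"v1", "v2"}))  # broader qualifying set
```

Widening the qualifying set changes who counts as a carrier, which is one mechanism by which a broader variant filter can shift the estimated OR.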

Table 3. Results in Rare Variant Aggregate Association Results of VTE in TOPMed Populations

Gene | Filter* | MAF filter | All cases, OR (95% CI) | All cases, P value‡ | Unprovoked cases†, OR (95% CI) | Unprovoked cases†, P value‡
PROC | Primary | 0.01 | 5.1 (3.2–8.1) | 7.4×10−14§ | 4.9 (2.9–8.1) | 7.5×10−11§
PROC | Primary | 0.0005 | 5.1 (3.2–8.1) | 7.4×10−14§ | 4.9 (2.9–8.1) | 7.5×10−11§
PROC | Secondary | 0.01 | 3.8 (2.7–5.5) | 1.6×10−14§ | 3.5 (2.4–5.3) | 8.6×10−11§
PROC | Secondary | 0.0005 | 4.5 (3.1–6.7) | 1.4×10−15§ | 4.3 (2.8–6.6) | 1.6×10−12§
MS4A1 | Primary | 0.01 | 3.6 (1.9–6.7) | 3.4×10−5 | 4.6 (2.2–9.7) | 1.5×10−5
MS4A1 | Primary | 0.0005 | 3.6 (1.9–6.7) | 3.4×10−5 | 4.6 (2.2–9.7) | 1.5×10−5
MS4A1 | Secondary | 0.01 | 2.5 (1.5–3.9) | 2.2×10−4 | 3.1 (1.8–5.4) | 3.4×10−5
MS4A1 | Secondary | 0.0005 | 3.3 (1.9–5.6) | 6.6×10−6 | 4.4 (2.4–8.1) | 4.4×10−7§
PROS1 | Primary | 0.01 | 1.5 (1.2–1.9) | 2.3×10−4 | 1.6 (1.2–2.1) | 1.9×10−4
PROS1 | Primary | 0.0005 | 2.3 (1.6–3.5) | 3.8×10−5 | 2.7 (1.6–4.5) | 1.1×10−4
PROS1 | Secondary | 0.01 | 1.7 (1.4–2.1) | 1.8×10−6§ | 1.7 (1.3–2.1) | 2.3×10−5
PROS1 | Secondary | 0.0005 | 1.8 (1.3–2.4) | 2.3×10−4 | 1.8 (1.3–2.6) | 1.4×10−3
SERPINC1 | Primary | 0.01 | 1.3 (1.0–1.8) | 7.6×10−2 | 1.5 (1.1–2.2) | 2.8×10−2
SERPINC1 | Primary | 0.0005 | 2.7 (1.5–4.6) | 8.7×10−4 | 2.6 (1.3–5.1) | 1.1×10−2
SERPINC1 | Secondary | 0.01 | 1.5 (1.1–2.0) | 4.9×10−3 | 1.6 (1.2–2.2) | 4.3×10−3
SERPINC1 | Secondary | 0.0005 | 2.9 (1.8–4.8) | 4.4×10−5 | 2.5 (1.4–4.7) | 3.9×10−3

MAF indicates minor allele frequency; OR, odds ratio; TOPMed, Trans-Omics for Precision Medicine; VEP, variant effect predictor; and VTE, venous thromboembolism.

* The primary filter included high-confidence loss-of-function variants based on the Loss-of-Function Transcript Effect Estimator; missense variants that were predicted to be deleterious by any of SIFT4G,32 Polyphen2_HDIV,33 Polyphen2_HVAR,33 or LRT;34 and synonymous variants, inframe insertions, or inframe deletions predicted to be deleterious by fathmm_XF_coding_score.35 The secondary filter included variants predicted by Ensembl VEP36 to cause a frameshift, stop gain, stop loss, or start loss or to change a splice donor or acceptor site, as well as variants predicted to be missense by VEP.

‡ P value from the burden test with no weighting.

† Unprovoked was defined slightly differently in each study but typically meant no provoking factor, such as a hospital stay, surgery, cardiovascular event, cancer event, and sometimes hormone replacement therapy, during the 90 days prior to the VTE event.

§ Significant.
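The primary and secondary filters described in the first footnote can be expressed as simple predicates over per-variant annotations. The sketch below assumes a hypothetical annotation record: its field names are illustrative and are not the output format of VEP, LOFTEE, or dbNSFP, although the consequence strings are the Sequence Ontology terms that Ensembl VEP reports. Treating a deleterious call from any one of the listed missense predictors as sufficient mirrors the footnote's wording; how each tool's categorical output was mapped to "deleterious" is an assumption here.

```python
from dataclasses import dataclass, field

# Sequence Ontology consequence terms as reported by Ensembl VEP
SECONDARY_LOF_TERMS = {
    "frameshift_variant", "stop_gained", "stop_lost", "start_lost",
    "splice_donor_variant", "splice_acceptor_variant",
}
MISSENSE_PREDICTORS = {"SIFT4G", "Polyphen2_HDIV", "Polyphen2_HVAR", "LRT"}
FATHMM_XF_TERMS = {"synonymous_variant", "inframe_insertion", "inframe_deletion"}


@dataclass
class VariantAnnotation:
    """Hypothetical per-variant annotation record."""
    consequence: str
    maf: float
    lof_high_confidence: bool = False  # eg, a high-confidence LOFTEE call
    deleterious_by: set = field(default_factory=set)  # predictors calling it deleterious
    fathmm_xf_deleterious: bool = False


def passes_primary(v, maf_cutoff=0.01):
    """Primary filter: HC LOF, predicted-deleterious missense, or fathmm-XF-flagged variants."""
    if v.maf >= maf_cutoff:
        return False
    if v.lof_high_confidence:
        return True
    if v.consequence == "missense_variant":
        return bool(v.deleterious_by & MISSENSE_PREDICTORS)
    if v.consequence in FATHMM_XF_TERMS:
        return v.fathmm_xf_deleterious
    return False


def passes_secondary(v, maf_cutoff=0.01):
    """Secondary filter: VEP LOF-type consequences plus all missense variants."""
    if v.maf >= maf_cutoff:
        return False
    return v.consequence in SECONDARY_LOF_TERMS or v.consequence == "missense_variant"


if __name__ == "__main__":
    benign_missense = VariantAnnotation("missense_variant", maf=0.0002)
    print(passes_primary(benign_missense), passes_secondary(benign_missense))
```

A missense variant that no predictor flags passes only the secondary filter, which is exactly the class of variants that separates the 2 filters' PROC estimates.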


Figure 2. Manhattan plots for Variant Set Mixed Model Association Test (SMMAT) rare variant analyses of venous thromboembolism (VTE) in Trans-Omics for Precision Medicine (TOPMed) participants using the primary or secondary filter. The horizontal axis represents the build 38 genomic position of each aggregate unit by chromosome, and the vertical axis records the −log10(P value) from the aggregate association analysis using SMMAT with Wu weights. Aggregate units with a cumulative minor allele count <5 are not included. A, The dashed horizontal line indicates the genome-wide Bonferroni significance threshold P=2.6×10−6. The primary filter included high-confidence loss-of-function variants based on the Loss-of-Function Transcript Effect Estimator; missense variants that were predicted to be deleterious by any of SIFT4G,32 Polyphen2_HDIV,33 Polyphen2_HVAR,33 or LRT;34 and synonymous variants, inframe insertions, or inframe deletions predicted to be deleterious by fathmm_XF_coding_score.35 B, The dashed horizontal line indicates the genome-wide Bonferroni significance threshold P=2.3×10−6. The secondary filter included variants predicted by Ensembl variant effect predictor (VEP)36 to cause a frameshift, stop gain, stop loss, or start loss or to change a splice donor or acceptor site, as well as variants predicted to be missense by VEP.

Our secondary variant filtering strategy included all missense variants, regardless of whether a bioinformatic tool predicted them to be deleterious. This secondary analysis yielded a smaller effect size for PROC but also identified another known gene, PROS1. See Table S3 for the full distribution of P values across genes for the primary, secondary, and LOF filters. Results using alternative filtering strategies are described in the Supplemental Material; one of these included noncoding variants in enhancer and promoter regions, and another included all rare noncoding variants using a sliding window approach. The minimum P value for the sliding window approach was 1×10−6, whereas the Bonferroni correction would have required P<1×10−10.
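The gene-level threshold of 2.3×10−6 used in these analyses is consistent with a Bonferroni correction. Assuming a family-wise α of 0.05 (the level is not stated in this section) and the 21 868 gene units reported in the Discussion, the arithmetic is shown below; inverting the same formula indicates that the 1×10−10 sliding-window threshold would correspond to about 5×10^8 tests at that assumed α.

```latex
P_{\text{threshold}} = \frac{\alpha}{m} = \frac{0.05}{21\,868} \approx 2.29 \times 10^{-6},
\qquad
m = \frac{\alpha}{P_{\text{threshold}}} = \frac{0.05}{1 \times 10^{-10}} = 5 \times 10^{8}
```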

Analysis of Unprovoked Cases

Single variant and rare variant aggregate tests were repeated on the subset of cases that were unprovoked. In single variant analyses, the same loci were significant, with the exception of SLC44A2. In the aggregate tests, PROC remained the only gene with a significant result when applying the primary variant filter, and effect estimates were similar between the analyses of all cases and the analyses of unprovoked cases (Table S2). When the additional missense variants that passed the secondary filter were included and the analysis was restricted to variants with MAF <0.0005, a novel locus reached genome-wide significance (MS4A1; Table S3).

Genes of Interest

For genes of interest (PROC, MS4A1, PROS1, SERPINC1, STAB2, F7, and CD36), we explored additional variant filtering strategies, including lower MAF thresholds, presence in the canonical isoform of the protein, presence in disease databases, and predicted deleteriousness according to bioinformatics algorithms (see Table S4a and S4b [PROC], Table S5a and S5b [MS4A1], Table S6a and S6b [PROS1], Table S7a and S7b [SERPINC1], Table S8a and S8b [STAB2], Table S9a and S9b [F7], and Table S10a and S10b [CD36]).


In this study, the first whole genome sequencing analysis of VTE in population-based samples, we identified genome-wide significant signals at 5 known loci (F5, FGG, F11, ABO, SLC44A2). The most extreme P value in a novel region was 3.4×10−8, for an intergenic variant on chromosome 1 that did not replicate in an independent sample. Power analyses suggested that we had adequate power to detect semi-rare variants (MAF ≥0.001) with large effect sizes (OR ≥4.7), indicating that additional common or semi-rare variants with large effects on VTE risk are unlikely to exist in populations similar to those sampled in this study. Gene-based aggregate tests also replicated the association with PROC (previously identified in family studies) and yielded a stronger effect for PROC when only LOF variants and missense variants predicted to be deleterious by bioinformatics algorithms were used, rather than all missense variants.

Secondary analyses identified 2 more loci that reached our original significance threshold of 2.3×10−6. PROS1, also originally identified via family studies, became significant when all missense variants were included in our aggregate variant filter, including those predicted to be benign by all in silico algorithms. In addition, sensitivity analyses restricted to unprovoked cases nominated MS4A1, but only when all missense variants were included. If the Bonferroni threshold were adjusted to account for these additional tests, neither association would likely remain significant.

Rare variants in MS4A1 have not previously been reported in VTE cases. The protein encoded by this gene, CD20, is expressed on the surface of B cells and plays a role in their differentiation into plasma cells.38 Other genes with subthreshold P values in the aggregate tests of rare variants that play a role in hemostasis include F7 and CD36 (No. 15 and No. 21 on our list of 21 868 gene units tested; see Table S3). F7 encodes a prothrombotic factor in the coagulation cascade, and factor VII deficiency causes a bleeding disorder, so it is not surprising that variants in this gene were associated with a decreased risk of VTE (OR=0.55), with LOF variants having the strongest effect (OR=0.28). CD36 was recently found to have a role in regulating factor VIII levels,39 in particular through a stop-gain variant (rs3211938) that is prevalent in participants of AA and may have undergone selection. This variant was not associated with VTE in the current study (P=0.97) and was not included in the rare variant aggregate test because it is common in participants of AA (MAF=0.08). Of note, the P values for this gene differed dramatically between the Variant Set Mixed Model Association Test as implemented in the GENESIS package (GENetic EStimation and Inference in Structured samples) (P=8.5×10−5 for LOF variants with MAF <0.01) and the burden test for the same model (P=0.32), suggesting that the variants do not act in a single direction: many may have no effect on the risk of VTE, and some may even be protective.
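The gap between the mixed-model set test and burden P values for CD36 is what one expects when variant effects point in different directions: a burden-type statistic squares a weighted sum of per-variant scores, so opposite-signed contributions cancel, whereas a SKAT-type variance-component statistic (one ingredient of the set test) sums squared scores. The toy sketch below computes both statistics on hypothetical data, without P values and without the covariate and relatedness adjustments that GENESIS performs; it illustrates the cancellation only and is not the GENESIS implementation.

```python
def score_contributions(genotypes, phenotype):
    """Per-variant score U_j = sum_i G_ij * (y_i - mean(y)).

    genotypes: per-person lists of alternate-allele counts (people x variants).
    phenotype: 0/1 case indicators, one per person.
    """
    ybar = sum(phenotype) / len(phenotype)
    n_variants = len(genotypes[0])
    return [
        sum(g[j] * (y - ybar) for g, y in zip(genotypes, phenotype))
        for j in range(n_variants)
    ]


def burden_statistic(scores, weights):
    """Burden-type statistic: squared weighted sum of scores (signs can cancel)."""
    return sum(w * u for w, u in zip(weights, scores)) ** 2


def variance_component_statistic(scores, weights):
    """SKAT-type statistic: weighted sum of squared scores (signs cannot cancel)."""
    return sum((w * u) ** 2 for w, u in zip(weights, scores))


if __name__ == "__main__":
    # Hypothetical: variant 1 is carried by 3 of 4 cases, variant 2 by 3 of 4 controls.
    phenotype = [1, 1, 1, 1, 0, 0, 0, 0]
    genotypes = [[1, 0], [1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [0, 1], [0, 0]]
    scores = score_contributions(genotypes, phenotype)
    print(scores, burden_statistic(scores, [1.0, 1.0]),
          variance_component_statistic(scores, [1.0, 1.0]))
```

In this toy example, the burden statistic is exactly zero while the variance-component statistic is not, mirroring a small set-test P value alongside a null burden result.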

Several recent sequencing studies have consistently nominated 2 genes besides PROC with an excess of rare variants in VTE cases, namely PROS1 and SERPINC1 (No. 3 and No. 15 on our list of aggregate results; see Table S3). Lotta et al40 sequenced the exomes of 10 cases and 12 controls, and Manderstedt et al41 sequenced 17 genes in 96 cases. These studies lacked the power to achieve statistical significance, but it is notable that they nominated the same genes. Halvorsen et al42 sequenced the exomes of 68 cases with idiopathic fatal pulmonary embolism, a much more severe phenotype, and compared them with a set of 9332 convenience samples consisting of controls and individuals with noncardiovascular diseases. By including only variants previously shown to segregate with VTE in families, they identified these same 3 genes with extremely large ORs (56.4–144.2). These extreme ORs and P values were driven more by control frequencies being far lower than expected than by case frequencies being much higher. Control frequencies in our population and in other control databases are 7 to 8× higher than those of their controls (see Tables S4b, S6b, and S7b), whereas their case frequencies were only about twice ours, which may reflect their more severe phenotype, the small number of case carriers (2–4 per gene), or both.
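The dominance of the control side can be made concrete with a back-of-the-envelope calculation. For rare variants the OR is approximately the ratio of carrier frequencies in cases (f_1) and controls (f_0). Taking the comparison above at face value (their case frequencies about 2× ours, their control frequencies about 7 to 8× lower than ours, using 7.5 as a midpoint) and ignoring the (1 − f) terms, which are close to 1 for rare variants:

```latex
\mathrm{OR} = \frac{f_1/(1-f_1)}{f_0/(1-f_0)} \approx \frac{f_1}{f_0}
\quad\Longrightarrow\quad
\frac{\mathrm{OR}_{\text{Halvorsen}}}{\mathrm{OR}_{\text{TOPMed}}}
\approx \frac{2 f_1 \,/\, (f_0 / 7.5)}{f_1 / f_0} = 2 \times 7.5 = 15
```

Of this roughly 15-fold difference, about 7.5-fold comes from the controls and only 2-fold from the cases.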

In a more recent study, Desch et al22 sequenced the exomes of 393 cases and 6114 controls and identified significant associations for PROS1 and STAB2, a gene previously identified in genome-wide association studies of common variants for levels of factor VIII, a prothrombotic hemostatic factor.43,44 Their next 2 subthreshold genes were PROC and SERPINC1. In our study of 3793 cases and 7834 controls, neither PROS1 nor SERPINC1 was significant in our primary model (PROS1 P=2.3×10−4; SERPINC1 burden P=0.08); however, PROS1 did reach significance with our secondary variant filtering strategy (P=1.8×10−6). The minimum P value for SERPINC1 was also achieved with the secondary filter, but only when restricting to variants with MAF <0.0005 (P=4.4×10−5; see Table 3). STAB2 was null for both our primary and secondary models (P≥0.49). The proportions of cases with qualifying variants in PROS1, SERPINC1, and STAB2 were higher (≈2×) in Desch et al than in our study (see Tables S6b, S7b, and S8b), whereas those for PROC were similar. In addition, the proportion of controls who were carriers was consistently lower in Desch et al. The biggest difference between the study populations is that around half of the samples sequenced in Desch et al came from the GIFT study of familial thrombosis, in which cases had at least one first-degree relative with VTE. Such cases will be enriched for strong genetic factors like those previously identified in family studies. In contrast, all individuals in the present study were part of prospective or retrospective population-based studies that did not select for genetic causes of VTE. Another major difference between our primary model and the studies of Halvorsen et al and Desch et al is that those studies focused solely on unprovoked cases.
However, our results showed that unprovoked cases had remarkably similar effect sizes for PROC, PROS1, and SERPINC1 compared with analyses that also included provoked cases (see Table 3), suggesting that these genetic variants increase the risk of both unprovoked and provoked events (see Tables S4b, S6b, and S7b). Similarly, STAB2 was null in the unprovoked models (see Table S8b).

Our aggregate gene-based tests consistently identified PROC using both the Variant Set Mixed Model Association Test and burden tests, regardless of the filter. However, the OR based on variants identified by our primary filter (5.1) was larger than the OR based on variants identified by our secondary filter (3.8), which also included missense variants predicted to be benign. Careful consideration of the filtering strategy is therefore imperative, as it can noticeably affect both the estimated effect size and the power to detect it. It is well established that filtering variants can substantially increase power in gene-based tests.45 Here, a filter with fewer single nucleotide variants (primary filter, 71 variants) yielded a larger OR than a filter with more single nucleotide variants (secondary filter, 91 variants), suggesting that the additional variants in the secondary filter have much smaller, null, or opposite effects, any of which would reduce the OR for the gene. Without additional work (eg, functional follow-up), it is not known whether the 20 variants included in the secondary filter but not the primary filter simply have a much weaker effect or no effect at all on the risk of VTE, but the filter clearly plays an important role in the magnitude of the estimated effect. This is important to consider as studies move from discovering new associations to characterizing known ones, a critical direction in genomics research.
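The dilution argument can be illustrated with a small calculation. Carriers of variants with no effect on risk are, on average, split between cases and controls in the same proportion as the sample itself, so adding them to a gene's qualifying set pulls the carrier OR toward 1. The sketch below uses this study's case and control totals but hypothetical carrier counts; it is an illustration of the mechanism, not a re-analysis of the PROC data.

```python
def odds_ratio_from_counts(case_carriers, control_carriers, n_cases, n_controls):
    """Carrier vs noncarrier odds ratio from 2x2 counts."""
    case_non = n_cases - case_carriers
    control_non = n_controls - control_carriers
    return (case_carriers * control_non) / (control_carriers * case_non)


def add_null_carriers(case_carriers, control_carriers, n_cases, n_controls, extra):
    """Add `extra` carriers of null-effect variants.

    Because carrier status for a null variant is unrelated to disease, the extra
    carriers are split in proportion to the overall case fraction.
    """
    case_fraction = n_cases / (n_cases + n_controls)
    extra_cases = round(extra * case_fraction)
    return case_carriers + extra_cases, control_carriers + (extra - extra_cases)


if __name__ == "__main__":
    N_CASES, N_CONTROLS = 3793, 7834  # study totals
    cases, controls = 30, 12          # hypothetical carriers under a strict filter
    strict = odds_ratio_from_counts(cases, controls, N_CASES, N_CONTROLS)
    broad_counts = add_null_carriers(cases, controls, N_CASES, N_CONTROLS, extra=30)
    broad = odds_ratio_from_counts(*broad_counts, N_CASES, N_CONTROLS)
    print(round(strict, 2), broad_counts, round(broad, 2))
```

The broader set still yields an OR above 1, but a markedly smaller one, even though no variant in it is protective.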

As an example, one region of the PROC gene is translated in only one (ENST00000409048) of the 10 currently enumerated isoforms of this protein (Figure S3). It represents a readthrough of what is intron 5 of the isoform with the most exons (ENST00000234071). Data from the GTEx project (Genotype-Tissue Expression) suggest that ENST00000409048 accounts for <20% of all expressed isoforms in the only 2 tissues with appreciable expression (liver and kidney). Of note, none of the variants in this region are annotated as causing protein C deficiency in the Human Gene Mutation Database (HGMD)46 (Table S4a). As summarized in Table S4b, variants in this region do not support an association with VTE (ORprimary=0.4; ORsecondary=0.9), unlike the variants elsewhere in the protein (ORprimary=7.5; ORsecondary=4.7). The same pattern was observed in PROS1, where variants present only in noncanonical transcripts had ORs far below 1 (see Table S6b; all variants were in the canonical transcript for MS4A1 and SERPINC1). Within an aggregate testing framework, power to detect signals may be improved by weighting the contribution of each variant by the relative abundance of each isoform, either in the tissue relevant to the disease or across the tissues with appreciable expression for a given protein.
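One way to implement the isoform weighting suggested above is to give each variant a weight equal to the combined expression share of the isoforms that contain it, and then compute a weighted burden score per person. The sketch below is a minimal version of that idea under hypothetical inputs: it considers only 2 of the 10 PROC isoforms, and the expression shares (0.875 and 0.125) are illustrative values chosen to respect the "<20%" statement, not GTEx estimates.

```python
def isoform_weights(variant_transcripts, isoform_fraction):
    """Weight each variant by the summed expression share of the isoforms it falls in.

    variant_transcripts: variant ID -> set of transcript IDs containing the variant.
    isoform_fraction: transcript ID -> share (0-1) of the gene's expression in the chosen tissue(s).
    """
    return {
        variant: min(1.0, sum(isoform_fraction.get(tx, 0.0) for tx in transcripts))
        for variant, transcripts in variant_transcripts.items()
    }


def weighted_burden(people, weights):
    """Per-person score: sum over variants of weight * alternate-allele count."""
    return [
        sum(weights.get(variant, 0.0) * count for variant, count in genotypes.items())
        for genotypes in people
    ]


if __name__ == "__main__":
    # Hypothetical expression shares for 2 isoforms only.
    fractions = {"ENST00000234071": 0.875, "ENST00000409048": 0.125}
    variants = {
        "shared_exon_variant": {"ENST00000234071", "ENST00000409048"},
        "readthrough_only_variant": {"ENST00000409048"},
    }
    w = isoform_weights(variants, fractions)
    people = [{"shared_exon_variant": 1}, {"readthrough_only_variant": 1}, {}]
    print(w, weighted_burden(people, w))
```

Under this scheme, a variant confined to the readthrough region contributes only a fraction of what a variant in a shared exon contributes, rather than being either fully included or hard-filtered out.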

Another example of how variant selection affects aggregate tests is the lone PROC variant with an appreciable allele frequency. This variant (rs145800354; 2-127428671-A-G) was about twice as frequent in controls (12 of 7834) as in cases (3 of 3793) and was the only variant with an overall frequency >0.1% in controls. It was highlighted as likely benign in an exome sequencing study of VTE,37 which used samples partially overlapping with the current study, and it is far more prevalent in individuals of African descent (MAF=0.003) than in those of European or Asian descent (MAF ≤0.00001). Because this variant is not predicted to be deleterious, our a priori filters excluded it from the primary model. Had we used an MAF filter of 0.0005 instead of 0.01, the ORsecondary outside the region unique to ENST00000409048 would have increased from 4.7 to 6.4, which would have increased our power to detect PROC in this study. Such a hard filter could, however, lower power to detect more common variants with smaller effect sizes in aggregate tests for other diseases; weighting variants by their maximum frequency across all populations and more aggressively downweighting higher-frequency variants across the range 0<MAF ≤0.01 (in contrast to the Wu weights that we used) may therefore maximize power more generally. We reran the analyses genome-wide using only variants with MAF <0.0005 in both populations (individuals of AA and EA), but this did not lead to any new associations. PROC associations were identical for the primary model (see Table S4b), since all qualifying PROC variants were rare when filtering on the maximum frequency across ancestries.
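The contrast between Wu weights and more aggressive downweighting can be sketched directly. Wu weights are commonly described as the Beta(1, 25) density evaluated at a variant's MAF (the SKAT default); the sketch below evaluates that density on the maximum MAF across ancestry groups, as suggested above, and compares it with a steeper Beta(1, 500) curve. The value 500 is an arbitrary illustrative choice, not one used or recommended in this study, and the per-ancestry MAFs are those quoted for rs145800354 (with 0.00001 used as the EA upper bound).

```python
def beta_weight(maf, b=25.0):
    """Beta(1, b) density at `maf`, which simplifies to b * (1 - maf)**(b - 1).

    With b = 25 this matches the Beta(1, 25) (Wu) weighting.
    """
    return b * (1.0 - maf) ** (b - 1.0)


def relative_weight(maf, b=25.0):
    """Weight relative to a variant with MAF -> 0, where the density equals b."""
    return beta_weight(maf, b) / b


def max_ancestry_maf(maf_by_ancestry):
    """Maximum MAF across ancestry groups, so a variant common in any group is downweighted."""
    return max(maf_by_ancestry.values())


if __name__ == "__main__":
    rs145800354 = {"AA": 0.003, "EA": 0.00001}
    maf = max_ancestry_maf(rs145800354)
    for b in (25.0, 500.0):  # 500 is a hypothetical "aggressive" setting
        print(f"b={b:g}: relative weight {relative_weight(maf, b):.3f}")
    # A hard filter on the maximum MAF would drop the variant entirely.
    print("passes MAF < 0.0005 filter:", maf < 0.0005)
```

Under Wu weights this variant keeps most of the weight of a singleton, whereas the steeper curve shrinks it substantially without discarding it, which is the middle ground between the 2 hard MAF cutoffs discussed above.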

Because Halvorsen et al42 analyzed only variants noted as pathogenic in previous studies, we performed a sensitivity analysis that stratified our analyses of these genes by whether variants were listed in HGMD as leading to a severe phenotype. Rare variants (MAF <0.01) in PROC that were listed in HGMD as causing protein C deficiency (see Tables S4a and S4b) had larger effect sizes on average (OR=6.3) than those that were not, although variants not previously associated with protein C deficiency still conferred an increased risk on average (OR=2.5 for variants with MAF <0.01; P=5.8×10−5), especially after removing the known benign variant (OR=3.7 for variants with MAF <0.0005; P=4.1×10−7). Rare variants in PROS1 that were listed in HGMD as causing protein S deficiency or another form of thrombophilia (see Tables S6a and S6b) had effect sizes remarkably similar (OR=1.3) to those that were not (OR=1.3). Rare variants in SERPINC1 that were listed in HGMD as causing antithrombin deficiency (see Tables S7a and S7b) were likewise similar (OR=1.4) to those that were not (OR=1.5). Among very rare SERPINC1 variants (MAF <0.0005), the difference was more pronounced (OR=5.5 for HGMD-listed variants versus 1.8 for the others); however, counts were very small, especially in controls.

Given the constant need in genomic analyses to increase sample size, especially when working with rare variants, we expanded our sample by integrating case-only sample sets into the analysis. We developed an algorithm to match these cases with controls from an external study (in this case, ARIC), which enabled the combined analysis of the TOPMed VTE whole genome sequencing data. We were able to match nearly all cases to at least one control, which allowed us to account for the differential risk of VTE by age. We chose pure controls (ie, individuals with no VTE event in their prior history or during follow-up) because of concern that, in finite samples, cases harboring rare variants might reduce power if they were also selected as controls. Lubin and Gail47 showed that the use of pure controls can bias effect estimates. This bias could lead to false-positive genotype-phenotype associations if the exposure of interest (ie, genotype) were strongly associated with a secondary cause that would have excluded a subject from the pool of available controls (eg, high risk of death from secondary conditions). However, in the context of genome-wide association studies, where individual variant effect sizes are typically small, any such bias is also likely to be small. This strategy may therefore be a useful way to expand genomic data sets and enhance power to detect effects of low-frequency variants: it enables case-only studies to contribute to sample size, allows adjustment for important covariates such as age, and reduces the potential influence of resampling subjects with rare variants that affect risk.
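The matching algorithm itself is described in the Supplemental Material and is not reproduced here. As a rough illustration only, the sketch below implements one common way to draw age-appropriate controls: each case is matched, without replacement, to pure controls of the same sex and ancestry whose age at censoring is at least the case's age at first VTE, so that each control was still event-free at that age. The matching variables, the default ratio of 2 controls per case, and the greedy oldest-first order are all hypothetical choices, not the study's criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: str
    sex: str
    ancestry: str
    age: float  # age at first VTE for cases; age at censoring for controls


def match_controls(cases, controls, ratio=2):
    """Greedily assign up to `ratio` pure controls to each case, without replacement.

    A control is eligible (hypothetical criteria) if it has the same sex and
    ancestry as the case and its age at censoring is at least the case's age at event.
    """
    available = sorted(controls, key=lambda p: p.age)
    matches = {}
    # Oldest cases first: they have the fewest eligible controls.
    for case in sorted(cases, key=lambda p: p.age, reverse=True):
        eligible = [
            c for c in available
            if c.sex == case.sex and c.ancestry == case.ancestry and c.age >= case.age
        ]
        chosen = eligible[:ratio]  # youngest eligible controls, ie, closest in age
        for c in chosen:
            available.remove(c)
        matches[case.person_id] = [c.person_id for c in chosen]
    return matches


if __name__ == "__main__":
    cases = [Person("case1", "F", "EA", 60), Person("case2", "F", "EA", 70)]
    controls = [Person("c1", "F", "EA", 65), Person("c2", "F", "EA", 72),
                Person("c3", "F", "EA", 80), Person("c4", "M", "EA", 90)]
    print(match_controls(cases, controls))
```

Processing the oldest cases first is a simple heuristic to avoid exhausting the small pool of old controls on younger cases that could have been matched elsewhere; an optimal assignment would require a bipartite-matching formulation instead.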

These results provide several important methodological observations. First, in gene-based aggregate tests, the filtering strategy can significantly impact not just power (and, therefore, precision and significance) but also estimated effect size. Because of this, future researchers focused on the discovery and characterization of rare genetic risk factors should carefully consider their variant filtering strategy with regard to allele frequency, predicted deleteriousness, and presence or absence on the most expressed isoforms. Second, given the caveat expressed in the previous paragraph, we have presented a strategy for including case-only studies in future genetic analyses, thereby enabling the continued expansion of sample size while still accounting for important covariates.

Article Information


Whole genome sequencing for the TOPMed program (Trans-Omics for Precision Medicine) was supported by the National Heart, Lung, and Blood Institute. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the US Department of Health and Human Services. Acknowledgments for each study are available in the Supplemental Materials.

Supplemental Material

Supplemental Methods

Tables S1–S13

Figures S1–S6

Study Descriptions

Adjudication/Medical Review Protocols

Analysis Strategy Details

Nonstandard Abbreviations and Acronyms

African ancestry
Atherosclerosis Risk in Communities Study
Cardiovascular Health Study
European ancestry
Framingham Heart Study
Human Gene Mutation Database
Heart and Vascular Health Study
minor allele frequency
Mayo Clinic Venous Thromboembolism Study
odds ratio
venous thromboembolism
Women’s Health Initiative

Disclosures: None.


*A.A. Seyerle and C.A. Laurie are joint first authors.

†C. Kooperberg, J.S. Pankow, N.L. Smith, and N. Pankratz are joint senior authors.

For Sources of Funding and Disclosures, see page 180.

Supplemental Material is available at

Correspondence to: Nathan Pankratz, PhD, University of Minnesota, Mayo Bldg, Room 780, 420 Delaware St SE, Minneapolis, MN 55455. Email


  • 1. Silverstein MD, Heit JA, Mohr DN, Petterson TM, O’Fallon WM, Melton LJ. Trends in the incidence of deep vein thrombosis and pulmonary embolism: a 25-year population-based study. Arch Intern Med. 1998; 158:585–593. doi: 10.1001/archinte.158.6.585
  • 2. Wolberg AS, Rosendaal FR, Weitz JI, Jaffer IH, Agnelli G, Baglin T, Mackman N. Venous thrombosis. Nat Rev Dis Primers. 2015; 1:15006. doi: 10.1038/nrdp.2015.6
  • 3. Heit JA, Phelps MA, Ward SA, Slusser JP, Petterson TM, De Andrade M. Familial segregation of venous thromboembolism. J Thromb Haemost. 2004; 2:731–736. doi: 10.1111/j.1538-7933.2004.00660.x
  • 4. Larsen TB, Sørensen HT, Skytthe A, Johnsen SP, Vaupel JW, Christensen K. Major genetic susceptibility for venous thromboembolism in men: a study of Danish twins. Epidemiology. 2003; 14:328–332.
  • 5. Souto JC, Almasy L, Borrell M, Blanco-Vaca F, Mateo J, Soria JM, Coll I, Felices R, Stone W, Fontcuberta J, et al. Genetic susceptibility to thrombosis and its relationship to physiological risk factors: the GAIT study. Am J Hum Genet. 2000; 67:1452–1459. doi: 10.1086/316903
  • 6. Zöller B, Ohlsson H, Sundquist J, Sundquist K. A sibling based design to quantify genetic and shared environmental effects of venous thromboembolism in Sweden. Thromb Res. 2017; 149:82–87. doi: 10.1016/j.thromres.2016.10.014
  • 7. Smith NL, Hindorff LA, Heckbert SR, Lemaitre RN, Marciante KD, Rice K, Lumley T, Bis JC, Wiggins KL, Rosendaal FR, et al. Association of genetic variations with nonfatal venous thrombosis in postmenopausal women. JAMA. 2007; 297:489–498. doi: 10.1001/jama.297.5.489
  • 8. Bezemer ID, Bare LA, Doggen CJM, Arellano AR, Tong C, Rowland CM, Catanese J, Young BA, Reitsma PH, Devlin JJ, et al. Gene variants associated with deep vein thrombosis. JAMA. 2008; 299:1306–1314. doi: 10.1001/jama.299.11.1306
  • 9. Tang W, Teichert M, Chasman DI, Heit JA, Morange P-E, Li G, Pankratz N, Leebeek FW, Paré G, de Andrade M, et al. A genome-wide association study for venous thromboembolism: the extended cohorts for heart and aging research in genomic epidemiology (CHARGE) consortium. Genet Epidemiol. 2013; 37:512–521. doi: 10.1002/gepi.21731
  • 10. Germain M, Chasman DI, de Haan H, Tang W, Lindström S, Weng L-C, de Andrade M, de Visser MCH, Wiggins KL, Suchon P, et al; Cardiogenics Consortium. Meta-analysis of 65,734 individuals identifies TSPAN15 and SLC44A2 as two susceptibility loci for venous thromboembolism. Am J Hum Genet. 2015; 96:532–542. doi: 10.1016/j.ajhg.2015.01.019
  • 11. Lindström S, Wang L, Smith EN, Gordon W, van Hylckama Vlieg A, de Andrade M, Brody JA, Pattee JW, Haessler J, Brumpton BM, et al; Million Veteran Program. Genomic and transcriptomic association studies identify 16 novel susceptibility loci for venous thromboembolism. Blood. 2019; 134:1645–1657. doi: 10.1182/blood.2019000435
  • 12. Egeberg O. Inherited antithrombin deficiency causing thrombophilia. Thromb Diath Haemorrh. 1965; 13:516–530.
  • 13. Schwarz HP, Fischer M, Hopmeier P, Batard MA, Griffin JH. Plasma protein S deficiency in familial thrombotic disease. Blood. 1984; 64:1297–1300.
  • 14. Griffin JH, Evatt B, Zimmerman TS, Kleiss AJ, Wideman C. Deficiency of protein C in congenital thrombotic disease. J Clin Invest. 1981; 68:1370–1373. doi: 10.1172/jci110385
  • 15. Cirulli ET, Goldstein DB. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet. 2010; 11:415–425. doi: 10.1038/nrg2779
  • 16. Kruglyak L. The road to genome-wide association studies. Nat Rev Genet. 2008; 9:314–318. doi: 10.1038/nrg2316
  • 17. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest. 2008; 118:1590–1605. doi: 10.1172/JCI34772
  • 18. Cunha MLR, Meijers JCM, Rosendaal FR, Vlieg A van H, Reitsma PH, Middeldorp S. Whole exome sequencing in thrombophilic pedigrees to identify genetic risk factors for venous thromboembolism. PLoS One. 2017; 12:e0187699. doi: 10.1371/journal.pone.0187699
  • 19. Lee E-J, Dykas DJ, Leavitt AD, Camire RM, Ebberink E, García de Frutos P, Gnanasambandan K, Gu SX, Huntington JA, Lentz SR, et al. Whole-exome sequencing in evaluation of patients with venous thromboembolism. Blood Adv. 2017; 1:1224–1237. doi: 10.1182/bloodadvances.2017005249
  • 20. Lindström S, Brody JA, Turman C, Germain M, Bartz TM, Smith EN, Chen M, Puurunen M, Chasman D, Hassler J, et al; INVENT Consortium. A large-scale exome array analysis of venous thromboembolism. Genet Epidemiol. 2019; 43:449–457. doi: 10.1002/gepi.22187
  • 21. Deguchi H, Shukla M, Hayat M, Torkamani A, Elias DJ, Griffin JH. Novel exomic rare variants associated with venous thrombosis. Br J Haematol. 2020; 190:783–786. doi: 10.1111/bjh.16613
  • 22. Desch KC, Ozel AB, Halvorsen M, Jacobi PM, Golden K, Underwood M, Germain M, Tregouet D-A, Reitsma PH, Kearon C, et al. Whole-exome sequencing identifies rare variants in STAB2 associated with venous thromboembolic disease. Blood. 2020; 136:533–541. doi: 10.1182/blood.2019004161
  • 23. Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM, et al; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed program. Nature. 2021; 590:290–299. doi: 10.1038/s41586-021-03205-y
  • 24. The atherosclerosis risk in communities (ARIC) study: design and objectives. The ARIC investigators. Am J Epidemiol. 1989; 129:687–702.
  • 25. Fried LP, Borhani NO, Enright P, Furberg CD, Gardin JM, Kronmal RA, Kuller LH, Manolio TA, Mittelmark MB, Newman A, et al. The cardiovascular health study: design and rationale. Ann Epidemiol. 1991; 1:263–276. doi: 10.1016/1047-2797(91)90005-w
  • 26. Dawber TR, Meadors GF, Moore FE. Epidemiological approaches to heart disease: the Framingham study. Am J Public Health Nations Health. 1951; 41:279–286. doi: 10.2105/ajph.41.3.279
  • 27. Feinleib M, Kannel WB, Garrison RJ, McNamara PM, Castelli WP. The Framingham offspring study. Design and preliminary data. Prev Med. 1975; 4:518–525. doi: 10.1016/0091-7435(75)90037-7
  • 28. Splansky GL, Corey D, Yang Q, Atwood LD, Cupples LA, Benjamin EJ, D’Agostino RB, Fox CS, Larson MG, Murabito JM, et al. The third generation cohort of the national heart, lung, and blood institute’s Framingham heart study: design, recruitment, and initial examination. Am J Epidemiol. 2007; 165:1328–1335. doi: 10.1093/aje/kwm021
  • 29. Heit JA, Armasu SM, Asmann YW, Cunningham JM, Matsumoto ME, Petterson TM, de Andrade M. A genome-wide association study of venous thromboembolism identifies risk variants in chromosomes 1q24.2 and 9q. J Thromb Haemost. 2012; 10:1521–1531. doi: 10.1111/j.1538-7836.2012.04810.x
  • 30. Design of the women’s health initiative clinical trial and observational study. The Women’s Health Initiative Study Group. Control Clin Trials. 1998; 19:61–109. doi: 10.1016/s0197-2456(97)00078-0
  • 31. Anderson GL, Manson J, Wallace R, Lund B, Hall D, Davis S, Shumaker S, Wang C-Y, Stein E, Prentice RL. Implementation of the women’s health initiative study design. Ann Epidemiol. 2003; 13:S5–17. doi: 10.1016/s1047-2797(03)00043-7
  • 32. Vaser R, Adusumalli S, Leng SN, Sikic M, Ng PC. SIFT missense predictions for genomes. Nat Protoc. 2016; 11:1–9. doi: 10.1038/nprot.2015.123
  • 33. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods. 2010; 7:248–249. doi: 10.1038/nmeth0410-248
  • 34. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Res. 2009; 19:1553–1561. doi: 10.1101/gr.092619.109
  • 35. Rogers MF, Shihab HA, Mort M, Cooper DN, Gaunt TR, Campbell C. FATHMM-XF: accurate prediction of pathogenic point mutations via extended features. Bioinformatics. 2018; 34:511–513. doi: 10.1093/bioinformatics/btx536
  • 36. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, Flicek P, Cunningham F. The ensembl variant effect predictor. Genome Biol. 2016; 17:122. doi: 10.1186/s13059-016-0974-4
  • 37. Tang W, Stimson MR, Basu S, Heckbert SR, Cushman M, Pankow JS, Folsom AR, Pankratz N. Burden of rare exome sequence variants in PROC gene is associated with venous thromboembolism: a population-based study. J Thromb Haemost. 2020; 18:445–453. doi: 10.1111/jth.14676
  • 38. Kläsener K, Jellusova J, Andrieux G, Salzer U, Böhler C, Steiner SN, Albinus JB, Cavallari M, Süß B, Voll RE, et al. CD20 as a gatekeeper of the resting state of human B cells. Proc Natl Acad Sci USA. 2021; 118:e2021342118. doi: 10.1073/pnas.2021342118
  • 39. Pankratz N, Wei P, Brody JA, Chen M-H, Vries PS, Huffman JE, Stimson MR, Auer PL, Boerwinkle E, Cushman M, et al. Whole exome sequencing of 14 389 individuals from the ESP and CHARGE consortia identifies novel rare variation associated with hemostatic factors. Hum Mol Genet. 2022; 31:3120–3132. doi: 10.1093/hmg/ddac100
  • 40. Lotta LA, Wang M, Yu J, Martinelli I, Yu F, Passamonti SM, Consonni D, Pappalardo E, Menegatti M, Scherer SE, et al. Identification of genetic risk variants for deep vein thrombosis by multiplexed next-generation sequencing of 186 hemostatic/pro-inflammatory genes. BMC Med Genomics. 2012; 5:7. doi: 10.1186/1755-8794-5-7
  • 41. Manderstedt E, Lind-Halldén C, Svensson P, Zöller B, Halldén C. Next-generation sequencing of 17 genes associated with venous thromboembolism reveals a deficit of non-synonymous variants in procoagulant genes. Thromb Haemost. 2019; 119:1441–1450. doi: 10.1055/s-0039-1693130
  • 42. Halvorsen M, Lin Y, Sampson BA, Wang D, Zhou B, Eng LS, Um SY, Devinsky O, Goldstein DB, Tang Y. Whole exome sequencing reveals severe thrombophilia in acute unprovoked idiopathic fatal pulmonary embolism. EBioMedicine. 2017; 17:95–100. doi: 10.1016/j.ebiom.2017.01.037
  • 43. Smith NL, Chen M-H, Dehghan A, Strachan DP, Basu S, Soranzo N, Hayward C, Rudan I, Sabater-Lleal M, Bis JC, et al; Wellcome Trust Case Control Consortium. Novel associations of multiple genetic loci with plasma levels of factor VII, factor VIII, and von Willebrand factor: the CHARGE (Cohorts for Heart and Aging Research in Genome Epidemiology) Consortium. Circulation. 2010; 121:1382–1392. doi: 10.1161/CIRCULATIONAHA.109.869156
  • 44. Sabater-Lleal M, Huffman JE, de Vries PS, Marten J, Mastrangelo MA, Song C, Pankratz N, Ward-Caviness CK, Yanek LR, Trompet S, et al; INVENT Consortium; MEGASTROKE Consortium of the International Stroke Genetics Consortium (ISGC). Genome-wide association transethnic meta-analyses identifies novel associations regulating coagulation factor VIII and von Willebrand factor plasma levels. Circulation. 2019; 139:620–635. doi: 10.1161/CIRCULATIONAHA.118.034532
  • 45. Chen H, Wang C, Conomos MP, Stilp AM, Li Z, Sofer T, Szpiro AA, Chen W, Brehm JM, Celedón JC, et al. Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am J Hum Genet. 2016; 98:653–666. doi: 10.1016/j.ajhg.2016.02.012
  • 46. Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NST, Abeysinghe S, Krawczak M, Cooper DN. Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat. 2003; 21:577–581. doi: 10.1002/humu.10212
  • 47. Lubin JH, Gail MH. Biased selection of controls for case-control analyses of cohort studies. Biometrics. 1984; 40:63–75.
  • 48. Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, Thornton TA, Rice KM, Conomos MP. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019; 35:5346–5348. doi: 10.1093/bioinformatics/btz567
  • 49. Dey R, Schmidt EM, Abecasis GR, Lee S. A fast and accurate algorithm to test for binary phenotypes and its application to PheWAS. Am J Hum Genet. 2017; 101:37–49. doi: 10.1016/j.ajhg.2017.05.014
  • 50. Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018; 50:1335–1341. doi: 10.1038/s41588-018-0184-y
  • 51. Fadista J, Manning AK, Florez JC, Groop L. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants. Eur J Hum Genet. 2016; 24:1202–1205. doi: 10.1038/ejhg.2015.269
  • 52. Frankish A, Diekhans M, Ferreira A-M, Johnson R, Jungreis I, Loveland J, Mudge JM, Sisu C, Wright J, Armstrong J, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019; 47:D766–D773. doi: 10.1093/nar/gky955
  • 53. Chen H, Huffman JE, Brody JA, Wang C, Lee S, Li Z, Gogarten SM, Sofer T, Bielak LF, Bis JC, et al; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium. Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies. Am J Hum Genet. 2019; 104:260–274. doi: 10.1016/j.ajhg.2018.12.012
  • 54. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011; 89:82–93. doi: 10.1016/j.ajhg.2011.05.029
  • 55. Liu X, White S, Peng B, Johnson AD, Brody JA, Li AH, Huang Z, Carroll A, Wei P, Gibbs R, et al. WGSA: an annotation pipeline for human genome sequencing studies. J Med Genet. 2016; 53:111–112. doi: 10.1136/jmedgenet-2015-103423
  • 56. Chamberlain AM, Folsom AR, Heckbert SR, Rosamond WD, Cushman M. High-density lipoprotein cholesterol and venous thromboembolism in the Longitudinal Investigation of Thromboembolism Etiology (LITE). Blood. 2008; 112:2675–2680. doi: 10.1182/blood-2008-05-157412
  • 57. Cushman M, Tsai AW, White RH, Heckbert SR, Rosamond WD, Enright P, Folsom AR. Deep vein thrombosis and pulmonary embolism in two cohorts: the longitudinal investigation of thromboembolism etiology. Am J Med. 2004; 117:19–25. doi: 10.1016/j.amjmed.2004.01.018
  • 58. Tsai AW, Cushman M, Rosamond WD, Heckbert SR, Polak JF, Folsom AR. Cardiovascular risk factors and venous thromboembolism incidence: the longitudinal investigation of thromboembolism etiology. Arch Intern Med. 2002; 162:1182–1189. doi: 10.1001/archinte.162.10.1182
  • 59. Puurunen MK, Gona PN, Larson MG, Murabito JM, Magnani JW, O’Donnell CJ. Epidemiology of venous thromboembolism in the Framingham Heart Study. Thromb Res. 2016; 145:27–33. doi: 10.1016/j.thromres.2016.06.033
  • 60. KING: relationship inference software.
  • 61. Fang H, Hui Q, Lynch J, Honerlaw J, Assimes TL, Huang J, Vujkovic M, Damrauer SM, Pyarajan S, Gaziano JM, et al; VA Million Veteran Program. Harmonizing genetic ancestry and self-identified race/ethnicity in genome-wide association studies. Am J Hum Genet. 2019; 105:763–772. doi: 10.1016/j.ajhg.2019.08.012
  • 62. Conomos MP, Miller MB, Thornton TA. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet Epidemiol. 2015; 39:276–293. doi: 10.1002/gepi.21896
  • 63. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006; 2:e190. doi: 10.1371/journal.pgen.0020190
  • 64. Zhu X, Li S, Cooper RS, Elston RC. A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet. 2008; 82:352–365. doi: 10.1016/j.ajhg.2007.10.009
  • 65. Yang J, Lee SH, Goddard ME, Visscher PM. Genome-wide complex trait analysis (GCTA): methods, data analyses, and interpretations. Methods Mol Biol. 2013; 1019:215–236. doi: 10.1007/978-1-62703-447-0_9
  • 66. Dong C, Wei P, Jian X, Gibbs R, Boerwinkle E, Wang K, Liu X. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum Mol Genet. 2015; 24:2125–2137. doi: 10.1093/hmg/ddu733
  • 67. Fishilevich S, Nudel R, Rappaport N, Hadar R, Plaschkes I, Iny Stein T, Rosen N, Kohn A, Twik M, Safran M, et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database. 2017; 2017:bax028. doi: 10.1093/database/bax028
  • 68. Zerbino DR, Wilder SP, Johnson N, Juettemann T, Flicek PR. The ensembl regulatory build. Genome Biol. 2015; 16:56. doi: 10.1186/s13059-015-0621-5

