Deep Learning of the Retina Enables Phenome- and Genome-Wide Analyses of the Microvasculature

Supplemental Digital Content is available in the text.


UK Biobank whole exome sequencing and quality control filters
UK Biobank whole exome sequencing was performed among 200,627 individuals using whole blood-derived DNA at the Regeneron Sequencing Center 44, 45 , for which the methods have been previously described for the earlier release of data across approximately 50,000 individuals 44 . In brief, the IDT xGen Exome Research Panel v1.0 was used to capture exomes at over 20x coverage across 95% of sites. Extensive additional genotype, variant, and sample-level exclusion filters were applied to study high-quality autosomal exome sequence variants as previously done 45 using Hail-0.2.
In addition to any quality-control that was performed centrally, we applied extensive additional genotype, variant and sample quality-control procedures to ensure a high-quality dataset for analyses for both the common variant analyses using array genotyping data and rare variant analyses using whole exome sequencing data.
For the whole exome sequencing data used in rare variant analyses, we utilized the OQFE WES pVCF files provided by the UK Biobank, which contained calls for 200,643 sequenced samples. We applied genotype refinement to the raw genotype calls in the pVCF files using Hail-0.2 (https://hail.is/docs/0.2/index.html). We first split multi-allelic sites to represent separate bi-allelic sites.
We first performed genotype quality control, filtering out calls that did not pass the following hard filters were then set to no-call in our analysis: • For homozygous reference calls: Genotype Quality < 20; Genotype Depth < 10; Genotype Depth > 200 • For heterozygous calls: (A1 Depth + A2 Depth)/Total Depth < 0.9; A2 Depth/Total Depth < 0. We then performed variant-level quality control. We removed variants that failed the following filters: • Call rate of < 90% (restricting to males for Y chromosomal markers) (N= 4,023,284) • Failed a liberal Hardy-Weinberg Equilibrium test (HWE) at P < 10 -14 among unrelated samples (not applied to Y chromosomal markers) (N=136,869) • Present in Ensembl low-complexity regions (N=748,116) • Monomorphic in the final dataset (N=55,614) After performing these variant filters, 13,003,057 variants remained of which 12,756,075 were autosomal.
To perform sample level quality control, we computed a number of quality metrics to identify bad-quality or duplicated samples. We first used KING6 (version 2.2.5) to calculate pairwise heterozygote concordance rates for each pair of samples, using the high-quality independent autosomal markers. Then we used the high-quality autosomal variants present in both WES and array datasets to compute per-sample heterozygote concordance rates between WES calls and genotyping array calls. We inferred the genetic sex of each participants with the --check-sex option in PLINK, using the high-quality independent X-chromosomal markers. We set any sample with F > 0.8 to male, while samples with F < 0.5 were set to female. Finally, using all ~12.7M autosomal WES variants, we computed a number of additional metrics including sample call rate, transition/transversion ratio (Ti/Tv), heterozygote/homozygote ratio (Het/Hom), SNV/indel ratio (SNV/indel) and the number of singletons. After computing these metrics, we excluded participants based on the following criteria: • Decided to revoke their consent • Sample duplicates based on heterozygote concordance rates > 0.8 (N=0) • Samples with blatant discordance between self-reported and genetically inferred sex • Discordance between WES and array calls with heterozygote concordance rates < 0.8 • Call rate < 90% • Samples further than 8 standard deviations from the mean for Ti/Tv (n=0), Het/Hom (N=100), SNV/indel (N=1) and number of singletons (N=111) After applying these filters 200,337 samples remained for analysis.

Automatic image outlier detection and removal
To automatically identify poor quality fundus photographs, we developed a convolutional neural network model on the Google Cloud's AI platform. We used a sample of 1,000 retinal fundus photographs from the UK Biobank. We first included all images from those with multiple photos of the same eye from a single visit (to enrich for poor quality images; 827 photos from 206 individuals). The remaining individuals were randomly sampled from all participants with fundus photography (additional 173 photos from 87 individuals). A board-certified ophthalmologist (J.C.W) hand-labeled images as poor quality or good quality based on presence or absence of motion artifact, media opacity, and ability to clearly distinguish retinal vessels on the photograph. The model was trained and tuned on a random sample of 80% of individuals (N=233 individuals, 794 photographs) and tested on the remaining 20% of individuals (N=60 individuals, 206 photographs). A 45% poor-quality probability was used as the threshold to remove poor quality images to balance removal of poor-quality images and sufficient sample size for downstream association studies, which resulted in removal rates similar to other published studies of UK Biobank Fundus photos 49,50 . This threshold resulted in a sensitivity of 97.4% and a specificity of 100.0% for detecting poor quality images. The deep learning model was then applied to UK Biobank fundus photographs. Prior to analysis, a pixel intensity threshold was used to detect the outer edge of the fundus, and images were square cropped around the fundus boundary. For model development, images were resized to 320x320 pixels. Originally, 134,653 fundus photographs were available from 67,339 individuals from the UK Biobank enrollment visit. After applying the outlier detection algorithm and removing all photographs with >45% probability of poor quality, 99,736 images remained from 55,603 participants, resulting in removal of 26% of the original images.

Overview of prior methods for retinal vessel segmentation:
There has been an extensive body of work using automated computational approaches to segment retinal microvasculature and quantify microvasculature phenotypes 42,51,52 . These approaches can be broadly classified into unsupervised methods and supervised methods. Unsupervised approaches 53, 54 do not rely upon hand-drawn segmentations to train a classification model and instead use mathematical transformations of the raw image to identify edges corresponding to vessel walls, and then perform post-processing steps such as in-painting to fill in holes in segmented vasculature. Classical supervised approaches 55, 56 use transformations (e.g., wavelet filters) 55 to convert the raw pixel intensities into usable features which are then passed to a machine learning algorithm such as a Support Vector Machine or a Gaussian Mixture Model to classify each pixel as vasculature or not. More recently, studies have shown that deep neural networks substantially improve segmentation performance over classical approaches 52 . These models have an added benefit that once trained, they do not require any specific parameter settings during inference. However, it is not yet known whether deep learning-based segmentation approaches can generate quantifiable vascular phenotypes (e.g., fractal branching) 42 and whether these phenotypes are useful measures of disease risk.

Deep learning model training process
Our deep learning model consisted of an ensemble of 10 U-Net models each randomly initialized and trained for 1,000 epochs using this procedure. One-cycle learning rates were used with a maximum rate of 1e-4 during training.
Geometric data augmentation was used during training including 5 degrees of random rotation, 50% brightness/contrast adjustment, 20% zoom in/out, random black-white inversion, and patch sampling to 25% of original image size. The final segmentation mask was determined by rounding the pixel-wise probability of each model to 0/1 and taking a majority vote for each pixel.

Fractal dimension and vascular density calculation
The segmentation model was applied to these images to calculate fractal dimension and total vessel area. Fractal dimension was calculated using a box counting method 61 as previously described for the retina 42 , and applied to the segmented vasculature. Vessel density was defined as the total number of pixels in the segmented vessels given a fixed dimension of 320x320 pixels for each image. For individuals with multiple photos from the same visit, the maximum fractal dimension and vessel density for each eye was used. Extreme outliers for right and left eye vascular FD and density were excluded by adjusting the traditional box and whisker upper and lower bounds and accounting for skewness in the phenotypic data using the Robustbase package in R (setting range=3) (https://cran.rproject.org/web/packages/robustbase/robustbase.pdf). For all analyses except for the quantitative ocular traits (where eye-specific analyses were performed), FD and vascular density were averaged across right and left eyes and inverse rank normalized to mean 0 and standard deviation 1. For the quantitative ocular traits, analyses were performed by eye and retinal vascular FD and density were inverse rank normalized to mean 0 and standard deviation 1 for right and left eyes separately.

Phenome-wide association analyses (PheWAS)
PheWAS with prevalent and incident phenotypes was performed across all of the 1,866 hierarchical phenotypes defined from the Phecode Map 1.2 62 ICD-9 (https://phewascatalog.org/phecodes) and ICD-10 (https://phewascatalog.org/phecodes_icd10) phenotype groupings 63 . Associations of retinal vascular FD and vascular density with prevalent phenotypes were performed utilizing logistic regression models, and associations with incident phenotypes were performed using Cox proportional hazards models after excluding individuals with the corresponding diagnosis at or prior to enrollment. Both models were adjusted for age, age 2 , sex, smoking status (current/prior/never smoker), and ethnicity (Data field 21000). The proportional hazards assumption was assessed by Schoenfeld residuals and was satisfied for each model. Analysis was performed across disease phenotypes with at least 9 cases with retinal fundus images available. Statistical significance was defined using false discovery rate <0.05.
PheWAS across 88 quantitative systemic biomarkers acquired at enrollment including blood counts (Category ID 100081), blood biochemistry markers (Category ID 17518), liver MRI iron and inflammation measures (Category 126), arterial stiffness and reflection index from finger photoplethysmography (Category 100007), blood pressure (Category 100011), pulmonary function tests from spirometry (Category 100020), left ventricular size and function as well as pulse wave analysis from cardiac MRI (Category 102), as well as eye measures (Category 100011) were performed. For all phenotypes, sex-specific extreme outliers were excluded by adjusting the traditional box and whisker upper and lower bounds and accounting for skewness in the phenotypic data using the Robustbase package in R (setting range=3) as previously performed 46-48 (https://cran.r-Supplemental Results:

Sensitivity analyses used in automated image quality control
To ensure that the automated image removal model training process was robust to selection bias, we used 5-fold cross validation and found that the five models achieved an average AUC of 0.98 (±0.02), suggesting that selection bias was minor. To ensure that the quality control model was not detecting ocular pathology, we compared characteristics of poor-quality vs high quality images (Supplemental Table 1). Poor quality images were slightly enriched for glaucoma and cataracts -pathology that is expected to occlude vasculature, but were otherwise not meaningfully different.

Univariate and multivariate associations with retinal microvascular FD and density
In univariate associations, average vascular FD and density across right and left eyes significantly decreased with age (Figure 1c), with slightly lower values present in females and individuals of Chinese or South Asian ethnicity, and slightly higher values present in current smokers (Supplemental Figure 4, Supplemental Table 3).
Multivariate associations recapitulated the importance of including age, sex, current smoking status, and ethnicity in predicting retinal vascular FD and density (Supplemental Table 4). In multivariate models, female sex was associated with reduced FD but no difference was observed in vascular density compared to males. However, ancestral differences were more evident for vascular density than FD.

Phenome-wide association study sensitivity analyses
Sensitivity analyses were performed first adjusting the incident associations for prevalent hypertension and diabetes (Supplemental Figure 10, Supplemental Table 8), removing associations with most incident cardiovascular conditions (including incident heart failure and hypertensive heart disease); however a significant association remained for renal dialysis (Pre-adjustment: HR 1.61, P=1.99x10 -5 ; post-adjustment: HR 1.53, P=0.00019); however incident ocular phenotypes persisted after adjustment. Separately, analyses were performed adjusting the incident associations for mean spherical equivalent (Supplemental Figure 11, Supplemental Table 9), identifying consistent association with incident ocular phenotypes after these adjustments.

GWAS Sensitivity analyses and comparison with vascular tortuosity variants
Further analyses removing prevalent type 2 diabetes mellitus cases identified consistent associations across all previously identified genome-wide significant loci (Supplemental Figure 13, Supplemental Table 12).
Furthermore, a comparison was performed between 173 independent genome-wide significant variants 24 identified in a prior published GWAS of retinal vascular tortuosity and FD and density (Supplemental Figure 14).
Firstly, the top gene in the FD RVAS and one of the top genes of the retinal density RVAS analyses is MITF (melanocyte inducible transcription factor, also called microphthalmia associated transcription factor) (Beta -0.27, P=3.3x10 -5 ), whose signal was driven by a predicted deleterious missense variant with allele frequency 0.0032, rs149617956-A (p.Glu419Lys), which was suggestively associated with reduced vascular density (Beta -0.27 SD, P=9.4x10 -6 ) as well as FD (Beta -0.28 SD, P=2.5x10 -6 ) in the GWAS analyses. Second, for the vascular density RVAS, nominal association was detected with GNB3 (Beta = -0.86SD, P=0.0064), which also had a more common missense variant with allele frequency 0.071 detected in the common variant GWAS as noted earlier, not included in this RVAS aggregation. Prior studies have linked GNB3 to myopia 92 , hypertension 91 , as well as retinal vascular caliber 90 , and retinal degeneration 99 .    Average FD Smoking Status Average Vascular Density    Odds ratio (OR) per 1 SD decrease in either vascular density (left) or FD (right). Labeled phenotypes have false discovery rate P<0.05. X-axis reflects an organized grouping of the phenotypes by phenotypic category and p-value of association. FD = fractal dimension.   . Linear regression models were used for analysis, adjusting for age, age 2 , sex, smoking status, and ethnicity. Y-axis reflects the -log10(p-value) x beta direction, whereby above 0 reflects a positive beta (ie: each SD increase in phenotype is associated with increased vascular density or FD) and below 0 reflects a negative beta (ie: each SD increase in phenotype is associated with decreased vascular density or FD). Horizontal dotted line reflects the Bonferroni threshold for significance. X-axis reflects an organized grouping of the phenotypes by phenotypic category (laboratory values versus other quantitative phenotype) and p-value of association. Fvc = forced vital capacity, BMI = body mass index, DBP_adjMeds = diastolic blood pressure, adjusted for antihypertensive medication use (see methods), SBP_adjMeds = systolic blood pressure, adjusted for antihypertensive medication use  Hazard ratio (HR) per 1 SD decrease in either vascular density (left) or FD (right). Labeled phenotypes have false discovery rate P<0.05. X-axis reflects an organized grouping of the phenotypes by phenotypic category and p-value of association. FD = fractal dimension, HR = hazard ratio  a.

Renal_dialysis
Supplementary Figure 11: Phenome-wide associations of retinal vascular indices with incident disease additionally adjusted for spherical equivalence. Adjustment was performed for age, age 2 , sex, smoking status, ethnicity, and spherical equivalence averaged across both eyes. a) -log10(P-value) of associations of retinal vascular density and FD with incident disease plotted grouped by phenotypic category. b. Hazard ratio (HR) per 1 SD decrease in either vascular density (left) or FD (right). Labeled phenotypes have false discovery rate P<0.05. Xaxis reflects an organized grouping of the phenotypes by phenotypic category and p-value of association.  c. Density Beta (95% CI)