Fully Automatic Atrial Fibrosis Assessment Using a Multilabel Convolutional Neural Network

Supplemental Digital Content is available in the text.

To isolate the body of the LA, we developed automatic ostia localisation and PV truncation tools using a Voronoi diagram (1). This diagram was extracted from a surface mesh made from the blood pool segmentation. Each PV was initially identified by placing a landmark at the centre of gravity of the PV labels provided by the segmentation network. These landmarks represented the distal ends of the PVs in the mesh. Centrelines were then automatically drawn from these points, running through the veins to the centre of the atrial body. As a centreline enters the body, the maximum area of the surrounding structure increases significantly, and this inflection was used to identify the ostium. The rate of change of the area, shown as black polygons in Figure S1(a), was calculated as ∆A_i = A_{i+1} − A_i, where A_i is the area of a polygon at step i on the centreline. A candidate PV ostium at step i was then accepted if Θ_min < ∆A_i < Θ_max, where Θ_max and Θ_min are predefined thresholds based on the average area of the atria. To circumvent incorrect ostia localisation due to irregularities in the surface mesh, an additional function χ_B(∆A_i) was included that evaluated to True if the area increased monotonically at the identified ostium.
Figure S1: Ostia Localisation and Truncation: (a) A Voronoi diagram of the left atrium displaying partitioning of the geometry into polygons, with the area of the surrounding structure encoded. (b) Computed centrelines, illustrated in blue, running from the PV landmarks to the centre of gravity. The ostia were localised by analysing the change in the area of the surrounding structure. The coloured disks represent different PV truncation techniques. Yellow disks represent the fully automatic truncation method. Green disks represent the semi-automatic method, where the user can define the size and angle of the clipper's shape. The red disk and its surrounding red points represent the fully manual method, in which the user places a number of seeds around the PV to define the truncation path.
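The ostium-localisation rule described above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the area profile, thresholds Θ_min and Θ_max, and the monotonicity window are all illustrative values.

```python
# Illustrative sketch of ostium localisation along a PV centreline.
# `areas` holds the cross-sectional area A_i of the surrounding structure
# at each step i (distal PV tip towards the atrial body). The thresholds
# and window size below are made-up example values, not the tuned ones.

def area_change(areas):
    """Delta A_i = A_{i+1} - A_i along the centreline."""
    return [areas[i + 1] - areas[i] for i in range(len(areas) - 1)]

def monotonic_increase(areas, i, window=3):
    """chi_B: True if the area increases monotonically after step i."""
    segment = areas[i:i + window + 1]
    return all(a < b for a, b in zip(segment, segment[1:]))

def locate_ostium(areas, theta_min=1.0, theta_max=5.0, window=3):
    """Return the first step whose area jump satisfies the thresholds."""
    for i, delta in enumerate(area_change(areas)):
        if theta_min < delta < theta_max and monotonic_increase(areas, i, window):
            return i
    return None
```

The monotonicity check plays the role of χ_B, rejecting spurious jumps caused by irregularities in the surface mesh.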
The PV arrangements differ significantly between patients, which limits the use of the truncation approach described by Tobon-Gomez et al. (2). Their clipping is performed with an infinite plane, rendering it susceptible to unwanted cropping. To tackle this issue, we engineered a fully automatic clipper, which computed geometric properties of the PVs' inner walls. This information was used to construct dynamically shaped clippers. These clipper shapes truncated the blood pool at the ostia, and the isolated PVs were used to relabel the original segmentation.
In addition to the fully automatic clipper, we also devised two other truncation methods with different levels of manual intervention to provide flexibility. The semi-automatic method exploited Visualization Toolkit (VTK) implicit functions: the intersection of an infinite plane with a user-defined sphere created a ring-shaped geometry, resulting in a convex and flexible shape for truncating the veins. For the manual method, the user picked a number of seeds on the surface mesh to define a contour. These seeds generated a custom-shaped surface, which was then used to truncate the veins. An example of this manual method is shown as a red disk in Figure S1(b). The wide variation in patient anatomy makes truncation tools such as these essential.
The labelled MV provided by the network undergoes three sequential steps to be automatically truncated from the atrial shell. First, the segmented MV is dilated and converted to a surface using the MIRTK marching cubes algorithm. Second, the MV surface is converted to an implicit function using the VTK Implicit PolyData Distance class. Finally, this implicit function is used to truncate the atrial shell with the VTK Clip PolyData filter, which produces smooth clipped borders suitable for the rest of the estimation pipeline.
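The core idea of the clipping step, an implicit function that assigns each point a signed distance and a clip at the zero crossing, can be sketched without VTK. In the pipeline the implicit function comes from the MV surface via vtkImplicitPolyDataDistance and the clip is done by vtkClipPolyData; here a sphere around a made-up MV centre stands in for the implicit function.

```python
# Minimal sketch of implicit-function clipping, assuming a sphere as the
# implicit surface (the real pipeline derives it from the MV mesh).

import math

def sphere_distance(point, centre, radius):
    """Signed distance: negative inside the sphere, positive outside."""
    return math.dist(point, centre) - radius

def clip_points(points, implicit_fn):
    """Keep only the points outside the implicit surface (f > 0)."""
    return [p for p in points if implicit_fn(p) > 0]
```

A real clipper also re-triangulates cells that straddle the zero level set, which is what gives the smooth clipped borders mentioned above.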

Convolutional Neural Network Implementation
We have a dedicated website for our open-source platform: http://www.cemrgapp.com. This website has links to documentation, training videos, wiki pages, binary executables, and the public GitHub repository (https://github.com/CemrgAppDevelopers/CemrgApp), which hosts all the platform's source code. The source code of the convolutional neural network used in this study is publicly available from this repository: https://github.com/OrodRazeghi/CemrgNet.

Network Architecture
The number of feature maps k in the convolutions of the first convolutional group was optimised as a hyperparameter during model selection. These 3 x 3 convolutional filters were connected to each other by 2 x 2 max pooling layers in the contracting path and up-convolutions in the expanding path. The number of max pooling layers, which is equal to the number of up-convolutions, was generalised as the depth of the network m, also a hyperparameter. The number of feature maps in the output of the convolutions doubled after every max pooling layer and halved after every up-convolution. The convolutions in these convolutional groups and those in the up-convolutions used padding, such that the output of each convolution was the same size as its input. The convolutions in the convolutional blocks were followed by rectified linear unit (ReLU) activation and batch normalisation. The ReLU activation function was used for all layers apart from the last layer, which used a softmax activation function. The probabilistic output of this layer was considered to be the output of the model, which was thresholded at 50%.
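The feature-map bookkeeping described above can be made concrete with a short sketch: k maps in the first convolutional group, doubled after each of the m max-pooling layers, then halved after each up-convolution. The function names are illustrative; k and m are the tuned hyperparameters (the selected model used k = 32 and m = 5).

```python
# Sketch of the U-Net feature-map schedule for first-layer width k and
# depth m (number of max-pooling layers = number of up-convolutions).

def encoder_feature_maps(k, m):
    """Feature-map counts at the input level and after each pooling."""
    return [k * 2 ** level for level in range(m + 1)]

def decoder_feature_maps(k, m):
    """Feature-map counts after each up-convolution, mirroring the encoder."""
    return list(reversed(encoder_feature_maps(k, m)))[1:]
```

For k = 32 and m = 5, the encoder widths are 32, 64, 128, 256, 512, 1024, and the decoder returns symmetrically to 32.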

# Network Layer Parameters
Size of the convolution filter = 3 x 3
Size of the max pooling operation = 2 x 2
Output layer threshold = 0.5

Network Training
The types of augmentation used for training the network were rotation between -20 and +20 degrees, scaling between -20% and +20%, shearing between -10% and +10%, additive Gaussian noise with a mean of 0 and a standard deviation between 1 and 15 pixels, and contrast changes through the power-law transformation. The proportion of the training set used for augmentation was tuned, by examining the training and validation set errors, to introduce a sufficient amount of new data without causing overfitting. The adaptive moment estimation (ADAM) optimiser was used for optimisation of the network. An initial learning rate of 0.001 was selected for the ADAM optimiser, and the exponential decay rates for the 1st and 2nd moment estimates were set to 0.9 and 0.999, respectively. The Dice coefficient was used as the cost function, since it was previously used in the 2013 Left Atrial Segmentation Challenge benchmark (2) and is better suited to datasets with a large label imbalance (3): Dice = 2|P ∩ G| / (|P| + |G|), where |·| denotes the cardinality of the prediction set P and the ground-truth set G.
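The Dice cost on binary masks can be sketched directly from its set definition; this is an illustrative stand-in for the differentiable version used during training.

```python
# Minimal sketch of the Dice coefficient, Dice = 2|P n G| / (|P| + |G|),
# computed on binary masks; P and G play the roles of the prediction and
# ground-truth sets.

def dice_coefficient(prediction, groundtruth):
    """Dice overlap between two binary masks given as flat sequences."""
    p = {i for i, v in enumerate(prediction) if v}
    g = {i for i, v in enumerate(groundtruth) if v}
    if not p and not g:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2 * len(p & g) / (len(p) + len(g))
```

Because the score is normalised by the sizes of both masks, a small foreground label cannot be ignored without a large penalty, which is why Dice suits datasets with a large label imbalance.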

# Optimiser Parameters
Name of the optimiser = ADAM
Learning rate = 0.001
Decay rate for 1st moment estimates = 0.9
Decay rate for 2nd moment estimates = 0.999

Cost function = Dice coefficient
During training, the accuracy was evaluated on the validation dataset after each pass of all the training data through the network. This was repeated until the validation accuracy stopped increasing, and the best-performing model was selected for evaluation on the test set. The maximum number of epochs was 100, with 500 steps per epoch. Early stopping with a patience of 50,000 iterations was used as a means of regularisation and to reduce training time. The training was done on two NVIDIA Titan V GPUs with 5120 CUDA cores and 12GB of memory each, and took approximately 6 hours. The training error of the segmentation network was also evaluated using Dice, accuracy, sensitivity, specificity, and precision measurements for each of the blood pool, PVs, and MV labels.
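The early-stopping rule can be sketched as follows. For simplicity the sketch counts patience in validation evaluations rather than the 50,000 training iterations used in practice, and the accuracy values in the usage are made up.

```python
# Sketch of early stopping: stop once the validation accuracy has not
# improved for `patience` consecutive evaluations, keeping the best model.

def train_with_early_stopping(accuracy_per_epoch, patience):
    """Return (best epoch, best accuracy) under early stopping."""
    best_epoch, best_acc, stale = 0, float("-inf"), 0
    for epoch, acc in enumerate(accuracy_per_epoch):
        if acc > best_acc:
            best_epoch, best_acc, stale = epoch, acc, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation accuracy has plateaued
    return best_epoch, best_acc
```

Keeping the best model rather than the last one means the plateaued epochs after the peak do not degrade the selected weights.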

Network Model Selection
To select the optimal network architecture, a grid search of hyperparameters was performed using five separate models. Each model was trained and tested on the validation set. The models were permutations of the following hyperparameters: depth m ∈ {4, 5, 6}, number of feature maps k ∈ {32, 64}, and dropout rate f ∈ {25%, 50%, 75%}. We chose a depth of 5 and 32 feature maps for the first layer of the network. Table S2 summarises the Dice scores for segmenting the body of the LA on the validation set. Overfitting is a potential issue in larger neural networks due to the large number of trainable parameters. To minimise this issue, dropout rates of 25%, 50%, and 75% were evaluated to find the most effective number of nodes to remove while still keeping enough nodes for sufficient feature learning. We selected a dropout of 50% as the result of tuning this parameter.
Table S2: Hyperparameter selection explored in possible models with grid search. Models deeper than 6 with feature maps larger than 64 were too large to fit into GPU memory.
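The grid search itself is a small loop over the Cartesian product of the hyperparameter sets. In the sketch below, `evaluate` stands in for training a model and scoring it on the validation set; the lambda used in the usage is a dummy scorer, not the real validation Dice.

```python
# Sketch of a hyperparameter grid search over depth m, first-layer
# feature maps k, and dropout rate f.

from itertools import product

def grid_search(depths, feature_maps, dropouts, evaluate):
    """Return the hyperparameter triple with the best validation score."""
    grid = product(depths, feature_maps, dropouts)
    return max(grid, key=lambda params: evaluate(*params))
```

Example usage with a dummy scorer that happens to favour the selected configuration (m = 5, k = 32, f = 50%):

```python
best = grid_search([4, 5, 6], [32, 64], [0.25, 0.5, 0.75],
                   lambda m, k, f: -abs(m - 5) - abs(k - 32) - abs(f - 0.5))
```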

Rigid Registration Implementation
The rigid transform can register objects that are related by rotation and translation by optimising a similarity measure. The source code of the registration algorithm can be downloaded from the MIRTK GitHub repository. The set of parameters for use within the MIRTK framework can be downloaded from our website. Below is the full list of chosen hyperparameters for the rigid MIRTK registration method:
Step length rise = 1.1
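The class of mapping being optimised, a rotation followed by a translation, can be sketched in the 2D case; the registration itself additionally searches over these parameters to maximise the similarity measure.

```python
# Sketch of a rigid transform: rotate points about the origin by `angle`
# radians, then translate. The 2D case is shown for brevity; registration
# of volumes uses the 3D analogue with three rotation angles.

import math

def rigid_transform(points, angle, translation):
    """Apply rotation then translation to a list of (x, y) points."""
    c, s = math.cos(angle), math.sin(angle)
    tx, ty = translation
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]
```

Because rotations and translations preserve distances, a rigid registration can align two scans of the same anatomy without deforming either.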

Permutation Tests
We examined our pipeline's efficacy by training on a 70% random selection of scans from all five operators. For further analysis, we also trained our pipeline on the scans of one operator, who had processed the largest number of scans. We then tested this pipeline's performance against independent scans from the same operator, as well as the 60 scans analysed by the other four operators. The results can be found in Tables S3, S4, S5, and S6. Tables S3 and S4 represent the results from two separate subsets: Table S3 represents the results from scans analysed only by one operator in both the training and testing pools, whereas Table S4 represents results from the model that was trained only on scans analysed by one operator and tested on scans analysed by four separate operators, which were completely absent from the training set. Retraining the model on a larger dataset with labelled samples from equally skilled operators will decrease these minor differences, as the available data will come from a more uniform distribution.
Table S6: Fibrosis scores calculated from segmentations generated manually by four operators and automatically by our CNN using similar measurement metrics.

Challenge Dataset Tests
Current state-of-the-art segmentation methods are deep artificial neural networks. We chose the U-Net architecture, as recent reviews of cardiac image segmentation methods have confirmed its success with limited-size datasets (4,5,6). The proceedings of the 2018 international workshop on statistical atlases and computational models of the heart cover a range of 2D and 3D CNN-based methods for segmenting the atria (4). We trained our network on 2D slices of scans, as feeding in 3D volumes was observed not to have a significant effect on the accuracy of results while requiring more processing power. The benchmarking of architectures used in the 2018 Atrial Segmentation Challenge confirmed the lack of a significant difference between 2D and 3D models (4,6). We additionally tested our CE-MRA based network on 100 LGE-CMR scans from the 2018 Atrial Segmentation Challenge dataset to evaluate its potential limitations in analysing different scans from a different centre. Without any retraining, the network achieved a Dice score of 0.80 ± 0.05; our previous work, trained specifically on the LGE-CMR scans from the challenge dataset, had achieved a Dice score of 0.89 (3). We took these cross-centre evaluations further by processing the scans from the 2013 Left Atrial Segmentation Challenge and performing a direct comparison to the work of Mortazi et al. (7). By training our network architecture on the 2D planes of scans, we found a LA Dice score of 0. Previous automatic atrial segmentation works (7,8,9,10) have relied on one expert delineation per subject and do not provide any inter-observer error margin. For example, AtriaNet (10) consisted of a dual-pathway CNN architecture and was validated on an LGE-CMR dataset of 154 patients from the University of Utah. In contrast, we trained our network on 207 scans with labels for the blood pool, MV, and PVs. Jia et al. introduce another solution, consisting of two successive networks based on the U-Net architecture and a contour loss, on a dataset of 100 subjects (9). Their first network was used to locate the target, and the second performed single-label segmentation on the cropped region of interest. Our experiments showed that one network trained with sufficient data is able to delineate the LA as accurately as a trained operator.
By quantifying the inter-observer variability, we provide an estimate of the error in manual segmentations and of the degree of accuracy achievable with an automatic segmentation network, given the inherent inaccuracies in the annotation process, which is subject to the operator's interpretation of the blood pool, the landmarks defining the MV plane, and the position of the PV ostia. In fact, the network outperformed operators on all three labels (p < 0.05), with the greatest improvement in the PVs and MV (blood pool: 0.91, PVs: 0.61, MV: 0.73). This more accurate automatic segmentation gives rise to improved or equal inter-observer scores between the pipeline and the operators, compared with those between the operators, across all methods of estimating fibrosis.
Studies such as (8,9,10) segmented LGE-CMR scans. In contrast, we used our clinically validated LGE-CMR interrogation technique (11) and performed our segmentation on the higher-contrast CE-MRA scans, potentially allowing more accurate segmentations. It is worth noting that the operators were not instructed to segment the PVs fully; they were asked to segment only enough tissue to localise the ostia. Therefore, the manually generated labels for the proximal PVs varied, and a low inter-observer score was seen in the results. Our pipeline is not influenced by this variance, as the ostia localisation algorithm finds the centre of mass of any quantity of PV segmentation and uses it as a landmark for finding the ostia. The automatic clipping tool then removes the unwanted tissue.

Effect of Wall Thickness on Fibrosis Burden
The results of a one-way ANOVA in Figure S2 revealed no significant difference (p = 0.06 at a significance level of 0.05) in the global fibrosis burdens calculated by varying the length of the normal projections initiating from the nodes of the atrial surface mesh.
Figure S2: Wall Thickness Analysis: A one-way ANOVA test confirms that varying the length of normal projections initiating from the nodes of the mesh does not significantly change the mean fibrosis burden.
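The F-statistic underlying the test above can be sketched from first principles: between-group variance over within-group variance for the fibrosis burdens computed at each projection length. The groups in the test below are illustrative numbers, not the study's data.

```python
# Sketch of the one-way ANOVA F-statistic for k groups of observations
# (here, fibrosis burdens at different normal-projection lengths).

def one_way_anova_f(groups):
    """F = (SSB / (k - 1)) / (SSW / (N - k))."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # between-group sum of squares: group means vs the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares: observations vs their own group mean
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))
```

A small F (relative to the F-distribution's critical value for these degrees of freedom) corresponds to the non-significant p = 0.06 reported above.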