TractGeoNet: A geometric deep learning framework for pointwise analysis of tract microstructure to predict language assessment performance

The brain's white matter connections (fiber tracts) and their tissue microstructure can be quantitatively mapped using diffusion magnetic resonance imaging (dMRI) tractography (Zhang et al., 2022a). This mapping enables the study of the brain's white matter structural connectivity, a critical substrate for human cognition. Recent investigations in cognitive neuroscience highlight the importance of predicting individual behaviors or traits from individual measures of brain connectivity (Abdallah et al., 2023; Finn and Rosenberg, 2021; Gabrieli et al., 2015; Liu et al., 2023a; Shen et al., 2017). By performing individualized predictions of phenotypic measures using methods that can generalize to novel individuals, these approaches can improve our understanding of the organization of the brain (Rosenberg et al., 2018; Scheinost et al., 2019). This prediction is usually achieved by performing a regression task that uses input neuroimaging data to predict an output phenotypic measure, such as cognitive performance. (Regression, which relates a dependent variable to one or more independent (explanatory) variables, is an important technique for data prediction tasks.) Many studies have successfully used dMRI as an input modality to predict phenotype information at the individual subject level (Chen et al., 2020; Dhamala et al., 2020; Feng et al., 2022; Gong et al., 2021; Jeong et al., 2021; Liu et al., 2023a; Ooi et al., 2022; Xue et al., 2022a). However, these studies used images or summary information from tractography and were unable to benefit from fine-grained, detailed information about individual points within a fiber tract and their tissue microstructure for prediction. In this work, we investigate the prediction of phenotypic information, specifically performance on language assessments, using highly detailed fiber tract information as input. 
We investigate solutions to several challenges, including the computational representation of white matter fiber tract geometry and microstructure, the design of a deep network that can leverage fiber tract information as input, the improvement of regression performance for predicting language assessment scores, and the interpretation of prediction results in the context of white matter fiber tract and cortical anatomy.

A critical computational challenge in the analysis of white matter fiber tracts is how to represent tracts and their tissue microstructure. dMRI tractography of a single fiber tract, such as the arcuate fasciculus (Fig. 1a), can contain thousands of streamlines that trace the course of the fiber pathway. These streamlines are encoded as sequences of points, such that one tract can contain several hundred thousand points, with associated tissue microstructure information at each point. Many investigations analyze only a single summary statistic per tract (Fig. 1b), such as the number of streamlines (Yeh et al., 2021) or mean fractional anisotropy (Zekelman et al., 2022), ignoring the known spatial variation of tissue microstructure along fiber tracts. More advanced approaches perform tractometry (Fig. 1c) to analyze averages (O'Donnell et al., 2009; Yeatman et al., 2012) or distributions (Chandio et al., 2020) of microstructure measures in subregions along the lengths of fiber tracts. However, due to the need to bin data along tracts, streamline-specific or pointwise information is generally obscured, and it is assumed that the bins correspond across subjects despite the known intersubject variability in shapes, lengths, volumes, and cortical connectivity of tracts (Yeh, 2020). In contrast to these traditional representations, we propose to represent a complete white matter tract with its set of raw points for microstructure analysis using a point cloud, which is an important type of geometric data structure (Fig. 1d). Point cloud representations have previously been utilized to leverage positional information of streamline points (e.g., point coordinates) in tractography data processing tasks such as tractography segmentation and tractogram filtering (Astolfi et al., 2020; Chen et al., 2023a, 2021b; Xue et al., 2023a, 2023b, 2022b). 
In this study, we investigate the effectiveness of representing a whole white matter tract and its microstructure measurements using a point cloud. In this way, we can directly utilize not only positional information but also tissue microstructure information from all points within a fiber tract and avoid the need for along-tract feature extraction. A point cloud can encode detailed information about the tissue microstructure as well as the three-dimensional shape and spatial extent of fiber tracts. We leverage this point cloud representation for input to a high-level learning task of prediction of neuropsychological scores. We hypothesize that using more detailed white matter fiber tract information can improve the prediction of cognitive measures and enable the localization of critical regions for prediction.
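
To make the representation concrete, the following minimal sketch shows one way a tract could be flattened into a point cloud: each row is one streamline point, holding its three coordinates followed by its microstructure values. The function name `tract_to_point_cloud`, the input layout (one array per streamline), and the optional random subsampling to a fixed size are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def tract_to_point_cloud(streamlines, microstructure, n_points=None, seed=0):
    """Flatten a tract into one (N, 3 + F) array of points (hypothetical helper).

    streamlines:    list of (n_i, 3) arrays of point coordinates, one per streamline
    microstructure: list of (n_i, F) arrays with F measures (e.g. FA) per point
    n_points:       if given, randomly subsample the cloud to this many points
    """
    coords = np.concatenate(streamlines, axis=0)
    feats = np.concatenate(microstructure, axis=0)
    if coords.shape[0] != feats.shape[0]:
        raise ValueError("each point needs exactly one row of microstructure values")

    # One row per point: x, y, z, then the microstructure measures.
    cloud = np.hstack([coords, feats])

    if n_points is not None:
        rng = np.random.default_rng(seed)
        # Sample with replacement only if the tract has too few points.
        replace = cloud.shape[0] < n_points
        idx = rng.choice(cloud.shape[0], size=n_points, replace=replace)
        cloud = cloud[idx]
    return cloud


# Toy tract: two streamlines with 5 and 7 points, one FA-like value per point.
rng = np.random.default_rng(42)
streamlines = [rng.normal(size=(5, 3)), rng.normal(size=(7, 3))]
fa = [rng.uniform(0.2, 0.8, size=(5, 1)), rng.uniform(0.2, 0.8, size=(7, 1))]

cloud = tract_to_point_cloud(streamlines, fa)
sampled = tract_to_point_cloud(streamlines, fa, n_points=8)
print(cloud.shape, sampled.shape)
```

Unlike a per-tract mean or an along-tract profile, no binning step is involved here, so every point keeps its own position and microstructure values.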

Designing a high-performance regression model that can handle input geometric data is challenging. Traditional regression algorithms, including linear and non-linear models (such as ElasticNet and Random Forest), have been used in the neuroimaging-based prediction context (Cui and Gong, 2018; Feng et al., 2022; Huang et al., 2016). Deep learning, including multilayer perceptrons (MLPs) and convolutional neural networks (CNNs), has also shown promise in predicting cognitive measures (Feng et al., 2022; Jeong et al., 2021; Xue et al., 2022a). However, a limitation of such deep learning models is that they can only process input features in the form of raw images or feature vectors and are unable to handle other feature formats that may be more informative, e.g., a point cloud representation of geometric data. Point-based neural networks, such as PointNet, have thus been developed to process geometric data represented as point clouds and have demonstrated superior performance (Chen et al., 2017; Qi et al., 2017; Vora et al., 2020). Although these neural networks are specifically designed to handle irregular and unordered input data, they have not yet been used to perform regression based on quantitative representations of tractography geometry. This represents an unexplored opportunity to leverage the strengths of point-based neural networks in this domain.
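
The property that lets point-based networks handle unordered input can be illustrated with a very small sketch in the spirit of PointNet: the same weights are applied to every point, and a symmetric operation (here, a maximum over points) aggregates the per-point features into one vector. This is a toy, untrained NumPy model with a single hidden layer; the names `init_params` and `point_regressor` and all layer sizes are assumptions for illustration and do not describe the architecture used in this work.

```python
import numpy as np


def init_params(in_dim, hidden_dim, seed=0):
    """Random weights for a one-hidden-layer point network (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w_out": rng.normal(scale=0.1, size=hidden_dim),
        "b_out": 0.0,
    }


def point_regressor(cloud, params):
    """Map an (N, D) point cloud to a single scalar prediction."""
    # Shared per-point layer: identical weights for every point, then ReLU.
    h = np.maximum(cloud @ params["W1"] + params["b1"], 0.0)
    # Symmetric aggregation: the max over points does not depend on point order.
    global_feature = h.max(axis=0)
    # Linear regression head on the global feature.
    return float(global_feature @ params["w_out"] + params["b_out"])


rng = np.random.default_rng(1)
cloud = rng.normal(size=(100, 4))      # 100 points: x, y, z, one measure
params = init_params(in_dim=4, hidden_dim=16)

score = point_regressor(cloud, params)
shuffled_score = point_regressor(cloud[rng.permutation(100)], params)
print(abs(score - shuffled_score) < 1e-9)
```

Because the aggregation is symmetric, shuffling the rows of the cloud leaves the prediction unchanged, which matters for tractography data where points have no canonical ordering.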

Another important research challenge is the improvement of performance in regression-based prediction. In particular, we investigate how to leverage the regression label information to enhance training. Unlike a classification task, where the labels are a series of independent discrete values, the labels for a regression task are continuous values with quantitative meaning. While existing regression methods often utilize the predicted label of each subject individually, recent studies have leveraged additional information about the relationship between the predicted labels across subjects to improve prediction performance. For example, the ranking loss (Le Vuong et al., 2021; Liu et al., 2018) utilizes the ranking of the predicted labels across subjects to inform learning. Our recent work has proposed using the differences between regression labels of subjects to define positive or negative pairs in contrastive learning for regression based on tabular data (Xue et al., 2022a). However, these works have not used the quantitative label difference information to guide the training of a regression model. Thus, it is worthwhile to explore the development of novel neural networks that can effectively and quantitatively leverage continuous regression scores to improve tract-based neuropsychological score prediction.
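
As a hedged illustration of what quantitatively using label differences could look like, the sketch below combines a standard per-subject mean squared error with a term that penalizes the mismatch between the predicted difference and the true label difference for each pair of subjects. The function name `paired_regression_loss`, the use of squared error for both terms, and the `pair_weight` parameter are assumptions made for this example; it should not be read as the exact loss proposed in this work.

```python
import numpy as np


def paired_regression_loss(pred_a, pred_b, y_a, y_b, pair_weight=1.0):
    """Per-subject MSE plus a label-difference term over subject pairs.

    pred_a, pred_b: predictions for the first and second subject of each pair
    y_a, y_b:       the corresponding true scores
    pair_weight:    hypothetical weight balancing the two terms
    """
    pred_a, pred_b, y_a, y_b = (
        np.asarray(v, dtype=float) for v in (pred_a, pred_b, y_a, y_b)
    )
    # Usual regression error, computed for each subject on its own.
    per_subject = np.mean((pred_a - y_a) ** 2) + np.mean((pred_b - y_b) ** 2)
    # Pair term: the predicted difference should match the true label difference.
    pair_term = np.mean(((pred_a - pred_b) - (y_a - y_b)) ** 2)
    return per_subject + pair_weight * pair_term


# Two pairs of subjects with made-up scores.
y_a, y_b = [100.0, 110.0], [90.0, 120.0]
pred_a, pred_b = [102.0, 108.0], [91.0, 118.0]

loss = paired_regression_loss(pred_a, pred_b, y_a, y_b)
perfect = paired_regression_loss(y_a, y_b, y_a, y_b)
print(loss, perfect)
```

In contrast to a ranking loss, which only uses the order of the labels, the pair term here depends on the size of the label difference, so a pair of subjects with very different scores constrains the model more strongly than a pair with nearly equal scores.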

Finally, the challenge of the interpretation of neuroimaging-based prediction results has also attracted substantial attention (Chen et al., 2023b; Cui et al., 2022; Dan et al., 2022; Feng et al., 2022; Li et al., 2021; Zhang et al., 2018a, 2016; Zhang et al., 2022). Interpretation refers to identifying the critical brain regions or features that contribute the most to a prediction task. This interpretation can provide insights into the underlying neural mechanisms related to the predicted variable (Jiang et al., 2022; Kohoutová et al., 2020). Some studies have explored results interpretation in prediction tasks using features from white matter connections (Chen et al., 2023b; Feng et al., 2022; Kawahara et al., 2017; Xue et al., 2022a). For instance, some methods rank the importance of each input feature, where features are summarized measurements from all points within a white matter connection (Feng et al., 2022; Xue et al., 2022a). Another recent study adopts an attention module to identify fiber clusters that predict a specific variable (Chen et al., 2023b). However, these methods can only identify entire white matter connections as important for the prediction task and do not allow exploration of the contributions of different regions or individual points within the white matter connections. Therefore, there is a need to develop interpretation methods specifically tailored for prediction tasks based on detailed point cloud representations of white matter fiber tracts.

As an initial testbed for our proposed framework, we focus on predicting two cognitive measures related to language performance. Previous works have not investigated the machine learning prediction of cognitive measures using input geometric representations of fiber tracts. However, several studies have applied deep learning methods to predict general cognitive (Yeung et al., 2023) and language (Feng et al., 2022) performance using structural connectivity matrices as input. While the connectivity matrix is a powerful abstraction for network analysis of the brain, it includes only scalar connection “strength” information. It thus cannot leverage detailed geometric or microstructure information from tractography. In addition to deep learning methods, traditional statistical methods have shown that measures from individual fiber tracts, such as mean microstructure values or estimates of connectivity strength, significantly relate to neuropsychological measures of language (Ivanova et al., 2021; Liu et al., 2023a; Sánchez et al., 2023; Yeatman et al., 2011; Zekelman et al., 2022). By using a more detailed fiber tract representation, the present work can potentially contribute to a deeper understanding of the specific white matter pathways that underlie language abilities.

This study presents a novel geometric deep learning framework, TractGeoNet, for predicting neuropsychological assessment scores and localizing highly predictive regions within white matter tracts. TractGeoNet is designed as a supervised deep-learning pipeline for regression tasks. The paper makes four key contributions. First, we utilize point cloud representations to preserve the microstructure measurement information from all points within the white matter tracts. This approach provides a comprehensive representation of fiber tract data that can benefit machine learning tasks. Second, we introduce a Paired-Siamese Regression loss to effectively utilize information about the differences between continuous regression labels (language neuropsychological scores in this case). This loss function considers the paired relationship between samples, improving the regression performance. Third, we propose a Critical Region Localization (CRL) algorithm to identify critical regions within each white matter tract. This algorithm enables the localization and interpretation of important regions containing points that contribute strongly to the prediction task. Fourth, we evaluate the proposed TractGeoNet on a large-scale dataset of 20 white matter tracts from 806 subjects obtained from the Human Connectome Project (HCP) (Van Essen et al., 2013). The results demonstrate the effectiveness of the proposed approach for predicting neuropsychological scores and identifying critical regions within the tracts.

The current paper extends a preliminary version of the work (Chen et al., 2022) by incorporating several improvements and additional analyses. First, we improve the CRL algorithm by identifying critical regions that exhibit high consistency across multiple trained models and high correspondence across subjects through group-wise analysis. Second, we expand the analysis from one tract (the arcuate fasciculus in the conference publication) to include a total of 20 white matter tracts. Third, we investigate the prediction of an additional language-related neuropsychological assessment and compare the localized critical regions across assessments. By incorporating these improvements, the paper strengthens the overall methodology and provides a more comprehensive evaluation of the proposed TractGeoNet framework.
