For our correlation analyses, we drew on data from Wave 6 of the COVID-19 Psychological Research Consortium (C19PRC) study [21]. This study began in March 2020 with the aim of monitoring the psychological, social and economic impact of the COVID-19 pandemic in the UK. The study initially comprised a nationally representative sample of 2,025 adults, with ‘top-up’ participants added at later waves. The sixth wave of data collection occurred between August and September 2021. At this sweep, 1,643 participants from earlier waves were re-interviewed and an additional 415 new respondents were surveyed (N = 2,058); the final sample matched the original sample on the quota-based sampling criteria. All participants had complete data. The mean age of participants was 45.92 years (SD = 15.79), 51.9% of the sample were female, 87.7% were of white British/Irish ethnicity, 57.6% had post-secondary education, and 64.2% were in either full-time or part-time employment. Wave 6 of the C19PRC study was granted ethical approval by the University of Sheffield [Reference number 033759]. The data and meta-data used in this study can be found at https://osf.io/v2zur/.
Measures

We drew on data from five self-report questionnaires. Two of these questionnaires assessed depression, two covered anxiety, and one measured symptoms of PTSD.
The Patient Health Questionnaire-9 (PHQ-9) [22] consists of nine questions that align with the DSM-IV criteria for major depressive disorder. Participants were asked about the frequency with which they experienced these depressive symptoms over the preceding two weeks. Response options were on a 4-point Likert scale ranging from 0 (not at all) to 3 (nearly every day). The psychometric properties of the PHQ-9 have been extensively documented [23].
Participants also completed the Generalized Anxiety Disorder Scale (GAD-7) [24]. Respondents were asked to indicate on a 4-point Likert scale ranging from 0 (not at all) to 3 (nearly every day), how frequently they were troubled by seven symptoms of anxiety over the preceding two weeks. The reliability and validity of the GAD-7 have been widely evidenced [25].
Two newly developed scales were also administered: the International Depression Questionnaire (IDQ) and the International Anxiety Questionnaire (IAQ) [26]. These scales were designed to align with the ICD‐11 descriptions of Depressive Episode and Generalized Anxiety Disorder. The IDQ consists of nine questions, and the IAQ has eight. For both questionnaires, responses are indicated on a 5-point Likert scale ranging from 0 (Never) to 4 (Every day). Initial psychometric work suggests these scales are reliable and valid [26].
The International Trauma Questionnaire (ITQ) [27] was used to screen for ICD-11 post-traumatic stress disorder (PTSD). The ITQ consists of six questions that can be grouped into three two-item symptom clusters: Re-experiencing, Avoidance, and Sense of Threat. Participants were asked to complete the ITQ as follows: “…in relation to your experience of the COVID-19 pandemic, please read each item carefully, then select one of the answers to indicate how much you have been bothered by that problem in the past month”. Responses were indicated on a 5-point Likert scale, ranging from 0 (Not at all) to 4 (Extremely). Three additional questions measure functional impairment caused by the symptoms. The psychometric properties of the ITQ scores have been supported in both general population [28] and clinical and high-risk [29] samples.
All 39 questions from the five scales, which were used as input for our NLP analyses, are presented in the supplementary files (Table S2). All questions were scored in the same direction (i.e. higher scores reflected greater frequency/severity of symptomatology), therefore no reverse coding was required.
Pre-processing

First, using the data from the C19PRC study, we calculated a Spearman rank correlation for each pair of questions in the battery. Given there were 39 questions in total, this resulted in 741 correlation coefficients (39 × 38/2). Second, we imported the questionnaire content, in PDF format, into Harmony, which produced a semantic similarity score (cosine index) for each of the 741 item-pairs. We then merged the results from the above two steps, creating a simple data set where the rows corresponded to item-pairs, and the columns corresponded to the correlation and cosine values for each item-pair (available in Supplementary file 2).
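The pairwise correlation step can be sketched as follows. This is an illustrative pure-Python re-implementation, not the authors' R code, and the item names and response vectors are hypothetical:

```python
from itertools import combinations

def rank(xs):
    """Assign 1-based ranks, averaging ranks across tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend over the block of tied values
        avg = (i + j) / 2 + 1  # average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks
    return pearson(rank(x), rank(y))

# Hypothetical responses (one list of participant scores per item)
items = {
    "phq1": [0, 1, 2, 3, 1, 0, 2],
    "phq2": [0, 1, 3, 3, 2, 0, 2],
    "gad1": [3, 2, 1, 0, 1, 3, 0],
}
# One coefficient per unordered item-pair: k*(k-1)/2 pairs for k items
pairs = {(a, b): spearman(items[a], items[b])
         for a, b in combinations(items, 2)}
```

With the study's 39 items, the same comprehension yields the 741 coefficients described above.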
Analyses

We explored the association between the correlations from the empirical data and the NLP-derived similarity scores by doing the following:
First, we randomly split the dataset into training (80% of item-pairs) and testing (20% of item-pairs) samples. Using the training sample, we produced a scatterplot to visualise the relationship between the cosine and correlation scores, and then calculated the Pearson correlation between the two indices. Next, we estimated a linear regression model, with cosine scores as the predictor and correlation coefficients as the outcome variable. We then tested this model in the holdout sample, calculating the mean absolute error (MAE), the root mean squared error (RMSE), and the median absolute error (MedAE) between the correlations predicted by our model and the observed correlation coefficients. These errors were visualised as a violin plot. All of the above analyses were conducted in R version 4.3.1 and visualisations were produced using the ggplot2 package [30].
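The split-fit-evaluate procedure can be sketched as follows. This is our illustrative Python version under simulated data, not the authors' R analysis; the simulated slope and noise level are arbitrary assumptions:

```python
import random
import statistics

def fit_ols(x, y):
    """Simple ordinary least squares: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical (cosine, correlation) item-pairs standing in for the real 741
rng = random.Random(1)
data = [(c, 0.6 * c + rng.gauss(0, 0.05))
        for c in (rng.uniform(0, 1) for _ in range(200))]
rng.shuffle(data)
split = int(0.8 * len(data))           # 80/20 train/test split
train, test = data[:split], data[split:]

slope, intercept = fit_ols([c for c, _ in train], [r for _, r in train])

# Holdout errors between predicted and observed correlations
errors = [abs((slope * c + intercept) - r) for c, r in test]
mae = sum(errors) / len(errors)                            # mean absolute error
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5   # root mean squared error
medae = statistics.median(errors)                          # median absolute error
```

RMSE weights large errors more heavily than MAE, while MedAE is robust to outlying item-pairs, which is why all three are reported.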
Next, to examine the ability of NLP to uncover complex structures using questionnaire meta-data, we estimated and visualised matrices of the item-pair correlations and cosine scores as separate graphical networks using the full dataset (N = 741). In the network of cosine scores, nodes (points in space) represented questions, and edges (connections between nodes) reflected the cosine similarity scores between a given pair of questions, with thicker and more saturated lines indicating higher cosine values. We estimated two versions of the correlation network – a bivariate/pairwise correlation network, and a regularised partial correlation network. In the bivariate network, nodes represented questionnaire variables and edges reflected the strength of the correlations between nodes. In the regularised partial correlation network, edges can be interpreted as partial correlation coefficients, with line thickness and saturation reflecting the strength of association between two symptoms after controlling for all other symptoms in the network. In this network, a LASSO penalty was applied to the edges, which shrinks edges and sets very small connections to zero. This is a commonly employed approach in the estimation of networks of mental health data, as it produces a sparse network structure that balances parsimony with explanatory power [31]. The LASSO utilizes a tuning parameter to control the degree of regularization that is applied. This is selected by minimizing the Extended Bayesian Information Criterion (EBIC). The degree to which the EBIC prefers simpler models is determined by the hyperparameter γ (gamma) – this was set to the recommended default of 0.5 in the present study [31]. For further detailed information on the estimation of regularised partial correlation networks, we refer readers elsewhere [31, 32]. The networks in the present study were estimated and visualised in R using the qgraph package [33].
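To make the distinction between bivariate and partial correlation edges concrete, the sketch below (our illustration, not the paper's qgraph-based estimation, and without the LASSO penalty) derives partial correlations from the inverse of a small hypothetical correlation matrix:

```python
def invert3(m):
    """Inverse of a 3x3 matrix via the adjugate and determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, -(b * i - c * h), b * f - c * e],
        [-(d * i - f * g), a * i - c * g, -(a * f - c * d)],
        [d * h - e * g, -(a * h - b * g), a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

def partial_corr(corr):
    """Partial correlation of i and j controlling for all other variables:
    r_ij = -P_ij / sqrt(P_ii * P_jj), where P is the precision matrix."""
    p = invert3(corr)
    k = len(corr)
    return [[1.0 if i == j else -p[i][j] / (p[i][i] * p[j][j]) ** 0.5
             for j in range(k)] for i in range(k)]

# Hypothetical bivariate correlations among three symptoms
corr = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0]]
pcor = partial_corr(corr)
```

Note how each partial correlation is smaller than its bivariate counterpart, since shared variance with the third symptom is removed; the LASSO step described above would additionally shrink these values and set the smallest to zero.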
After estimating the cosine and correlation networks, we used the walktrap community detection algorithm [34] to identify communities or clusters of nodes within the three networks. Walktrap is a bottom-up, hierarchical approach to uncovering structures within networks. The central idea of walktrap is to simulate random walks within a given network. Random walks start from a particular node and traverse the network by moving to a neighboring node at each step, following edges randomly. This process is repeated for multiple random walks initiated from each node in the network. Walktrap is based on the idea that nodes within the same community will have similar random walk patterns and thus be close to each other in the clustering. We ran the walktrap algorithm using the igraph package, taking the weighting of edges into account, with the default number of four steps per random walk. Research has shown that the walktrap method produces similar results to other methods for uncovering underlying structures in multivariate data (e.g. exploratory factor analysis, parallel analysis) [35]. However, the walktrap algorithm can produce a clustering outcome, even in scenarios involving entirely random networks. Consequently, we calculated the modularity index Q [36] to assess the clarity and coherence of the clustering solutions identified. In real-world data, Q typically ranges from 0.3 to 0.7, with values closer to 0.3 indicating loosely defined communities, while those around 0.7 indicate well-defined and robust community structures [36].
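A minimal sketch of Newman's Q for a weighted, undirected network is shown below. This is our own illustration rather than the igraph implementation used in the study, and the six-node network and its two clusters are hypothetical:

```python
def modularity(weights, communities):
    """Newman's modularity Q for an undirected weighted network.

    weights: dict mapping frozenset node-pairs to edge weights
    communities: dict mapping each node to a community label
    """
    nodes = set(communities)
    strength = {n: 0.0 for n in nodes}  # weighted degree k_i
    two_m = 0.0                         # 2m = sum of all k_i
    for pair, w in weights.items():
        a, b = tuple(pair)
        strength[a] += w
        strength[b] += w
        two_m += 2 * w
    q = 0.0
    for i in nodes:                     # double sum over ordered pairs
        for j in nodes:
            if communities[i] != communities[j]:
                continue
            a_ij = weights.get(frozenset((i, j)), 0.0)
            q += a_ij - strength[i] * strength[j] / two_m
    return q / two_m

# Two hypothetical symptom clusters joined by one weak bridge edge
edges = {
    frozenset(("d1", "d2")): 0.8, frozenset(("d2", "d3")): 0.7,
    frozenset(("d1", "d3")): 0.6,
    frozenset(("a1", "a2")): 0.8, frozenset(("a2", "a3")): 0.7,
    frozenset(("a1", "a3")): 0.6,
    frozenset(("d3", "a1")): 0.1,   # bridge between clusters
}
parts = {"d1": 0, "d2": 0, "d3": 0, "a1": 1, "a2": 1, "a3": 1}
q = modularity(edges, parts)
```

For this toy network the two-cluster partition falls in the 0.3 to 0.7 range described above, whereas assigning every node to a single community yields Q = 0.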
While the LASSO networks offer a more conservative and interpretable structure than networks consisting of bivariate correlations, to our knowledge, there is no equivalent approach for networks of cosine scores. Furthermore, there are no guidelines for determining when two questions are considered ‘similar enough’ based on their cosine similarity score. To address this, we conducted sensitivity analyses in which we manually set small edges (cosine values) to zero, to produce increasingly sparse networks. We estimated five additional cosine networks, removing any connections with edge weights below certain thresholds. These thresholds ranged from 0.2 up to 0.6. For each of these networks we also tested for community structures and modularity.
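The thresholding itself is straightforward to express; the sketch below (an illustration with hypothetical cosine scores and item names, not the study's data) shows how each cutoff produces a sparser network:

```python
def threshold_network(cosines, cutoff):
    """Drop edges whose cosine similarity falls below the cutoff."""
    return {pair: w for pair, w in cosines.items() if w >= cutoff}

# Hypothetical cosine similarities between four questionnaire items
cosines = {
    ("phq1", "phq2"): 0.71, ("phq1", "gad1"): 0.35,
    ("phq2", "gad1"): 0.28, ("gad1", "gad2"): 0.64,
    ("phq1", "gad2"): 0.15, ("phq2", "gad2"): 0.22,
}

# Increasingly sparse networks at thresholds from 0.2 to 0.6
sparse = {t: threshold_network(cosines, t)
          for t in (0.2, 0.3, 0.4, 0.5, 0.6)}
```

Community detection and modularity can then be recomputed on each thresholded network to check whether the recovered structure is stable across cutoffs.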