Six molecular subsets (endotypes) within 1620 active, female lupus patients in the combined datasets from the ILL clinical trials (GSE88884 ILL-1 & ILL-2) were identified by k-means clustering of GSVA enrichment scores of 32 immune cell/inflammatory pathway gene modules (Fig. 1A, Additional file 2: Figure S1). Endotypes labeled as Z1-Z6 were distinguished by patterns of enriched (red) and unenriched (blue) gene modules. To assign SLE endotypes as normally or abnormally enriched, lupus patient and control samples were re-clustered together (Additional file 2: Figure S4). Most non-lupus control samples clustered with the least abnormal SLE samples (11/17 in Z1’), indicating that these SLE patients had minimal to no gene expression differences from controls; analysis by cosine similarity indicated their module identity. Features were considered as abnormally enriched or expressed based on their GSVA enrichment score as compared to controls. For example, in non-lupus controls, modules such as IFN, plasma cell, IG chains, myeloid cell, and tumor necrosis factor (TNF) tended to have GSVA scores less than zero, whereas modules such as B cell, T cell, and T cell chains (TCRA, TCRAJ, TCRB, TCRD) tended to have GSVA scores greater than zero. Within the lupus patient samples, the endotype designated Z1 manifested the least number of abnormally enriched gene modules (features), whereas the endotype designated Z6 had the greatest number of abnormally enriched features, with other endotypes arrayed between.
Next, we extended this endotyping approach to include other unrelated datasets, two with both active and inactive patients, and a large RNA-seq dataset, to determine whether the same or additional endotypes could be detected (Additional file 1: Table S1, Datasets 3–5). Within these three other datasets, six endotypes were identified in a cohort of 266 adult lupus patients, five endotypes were identified in a cohort of 137 pediatric lupus patients, and four endotypes were identified in 160 other adult lupus patients (Fig. 1B–D, Additional file 2: Figure S1). As with GSE88884, non-lupus controls tended to cluster in the “least perturbed” endotype with similar enrichment patterns as Z1 (Additional file 2: Figure S5).
Cosine similarity analysis indicated that several of these SLE endotypes were reproducible among all datasets, whereas others were found in only some datasets (Fig. 1E). Of note, when compared to a large cohort of adult lupus patients, three shared subsets and two transcriptionally distinct subsets emerged in the pediatric patients using a cosine similarity cutoff of 0.7. Combining all unique endotypes across datasets indicated that a total of 11 endotypes can be identified by cosine similarity. However, hierarchical clustering of the mean GSVA scores of the individual endotypes suggested that after eight subsets, the next three subsets emerged within a very small statistical space (Fig. 1F), namely, that with a slightly lower cut height, eight endotypes were identified, which indicated that these three additional subsets were similar to other subsets even though they missed being captured by cosine similarity. Furthermore, the additional subsets contained very few members compared to the eight endotypes. In summary, these analyses conservatively identified eight as the optimal number of SLE endotypes. K-means clustering of the eight identified endotypes from the five datasets demonstrated distinct molecular patterns (Additional file 2: Figure S6A). Principal component analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) demonstrated good separation of the subsets (Additional file 2: Figure S6B-C).
ML to classify SLE samples into endotypesTo identify the eight endotypes of lupus among a complete cohort of 3166 SLE patients, we concatenated the GSVA enrichment scores of 26/32 measurable features across 17 datasets and employing the k-means (k = 8) pipeline, subsetted all samples into eight endotypes labeled A-H (Figs. 2 and 3A). These results became the class labels for subsequent ML. It was essential that all subsets have a “true label” in order to validate subsequent ML algorithms, so this step was employed to assign subset designation of all patients within the universe of 3166 SLE patients examined. Each endotype was characterized by unique patterns of dysregulation of the 26 informative gene modules (Additional file 2: Figure S7).
Next, we employed GSVA scores from the five training/internal validation datasets (n = 2183, Additional file 1: Table S1, Datasets 1–5) to generate an ML model to classify patients into the final eight endotypes. Testing and external validation were carried out on gene expression profiles of 983 patient samples from 12 additional, independent datasets not used to generate the ML model (Additional file 1: Table S1, remaining datasets). The training data from the 2183 samples were further partitioned into training (80%) and validation (20%) sets and one-vs.-one multi-class classification by multiple algorithms was carried out to predict endotype memberships and carry out internal validation. Receiver operating characteristic (ROC) curves for RF, SVM, logistic regression (LR), and gradient boosting (GB) models were generated on the 983 patients from 12 test datasets not used to generate the model (external validation) and demonstrated high predictive capabilities (Fig. 3B, C, Additional file 2: Figure S3). LR had the highest precision overall with sensitivity ranging from 89 to 100% and specificity ranging from 99 to 100% for the eight endotypes. Altogether, all the classifiers worked well, and even though SVM and LR appeared to be somewhat more effective, we selected RF because it was effective in both multi-class and binary analyses and also afforded the opportunity to access many feature importance techniques. Classification of the 983 patients from the test set into the eight endotypes using the RF classifier is shown (Fig. 3D).
Because of concern that assignment of “true labels” might have biased the outcome of the external validation, we carried out an additional ML approach, the centroid-based ML validation, in which “true labels” were not assigned by k-means clustering of the GSVA scores of the entire 3166 patients. Rather, labels for the test set (983 patients, 12 datasets) were generated by centroid similarity of the GSVA scores to the training set. With this external validation ML design, we observed classification performance with high accuracy (Additional file 2: Figure S8-10). A third ML approach, the unsupervised ML validation, was also carried out, in which class labels were generated using only the initial five training/validation datasets. Five different ML models (LR, SVM, RF, GB, extreme GB, or XGB) were trained and model weights were generated. The model weights were then used to predict the class of the patients from the 12 independent testing datasets, for which we did not have true labels (Additional file 2: Figure S11A). When the output of the ML models on the 12 datasets were compared to the outputs from the five training/validation datasets using cosine similarity, it can be seen that there was a high level of sameness of the derived endotypes (cosine similarity > 0.9 for each comparison) for each of the five ML algorithms employed (Additional file 2: Figure S11B-F). This third orthogonal approach clearly shows that the ML model generates the same endotypes from unrelated datasets in an unbiased manner and validates the performance of the models.
To determine whether all 26 features were required to achieve similar performance, we analyzed the model performance when features were randomly removed. We found that all 26 features were needed to identify the eight endotypes by k-means optimally (Additional file 2: Figure S12). In addition, the RF classifier and other ML models were largely built using female patients. To ensure the model was applicable to both sexes, we applied the RF classifier from the first ML approach (Additional file 2: Figure S8-10) to a large cohort of male patients (Additional file 2: Figure S13). Indeed, we observed all endotypes in the male cohort, demonstrating these endotypes are not restricted to female patients.
Development of composite metric LuCISWith the identification of eight endotypes representing the apparent universe of lupus patients and high predictive capability of ML algorithms, we sought to reduce the information from gene expression profiles into a clinical metric, designated LuCIS, to display the range of molecular abnormalities numerically. An RPLR model was employed to calculate a LuCIS value for each patient based on his or her binarized GSVA enrichment score (Fig. 4A). These scores were then compared to the placement of each patient in our eight lupus endotypes and showed increasing LuCIS values for each endotype (Fig. 4B).
Fig. 4LuCIS summarizes the severity of molecular abnormalities in individual lupus patients
Logistic regression with ridge penalization was employed to classify endotype A (Z1), the “least abnormal,” and endotype G/H (Z6), the “most abnormal,” from GSE88884 ILL-1 & ILL-2 which produced coefficients (A) that can be used to calculate LuCIS. B The mean + SEM (top) and distribution (bottom) of LuCIS calculated for the eight endotypes in all 17 datasets using the 26 features shown in A and the imputed values for the six features not measured on all platforms (IG chains, TCRA, TCRAJ, TCRB, TCRD, Treg) as described in the Supplemental Methods. Statistical differences between mean LuCIS of the endotypes were evaluated with the Kruskal–Wallis test and Dunn’s multiple comparisons. C Comparison of mean LuCIS between non-lupus healthy controls and the least “abnormal” lupus endotype in five independent datasets for which adequate controls were available. Missing values for GSVA modules not measured (IG chains, TCRAJ, TCRB, TCRD) in GSE65391 were imputed as described in the Methods. All plots were generated, and statistics were computed, in GraphPad Prism v. 9.4.0 (673)
To contextualize LuCIS values, we calculated the scores of healthy, non-lupus controls in five datasets for which adequate control samples were available (Fig. 4C). Notably, the mean LuCIS values of the least abnormal lupus endotypes were not significantly different from those of the non-lupus controls, indicating that LuCIS identified the least abnormal endotypes’ resemblance to a normal transcriptional profile.
Use of SHAP to determine the most important features of endotypesWe determined the most important GSVA enrichment scores contributing to the endotype groupings using additional ML classification and SHAP. To accomplish this, we used ML to classify samples using one-vs.-rest multi-class classification (Additional file 2: Figure S14) and then employed SHAP to compute the contribution of each feature to each endotype. A summary of mean absolute SHAP values across patients in the eight endotypes from the extreme GB classifier revealed the top 20 features contributing to the model, with gamma/delta (gd) T cells, major histocompatibility complex II (MHCII), and IFN being the overall most impactful (Additional file 2: Figure S15). Anti-inflammation, granulocyte, and neutrophil features most distinguished endotype H, the most perturbed endotype (Fig. 3A), whereas the lack of enrichment of monocytes, anti-inflammation, and IFN most impacted the least perturbed endotype, endotype A.
To elucidate the specific features determinant of the eight endotypes, we additionally employed seven binary classifications, each comparing one of the seven more transcriptionally perturbed endotypes (B–H) to the least abnormal endotype (A). These classifiers (Additional file 2: Figure S16-22) demonstrated excellent performance with a mean positive predictive value of 0.97 for the RF classifier, but all classifiers performed well (Additional file 1: Table S4).
SHAP analysis of the RF classifiers was then employed to delineate the features that characterized individual endotypes from those of the most normal lupus endotype (endotype A) (Fig. 5). Data are shown as the features that distinguish endotypes. In addition, the impact of each feature at the individual patient levels is shown as SHAP waterfall plots (Additional file 2: Figures S16-22). Different patterns of features distinguished most of the endotypes (Fig. 5). For example, the endotype with the most immunologic abnormalities (H) demonstrated high contributions from the monocyte, neutrophil, TNF, and IFN signatures (Fig. 5, Additional file 2: Figure S22), whereas T cell signatures were most contributory to endotype B (Fig. 5, Additional file 2: Figure S16). SHAP dependence plots detailed the various effects of the individual features on predictions made by the model with, for example, granulocytes having an impact throughout the range of granulocyte GSVA scores, gd T cells at the transition of GSVA scores from positive to negative, and LDG at the extremes of GSVA scores for distinguishment of endotype B (Additional file 2: Figure S16E). Using the Gini Index, another feature importance metric employed with the RF classifier, we were able to confirm these analyses (Additional file 2: Figure S23).
Fig. 5SHAP analysis reveals features most distinctive of transcriptional perturbations in the seven abnormal lupus endotypes
SHAP analysis of the seven binary RF classifiers distinguishing the seven out of eight most transcriptionally abnormal lupus endotypes (B–H) from the eighth least abnormal endotype (A) reveals the features most contributory to the ML model’s classification capacity. Mean SHAP values were calculated using only the samples in each more severe endotype (B–H) that were positive. The size of the data points enumerates the positive mean SHAP value of each feature listed on the y-axis. Bubble plot rendered in R using the ggplot2 package
Clinical data do not identify molecular endotypesTo test whether clinical characteristics alone could identify the endotypes, we employed our clustering pipeline on ILL-2 lupus patient clinical metadata. With k-means, six subsets based on clinical features alone were identified (Additional file 2: Figure S24A). Another six subsets were also identified by separate employment of a variational autoencoder to determine whether a deep learning algorithm would alternatively be able to identify the endotypes (Additional file 2: Figure S24B). The clinical k-means subsets were largely dictated by ancestry, whereas the clinical autoencoder subsets were ancestrally heterogeneous. Notably, by Adjusted Rand Index, the clinically determined subsets were significantly different from the molecular endotypes (Additional file 2: Figure S24C-D) and from each other (Additional file 2: Figure S24E). To corroborate this finding, we employed ML classifiers using the same clinical data as features to determine whether they could predict molecular endotype memberships (Additional file 2: Figure S25). Performance was suboptimal, with a mean RF classifier precision of 32%, further indicating that clinical characteristics are insufficient features to identify the molecular endotypes.
Clinical characterization of SLE endotypesNext, we sought to determine whether endotype membership was associated with various clinical features of SLE. For clarity, subsets/endotypes identified in Fig. 1A were re-assigned to a letter classification (A-H) based on cosine similarity to the final eight endotypes (Fig. 6, Additional file 2: Figure S26). Despite all patients having SLEDAI ≥ 6 in GSE88884 ILL-1 & ILL-2, analysis of the associated metadata revealed significant differences among endotypes with respect to SLEDAI, autoantibody titers, lymphopenia, and serum complement levels. Subset A (Z1, the least abnormal) had the lowest SLEDAI, lowest autoantibody titers, and highest complement levels, whereas endotypes E (Z3), D/G (Z4), F/H (Z5), and G/H (Z6) manifested more abnormal clinical characteristics (Fig. 6A, Additional file 2: Figure S27). By these systemic measures, endotype A (Z1) exhibited the lowest disease activity. On the other hand, E (Z3) exhibited the most disease activity, followed closely by F/G (Z6). Endotypes associated with more disease activity were characterized by various combinations of enrichment of features for plasma cells, myeloid cells, neutrophils, inflammatory cytokines, and lymphopenia (Fig. 1A).
Fig. 6Clinical characterization of SLE endotypes
Clinical metadata were summarized for each cluster from GSE88884 (ILL-1 & ILL-2) using baseline values. Metadata was categorized by A quantitative immunologic/inflammatory and systemic disease indicators, B incidence of subsequent flares over 52 weeks in placebo patients receiving SoC medication (n = 550), C patient ancestry, and D medication use. Labels on x-axes indicate the shorthand name for the endotypes. Clusters were relabeled as one of the eight endotypes using cosine similarity. Scatterplots in A display the mean ± SD for each endotype; statistical differences were found with Dunn’s multiple comparisons test. Significant associations between categorical variables and endotypes in B–D (denoted with asterisks in titles) were identified using chi-square test of independence. In B–D, odds ratios of endotype A (Z1) having a positive value for the clinical trait of interest as compared to the other cohorts combined are displayed above the respective bar with significance indicated by asterisks. Missing data (n.d.) were excluded from analyses. All plots were generated, and statistics were computed, in GraphPad Prism v. 9.4.0 (673). MMF = mycophenolate mofetil. MTX = methotrexate. AZA = azathioprine. n.d. = no data. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001
Of note, the patients in A (Z1) receiving standard of care (SoC) medication had a lower frequency of severe flares over the subsequent 52 weeks as compared to the patients in other subsets (OR = 0.116, p = 0.00041, Fig. 6B, Additional file 1: Table S5). Significant relationships between endotype membership and ancestry (African, European, or Hispanic) (Fig. 6C) and medication use (oral steroids, azathioprine, or methotrexate) were also identified (Fig. 6D). Patients of each ancestry were noted in each subset, although AA lupus patients were most enriched in E (Z3) and least enriched in A (Z1) (Fig. 6C). More subtle but significant differences were found among endotypes for vasculitis, alopecia, leukopenia, arthritis, mucosal ulcers, accompanying organ systems, and the overall number of SLEDAI domains involved (Additional file 2: Figures S28-29).
Because extensive clinical metadata accompanied the pediatric lupus patients (GSE65391), we examined the relationships between endotype and clinical features and found significant differences among endotypes with regard to SLEDAI, complement levels, and ESR, with endotype E (X3) exhibiting the most disease activity (Additional file 2: Figure S30). Although mean anti-dsDNA levels varied among endotypes and were highest in E (X3), intra-group variation was too high to detect significant intergroup differences. Pediatric endotypes also differed by occurrence of lymphopenia (p < 0.05), oral steroid use (p < 0.01), and HCQ use (p < 0.01), but, interestingly, not ancestry (p > 0.05, Additional file 2: Figure S30C-E). However, the endotype with the highest SLEDAI, E (X3), contained the highest proportions of AA and Native American Ancestry (NAA/Hispanic) patients and patients with proliferative nephritis (Additional file 2: Figure S30F).
Relationship between endotypes and ancestry and SoC therapiesBecause we previously found that ancestry and SoC therapies, such as mycophenolate, can significantly influence gene expression [13], and there were variations in subset membership based on ancestry and medication use (Fig. 6C-D), we examined these relationships in greater detail. When only EA patients from GSE88884 were clustered (n = 1118), the same six endotypes as observed in the full GSE88884 ILL-1 & ILL-2 cohort were represented (Additional file 2: Figure S31). Similar results were seen in the AA population (n = 216, Additional file 2: Figure S32), although the percentage of patients in the least abnormal subset was diminished (R1, chi-square, p = 0.03) and a plasma cell-enriched subset was more prominent (R3, chi-square, p = 0.02). When only NAA/Hispanic patients were clustered (n = 232), five out of six endotypes from the full cohort were identified (Additional file 2: Figure S33). Thus, endotype distribution varied based on ancestry, even though most endotypes were observed in each ancestry group.
We repeated these analyses among patients stratified by immunosuppressive agents at baseline and compared distributions of patients to the endotypes in the full prototypic cohort (all active female) to identify which endotypes were maintained (cosine similarity > 0.7, Additional file 2: Figure S34). Treatment with mycophenolate or methotrexate in combination with steroids appeared to deplete the most perturbed endotype, G/H (Z6), whereas this endotype was expanded in groups treated without SoC SLE therapies, without immunosuppressive agents, and without steroids alone. Treatment with steroids with methotrexate also appeared to deplete endotype E (Z3), which exhibited the highest disease activity. The distribution of patients in the least perturbed endotype, A (Z1), also increased with treatment by steroids and immunosuppressive agents. Altogether, these analyses illustrate that some endotypes cannot be found/are reduced (by cosine similarity) in patients treated with immunosuppressive drugs, and implies that therapy can suppress the appearance of specific endotypes. In addition, therapy may contribute to the proportion of patients in each endotype.
Utility of molecular endotyping in determining patients with likelihood of therapeutic responseTo explore the clinical utility of molecular endotyping in greater detail, we applied the k-means clustering pipeline to the patients belonging to the successful ILL trial, GSE88884 ILL-2 [39], and identified six endotypes similar to those found in the combined trial datasets (Fig. 7A, Additional file 2: Figure S35). Endotypes were re-assigned to A-H based on cosine similarity (Fig. 7B). We examined clinical response to the investigational product, tabalumab, by two metrics: SRI-5, used in the trial, and the more standard SRI-4 [39], among these endotypes (Fig. 7C, D). We identified three responsive groups by SRI-5 (B [V2], F/H [V5], and G [V6]) and one responsive group by SRI-4 (B [V2]), each with a response effect size > 20%. Of note, the endotype with the least immunologic activity (A [V1]) was not responsive by either metric. Notably, although A (V1) manifested a high placebo SRI-5 response rate, the placebo response rates between A (V1) and D/G (V4) were not significantly different (p > 0.05). Clinically, the responsive endotypes were characterized by lymphopenia at baseline (chi-square, p < 0.0001, OR = 3.2) and were comprised of more patients taking azathioprine (chi-square, p < 0.01, OR = 2.0), more NAA/Hispanic patients (chi-square, p < 0.0001, OR = 2.6), and fewer EA patients (chi-square, p < 0.05, OR = 0.72) (Additional file 2: Figure S36, Additional file 1: Table S6). The responsive endotypes also trended toward having an increased likelihood to experience a severe flare over the subsequent 52 weeks on SoC therapy (chi-square, p = 0.06, OR = 2.0).
Fig. 7Endotyping to stratify patients who are more likely to respond to treatment
Gene expression profiles (A) of k-means clustering (k = 6) of 807 active female lupus patients from GSE88884 ILLUMINATE-2. B Cosine similarity comparing the eight global SLE endotypes to the molecular endotypes in ILL-2. Clinical responses by C SRI-4 and D SRI-5 per endotype determined by gene expression data and 32 features. Responses among the treatment groups in C and D were ascertained by the chi-square test for trend (all three doses) or chi-square test for independence. Chi-square tests were performed for each endotype individually and summarized on the same plot. Significant results of the chi-square test for trend are denoted in the x-axis label of the endotype. Clusters were relabeled as one of the eight global endotypes using cosine similarity. Q2W and Q4W indicate frequency of drug administration was every 2 weeks and 4 weeks, respectively. Heatmap in A was generated with GraphPad Prism v. 9.4.0 (673). Cosine similarity plot in B was generated with the plot.matrix R package and edited in Adobe Illustrator. Histograms in C, D were generated with GraphPad Prism v. 9.4.0 (673). *p < 0.05
Utility of LuCIS in determining likelihood of flare and therapeutic responseFinally, we also determined whether there was a relationship between LuCIS values and clinical features. First, we examined the correlation between LuCIS and anti-dsDNA titer, SLEDAI, serum C3, and serum C4 (Fig. 8A–D) in GSE88884 for which full clinical data were available. Positive correlations were identified (p < 0.0001) between LuCIS and anti-dsDNA titer or SLEDAI, whereas negative correlations were identified (p < 0.0001) between LuCIS and C3 or C4; however, coefficients were modest, suggesting that LuCIS is related to these metrics but may provide additional information not captured by SLEDAI or serology that is reflective of immunologic activity. Next, we used groupings defined by LuCIS values to predict likelihood of flare or response to active treatment in the ILL-2 trial. Patients were assigned to sextiles based on their LuCIS value, to mirror the number of groups identified by the clustering pipeline. LuCIS-defined subsets were associated with a likelihood of severe flare (Fig. 8E) and response to the investigational product in post hoc analysis of the ILL-2 trial (Fig. 8F, G).
Fig. 8The relationship between LuCIS value and clinical variables, flares or clinical response
Pearson correlations were computed between LuCIS values and A anti-dsDNA titers, B SLEDAI, C serum C3, and D serum C4 using LuCIS values of all the patients from GSE88884 ILL-1 & ILL-2 (n = 1620). LuCIS values were divided into sextiles for GSE88884 ILL-1 & ILL-2 and E incidence of subsequent flares over 52 weeks in placebo patients receiving SoC medication (n = 550) was examined. LuCIS values were also divided into sextiles for GSE88884 ILL-2, and F SRI-4 response and G SRI-5 response were examined. Significant associations between incidence of flare among the sextiles (denoted with asterisks in the title) were identified using the chi-square test for trend. Differences in incidence of flare between sextiles were determined by chi-square test of independence and denoted below the figure. Similarly, responses among the treatment groups in each LuCIS sextile independently were determined by the chi-square test for trend (all three doses) or chi-square test for independence. Significant results of the chi-square test for trend are denoted in the plot title. All plots were generated, and statistics were computed, in GraphPad Prism v. 9.5.0 (730). *p < 0.05; **p < 0.01; ***p < 0.001
Comments (0)