Cuproptosis-related gene subtypes predict prognosis in patients with head and neck squamous cell carcinoma

Expression patterns of the cuproptosis genes

We identified a set of 19 putative CRGs (with different expression patterns. As shown in Fig. 1A, the expression of 15 (GLS, FDX1, LIPT1, ATP7B, SLC31A1, LIAS, DLAT, PDHA1, NFE2L2, PDHB, DLD, MTF1, CDKN2A, ATP7A, and DBT) of the 19 genes were significantly different between HNSCC and normal samples (P < 0.05). Of these, 2 genes (CDKN2A, GLS) were up-regulated, and the other 13 genes were down-regulated in HNSCC samples. To explore the association between different genes associated with cuproptosis, we describe correlation patterns between the 19 genes (Fig. 1B). The information is shown in Additional file 1: Appendix 1.

Fig. 1figure 1

A Boxplot illustrating 19 Cuprotosis-related genes expression in HNSCC samples and associated normal controls. * indicates a p value of less than 0.05; ** indicates a p value of less than 0.01. B The heatmaps showing the expression levels of the 19 Cuprotosis-related genes. X stands for no biological significance

Identification of cuproptosis subtypes in HNSCC

For the 19 CRGs identified in Step Expression patterns of the cuproptosis genes, an unsupervised cluster analysis was performed on HNSCC samples (Fig. 2A), setting the K value range from 2 to 6, the optimal K = 2 is selected. As shown in Fig. 2B, two different subtypes (C1 and C2) were obtained and contained 232 and 282 HNSCC samples, respectively (Additional file 1: Appendix 2). Proportion of ambiguous clustering (PAC) verification curve as shown in Fig. 2C further verifies the stability of the clustering results. The minimum value of PAC, namely with an optimal K, is 2.

Fig. 2figure 2

A Consensus clustering cumulative distribution function (CDF) for k = 2–6. B Consensus clustering matrix for k = 2. C PAC validation curve

Comparison of cuproptosis scores

For the 19 cuproptosis genes, the GSVA algorithm was used to calculate the enrichment score in each HNSCC sample to represent the cuproptosis score for each sample, as shown in Figure 3. The cuproptosis score was significantly different between the two subtypes (Additional file 1: Appendix 3), with the cuproptosis score lower in cluster C1. Therefore, the stratification of the cuproptosis subtypes was further confirmed.

Fig. 3figure 3

A Comparison of cuproptosis scores between different subtypes. B KM survival curves for cuproptosis scores

Survival analysis and clinical correlation of subtypes

Kaplan-Meier curve analysis was used to assess the correlation between survival and prognosis of different subtypes (Figure 4A). The survival prognosis information for different subtypes was significantly different, and cluster C2 was associated with a poor prognosis. The distribution of the expression of the 19 cuproptosis genes evaluated in each subtype was plotted on a heatmap (Figure 4B). For all HNSCC samples, clinical information of the samples was classified and the correlation analysis between the subtypes and clinical characteristics of the samples was performed (Additional file 1: Appendix 4). As shown by the Chi-square test results in Figure 4C, there were significant differences in the TMN staging (Clinical_N, Clinical_T, and Clinical_stage) between the two subtypes.

Fig. 4figure 4

A KM survival curves of different subtypes. B Heat map of the 50 most differentially expressed genes. Blue represents low expression; red represents high expression. C The clinical features between the two subtypes of patients

Comparison of the immune microenvironment between cuproptosis subtypes

Based on the expression profile data of the HNSCC samples, the CIBERSORT algorithm was used to calculate the immune cell types of each sample and the proportion of each of the 22 immune cell types was obtained. The information is shown in Additional file 1: Appendix 5. Differences in the proportion of various immune cells were compared between different subtypes. Based on P < 0.05 as the threshold, 12 distinct immune cell types (DICs) were identified, and Fig. 5A shows the comparison results between groups. Using the Estimate algorithm, immune and matrix scores were calculated, as shown in Additional file 1: Appendix 6. The differences in immune and stromal scores between different subtypes were then analyzed, as shown in Fig. 5B. Cluster C1 was found to have higher percentages of CD8+ T cells, follicular helper T cells, T-regs, monocytes, and mast cells resting, and lower percentages of activated mast cells, macrophages, CD4+ T cells, and plasma cells than cluster C2.

Fig. 5figure 5

A Immune cell types with significant differences between the two subtypes using CIBERSORT. B The violin plots of different infiltration levels of immune cells with immune scores and stromal scores in different subtypes.* indicates a p value of less than 0.05; ** indicates a p value of less than 0.01;*** P value of less than 0.001

Analysis of immune checkpoints and HLA family genes between subtypes

The expression of immune checkpoint genes extracted from TCGA dataset is shown in Additional file 1: Appendix 7. Differences in immune checkpoint gene expression between different subtypes were compared, with P < 0.05 as the threshold and a total of 11 immune checkpoint genes (CD278, LAG3, BTLA, PD-1, 4-1BB, CTLA-4, CD47, TIM3, PD-L1, TIGIT, and OX40) with significant differences were obtained; the comparative analysis between subtypes is shown in Fig. 6A. The expression of the HLA family gene was compared between different subtypes using the Wilcoxon test. The comparisons between subtypes are shown in Fig. 6B and the list is provided in Additional file 1: Appendix 8.

Fig. 6figure 6

A Differential expression of the immune checkpoint genes in different subtypes. B The expression difference of HLA family between different subtypes. * indicates a p value of less than 0.05; ** indicates a p value of less than 0.01; *** P value of less than 0.001. C Volcano plot for differential gene expression.Blue represents low expression; red represents high expression. D Heat map of differential gene expression

The expression of 19 HLA family genes (HLA-A, HLA-B, HLA-C, HLA-DMA, HLA-DMB, HLA-DOA, HLA-DPA1, HLA-DPB1, HLA-DPB2, HLA-DQA1, HLA-DQA2, HLA-DQB1, HLA-DQB2, HLA-DRA, HLA-DRB1, HLA-DRB5, HLA-DRB6, HLA-E, and HLA-F) was significantly different between clusters C1 and C2, and all genes were upregulated in cluster C1.

Identification of differentially expressed genes between subtypes

Using the method described in Step “Comparison of immune checkpoint genes and HLA family gene differences between subtypes”, a differential gene analysis was performed between the two subtypes (Fig. 6C, Additional file 1: Appendix 9), and 101 differentially expressed genes were obtained. Among these, the expression of 69 genes such as GCSH, B4GALNT1 was up-regulated and that of 32 genes such as PTPRCAP, ZNF683 was down-regulated.

Enrichment analysis of functional pathways

GO function and KEGG signal pathway enrichment analyses were performed for the DEGs obtained in Step “Identification of differentially expressed genes between subtypes” to explore the functional terms involving the key genes. Enrichment results are shown in Fig. 7, which lists the Top10 results having a P-value < 0.05 and FC > 2 as thresholds. A total of 146 significantly correlated DEGs enriched in 9 cell components, 26 biological processes involved in 15 KEGG signaling pathways, and 17 molecular functions were selected. The information is shown in Additional file 1: Appendix 10. Among the biological processes, the DEGs were mainly enriched in cell–cell signaling, epithelium development, and biological adhesion. Regarding the cellular components, the DEGs were significantly enriched in the collagen-containing extracellular matrix and the external encapsulating structure. For molecular functions, DEGs were significantly enriched in signaling receptor binding. Figure 7 shows all the detailed results of the GO term enrichment analysis. In addition, 5 DEGs were significantly enriched in chemical carcinogenesis pathways.

Fig. 7figure 7

GO functional and KEGG pathway enrichment of differentially expressed genes. AC Bubble chart, the horizontal axis represents the proportion of those genes, the vertical axis represents the pathway name, the size of the dot indicates the number of genes expressed in the pathway, and the color of the dot corresponds to the different Qvalue range. D Chord diagram, the outer circle shows differentially expressed genes involved in the pathway

Identification of genes significantly associated with prognosis

Among the DEGs identified in Sect. "Identification of differentially expressed genes between subtypes" above, a total of 42 genes, including CHGB, GRB14, PTPRCAP, with significant prognostic correlation were selected by univariate Cox regression analysis using the R3.6.1 language survival package (v2.41-1) with P < 0.05 as the threshold (Fig. 8), and the gene list is provided in Additional file 1: Appendix 11.

Fig. 8figure 8

Univariate regression forest plot of the 42 prognostic genes

Construction of the prognostic model and performance verification

Based on the expression data of 42 prognostic genes identified in 3.9 above, the LASSO algorithm was used to select optimized genes (Figs. 9) and 11 key genes were obtained. Stepwise COX regression algorithm was used to select the optimal gene combinations (Fig. 9). Finally, 8 model genes (CHGB, GRB14, SRPX, PTPRCAP, ZFR2, ZNF556, GZMH, and C1orf186) were obtained. The Risk Score model was then constructed based on the regression coefficients of these 8 genes and their expression levels in TCGA training data set. Of these, CHGB, GRB14, SRPX, and CCNA1 were positively correlated with the Risk Score. The Risk Score of each patient was calculated and the samples in TCGA training set and the GEO verification set were divided into High-risk (Risk Score higher than the median value of the Risk Score) and Low-risk (Risk Score equal to less than the median value of the Risk Score) groups. The distribution of the Risk Score value and the survival time distribution for each group is shown in Fig. 10A and D. The Kaplan–Meier curves were then used to assess the association between the High- and Low-risk groups and the actual prognoses of the patients. The Kaplan–Meier curves for each data set are shown in Fig. 10B and E, respectively. ROC curves of 1-, 3-, and 5 year survival based on genetic prognostic characteristics are shown in Fig. 10C and F. In TCGA training set and the GEO validation set, the different risk groups based on the prediction of the Risk Score model were significantly correlated with the actual prognosis; the high-risk group had a worse prognosis. The scores and grouping of the Risk Score in the training and validation sets are shown in Additional file 1: Appendix 12 and 13, respectively.

Fig. 9figure 9

A LASSO coefficient profiles of the 42 prognostic genes. B Partial likelihood deviance for the LASSO coefficient profiles. Two vertical dotted lines indicate lambda. min(the red line on the left) and lambda.1se (The black line on the right). C Forest plot of multivariate Cox regression analysis of 8 model genes

Fig. 10figure 10

A The risk score distribution, and patients survival status in TCGA training set. B KM survival curves based on riskscore prediction model. C ROC curves of the training set for 1-year, 3-year and 5-year survival. D The risk score distribution, and patients survival status in GEO validation set. E KM survival curves based on riskscore prediction model. F ROC curves of the validation set for 1-year, 3-year and 5-year survival

Prognostic independence analysis

The clinical information of all TCGA head and neck cancer samples was extracted (Additional file 1: Appendix 14). Univariate Cox regression analysis was performed on the clinical factors and for each risk group using the R3.6.1 language survival package. Factors with P < 0.05 were selected for multivariate Cox regression to identify significant independent prognostic factors (Fig. 11). Multivariate Cox regression analysis confirmed that the risk group was an independent prognostic factor after adjustment for other clinicopathological factors.

Fig. 11figure 11

Forest plots of univariate (A) and multivariate cox regression analysis (B) of clinical information

Development of a nomogram to predict survival based on independent prognostic factors

To further analyze the correlation between age and risk group, which were significantly correlated with prognosis and survival prognosis, age and risk group were included in the construction of a Nomogram survival model, as shown in Fig. 12A. Integrating various clinical indicators into the "Total points" axis in the first row predicted the survival of the samples. The consistency between the 1-, 3-, and 5-year survival rates predicted by the survival model and the actual 1-, 3-, and 5-year survival rates was analyzed and verified (Fig. 12B). Figure 12C shows that the nomogram is significantly associated with the patient's prognosis. The 1-, 3-, and 5-year ROC curves of the nomogram of the nomogram are shown in Fig. 12D.

Fig. 12figure 12

A The nomogram to predict overall survival based on independent prognostic factors. B Calibration plot for nomogram predicted and actual survival rate. Vertical axis represents actual survival, and horizontal axis represents the nomogram-predicted survival. C KM survival curves based on nomogram prediction model. D ROC curve of nomogram model predicting1-year, 3-year and 5-year survival

Analysis of drug sensitivity

The sensitivity of each patient to chemotherapeutic agents was estimated based on the cancer drug sensitivity genomics database, and the IC50 was quantified using the pRRophetic package in R. We compared the differences in the IC50 levels of 138 chemotherapeutic agents (Additional file 1: Appendix 15) and the six common chemotherapeutic agents identified are shown in Fig. 13.

Fig. 13figure 13

IC50 values of six chemotherapeutic drugs in different risk groups

Prediction of the efficacy of immunotherapy

Using the TIDE algorithm, a TIDE score was calculated for each patient with HNSCC to predict his response to immune checkpoint therapy (Additional file 1: Appendix 16). As shown in Fig. 14A, subjects in the high-risk group showed higher TIDE scores than those in the low-risk group.

Fig. 14figure 14

A Comparison of the TIDE score in different risk groups. B Hallmark gene sets with a significant enrichment. C Distribution between Cluster and RiskGroup

GSEA enrichment analysis

GSEA analysis revealed that there were 9 sets of hallmark genes with significant differences in the different risk groups with a value of P < 0.05 and the normal enrichment score > 1 as thresholds (Fig. 14B and Additional file 1: Appendix 17). The gene sets of the high-risk group samples were enriched in the epithelial-mesenchymal transition and the Hedgehog signaling pathways.

Association between high- and low-risk groups and cuproptosis subtypes

The relationship between cuproptosis subtype (Cluster) grouping and high and low risk groups (risk group) was evaluated (Fig. 14C). Through a one-to-one correspondence of the samples, the Sankey diagram was drawn as shown in Fig. 14D and in Additional file 1: Appendix 18. The high-risk groups had a significantly higher proportion of genes belonging to the C2 subtype.

留言 (0)

沒有登入
gif