Leveraging a disulfidptosis-related signature to predict the prognosis and immunotherapy effectiveness of cutaneous melanoma based on machine learning

Data resources

Based on the DRG, this study constructed a signature in CM and downloaded multiple CM cohorts from public databases, as well as immunotherapy cohorts from other cancers. The CM data for this study were obtained from the TCGA database, GTEx (The Genotype-Tissue Expression, https://gtexportal.org/home/) database, GEO (Gene Expression Omnibus, https://www.ncbi.nlm.nih.gov/geo/) database, and TIGER (Tumor Immunotherapy Gene Expression Resource, http://tiger.canceromics.org/#/home) database.

We obtained RNA sequencing (RNA-seq) data, as well as survival data, gender, age, clinical stage, and single nucleotide polymorphism (SNP) data of 473 SKCM patients from the TCGA database. In the GTEx database, we selected RNA-seq data from normal skin tissues. In the GEO database, GSE65904 includes RNA-seq data, survival data, gender, age, and other information of 214 CM patients. GSE54467 includes RNA-seq data, survival data, gender, age, clinical stage, and other information of 79 CM patients. GSE46517 includes RNA-seq data from 104 CM samples and 17 normal samples (including normal skin tissues, nevus tissues, and epithelial melanocytes). In the TIGER database, Melanoma-PRJEB23709 includes RNA-seq data, survival data, immune therapy response data, gender, age, and other information of 91 CM patients. STAD-PRJEB25780 includes RNA-seq data, survival data, immune therapy response data, and other information of 78 gastric cancer patients. In addition, we identified 18 disulfidptosis-related genes from the literature (Liu et al. 2020, 2023) on disulfidptosis.

Genetic roles and expression analysis of DRGs in CM

For the 18 identified DRGs obtained from the literature(Liu et al. 2020, 2023), we analyzed their roles in CM, including protein–protein interaction (PPI) and enrichment analysis, tumor somatic mutations and copy number variations, as well as differential expression in CM and normal skin tissues.

First, we used the online tool GeneMANIA (https://genemania.org/) for PPI network analysis and data visualization of the 18 DRGs, and performed gene enrichment analysis. Additionally, SNP data were obtained from TCGA (Blum and Wang 2018). The R package “maftools” (Mayakonda et al. 2018) was used to process and analyze the SNP data, counting the occurrences of missense mutations, nonsense mutations, silent mutations, and frameshift/inframe insertions and deletions. Moreover, Tumor Mutational Burden (TMB) was calculated using tumor somatic mutation data. Finally, we selected samples with DRG-associated somatic mutations or copy number variations in CM for analysis and created a waterfall plot.

Lastly, we validated the expression differences of the DRGs in tumor and normal tissues through TCGA-SKCM and GTEx data, as well as the GSE46517 dataset.

Identifying novel DRGs via unsupervised clustering

We cleaned the missing values in the RNA-seq data and clinical data. Based on the expression of 18 DRGs, we performed cluster analysis on the gene expression data of TCGA-SKCM, GSE65904, and GSE54467 using the R package “ConsensusClusterPlus”. First, we read the expression data file and survival data file. Then, we removed normal samples and preprocessed the data according to the selected grouping criteria, including operations such as transposition, mean subtraction, and logarithmic transformation. These steps ensure the uniformity and comparability of the data. Next, we merged the expression data with the survival data, which helps establish a joint analysis model based on expression data and patient survival time. Then, we conducted univariate COX regression analysis. COX regression models were applied to each gene, and their impact on patient survival was evaluated by calculating p-values, resulting in the selection of significant genes. We then proceeded with cluster analysis. Initially, the maximum number of clusters (maxK) was set to 10, and 50 clustering operations were performed to ensure the stability of the results. We adopted a widely used clustering algorithm (pam) and used Euclidean distance as the clustering metric. The optimal value of K, which provided the highest within-group correlation and lowest between-group correlation, was determined through analysis. Subsequently, the R packages “survival” and “survminer” were used for Kaplan–Meier (KM) analysis of clusters associated with disulfidptosis to compare differences in overall survival (OS). Additionally, we analyzed differences in clinical data and immune cell infiltration levels associated with disulfidptosis clusters. Clinical data included age, gender, staging, T-stage, N-stage, and M-stage, while immune cell infiltration abundance data were obtained through the ssGSEA algorithm, TIMER database, CIBERSORT algorithm, and ESTIMATE algorithm. Differential analysis using the R package “limma” was then performed to identify genes with significant differences between the two groups. The analysis results generated new DRGs.

Construction and validation of the disulfidptosis-related signature

Based on survival data, a “coxph” function was used to conduct Cox regression analysis on DRGs to identify DRGs significantly associated with survival (P < 0.05). Five machine learning algorithms, including Decision Tree, LASSO, Random Forest, GBDT, and XGBoost, were used to build models and calculate the importance of each DRG in relation to survival. Finally, the results of each model were combined for data fusion to calculate the comprehensive weight of survival correlation.

Subsequently, two R packages, “glmnet” and “survival,” were used to construct the lasso cox model. Firstly, the “cv.glmnet” function was used for ten-fold cross-validation to select the optimal penalty coefficient (λ) for model fitting and further analysis and prediction. Then, the “coef” function was used to extract DRGs associated with non-zero coefficients, and the sum of the product of the coefficients and expression levels of these DRGs represented the risk score. The median risk score of all samples in the training set was used as the cutoff value to divide all samples into high-risk and low-risk groups. KM analysis was performed on the high-risk and low-risk groups, and the R package “timeROC” was used for receiver operating characteristic (ROC) curve analysis to calculate the corresponding area under the curve (AUC) and evaluate the accuracy of the model. Subsequently, external validation of the model was conducted using data from the GEO database and the TIGER cohort.

In addition, to evaluate the relationship between the risk score and clinical data, we analyzed the differences in T stage, N stage, gender, and age between the high-risk and low-risk groups. Furthermore, to better utilize the prognostic model, we constructed a positive determinative graph by integrating the risk score and clinical data.

Evaluation of the tumor microenvironment in molecular subtypes

The tumor immune microenvironment (TME) plays a crucial role in the occurrence, development, and treatment of cancer. In order to examine the differences in TME between risk subtypes, we obtained evaluations of tumor immune cell infiltration levels from various algorithms based on RNA-seq data from TIMER2.0 (http://timer.cistrome.org/), including TIMER, CIBERSORT, CIBERSORT-ABS, QUANTISEQ, MCPCOUNTER, XCELL, EPIC. We analyzed the differences between the high and low-risk groups. Additionally, the ssGSEA evaluated the infiltration levels of 16 types of immune cells (aDCs, B cells, CD8 + T cells, DCs, iDCs, Macrophages, Mast cells, Neutrophils, NK cells, pDCs, T helper cells, Tfh, Th1 cells, Th2 cells, TIL, Treg) using the R package “GSVA.” We obtained the expression levels of immune checkpoint markers investigated in previous studies and analyzed whether there were differences between the high and low-risk groups. Furthermore, we used the ESTIMATE algorithm to assess the relationship between tumor purity and risk scores, including ESTIMATE Score, Immune Score, and Stromal Score. We also analyzed the immune scores obtained from the TIDE (Tumor Immune Dysfunction and Exclusion, http://tide.dfci.harvard.edu/) online tool.

Functional enrichment analysis among molecular subtypes

GSEA and GSVA are both methods used to analyze the enrichment of gene sets in gene expression data. In this study, these two methods were used to analyze the pathways and functions associated with the risk model. The KEGG enrichment analysis was conducted using the GSEA software (version 4.2.3). Additionally, we analyzed the correlation between risk scoring and pathways related to disulfidptosis as well as tumor-related pathways.

The disulfidptosis-related pathways include: functions related to cytoskeletal organization, pentose phosphate pathway, lipid homeostasis, glutathione metabolism, tricarboxylic acid cycle, P53 signaling pathway, glutathione peroxidase activity, Nrf2 signaling pathway, cell response to glucose starvation, calcium binding involved in cell adhesion, cell adhesion molecule binding, and calcium binding. Important signaling pathways in tumors include Hippo, Wnt, MAPK, PI3K/AKT, TGF-β, NF-kB, Notch, AMPK, JAK-STAT, PD-1/PD-L1, mTOR, Ras, TNF, HIF-1, and ErbB.

Antibodies and reagents

Five monoclonal antibodies anti-HLA-DQA1 antibody (1:10,000 dilution for WB, 1:200 dilution for IHC, ab128959), anti-GZMA antibody (1:1000 dilution, ab209205), anti-CD79A antibody (1:10,000 dilution, ab79414), anti-LTB antibody (1:1000 dilution, ab89568) and anti-HE5 antibody (1:1000 dilution, ab125071) were purchased from Abcam (Cambridge, UK). Monoclonal antibody anti-β-actin (1:20,000 dilution, 66009-1-Ig) was purchased from Proteintech (Wuhan, China).

Cell culture and transfection

The human immortalized keratinocyte HaCaT, the human malignant melanoma cell line A-375, A-875 and SK-MEL-1 was obtained from Haixing Biosciences, Suzhou, Jiangsu, China. Cells were cultured in Dulbecco's Modified Eagle’s Medium (Gibco, Invitrogen, Paisley, UK) supplemented with10% fetal bovine serum and 1% streptomycin-penicillin (Gibco, Invitrogen, Paisley, UK) at 37 °C in a humified 5% CO2 incubator. Lipofactamine3000 (Thermo Fisher Scientific, Waltham, MA, USA) was used to transfect cells with siRNAs (RiboBio, Guangzhou, China following the manufacturer’s guidelines. 5′-GATGGAGATGAGCAGTTCT′ (st-h-HLA-DQA1_001), 5′-CTTGTGGTGTAAACTTGTA-3′ st-h-HLA-DQA1_002), and 5′-CAACTTGAACATCATGATT-3′ (st-h-HLA-DQA1_003) were the three target sequences for siRNA for HLA-DQA1.

Real-time quantitative PCR assay

Total RNA was extracted from HaCaT and the three the human malignant melanoma cell lines, using NucleoZOL reagent (740404.200, Meilunbio). Total RNA was then reverse transcribed into cDNA using Evo M-MLV RT Kit with gDNA Clean for qPCR (AG11705, AG, Hunan, China). Real-time quantitative PCR (RT-qPCR) was performed using the SYBR Green Premix Pro Taq HS qPCR Kit IV (AG11746, AG, Hunan, China). Relative quantification was determined using the 2(−ΔΔCt) method.

Western blotting analysis

The cell precipitate was lysed with RIPA (C5029, Bioss) to extract the total protein. The protein concentration was measured using the BCA method. Around 25 μg of total protein underwent gel electrophoresis and was transferred to a PVDF membrane (Millipore, Kenilworth, NJ, USA). Afterwards, the membrane was blocked with Protein-Free Rapid Blocking Buffer (Yamei, Shanghai, China) at room temperature for 15 min. The membrane was then incubated overnight at 4 °C with the primary antibody. The next day, the membrane was washed three times with TBST for 5 min each, and then incubated with the secondary antibody and membrane at room temperature for 1 h. Detection was carried out using the Immobilon Western Chemilum HRP Substrate Kit (WBKL S0500, Millipore, Kenilworth, NJ, USA). Specific areas of the gel were scanned to acquire images, and Adobe Photoshop software (CS4, Adobe Systems, USA) and ImageLab software (Bio-Rad, USA) were used for quantification.

Immunohistochemistry analysis

The tissue microarray chip (HMelC112CD01, Gongxie, Shandong, China) in the CM group underwent immunohistochemistry (IHC) detection of anti-HLA-DQA1 antibody following the standard labeled streptavidin biotin (LSAB) protocol (Dako, Carpinteria, CA). The immune reactive score (IRS) for each sample was calculated by multiplying the positive cell percentage score (0: no positive cells, 1: < 10%, 2: 10–50%, 3: 51–80%, 4: > 80%) with the staining intensity score (0: no color reaction, 1: mild reaction, 2: moderate reaction, 3: strong reaction). Immunohistochemical staining of tissue chips was performed using a two-step method. When the HLA-DQA1 score in cancer tissue was higher than that in non-tumor tissue, the expression level of GS in cancer tissue was considered upregulated (high); all other scores were considered down-regulated (low).

Cell proliferation assay

Cell proliferation was analyzed using the Cell Counting Kit-8 (BIOSS, Beijing, China) according to the manufacturer’s protocol. The melanoma cells transfected with siRNA expressing HLA-DQA1 and empty vector were diluted and seeded at a density of 1 × 103 cells per well in a 96-well plate containing 0.1 mL of culture medium. After 24 h of incubation, each well was treated with 10 μl of CCK-8 solution at 37 °C. Subsequently, the absorbance at 450 nm was measured using the Multiskan SkyHigh (Thermo Fisher, Stony Creek, the US).

Clonogenic assay

Gather the a-375 and A-875 cell lines with downregulated HLA-DQA1 and the control group cells, and seed 600 cells per well in a 6-well plate. The plate was then incubated at 37 ℃ for a duration of 14 days. Each well was treated with 500 μl of paraformaldehyde and fixed at room temperature for 25 min. D The paraformaldehyde was then discarded and the wells were washed with PBS. Next, 400 μl of a 2.5% methanol-crystal violet staining solution was added to each well and stain for 15 min. Finally, the observed cells were inspected and the number of cell clones was counted under a microscope.

Cell migration and invasion assay

Corning’s 80 μm 24-well Transwell plates (Falcon) were coated with 30% Matrigel (300 μl/well) for use in migration and invasion assays. Each upper chamber of the Transwell plates received 1 × 105 cells in 100 μl of serum-free medium, while the lower chamber was filled with 600 μl of medium containing 20% fetal bovine serum. After a 6-h incubation at 37 °C, non-migrating cells that remained in the upper chamber were gently removed from the upper surface of the Matrigel using a cotton swab. Subsequently, the cells were stained with 2.5% methanol-crystal violet solution for 15 min and subsequently enumerated. Five random microscopic fields (× 100 magnification) were assessed per well, and the mean was calculated.

Statistical analysis

In this study, a variety of statistical and bioinformatics analysis methodologies were employed. We initially conducted Wilcoxon rank-sum tests to compare the differences between two groups of samples. This choice was based on the test’s suitability for analyzing non-normally distributed data. Furthermore, Pearson’s Chi-squared test was used to assess the association between categorical variables in the dataset.

Additionally, t-tests were employed to examine the significance of differences between means in normally distributed data. For analyzing differences among sample groups containing more than two samples, the Kruskal–Wallis test was utilized as a non-parametric alternative to one-way ANOVA.

The threshold for statistical significance throughout the analysis was set at P < 0.05, indicating that P-values less than 0.05 were considered statistically significant. All statistical analyses were conducted exclusively using the R programming language. R is well-suited for bioinformatics data analysis due to its extensive ecosystem of packages and functions.

留言 (0)

沒有登入
gif