Machine Learning Identifies FLNA as a Key Molecular Target Regulating Neuronal Apoptosis after Spinal Cord Injury

Animals

All the rat experiments were approved by the Animal Ethics Committee of Binzhou Medical University. Thirty healthy adult Sprague–Dawley rats, weighing 220–250 g, were provided by Jinan Pengyue Laboratory Animal Breeding Co., Ltd. The animals were housed under a 12-h light/dark cycle with appropriate temperature and humidity conditions. Three to five rats were housed per cage with water and food provided ad libitum.

Rat Model of SCI

In this study, 30 eight-week-old female Sprague–Dawley rats were used. After anaesthesia, the rats were randomly assigned to the sham surgery group (n = 5) or the SCI group (n = 5 per postinjury time point). All SCI surgeries were performed under sterile conditions. Briefly, the hair was shaved to fully expose the dorsal skin over the T9–T11 vertebrae, and the surgical area was thoroughly disinfected with iodophor. A midline skin incision was made along the back, centred on the T10 spinous process. The T10 vertebral lamina was removed, and the spinal cord was completely transected at the T10 level using a sharp surgical blade (Zhao et al. 2025). Successful induction of SCI manifested as spinal cord congestion, leg swinging, tail reflex movements, and flaccid paralysis. The sham surgery followed the same procedure but without transection of the spinal cord. The wounds were sutured, and all the animals were housed individually at 24 °C with sufficient water, food, and clean bedding. The rats underwent assisted urination twice daily until autonomous bladder function recovered.

Tissue Preparation

At 1, 3, 7, 14, and 28 days postinjury (dpi), the animals were deeply anaesthetised with sodium pentobarbital (40 mg/kg) for tissue sample collection. Spinal cord samples from the sham surgery group were collected 3 days after the sham surgery. The thirty rats were randomly assigned to one of six experimental groups (sham-operated control and 1, 3, 7, 14, and 28 dpi; n = 5 per group) for molecular analyses. Samples from each experimental group were analysed by three parallel detection modalities: (1) quantitative proteomic profiling, (2) Western blot validation, and (3) qRT‒PCR verification, covering both the protein and mRNA expression levels. Each animal was transcardially perfused with phosphate-buffered saline. Spinal cord tissues (5 mm rostral and 5 mm caudal to the injury site) were harvested and stored at −80 °C.

Protein Extraction

Total protein from each spinal cord tissue sample was extracted using a tissue protein extraction reagent (Thermo Fisher Scientific, Inc.). Each spinal cord sample was placed in the extraction reagent in a 2.5 mL centrifuge tube. The tissue was disrupted using magnetic beads and incubated on ice for 30 min, followed by centrifugation at 12,000 × g for 10 min at 4 °C. The supernatant was collected, and the protein concentration was determined using a bicinchoninic acid (BCA) protein assay kit (Thermo Fisher Scientific, Inc.). The protein was then prepared using a 5× Laemmli protein assay kit (Thermo Fisher Scientific, Inc.) and stored at −20 °C.

Liquid Chromatography‑Tandem Mass Spectrometry (LC‑MS/MS) and Data Processing

LC‒MS/MS experiments were performed on an EASY-nLC 1200 system. All analyses were conducted using a Q Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a nanoelectrospray ion source. The dried samples were dissolved in water containing 0.1% formic acid. Separation was performed using a two-column setup: the analytical column was a 10 cm EASY column (SC2003, Thermo Fisher Scientific), and the precolumn was a 2 cm EASY chromatography column (SC001, Thermo Fisher Scientific). Peptides were eluted with a 90-min linear gradient from 4 to 100% acetonitrile (ACN) at a flow rate of 250 nL/min. The mass spectrometer was operated in positive ion mode, and survey spectra were acquired at a resolution of 70,000, followed by higher-energy collisional dissociation (HCD) fragmentation of the 10 most abundant ions. The raw proteomics data were searched against the UniProt database using the Rattus norvegicus reference proteome (UP000002494). The search parameters were as follows: the maximum mass tolerances for survey scans and MS/MS analysis were 10 ppm and 5 ppm, respectively. Trypsin was used as the specific enzyme, allowing a maximum of two missed cleavage sites. Carbamidomethylation of cysteine was set as a static modification, and oxidation (M) was set as a dynamic modification. The maximum false discovery rate (FDR) for peptide and protein identification was set to 1%.
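The mass tolerances quoted above are relative errors expressed in parts per million. The following minimal sketch shows how such a tolerance check is commonly computed; the function names and the example m/z values are hypothetical and are not taken from the search software used in this study.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z relative to its theoretical value, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical ion: theoretical m/z 500.0000, observed m/z 500.0040 (about 8 ppm off).
print(round(ppm_error(500.0040, 500.0000), 2))
print(within_tolerance(500.0040, 500.0000, 10.0))  # passes a 10 ppm window
print(within_tolerance(500.0040, 500.0000, 5.0))   # fails a 5 ppm window
```

An ion that is 8 ppm from its theoretical mass would therefore be accepted under the 10 ppm survey-scan tolerance but rejected under a 5 ppm tolerance.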

Data Processing and Differentially Expressed Protein (DEP) Identification

For the proteomics dataset, differential expression between the sham and SCI groups was assessed with t tests. A P value was calculated for each protein to assess the significance of the difference in its expression between the two groups, and the log fold change (logFC) was calculated from the change in mean expression between the groups as a quantitative measure of the magnitude of that difference. Differentially expressed proteins (DEPs) were identified using the selection criteria of P < 0.05 and |logFC| > 1. The DEPs were visualised using heatmaps and volcano plots generated with the ggplot2 and volcano packages (Martinez-Rojas et al. 2022). Additionally, to validate and extend these findings, differential expression analysis was performed on the sequencing datasets GSE183591 and GSE45006 using R version 4.3.3. The “limma” package was used to identify differentially expressed genes (DEGs) under the criteria of |log2FC| > 1 and adjusted P < 0.05.
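The selection rule described above can be illustrated with a short sketch. It assumes that logFC is the base-2 logarithm of the ratio of group means computed on untransformed, positive intensities, which the text does not state explicitly; the intensities and the P value in the example are hypothetical, and the t test itself is taken as already computed.

```python
import math
from statistics import mean


def log2_fold_change(sci_values, sham_values):
    """log2 of the ratio of group means (SCI over sham); assumes positive intensities."""
    return math.log2(mean(sci_values) / mean(sham_values))


def classify_protein(log2fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Apply the two thresholds: P < 0.05 and |logFC| > 1."""
    if p_value < p_cutoff and abs(log2fc) > fc_cutoff:
        return "up" if log2fc > 0 else "down"
    return "not significant"


# Hypothetical intensities for one protein in three SCI and three sham samples.
lfc = log2_fold_change([400, 420, 380], [100, 95, 105])
print(lfc)                           # a four-fold increase gives log2FC = 2
print(classify_protein(lfc, 0.003))  # the P value here is a made-up input
```

Both thresholds must be met, so a protein with a large fold change but P ≥ 0.05, or a small fold change with a very low P value, is not counted as a DEP.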

Weighted Gene Coexpression Network Analysis (WGCNA)

WGCNA was performed using the tool available on Hiplot Pro (https://hiplot.com.cn/), a comprehensive web service platform for the analysis and visualisation of biomedical data. To explore the correlations between module eigengenes (MEs) and SCI, the Pearson correlation coefficient was used as the quantification metric. The Pearson correlation coefficient is a classic measure used in systems biology to describe gene association patterns across samples, and it enables the identification of genes with highly similar coexpression patterns. On the basis of the biological importance of the gene sets and the relationship between the gene sets and phenotypes, this method helps to select candidate biomarker genes and therapeutic targets efficiently.
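As an illustration of the module–trait step, the sketch below computes the Pearson correlation between a module eigengene and a binary trait vector (0 = sham, 1 = SCI). It is a plain-Python illustration, not the Hiplot Pro or WGCNA implementation, and the eigengene values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical eigengene values for six samples and their group labels.
module_eigengene = [-0.42, -0.35, -0.40, 0.38, 0.41, 0.39]
trait_sci = [0, 0, 0, 1, 1, 1]
print(round(pearson(module_eigengene, trait_sci), 3))
```

A coefficient close to 1 or −1 indicates that the module's overall expression pattern tracks the injury status closely, which is the basis for selecting modules of interest.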

Two Machine Learning Algorithms Were Used To Screen Markers

In this study, two machine learning algorithms, namely, random forest (RF) and least absolute shrinkage and selection operator (LASSO), were used to identify potential biomarker features. Random forest is a widely adopted machine learning approach that constructs an ensemble of decision trees and can be used to select feature proteins. It is applicable to both classification and regression tasks (Tai et al. 2019; Wang et al. 2016; Ishwaran and Kogalur 2010). For the random forest analysis, the online data analysis platform Hiplot was employed. The parameters were configured as follows: the number of decision trees was set to 1000, cross-validation was performed with 10 folds, and the variable reduction rate was fixed at 1.5. These settings were chosen on the basis of prior studies and preliminary experiments to ensure the reliability and stability of the results. LASSO, in contrast, is predominantly applied in regression analysis, variable selection, and dimensionality reduction, with the primary objective of increasing the predictive accuracy and interpretability of statistical models (An et al. 2023; Zanardi and Alessio 2021). In this study, the glmnet package in R was used to conduct cross-validation, and the optimal regularisation parameter (lambda) was selected by evaluating model performance across different lambda values. This choice directly determines which variables are retained and therefore the overall performance of the LASSO model. Subsequently, web-based tools were used to intersect the features selected by LASSO regression with those selected by random forest. Through this cross-analysis, the key target protein FLNA was identified.
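The cross-analysis of the two algorithms amounts to intersecting their selected feature sets. A minimal sketch follows, assuming that the random forest output is a list ranked by importance from which the top entries are taken; that cut-off, the function name, and every feature name other than FLNA are hypothetical placeholders.

```python
def intersect_features(lasso_selected, rf_ranked, top_n):
    """Return features chosen by LASSO that are also among the top_n random forest features."""
    rf_top = set(rf_ranked[:top_n])
    return sorted(set(lasso_selected) & rf_top)


# Placeholder feature lists for illustration only.
lasso_selected = ["protein_A", "FLNA", "protein_C"]
rf_ranked = ["FLNA", "protein_D", "protein_A", "protein_E"]

print(intersect_features(lasso_selected, rf_ranked, top_n=2))
print(intersect_features(lasso_selected, rf_ranked, top_n=3))
```

Requiring agreement between two selection methods with different inductive biases reduces the chance that a candidate is an artefact of either method alone, at the cost of discarding features that only one of them detects.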

ROC Curve Analysis

To assess the performance of the key biomarkers, we used the “pROC” package in R to analyse the training set (the proteomics dataset) as well as the GSE45006 and GSE183591 validation sets. For the training set, the receiver operating characteristic (ROC) curve of each key biomarker was plotted from its expression values and the classification of the samples, with the sensitivity (true positive rate) and specificity (true negative rate, i.e. 1 − false positive rate) calculated at each threshold. The same protocol was applied to the GSE45006 and GSE183591 validation sets so that the results were comparable. After the curves were plotted, the area under the curve (AUC) was calculated for each ROC curve with the “pROC” package. The AUC is a core metric for evaluating the diagnostic performance of key biomarkers, with values ranging from 0 to 1. The closer the AUC is to 1, the better the ability to distinguish between sample categories; an AUC of 0.5 is equivalent to random guessing. Calculating the AUCs for the training set and the two validation sets allowed the performance of the key biomarkers to be evaluated across different datasets.
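The quantities involved in the ROC analysis can be sketched without any dependencies. The example below is not the pROC implementation: it assumes that higher expression indicates the positive (SCI) class, computes the AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half), and uses hypothetical expression values.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity (TPR) and specificity (TNR) when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical biomarker expression: label 1 = SCI, label 0 = sham.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(sensitivity_specificity(scores, labels, threshold=0.5))
print(round(auc(scores, labels), 3))
```

In this toy example eight of the nine SCI/sham pairs are ordered correctly, so the AUC is 8/9; a marker that separated the groups perfectly would give 1.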

Single-Cell RNA Sequencing (scRNA-seq) Data Processing and Cell Type Identification

For single-cell transcriptomic analysis of the GSE213240 dataset, we used the “Seurat” package. For data preprocessing, strict filtering was applied: cells with more than 6000 expressed genes or a mitochondrial gene proportion above 15% were excluded, as were genes expressed in fewer than 3 cells. We used log2(CPM + 1) values as the input matrix and normalised it with the “NormalizeData” function in Seurat to standardise gene expression levels across cells. To correct batch effects, the “Harmony” algorithm was applied within the Seurat workflow to remove technical variation. In the analysis stage, the “FindVariableFeatures” function was used to identify highly variable genes, and principal component analysis (PCA) was performed to extract key data features. The “FindClusters” function in Seurat was then used with a resolution of 0.5 to identify cell clusters. Cell-type annotation was achieved through a multistep approach: the expression of known marker genes in our dataset was compared with cell type-specific signatures from PanglaoDB, and visual inspection of UMAP plots and gene expression profiles, along with references to previous studies, was used to label each cluster. For single-cell visualisation, we applied the uniform manifold approximation and projection (UMAP) method to map the high-dimensional data onto a 2D plane. To identify DEGs between cell populations, we used the “FindAllMarkers” function with screening criteria of |log2(fold change)| > 0.25 and adjusted P < 0.05. Finally, the “DimPlot” and “FeaturePlot” functions were used to visualise the single-cell map and gene expression patterns, respectively, facilitating the observation of the cell type distribution and specific gene expression levels (Zhang et al. 2023).
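The quality-control thresholds above can be sketched on a toy count table. This is a plain-Python illustration rather than Seurat code; it assumes that mitochondrial genes are recognised by an "mt-" name prefix and that the gene-level filter is computed before cells are removed, neither of which is stated in the text. The gene names and counts are hypothetical.

```python
def qc_filter(counts, max_genes=6000, max_mito_pct=15.0, min_cells=3, mito_prefix="mt-"):
    """counts maps cell id -> {gene: raw count}. Returns (filtered counts, kept genes)."""
    # Gene-level filter: keep genes detected in at least min_cells cells.
    cells_per_gene = {}
    for genes in counts.values():
        for gene, count in genes.items():
            if count > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, n in cells_per_gene.items() if n >= min_cells}

    # Cell-level filter: drop cells with too many detected genes or too much mito signal.
    kept_cells = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total == 0:
            continue
        n_detected = sum(1 for c in genes.values() if c > 0)
        mito = sum(c for g, c in genes.items() if g.startswith(mito_prefix))
        if n_detected <= max_genes and 100.0 * mito / total <= max_mito_pct:
            kept_cells[cell] = {g: c for g, c in genes.items() if g in kept_genes}
    return kept_cells, kept_genes


toy = {
    "c1": {"Flna": 5, "Gapdh": 10, "mt-Co1": 1},
    "c2": {"Flna": 2, "Gapdh": 8, "mt-Co1": 1},
    "c3": {"Flna": 1, "Gapdh": 3, "mt-Co1": 6},   # 60% mitochondrial counts
    "c4": {"Gapdh": 7, "Rare1": 4, "mt-Co1": 1},  # Rare1 is seen in one cell only
}
cells, genes = qc_filter(toy)
print(sorted(cells))
print(sorted(genes))
```

In the toy table, cell c3 is removed for its high mitochondrial fraction and gene Rare1 is removed because it is detected in a single cell.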

Protein–Protein Interaction (PPI) Network Analysis

STRING (https://string-db.org/) is a database that encompasses both direct (physical) and indirect (functional) associations (Szklarczyk et al. 2021) and is used for predicting protein–protein interactions. In this study, we input a protein list into the database and selected the species "R. norvegicus" within the STRING database to perform the search and construct a PPI network.

GO and KEGG Enrichment Analysis

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses are key bioinformatics approaches for analysing gene functions and biological processes. GO enrichment analysis aids in the identification of gene functions related to the dimensions of biological processes (BP), cellular components (CC), and molecular functions (MF), whereas KEGG pathway enrichment reveals the pathway associations between genes and biological processes. In this study, we carefully selected appropriate tools and databases. Specifically, the “clusterProfiler” R package was used for KEGG analysis to efficiently map the target gene set to KEGG pathways and select significantly enriched pathways. Similarly, the “Metascape” database was used for GO analysis, which integrates multiple data sources to accurately identify gene enrichment in different GO categories.
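Over-representation analyses of this kind are commonly based on a hypergeometric test, which asks how likely it is to observe at least the seen number of pathway genes in the query list by chance. The sketch below illustrates that test only; it is not the clusterProfiler or Metascape implementation, it omits multiple-testing correction, and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, query, overlap):
    """P(X >= overlap) for X ~ Hypergeometric.

    background: number of genes in the background set
    annotated:  background genes annotated to the term or pathway
    query:      number of genes in the query list
    overlap:    query genes annotated to the term or pathway
    """
    upper = min(annotated, query)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, query - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, query)


# Hypothetical numbers: 20 background genes, 5 in the pathway, 4 queried, 2 overlapping.
print(round(hypergeom_enrichment_p(20, 5, 4, 2), 4))
```

In practice the resulting P values are adjusted for the number of terms tested before pathways are reported as significantly enriched.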

Western Blotting

Proteins were separated via SDS‒PAGE and transferred to a PVDF membrane. After being blocked with a 10% skim milk solution for 2 h, the membrane was incubated overnight at 4 °C with specific primary antibodies. The primary antibodies used were as follows: anti-GAPDH antibody (1:5000, T004, Affinity), anti-FLNA antibody (1:1000, A3738, ABclonal), anti-phosphorylated PI3K antibody (1:1000, YP0765, Immunoligy), anti-phosphorylated AKT antibody (1:1000, RMAB48852, Bioswamp), anti-PI3K antibody (1:1000, AF6241, Affinity), and anti-AKT antibody (1:500, 6,020,203–2-IG, Proteintech). The next day, the membrane was washed three times with TBST for 10 min each and then incubated with the appropriate secondary antibody (Sino Biological Inc., Beijing, China) at room temperature for 1 h. After three washes with TBST, the protein bands were visualised with enhanced chemiluminescence (ECL) reagents (Amersham Pharmacia Biotech, Freiburg, Germany) and imaged using a chemiluminescence imaging system (Clinx, Shanghai, China). Band intensities were quantified using Image Lab 3.0 software.

Cell Culture and Transfection

The PC12 cell line is one of the most commonly used cell lines in neuroscientific research (Wiatrak et al. 2020). In this study, PC12 cells were selected for in vitro experiments. A model of cell damage induced by treatment with H₂O₂ was used to simulate neuronal injury. PC12 cells with good growth status were seeded at a density of 1 × 10⁴ cells per well in a 96-well plate and cultured for 6 h. After the cells had adhered, they were treated with H₂O₂ for 24 h at concentrations of 10, 20, 30, 40, and 50 μmol/L. The viability of the PC12 cells was assessed using a CCK-8 assay, and a dose‒response curve was generated by plotting the viability of the cells at the various concentrations. The half-maximal inhibitory concentration, corresponding to 50% cell viability, was determined and used to establish the H₂O₂-induced injury model for subsequent experiments. Our study included three groups: (1) Control group: PC12 cells cultured under standard conditions; (2) H₂O₂ group: PC12 cells treated with H₂O₂ (30 μmol/L); (3) H₂O₂ + siFLNA group: PC12 cells transfected with siFLNA and treated with H₂O₂.
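The estimation of a half-maximal inhibitory concentration from a dose‒response curve can be sketched as follows. The text does not state how the curve was fitted, so this example assumes simple linear interpolation between the two doses that bracket 50% viability, together with the usual blank-corrected viability formula for absorbance-based assays; all viability values are hypothetical.

```python
def viability_percent(a_sample, a_control, a_blank):
    """Blank-corrected viability of a treated well relative to the untreated control."""
    return (a_sample - a_blank) / (a_control - a_blank) * 100.0


def interpolate_ic50(concentrations, viabilities, target=50.0):
    """Linearly interpolate the concentration at which viability crosses the target."""
    points = sorted(zip(concentrations, viabilities))
    for (c0, v0), (c1, v1) in zip(points, points[1:]):
        if (v0 - target) * (v1 - target) <= 0 and v0 != v1:
            return c0 + (v0 - target) * (c1 - c0) / (v0 - v1)
    return None  # the target viability is not bracketed by the tested doses


# Hypothetical viabilities (%) at 10, 20, 30, 40, and 50 umol/L H2O2.
doses = [10, 20, 30, 40, 50]
viab = [95.0, 80.0, 60.0, 40.0, 20.0]
print(interpolate_ic50(doses, viab))
```

With these made-up values the 50% crossing lies midway between 30 and 40 μmol/L; a fitted sigmoidal model would generally be preferred when enough dose points are available.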

PC12 cells were seeded at a density of 3 × 10⁵ cells per well in a 6-well plate. When the cells reached 30–50% confluence, transfection was performed using Lipofectamine 2000 with siFLNA (Obio Technology, Shanghai, China) at a concentration of 50 μM. The siRNA sequences used were as follows: sense, GAUCAAGAGUUCACAGUAATT; antisense, UUACUGUGAACUCUUGAUCTT. After 6 h of incubation with the transfection reagent, the medium was replaced with normal culture medium, and after a further 42 h, the expression of FLNA and downstream proteins was analysed by Western blotting (WB) to verify the silencing effect and pathway expression.

Cell Counting Kit-8 (CCK-8) Determination

The proliferation ability of PC12 cells was assessed using the CCK-8 method (Miao et al. 2023). Specifically, for the evaluation of cell viability, 1 × 10⁴ cells per well were seeded in a 96-well plate, with three wells per group. After transfection, the cells were treated with the half-maximal inhibitory concentration of H₂O₂ according to the groupings. After 24 h of H₂O₂-induced damage, cell viability was measured using a CCK-8 assay kit (E-CK-A362, Elabscience, Wuhan, China).

DHE Fluorescence Staining

Dihydroethidium (DHE) is a fluorescent probe that is commonly used to label ROS in live cells. PC12 cells in good growth condition were seeded at 8 × 10⁴ cells per well in a 24-well plate. After the cells had adhered to the culture plate, they were treated with siFLNA and H₂O₂. The medium was then replaced with medium containing 10 μM DHE working solution, and the cells were incubated at 37 °C for 30 min. Afterwards, the cells were washed with PBS, and ROS production was observed using an inverted fluorescence microscope (DMi8 manual (192.0.0168), Leica, Germany).

RNA Extraction and qRT‒PCR

Total RNA from the spinal cord tissue samples was extracted using TRIzol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. cDNA was synthesised from 2 μg of total RNA using a high-capacity cDNA reverse transcription kit (TransGen Biotech, Beijing, China). Real-time quantitative PCR was performed using SYBR Green PCR Master Mix (Vazyme Biotech Co., Ltd., Nanjing, China), with GAPDH as the internal control. Each sample was measured in triplicate. The relative expression levels were calculated using the 2^(−ΔΔCt) method and are presented as fold changes compared to the control. The sequences of primers used were as follows:

Statistical Analysis

In this study, statistical analysis was performed using GraphPad Prism 8.0.1 (GraphPad Software, La Jolla, CA, USA). Protein expression was analysed using one-way analysis of variance (ANOVA) followed by Bonferroni post hoc correction. A p value of less than 0.05 was considered statistically significant.
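For orientation, the two ingredients of this analysis can be sketched in plain Python: the one-way ANOVA F statistic and a Bonferroni adjustment of pairwise P values. This is not the GraphPad Prism procedure; the conversion of F to a P value is omitted, the adjustment shown is the simple multiply-by-the-number-of-comparisons form, and the data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical band intensities for three groups.
f_stat, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df1, df2)
print(bonferroni([0.01, 0.04, 0.5]))
```

After adjustment, a pairwise comparison is called significant only if its adjusted P value remains below 0.05, which keeps the family-wise error rate at or below that level.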
