Identification and Experimental Validation of Biomarkers Related to MiR-125a-5p in Chronic Obstructive Pulmonary Disease

Introduction

Chronic obstructive pulmonary disease (COPD) is a heterogeneous lung disorder marked by persistent respiratory symptoms such as dyspnea, cough, and sputum production. It results from airway (bronchitis, bronchiolitis) and alveolar (emphysema) abnormalities, leading to ongoing and often worsening airflow blockage.1 COPD ranks among the top three global mortality causes, with 90% of these fatalities occurring in low- and middle-income countries.2,3 COPD may result from gene-environment interactions that impair lung function or disrupt their typical development and aging processes.4 Primary environmental risk factors include tobacco smoke and exposure to harmful particles and gases from both household and outdoor air pollution. Additionally, factors like atypical lung development and rapid aging of the lungs may contribute.4–6 The genetic risk factors identified to date that are most associated with COPD (although epidemiologically rare) mutations was SERPINA1 gene, which cause α1-antitrypsin deficiency, but other genetic variants with smaller individual effects, such as genomic regions near FAM13A, RIN3, CYP2A6, and DSP, are also linked with reduced lung function and COPD risk.7 Currently, the diagnosis of COPD mainly relies on spirometry to identify airflow obstructions that are non-fully reversible. However, this method is full of limitations, such as it is difficult to diagnose early detection of COPD.8 Therefore, the identification of novel biomarkers and exploration of their potential molecular mechanisms will help to parse the pathogenesis of COPD, provide a basis for early clinical diagnosis and a new target for treatment.

MiroRNAs (miRNAs) are tiny non-coding RNA strands, about 22 nucleotides long, that control gene expression post-transcriptional either by degrading or by attaching to target miRNAs.9 The miR-125a is localized to chromosome 19q13,10 and miR-125a-5p is its mature body. A number of studies have shown that miR-125a has an inhibitory effect in a variety of tumors.11 Recent studies have also highlighted the involvement of miR-125a-5p in autoimmune diseases, suggesting its role in modulating immune responses.12 For instance, our previous research demonstrated that miR-125a is highly expressed in human bronchial epithelial cells exposed to cigarette smoke extract. Inhibition of this miRNA was found to reduce both apoptosis and inflammation in these cells, underscoring its importance in inflammatory responses.13 Furthermore, Wang et al collected 141 local patients with COPD, isolated peripheral leukocytes, and analyzed them by qRT-PCR, showing that the level of miR-125-5p increased in all COPD groups, and gradually decreased in groups A, B, and C, but increased in group D compared with group C.14 This suggests that miR-125a-5p may play a dynamic role throughout the progression of COPD. Additionally, another study indicated that in COPD patients and mice exposed to cigarette smoke who were infected with influenza A viruses, the upregulation of miR-125a resulted in decreased expression of A20 and mitochondrial antiviral signaling pathways. This led to exaggerated inflammation and impaired antiviral responses, further supporting the hypothesis that miR-125a plays a crucial role in regulating inflammatory processes.15 Given these findings, exploring the mechanisms by which miR-125a-5p-related genes influence COPD pathogenesis could provide valuable insights for clinical diagnosis and treatment strategies for this disease.

Mendelian randomisation (MR) functions similarly to a natural experiment, using genetic variation as an instrumental variable to determine the causal link between exposure and outcome. This method is effective in reducing the effects of possible confounders and reverse causality. MR analyses using pooled data can identify potential correlations.16 In this study, biomarkers linked with miR-125a-5p were identified using COPD transcriptome data from the Gene Expression Omnibus (GEO) public database. Single-cell analysis was conducted to investigate the potential mechanisms of these biomarkers at the single-cell level. Subsequently, MR analysis was employed to explore the causal relationship between the identified biomarkers and COPD, which is crucial for further unraveling the pathogenesis of the disease.

Materials and MethodsData Source and Study Design

The GSE100153 as a training set, containing 19 COPD and 24 control samples,17 and the GSE146560 as a validation set, including 8 COPD and 8 control samples,18 the scRNA-seq dataset, GSE227691, included samples from 8 COPD patients and 4 normal lung tissues, were mined from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Besides, the related genes targeted miR-125a-5p were predicted via the miRWalk database (http://mirwalk.umm.uni-heidelberg.de/),19 and a sum of 2329 target genes of miR-125a-5p were obtained with “Score>0.95, Position=3UTR” as the filtering condition. Additionally, genome-wide association study (GWAS) data for COPD and expression quantitative trait loci (eQTL) data for exposure factors, used in Mendelian Randomization (MR) analyses, were acquired from the Integrated Epidemiology Unit’s (IEU) Open GWAS database (https://gwas.mrcieu.ac.uk/). The dataset finn-b-COPD_OPPORTUNIST_INFECTIONS comprised 186,957 samples (ncase:ncontrol=234:186,723) and 16,380,363 single-nucleotide polymorphisms (SNPs). All samples are of European descent. Figure 1 shows the flow chart of this study.

Figure 1 Flowchart of this study.

Weighted Gene Co-Expression Network Analysis (WGCNA)

In the training set, the module genes associated with COPD were screened by WGCNA.20 First, the samples were clustered and outliers removed. To best match a scale-free topology, we established an optimal soft threshold. Furthermore, we defined the minimum number of genes required for each module at 150, following the hybrid dynamic tree cutting method. Finally, Pearson analysis was conducted to find the gene modules strongly associated with COPD, in which the genes were as module genes.

Identification and Functional Enrichment Analysis of Candidate Genes

The limma package was performed to compare the differences in gene expression levels between COPD and control samples in GSE100153, thereby screening for differentially expressed genes (DEGs) with the screening criteria of |log2FC| > 0.5 and P < 0.05.21 Volcano plot was created using ggplot2, while heat map was generated with pheatmap.22 Subsequently, candidate genes were obtained by overlapping DEGs, module genes, and target genes using the eulerr package. The GO and KEGG enrichment analyses were executed via the clusterProfiler package,23 so as to look for common functions and related pathways between the candidate genes (P < 0.05).

Screening Biomarkers by Machine Learning Algorithms

Candidate genes were applied for Lasso regression analysis and SVM-RFE analysis using the glmnet package and e1071 package, respectively,24 where the feature genes were overlapped to identify key feature genes. ROC curves were plotted for key feature genes in the GSE100153 and GSE146560 datasets using the pROC package, and genes with AUC values greater than 0.7 in both datasets were defined as biomarkers.25 By the way, the expression of biomarkers in the control and COPD groups in the training and validation sets was analysed by Wilcoxon test.26

Functional Enrichment of Biomarkers

To further understand the classical pathway, and function analyses of DEGs were first performed using QIAGEN (www.qiagen.com/ingenuity). Then, the upstream regulators, downstream target genes and possible pathways of action of the biomarkers were analysed using ingenuity pathway analysis (IPA).27 At last, Spearman correlation analyses between each biomarker and the all genes were performed separately using the psych in R package,28 and gene set enrichment analysis (GSEA) was conducted via clusterProfiler package based on the KEGG background gene set in MSigDB database.29

Immune Infiltration Analysis

Based on the immune-related genes obtained as background gene sets from previous studies,30 the cluster analysis of infiltrating immune cells in control and COPD samples was conducted via Pheatmap package in R.31 Then, the Wilcoxon test was conducted to monitor differences in enrichment scores for each immune cell.26 Where after, based on the set of genes for 13 immune-related functions obtained from previous studies,32 the ssGSEA algorithm was employed to measure immune-related function scores for the COPD and control samples in the training set and to analyse the differences in each immune-related function using the Wilcoxon test.33 At last, correlation analyses between differential immune cells, between biomarkers and differential immune cells, and between biomarkers and differential immune-related functions were performed by Spearman.28

Construction of Gene-Gene Interaction Network (GGI) Network and Competing Endogenous RNA (CeRNA) Network

To search for key genes associated with biomarkers, the GGI network of biomarkers was constructed based on GeneMANIA (https://genemania.org/), in which the top 20 genes and the top 7 pathways were visualized.34 Next, in order to further understand the gene expression regulation pattern, the corresponding lncRNAs of miR-125a-5p were predicted by the Starbase database with the filtering criterion of “clipExpNum>10”,35 and the Cytoscape software was used for visualisation of the ceRNA network of lncRNA-miRNA-mRNA.36

Analysis of Biomarkers for Transcription Factors (TFs), Drugs and Related Diseases

TFs targeting biomarkers was screened out with “library=ENCODE, FET p-value<0.05” as the screening condition based on the ChEA3 database (https://maayanlab.cloud/chea3/).37 Then, DNA binding sites were predicted for the top three TFs based on the JASPAR database (https://jaspar.genereg.net/).38 In addition, biomarker target therapeutic drugs were predicted via the CTD database (http://ctdbase.org/)39 and the results were visualised by Cytoscape software. Finally, diseases associated with biomarkers were analyzed via the disgenet2r package based on the DisGeNET database, and the results were similarly visualized using Cytoscape.36,40

Single-Cell Sequencing Analysis

The R package Seurat (v 4.1.0)41 systematically processed and analyzed single-cell transcriptome sequencing data. First, we constructed Seurat objects and filtered out data that appeared in fewer than 3 cells or detected fewer than 200 genes per cell. Subsequently, quality control was performed to filter out cells with nFeature_RNA less than 200 or more than 4000, nCount_RNA less than 200 or more than 20,000, and percent.mt higher than 20%. Then, the FindVariableFeatures function was used to select the top 2000 highly variable genes. The JackStrawPlot function was used to identify the principal components for analysis. After completing the principal components analysis (PCA) dimensionality reduction, we used the Seurat (v 4.1.0) package to cluster the cells using the uniform manifold approximation and projection (UMAP) method. Then different cell types were annotated utilizing CellMarker database. Finally, the key cells were obtained by Wilcoxon test comparing the differences in biomarker expression in annotated cells.

Analysis of Cell Trajectory Analysis and Intercellular Interactions

To explore the developmental trajectories of key cells, simulation analysis was executed using the R package Monole2 (v 2.18.0).42 Finally, cell communication between the key cells and other annotated cell types was analyzed using the R package CellChat (v 1.5.0).43

MR Analysis

To further investigate the causal relationship between biomarkers and COPD, we performed MR analysis. Instrumental variables (IVs) used for MR analysis were required to fulfill the following three assumptions. 1) IVs must be strongly associated with exposure, 2) IVs must be independent of confounders, 3) associations between IVs and outcomes are mediated only by exposure factors. Exposure factors were read through extract_instruments in the R-package TwoSampleMR (v 0.5.7).44 In order to screen IVs significantly related to exposure factors, the SNPs with linkage disequilibrium (LD) were finally removed under the conditions of p < 5×10−5, clump = TRUE, r2 = 0.01 and kb = 10. The F-statistic for all IVs was then calculated to exclude weak IVs using .

Next, the R package two-sample MR function harmonise_data was used to standardize effect alleles and effect sizes, aligning them with the exposure factor-IV-outcome relationship. And MR analysis was performed by mr function combined with 5 algorithms (MR Egger, Weighted median, Inverse variance weighted (IVW), Simple mode, Weighted mode). In addition, scatter plot, forest plot and funnel plot were drawn to display MR results directly.

Sensitivity Analysis

To assess the robustness of the MR analysis results, we performed a sensitivity analysis that included a heterogeneity test, a horizontal pleiotropy test, and a leave-one-out analysis. The heterogeneity was evaluated using Cochran’s Q test to determine if there was significant variation among the samples (P>0.05). We also used the MR-Egger intercept to check for the presence of horizontal pleiotropy (P>0.05). Finally, we conducted a leave-one-out analysis, sequentially omitting each SNP to reassess the effects of the remaining SNPs on the outcome variable.

Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)

In this study, a total of 8 pairs blood samples from Shanxi Bethune Hospital were utilized qRT-PCR, including 8 COPD samples and 8 control samples. Basic patient characteristics are shown in Supplementary Table 1. The study was approved by the ethics committee of Shanxi Bethune Hospital (YXLL-2023-148). All patients had signed an informed consent form and the study complied with the Declaration of Helsinki.

First, the total RNA of the frozen COPD and control samples was extracted a by TRIzol reagent, after which RNA detection was performed based on the NanoPhotometer N50. Then, equal volumes of mRNA were reverse transcribed using the SureScript First-Strand cDNA Synthesis Kit, and the resulting cDNA was diluted between 5 to 20 times with RNase/DNase-free ddH2O. Next, the qPCR reaction was performed with 2 x Universal Blue SYBR Green qPCR Master Mix kit according to the following reaction system: predenaturation at 95°C for 1 min; Denatured 20s at 95°C, annealed 20s at 55°C, extended 30s at 72°C; 40 cycles. Finally, gene expression level was quantified using the 2−ΔΔCT method, with primer sequences detailed in Supplementary Table 2.

Statistical Analysis

Bioinformatics analyses were performed utilizing the R programming language (v 4.2.2). Wilcoxon rank sum test were used to compare the differences between two groups. Fisher’s exact test was used to calculate the significance p-value. P < 0.05 was considered statistically significant.

ResultsA Total of 3989 Modular Genes were Identified by WGCNA

This study was designed to identify module genes using the WGCNA. Firstly, cluster analysis of the samples revealed no outliers in this dataset (Figure 2A). When the soft threshold was 6, to ensure that the plot was as close as possible to a scale-free distribution, the R2 approached the 0.9 threshold (red line) (Figure 2B). Then, a total of 12 gene modules were obtained (Figure 2C), of which the brown module (cor = −0.51, p = 5×10−4) and black were selected as key modules (cor = 0.54, p = 2×10−4) (Figure 2D). The brown module contained 2668 genes and the black module contained 1321 genes, which were identified as module genes.

Figure 2 WGCNA identification analysis. (A) Sample clustering tree. (B) Soft threshold filtering. (C) Hierarchical clustering tree. (D) Module-trait correlation heat map.

Identification and Functional Enrichment Analysis of Candidate Genes

First, 126 DEGs were selected in GSE100153, of which 17 were up-regulated and 109 down-regulated (Figure 3A and B). Then, 10 candidate genes were generated from intersections between DEGs, target genes and module genes (Figure 3C). GO enrichment analysis revealed enrichment of candidate genes mainly related to positive regulation of myeloid cell differentiation, platelet α-granule membrane, etc (Figure 3D). In regard to KEGG enrichment analysis, these genes played important roles in bladder cancer, melanoma, etc (Figure 3E).

Figure 3 Identification and functional enrichment analysis of candidate genes. (A) Volcano map of gene differential expression between COPD and control group. (B) Heat map of gene differential expression between COPD and control group. (C) Genes with the same trend of difference. (D) GO annotation results. (E) KEGG enrichment results.

PITHD1, CNTNAP2 and GUCD1 were Identified as Biomarkers Associated with COPD

First of all, LASSO regression analysis yielded 8 feature genes, including NFIX, GUCD1, FAM210B, CLEC12A, PITHD1, RNF10, SPARC, and CNTNAP2 (Figure 4A and B). Next, 3 feature genes (PITHD1, CNTNAP2, and GUCD1) were also obtained by SVM-RFE (Figure 4C). Three key feature genes, PITHD1, CNTNAP2 and GUCD1, were identified at the intersection of feature genes derived from two different machine learning models (Figure 4D), and their AUC values in both the training and testing sets were greater than 0.7. Therefore, they were defined as biomarkers (Figure 4E and F). Finally, in GSE100153 and the validation set GSE146560, the Wilcoxon test showed that the expression trend of the three biomarkers was consistent (Figure 4G and H).

Figure 4 Screening of biomarkers. (A) The variation characteristics of the coefficient of variables. (B) The selection process of the optimum value of the parameter l in the Lasso regression model by cross-validation method. (C) Screening characteristic genes by SVM-RFE algorithm. (D) Genes with the same trend of difference. (E and F) Training and testing sets analysis. (G and H) Differential expression between COPD and control group. *indicates p<0.05; **indicates p<0.01.

Biomarkers were Mainly Concentrated in Ribosome, Proteasome, Endocytosis

A total of 256 pathways and 46 major disease modules associated with the DEGs were identified by IPA analysis. The top 10 pathways were illustrated in Figure 5A, eg interferon signaling, antigen presentation pathway and so on, and the top 12 disease modules could be seen in Figure 5B, including immunological disease, inflammatory disease, etc. In addition, the activation and inhibition of 46 disease modules can be seen in Figure 5C. Analysis of the upstream regulators, downstream target genes and the possible pathways among three biomarkers indicated that PITHD1 might indirectly inhibit PSMC2, and CNTNAP2 might be indirectly inhibited by SAFB and SAFB2 (Figure 5D). In GSEA pathway enrichment analysis, the main signaling pathways of PITHD1 included ribosome, spliceosome, etc (Figure 5E). GUCD1 was involved in biological processes such as ribosome, proteasome, etc (Figure 5F). CNTNAP2 played an important role in endocytosis, proteasome and so on (Figure 5G).

Figure 5 IPA analysis. (A) Identification of the top 10 pathways associated with DEGs. (B) Identification of top 12 disease modules associated with DEGs. (C) Heatmap of activation inhibition of disease modules. (D) IPA network analysis. (E–G) KEGG running Enrichment Score.

The Biomarkers were Significantly Correlated with CD56dim Natural Killer Cell, MDSC, Monocyte and Other Immune Cells

Firstly, heat maps elucidated abundance of immune cell infiltration in training set (Figure 6A). There were immune cells, including activated B cells and CD56dim natural killer cells, etc., had significantly different enrichment scores between the COPD and control groups (Figure 6B). Correlation analysis revealed that immature B cells had a strong positive correlation with activated B cells (r = 0.720, p<0.0001), and a strong negative correlation with monocytes in differential immune cells (r = −0.410, p<0.001) (Figure 6C). Besides, PITHD1 was significantly associated with three types of immune cells: CD56dim natural killer cell, Myeloid-derived suppressor cell (MDSC) and Monocyte (Figures 6D and S1). There were significant correlations between GUCD1 and three immune cells: MDSC, CD56dim natural killer cell, and Monocyte (Figures 6E and S2), while CNTNAP2 was significantly correlated with Activated B cell, Immature B cell, and Type 17 T helper cell (Figures 6F and S3).

Figure 6 Immune infiltration analysis. (A) ssGSEA scores of different immune cells in control and COPD groups. (B) Differential analysis of ssGSEA scores in control and COPD groups. (C) Correlation analysis of different immune cells. (D) Correlation analysis of PITHD1 with different immune cells. (E) Correlation analysis of GUCD1 with different immune cells. (F) Correlation analysis of CNTNAP2 with different immune cells. *indicates p<0.05; **indicates p<0.01, ***indicates p<0.001.

Moreover, Supplementary Figure S4A illustrated marked differences in the enrichment scores for two types of immune functions between the COPD and control groups, specifically in Type-I and Type-II interferon responses. Spearman analysis indicated there was a positive correlation relationship between these two immune-related functions (r = 0.44, p = 0.0036) (Supplementary Figure S4B). Finally, two biomarkers were significantly positively correlated with two differential immune-related functions, among which GUCD1 was significantly correlated with Type-II-IFN-Response, and PITHD1 was significantly correlated with Type-I-IFN-Response and Type-II-IFN-Response (Supplementary Figure S4C and D).

Regulatory Networks of Biomarkers were Constructed

By constructing GGI networks, key genes and associated pathways interacting with biomarkers were predicted. The key genes included CNTN2, CPE, etc., and the related pathways mainly included main axon, cell-cell junction assembly, etc (Figure 7A). Next, 12 lncRNAs targeting miR-125a-5p were predicted based on the Starbase database, which including AC007780.1, AL162258.1, AC018628.1, etc., (Figure 7B). Besides, a total of 5 TFs were identified, namely TAL1, GABPA, FOSL1, ETS1 and NFE2. Among them, 2 binding sites were predicted for both TAL1 and GABPA, and 7 binding sites were predicted for FOSL1 (Table 1 and Figure 7C, D). In order to explore the diseases associated with the biomarkers, 30 diseases associated with CNTNAP2, which contained Epilepsy, Language Delay, Bipolar Disorder and so on, and 2 diseases (Malignant neoplasm of Prostate and Prostatic Neoplasms) associated with GUCD1 were predicted in this study (Figure 7E). Finally, a total of 63 therapeutic drugs were predicted for 3 biomarkers based on the CTD database, such as Valproic Acid, Bisphenol A, Acetaminophen and so on. This result was visualized via Cytoscape and the results showed 74 interacting relationship pairs, including PITHD1-Copper Sulfate, GUCD1-Acrolein, CNTNAP2-Atrazine, etc (Figure 7F).

Table 1 The TFs of Biomarkers

Figure 7 GGI networks construction. (A) Biomarkers and their predictive gene interaction networks. (B) LncRNR targeting network map. (C and D) Identification of TFs. (E) Predicted 30 diseases associated with CNTNAP2. (F) Predicting therapeutic agents associated with 3 biomarkers.

Totally Six Cell Clusters were Annotated

Through quality control, a total of 20,738 genes and 37,190 cells were acquired for subsequent analysis (Supplementary Figure S5A). After standard data processing, we selected 2000 highly variable genes. The top 10 genes exhibiting the most pronounced intercellular expression changes were SLPI, CCL18, TPSD2, FABP4, SFTPA1, SFTPA2, SCGB3A1, SCGB3A2, SFTPC, and SCGB1A1 (Supplementary Figure S5B). And the linearly optimal dimensionality value for cell clustering was 30 (Supplementary Figure S6A). Then, a sum of 11 different cell clusters were identified (Supplementary Figure S6B), and only 6 cell clusters were annotated, namely Epithelial cells, B cells, Macrophage, Alveolar cells, Neutrophils, and T cells (Supplementary Figure S6C). The bubble plot demonstrated that the marker genes exhibited high specificity, thus the cells were named based on the marker genes (Supplementary Figure S6D). Supplementary Figure S6E showed the expression of biomarkers in different cells, where GUCD1 and PITHD1 were significantly different between case and control in T cells and Alveolar cells. However, CNTNAP2 was not expressed in the annotated cells.

Relationship Between T Cells and Other Annotated Cells

We further analyzed the differentiation trajectories of T cells and found that in the disease group, T cells were distributed across both the early and late stages of differentiation, whereas in the control group, they were predominantly found in the late stages (Supplementary Figure S7AC). Additionally, both GUCD1 and PITHD1 were expressed at higher levels during the early and late stages of differentiation (Supplementary Figure S7D).

To uncover the interactions between annotated cell types, we conducted a cellular communication analysis. This analysis revealed that alveolar cells in the control group exhibited strong interactions with other cell types, whereas T cells and neutrophils showed no interaction with macrophages (Supplementary Figure S7E and F). Interestingly, in the case group, macrophages did not interact with epithelial cells or T cells (Supplementary Figure S7G and H), suggesting a potential disruption or alteration in cellular communication pathways associated with the disease state. In addition, we found more receptor ligands in the Alveolar cells → B cells and B cells → Alveolar cells interactions in the disease group (Supplementary Figure S7I and J).

Causal Relationship Between Biomarkers and COPD

To further explore the causal relationship between biomarkers and COPD, we performed MR analysis. A total of 54 (GUCD1) and 40 (PITHD1) SNPs were used as IVs by screening (Supplementary Table 3). The MR analysis revealed no causal relationship between the biomarkers examined and COPD (Table 2 and Supplementary Figure S8AC), suggesting that other biological or environmental mechanisms might be influencing the development of this disease. Moreover, the robustness of these findings was confirmed through sensitivity analysis (Supplementary Tables 4, 5 and Supplementary Figure S8D), which accounted for potential confounders and biases, ensuring the reliability of the results. This emphasizes the intricate nature of COPD pathogenesis and highlights the necessity for continued research to uncover additional factors and pathways contributing to the disease.

Table 2 The MR Results of Biomarkers and COPD

The Expression of Biomarkers was Validated

At last, in GSE100153 and GSE146560, the expression levels of GUCD1 and PITHD1 were significantly down-regulated in COPD group (Figure 8A and B). In addition, the expression of biomarkers was further verified by qRT-PCR, and the solubilisation curves of biomarkers and internal parameters were shown in Supplementary Figure 9AD. The results displayed that the expression of GUCD1 and PITHD1 were also significantly decreased in COPD group (Figure 8C).

Figure 8 Differential expression analysis of different biomarkers. (A and B) Differential expression analysis of different biomarkers in control and COPD groups. (C) qRT-PCR analysis of different biomarkers in the control and COPD groups. *indicates p<0.05; **indicates p<0.01.

Discussion

COPD is defined by consistent airflow limitation and is an irreversible, yet preventable, condition.45 Although the death rate of COPD declined from 2005 to 2017,46 COPD remains a major health problem worldwide.8 However, the diagnosis of COPD predominantly depends on spirometry tests and clinical manifestations, and there is persistent underdiagnosis.47 Recent findings suggest miR125a’s involvement in inflammatory responses across various pathological conditions.48–50 Consequently, it is crucial to deepen our understanding of its molecular processes and identify new, trustworthy biomarkers associated with miR-125a-5p in COPD. In this study, a total of 10 candidate genes were obtained by overlapping the DEGs, the module genes, and target genes of miR-125a-5p, of which PITHD1, CNTNAP2 and GUCD1 were identified as biomarkers. The expression of biomarkers was further verified by qRT-PCR, and the expression of PITHD1 and GUCD1 was significantly decreased in COPD group.

Very little is known about PITH domain containing 1 (PITHD1). Recent research indicates that PITHD is down-regulated in leukemia, potentially regulating RUNX1 expression, promoting megakaryocyte differentiation, and activating internal ribosomal entry site.51 Additionally, while PITHD1 may act as an oncogene in epithelial ovarian carcinoma, its protein expression correlates with a favorable prognosis.52 PITHD1 is associated with immunoproteasome in the testis, while it shows no association with thymoproteasomes in the thymus. Mice lacking PITHD1 showed severe male infertility, accompanied by abnormal morphology and impaired spermatozoa motility.53 Expression of PITHD in the olfactory bulb escalates with age in wild-type (WT) mice but diminishes in Tg2576 Alzheimer’s disease mice during advanced stages. Stimulating olfactory neuroepithelial (ON) cells with PITHD1 modifies their phosphoproteome, affects cell proliferation rates, and triggers a pro-inflammatory phenotype.54 In this study, we have demonstrated for the first time that PITHD1 can serve as a potential biomarker for COPD, with its expression significantly down-regulated in patients with the disease. This finding is particularly noteworthy as it highlights the possibility of utilizing PITHD1 levels as an indicator for COPD diagnosis or progression. The down-regulation of PITHD1 may reflect underlying pathological changes associated with COPD, such as chronic inflammation and tissue remodeling. Our analysis revealed a strong correlation between PITHD1 expression and several immune cell types, including CD56dim natural killer cells and MDSCs. The significant association of PITHD1 with these immune cells suggests that it may play a role in modulating immune responses in COPD, potentially influencing disease severity through its interaction with inflammatory pathways. Furthermore, our GSEA pathway enrichment analysis revealed that PITHD1 is involved in the ribosomal signaling pathway, indicating that miR-125a-5p may promote inflammatory responses by inhibiting this pathway. Given that ribosomal signaling is crucial for protein synthesis and cellular stress responses, the inhibition of PITHD1 could lead to dysregulation of these processes, potentially exacerbating inflammatory pathways commonly observed in COPD.

A study on hepatocellular carcinoma (HCC) showed that inhibition of guanylyl cyclase domain containing 1 (GUCD1) expression in Hep3B cells was followed by up-regulation of E-cadherin expression and down-regulation of N-cadherin and vimentin expression, suggesting that down-regulation of GUCD1 can inhibit invasion and migration of HCC cells.55 Another study showed that GUCD1 was highly expressed in regenerating liver after partial hepatectomy. NEDD4-1 controlled the degradation of GUCD1 through the ubiquitin-proteasome system.56 GUCD1 was found to be under-expressed in COPD patients and showed significant correlations with immune functions related to type I and type II interferon responses. These findings suggest that GUCD1 may also be implicated in the pathogenesis of COPD through its involvement in immune regulation. Together, our results establish PITHD1 and GUCD1 as novel biomarkers for COPD and open avenues for further research into their functional roles and therapeutic potential within this complex disease landscape. However, the expression of CNTNAP2 exhibited inconsistencies when comparing our qPCR with those from the validation group. This discrepancy may be attributed to several factors, including differences in sample characteristics, variations in experimental conditions, or potential biological variability among subjects. Further investigation is warranted to understand the underlying mechanisms driving this inconsistency, as CNTNAP2 has been implicated in neurodevelopmental processes and could play a role in respiratory function.

GSEA enrichment analysis displayed that the down-regulated genes were linked to the ribosome pathway. The primary function of ribosomes is to synthesize proteins by translating mRNAs in the nucleolus,57 and to participate in the process of cell proliferation, growth, and survival.58 Abnormal overexpression of the ribosome pathway in polycystic ovary syndrome can lead to the accumulation of misfolded proteins, cause ovarian endoplasmic reticulum stress, inhibit oocyte maturation and impair embryonic development.59 Another finding suggested that upregulation of the ribosomal pathway played a critical role for histone deacetylase 4, which can improve chondrocyte survival rate and biofunction.60 Recent studies have shown that blocking ribosome biogenesis can lead to the stabilization and activation of p53, primarily via the ribosomal proteins (RPs)-MDM2-p53 pathway.61 Polymorphisms in the MDM2 and p53 are linked to apoptotic signaling and the emphysematous changes seen in smokers’ lungs. Additionally, cigarette smoke triggers p53-mediated apoptosis in various lung cell types such as pulmonary endothelial cells, alveolar epithelial cells, and terminal bronchiolar cells.62 Combining the above information, we conclude that miR-125a-5p may cause COPD susceptibility via the ribosome-p53 pathway.

The proteasome system is considered to be an important regulatory mechanism in cell cycle and growth.63 Prior research indicates that elevated proteasome activity contributes to the breakdown of contractile proteins, resulting in weakened diaphragm function in animal models of COPD.64 In individuals with mild to moderate COPD, increased ubiquitin-proteasome pathway activity in the diaphragm leads to damage to diaphragm fibers.65 In normal lung fibroblasts, cigarette smoke degrades Akt through the proteasome system, inducing cytotoxicity.63 Acute exposure to cigarette smoke directly stimulates proteasome activity in both mouse lungs and human epithelial cells.66 Degradation of the proteasome by methyltransferase proteolysis significantly mitigated cigarette smoke-mediated lung epithelial inflammation and cell apoptosis.67 Sestrin 2 interferes with lung injury repair by positively regulating the proteasome and inhibiting platelet-derived growth factor receptor β.68 In conclusion, the proteasome pathway is crucial for mediating inflammation, apoptosis and muscle injury in COPD. The miR-125a-5p may aggravate COPD through the proteasome pathway.

Immune infiltration analysis showed that the two down-regulated genes were significantly linked with three types of immune cells: CD56dim natural killer (NK) cell, myeloid-derived suppressor cell (MDSC) and monocyte. NK cells have direct cytotoxicity and influence immunoregulatory secretion of other immune cells,69 contributing to their involvement in inflammatory conditions like COPD.70 The highly differentiated CD56dimCD16+ NK cells are the dominant population of NK cells in the human lung.69 NK cell levels in the lungs of both current and former smokers were notably lower than those in non-smokers.69 MDSCs, composed of a varied group of both mature and immature granulocytic and monocytic cells from the bone marrow, suppress T cell responses and regulate immune responses in inflammatory (non-malignant) conditions.71 Current smoking has upregulation and activation of circulating MDSCs in patients with COPD, accompanied by significant downregulation of the T-cell receptor (TCR) ζ chain, and this effect persists after smoking cessation.72 A notable link was found between the presence of MDSCs in bronchoalveolar lavage fluid (BALF) and airflow limitation severity in COPD patients, independent of their smoking status. This indicates that airway MDSCs also play a key role in local inflammation in COPD.73 In COPD, the function of monocyte-derived macrophages is altered, leading to increased release of pro-inflammatory mediators.74 During acute COPD exacerbations, systemic monocyte priming leads to the recruitment and differentiation of these cells into hyper-activated macrophages. These macrophages exhibit increased inflammatory activity compared to their stable state74 and are involved in host defense, airway remodeling, and parenchymal destruction.75 CD56dimCD16+ NK, MDSC and monocyte all modulate the inflammatory response, suggesting that miR-125a-5p plays an important role in the inflammatory mechanism of COPD.

Single-cell analysis identified six cell clusters, among which PITHD1 and GUCD1 exhibited significant differences between disease and control groups in T cells and alveolar cells. T cells and alveolar cells are key components of the pulmonary immune response, and their functional dysregulation is closely related to the inflammatory responses in COPD. Studies have shown that a reduction in regulatory T cells leads to a rapid decline in lung function, which in turn triggers the onset of COPD.76 Moreover, in patients with COPD, an increase in T lymphocytes in the lung parenchyma, airways, and vasculature, along with a weakened memory CD4+ T cell response, has been reported. This immune imbalance may exacerbate the invasiveness of SARS-CoV-2 infection, leading to a higher prevalence of the disease.77 Given this, we hypothesize that PITHD1 and GUCD1 may influence the occurrence and progression of COPD by regulating the quantity and function of T cells. Further research into the specific roles of these biomarkers in T cells and alveolar cells could help elucidate their pathogenic mechanisms in COPD and may offer important insights for early intervention and personalized treatment.

The type I IFN response is crucial for the innate immune system.78 This response primarily involves the IFN-induced pathway and the JAK/STAT pathway, with the latter facilitating the expression of IFN-stimulated genes.78 Type I IFN is a family of cytokines that includes at least 13 IFN-α isotypes and one IFN-β isotype79 and is involved in autoinflammatory diseases.80 Type I IFN is derived primarily from alveolar macrophages and promotes natural killer cell functions while inhibiting pro-inflammatory pathways and cytokine production.81 IFN-β expression was reduced by 40–65% in lung cells from patients with stable COPD.81 The type II IFN response serves as a defensive mechanism, primarily driven by the molecule IFN-γ.82 IFN-γ is produced by Th1 lymphocytes.83 There is evidence of elevated IFN-γ levels in the airways and lungs of patients with COPD.84–86 IFN-γ leads to phosphorylation of the inflammatory promoter region STAT187 through phosphorylation of JAK2, activating the JAK/STAT signal pathway in macrophages.83 COPD patients have an increased number of alveolar macrophages that secrete chemokines, cytokines and proteases.83 Our study indicated that miR-125a-5p played a central role in disease pathophysiology by regulating the monocyte-macrophage system through type I IFN response and type II IFN response.

In COPD, airway compartment neutrophils release a variety of proteases, including prolyl endopeptidase (PE), serine proteinase, and matrix metalloproteinase (MMPs).88 In COPD patients, MMP-8, MMP-9, and PE can be detected in both serum and sputum,89,90 and their initial cleavage of collagen produces PGP.91 PGP acts as a neutrophil chemoattractant and induces features of COPD such as neutrophil inflammation, alveolar enlargement, and right ventricular hypertrophy when introduced into the airways of mice.92 One study showed that valproic acid (VPA), a PE inhibitor, modifies the secondary structure of PE in a way that makes substrate binding at the catalytic side of PE impossible, thereby reducing the severity of chronic inflammation. VPA notably suppresses PGP production in vitro and effectively reduces neutrophil influx induced by cigarette smoke in vivo.91 Our study also predicts VPA as a potential therapeutic agent for COPD based on the CTD database, suggesting that VPA has great potential in the treatment of chronic inflammation in COPD. However, there are still many limitations in this study. For example, the clinical application of biomarkers needs data support from more samples, and the application of targeted drugs needs further clinical trials. Therefore, in order to further verify the functions and mechanisms of miR-125-5p and its target genes PITHD1, CNTNAP2 and GUCD1 in COPD, further experimental studies are needed.

Conclusion

In summary, our study is the first to analyze the role of miR-125a-5p in COPD. In this study, we found that PITHD1, CNTNAP2 and GUCD1 may be novel biomarker related to miR-125a-5p for COPD. Our research enhances knowledge about the molecular mechanisms of COPD, potentially aiding in its clinical diagnosis and treatment.

Data Sharing Statement

The datasets analyzed in this study, GSE100153 and GSE146560, are available in the GEO database at (https://www.ncbi.nlm.nih.gov/geo/).

Ethics Approval and Informed Consent

The study was approved by the ethics committee of Shanxi Bethune Hospital (YXLL-2023-148). All patients had signed an informed consent form.

Consent for Publication

All authors have read and approved the final manuscript and consent to its publication.

Acknowledgment

We thank the entire team for their support and assistance with this study.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

Scientific Research Project of the Shanxi Provincial Health Commission (201601010), the Doctoral Foundation of Shanxi Bethune Hospital (2019YJ01), and the Fundamental Research Program of Shanxi Province (20210302123492).

Disclosure

The authors report no conflicts of interest in this work.

References

1. Celli B, Fabbri L, Criner G, et al. Definition and nomenclature of chronic obstructive pulmonary disease: time for its revision. Am J Respir Crit Care Med. 2022;206:1317–1325. doi:10.1164/rccm.202204-0671PP

2. Halpin DMG, Celli BR, Criner GJ, et al. The GOLD Summit on chronic obstructive pulmonary disease in low- and middle-income countries. Int J Tuberc Lung Dis. 2019;23:1131–1141. doi:10.5588/ijtld.19.0397

3. Meghji J, Mortimer K, Agusti A, et al. Improving lung health in low-income and middle-income countries: from challenges to solutions. Lancet. 2021;397:928–940. doi:10.1016/S0140-6736(21)00458-X

4. Agusti A, Melen E, DeMeo DL, et al. Pathogenesis of chronic obstructive pulmonary disease: understanding the contributions of gene-environment interactions across the lifespan. Lancet Respir Med. 2022;10:512–524. doi:10.1016/S2213-2600(21)00555-5

5. Sin DD, Doiron D, Agusti A, et al. Air pollution and COPD: GOLD 2023 committee report. Eur Respir J. 2023;61:2202469. doi:10.1183/13993003.02469-2022

6. Yang IA, Jenkins CR, Salvi SS. Chronic obstructive pulmonary disease in never-smokers: risk factors, pathogenesis, and implications for prevention and treatment. Lancet Respir Med. 2022;10:497–511. doi:10.1016/S2213-2600(21)00506-3

7. Cho MH, Hobbs BD, Silverman EK. Genetics of chronic obstructive pulmonary disease: understanding the pathobiology and heterogeneity of a complex disorder. Lancet Respir Med. 2022;10:485–496. doi:10.1016/S2213-2600(21)00510-5

8. Zhao Y, Li M, Yang Y, et al. Identification of macrophage polarization-related genes as biomarkers of chronic obstructive pulmonary disease based on bioinformatics analyses. Biomed Res Int. 2021;2021:9921012. doi:10.1155/2021/9921012

9. Hirai K, Shirai T, Shimoshikiryo T, et al. Circulating microRNA-15b-5p as a biomarker for asthma-COPD overlap. Allergy. 2021;76:766–774. doi:10.1111/all.14520

10. Guo S, Lu J, Schlanger R, et al. MicroRNA miR-125a controls hematopoietic stem cell number. Proc Natl Acad Sci U S A. 2010;107:14229–14234. doi:10.1073/pnas.0913574107

11. Zhang C, Zhao Y, Wang Q, et al. Overexpression of miR-125a-5p inhibits hepatocyte proliferation through the STAT3 regulation in vivo and in vitro. Int J Mol Sci. 2022;23:8661.

12. Zhan JL, Huang YL, Liang QW, et al. Anti-inflammatory effect of miR-125a-5p on experimental optic neuritis by promoting the differentiation of Treg cells. Neural Regen Res. 2023;18:451–455. doi:10.4103/1673-5374.346462

13. Jing X, Huo J, Li L, et al. Baicalin relieves airway inflammation in COPD by inhibiting miR-125a. Appl Biochem Biotechnol. 2023;196:3374–3386. doi:10.1007/s12010-023-04671-y

14. Wang R, Xu J, Liu H, et al. Peripheral leukocyte microRNAs as novel biomarkers for COPD. Int J Chron Obstruct Pulmon Dis. 2017;12:1101–1112. doi:10.2147/COPD.S130416

15. Hsu AC, Dua K, Starkey MR, et al. MicroRNA-125a and -b inhibit A20 and MAVS to promote inflammation and impair antiviral response in COPD. JCI Insight. 2017;2:e90443. doi:10.1172/jci.insight.90443

16. Li N, Yi H, Sun W, et al. Revealing genes associated with cervical cancer in distinct immune cells: a comprehensive Mendelian randomization analysis. Int J Cancer. 2024;155:149–158. doi:10.1002/ijc.34911

17. Altman MC, Rinchai D, Baldwin N, et al. Development of a fixed module repertoire for the analysis and interpretation of blood transcriptome data. Nat Commun. 2021;12:4385. doi:10.1038/s41467-021-24584-w

18. Trivedi A, Bade G, Madan K, et al. Effect of smoking and its cessation on the transcript profile of peripheral monocytes in COPD patients. Int J Chron Obstruct Pulmon Dis. 2022;17:65–77. doi:10.2147/COPD.S337635

19. Zhu H, Tan J, Wang Z, et al. Bioinformatics analysis constructs potential ferroptosis-related ceRNA network involved in the formation of intracranial aneurysm. Front Cell Neurosci. 2022;16:1016682. doi:10.3389/fncel.2022.1016682

20. Lin W, Wang Y, Chen Y, et al. Role of calcium signaling pathway-related gene regulatory networks in ischemic stroke based on multiple WGCNA and single-cell analysis. Oxid Med Cell Longev. 2021;2021:8060477. doi:10.1155/2021/8060477

21. Zhang Q, Li J, Weng L. Identification and validation of aging-related genes in Alzheimer’s disease. Front Neurosci. 2022;16:905722. doi:10.3389/fnins.2022.905722

22. Zhang MY, Huo C, Liu JY, et al. Identification of a five autophagy subtype-related gene expression pattern for improving the prognosis of lung adenocarcinoma. Front Cell Dev Biol. 2021;9:756911. doi:10.3389/fcell.2021.756911

23. Chen L, Zhang YH, Lu G, et al. Analysis of cancer-related lncRNAs using gene ontology and KEGG pathways. Artif Intell Med. 2017;76:27–36. doi:10.1016/j.artmed.2017.02.001

24. Liu Z, Mi M, Li X, et al. A lncRNA prognostic signature associated with immune infiltration and tumour mutation burden in breast cancer. J Cell Mol Med. 2020;24:12444–12456. doi:10.1111/jcmm.15762

25. Hillyar CR, Kanabar SS, Pufal KR, et al. A systematic review and meta

Comments (0)

No login
gif