Identification and analysis of key hypoxia- and immune-related genes in hypertrophic cardiomyopathy

Screening for differentially expressed hypoxia-associated genes

The expression of 898 genes was determined to be significantly different between normal and HCM in the GSE36961 dataset. Of these, 372 genes were upregulated, and 526 genes were downregulated in HCM. The volcano map and heatmap are shown in Fig. 1A and B. Thirty-five hypoxia-related genes differentially expressed in HCM were identified by overlapping the 493 hypoxia genes determined above and the DEGs (shown in Fig. 1C).

Fig. 1figure 1

Identification of differentially expressed genes (DEGs) in HCM and normal groups. A Volcano plot of DEGs in samples from HCM and controls in GSE36961. Red dots represent up-regulated genes and blue dots represent down-regulated genes in HCM samples, gray indicates no significance. B Heatmap for DEGs in HCM (red) and controls (green). C Venn diagram of thirty-five hypoxia-related DEGs in HCM

Screening target genes by LASSO regression and SVM-RFE analysis

LASSO regression analysis, with the parameter family set to binomial, produced gene coefficient plots and error plots for cross-validation (Fig. 2A and B). The model was optimized to when the λ value was smallest, and 11 genes were identified out the 35 hypoxia-related genes found above (Zfp36, Atp2a2, Slc2a1, Plau, Nfkbia, Tgfbr2, Ddah1, Ednra, Tspan12, Ddit3, and Oma1). Each of the 11 genes has a unique coefficient (Additional file 3). Next, we generated a receiver operating characteristic (ROC) curve to validate the model’s functionality (Fig. 2C). HCM and normal samples could be identified with an AUC value of 1.000, showing that the model performed well. We also analyzed 35 differentially expressed hypoxia genes with SVM-RFE. We plotted the generalization error versus the number of features and used 10-fold cross-validation to select eigengenes (Fig. 2D). When the “sweet spot” was 0.0176, 14 characteristic genes were detected (Zfp36, Atp2a2, Slc2a1, Plau, Nfkbia, Tgfbr2, Ddah1, Ednra, Tspan12, Ddit3, Oma1, Lmna, Penk, Myc, and Eno1). Intersection of the genes identified by LASSO analysis and SVM-RFE analysis yielded 10 genes (Zfp36, Atp2a2, Slc2a1, Plau, Nfkbia, Tgfbr2, Ddah1, Ednra, Tspan12, Ddit3, and Oma1) as shown in the Venn diagram (Fig. 2E). From this figure, we can compare the similarities and differences between the two machine learning algorithms, LASSO and SVM, increasing the reliability of the diagnostic marker screening approach here.

Fig. 2figure 2

Targeted genes screening through LASSO regression and SVM-RFE analyses. A Error plot for the logistic LASSO regression coefficient. B 11 genes were screened in the logistic LASSO model. C ROC curve under the LASSO model. D Plot of generalization error versus the number of features. E Venn diagram of ten intersection genes

MCP-counting algorithm identified significantly different immune cells

To study the relationship between disease and immunity, we used the MCP-counter algorithm to screen for the predominant cells types associated with normal and HDM samples in the in the GSE36961dataset. The genetic contents of 8 immune cells, fibroblasts and one type of epithelial cells are visualized as heatmaps (Fig. 3A) and boxplots (Fig. 3B). We found a substantial difference between normal and pathological tissue in the composition of five cell types (monocytic lineage, myeloid dendritic cells, neutrophils, endothelial cells, and fibroblasts), suggesting an important regulatory role for these cells in immune infiltration.

Fig. 3figure 3

Immune cell abundance by the MCP-counting algorithm. A Heatmap of different immune cell contents through the MCP-counting algorithm. B Boxplots of cell contents across different groups under the MCP-counting algorithm (*P < 0.05, **P < 0.01, HCM vs. Control)

Weighted co-expression network construction and key module identification

The samples in GSE36961 were grouped using the Pearson correlation coefficient. The five cell types found to be significant above were utilized as features to analyze the immune-related genes in HCM. As shown in Fig. 4A, we created a sample clustering tree and the corresponding heatmap of clinical characteristics (Fig. 4A, B), and no outliers needed to be removed. The ideal soft threshold was 3 (R2 = 0.85) from the scale-free soft-threshold distribution map (Fig. 4C). After dynamic tree trimming and average hierarchical clustering, 20 modules were ultimately identified, with gray modules denoting genes that did not fit into any of these modules (Fig. 4D). To identify modules related to the clinical features of HCM, we clustered a dendrogram of all DEGs (Fig. 4E) based on differential measures (1-TOM), and the heatmaps reflected the correlation between different modules and size distribution (Fig. 4F). For determining key module genes, we selected key modules under the following conditions: non-gray modules, P < 0.05 and correlation with any immune cell | cor | > 0.5. As shown in Fig. 4G, five modules were screened: salmon, light green, black, red, and tan. We then screened for module membership (MM) and GS, generated a scatter plot, and screened out key module genes based on the criteria of |GS| > 0.4 and |MM| > 0.6 (Fig. 4G). As a result, 612 key module genes were identified that represent the most closely related to immune function in HCM (Additional file 4).

Fig. 4figure 4

Weighted co-expression network construction analysis (WGCNA) based on five keyimmune cells. A Sample clustering dendrogram in GSE36961. B Merging data sample clustering and phenotypic information. C Analysis of the scale-free fit index (left) and the mean connectivity (right) for various soft-thresholding powers. D Module clustering dendrogram of all DEGs clustered based on a dissimilarity measure. E Cluster dendrogram and correlation heatmap of different modules. F Correlation heatmap of modules related to clinical features (Each cell contains the correlation coefficient and P value). G Critical module membership (MM) and gene significance (GS) scatter plot (vertical line for |MM|=0.6 and horizontal line for |GS|=0.4)

GSEA of hypoxia- and immune-related genes

Using a Venn diagram, the intersection of the hypoxia-related genes screened by LASSO + SVM-REF and the immune genes screened by WGCNA produced three hypoxia-immunity-related genes (Fig. 5A). The GO and KEGG pathways for these three hypoxia-immunity-related genes (ATP2a2, DDAH1, and OMA1) were identified using GSEA (Fig. 5B–D). We provide the top 10 entries for each gene in GO and KEGG individually. For example, Atp2a2 was enriched in 717 related GO entries and 45 KEGGs, while Ddah1 was enriched in 569 related GO entries and 30 KEGGs. In total, 1296 related GO entries and 72 KEGGs were enriched for Oma12.

Fig. 5figure 5

Analysis of hypoxia- and immune-related genes using gene set enrichment analysis (GSEA). A Venn diagram of hypoxia- and immune-related genes. B GO/KEGG enrichment analyses using GSEA for the ATP2A2 gene (top10). C GO/KEGG enrichment analyses using GSEA for the DDAH1 gene (top10). D GO/KEGG enrichment analyses using GSEA for the OMA1 gene (top10)

Interestingly, ATP2A2 and DDAH1 share three GO entries: activation of immune responses, adaptive immune responses, and responses based on spontaneous recombination of immune receptors with immunoglobulin superfamily structures, though the former is mainly enriched in leukocyte-related entries, while the latter is enriched in cell-related entries. The OMA1 entry is mainly fat metabolism-related. However, the three genes with KEGG entries are very different (Additional file 5).

Validation of the expression levels of the hypoxia- and immune-related genes

To validate the expression of hypoxia- and immune-related genes in HCM, the GSE36961, GSE141910 and GSE130036 datasets were used. The results of GSE141910 and GSE130036 before and after batch correction are shown in Additional file 6A, B, documenting diverse distributions of samples from different sources. The expression of three hypoxia- and immune-related genes in GSE36961, GSE141910, and combined cohorts are displayed in Fig. 6A, B and Additional file 6C, revealing that the expression trends of ATP2A2 and DDAH1 were consistent across the three datasets, in which ATP2A2 expression was higher in the normal group than in HCM, DDAH1 expressed higher in HCM than in the normal group. OMA1 expression had distinct differences between two groups in GSE36961 and GSE141910, while there was no significantly alterations in the combined cohorts. Similarly, the area under the curve (AUC) value of the ROC curve for ATP2A2 (0.89) and DDAH1 (0.902) rather than OMA1 (0.546) suggested excellent prognostic accuracy for HCM (Additional file 6D).

Fig. 6figure 6

Validation of hypoxia and immune-related gene expression. A GSE36961 boxplot. B GSE141910 boxplot (*P < 0.05, **P < 0.01, HCM vs. Control). C qRT-PCR verification for critical genes. Values represent means ± SEM, n = 6/group, **P < 0.01 vs. Sham group

Using the animal model established above, the mouse heart group with statistically significant HW/TL ratio (Additional file 2) and echocardiographic LVEF below 50% (Additional file 9) was selected as the HCM group, and the left ventricular tissue samples from the normal and HCM groups were verified by qRT-PCR. Again, the expression trends of the three genes were consistent between the mouse model and the dataset validation results (Fig. 6C).

Construction of ceRNA network based on three hypoxia- and immune-related genes

Based on the GSE188324 and GSE143786 microarray data in the GEO database, we identified 53 differentially expressed miRNAs (all upregulated) (Fig. 7A) and 1,561 differentially expressed lncRNAs (Fig. 7B) (1115 upregulated and 446 downregulated). These differentially expressed genes were used to construct a ceRNA network.

Fig. 7figure 7

ceRNA network construction. A Heatmap and volcano plot of differentially expressed miRNAs (top100) between HCM and control samples in GSE188324. B Heatmap and volcano plot of differentially expressed lncRNAs (top100) between HCM and control samples in GSE143786. For volcano plot, red dots represent up-regulated genes and blue dots represent down-regulated genes in HCM samples, gray indicates no significance. C The mRNA-miRNA-lncRNA network diagram. Blue oval indicates the key hypoxia and immune-related gene, pink diamond indicates miRNA, blue trangle indicates lncRNA

Through the miRWalk website, we also predicted the miRNAs from the three hypoxia- and immunity-related genes, resulting in 2224 predicted miRNAs (Additional file 7). In addition, we selected hub gene-miRNA relationship pairs with opposite expression trends; there were four matching miRNAs, namely, hsa-miR-513c-5p, hsa-let-7e-5p, hsa-miR-371a-5p, and hsa-miR-497-5p. Similarly, we obtained 643 predicted lncRNAs using the starbase website to predict the lncRNAs of the miRNAs that matched the previous step (Additional file 8). Again, the miRNA-lncRNA relationship pairs with opposite expression trends were selected; there were three matching lncRNAs, namely THUMPD3-AS1, CASC9, SCGB1B2P, and LINC00963. Finally, the network included eight nodes (one mRNA, four lncRNAs, and three miRNAs screened, as shown in the appendix) and eight edges. The network map of ceRNA was visualized with Cytoscape software (v3.7.2; Fig. 7C). The map revealed that ATP2A2 plays an important role in HCM and could be used as a diagnostic marker for the disease.

留言 (0)

沒有登入
gif