Benchmarking of bioinformatics tools for NGS-based microRNA profiling with RT-qPCR method

MicroRNA expression profiling is a commonly used analysis in many types of experiment. It provides valuable information on the functioning of the cellular machinery of various tissues and species, as well as its disorders in various disease states. In-depth characterization of the miRNA profile is enabled by next-generation sequencing technology. There is a large number of miRNA identification programs allowing for the annotation of already known microRNAs and prediction of potentially new ones. Since they are based on different algorithms and may apply different identification criteria, they may give different results. Therefore, in this study, we empirically evaluated the performance of three programs for microRNA analysis with the RT-qPCR method. However, it should be noted that other software may perform differently than the ones analysed in this research.

We ran each tool with the default analysis settings since we assumed that this approach is taken by most users. The exceptions involved the exclusion of sequences with only one read count to decrease the likelihood of the presence of sequencing artefacts. Moreover, the miRNA sequence length was set to 18–25 nt, since this is the length range in which most miRNAs fall (Filipowicz et al. 2005; Pawlina et al. 2017; Pawlina-Tyszko et al. 2020). Additionally, longer sequences could increase the number of false positives (Fromm et al. 2022). Nevertheless, it cannot be ruled out that the adjustment of analysis parameters could boost algorithm performance. The performance may also be influenced by the technology of library preparation. The analysed libraries were prepared with ligation-based kits and total RNA input. Libraries based on enriched small RNAs input and/or ligation-free kits utilising poly(A) tailing and template switching could affect the sequencing output differently.

The performed comparison analysis showed differences in the number of known miRNAs and potentially new sequences between individual programs. Each of them also identified a number of unique microRNA sequences (Table 1 and Fig. 1). The qPCR validation confirmed the expression of all selected microRNAs in the tested samples, which proves the high reliability of each of the programs used (Tables 2 and 3). To shed some light on possible functional implications of different performance of the investigated algorithms, we carried out a pathway enrichment analysis for the differentially identified miRNAs. The analysis revealed numerous biological pathways, of which most were common for the tools. On the other hand, similarly to the miRNA profiling results, a number of program-specific pathways were defined (Online Resource 8). This emphasises a possible influence of software choice on the results of in silico functional analysis and, as a result, conclusions drawn. However, further investigation is needed to state if these program-specific microRNAs are always associated with specific biological processes.

Next, we calculated correlation coefficients between the qPCR results and read counts of the examined microRNAs outputted by the analysed algorithms to determine if there is a superior one. In this step, we included qPCR results for additional miRNAs identified by all the algorithms. These served as positive controls not only for qPCR reactions but also for the correlation coefficient calculations since they were identified irrespectively of the ones used in this study program. Significant correlations were identified for two of the six analysed positive controls, by each tested program (Table 4). When it comes to the software unique miRNAs, two of 11 miRDeep2 miRNAs, one of seven sRNAtoolbox-sRNAbench miRNAs, and six of seven UEA sRNA Workbench microRNAs had significant correlation coefficients. Thus, UEA sRNA Workbench showed the highest number of significantly correlated miRNAs. However, since this was true only for its unique miRNAs and was not observed for the positive controls, it is difficult to determine with certainty if it is the result of better algorithm performance or the coincidental choice of better correlating miRNAs for the qPCR validation.

In general, it was noticeable that if a miRNA was detected by at least two programs, its correlation coefficient and p-value were similar; i.e. if a microRNA was strongly and significantly correlated, it was so for all the tools (Table 4). This suggests that there may exist some miRNAs which pose a difficulty for identification algorithms. Similar observations were made by Everaert and colleagues on the RNA-seq data. They stated that such a situation takes place for genes that are typically low expressed (Everaert et al. 2017). This remains in agreement with our results; however, it may not be the whole truth for miRNAs. The correlation analysis results obtained in this study for the high-quality and probability miRNAs identified by all the programs and serving as positive controls were significantly correlated for only two of six miRNAs, and significant and insignificant coefficients were noted for sequences with read counts below 100 as well as those above 1000. Thus, other factors could also play substantial roles. One of them may be the afore-mentioned library preparation protocol. It may influence the obtained results introducing library bias associated with the preference of a ligation enzyme for some miRNAs over others (Fuchs et al. 2015). Moreover, primers or assays used for qPCR validation itself may be in some cases the source of variability and discrepancies, as was implicated by the results of qPCR with alternative primers. This may be true especially for unique miRNAs and custom-made assays, which are not wet-lab validated by a manufacturer.

Furthermore, the ubiquitous expression of isomiRs that is microRNA length and sequence variants might be involved. Numerous studies revealed a wide repertoire of isomiRs for many miRNAs. They are very often composed of several dozen different variants with different expression levels (Neilsen et al. 2012; Pawlina et al. 2017), which may vary with statistical significance between tissues, developmental stages, or treatments (Bizuayehu et al. 2012; Fernandez-Valverde et al. 2010; Guo et al. 2016; Loher et al. 2014; Pawlina et al. 2017; Telonis et al. 2015). To shed some light on the potential involvement and significance of these miRNA variants for qPCR validation, we took a closer look at their length and sequence characteristics and the share of the total pool. To this end, we calculated correlation coefficients between the qPCR data and read counts as well as the abundance of separate isomiRs. First, an analysis was performed for the isomiR with the exact same sequence as the used PCR primer assay, assuming the best hybridisation efficiency. This approach caused an increase of correlation coefficients for two of 11 examined miRNAs, namely hsa-miR-100-5p and hsa-miR-433-3p; however, only the hsa-miR-433-3p result was statistically significant (Table 4).

In the next step, we carried out the same analysis for other isomiRs detected for these miRNAs (Fig. 2 and Online Resource 9). This revealed quite a complex picture. First of all, 3′ isomiRs were the most common alteration, which remains in agreement with other research (Loher et al. 2014; Pawlina et al. 2017). We also observed that even variants three nucleotides longer than the used assay can be significantly correlated (hsa-miR-433-3p), so they seem to be quite well tolerated by the PCR assays. Moreover, correlation coefficients may vary for the same length isomiRs but with a different nucleotide added, which may be exemplified by hsa-miR-708-5p for which two-nucleotide 3′ additions were detected (Fig. 2 and Online Resource 9). They were statistically significant with strong and similar correlations except for the “GA” addition, which had the lowest statistics. When it comes to 5′ variants, although they are rarer than 3′ ones and usually shorter (one nucleotide addition or deletion), they were also significantly correlated for many analysed miRNAs (e.g. hsa-miR-433-3p, hsa-miR-100-5p, hsa-miR-223-3p). Furthermore, internal single nucleotide substitutions did not seem to substantially affect PCR performance, since their presence did not rule out significant correlations (e.g. hsa-miR-100-5p and hsa-miR-30c-5p), implying that the assays may bear a level of tolerance for at least some substitutions.

In general, the picture which emerges from this analysis is that the isomiRs composition influence on qPCR results and their correlation with NGS results may be profound and very complex. This may be reflected by hsa-miR-223-3p, for which a moderately significant correlation was established between UEA sRNA Workbench and qPCR results. This miRNA was composed of 13 isomiRs in the analysed samples. The single-isomiR analysis revealed significant correlations with the qPCR results for only three of them (Online Resource 9). The hsa-miR-100-5p analysis showed that the whole miRNA profile correlations between the NGS analysis programs and qPCR results were not significant. However, 21 of 232 detected isomiRs had significant correlation coefficients. On the other hand, hsa-miR-451a was characterised by a strong and significant correlation between the whole miRNA profile NGS results and qPCR results, while 12 of the detected 13 isomiRs were strongly and significantly correlated. The obtained results suggest that if the correlation of the whole miRNA profile NGS results with qPCR results is very strong and statistically significant, the same might be noted for most of the detected isomiRs. When it comes to moderately correlated and less-significant or insignificant miRNAs, there may exist some isomiRs for which significant correlations may be identified if the isomiR profile is broad.

In order to further investigate possible causes of such diversified results for isomiRs, we looked at the number of reads of single isomiRs, assuming that the likelihood of binding of primer sequences to the most abundant variants might be higher. Of the 11 investigated isomiR profiles, the expression level of the most abundant isomiR was strongly and significantly correlated for three miRNAs (hsa-miR-451a, hsa-miR-708-5p, and hsa-miR-375-3p); however, there were other, less-numerous isomiRs with stronger correlations (Fig. 2 and Online Resource 9).

Interestingly, during the analysis of the positive control results, we noticed that two assays examined in samples from two different species (the cow and the pig) performed differently. The correlation analysis results for hsa-miR-375-3p and hsa-miR-708-5p in the cow samples revealed moderately and strongly significant correlations for all the analysed programs (except for UEA sRNA Workbench results for hsa-miR-708-5p). The same assays were not significantly correlated in the pig samples, except for hsa-miR-375-3p UEA sRNA Workbench results and its two separate isomiRs (Fig. 2 and Online Resource 9). Thus, we looked at the isomiR profiles of these miRNAs in the tested samples to see if they may underlie these differences, since isomiR profiles may be even gender and population specific (Loher et al. 2014). However, the profile did not visibly differ in the case of hsa-miR-375-3p and for hsa-miR-708-5p minor differences in the number of identified isomiRs were observed (data not shown). Therefore, it remains to be determined with a higher number of samples and assays if these results are species dependent, tissue dependent, or associated with other factors.

To sum up, factors influencing concordance between miRNA-seq and qPCR data may be very numerous. Some of these may dominate over others and their combination may create unique signatures responsible for correlation results of single miRNAs. Apart from the library protocol bias, the expression of miRNA variants (isomiRs), their abundance, length, sequence composition, and profile characteristics may account in the case of microRNAs. Investigated species could also play a role, although further research is needed to confirm this influence and elucidate its exact nature. The performed analysis implies that the influence of these factors is virtual even for different miRNA-seq identifying algorithms in the case of high-quality and likelihood miRNAs detected by at least two of the analysed programs. This may also be true for other software; however, it remains to be elucidated. Furthermore, additional features that were not assessed here may be of vital importance. Similar observations and conclusions were made for RNA-sequencing analysis workflows also validated with RT-qPCR (Everaert et al. 2017). Finally, we cannot exclude the possibility that miRNAs chosen for the validation favour accurate quantification by NGS or qPCR. The performed analysis would also benefit in the future from an increase in the number of qPCR-quantified miRNAs, single isomiRs, different species and tissues as well as library preparation protocols, used assays, and validation techniques to gain a more comprehensive view.

The most prominent differences between the analysed algorithms were noted for the number of predicted potentially new sequences. sRNAtoolbox-sRNAbench identified 39 while miRDeep2 as many as 2825 novel miRNA candidates. Although these numbers would probably change when fine tuning the algorithm analysis settings, it is rather unlikely that they would even out. All the new sequences unique for the investigated programs, selected for qPCR validation in this study, were confirmed to be expressed, which implies that they may be true novel miRNAs. However, since they constituted only a small portion of all predicted new sequences, this does not guarantee the authenticity of the other ones. Moreover, these results become especially important in the light of the recent analysis performed by Fromm and colleagues who, as a result, postulated that the limits of human miRNA annotation had been met (Fromm et al. 2022). According to the current 22.1 version of the miRBase, 2654 human mature miRNAs are deposited. When it comes to the species investigated here, 1025, 690, and 475 mature miRNAs are currently deposited for the cow, horse, and pig, respectively. These numbers are far from being close to the human data; thus, there still might be a pool of new microRNAs to be discovered in these species, especially in the scarcely investigated tissues and developmental stages. Nevertheless, caution and thorough analysis are advised when identifying novel miRNAs since their repertoire is not limitless.

The results obtained in this study shed some light on the performance of three algorithms for miRNA-seq data analysis and their concordance with RT-qPCR validation. Coenye recently suggested that the added value of qPCR validation is low in the case of high-quality RNA-seq data, highly expressed genes, and those with large differences in expression levels (Coenye 2021). On the other hand, it may be still necessary if these terms are not met. We suggest that in the case of microRNA data, other factors may also be involved and their thorough and detailed characterisation necessitates further, more extensive studies. Whatever the validation decision, miRNA-seq results should be quality controlled and carefully inspected. On the other hand, it should be taken into account that qPCR validation results with moderate concordance with miRNA-seq results are not necessarily the sign of poor NGS quality but may stem from the intrinsic nature of miRNA sequences, the investigated research material, and applied technologies.

留言 (0)

沒有登入
gif