Dynamic Visualization of Computer-Aided Peptide Design for Cancer Therapeutics

Introduction

Cancer is a major global public health problem. According to a report by the International Agency for Research on Cancer (IARC), there were approximately 20 million new cancer cases and 9.7 million cancer deaths worldwide in 2022.1 The current conventional cancer treatments are surgery, radiotherapy, and chemotherapy. Treatment choice usually depends on the tumor’s type, size, and location, the patient’s overall health, and whether the tumor has spread. If a tumor is detected early, surgery is usually the preferred treatment. In most cases, however, the tumor has spread by the time it is found, necessitating other treatments such as radiotherapy or chemotherapy.2 Radiotherapy uses X-rays to destroy cancer cells and shrink tumors but is not recommended for tumors diagnosed at an advanced stage or located in vulnerable sites. Chemotherapy, on the other hand, uses cytotoxic drugs, either singly or in combination, to target rapidly dividing cells. However, chemotherapeutic drugs lack specificity and also kill rapidly dividing normal cells, such as those in the intestinal lining.2 Consequently, they can damage the gastrointestinal tract, hair follicles, and bone marrow, leading to alopecia and hematopoietic disorders. In addition, chemotherapeutic drugs can cause kidney damage because they are excreted in the urine and liver damage because they are metabolized and detoxified in the liver.3 This underscores the pressing need for more effective anti-tumor drug candidates with fewer side effects. With the discovery of a growing number of protein receptors, peptide receptors, and protein-associated pathways, peptide anticancer drugs offer significant promise.4

Since the discovery of insulin in 1921, peptides have been developed to treat various diseases, including cancer, immune disorders, metabolic disorders, viral infections, cardiovascular disease, and osteoporosis.5 The term “peptide drugs” refers to medications designed and synthesized based on peptides, and anticancer peptides (ACPs) are peptides with anticancer activity. ACPs possess numerous advantages over traditional chemotherapeutic drugs. With their good targeting, biocompatibility, and ease of synthesis, ACPs exhibit great potential in cancer therapy.4,6 Over the past 60 years, peptide approvals for cancer therapy have steadily increased. The global market for peptide therapeutics has grown at an average rate of 7.7%. Around 17% of approved peptides are used in cancer therapy, including octreotide, lanreotide, and pasireotide, which have been approved to treat neuroendocrine tumors, and degarelix, which has been approved for the treatment of prostate cancer.7–9 Because natural peptides suffer from poor physicochemical stability and short circulating plasma half-lives, both their stability and bioactivity must be optimized before they are used as drugs, through modification methods such as main-chain reconstruction (amino acid substitution) or side-chain modification (cholesterol modification, phosphorylation, polyethylene glycol modification).10,11

Usually, the development of ACP drugs is slow due to the high cost of ACP synthesis and the long experimental screening period.12 With the continuous advancement of computer technology and the rapid development of bioinformatics, computer-based rational design strategies have been used to develop more economical and effective ACPs, which are expected to accelerate the drug discovery process and reduce its cost. Computer-aided peptide design provides predictive models that evaluate the functional potential of peptides before synthesis by bringing together crucial information encoded in the sequence, such as chemical parameters and bioactivities.13 For example, network pharmacology analysis is a powerful tool for predicting potential drug targets and identifying drug candidates; it integrates knowledge from multiple fields, such as bioinformatics, systems biology, and pharmacology, and combines computational tools with data analysis to systematically study the interaction networks and complex pathways between drugs and biomolecules.14,15 Machine learning provides an intelligent and efficient method to optimize ACP sequences by learning from comprehensive training data.16 The development of ACP databases has provided significant support for research in the field of ACPs. Most ACP information is scattered across bioactive peptide databases, such as DRAMP, APD, DBAASP, HORDB, CPPsite, and SATPdb.17–22 Although these databases offer some ACP information, the data are relatively limited. CancerPPD, first released in 2015, is an experimentally validated database of ACPs and proteins, with data manually collected from published research articles, patents, and other databases.23 It provides comprehensive peptide information, including sources, properties, anticancer activity, N- and C-terminal modifications, and conformation. In addition, DCTPep is a novel, open, and comprehensive database for cancer therapy peptides, released in 2024.22 It covers a broader range of ACP types, offering more comprehensive data support for developing peptide drugs in cancer treatment.

Bibliometric analysis is the quantitative analysis of literature data and their measurable characteristics using mathematical and statistical methods to understand research progress in a discipline and to identify its research frontiers and hotspots.24 Scholars have previously used bibliometric methods to analyze publication trends and research hotspots for artificial intelligence and bioactive peptides.24,25 However, there is no comprehensive bibliometric analysis of research on computer-aided design methods targeting ACPs in oncology. To fill this gap, we conducted a bibliometric analysis of the computer-aided design of cancer therapeutic peptides covering 2006 to 2024. We visualized and analyzed the literature on computer-aided peptide design and summarized the research progress, hotspots, and emerging trends to help researchers better grasp future research directions.

Methods

Data Sources and Search Strategy

To ensure the quality and completeness of the literature, this study conducted its literature search in the Web of Science Core Collection (WoSCC), a widely used and authoritative scientific literature database that provides broad access to important research worldwide. WoSCC is a multidisciplinary database covering more than 100 disciplines and is widely used for bibliometric studies, providing essential information on journals and other bibliometric indices.26,27

All data were retrieved and exported from WoSCC on 3 December 2024 using the query TS=(computational OR “in silico” OR computer NEAR/2 aid* OR computer NEAR/2 assist*) AND TS=(peptide$ OR polypeptide$) AND TS=(cancer* OR anticancer* OR tumor* OR tumor* OR oncology OR neoplasm* OR carcinoma*) AND DOP=2006-01-01/2024-12-03. The language was set to English, and the document type was restricted to articles and reviews. To include as much relevant literature as possible, related terms, synonyms, and hyponyms were added to the search expression, such as the synonyms tumor and neoplasm for cancer and the hyponym carcinoma. The initial search returned 2453 publications. Three investigators independently screened, validated, and discussed titles, abstracts, keywords, and full texts, finally including 1547 documents (1368 articles, 179 reviews); all screening was completed by 3 December 2024 (Figure 1).

Figure 1 Literature filtration process.

Data Processing and Collection

The retained documents were recorded as “full records and cited references” and exported as “plain text files” and “tab-delimited files” to accommodate different visual analysis software and websites.

Visual Analysis

The visualization software CiteSpace version 6.2.R4 64 bit (Drexel University, USA), VOSviewer version 1.6.19 (Leiden University, The Netherlands), and Bibliometrix (https://www.bibliometrix.org) running on R 4.1.3 were used for the analysis, together with the online bibliometric analysis platform (http://bibliometric.com). Graphs were generated using Origin, a scientific graphing and data analysis software developed by OriginLab, which was used in this study to create a bar chart of the number of publications per year. CiteSpace is a widely used bibliometric analysis software that shows the boundaries of a field and analyzes the links between the literature. It can perform co-occurrence analysis of countries, institutions, journals, co-cited journals, authors, co-cited authors, and keywords; here the time slice ran from January 2006 to December 2024. This study mainly used it for clustering analysis, such as clustering of keywords, authors, and institutions.

VOSviewer was used in this study to mine, map, and cluster literature data, analyze bibliographic linkage metrics between countries/regions, institutions, and authors, cluster keywords, and present images through network and overlay visualization. In addition, we carried out the basic country, author, and institution analysis and created network diagrams using the online bibliometric analysis website (http://bibliometric.com). We also analyzed and mapped the number of publications in each country and the collaborations between countries using the Bibliometrix software (https://www.bibliometrix.org) run in R 4.1. Scimago Graphica was used to produce geographic cooperation maps and author collaboration networks, and PyCharm Community Edition 2022.2.1 was used to produce keyword clouds.

Results

Annual Publication Trend

The global publication volume of a specific field, sorted by year, is a crucial indicator for assessing progress in that field. A total of 1547 publications from 2006 to 2024 were retrieved from the WoSCC database. To reflect the changes in the number of publications over the past 19 years, we plotted a chart (Figure 2), which shows that the global publication volume has increased steadily overall.

Figure 2 Trends in annual and accumulated publications from 2006 to 2024.

Analysis of National (Regional) Cooperation

To identify the countries/regions that have published many influential papers on computer-aided peptide design for cancer therapeutics, we analyzed the cooperation among countries/regions. The national cooperation network (Figure 3A) has 80 nodes and 592 edges, with a network density of 0.1873. In Figure 3A, the larger the node, the greater the number of publications; the lighter the color, the more recent the publications. Figure 3B presents the top 10 countries ranked by centrality and publication volume. The United States ranks first in both output and influence, with 401 publications and a centrality of 0.32, followed by China (355 publications and a centrality of 0.19), India (172 publications and a centrality of 0.04), Iran (117 publications and a centrality of 0.01), and Germany (111 publications and a centrality of 0.06). Apart from these countries, countries with significant centrality (>0.10) include Italy (104 publications and a centrality of 0.13) and Pakistan (47 publications and a centrality of 0.11).
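The reported density can be reproduced from the node and edge counts alone. The following is a minimal sketch, assuming the cooperation network is treated as an undirected graph without self-loops or repeated edges, so that density is the number of observed edges divided by the number of possible country pairs; the function name is illustrative and not part of CiteSpace.

```python
def network_density(num_nodes: int, num_edges: int) -> float:
    """Density of an undirected simple graph: observed edges / possible edges."""
    if num_nodes < 2:
        return 0.0
    possible_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible_edges


# Node and edge counts reported for the national cooperation network (Figure 3A).
density = network_density(num_nodes=80, num_edges=592)
print(round(density, 4))  # 0.1873
```

Under this assumption, 592 of the 3160 possible country pairs are linked, which matches the stated density of 0.1873.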

Over the past 19 years, 80 countries have contributed to research on the computer-aided design of ACPs. We used visualization software to create a cooperation network of the countries researching this field (Figure 3C). Most research collaborations are between North America, Europe, and Asia. The thickest line connects the United States and China, representing their closest cooperation. Furthermore, these two countries, with total link strengths of 276 and 126, respectively, emerge as the closest national cooperators in academic research. Figure 3D represents a network of international cooperation. The United States and China make the most notable contributions in terms of publication quantity and maintain closer connections with other countries. The most cited review from the United States focuses on predictive methods for the coding potential of ncRNAs, approaches for peptide recognition, and the tumor-related functional peptides encoded by circRNAs together with the molecular mechanisms of their cancer-promoting and cancer-suppressing effects.28 The second most cited paper investigates a sequence-based identification tool for ACPs, iACP, which outperforms the method of Hajisharifi et al in both accuracy and Matthews correlation coefficient, as demonstrated by the jackknife test. This indicates that the predictor achieves a higher overall success rate and greater stability.29 The third and fourth most cited papers explore the epitope landscape in breast and colorectal cancer.30 Chinese researchers focus on developing ACP predictors or tools, such as Li’s ACPred-FL developed in 2018, which accurately predicts ACPs based on sequence information.31 It is noteworthy that the two most cited papers by Chinese researchers are also among the top three cited papers in the United States, highlighting the extensive research collaboration between China and the United States in this field. Given the increasing global incidence of cancer, deeper and broader inter-institutional and international cooperation is particularly crucial.
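For readers unfamiliar with the two metrics used to compare ACP predictors above, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the iACP study or any other cited work. In a jackknife (leave-one-out) test, each peptide is predicted by a model trained on all the others, and the resulting predictions are tallied into such a matrix.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC ranges from -1 (total disagreement) to +1 (perfect prediction)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical tallies: ACPs are the positive class, non-ACPs the negative class.
tp, tn, fp, fn = 90, 80, 20, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the numbers of ACPs and non-ACPs in a benchmark are unbalanced.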

Figure 3 (A) Knowledge map visualized by CiteSpace; (B) Total publications and centrality of the top 10 countries; (C) Geographic cooperation map visualized by Scimago; (D) Cooperation network of countries.

Analysis of Institutions

Nodes of varying sizes represent institutions with different publication counts, while node color correlates with the average publication time (Figure 4). The University of California System in the US, with 37 publications, holds the highest publication count and occupies a central position in the network visualization map. The Helmholtz Association and the German Cancer Research Center, both from Germany, rank second and third with 32 and 25 publications, respectively. Also in the top 5 are Le Reseau International des Institut Pasteur (RIIP) from France and the Chinese Academy of Sciences, each with 22 publications. RIIP has the most recent publications, with an average publication year of 2017, later than that of the other four institutions.

Figure 4 Institutional collaboration network visualization map generated by CiteSpace.

Analysis of Journals and Co-Cited Journals

Figure 5A and B present a global analysis overlay of citing and cited journals. The nodes in the graphs represent over 10,000 journals included in the Journal Citation Reports (JCR) based on the 2011 Science and Social Science Citation Indices (SSCI). Journals from which the publications in this study originate are colored by subject category, while journals not included are displayed in a background color (grey). These analyses provide insights into the diversity of disciplines covered by the journals. The top 5 citing journals by weight (representing publication count) are Scientific Reports, Molecules, Nature Biotechnology, Computational and Structural Biotechnology Journal, and Briefings in Bioinformatics (Figure 5A). In contrast, the top 5 cited journals by weight are Nucleic Acids Research, Proceedings of the National Academy of Sciences of the United States of America, Nature, PLOS One, and Journal of Biological Chemistry (Figure 5B).

Figure 5 (A) Overlay visualization map of citing journals analysis. (B) Overlay visualization map of cited journals analysis.

In the journal overlay diagram (Figure 6), the left side represents citing journals, indicating the forefront of disciplines, while the right side represents cited journals, indicating the foundations of the disciplines. These journals are divided into several topics based on research areas. Each point represents a journal, the lines indicate citation relationships, and the ellipses represent groups of journals covering a specific topic. Publications in citing journals in the molecular biology and immunology fields are notably influenced by molecular, biological, chemical, materials, physical, and genetic publications. This analysis helps trace the historical development of a research field and points to its frontier and to the references on which new research builds.

Figure 6 Dual-map of the overlay of journals.

Co-Authorship Analysis

By setting author collaboration conditions in VOSviewer (maximum number of authors per document = 7), 75 prolific authors were selected for visual analysis. A cooperation network graph of 75 nodes and 89 edges was generated using Scimago Graphica (Figure 7). Authors such as Shoombuatong Watshara, Nantasenamat Chanin, Banerjee Ipsita A, Charoenkwan Phasit, and Manavalan Balachandran exhibited leading publication volume and link strength, indicating extensive cooperation with other authors. Among the most prolific authors from 2006 to 2024, Navid Nezafat has published 13 papers in this field, primarily focusing on the in silico design of peptide vaccines. The second most productive author, David Gfeller, is committed to deciphering HLA-I motifs and developing new computational strategies to predict HLA-I alleles to improve neoantigen prediction. The most prolific author is Stefan Stevanovic, whose research uses immunopeptidomics to identify novel epitopes for cancer therapy. Collaborating with these researchers would provide valuable insights into this evolving area.

Figure 7 Collaboration network of authors.

Analysis of Highly Cited Literature in the Past Three Years

Citation count is a key indicator of a paper’s impact, reflecting the global attention it has received. Table 1 lists the top 10 highly cited publications in the past three years, mostly from 2021, each of which has been cited more than 40 times. The paper “Cancer proteogenomics: current impact and future prospects” had the highest citation count, 99. Six of the ten papers are related to computational tools or deep learning, reflecting recent research trends in this field. In the field of computer-aided peptide design for cancer therapy, highly cited publications are predominantly found in journals related to molecular, biological, chemical, materials, physical, and genetic disciplines (Figure 6). An analysis of the WoS Citation Report for the past 10, 5, and 3 years indicates a rapid increase in publication volume in the categories of Mathematical Computational Biology and Computer Science Interdisciplinary Applications.

Table 1 The Top 10 Most Cited Publications During the Period of 2022–2024

Analysis of Keywords

Keyword Co-Occurrence Analysis

A total of 539 keywords were filtered in this study. Using CiteSpace, a co-occurrence network map was generated to identify research hotspots (Figure 8A). A word cloud was also created using the top 100 keywords between 2006 and 2024, with font size indicating frequency (Figure 8B). Keywords with a frequency of at least 100 include “prediction”, “molecular dynamics simulation”, “identification”, “cancer”, “peptide”, “expression”, “protein”, “binding”, and “molecular docking” (Table 2). The keyword with the highest centrality is “identification”, followed by “peptide”, “molecular dynamics simulation”, “cancer”, and “binding”.

Figure 8 (A) The co-occurrence network of keywords. (B) The keyword clouds.

Table 2 Keywords With a Frequency of No Less Than 100

Keyword Burst Analysis

A keyword burst is a significant increase in the frequency of a keyword within a short period; burst analysis therefore reflects changes in research hotspots and emerging trends in specific research areas. By analyzing the top fifty keywords by burst strength (Figure 9A), we identified 5 emerging keywords: “server”, “mechanisms”, “deep learning”, “system”, and “cell-penetrating peptides”. Moreover, we created a keyword heat map using R software, showing the temporal changes of 30 high-frequency keywords (Figure 9B). We standardized the frequency of keywords to range between 0 and 1. Each cell represents the frequency of a word within a year, with colors ranging from black to yellow, where black indicates the lowest frequency and yellow indicates the highest. Notably, in recent years, keywords such as “machine learning”, “cancer”, “immunotherapy”, “bioinformatics”, “drug discovery”, and “neoantigen” have been frequently observed.
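The scaling step behind such a heat map can be sketched as follows. This is an illustration only: it assumes min-max scaling applied to each keyword separately across years (the text does not state whether scaling was per keyword or over the whole matrix), it uses Python rather than the R workflow used in the study, and the yearly counts are invented.

```python
def min_max_scale(counts):
    """Rescale a list of counts linearly so the smallest maps to 0 and the largest to 1."""
    lowest, highest = min(counts), max(counts)
    if highest == lowest:
        return [0.0 for _ in counts]  # a flat series carries no contrast
    return [(c - lowest) / (highest - lowest) for c in counts]


# Hypothetical yearly frequencies for two keywords over four consecutive years.
yearly_counts = {
    "machine learning": [0, 2, 5, 10],
    "peptide": [4, 4, 6, 8],
}

# One heat-map row per keyword, one cell per year, every value in [0, 1].
heat_rows = {kw: min_max_scale(counts) for kw, counts in yearly_counts.items()}
for keyword, row in heat_rows.items():
    print(keyword, row)
```

Scaling each keyword on its own makes the timing of its peak visible even when its absolute frequency is much lower than that of dominant terms such as “cancer”.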

Figure 9 (A) Top 50 keywords with the strongest citation bursts. (B) Keyword heat map.

Keyword Cluster Analysis

Keyword cluster diagrams help visualize different research focuses within a specific field, reflecting the composition of various research topics. By conducting cluster analysis on keywords, we obtained 8 clusters (Figure 10). The smaller the number in the cluster label, the more keywords the cluster contains. The cluster modularity value (Q) = 0.3646 > 0.3 indicates effective clustering, and the cluster silhouette index (S) = 0.6684 > 0.5 suggests reasonable cluster analysis results. In the keyword cluster network, we can roughly categorize the clusters into three groups: #6 human papillomavirus focuses on research objects of peptide therapy; #1 cancer immunotherapy, #3 targeted therapy, #4 identification, and #5 expression emphasize studies on mechanisms of tumor treatment; #0 molecular dynamics and #2 machine learning focus on computer-aided methods for peptide design.
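To make the modularity threshold more concrete, the sketch below computes Newman modularity for a toy network. It assumes an unweighted, undirected graph and a given assignment of nodes to clusters; the graph, node names, and function are illustrative and do not reproduce the CiteSpace computation or the keyword network of this study.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q for an undirected, unweighted graph.

    Q = sum over clusters of (internal_edges / m) - (degree_sum / (2 * m)) ** 2,
    where m is the total number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1  # each edge adds one to the degree of both endpoints
        degree_sum[cv] += 1
        if cu == cv:
            internal_edges[cu] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy network: two triangles of keywords joined by a single bridging edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, clusters), 4))  # 0.3571
```

Q compares the share of edges that fall inside clusters with the share expected if edges were placed at random with the same node degrees, so values well above zero indicate that the partition captures real structure.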

Figure 10 The cluster network of keywords.

Discussion

Global Research Trend

The number of publications on computer-aided ACP therapy has steadily increased from 2006 to 2024. Over 88% of these publications are research articles, highlighting the importance of producing more original research with high potential in this field. The United States has the highest publication output and centrality, followed by China. The most cited papers from the United States focus on predictive methods for tumor-related functional peptides, identification tools for ACPs, the epitope landscape in cancer, and neoantigen prediction.28–30 Chinese researchers focus more on developing ACP predictors or tools.31 Furthermore, there is significant collaboration between researchers in the United States and China, and this trend is expected to persist. Given the increasing global cancer incidence, deeper and broader inter-institutional and international cooperation is particularly crucial. Notably, interdisciplinarity is a prominent feature, with cutting-edge molecular biology and immunology studies relying heavily on computational assistance. These studies also reference computer science, algorithms, systems, chemistry, materials science, physics, and genetics literature. We identified the top 10 most cited publications from the past three years, of which six focus on model predictions about peptides or protein sites. This indicates that computational platforms, predictive models, and databases are increasingly gaining attention.

Research Hotspots and Frontiers

In the keyword cluster network, the clusters can be roughly divided into three categories: research objects of peptide therapy, studies on tumor treatment mechanisms, and computer-aided peptide design methods. The keyword cluster analysis reveals that cancers associated with the human papillomavirus are the most extensively studied. The principal mechanism of peptide anti-cancer activity is illustrated in Figure 11.32–38 Literature analysis reveals that the current peptide design for anticancer mechanisms primarily focuses on cell-penetrating peptides, and immunotherapy is also an extensively researched area. In specific studies, cell-penetrating peptides are commonly employed in the design of drug delivery systems, while immunotherapy is predominantly directed toward peptide vaccine research.

Figure 11 The main mechanism of ACPs.

The targeting and drug delivery of peptides against cancer are inextricably linked to the various mechanisms of action of peptides. Interest in cell-penetrating peptides has increased significantly, and the term is among the emerging keywords with the highest burst strength in recent years. This may be attributed to their ability to deliver cargo molecules, such as peptides and small molecules with therapeutic properties, into the cytoplasm of target cells. At present, cell-penetrating peptides are extensively employed in the field of tumor-targeted drug delivery.39,40 The emerging keyword in 2021–2022 is “peptide vaccine”. Peptide vaccines are a form of cancer immunotherapy that shows significant potential for cancer treatment by targeting specific antigens and activating the patient’s immune system to produce a specific response to cancer cells. In recent years, immunotherapy has attracted considerable attention in cancer treatment. Peptide vaccines have the potential to be a breakthrough due to their multiple advantages, which include specificity for cancer cells, the potential to stimulate long-term immune memory, ease of production and scale-up, and relatively low toxicity.41 Mahdevar and his team employed an immunoinformatics approach to successfully design a novel multi-epitope vaccine for breast cancer, demonstrating significant efficacy and therapeutic potential.42 Sanami and her team also designed a multi-epitope vaccine with therapeutic potential for cervical cancer with the aid of immunoinformatics.34 Unfortunately, numerous peptide-based vaccines have not been successful in clinical trials due to the immune evasion of tumor cells and the loss of tumor antigens.43

The design and optimization of ACPs are essential for enhancing their clinical application potential. Currently, modification strategies for ACPs primarily focus on three key areas: improving activity, enhancing stability, and reducing toxicity (Figure 12). Firstly, enhancing the activity of ACPs can be achieved through various methods, including amino acid substitution and sequence shortening, construction of hybrid peptides, peptide methylation, lipid modification, and glycosylation. For instance, replacing L-amino acids in somatostatin with D-amino acids and shortening the amino acid sequence to eight residues successfully yielded octreotide. Research indicates that this modification significantly enhances the peptide’s activity and extends its plasma half-life to 1.5 hours.5 Additionally, peptides are prone to hydrolysis or degradation in vivo, prompting researchers to employ multiple strategies to improve their stability. These strategies encompass peptide cyclization, N-terminal acetylation, C-terminal amidation, lipid modification, glycosylation, and polyethylene glycol modification. Researchers have developed scaffolds supported by carbon-carbon bonds or other linkages to stabilize the α-helix structure of peptides. ACPs derived from this cyclization modification method are known as stapled peptides.44 For example, Walensky et al reported a study in which the BCL-2 protein was redesigned into an active fragment with a stapled peptide structure. This redesigned BCL-2 stapled peptide fragment can engage in protein-protein interactions within cells and demonstrates enhanced in vivo metabolic stability.45

Figure 12 Peptide design strategy. Created in BioRender. B, H. (2025) https://BioRender.com/s80t622.

Furthermore, numerous studies have highlighted the significant role of advanced drug delivery technologies and nanomedicine in mitigating the cytotoxicity of peptides in vivo. These technologies facilitate the precise delivery of drugs to pathological tissues while minimizing side effects in other areas, thereby reducing unnecessary drug action and enhancing therapeutic efficacy.46 Peptides can self-assemble or co-assemble with other materials to form multifunctional nanomaterials for targeted modification and responsive drug release. These nanomaterials offer advantages such as high drug loading, low molecular weight, low immunogenicity, and low production costs and are extensively utilized in biomedical fields like tissue engineering and gene therapy.47 For instance, Cheng et al developed a polymer–peptide conjugate (PPC) that responds to excessive ROS in the tumor microenvironment. This material can self-assemble into nanoparticles and target mitochondria. In a high ROS environment, the polyethylene glycol shell of PPC sheds, releasing the cytotoxic peptide KLAK, which forms a nanofiber structure that interacts with mitochondria, inducing apoptosis of cancer cells and significantly enhancing the selective cytotoxicity and in vivo tumor suppression effects of the ACP.48

Encouragingly, several designed peptides have already been utilized in cancer therapy. Among them, two peptide-drug conjugates (PDCs), namely Lutathera and melflufen, have entered clinical application. Lutathera ([177Lu]Lu-DOTA-(Tyr3)-octreotate) is the first FDA- and EMA-approved radiopharmaceutical for peptide receptor radionuclide therapy.49,50 The radionuclide 177Lu is complexed by the bifunctional chelator DOTA, which is bound to the somatostatin-affine peptide (Tyr3)-octreotate. Melphalan flufenamide (melflufen), developed by Oncopeptides for the treatment of multiple myeloma and amyloid light-chain amyloidosis, is a first-in-class PDC that consists of melphalan conjugated to the peptide para-fluoro-L-phenylalanine.51,52 A rationally designed peptide-conjugated gold/platinum nanosystem is also used for cancer therapy. Based on bimetallic nanoparticles named Au@Pt, the nanosystem was conjugated with a rationally designed peptide (LyP-1-PLGVRG-DPPA-1).53 The obtained Au@Pt-LMDP nanosystem can serve as a matrix metalloproteinase-activated tumor-targeting agent for enhancing tumor photothermal immunotherapy. Although peptide modification design has progressed, several limitations remain. Achieving high activity, low toxicity, and strong stability simultaneously for ACPs is challenging. Peptides are prone to degradation by proteases in vivo, and while modifications can enhance their stability, they may also introduce new issues. For instance, PEGylation can increase the molecular weight of peptides and extend their half-life in the body, but it may also alter their original structure and physicochemical properties, thereby affecting their biological activity. Likewise, although some modifications can enhance the activity of peptides, improper modifications may impair it, for example by changing the peptide conformation and preventing effective binding to receptors. Therefore, researchers need to conduct repeated experiments and screenings to identify the most suitable modification strategies for specific ACPs.

The bibliometric analysis in this study shows that the number of publications in the relevant literature increased overall between 2006 and 2024, with a growing number of researchers employing computer-aided methodologies to develop novel peptide drugs. Computer-aided drug design has emerged as a highly effective technology that is pivotal in developing new drug molecules. Computer-aided peptide design is primarily divided into two categories: structure-based peptide design and ligand-based peptide design (Figure 13). The keyword analysis indicates that molecular dynamics simulation is a frequently employed methodology in this field, with molecular docking also commonly utilized. Both techniques are frequently used in structure-based peptide design. Molecular docking models the interaction between a drug molecule and a target molecule to determine how they bind and how stable the binding is.54 It is one of the more widely used methods in structure-based peptide design.55 Commonly used molecular docking software includes AutoDock, AutoDock Vina, GOLD, and Glide.56–60 The limitations of molecular docking were identified almost two decades ago, yet they remain a subject of active research.61 Two key components of docking methods are search algorithms and scoring functions. Conformational search algorithms struggle most with longer and more flexible ligands, especially in shallow and less chemically characterized binding sites. Furthermore, the accuracy of force-field-based scoring functions can still be improved, although any gain must be balanced against the need for computational efficiency.54,62 Molecular dynamics (MD) simulation is a powerful tool that provides insight into the dynamic properties of biomolecular systems. Its applications include transfer coefficient simulation, protein folding and stability, ligand binding, and protein complexation.63 MD simulation provides important guidance for experiments and is therefore widely used. However, it also faces challenges. The phenomena and systems that MD simulations can study are limited in time and spatial scale.64 Furthermore, force field limitations represent a significant challenge.64 Additionally, MD simulation is a complex computational technique, and beginners who lack a thorough understanding of it can easily obtain biased results.65
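The two docking components named above, a search algorithm and a scoring function, can be illustrated with a minimal Python sketch. It is a toy under stated assumptions and does not reproduce the method of AutoDock, AutoDock Vina, GOLD, or Glide: the scoring function is a plain Lennard-Jones 12-6 term plus a Coulomb term, the search is a greedy random walk over rigid translations only, and the coordinates, charges, and parameter values (`epsilon`, `sigma`, `RECEPTOR`, `LIGAND`) are hypothetical placeholders.

```python
import math
import random


def pair_energy(r, q1, q2, epsilon=0.2, sigma=3.5, coulomb_k=332.06):
    """Force-field-style energy for one atom pair (illustrative parameters).

    Lennard-Jones 12-6 term plus a Coulomb term in vacuum.
    coulomb_k is the usual conversion factor for kcal/mol with
    distances in angstroms and charges in elementary charge units.
    """
    r = max(r, 0.5)  # clamp very short distances to keep the energy finite
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = coulomb_k * q1 * q2 / r
    return lennard_jones + coulomb


def score(ligand, receptor):
    """Sum pair energies over all ligand-receptor atom pairs (lower is better)."""
    total = 0.0
    for lx, ly, lz, lq in ligand:
        for rx, ry, rz, rq in receptor:
            distance = math.dist((lx, ly, lz), (rx, ry, rz))
            total += pair_energy(distance, lq, rq)
    return total


def translate(ligand, dx, dy, dz):
    """Rigidly shift every ligand atom; charges are unchanged."""
    return [(x + dx, y + dy, z + dz, q) for x, y, z, q in ligand]


def random_search(ligand, receptor, n_steps=2000, step=0.5, seed=0):
    """Greedy random-walk search: keep a trial pose only if it scores lower."""
    rng = random.Random(seed)
    best, best_score = ligand, score(ligand, receptor)
    for _ in range(n_steps):
        shift = [rng.uniform(-step, step) for _ in range(3)]
        trial = translate(best, *shift)
        trial_score = score(trial, receptor)
        if trial_score < best_score:
            best, best_score = trial, trial_score
    return best, best_score


# Hypothetical atoms as (x, y, z, partial charge); not taken from any structure.
RECEPTOR = [(0.0, 0.0, 0.0, -0.5), (4.0, 0.0, 0.0, 0.3), (0.0, 4.0, 0.0, -0.2)]
LIGAND = [(8.0, 8.0, 8.0, 0.4), (9.5, 8.0, 8.0, -0.1)]

if __name__ == "__main__":
    start_score = score(LIGAND, RECEPTOR)
    _, final_score = random_search(LIGAND, RECEPTOR)
    print(f"start {start_score:.3f} -> best found {final_score:.3f}")
```

Even in this toy, the cost of one scoring call grows with the product of the ligand and receptor atom counts, and the search samples only three translational degrees of freedom. A flexible peptide adds rotations and many torsion angles, which is one way to see why longer, more flexible ligands strain conformational search.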

Figure 13 Flowchart of computer-aided peptide design methodology. Structure-based peptide design is a powerful approach when the spatial structure of the target is known. Leveraging the properties and characteristics of the macromolecule’s spatial structure, structure-based peptide design enables the creation of compounds that are complementary to the desired target site. Homology modeling, molecular docking, and MD simulations are prevalent methodologies in structure-based peptide design.66 Ligand-based drug design is considered an indirect technique, used primarily when the structure of the biomolecular target is unknown and cannot be reliably predicted using approaches such as homology modeling.67 Pharmacophore modeling and quantitative structure-activity relationships (QSAR) are among the most significant and widely employed methods in ligand-based drug discovery.66 Created in BioRender. B, H. (2025) https://BioRender.com/b26j385.

The keyword burst analysis revealed that integrating deep learning with peptide drug design is becoming an emerging research focus. Over the past decade, numerous machine learning-based ACP predictors have been proposed, utilizing a wide range of algorithms, including support vector machines (SVM), random forests (RF), K-nearest neighbor (KNN), and various ensemble methods.68–72 Notable examples include AntiCP, iACP, ACPP, iACP-gaeNSc, MLACP, ACPred, AntiCP 2.0, and ACPred-Fuse.29,73–79 In recent years, the rapid advancement of deep learning has prompted researchers to increasingly adopt deep learning-based models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), for ACP prediction.80,81 Furthermore, some studies have combined machine learning and deep learning techniques to improve prediction accuracy, as exemplified by the LGBM-OE model.82 Several reviews on ACP predictors are available in the literature.83,84
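To make the classical machine learning workflow concrete, the following Python sketch pairs a hand-crafted sequence descriptor with a K-nearest neighbor classifier. It is an illustration under stated assumptions and does not reproduce any of the predictors named above: the descriptor is plain amino acid composition, the distance is Euclidean, and the sequences and labels in `TOY_TRAIN` are invented placeholders with no experimental basis.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-dimensional fraction of each standard residue."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(train, query, k=3):
    """Majority vote among the k training peptides closest in composition."""
    query_vec = amino_acid_composition(query)
    ranked = sorted(
        (math.dist(amino_acid_composition(seq), query_vec), label)
        for seq, label in train
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented strings with placeholder labels (1 = "ACP", 0 = "non-ACP").
# They only exercise the code and carry no biological meaning.
TOY_TRAIN = [
    ("KKLLKKLLKK", 1),
    ("KLAKLAKKLA", 1),
    ("RKLLKRLAKK", 1),
    ("DEDEGSGSDE", 0),
    ("SGSGDDEESG", 0),
    ("EEDDSSGGEE", 0),
]

if __name__ == "__main__":
    print(knn_predict(TOY_TRAIN, "KKLAKKLLKA"))
```

The feature step is where the manual effort sits in this family of methods: the classifier only sees whatever the chosen descriptor preserves, and composition alone discards residue order.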

Table 3 lists some deep learning-based ACP predictors published in the last three years. In a recent study, Arif et al proposed an ensemble-based Cascade Deep Forest (CDF) model named PLMACPred.85 This model combines evolutionary features, compositional properties, and protein language model (PLM) encodings. Moreover, it incorporates two-dimensional wavelet denoising to remove noise from the extracted features. PLMACPred demonstrates a remarkable performance advantage over most existing models, with a reported improvement of up to 18-fold in prediction accuracy. In a parallel development, Wang et al proposed the iACP-DFSRA model, which leverages a Residual Convolutional Neural Network (ResCNN) to capture local features and attention mechanisms to extract global features.86 These features are subsequently integrated through attention, allowing the model to outperform traditional methods across multiple evaluation metrics by effectively combining local and global features. Furthermore, Yue et al introduced an innovative three-channel ACP predictor, CNBT-ACPred, which integrates a CNN, a hybrid convolution and bidirectional long short-term memory network, and a Transformer model.87 Cytotoxicity assay results demonstrated that the accuracy of CNBT-ACPred in predicting ACPs exceeds 90%. Moreover, by combining model predictions with experimental validation, the research team identified tPep14 as a promising ACP candidate.
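Comparisons such as those above rest on evaluation metrics computed from a confusion matrix. The Python sketch below shows how four commonly reported binary-classification metrics are derived. The choice of accuracy, sensitivity, specificity, and Matthews correlation coefficient (MCC) is an assumption made for illustration, since the specific metrics used by each cited study are not listed here, and the label vectors are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and MCC as a dict."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(numerator, denominator):
        # Report 0.0 when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Hypothetical labels: 1 = ACP, 0 = non-ACP.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    print(evaluate(truth, predicted))
```

Accuracy alone can look strong on an imbalanced benchmark where non-ACPs dominate, which is why a balance-aware metric such as MCC is a useful companion when models are compared.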

Table 3 ACP Predictors Based on Deep Learning

Compared with traditional machine learning methodologies, deep learning models have demonstrated superior capabilities in managing complex nonlinear relationships and vast datasets, circumventing laborious feature engineering processes. This renders deep learning particularly advantageous in the identification of potential ACPs. Nonetheless, deep learning technology confronts several challenges, including reliance on extensive labeled datasets, a lack of model interpretability, and the demand for substantial computational resources during large-scale data training and model optimization. These challenges must be progressively addressed in future research endeavors. Overall, the application of deep learning in the field of ACP identification is continuously overcoming these limitations and revealing significant potential.
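The point about avoiding feature engineering can be sketched in a few lines of Python: a deep model typically receives a near-raw encoding of the sequence and learns its own filters. The example below one-hot encodes a peptide and applies a single one-dimensional convolution filter followed by ReLU and global max pooling. It is an illustrative toy, not the architecture of any predictor in Table 3. In a real CNN the filter weights are learned from labeled data, whereas the hypothetical helper `make_pair_kernel` sets them by hand so the behavior can be checked.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, max_len):
    """Encode a peptide as a max_len x 20 matrix; padding rows stay all zero."""
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for position, residue in enumerate(sequence.upper()[:max_len]):
        if residue in INDEX:
            matrix[position][INDEX[residue]] = 1.0
    return matrix


def conv1d(matrix, kernel, bias=0.0):
    """Slide one filter (width x 20 weights) along the sequence, then apply ReLU."""
    width = len(kernel)
    feature_map = []
    for start in range(len(matrix) - width + 1):
        total = bias
        for offset in range(width):
            row = matrix[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], row))
        feature_map.append(max(0.0, total))
    return feature_map


def global_max_pool(feature_map):
    """Keep the strongest filter response anywhere in the sequence."""
    return max(feature_map)


def make_pair_kernel(first, second):
    """Hand-set width-2 filter that responds to `first` followed by `second`."""
    kernel = [[0.0] * len(AMINO_ACIDS) for _ in range(2)]
    kernel[0][INDEX[first]] = 1.0
    kernel[1][INDEX[second]] = 1.0
    return kernel


if __name__ == "__main__":
    encoded = one_hot("AKKL", max_len=6)
    responses = conv1d(encoded, make_pair_kernel("K", "K"))
    print(responses, global_max_pool(responses))
```

A trained network stacks many such filters and layers, which is where both the demand for large labeled datasets and the difficulty of interpreting the learned weights arise.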

Limitations

Due to the limitations of the search method, only English publications from the last 19 years were included in this search. In addition, although the Web of Science, as the preferred platform for bibliometric analysis, has extensive and authoritative database resources, it cannot cover all journals and publications, which means that a small amount of literature may have been omitted from our study. Some recently published high-level publications were not included in the study due to low citation frequency or the data analyzed not reaching the analysis threshold, which is also one of the limitations of this review.

Although we adopted strict criteria and procedures during the publication screening process, the limitations of manual screening and the continuous updating of databases may have led to the omission of some recently published high-level publications, either because they were not indexed promptly or because their citation frequency did not reach the analysis threshold. This limitation affects the comprehensiveness and timeliness of our study. Furthermore, this study did not conduct an in-depth bibliometric analysis of cancer subcategories. Cancer is a complex class of diseases, and its subcategories may differ in research methods, treatment strategies, and research hotspots. Consequently, the lack of an analysis of the research trends, hotspots, and publication counts for individual cancer subcategories represents another limitation of this study.

Conclusion

Research on computer-aided peptide design has increased consistently and notably over the past 19 years, with a clear upward trajectory since 2006. In recent years, ACP research has focused increasingly on cell-penetrating peptides, which are closely associated with drug delivery and could become a hot direction for future research. Furthermore, peptide vaccines related to immunotherapy also merit attention. However, two significant issues must be addressed in peptide vaccine research: preventing tumor cell immune evasion and avoiding the loss of tumor antigens. The resolution of these issues will be a pivotal factor in the advancement of peptide vaccine research.

The need to accelerate the discovery of ACPs and to reduce costs has made computer-aided peptide design an indispensable tool in this field. Techniques such as molecular docking and MD simulations have provided robust support for predicting ACPs. These methods have enhanced the accuracy of predictions and significantly accelerated the pace of research. Nevertheless, further improving prediction precision while maintaining computational efficiency remains a pivotal challenge for current and future research. With the rapid advancement of deep learning technologies, deep learning-based predictive models for ACPs have continued to emerge and be refined, becoming a hot topic in research. Deep learning has significantly enhanced the accuracy and efficiency of ACP prediction. However, challenges remain in research and application, such as the dependence on large-scale labeled data, limitations in model interpretability, and the demand for high computational resources. Future studies must address these issues to improve the models’ generalizability and practicality.

Abbreviations

AAConv, Attention Augmented Convolutional Neural Network; ACP, Anticancer Peptide; BiLSTM, Bidirectional Long Short-Term Memory; CAM, Channel Attention Module; CDF, Cascade Deep Forest; CNN, Convolutional Neural Network; CNN_Bi-LSTM, Hybrid Convolution and Bidirectional Long Short-Term Memory Network; DNN, Deep Neural Network; GCN, Graph Convolution Network; IARC, International Agency for Research on Cancer; JCR, Journal Citation Reports; KNN, K-Nearest Neighbor; MD, Molecular Dynamics; PDC, Peptide-Drug Conjugate; PLM, Protein Language Model; PPC, Polymer-Peptide Conjugate; QSAR, Quantitative Structure-Activity Relationships; RF, Random Forest; RIIP, Le Reseau International des Instituts Pasteur; RNN, Recurrent Neural Network; SSCI, Social Sciences Citation Index; SVM, Support Vector Machine; WoSCC, Web of Science Core Collection.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis, and interpretation, or all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This work was supported by the Natural Science Foundation of Guangdong Province (No. 2023A1515030055), the Science and Technology Planning Projects of Guangzhou City, China (No. 2024A03J0143, 202201020203, 202201020117), the Characteristic Innovation Project of Education Department of Guangdong (No. 2022KTSCX095), the Medical Science and Technology Research Fund Project of Guangdong Province (No. A2023225), and the Research Capacity Enhancement Program of Guangzhou Medical University (No. 2024SRP154).

Disclosure

The authors declare that they have no competing interests. Figures 1 and 11 were created by FigDraw (www.figdraw.com). Figures 12 and 13 were produced by BioRender (www.biorender.com).

References

1. Bray F, Laversanne M, Sung HYA, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263. doi:10.3322/caac.21834

2. Singh S, Utreja D, Kumar V. Pyrrolo[2,1-f][1,2,4]triazine: a promising fused heterocycle to target kinases in cancer therapy. Med Chem Res. 2022;31(1):1–25. doi:10.1007/s00044-021-02819-1

3. Oun R, Moussa YE, Wheate NJ. The side effects of platinum-based chemotherapy drugs: a review for chemists. Dalton Trans. 2018;47(19):6645–6653. doi:10.1039/C8DT00838H

4. Pan X, Xu J, Jia X. Research progress evaluating the function and mechanism of anti-tumor peptides. Cancer Manage Res. 2020;12:397–409. doi:10.2147/CMAR.S232708

5. Li CM, Haratipour P, Lingeman RG, et al. Novel peptide therapeutic approaches for cancer treatment. Cells. 2021;10(11):2908. doi:10.3390/cells10112908

6. Hilchie AL, Hoskin DW, Power Coombs MR. Anticancer activities of natural and synthetic peptides. Adv Exp Med Biol. 2019;1117:131–147.

7. Muttenthaler M, King GF, Adams DJ, Alewood PF. Trends in peptide drug discovery. Nat Rev Drug Discov. 2021;20(4):309–325. doi:10.1038/s41573-020-00135-8

8. Reubi JC, Schonbrunn A. Illuminating somatostatin analog action at neuroendocrine tumor receptors. Trends Pharmacol Sci. 2013;34(12):676–688. doi:10.1016/j.tips.2013.10.001

9. Nickols NG, Goetz MB, Graber CJ, et al. Hormonal intervention for the treatment of veterans with COVID-19 requiring hospitalization (HITCH): a multicenter, Phase 2 randomized controlled trial of best supportive care vs best supportive care plus degarelix: study protocol for a randomized controlled trial. Trials. 2021;22(1):8. doi:10.1186/s13063-020-04978-9

10. Fosgerau K, Hoffmann T. Peptide therapeutics: current status and future directions. Drug Discov Today. 2015;20(1):122–128. doi:10.1016/j.drudis.2014.10.003

11. Xie M, Liu D, Yang Y. Anti-cancer peptides: classification, mechanism of action, reconstruction and modification. Open Biol. 2020;10(7):200004. doi:10.1098/rsob.200004

12. Basith S, Manavalan B, Shin TH, Lee DY, Lee G. Evolution of machine learning algorithms in the prediction and design of anticancer peptides. Curr Protein Pept Sci. 2020;21(12):1242–1250. doi:10.2174/1389203721666200117171403

13. Cardoso MH, Orozco RQ, Rezende SB, et al. Computer-aided design of antimicrobial peptides: are we generating effective drug candidates? Front Microbiol. 2020;10:15. doi:10.3389/fmicb.2019.03097

14. Huang C, Zhan L. Network pharmacology identifies therapeutic targets and the mechanisms of glutathione action in ferroptosis occurring in oral cancer. Front Pharmacol. 2022;13.

15. Iksen I, Witayateeraporn W, Wirojwongchai T, et al. Identifying molecular targets of Aspiletrein-derived steroidal saponins in lung cancer using network pharmacology and molecular docking-based assessments. Sci Rep. 2023;13(1). doi:10.1038/s41598-023-28821-8

16. Priya S, Tripathi G, Singh DB, Jain P, Kumar A. Machine learning approaches and their applications in drug discovery and design. Chem Biol Drug Des. 2022;100(1):136–153. doi:10.1111/cbdd.14057

17. Shi G, Kang X, Dong F, et al. DRAMP 3.0: an enhanced comprehensive data repository of antimicrobial peptides. Nucleic Acids Res. 2022;50(D1):D488–D496. doi:10.1093/nar/gkab651

18. Wang G, Li X, Wang Z. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 2016;44(D1):D1087–1093. doi:10.1093/nar/gkv1278

19. Pirtskhalava M, Amstrong AA, Grigolava M, et al. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Res. 2021;49(D1):D288–D297. doi:10.1093/nar/gkaa991

20. Agrawal P, Bhalla S, Usmani SS, et al. CPPsite 2.0: a repository of experimentally validated cell-penetrating peptides. Nucleic Acids Res. 2016;44(D1):D1098–1103. doi:10.1093/nar/gkv1266

21. Singh S, Chaudhary K, Dhanda SK, et al. SATPdb: a database of structurally annotated therapeutic peptides. Nucleic Acids Res. 2016;44(D1):D1119–1126. doi:10.1093/nar/gkv1114

22. Sun X, Liu Y, Ma T, Zhu N, Lao X, Zheng H. DCTPep, the data of cancer therapy peptides. Scientific Data. 2024;11(1):541. doi:10.1038/s41597-024-03388-9

23. Tyagi A, Tuknait A, Anand P, et al. CancerPPD: a database of anticancer peptides and proteins. Nucleic Acids Res. 2015;43(Database issue):D837–843. doi:10.1093/nar/gku892

24. Shen ZF, Wu HY, Chen ZS, et al. The global research of artificial intelligence on prostate cancer: a 22-year bibliometric analysis. Front Oncol. 2022;12:16.

25. Encalada IP, Cocom LMC, Bojorquez NDQ, Campos MRS. Bibliometric analysis of the role of bioactive peptides in cancer therapy. Int J Pept Res Ther. 2023;29(4):16.

26. Zhang GY, Zhang YJ, Zhang YW, et al. Global trends in indocyanine green fluorescence navigation in the field of gastric cancer: bibliometrics and knowledge atlas analysis. Quant Imaging Med Surg. 2023;13:7117–7141. doi:10.21037/qims-23-391

27. Li K, Rollins J, Yan E. Web of Science use in published research and review papers 1997–2017: a selective, dynamic, cross-domain, content-based analysis. Scientometrics. 2018;115(1):1–20. doi:10.1007/s11192-017-2622-5

28. Wu P, Mo YZ, Peng M, et al. Emerging role of tumor-related functional peptides encoded by lncRNA and circRNA. Mol Cancer. 2020;19(1):14. doi:10.1186/s12943-020-1147-3

29. Chen W, Ding H, Feng P, Lin H, Chou KC. iACP: a sequence-based tool for identifying anticancer peptides. Oncotarget. 2016;7(13):16895–16909. doi:10.18632/oncotarget.7815

30. Segal NH, Parsons DW, Peggs KS, et al. Epitope landscape in breast and colorectal cancer. Cancer Res. 2008;68(3):889–892. doi:10.1158/0008-5472.CAN-07-3095

31. Wei LY, Zhou C, Chen HR, Song JN, Su R. ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides. Bioinformatics. 2018;34(23):4007–4016. doi:10.1093/bioinformatics/bty451
