Metachromatic leukodystrophy (MLD) is an autosomal recessive lysosomal storage neurodegenerative disorder, pathologically characterized by progressive motor and cognitive dysfunction (Miyake et al., 2021; Lamichhane and Cabrero, 2023; Penati et al., 2017; Singh and Singh, 2024b). MLD is caused by the deficiency of aryl sulfatase A (ARSA), which leads to the accumulation of sulfatide, the main glycolipid involved in stabilizing the myelin sheath equilibrium in the central and peripheral nervous systems (Sessa et al., 2016; Fumagalli et al., 2022). Sulfatide accumulation may cause various pathophysiological conditions, including progressive demyelination, neuroinflammation, communication gap, astrocyte dysfunction, developmental delay, speech disorder, and in severe cases it causes death within 5–6 years of the early onset of the disease (Miyake et al., 2021; Lamichhane and Cabrero, 2023; Fumagalli et al., 2022). Based on the age of onset, MLD is categorized into three major clinical types: late-infantile (≤30 months); juvenile, which is subdivided into early juvenile [30 months–6 years] and late juvenile [7–16 years]; and adult MLD (≥17 years) (Miyake et al., 2021; Fumagalli et al., 2022; Chang et al., 2024). Worldwide, the prevalence of MLD is 1.4–1.8 in 100,000 or 1 in 40,000, bringing it to the loop of rare diseases of greater concern (Lamichhane and Cabrero, 2023; Chang et al., 2024; Shaimardanova et al., 2020).
To date, more than 280 mutations have been reported in the ARSA gene, which poses a major challenge in the success of existing therapeutic options, including gene therapy, hematopoietic stem cell therapy, enzyme replacement therapy, and chaperone therapy, thus making these options costly and, therefore, demanding a case-by-case approach that leads the treatment of this rare disease beyond the reach of the larger population (Shaimardanova et al., 2020; Kurtzberg, 2022; Eichler et al., 2022; Fernández-Pereira et al., 2021; Sevin and Deiva, 2021). Another major obstacle to the success of existing therapies is the blood–brain barrier (BBB) in delivering the therapeutic gene or enzyme to the target site (Miyake et al., 2021). In contrast to existing therapies, which are mainly ARSA-dependent, a new approach of substrate reduction therapy (SRT) can be an alternate therapeutic option for metachromatic leukodystrophy. SRT has shown its success in various single-gene disorders including some key lysosomal storage disorders (Mistry et al., 2023; Komada et al., 2021; Istaiti et al., 2022). SRT focuses on the development of oral drugs to treat pathophysiological conditions. Miglustat and eliglustat are the two FDA-approved oral drugs for Gaucher’s disease (Mistry et al., 2023; Komada et al., 2021; Istaiti et al., 2022).
In MLD, the therapeutic focus in substrate reduction therapy is the development of a specific, potent, and competitive inhibitor which targets the catalytic action of the rate-limiting enzyme, cerebroside sulfotransferase, that is involved in the biosynthesis of sulfatides (Singh and Singh, 2024b; Yaghootfam et al., 2007; Li et al., 2020). To date, the development in substrate reduction therapy has been at the nascent stage in MLD. Experimental data with available CST inhibitors are limited, even though the full-length CST sequence and cDNA are available (Li et al., 2020). The lack of structural information of proteins due to the unavailability of the three-dimensional structure of CST has been a major roadblock in initiating preliminary in silico-based drug discovery, which has great potential in screening drug-like candidates in the shortest possible time with less expenses. The development of the three-dimensional homology model of CST by our group, which used a multipronged modeling and validation approach, is a breakthrough in SRT research in MLD (Singh and Singh, 2024a). This study uses this model for the screening of phytoconstituents from Medhya Rasayana.
Medhya Rasayana is a group of Ayurvedic medicinal herbs with neuroprotective effects against various neurodegenerative diseases and known to improve cognitive function and neural tissue regeneration, retard brain aging, and balance mental health (Kulkarni et al., 2012; Sarokte and Rao, 2013; Rashmi et al., 2017). Medhya Rasayana comprises four major herbs: Mandukaparni (Centella asiatica Linn.), Yastimadhu (Glycyrrhiza glabra Linn.), Guduchi (Tinospora cordifolia Miers), and Shankhpushpi (Convolvulus pluricaulis Chois) (Kulkarni et al., 2012). These herbs are rich in phytoconstituents with unique bioactive properties. This study focuses on identifying potent and specific bioactive compounds as inhibitors against CST through multipronged in silico studies. High-throughput virtual screening is the most common and reliable strategy used to identify the potent and specific inhibitors and potential lead molecules from the diverse dataset of small molecules at low cost and rapid pace using the recent advancement in the field of bioinformatics (Singh et al., 2018; Singh et al., 2017; Yasuo and Sekijima, 2019). With this perspective, the present study opens the door for many futuristic studies to achieve a marketed oral drug for metachromatic leukodystrophy that could bring this hereditary disease within the reach of a possible cost-effective treatment loop.
2 Methodology2.1 ResourcesThe computational study was executed using the high-performance super-computing facility, PARAM Shivay, installed at the Indian Institute of Technology, Banaras Hindu University, Varanasi, India, with a capacity of 837 TFLOPS with Intel(R) Xeon(R) Gold 6148 CPU @2.40 GHz and 40 CPUs per node. This study used key software programs including AutoDock 4.2, GROMACS 2023, Origin 2024, Chimera 1.17.3, PyMOL, Discovery Studio visualizer, VMD, and Bio3D tool with R package and key webtools including PharmMapper, pkCSM, SwissADME, and ProTox 3.0.
2.2 Protein structural analysis and receptor grid generationFor screening of bioactive phytoconstituents, the 3D model of CST developed in our earlier study was used as a receptor. The model comprises 69–336 amino acid residues of the full-length protein with 423 residues and covers the entire catalytic region of the protein with a sulfuryl acceptor substrate (galactocerebroside)-binding site and a sulfuryl donor co-substrate (PAPS)-binding site on a linear horizontal plane (Singh and Singh, 2024a; Singh and Singh, 2024b). The model protein was processed using AutoDock 4.2 by the removal of water molecules and heteroatoms and the addition of hydrogen atoms with proper assignment of atom type and Gasteiger charge to generate a PDBQT file of the protein. A grid box of 90 × 90 × 90 Å and spacing of 0.253 Å around the protein active site was generated considering the key residues LYS82, HIS84, LYS85, HIS141, PHE170, TYR176, PHE177, TYR203, and ARG202, and a grid file (.gpf) was created.
2.3 Ligand preparationWe used a set of compounds from the Indian Medicinal Plants, Phytochemistry, And Therapeutics (IMPPAT 2.0) online database (Vivek-Ananth et al., 2023; Mohanraj et al., 2018). The 3D (.mol2) structures of 81 compounds from C. asiatica Linn (Mandukaparni), 310 from G. glabra Linn (Yastimadhu), 52 from T. cordifolia Miers (Guduchi), and 18 from C. pleuricaulis Chois (Shankhpushpi) were converted to .pdbqt files after energy minimization and assigning proper atom types using AutoDock Raccoon, a virtual screening file preparation tool (Forli et al., 2016).
Following the preparation of the protein, ligands, and grid files, AutoDock Raccoon was further used for the preparation of docking (.dpf) files for each ligand and the subsequent arrangement of all files in a single separate folder for each protein–ligand complex with the generation of a single virtual screening script file (.sh) for performing molecular docking-based virtual screening of all ligands simultaneously under the Linux environment. The parameters used for the generation of .dlg files for docking run were 100 GA run, a population size of 300, maximum number of generations of 27,000, and maximum number of evaluations of 25,000,000. AutoDock applied the Lamarckian genetic algorithm and a gradient-based local search method for protein–ligand interactions. The pKi and ligand efficiency are two major parameters for assessing the binding potential of ligands in the active site. The best-docked conformation was selected, processed using custom Python scripts, and visualized using .pdb visualization tools including PyMOL and Discovery Studio Visualizer to analyze protein–ligand interactions and ligand-binding patterns in the active site.
2.4 In silico drug-likeness properties, ADME, and toxicity analysisThe selected top hits based on the binding score and number of conformations in the largest cluster were subjected to pharmacokinetic property analysis using pkCSM, SwissADME, and ProTox webservers (Banerjee et al., 2018; Daina et al., 2017; Pires et al., 2015). The canonical Simplified Molecular Input Line Entry System (SMILE) of the selected compounds was used as the entry data in these servers for their pharmacokinetic and drug-likeness property analysis through adsorption, distribution, metabolism, and excretion along with toxicity studies.
2.5 Molecular dynamics simulationsAll atom molecular dynamic (MD) simulations were performed using the GROMACS 2023.1 software package using the CHARMM27 all-atom additive force field. For the simulation, a dodecahedron simulation box was created with a minimum distance of 1.2 Å from the box edge, and periodic boundary conditions were applied to minimize the edge effect. Each box was solvated with the TIP3P water solvation model, and the charges on the system were then neutralized by the addition of chloride (Cl−) ions, and thereafter, the solvated system was energy-minimized using the steepest descent algorithm. The energy-minimized system was then equilibrated to 1,000-ps (1 ns) NVT simulation with a time step of 2 fs at 300 K. Next, the system was equilibrated to 1,000-ps NPT simulations with a time step of 2.0 fs at 300 K. The LINCS algorithm was used to constrain the bond lengths. The final MD simulation run was performed for 100 ns, and trajectories were analyzed using different MD parameters including root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), solvent-accessible surface area (SASA), hydrogen bonds, principal component analysis (PCA), and dynamic cross-correlation matrix (DCCM) analysis of the protein–ligand complexes.
2.6 Cross-target predictionThe PharmMapper server was used to identify a wide range of targets using an innovative reverse pharmacophore mapping approach (Wang et al., 2017; Liu et al., 2010). The best mapping poses of the submitted molecule (.mol2) were aligned against all target proteins available in PharmTargetDB. The algorithm used to perform the reverse pharmacophore matching protocol comprises a sequential combination of triangle hashing (TriHash) and genetic algorithm (GA) optimization (Liu et al., 2010). Based on the calculated highest fit-score (cutoff >5.0) between the small compound and the pharmacophore models, the probable protein targets were ranked. It was imperative to check the disease-causing potential of the targets.
3 Results3.1 Molecular docking-based virtual screeningVirtual screening is a computational approach used to screen libraries of compounds available in databases against the target protein to identify a potential drug candidate for a targeted disease (Singh et al., 2018; Singh et al., 2021; Zare et al., 2024; Pirolli et al., 2023). In the CST-led research to develop substrate reduction therapy for metachromatic leukodystrophy, the state-of-the-art in silico approach could be the most promising and directional method for future studies. Toward this approach, the development of a homology model of the CST protein using various computational algorithms is an important step (Singh and Singh, 2024a). In the present study, this 3D model was utilized for screening specific and potent phytoconstituents of four major herbs of Medhya Rasayana. Out of a total of 461 bioactive compounds, 81 compounds from C. asiatica Linn., 310 from G. glabra Linn., 52 from T. cordifolia Miers, and 18 from C. pleuricaulis Chois were screened against CST, which are detailed in Supplementary Tables S1–S4. In the initial level of screening, using the lowest free energy of a binding cutoff of ≤ −7.5 kcal/mol and the number of conformation cutoff of ≥75 in the largest conformation cluster, the top 15 compounds were selected, i.e., 3 from C. asiatica and 12 from G. glabra. Compounds belonging to T. cordifolia and P. Chois were found to be less potent and less specific toward CST and, hence, were not considered for further study. The docking score of these top 15 compounds falls between −7.57 and −10.32 kcal/mol. With a ligand efficiency of ≤0.19 per non-hydrogen atom, the top 15 compounds were considered promising for drug-likeness property analysis. Apart from α- and β-carotene, all 13 compounds strictly followed the Lipinski rule of 5 parameters in terms of molecular mass (<500), number of hydrogen bond donors (<5), hydrogen bond acceptors (<10), number of rotatable bonds, and topological surface area (TPSA). Despite failing at the Lipinski rule of five parameters, α- and β-carotene were considered for further studies because of their wider therapeutic potential in neurodegenerative diseases, and these molecules have already been tested in the human body (Banerjee et al., 2023; Abrego-Guandique et al., 2023; Hira et al., 2019; Narisawa et al., 1996). Additionally, these compounds showed good lipophilicity with a log p-value > 0, which is imperative for compounds to cross the lipophilic membrane and, thus, strengthen the movement of these molecules to the target site. Therefore, all 15 hits were considered for absorption, distribution, metabolism, and excretion (ADME) and toxicity analysis to eliminate false positives from this list, which is provided in Table 1.
Table 1. Binding score and physiochemical properties of the top 15 compounds.
3.2 Absorption, distribution, metabolism, excretion, and toxicity profilingAfter the administration of the drug in the body, it goes through absorption, distribution, metabolism, and excretion processes. In this process, the drug interacts with desirable or undesirable targets and causes a pharmacological impact (Zhang and Tang, 2018). Thus, the bioavailability of a drug depends on the rate of absorption, metabolism, and transportation of the drug to the target site (Zhang and Tang, 2018; Stielow et al., 2023). Human intestinal absorption and lipophilicity of compounds are other important parameters to be considered to ensure the easy diffusion of compounds. The cytochrome p450-based drug response to body metabolism exhibits genetic variability and plays a critical role in the detoxification of drugs and homeostasis (Zhao et al., 2021). In this study, CYP2C9, CYP2D6, and CYP3A4 inhibitors were used to analyze the drug response at the metabolism stage. Renal OCT2 is a renal transporter that plays an important role in determining the renal clearance or deposition of drugs (Zou et al., 2021; Wright, 2019). Except few, most of the selected compounds showed good intestinal absorption capacity, along with body metabolism and renal excretion potential. Since MLD is a brain disorder, the major deciding parameters for the potential drug candidates were BBB permeability, which screened four compounds—IMPHY012226 (18alpha-glycyrrhetinic acid), IMPHY012473 (lupeol), IMPHY011609 (alpha-carotene), and IMPHY011707 (beta-carotene)—with considerable blood–brain barrier permeability. In order to select a potential lead molecule, in silico toxicity study becomes another crucial criterion due to its accuracy, accessibility, and rapidity in preclinical-level screening and providing a potential lead scaffold for further optimization (Raies and Bajic, 2016; Hemmerich and Ecker, 2020; Yang et al., 2018). pkCSM and ProTox 3.0 are freely available in silico toxicity study servers that use the SMILE of each compound to evaluate various toxicity parameters including Ames mutagenicity, hepatotoxicity, and cytotoxicity (Banerjee et al., 2018; Pires et al., 2015). These four phytoconstituents belonged to class “4” of the ProTox-predicted toxicity class based on their LD50 range between 560 and 2,000 mg/kg, suggesting that the selected four compounds are mainly nontoxic. Table 2 provides the details of ADME and toxicity analysis of the top 15 hits, and Figure 1 shows the structural details of the best 4 compounds screened through ADME and toxicity analysis.
Table 2. Pharmacological profile of the top 15 ligand molecules that were derived from pkCSM, SwissADME, and ProTox webservers.
Figure 1. Top four selected compounds. (A) α-carotene. (B) β-carotene. (C) 18alpha-glycyrrhetinic acid. (D) lupeol.
3.3 Pre-MD/docking-based protein-ligand interaction pose analysisProtein–ligand interaction analysis was imperative to obtain an insight into the binding pattern of compounds in the substrate-binding site of the CST protein. The substrate-binding site comprises a left-end polar site dominated by Lys85, Phe170, Lys82, and Ser88; a middle part flanked on both sides by His84 and His141; and an aromatic site at the right end dominated by Tyr203 and Phe170. With the lowest free binding energy of −10.32 kcal/mol and 98 similar conformations in the largest cluster, 18alpha-glycyrrhetinic acid was found to be the most potential ligand with the highest binding affinity among the 4 in the binding pocket of the CST protein. The compound consists of five steroidal rings with a carboxylic group attached at one end and a hydroxyl group at the other end. These ends determined the orientation of the compound toward the polar site based on the degree of polarity of the carboxylic group, which interacted with Lys85 by hydrogen bonding. One oxygen atom was attached by a double bond at the third ring in the middle of the compound positioned at the middle of the active site and was flanked on both sides by His84 and His141 and formed a hydrogen bond with His84. At the right side of the binding pocket, methyl groups located on the rightmost ring of the compound interacted with the residues Tyr203 and Phe177 via pi–sigma and pi–alkyl bonds, respectively (Figure 2I). Both α-carotene and β-carotene complexes shared similarity in the interaction pattern in the CST-binding pocket. α-Carotene and β-carotene are isomers, differing in the position of the double bond on the ring at one end oriented toward the polar site in the binding pocket. Lys82 and Lys85 played a critical role in the interaction with the isomeric fragment of these compounds. In α-carotene, the methyl group at the C2 position was oriented toward Lys85 and, thus, interacted with it, whereas in β-carotene, the methyl group at the C2 position was oppositely oriented and away from Lys85. In the CST–β-carotene complex, the two methyl groups at the C6 position were oriented toward Lys85 and interacted via the pi–alkyl interaction, whereas in the CST–α-carotene complex, the methyl groups at the C6 carbon were positioned away from Lys85 and interacted with Lys82. The aliphatic chain between the two aromatic rings interacted in a nearly similar fashion in both complexes with key residues Phe177, Tyr203, phe170, and His84, while Arg202 interacted with the aromatic ring on the right side of the binding pocket (Figures 2II, III). In contrast to the other three compounds that broadly occupied the active site core, in the CST–lupeol complex, the ligand predominantly occupied the right side of the binding pocket, which might be due to its small size and the missing of a polar carboxylic group at one end as it was in 18alpha-glycyrrhetinic acid. The lack of occupancy of the polar site by lupeol is due to its five-carbon nonpolar ring that interacted strongly at the aromatic site of the binding pocket (Figure 2IV), while the six-membered polar ring in 18alpha-glycyrrhetinic acid stretched the compound toward the polar site and, thus, was strongly accommodated in the active site (Figure 2I). The details of the interaction types are given in Table 3. To further understand the protein–ligand interaction under a dynamic environment, all four compounds were considered for molecular dynamic simulations.
Figure 2. Interaction of the top four docked complexes, including (A) surface view (upper panel), (B) 3D view (middle panel), and (C) 2D view (lower panel) of the protein–ligand interaction at the substrate-binding site of CST protein in (I) CST–18alpha-glycyrrhetinic acid, (II) CST–α-carotene, (III) CST–β-carotene, and (IV) CST–lupeol complexes. In the 2D view, conventional hydrogen bond, pi–sigma, pi–alkyl, and van der Waals interactions are depicted in green, violet, pink, and light green, respectively.
Table 3. Non-bonded interaction in docked protein–ligand conformation.
3.4 Molecular dynamic simulationMD simulation is a computational approach used to analyze and optimize the overall stability of the protein–ligand complexes under atomistic simulation conditions with a dynamic aqueous environment (Guterres and Im, 2020; Kurniawan and Ishida, 2022; Khan et al., 2016). MD simulation provides a cumulative idea about the movement of every atom or atom in the protein over the simulation time span to study important biological processes, including the impact of ligand binding on the overall protein dynamics and the way the macromolecule responds at the atomic level with the binding or unbinding of the ligand (Alrouji et al., 2023; Hollingsworth and Dror, 2018). In this study, to understand the in-depth dynamic behavior of the protein–ligand interaction, MD simulation was carried out for a time span of 100 ns to evaluate the strength and stability of the protein–ligand complexes through trajectory analysis with various parameters including RMSD, RMSF, Rg, SASA, and hydrogen bonding. We also carried out principal component analysis, free energy landscape analysis, and dynamic cross-correlation analysis to understand the dominant motions responsible for the binding pattern and stability of the protein–ligand complex.
3.4.1 Structural deviation and flexibility with RMSD and RMSF analysisThe RMSD measures the conformational deviation of the protein structure from the initial docked conformation to the final conformation under the dynamic aqueous environment over the simulation time span to ensure the stability of the predicted protein–ligand complex after ligand binding (Zare et al., 2024; Aier et al., 2016). In this study, in a simulation time of 100 ns, the CST–18alpha-glycyrrhetinic acid and CST–α-carotene complexes were found to be the most stable with negligible fluctuation, while the CST–β-carotene complex showed overall stability with little fluctuation at 40 ns The CST–lupeol complex showed initial rapid fluctuation, but after 25 ns, it also achieved stability. The average deviation found for CST, CST-GC, CST–18alpha-glycyrrhetinic acid, CST–α-carotene, CST–β-carotene, and CST-Lupeol was 0.49, 0.76, 0.57, 0.89, 0.92, and 0.67, respectively. Thus, RMSD simulation showed that the four complexes of CST could maintain overall structural stability during the simulation and no significant conformational changes occurred in the protein structure after ligand binding as the RMSD was maintained throughout the simulation either between the RMSD of free CST and the CST-GC complex or closer to the RMSD of the CST–GC complex, which is a good indication toward achieving competitive inhibition (Figures 3IA–D) (Singh and Singh, 2024a).
Figure 3. Structural dynamics of CST in complex with (A) α-carotene, (B) β-carotene, (C) 18alpha-glycyrrhetinic acid, and (D) lupeol. (I) Root mean square deviation (RMSD) plot over time, highlighting any deviations in the spatial structure of CST over a time scale of 100 ns from its original conformation; the plot used a protein backbone for considering deviation throughout simulation. (II) Root mean square fluctuation (RMSF) of residues in CST under different ligand-bound states during a time scale of 100 ns.
Following the protein conformational deviation studies with RMSD, protein structural fluctuation at the residue level was analyzed with RMSF calculation of the protein–ligand complex (Martínez, 2015; Paul et al., 2022). In this study, the fluctuations in the residues of the CST protein in the presence of the selected ligand were compared with the RMSF of the free CST and CST-GC complex. The average RMSF of free CST, CST-GC, CST–18alpha-glycyrrhetinic acid, CST–α-carotene, CST–β-carotene, and CST–lupeol was 0.15, 0.17, 0.18, 0.176, 0.18, and 0.23, respectively. As depicted in (Fig 3IIA–D), a wider residual fluctuation was clearly observed in free CST between the amino acid residues 175 to 190, which represents the loop region in the substrate-binding site, while in the CST–GC complex, residual fluctuation was minimized significantly, as interaction with the ligand may provide less room for fluctuation (Singh and Singh, 2024a). Among the four test complexes, CST–18alpha-glycyrrhetinic acid and CST–β-carotene showed relatively better stability between amino acid residues 180 and 190, while in the CST–lupeol complex, the fluctuation in this region was close to that of the free CST. Overall, the CST–18alpha-glycyrrhetinic acid and CST–β-carotene complexes showed residual fluctuation close to that of the CST–GC complex, which is again a good indication of the competitive inhibition of CST. The CST–lupeol complex was found to be least stable at the residual level.
3.4.2 Structural compactness with Rg, SASA, and hydrogen bonding analysisThe next level of screening was to evaluate changes in the size or structural compactness of the protein in the presence of ligands during the simulation by measuring Rg of the protein–ligand complex (Lobanov et al., 2008; Rampogu et al., 2022; Funari et al., 2022). Rg also estimates the flexibility of the protein under complex formation by comparing it with Rg of either the free protein or the substrate-bound protein. The larger Rg in the free CST indicated lesser compactness and greater flexibility, whereas the smaller Rg in the CST–GC complex indicated greater compactness and rigidity in the protein structure in the presence of the substrate. As depicted in Figure 4A, a similar trend was visible with inhibitor binding. Of the four, the Rg of the CST–β-carotene complex was most closely related to the Rg of the protein–substrate complex. In the CST–18alpha-glycyrrhetinic acid complex, structural compactness of the protein was between that of free CST and the CST-GC complex, suggesting that the presence of the ligand slightly compressed the protein structure. The Rg values of the CST–α-carotene and CST–lupeol complexes were found to be between the Rg of the free CST and the Rg of the CST-GC complex, also suggesting that the protein structure takes on stiffness and compactness in the presence of these ligands (Figure 4A).
Figure 4. Structural compactness analysis of CST in the CST–18alpha-glycyrrhetinic acid, CST–α-carotene, CST–β-carotene, and CST–lupeol complexes. Time evolution of (A) Rg values and (B) SASA values during the simulation.
The SASA measures the solvent-accessible surface area of the free protein or protein in the protein–ligand complex. It basically calculates the surface area of molecules that is exposed to solvent molecules (Alrouji et al., 2023; Durham et al., 2009; Ali et al., 2014; Savojardo et al., 2021). The SASA analyzes how parts of the protein come into contact with the solvent over the simulation time. The average SASA range for the selected compounds was between 170 and 180 nm2, which falls between the range of the SASA of free CST (180.1 nm2) and that of CST complexed with the substrate (173.07 nm2) (Singh and Singh, 2024a). The average SASA of the CST–18alpha-glycyrrhetinic acid, CST–α-carotene, CST–β-carotene, and CST–lupeol complex was 175.83 nm2, 180.89 nm2, 175.38 nm2, and 175.30 nm2, respectively (Figure 4B). Of the four hits, the SASA of CST–18alpha-glycyrrhetinic acid, CST–β-carotene, and CST–lupeol falls within the expected range of the free CST and CST–substrate complex, suggesting better compactness of the protein in the presence of these ligands with relatively lesser unwanted solvent accessibility. The SASA of the CST–α-carotene complex was close to the SASA of free CST, suggesting that ligand binding has a negligible impact on the solvent accessibility of the protein structure. Overall, the SASA of the protein–ligand complex of the selected compounds showed no major changes in the exposed protein structure after protein–ligand binding and maintained the natural structural integrity of the protein.
Furthermore, intramolecular hydrogen bonding analysis was vital for understanding the stability of the protein–ligand complex as ligand binding impacts the overall protein intramolecular dynamics (Pace et al., 2014; Hubbard and Haider, 2010). Compared to the free CST protein, substrate binding to the substrate binding site in CST slightly increased the intramolecular hydrogen bonding, suggesting that when binding to the active site, the substrate pushed the protein structure inward, facilitating the proximity of atoms and, thus, facilitated additional contacts at the intramolecular level. Among the protein–ligand complexes of the selected compounds, the CST–18alpha-glycyrrhetinic acid, CST–β-carotene, and CST–lupeol complexes showed a negligible impact on the intramolecular hydrogen bond dynamics of the CST protein, suggesting the stability of protein intramolecular dynamics with ligand binding (Figures 5A, C, D). Probability distribution function (PDF) plots show good consistency of these three complexes. The CST–α-carotene complex was found to be the least stable and most deviated complex, and the protein structure in this complex even showed negatively more flexibility than that of the free CST (Figure 5B).
Figure 5. Time evolution of intramolecular hydrogen bond analysis in CST in (A) CST–18alpha-glycyrrhetinic acid, (B) CST–α-carotene, (C) CST–β-carotene, and (D) CST–lupeol complexes. The lower panel represents the probability distribution function (PDF) plot of each complex.
Thus, at the structural compactness level, the CST–18alpha-glycyrrhetinic acid, CST–β-carotene, and CST–lupeol complexes were found to be relatively more stable than the CST–α-carotene complex. In the presence of these three compounds, the CST protein underwent the expected natural stiffness and compactness phenomenon, as observed in the case of protein–substrate binding.
3.4.3 Principal component analysisThe specific catalytic action of each protein is executed through coordinated and collective atomic motion, which determines the stability of the protein (Zare et al., 2024; Yuan et al., 2024). PCA is widely used to analyze the atomic-level conformational changes through ligand binding. Basically, the principal components are the dominant mode of motion of the system that determine the structural and dynamic properties of the protein–ligand complex (Zare et al., 2024; Alrouji et al., 2023). In this study, principal components for the four selected CST–ligand complexes were largely determined by the first two (PC1 and PC2) and the first three eigenvectors (PC1, PC2, and PC3), which reflected the overall dynamics of the molecular subspace of the CST protein bound with the selected ligands (Figures 6A, B). With eigenvector 1 spanning between −3.3 and 4.5, eigenvector 2 between −3.07 and 3.09, and eigenvector 3 between −3.23 and 4.26, the CST–18alpha-glycyrrhetinic acid complex showed better compactness among the four. In the CST–β-carotene complex, eigenvector 1 covers −6.18 to 3.87, eigenvector 2 was between −3.3 and 2.7, and eigenvector 3 was between −1.7 and 3.4. With eigenvector 1 valued between −4.6 and 5.1 nm, eigenvector 2 from −7.3 to 4.1, and eigenvector 3 from −2.0 to 2.7 nm, the CST–α-carotene complex was found to be the most dispersed among the four, followed by the CST–lupeol complex, of which eigenvector 1 ranged between −4.3 and 5.9, eigenvector 2 from −2.6 to 3.8 nm, and eigenvector 3 from −3.6 to 4.6. In the first two eigenvectors, CST–18alpha-glycyrrhetinic acid, CST–β-carotene, and CST–lupeol complexes fell within the expected range of free CST and CST complexed with its substrate (GC) and showed relatively less motion and higher stability, while the CST–α-carotene complex was found to be the most diverted and less stable complex (Figure 6A). Subsequently, comparing the results of the first three principal components, we found that all four complexes primarily fall within the expected range of the free CST and CST–substrate complex, with slight diversions observed in the CST–lupeol complex (Figure 6B). Thus, in both sets of eigenvectors, CST–18alpha-glycyrrhetinic acid and CST–β-carotene demonstrated relatively better stability because their principal components occupied a smaller subspace. Thus, these findings further strengthen the candidature of 18alpha-glycyrrhetinic acid and β-carotene as inhibitors for CST.
Figure 6. Principal component analysis of conformational projections of free CST and its complexes substrate (violet), 18alpha-Glycyrrhetinic acid (dark yellow), α-carotene (red), β-carotene (light green), and Lupeol (blue). (A) Projection of the first two eigenvectors 1 and 2 (B) projections of first three eigenvectors and plot between PC1 and PC3.
3.4.4 Free energy landscape analysisThe free energy landscape (FEL) analysis is primarily applied to understand the folding pattern of proteins and the impact of ligand binding on the structure of the protein (Plattner and Noé, 2015; Lau and Roux, 2007; Khan et al., 2019; Maisuradze et al., 2010). In this study, the first two eigenvectors were used from the principal component analysis to generate the FEL plot of each protein–ligand complex based on the dominant and stable conformation during a simulation of 100 ns. In contrast to the free CST, which showed relatively increased conformational space with global minima of 16.60 kJ/mol for attaining a stable structure, the conformational space of CST-GC complex slightly shifted with global minima of 17.10 KJ/mol (Singh and Singh, 2024a). In this study, in the spectrum of the free energy landscape, dark blue signified the most energetically favored region, and green depicted the moderately favored region, where the conformation of the protein could maintain its stability, while yellow represented the relatively unfavorable region and high energy state. According to the FEL plot, CST–18alpha-glycyrrhetinic acid, CST–β-carotene, and CST–lupeol complexes showed a wider blue zone with a global minima of 16.40 kJ/mol, 17.20 kJ/mol, and 15.10 kJ/mol, respectively (Figures 7A, C, D). The global minima of CST–18alpha-glycyrrhetinic acid and CST–β-carotene complexes were within the range of the global minima of the free CST (16.60 kJ/mol) and CST-GC complex (17.10 kJ/mol) (Singh and Singh, 2024a), thus showing potential for competitive inhibition in terms of energy requirement for binding. Among the four hits, the FEL of the CST–α-carotene complex with a global minima of 19.00 kJ/mol was found to have the least stable protein–ligand interaction (Figure 7B). Overall, the free energy landscape of the protein–ligand complexes provided a valuable insight into the interaction potential of the ligand and the stability of the protein in its bound form.
Figure 7. Free energy landscape of (A) CST–18alpha-glycyrrhetinic acid, (B) CST–α-carotene, (C) CST–β-carotene, and (D) CST–lupeol complexes.
3.4.5 Dynamic cross-correlation matrix analysisUsing the MD trajectory data, a DCCM analysis was performed to understand the dominant correlation network of amino acid residues in the CST protein in the presence of ligands (Du et al., 2010; Avti et al., 2022). The dynamic correlation mapping of protein residues in the presence of different ligands indicated the impact of the ligand on the overall protein structure conformational stability. For the DCCM study of the four complexes, the MD simulation structure in the last 10 ns was considered to obtain an insight into the protein–ligand complex. In the 2D matrix of DCCM analysis, the dark blue region represents the highly correlated motion of residues in the positive direction, the white region represents the highly anti-correlated motion of residues, and the no-color region indicates no correlation in the motion of the residue. Protein residues in the CST–18alpha-glycyrrhetinic acid complex were considered highly correlated because they occupied a wider area of dark blue (Figure 8A). The protein residues in the CST–β-carotene complex showed a relatively better residue correlation than that in the CST–α-carotene complex (Figures 8B, C). The residues in the CST–lupeol complex were found to have the least correlated residues as the dark blue region is relatively sparse among the four protein–ligand complexes, and the large area in this complex is uncolored, indicating no significant correlation and reduced relation with the target protein (Figure 8D). Overall, the correlated motion of the residue analyzed in this study showed the quality of the protein–ligand complex. The DCCM analysis of residues bound to 18alpha-glycyrrhetinic acid revealed the strongest correlation with the protein among all four complexes, thus indicating a strong protein–ligand interaction. In line with other MD trajectory analysis results, β-carotene also showed positive results in the DCCM analysis, showing a strong correlation with protein residues.
Figure 8. Dynamic cross-correlation map for the protein complexed with (A) 18alpha-glycyrrhetinic acid, (B) α-carotene, (C) β-carotene, and (D) lupeol, using the Bio3D package in RStudio.
3.5 Post-MD protein–ligand interaction pose analysisPost-simulation interaction analysis was an attempt to understand the binding pattern of the selected compounds under a dynamic aqueous environment, where the protein was free to take its most stable state. Of the four hits, 18alpha-glycyrrhetinic acid successfully retained its interaction with key residues, as well as its orientation and positioning in the binding pocket of the protein. In the CST–18alpha-glycyrrhetinic acid complex, as in the docked conformation, C=O of the carboxylic group formed a hydrogen bond with Lys85 in the polar site. In the middle of the binding pocket, His84 and His141 flank the compound on both sides, and hydrogen bond formation took place with His84, while Tyr203 at the aromatic side interacted with the compound through two pi–alkyl bonds (Figure 9A). Additionally, Ser173, as in the docked conformation, retained van der Waals interaction with the C=O group of the third aromatic ring of the compound. Thus, the CST–18alpha-glycyrrhetinic acid complex showed a strong protein–ligand interaction. During simulation, β-carotene also largely maintained a similar orientation in the binding pocket, with a slight change in the interaction pattern. Lys85 and Tyr176 in the left polar region, His84 and Phe177 in the middle, and Tyr203, Phe170, and Arg202 at the right end of the binding pocket determined the positioning of β-carotene in the binding pocket (Figure 9C). In contrast to the similarity in the docked conformation of CST–α-carotene and CST–β-carotene complexes, α-carotene was found to be the least stable in the binding pocket during the simulation. The isomeric aromatic ring of α-carotene shifted outward from the polar region away from Lys85, which was a key interacting residue in the docked conformation (Figure 9B). The simulation-led structural shift is shown in pre- and post-MD aligned complexes (Figure 10B). As in the docked conformation, the CST–lupeol complex maintained its position in the binding pocket, occupying mostly the right side of the binding pocket. Due to its relatively small size, lupeol left major room for ligand flexibility, which might not be suitable for competitive inhibition (Figure 9D). The distance of most of the pi-interactions was relatively short in both the CST–18alpha-glycyrrhetinic acid and CST–β-carotene-simulated complexes than their respective docked complexes, suggesting the strongest binding potential of 18alpha-glycyrrhetinic acid and the second strongest binding potential of β-carotene in the protein-binding pocket in the dynamic environment (Tables 3 and 4). Therefore, based on the interaction pattern analysis and alignment of pre- and post-MD complexes, 18alpha-glycyrrhetinic acid and β-carotene were found to be suitable candidates for further reverse pharmacophore analysis.
Figure 9. Post-MD interaction analysis of protein–ligand complexes with a surface view; 3D and 2D protein–ligand interaction of (A) CST–18alpha-glycyrrhetinic acid, (B) CST–α-carotene, (C) CST–β-carotene, and (D) CST–lupeol. Dark green, violet, light pink, and light green represent conventional hydrogen bonds, pi–sigma bonds, pi–alkyl bonds, and van der Waals interactions, respectively.
Comments (0)