A second life for the crystallographic structure of Berenil-dodecanucleotide complex: a computational revisitation thirty years after its publication

Hardware overview

Molecular modeling operations for the structure preparation and the setup for molecular dynamics (MD) were carried out on a Linux workstation (Linux 20.04) comprising a 25-core Intel Core i9-9820 × 3.3 GHz processor. Molecular dynamics simulations (TTMD and SuMD) were computed on a GPU cluster composed of 30 NVIDIA devices ranging from GTX980 to RTX4090.

Structure preparation

The experimentally determined three-dimensional coordinates of the Berenil-Dodecanucleotide complex were retrieved from the Protein Data Bank [2] (PDB ID: 2DBE [1]). The structure was prepared by exploiting several tools from the Molecular Operating Environment (MOE) 2022.02 suite [18]. In particular, the “Structure Preparation” tool was adopted to check and fix the inconsistencies deriving from the original structure. Then, each water molecule was removed, and the “Protonate3D” tool was used to add missing hydrogens according to the prevailing tautomeric and protomeric state (pH = 7.4, T = 310 K, i.f. = 0.154). Finally, hydrogen atoms were energy minimized, and partial charges were assigned according to the AMBER14:ETH force field. The prepared structure was kept for molecular dynamics and docking calculations. The Berenil ligand underwent a comprehensive preparation process for computational analysis, utilizing various tools from the OpenEye suite [19]. Three-dimensional coordinates of the molecule were generated with Omega [20], and the protonation states of ionizable groups were then determined with Fixpka [21]. Lastly, partial charges were assigned to the atoms within the ligand database thanks to Molcharge [22].

System setup and equilibration protocol

The systems obtained were processed for MD simulations, exploiting several tools from Visual Molecular Dynamics (VMD) 1.9.2 [23], and Ambertools22 [24]. The parameters for each nucleic atom were assigned according to the DNA OL15 force field [25]. The ligand was instead parametrized according to the General Amber force field (GAFF) [26]. Each system was solvated in a rectangular box using the TIP3P water model [27], keeping a padding of 15 Å. The net charge of the system was then neutralized through the addition of Na+ and Cl− ions, reaching a salt concentration of 0.154 M. Afterwards, 500 steps of energy minimization through the conjugate gradient algorithm were performed to reduce clashes and bad contacts.

Before entering the production phase, a two-step equilibration protocol was carried out.

The first equilibration stage of 0.1 ns was performed in the canonical ensemble (NVT), applying a 5 kcal mol−1 Å−2 harmonic positional restraint to each DNA and ligand atom. The second equilibration consists of a 0.5 ns simulation in the isothermal-isobaric ensemble (NPT), where the same force constant is used to constrain only the ligand and the backbone of the DNA, and the pressure is fixed at 1 atm through a Monte Carlo barostat [28]. The temperature for all the equilibration phases was kept constant through a Langevin thermostat [29], keeping the value at 310 K for SuMD and 350 K for TTMD.

Each MD simulation was conducted by exploiting the ACEMD 3.5 engine [30] based on the open-source Python library OpenMM [31]. An integration timestep of 2 fs was used, and the M-SHAKE algorithm was applied to constrain the length of the bonds involving hydrogen atoms. The particle-mesh Ewald method [32] was exploited to compute electrostatic interactions using cubic spline interpolation. A 9.0 Å cutoff was applied for the calculation of Lennard-Jones interactions.

SuMD

Supervised Molecular Dynamics (SuMD) [10] is an enhanced sampling MD method used to explore ligand-target recognition pathways on the nanosecond timescale. This unbiased technique allows for the sampling of infrequent events, such as the binding one, reducing the simulation time and related costs of classical MD. SuMD is based on a series of short MD simulations followed by an evaluation of the simulation progress by a tabu-like algorithm. Every subsequent short simulation defined as a SuMD-step is computed in the canonical ensemble at 310 K for 500 ps.

The distance between the ligand and the binding site is monitored during each step, specifically the distance between the center of mass of the ligand and the center of mass of the user-defined binding site residues is calculated and fitted into a linear function. If the slope of the fitted line is negative, the ligand is approaching the binding site, and the step is considered productive and retained to generate the final trajectory. In the opposite scenario, whenever the slope is positive, indicating the distancing between the ligand and the binding site, the step is discarded and repeated, exploiting the Langevin thermostat to randomly reassign particle velocities, starting from the final coordinates of the previous SuMD-step. The supervision algorithm operates until the distance between the centers of mass reaches a threshold value (9 Å, in this case). As the distance decreases, once it crosses the threshold value, the algorithm is switched off, and the simulation proceeds further with 10 ns of classical MD.

TTMD

Thermal Titration Molecular Dynamics (TTMD) [11, 12] is an enhanced sampling MD Python tool useful for the evaluation of complex unbinding kinetics. This technique, initially developed to compare protein-ligand complexes considering the persistence of the native intermolecular interactions, is now applicable to nucleic acid-small molecule complexes.

TTMD is based on a series of short classical MD simulations performed at progressively increasing temperatures, defined as TTMD-steps. The user defines the length of each step and the total explorable temperature ramp. Throughout the simulation, the program checks for changes in the native binding mode of the ligand through the evaluation of the interaction fingerprints of each frame. If the original binding mode is retained, the simulation proceeds until the end temperature is reached; on the contrary, if the binding mode is lost during a TTMD-step, the simulation is stopped.

In this work, the starting temperature is 250 K, while the end temperature is 400 K. The temperature increase between every TTMD-step is 10 K, and the length of each simulation window is 10 ns. ProLIF [33] interaction fingerprint Python package has been exploited to compute the interaction fingerprint of the ligand for each frame, allowing the evaluation of the binding mode retention operated by a fingerprint-based scoring function (IFPCS) [34].

Trajectory analysisSuMD

SuMD trajectories were carried out exploiting the ‘NAMD Energy’ plugin for VMD [35]. The interaction energy, defined as the sum of van der Waals and electrostatic contributions, was calculated through this package according to the force field.

TTMD

During a TTMD simulation, every frame generated is computed with the ProLIF Python package, and the interaction fingerprint of each frame is compared to the last frame of the second equilibration. The cosine similarity metric is applied to evaluate the persistence of the original binding mode throughout the simulation. The value obtained is multiplied by −1, following the scale of most scoring functions. The interaction fingerprint cosine similarity (IFPCS) is a value that ranges from −1 to 0, where −1 highlights the persistence of the original binding mode. On the other hand, 0 indicates that the native binding mode is lost. The IFPCS is computed for the last 10% of each TTMD-step, if the value is above −0.05 the simulation is stopped as all the native interactions are lost.

The average IFPCS score for each TTMD-step is plotted against the temperature in the “Titration Profile” plot. In this graph, the slope of the straight line that connects the first and last IFPCS point, defined as the “MS coefficient” is a proxy measure for the stability of the complex.

This coefficient can range from 0 to 1. Specifically, 0 is indicative of a lasting conservation of the binding mode, and 1 indicates a labile binding mode.

The mathematical formulation of the MS score is reported in the equation below.

$$=\frac}_}^}-(-1)}^}-^}}$$

For both the minor and major groove complexes, five independent TTMD simulations were carried out. Following the calculated MS score, the two simulations with the highest and lowest values were discarded, and the mean MS value was calculated among the three remaining trajectories, leading to the selection of the one with the closest value to the mean.

A second graph, defined as the “Titration timeline” is generated. This plot displays the time-dependent evolution of the IFPCS scores throughout the simulation and both the ligand and the DNA RMSD computed by exploiting the MDAnalysis Python package [36, 37].

The “Titration Timeline” plot reports the time-dependent evolution of the ligand, binding site, and receptor’s backbone RMSD and the IFPCS score.

AquaMMapS

AquaMMapS [13] is a Python tool developed to map hydration sites according to the occupancy rate during MD simulations. This program performs a grid-based inspection of the frequency of occupation of the water molecules by creating cells designed to accommodate only one water molecule. For every cell, the occupancy value is computed as the ratio between the number of frames in which a water molecule is present and the total number of frames. A second occupancy value is calculated as the ratio between the number of frames in which the cell is occupied by a stationary water molecule (RMSF below 1.4), and the total frames.

View original article

MEDICINAL CHEMISTRY RESEARCH

Like

Share Bookmark

0 0 0 0 0 0 0

More from this channel

A second life for the crystallographic structure of Berenil-dodecanucleotide complex: a computational revisitation thirty years after its publication

Comments (0)