@Billy-The-Crescent
2019-10-18T07:35:08.000000Z
字数 15630
阅读 1769
Protein-Ligand complex simulation GROMACS
Software: GROMACS
Platform: Linux
Initial Protein: HisG_MD.pdb
Initial Ligand: ADN.pdb
GROMACS location:
/usr/local/gromacs/bin/
Set environment variable:
export PATH=$PATH:/usr/local/gromacs/bin/#test if environment worksgmx -version
As the long time consumed in this simulation, we only conduct the simulation for HisG. Since we have conducted Molecular Dynamics Simulation of the enzyme cns1, cns2, cns3 and its HisG domain, we'd like to use the result structure as the input protein file. For cns3-HisG domain we call it HisG_MD.pdb.
The first thing to do is to find and download the ligand structure, here, we search "Adenosine" in PDBeChem and dowload the ".pdb" file, we call it ADN.pdb.
According to the manual, the ligand is not a recognized entity in any of the force fields provided with GROMACS, so we can not use pdb2gmx to generate the topology of the ligand.
The force field we will be using in this tutorial is CHARMM36, obtained from the MacKerell lab website. While there, download the latest CHARMM36 force field tarball and the "cgenff_charmm2gmx.py" conversion script, which we will use later. Download the version of the conversion script that corresponds to your installed Python version (Python 2.x or 3.x).
Unarchive the force field tarball in your working directory:
tar -zxvf charmm36-mar2019.ff.tgz
There should now be a "charmm36-mar2019.ff" subdirectory in your working directory.
Then, Write the topology for HisG with pdb2gmx:
gmx pdb2gmx -f HisG_MD.pdb -ignh -o HisG_MD_processed.gro
In this operation, we ignore the water using -ignh or we will get a fatal error due to the inconsistence of the name of the water. Choose the default water model when prompted (TIP3P). The structure will be processed by pdb2gmx, and you will be prompted to choose a force field:

Choose the first force filed which is just installed by us manually, CHARMM36 all-atom force field (March 2019).
Let's move to deal with the ligand.
Here, we use CGenFF, an official CHARMM General Force Field server to generate the topology of the ligand.
Since the strict requirement of this online server on the file format of the ligand. We need to:
- Add Hydrogen Atoms to the ligand
- Convert the pdb file to mol2 file
- Several corrections to the generated mol2 file
We use avogadro to add the hydrogen atoms and convert it to mol2 format.
Open Adenosine.pdb in Avogadro, and from the "Build" menu, choose "Add Hydrogens." Avogadro will build all of the H atoms onto the Adenosine ligand. Save a .mol2 file (File -> Save As... and choose Sybyl Mol2 from the drop-down menu) named "ADN.mol2."
In order to correct the mol2 file, we need to look through it.
- Change the name of the molecule to Adenosine
- Change the sixth column to 1
- Change the seventh column to Adenosine
- Change the name of the atoms uniquely

Last, notice the strange bond order in the @BOND section. All programs seem to have their own method for generating this list, but not all are created equal. There will be issues in constructing a correct topology with matching coordinates if the bonds are not listed in ascending order. To fix this problem, download the sort_mol2_bonds.pl script and execute it:
Run the script to fix the bonds.
perl sort_mol2_bonds.pl ADN.mol2 ADN_fix.mol2
Use "ADN_fix.mol2" in the next step.
Now, we'd like to generate the topology of the ligand using CGenFF server.
The ADN_fix.mol2 file is now ready for use to produce a topology. Visit the CGenFF server, log into your account, and and click "Upload molecule" at the top of the page. Upload Adenosine_fix.mol2 and the CGenFF server will quickly return a topology in the form of a CHARMM "stream" file (extension .str). Save its contents from your web browser into a file called "ADN.str."
Check the penalty of the .str file carefully! If the penalty is high, please double check the original ligand file whether it is a valid structure.
Alter the resi name of the .str file of to ADN, or it will create error in the next step.
Then we use the python script to generate the topology:
python cgenff_charmm2gmx.py ADN ADN_fix.mol2 ADN.str charmm36-mar2019.ff
If we do not alter the name of the RESI (the figure below),

we will get the error:

It means that the mol2 file and the str file is not consistent.
When we successfully run the transcript, we will get the result below:

We get adn.itp, adn.prm, adn.top, adn_ini.pdb.
We convert the adn_ini.pdb to adn.gro using editconf:
gmx editconf -f adn_ini.pdb -o adn.gro
Copy HisG_MD_processed.gro to a new file to be manipulated, for instance, call it "complex.gro," as the addition of ADN to the protein will generate our protein-ligand complex. Next, simply copy the coordinate section of adn.gro and paste it into complex.gro, below the last line of the protein atoms, and before the box vectors, like so:

Since we add 32 atoms into the .gro file, increment the second line of complex.gro to reflect this change. There should be 12439 atoms in the coordinate file now.
Insert a line that says #include "adn.itp" into topol.top after the position restraint file is included. The inclusion of position restraints indicates the end of the "Protein" moleculetype section.
; Include Position restraint file#ifdef POSRES#include "posre.itp"#endif; Include water topology#include "./charmm36-mar2019.ff/tip3p.itp"
becomes
; Include Position restraint file#ifdef POSRES#include "posre.itp"#endif; Include ligand topology#include "adn.itp"; Include water topology#include "./charmm36-mar2019.ff/tip3p.itp"
The ligand introduces new dihedral parameters, which were written to "adn.prm" by the cgenff_charmm2gmx_py3.py script. At the TOP of topol.top, insert an #include statement to add these parameters:
; Include forcefield parameters#include "./charmm36-mar2019.ff/forcefield.itp"; Include ligand parameters#include "adn.prm"[ moleculetype ]; Name nrexclProtein_chain_A 3
The last adjustment to be made is in the [ molecules ] directive. To account for the fact that there is a new molecule in complex.gro, we have to add it here, like so:
[ molecules ]; Compound #molsProtein_chain_A 1ADN 1
Remember to substitude the original topol.top with the updated one.
At this point, the workflow is just like any other MD simulation. We will define the unit cell and fill it with water.
gmx editconf -f complex.gro -o newbox.gro -bt dodecahedron -d 1.0gmx solvate -cp newbox.gro -cs spc216.gro -p topol.top -o solv.gro
Use em.mdp to build the ion.tpr and conduct energy minimization:
em.mdp:
; LINES STARTING WITH ';' ARE COMMENTStitle = Minimization ; Title of run; Parameters describing what to do, when to stop and what to saveintegrator = steep ; Algorithm (steep = steepest descent minimization)emtol = 1000.0 ; Stop minimization when the maximum force < 10.0 kJ/molemstep = 0.01 ; Energy step sizensteps = 50000 ; Maximum number of (minimization) steps to perform; Parameters describing how to find the neighbors of each atom and how to calculate the interactionsnstlist = 1 ; Frequency to update the neighbor list and long range forcescutoff-scheme = Verletns_type = grid ; Method to determine neighbor list (simple, grid)rlist = 1.2 ; Cut-off for making neighbor list (short range forces)coulombtype = PME ; Treatment of long range electrostatic interactionsrcoulomb = 1.2 ; long range electrostatic cut-offvdwtype = cutoffvdw-modifier = force-switchrvdw-switch = 1.0rvdw = 1.2 ; long range Van der Waals cut-offpbc = xyz ; Periodic Boundary ConditionsDispCorr = no
Type the code:
gmx grompp -f em.mdp -c solv.gro -p topol.top -o ion.tpr
We now pass our .tpr file to genion:
gmx genion -s ion.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral
In order to run energy minimization by mdrun, we have to use the grompp operation again.
gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o HisGcomplex_em.tpr
Then, we run energy minimization:
gmx mdrun -v -deffnm HisGcomplex_em
Before conducting equilibration, we need to apply restraints to the ligand and do the treatment of temperature coupling groups.
gmx make_ndx -f adn.gro -o index_adn.ndx> 0 & ! a H*> q
Then, execute the genrestr module and select this newly created index group (which will be group 3 in the index_jz4.ndx file):
gmx genrestr -f adn.gro -n index_adn.ndx -o posre_adn.itp -fc 1000 1000 1000
Now, we need to include this information in our topology. We can do this in several ways, depending upon the conditions we wish to use. If we simply want to restrain the ligand whenever the protein is also restrained, add the following lines to your topology in the location indicated:
; Include Position restraint file#ifdef POSRES#include "posre.itp"#endif; Include ligand topology#include "adn.itp"; Ligand position restraints#ifdef POSRES#include "posre_adn.itp"#endif; Include water topology#include "./charmm36-mar2019.ff/tip3p.itp"
gmx make_ndx -f HisGcomplex-em.gro -o index.ndx
Merge the "Protein" and "ADN" groups with the following, where ">" indicates the make_ndx prompt:
> 1 | 13> q
Proceed with NVT equilibration using this .mdp file.
title = Protein-ligand complex NVT equilibrationdefine = -DPOSRES ; position restrain the protein and ligand; Run parametersintegrator = md ; leap-frog integratornsteps = 50000 ; 2 * 50000 = 100 psdt = 0.002 ; 2 fs; Output controlnstenergy = 500 ; save energies every 1.0 psnstlog = 500 ; update log file every 1.0 psnstxout-compressed = 500 ; save coordinates every 1.0 ps; Bond parameterscontinuation = no ; first dynamics runconstraint_algorithm = lincs ; holonomic constraintsconstraints = h-bonds ; bonds to H are constrainedlincs_iter = 1 ; accuracy of LINCSlincs_order = 4 ; also related to accuracy; Neighbor searching and vdWcutoff-scheme = Verletns_type = grid ; search neighboring grid cellsnstlist = 20 ; largely irrelevant with Verletrlist = 1.2vdwtype = cutoffvdw-modifier = force-switchrvdw-switch = 1.0rvdw = 1.2 ; short-range van der Waals cutoff (in nm); Electrostaticscoulombtype = PME ; Particle Mesh Ewald for long-range electrostaticsrcoulomb = 1.2 ; short-range electrostatic cutoff (in nm)pme_order = 4 ; cubic interpolationfourierspacing = 0.16 ; grid spacing for FFT; Temperature couplingtcoupl = V-rescale ; modified Berendsen thermostattc-grps = Protein_JZ4 Water_and_ions ; two coupling groups - more accuratetau_t = 0.1 0.1 ; time constant, in psref_t = 300 300 ; reference temperature, one for each group, in K; Pressure couplingpcoupl = no ; no pressure coupling in NVT; Periodic boundary conditionspbc = xyz ; 3-D PBC; Dispersion correction is not used for proteins with the C36 additive FFDispCorr = no; Velocity generationgen_vel = yes ; assign velocities from Maxwell distributiongen_temp = 300 ; temperature for Maxwell distributiongen_seed = -1 ; generate a random seed
Remember to change the nvt.mdp file according to the protein name in the line tc-grps = Protein_JZ4 Water_and_ions to tc-grps = Protein_ADN Water_and_ions
gmx grompp -f nvt.mdp -c HisGcomplex-em.gro -r HisGcomplex-em.gro -p topol.top -n index.ndx -o HisGcomplex-nvt.tprnohup gmx mdrun -deffnm HisGcomplex-nvt &#tail -n 25 HisGcomplex-nvt.log
NPT equilibrium
title = Protein-ligand complex NPT equilibrationdefine = -DPOSRES ; position restrain the protein and ligand; Run parametersintegrator = md ; leap-frog integratornsteps = 50000 ; 2 * 50000 = 100 psdt = 0.002 ; 2 fs; Output controlnstenergy = 500 ; save energies every 1.0 psnstlog = 500 ; update log file every 1.0 psnstxout-compressed = 500 ; save coordinates every 1.0 ps; Bond parameterscontinuation = yes ; continuing from NVTconstraint_algorithm = lincs ; holonomic constraintsconstraints = h-bonds ; bonds to H are constrainedlincs_iter = 1 ; accuracy of LINCSlincs_order = 4 ; also related to accuracy; Neighbor searching and vdWcutoff-scheme = Verletns_type = grid ; search neighboring grid cellsnstlist = 20 ; largely irrelevant with Verletrlist = 1.2vdwtype = cutoffvdw-modifier = force-switchrvdw-switch = 1.0rvdw = 1.2 ; short-range van der Waals cutoff (in nm); Electrostaticscoulombtype = PME ; Particle Mesh Ewald for long-range electrostaticsrcoulomb = 1.2pme_order = 4 ; cubic interpolationfourierspacing = 0.16 ; grid spacing for FFT; Temperature couplingtcoupl = V-rescale ; modified Berendsen thermostattc-grps = Protein_JZ4 Water_and_ions ; two coupling groups - more accuratetau_t = 0.1 0.1 ; time constant, in psref_t = 300 300 ; reference temperature, one for each group, in K; Pressure couplingpcoupl = Berendsen ; pressure coupling is on for NPTpcoupltype = isotropic ; uniform scaling of box vectorstau_p = 2.0 ; time constant, in psref_p = 1.0 ; reference pressure, in barcompressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1refcoord_scaling = com; Periodic boundary conditionspbc = xyz ; 3-D PBC; Dispersion correction is not used for proteins with the C36 additive FFDispCorr = no; Velocity generationgen_vel = no ; velocity generation off after NVT
Likewise, remember to change the nvt.mdp file according to the protein name in the line tc-grps = Protein_JZ4 Water_and_ions
gmx grompp -f npt.mdp -c HisGcomplex-nvt.gro -t HisGcomplex-nvt.cpt -r HisGcomplex-nvt.gro -p topol.top -n index.ndx -o HisGcomplex-npt.tprnohup gmx mdrun -deffnm HisGcomplex-npt &#tail -n 25 HisGcomplex-npt.log
md.mdp
title = Protein-ligand complex MD simulation; Run parametersintegrator = md ; leap-frog integratornsteps = 5000000 ; 2 * 5000000 = 10000 ps (10 ns)dt = 0.002 ; 2 fs; Output controlnstenergy = 10000 ; save energies every 20.0 psnstlog = 10000 ; update log file every 20.0 psnstxout-compressed = 10000 ; save coordinates every 20.0 ps; Bond parameterscontinuation = yes ; continuing from NPTconstraint_algorithm = lincs ; holonomic constraintsconstraints = h-bonds ; bonds to H are constrainedlincs_iter = 1 ; accuracy of LINCSlincs_order = 4 ; also related to accuracy; Neighbor searching and vdWcutoff-scheme = Verletns_type = grid ; search neighboring grid cellsnstlist = 20 ; largely irrelevant with Verletrlist = 1.2vdwtype = cutoffvdw-modifier = force-switchrvdw-switch = 1.0rvdw = 1.2 ; short-range van der Waals cutoff (in nm); Electrostaticscoulombtype = PME ; Particle Mesh Ewald for long-range electrostaticsrcoulomb = 1.2pme_order = 4 ; cubic interpolationfourierspacing = 0.16 ; grid spacing for FFT; Temperature couplingtcoupl = V-rescale ; modified Berendsen thermostattc-grps = Protein_JZ4 Water_and_ions ; two coupling groups - more accuratetau_t = 0.1 0.1 ; time constant, in psref_t = 300 300 ; reference temperature, one for each group, in K; Pressure couplingpcoupl = Parrinello-Rahman ; pressure coupling is on for NPTpcoupltype = isotropic ; uniform scaling of box vectorstau_p = 2.0 ; time constant, in psref_p = 1.0 ; reference pressure, in barcompressibility = 4.5e-5 ; isothermal compressibility of water, bar^-1; Periodic boundary conditionspbc = xyz ; 3-D PBC; Dispersion correction is not used for proteins with the C36 additive FFDispCorr = no; Velocity generationgen_vel = no ; continuing from NPT equilibration
Likewise, remember to change the nvt.mdp file according to the protein name in the line tc-grps = Protein_JZ4 Water_and_ions
gmx grompp -f md.mdp -c HisGcomplex-npt.gro -t HisGcomplex-npt.cpt -p topol.top -n index.ndx -o HisGcomplex-md_10ns.tprnohup gmx mdrun -deffnm HisGcomplex-md_10ns &#tail -n 25 HisGcomplex-md_10ns.log
To recenter the protein and rewrap the molecules within the unit cell to recover the desired rhombic dodecahedral shape, invoke trjconv:
gmx trjconv -s HisGcomplex-md_10ns.tpr -f HisGcomplex-md_10ns.xtc -o HisGcomplex-md_10ns_center.xtc -center -pbc mol -ur compact
Choose "Protein" for centering and "System" for output. Note that centering complexes (protein-ligand, protein-protein) may be difficult for longer simulations involving many jumps across periodic boundaries. In those instances (particularly in protein-protein complexes), it may be necessary to create a custom index group to use for centering, corresponding to the active site of one protein or the interfacial residues of one monomer in a complex.
To extract the first frame (t = 0 ns) of the trajectory, use trjconv -dump with the recentered trajectory:
gmx trjconv -s HisGcomplex-md_10ns.tpr -f HisGcomplex-md_10ns_center.xtc -o HisGcomplex_start.pdb -dump 0
For even smoother visualization, it may be beneficial to perform rotational and translational fitting. Execute trjconv as follows:
gmx trjconv -s HisGcomplex-md_10ns.tpr -f HisGcomplex-md_10ns_center.xtc -o HisGcomplex-md_10ns_fit.xtc -fit rot+trans
Choose "Backbone" to perform least-squares fitting to the protein backbone, and "System" for output. Note that simultaneous PBC rewrapping and fitting of the coordinates is mathematically incompatible. Should you wish to perform fitting (which is useful for visualization, but not necessary for most analysis routines), carry out these coordinate manipulations separately, as indicated here.
Transfer the xtc trace file into pdb file for visualization.
gmx trjconv -s HisGcomplex-md_10ns.gro -f HisGcomplex-md_10ns_fit.xtc -n index.ndx -o HisGcomplex_ADN-md_10ns_movie_fit.pdb
Using methods described in Molecular Dynamics Using GROMACS, users can generate trace video.
Enter 22 Protein-ADN to select both the structure of protein and ligand.
Change the gro structure to pdb structure.
gmx editconf -f HisGcomplex-md_10ns.gro -o HisGcomplex-md_10ns.pdb
Additional 1 nanosecond simulation, changing simulation time to 1ns in md1.mdp. And run:
gmx grompp -f md1.mdp -c HisGcomplex-md_10ns.gro -t HisGcomplex-md_10ns.cpt -p topol.top -n index.ndx -o HisGcomplex-md_Addi_1ns.tprnohup gmx mdrun -deffnm HisGcomplex-md_Addi_1ns &#tail -n 25 HisGcomplex-md_Addi_1ns.log
Using visualization methods above to see the simulation process.