Synopsis: In this tutorial you will perform Molecular Dynamics simulations of the human androgen receptor (PDB entry code 2AM9), solvated in a box of explicit water molecules. Note that today we will work with the receptor without any ligand bound. You will learn how to set up the system and the required files, as well as run the simulation. Finally, a series of analysis tools will be discussed and applied to the trajectories. We will use the simulation package GROMACS.
GROMACS is a versatile package to perform molecular dynamics, i.e. to simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have many complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (which usually dominate simulations), many groups also use it for research on non-biological systems, e.g. polymers.
Free download and manual: http://www.gromacs.org
Notes:
- This document gives the command lines (in bold and framed) for the different steps required to set up and run a molecular dynamics simulation, with a brief description of each procedure.
- Questions are asked throughout the document (in bold and italics) to guide your understanding of what you are doing.
- Structure (*.pdb) and trajectory (*.trr) files will be generated during the practical. Use Chimera to visualize them.
- For any of the utility modules in GROMACS, type name_module -h for a description of its options.
- So that the simulations can be run during the session, the simulation times are very short and not reliable for real studies. At the end of the session you will be provided with a longer simulation to be used in the Analysis part of the tutorial.
Now we have a structure in the correct format for the chosen force field, and the corresponding topology file. Visualize the structure file generated with Chimera to detect possible errors. Read the topology file and identify the data it contains. How many atoms does the protein have?
1.2 Solvate with explicit waters. Now the environment conditions will be defined and the system solvated. First of all, the simulation space has to be specified. We will use an orthogonal box. The dimensions of the box will be chosen so that a minimal distance of 1.0 nm (10 Å) exists between the edge of the box and the protein.
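As a rough illustration of how such a box could be sized, the sketch below takes the extent of the solute along each axis and adds the 1.0 nm margin on both sides. The function name and the three coordinates are hypothetical and only serve the example; the tutorial's own box is built by the GROMACS tools, not by this code.

```python
def box_dimensions(coords_nm, min_dist_nm=1.0):
    """Return (x, y, z) box edge lengths in nm.

    Each edge is the solute extent along that axis plus min_dist_nm
    on both sides, so no atom is closer than min_dist_nm to a box face.
    """
    edges = []
    for axis in range(3):
        values = [atom[axis] for atom in coords_nm]
        extent = max(values) - min(values)
        edges.append(extent + 2.0 * min_dist_nm)
    return tuple(edges)


# Hypothetical coordinates (nm) of three atoms, for illustration only
atoms = [(0.0, 0.0, 0.0), (3.0, 1.5, 2.0), (1.0, 4.0, 0.5)]
box = box_dimensions(atoms)
print(box)
```

Note that with a 1.0 nm margin on every side, the protein is separated from its nearest periodic image by at least 2.0 nm.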
The molecule will be oriented along its principal axis (-princ). Now the solvent water molecules are added (by default the $GMXLIB/spc216.gro water coordinates are used). There are several solvent models; the GROMOS96 force fields are generally used with the simple point charge (SPC) water model. The topology is not required for solvent addition, but it can be supplied so that it is updated to include the new water molecules.
genbox -cp definebox.pdb -cs -p system.top -o system_box.pdb
How many atoms does the system have now? How many solvent water molecules? Has the topology file changed? Where are the solvent molecules specified? What is the total charge of the system? Visualize the solvated system in Chimera.
1.3 Energy Minimization. A few energy minimization steps are run on this structure in order to relax possible clashes present in the crystal structure or generated when adding the hydrogen atoms and the solvent molecules. To run a simulation with GROMACS (energy minimization or molecular dynamics) two steps are necessary:
a) First, structure and topology are combined into a single description of the system, together with a number of control parameters (minimization.mdp, the file containing the options of the simulation we want to run). This is done with grompp.
grompp -v -f minimization.mdp -c system_box.pdb -p system.top -o minimization.tpr
Have a look at the contents of the minimization.mdp file. Note the integrator chosen. These would not be the most suitable control parameters if we wanted to fully minimize the system, but since we only intend to relax strains, they are sufficient. Notice the loose convergence criterion used. Also, for the non-bonded interactions, cut-offs of ~14 Å (1.4 nm) are recommended for the force field used here. However, larger cut-offs mean longer computation times and, for the present practical, they are set to 0.9 nm.
b) The input file generated in the previous step can now be used, on its own, as the input file for the run.
Which method was used for energy minimization? How many steps were specified and how many steps did it take? What was allowed to freely move during the minimization? Visualize the minimized structure in Chimera and compare it to the previous structure.
; MINIMIZATION.MDP
; Example of energy minimization options in GROMACS
; Everything following ';' is a comment
title           = Energy Minimization with PME ; Title of run
; The following line tells the program the standard locations where to find certain files
cpp             = /usr/bin/cpp ; Preprocessor ("which cpp" to find it on your machine)
; Define can be used to control processes
define          = -DFLEXIBLE
constraints     = none         ; Bond types to replace by constraints
; Parameters describing what to do, when to stop and what to save
integrator      = steep        ; Algorithm (steep = steepest descent minimization)
emtol           = 1000.0       ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
emstep          = 0.01         ; Initial step size (in nm)
nsteps          = 500          ; Maximum number of (minimization) steps to perform
nstenergy       = 1            ; Write energies to disk every nstenergy steps
energygrps      = System       ; Which energy group(s) to write to disk
; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
ns_type         = grid         ; Method to determine neighbor list (simple, grid)
rlist           = 0.9          ; Cut-off distance for short-range neighbor list
coulombtype     = PME          ; Treatment of long range electrostatic interactions
rcoulomb        = 0.9          ; Long range electrostatic cut-off, desirable value 1.4 nm
rvdw            = 0.9          ; Long range Van der Waals cut-off, desirable value 1.4 nm
fourierspacing  = 0.12         ; Parameter related to PME
optimize_fft    = yes          ; Parameter related to PME
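To make the roles of emtol, emstep and nsteps more concrete, here is a minimal one-dimensional sketch of a steepest descent loop. It assumes the step-size rule described in the GROMACS manual (the step grows by a factor 1.2 after an accepted move and is halved after a rejected one); the harmonic "bond" with force constant k and equilibrium length x0 is a hypothetical toy system, not part of the tutorial.

```python
def steepest_descent(x, energy, force, emtol=1000.0, emstep=0.01, nsteps=500):
    """Toy 1-D steepest descent.

    Stops when |force| < emtol (kJ/mol/nm) or after nsteps iterations.
    Returns the final coordinate and the number of iterations used.
    """
    h = emstep          # current maximum displacement (nm)
    e = energy(x)
    for step in range(nsteps):
        f = force(x)
        if abs(f) < emtol:
            return x, step            # converged
        trial = x + h * f / abs(f)    # move a distance h along the force
        e_trial = energy(trial)
        if e_trial < e:
            x, e = trial, e_trial     # accept and try a larger step next
            h *= 1.2
        else:
            h *= 0.5                  # reject and retry with a smaller step
    return x, nsteps


# Hypothetical harmonic bond: k in kJ/mol/nm^2, lengths in nm
k, x0 = 4.0e5, 0.1
energy = lambda x: 0.5 * k * (x - x0) ** 2
force = lambda x: -k * (x - x0)

x_min, n_used = steepest_descent(0.15, energy, force)
print(x_min, n_used)
```

With a tolerance as loose as 1000 kJ/mol/nm the loop stops well before the exact minimum, which matches the aim of relaxing strains without minimizing fully.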
1.4 Neutralizing the system. In order to neutralize the system, counterions will be added to the box (Cl- if the system has a positive total charge and Na+ if it has a negative total charge). One could also add ions up to a certain concentration. We will replace water molecules with the number of ions required to neutralize the system. The program will ask you to specify the group of solvent molecules from which to extract the waters to be replaced (select SOL, option 12). Again, we need to combine the structure and the topology files:
IMPORTANT NOTE: The topology file has not been updated automatically after the replacement of the water molecules with chloride ions. This has to be done manually. The atom/molecule nomenclature can change from one force field to another (see the ions.itp file).
Edit the topology file and decrease the number of water molecules in the second SOL segment by the number of ions added. Also add a line specifying the number of chloride ions added (CL-). Check that the #include ions.itp line is present in the topology file.
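The bookkeeping behind this manual edit can be sketched as follows. The function name and the molecule counts are hypothetical and chosen only for illustration; the sketch assumes the molecule list is given as (name, count) pairs in the order they appear in the topology, and that the last SOL entry is the one to decrement.

```python
def replace_waters_with_ions(molecules, n_ions, ion_name="CL-", solvent="SOL"):
    """Return a new molecule list with n_ions waters swapped for ions.

    The last solvent entry is decreased by n_ions and an entry for the
    ions is appended, so the total number of molecules is unchanged.
    """
    updated = list(molecules)
    for i in range(len(updated) - 1, -1, -1):
        name, count = updated[i]
        if name == solvent:
            if count < n_ions:
                raise ValueError("not enough solvent molecules to replace")
            updated[i] = (name, count - n_ions)
            break
    else:
        raise ValueError("no solvent entry found")
    updated.append((ion_name, n_ions))
    return updated


# Hypothetical counts, for illustration only
before = [("Protein", 1), ("SOL", 120), ("SOL", 9500)]
after = replace_waters_with_ions(before, n_ions=3)
for name, count in after:
    print(name, count)
```

A quick consistency check is that the sum of the counts is the same before and after the edit.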
Now the entire solvated system is again submitted to a few relaxation steps. As before, first use grompp to combine structure, topology and control parameters, and then launch mdrun to actually run the minimization job.
grompp -v -f minimization.mdp -c added_ions.pdb -p system.top -o minimization2.tpr
1.5 Equilibration of solvent water. The last step before starting the production molecular dynamics simulation is itself a molecular dynamics simulation, but with positional restraints. We will apply restraints to the positions of the protein atoms (define = -DPOSRES) while allowing the water molecules to move freely. The positional restraints are read from the posre.itp file that was created by default by the pdb2gmx command. In this way, the added waters settle around the protein. Moreover, we will apply the LINear Constraint Solver algorithm (LINCS, constraints = all-bonds) to fix all bond lengths in the system (important for dt > 1 fs). Constraining the bonds allows a longer time step and therefore saves computational time. NOTE: Here we will run only 5 ps of restrained MD. In a real simulation the equilibration of the water would run for 20-200 ps.
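The relation between the time step, the number of steps and the simulated time is simple arithmetic, sketched below. The helper names are hypothetical; the values 0.002 ps, 2500 steps and an output interval of 250 steps are those used for the restrained run in this practical.

```python
def simulated_time_ps(nsteps, dt_ps):
    """Total simulated time in ps for nsteps steps of length dt_ps."""
    return nsteps * dt_ps


def steps_for_time(time_ps, dt_ps):
    """Number of steps needed to cover time_ps with a time step of dt_ps."""
    return round(time_ps / dt_ps)


dt = 0.002                                  # ps, i.e. 2 fs
total = simulated_time_ps(2500, dt)         # length of the restrained run
frame_interval = simulated_time_ps(250, dt) # time between saved frames
upper = steps_for_time(200.0, dt)           # steps for a 200 ps equilibration

print(total, frame_interval, upper)
```

The same two helpers can be used to check the values of nsteps and nstxout in any of the .mdp files of this tutorial.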
You could try running in parallel (here on 2 processors): mpirun -np 2 mdrun -s ... (the job takes around 3 minutes in serial)
mdrun -s md_pr.tpr -o md_pr.trr -c md_pr.pdb -g md_pr.log -e md_pr.edr &
; MD_pr.mdp
; Example of position-restrained MD in GROMACS
; Everything following ';' is a comment
title           = ANDROGEN RECEPTOR with water
warnings        = 10
cpp             = /usr/bin/cpp
; Apply restraints
define          = -DPOSRES     ; read force constants from posre.itp and apply positional restraints to protein atoms
constraints     = all-bonds    ; LINCS to all bonds
; Run
integrator      = md           ; Molecular Dynamics run
dt              = 0.002        ; integration step, in ps
nsteps          = 2500         ; total number of steps: 5 ps (in a real study, at least 20 ps)
nstxout         = 250          ; save coordinates every 0.5 ps (in .trr)
nstlist         = 5            ; frequency for updating the non-bonded list
ns_type         = grid
rlist           = 0.9
coulombtype     = PME
rcoulomb        = 0.9
rvdw            = 0.9
fourierspacing  = 0.12
fourier_nx      = 0
fourier_ny      = 0
fourier_nz      = 0
pme_order       = 4
ewald_rtol      = 1e-5
optimize_fft    = yes
; Berendsen temperature coupling is on in three groups
Tcoupl          = berendsen       ; thermostat type
tau_t           = 0.1 0.1 0.1     ; time constant for the T coupling (in ps), one value per tc-group (same order)
tc-grps         = protein SOL CL  ; the groups are listed
ref_t           = 300 300 300     ; reference temperature, i.e. T of the MD
; Pressure coupling is on
Pcoupl          = berendsen    ; barostat to control the simulation pressure
tau_p           = 0.5          ; time constant for coupling (in ps)
compressibility = 4.5e-5       ; value for water at 300 K and 1 atm
ref_p           = 1.0          ; reference pressure for the coupling (in bar)
; Generate velocities is on at 300 K
gen_vel         = yes
gen_temp        = 300.0
gen_seed        = 8378922      ; seed number for the random generation of velocities
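To give a feeling for what the time constant tau_t does, the sketch below applies the Berendsen weak-coupling scaling factor, lambda = sqrt(1 + (dt/tau_t)(T0/T - 1)), repeatedly to a temperature that starts away from the reference. This is a deliberately simplified picture: it ignores the forces and the actual dynamics, and the starting temperature of 280 K is a hypothetical value chosen for illustration.

```python
import math


def berendsen_lambda(t_current, t_ref=300.0, dt=0.002, tau_t=0.1):
    """Velocity scaling factor of the Berendsen thermostat for one step."""
    return math.sqrt(1.0 + (dt / tau_t) * (t_ref / t_current - 1.0))


temperature = 280.0          # hypothetical starting temperature (K)
for _ in range(250):         # 250 steps of 0.002 ps = 0.5 ps = 5 * tau_t
    lam = berendsen_lambda(temperature)
    # kinetic energy, and hence temperature, scales with lambda squared
    temperature *= lam ** 2

print(round(temperature, 2))
```

In this idealized setting the deviation from the reference temperature decays roughly exponentially with time constant tau_t, so a smaller tau_t means tighter coupling to the bath.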
To which atoms are the positional restraints applied? How many steps would be required to simulate 100 ps?
integrator      = md           ; Molecular Dynamics run
dt              = 0.002        ; integration step, in ps
nsteps          = 10000        ; total number of steps: 20 ps (in a real study it depends on the application, from 500 ps to several ns to hundreds of ns)
nstxout         = 500          ; save coordinates every 500 steps (in .trr)
nstlist         = 10           ; frequency for updating the non-bonded list
ns_type         = grid
rlist           = 0.9
coulombtype     = PME
rcoulomb        = 0.9
rvdw            = 0.9
fourierspacing  = 0.12
fourier_nx      = 0
fourier_ny      = 0
fourier_nz      = 0
pme_order       = 4
ewald_rtol      = 1e-5
optimize_fft    = yes
; Berendsen temperature coupling is on in three groups
Tcoupl          = berendsen       ; thermostat type
tau_t           = 0.1 0.1 0.1     ; time constant for the T coupling (in ps), one value per tc-group (same order)
tc-grps         = protein SOL CL  ; the groups are listed
ref_t           = 300 300 300     ; reference temperature, i.e. T of the MD
; Pressure coupling is on
Pcoupl          = berendsen    ; barostat to control the simulation pressure
tau_p           = 0.5          ; time constant for coupling (in ps)
compressibility = 4.5e-5       ; value for water at 300 K and 1 atm
ref_p           = 1.0          ; reference pressure for the coupling (in bar)
; Generate velocities is on at 300 K
gen_vel         = yes
gen_temp        = 300.0
gen_seed        = 392811       ; seed number for the random generation of velocities
You could try running in parallel (here on 2 processors): mpirun -np 2 mdrun -s ... (the job takes ~24 minutes in serial)
mdrun -s md1.tpr -o md1.trr -c md1.pdb -g md1.log -e md1.edr &
What is the simulation time? How often will the structure be saved? What are the simulated Temperature and Pressure? Usually, a molecular dynamics simulation includes three parts: heating, equilibration and production run. How is this achieved here?
To make the simulation longer you can run another bit by simply doing:
# Examples of energy analysis: the energy term to analyze is selected in a menu
g_energy -f md1.edr -s md1.tpr -o T_md1.xvg
g_energy -f md1.edr -s md1.tpr -o Etot_md1.xvg
# Concatenate several trajectory files:
echo c c | trjcat -f md1.trr md2.trr -o full_md.trr -cat -settime
# trjconv is used to manipulate trajectory files: change format, extract the trajectory for one group
# Align a trajectory to a reference structure:
trjconv -f md1.trr -o md1_fit.trr -s md_pr.pdb -fit rot+trans
# Get one snapshot out of the trajectory file (here, the pdb at 340 ps):
trjconv -f md1.trr -s md1.tpr -o MD_340.pdb -b 340 -e 341
# RMSD calculation:
g_rms -f md1.trr -o rmsd_md1.xvg -s md_pr.pdb -fit rot+trans
# Geometrical analysis (monitoring a distance, a dihedral angle):
g_angle -f md1_fit.trr -n dihedre.ndx -type dihedral -of dihfrac.xvg -oc dihcorr.xvg -oh trhisto.xvg -ot dihtrans.xvg -od angdist.xvg
# Structural clustering:
g_cluster -f md1_fit.trr -s md_pr.pdb -cl cluster_12.pdb -g cluster_12.log -dist rmsd_dist_12.xvg -cutoff 0.12 -noav -method gromos -o rmsd-clust_12.xpm -sz clust-size_12.xvg -clid clust-id_12.xvg -ev rmsd-eig_12.xvg -n index_clustering.ndx
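The .xvg files written by these tools are plain text: lines starting with '#' or '@' carry comments and plotting metadata, and the remaining lines hold whitespace-separated numeric columns. A minimal sketch for reading such a file outside of a plotting program is given below; the sample content and its values are hypothetical and only illustrate the layout.

```python
def parse_xvg(text):
    """Return the numeric rows of an .xvg file given as a string.

    Lines starting with '#' (comments) or '@' (plot metadata) are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(value) for value in line.split()])
    return rows


# Hypothetical file content, for illustration only
sample = """# This is a comment line
@    title "RMSD"
@    xaxis  label "Time (ps)"
0.0   0.00
1.0   0.10
2.0   0.20
"""

rows = parse_xvg(sample)
times = [row[0] for row in rows]
values = [row[1] for row in rows]
mean_value = sum(values) / len(values)
print(len(rows), mean_value)
```

For a real file, the text could be obtained with open("rmsd_md1.xvg").read() before calling parse_xvg.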
You have seen that many analysis tools in GROMACS allow the selection of parts of the structure, so that the analysis is performed only for those parts. Often a menu allows the user to easily choose the selection (Calpha, protein, protein-H, backbone, sidechains...). However, sometimes we may want just a selection of residues, one chain, a 'selection within 10.0 of ligand', a dihedral angle... In those cases, an index.ndx file can be used, in which the atom numbers of the atoms selected for the analysis are specified.
For example, a dihedre.ndx file for selecting dihedral angles would look like:
[ SER115 ]
1139 1138 1137 1135
[ SER116 ]
1147 1146 1145 1143
[ SER118 ]
1172 1171 1170 116
There is a program called make_ndx which facilitates preparation of this index.ndx file. Example of usage: make_ndx -f md1.pdb -o index.ndx
A menu permits selection of the 'group' you want to include in the index.ndx output file. The output can contain several 'groups'. When the index.ndx file is read by an analysis tool, the tool will ask which of the groups included in the file you are interested in. You can also edit the index.ndx file by hand and add the selection of atoms you want. Notice that the required items are a title for the group and the list of the atom numbers included in the group.
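When a group has to be added by hand, the two required items (a title in square brackets and the list of atom numbers) can also be written from a short script. The sketch below is one possible way to do it; the function name is hypothetical, and the atom numbers are those of the SER115 dihedral in the example above, read as four separate atom numbers.

```python
def format_ndx_group(name, atoms, per_line=15):
    """Return the text of one index group: a '[ name ]' title line
    followed by the atom numbers, at most per_line numbers per line."""
    lines = ["[ %s ]" % name]
    for start in range(0, len(atoms), per_line):
        chunk = atoms[start:start + per_line]
        lines.append(" ".join(str(atom) for atom in chunk))
    return "\n".join(lines) + "\n"


group_text = format_ndx_group("SER115", [1139, 1138, 1137, 1135])
print(group_text, end="")
```

The returned text can be appended to an existing index file, for instance with open("index.ndx", "a").write(group_text).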