Amber Workshop - Tutorial 2

(Note: These tutorials are meant to provide illustrative examples of how to use the AMBER software suite to carry out simulations that can be run on a simple workstation in a reasonable period of time. They do not necessarily provide the optimal choice of parameters or methods for the particular application area.)
Copyright Ross Walker 2004

In the final stage of this tutorial we will use the skills we have learnt to run a simulation on a more practical example. Up to this point we have been using canonical B-DNA as our starting structure. An alternative structure for DNA is canonical A-DNA, compared below:

Which structure is the more stable, however? In this section we will try running a long MD simulation starting from canonical A form poly(A)-poly(T). We will create the input structure, charge neutralise and solvate it in the same method as we learned earlier. We will then minimise and equilibrate the structure using the same approach as we did for the solvated B-DNA simulation and then finally we shall analyse the results and see what happened. For a more in-depth discussion of A-form and B-form DNA and MD simulations see Cheatham, T.E., Kollman, P.A. J. Mol. Biol., 1996, 259, 434-444.

Note, the simulations in this tutorial may take a considerable amount of time to run - the production MD simulations take about 57,000 s (15.8 hours) [31.6 s per ps] on eight 1.3GHz processors of a SGI ALTIX. While I urge you to set up and run these calculations the output files at each stage are provided so that you can continue with the remainder of the tutorial without waiting for the simulation to finish running.

6.1) Creating the starting structure

The first stage of setting up our simulation is to obtain a starting structure. As before we shall use nucgen for this. Previously we instructed nucgen to construct our DNA 10-mer in right handed B-form geometry with the keyword $ABDNA. Changing this to $ADNA will give us what we want. (See the AMBER manual for more details on nucgen usage). Here is the input file for nucgen:

a-dna.nucgen.in

  NUC   1
D
A5   A    A    A    A    A    A    A    A    A3                                 
                                                                                
  NUC   2
D
T5   T    T    T    T    T    T    T    T    T3                                 
                                                                                
END                                                                             
$ADNA

Check the output file to make sure there are no error messages. Particularly check for the line that says:

We should also take a quick look at the pdb structure nucgen created for us. Is it what we want?

The next step is to create the prmtop and inpcrd files for sander. As before we will use xleap for this. The first step is to add the missing TER card between residues 10 and 11 of the pdb. Here is the modified pdb file (a-dna.pdb).

Load our pdb file into a new unit called adna. In the main xleap window enter:

Now we will save a prmtop and inpcrd file for our neutralised (but not solvated) system. We will use the prmtop file later when we visualise our trajectory.

Next step, solvate our system with a truncated octahedral box containing an 8 angstrom buffer of TIP3P water:

This should have added about 3,572 water molecules. Edit the model to check everything looks ok:

The next stage is to minimise the system prior to performing MD in order to remove any bad contacts created by hydrogenation and solvation. For this we will use the same protocol as we used in the previous section of this tutorial.

Stage 1: 500 steps of steepest descent followed by 500 steps of conjugate gradient minimisation, with a 500 kcal/mol restraint force on the DNA molecule (Res 1-20)

a-dna_min1.in

A-DNA 10-mer: initial minimisation solvent + ions
 &cntrl
  imin   = 1,
  maxcyc = 1000,
  ncyc   = 500,
  ntb    = 1,
  ntr    = 1,
  cut    = 10
 /
Hold the DNA fixed
500.0
RES 1 20
END
END

Stage 2: 1,000 steps of steepest descent followed by 1,500 steps of conjugate gradient minimisation with no restraints

a-dna_min2.in

A-DNA 10-mer: initial minimisation solvent + ions
 &cntrl
  imin   = 1,
  maxcyc = 2500,
  ncyc   = 1000,
  ntb    = 1,
  ntr    = 0,
  cut    = 10
 /

The next stage is to start running molecular dynamics. We will initially run 20 ps of heating/equilibration at constant volume in the same fashion as we did for the B-DNA explicit water simulation. This will be done using the same protocols: i.e. 20 ps at constant volume periodic boundaries with weak (10 kcal/mol) restraints on the DNA. An initial temperature of 0 K, final temperature of 300 K, Langevin temperature controls, SHAKE constraints on hydrogen atoms and a 2 fs time step. To save disk space we will only write to our output and mdcrd files every 250 steps (0.5 ps). We will also compress our mdcrd file upon completion using gzip. This will give us a compression ratio of about 60 %. This is important because the next step will be to run 1.8 ns of MD which can create some very big mdcrd files:

a-dna_md1.in

A-DNA 10-mer: 20ps MD with res on DNA
 &cntrl
  imin   = 0,
  irest  = 0,
  ntx    = 1,
  ntb    = 1,
  cut    = 10,
  ntr    = 1,
  ntc    = 2,
  ntf    = 2,
  tempi  = 0.0,
  temp0  = 300.0,
  ntt    = 3,
  gamma_ln = 1.0,
  nstlim = 10000, dt = 0.002,
  ntpr = 250, ntwx = 250, ntwr = 10000
 /
Keep DNA fixed with weak restraints
10.0
RES 1 20
END
END

This takes approximately 522 s (8 mins 42 s) on eight 1.3 GHz processors of a SGI ALTIX.

NOTE: For this section of the tutorial I have not included links to the mdcrd files since they are too big to reliable host on a webserver. Not having the mdcrd files will, if you haven't run the simulations yourself, prevent you from interactively conducting some of the analysis below. This should not be a problem, however, since I provide links to the analysis output files. I will also provide links later on to smaller versions of these mdcrd files that have the explict water removed. These will be sufficient to watch the dynamics in VMD.

In the next stage of our simulation we are going to run a further 1.8 ns of MD. Rather than run a separate equilibration and production phase we will, for convenience, combine the two since our production conditions will be identical to our final equilibration conditions. Here we will use the same conditions as we did for the B-DNA explicit solvent equilibration. Namely we will run without restraints on the DNA. We shall maintain the temperature at 300 K using the Langevin thermostat, run with constant pressure (1 atm) periodic boundaries, use SHAKE constraints on the hydrogen atoms and a 2 fs time step. Since we will be writing 1.8 ns of trajectory we will only write to our output and mdcrd files every 500 fs (250 steps). We shall also compress all of our files to save disk space as 1.8 ns of simulation will lead to some very big files.

Since this simulation will take a long time to run it is probably not a good idea to run all 1.8 ns in one shot. Thus we will actually run 9 x 200 ps simulations with each successive simulation continuing on from the previous one. In this way if a job crashes we only lose the progress of that job and can restart from where we got to. The input file will be the same in each case:

a-dna_md_1800ps.in

A-DNA 10-mer: 200ps of MD
 &cntrl
  imin = 0, irest = 1, ntx = 7,
  ntb = 2, pres0 = 1.0, ntp = 1,
  taup = 2.0,
  cut = 10, ntr = 0,
  ntc = 2, ntf = 2,
  tempi = 300.0, temp0 = 300.0,
  ntt = 3, gamma_ln = 1.0,
  nstlim = 100000, dt = 0.002,
  ntpr = 250, ntwx = 250, ntwr = 10000
 /

We will run the 9 simulations one after the other using the restrt file from the previous run as the input for the next run. While you could run each of these jobs manually using a command line as below (with the relevant filenames incremented by one each time):

It is probably best to write a script to run all of these jobs for us so that we can leave it running over night. Here is an example script:

run_a-dna_20-1820ps.x

#!/bin/csh
set AMBERHOME="/usr/local/AMBER8"
set MDSTARTJOB=2
set MDENDJOB=10
set MDCURRENTJOB=$MDSTARTJOB
set MDINPUT=0

echo -n "Starting Script at: "
date
echo ""

while ( $MDCURRENTJOB <= $MDENDJOB )
   echo -n "Job $MDCURRENTJOB started at: "
   date
   @ MDINPUT = $MDCURRENTJOB - 1
   $AMBERHOME/exe/sander -O -i a-dna_md_1800ps.in \
                            -o a-dna_md$MDCURRENTJOB.out \
                            -p a-dna_wat.prmtop \
                            -c a-dna_md$MDINPUT.rst \
                            -r a-dna_md$MDCURRENTJOB.rst \
                            -x a-dna_md$MDCURRENTJOB.mdcrd
   gzip -9 -v a-dna_md$MDCURRENTJOB.mdcrd
   echo -n "Job $MDCURRENTJOB finished at: "
   date
   @ MDCURRENTJOB = $MDCURRENTJOB + 1
end
echo "ALL DONE"

Now we will run the script - we use nohup so that we can logout without killing the job. The ampersand '&' means run in background. We 'pipe' the output to a log file so that we can simply look at the log file to see the current progress of the job:

This will take a long time to run. Each 200ps stage takes approximate 5,700s (1.58 hours) on eight 1.3 GHz processors of a SGI ALTIX.

Time Span (ps)	mdout	rstrt	mdcrd
20 - 220	a-dna_md2.out	a-dna_md2.rst	a-dna_md2.mdcrd.gz
220 - 420	a-dna_md3.out	a-dna_md3.rst	a-dna_md3.mdcrd.gz
420 - 620	a-dna_md4.out	a-dna_md4.rst	a-dna_md4.mdcrd.gz
620 - 820	a-dna_md5.out	a-dna_md5.rst	a-dna_md5.mdcrd.gz
820 - 1020	a-dna_md6.out	a-dna_md6.rst	a-dna_md6.mdcrd.gz
1020 - 1220	a-dna_md7.out	a-dna_md7.rst	a-dna_md7.mdcrd.gz
1220 - 1420	a-dna_md8.out	a-dna_md8.rst	a-dna_md8.mdcrd.gz
1420 - 1620	a-dna_md9.out	a-dna_md9.rst	a-dna_md9.mdcrd.gz
1620 - 1820	a-dna_md10.out	a-dna_md10.rst	a-dna_md10.mdcrd.gz

The next stage is to analyse our results and look for anything interesting. First of all lets process the output files:

Take a look at the summary files and you will see that after initial relaxation the temperature, pressure, volume etc stay fairly constant. This indicates that there are no major problems with the simulation.

Lets calculate the RMSd of the backbone atoms and see what is happening to the DNA structure. While we are at it we will also get ptraj to re-image the whole trajectory and also remove the waters from the trajectory file. The reason we do this, instead of just hiding the waters in VMD, is that our mdcrd files are now considerably bigger than previously and loading everything into VMD may cause it to run very slowly. It is also fiddly to load 10 separate mdcrd files into a single molecule. This is why we needed the a-dna_charges_only.prmtop file. Thus we will get ptraj to strip the waters and append all of the mdcrd files to a single file. Here is the input file for ptraj (note we don't need to unzip the files before use, ptraj does this on the fly):

combine_mdcrds_and_strip.ptraj

trajin a-dna_md1.mdcrd.gz
trajin a-dna_md2.mdcrd.gz
trajin a-dna_md3.mdcrd.gz
trajin a-dna_md4.mdcrd.gz
trajin a-dna_md5.mdcrd.gz
trajin a-dna_md6.mdcrd.gz
trajin a-dna_md7.mdcrd.gz
trajin a-dna_md8.mdcrd.gz
trajin a-dna_md9.mdcrd.gz
trajin a-dna_md10.mdcrd.gz
trajout a-dna_0-1820ps_no_wat.mdcrd
rms first out a-dna_0-1820ps.rmsfit @P,O3',O5',C3',C4',C5' time 0.50
center :1-20
image familiar
strip :WAT

Interesting... The final structure differs from the initial structure by over 5 angstroms. This is quite a large RMSd and would in some cases indicate a serious problem with the MD simulation. However, the RMSd increases slowly between 20 ps (when the restraints were switched off) and 500 ps. From 500 ps to end of the simulation the RMSd remains more or less constant at between 4.5 and 5.5 angstroms. This suggests that our starting structure changed slowly changed over the first 500 ps of our simulation into a more stable structure. This more stable structure then remains intact for the remainder of the simulation. Could it be that our A-form DNA converted to B-form? Lets visualise the trajectory and note what we see.

As already mentioned the trajectories including water are a little large to open comfortably in VMD so we shall use the stripped trajectory ptraj created for us. For this we need to use the prmtop file we created before solvation a-dna_charges_only.prmtop.

Load the a-dna_charges_only.prmtop file as parm7. Then load the a-dna_0-1820ps_no_wat.mdcrd file into this molecule as crdbox (Make sure you unzip it first). You should find that the trajectory loads just showing the DNA and charges (no water). Take a good look at the trajectory, have a look down the minor groove of the DNA and observe what happens to the structure:

0 ps Minor Groove (A-Form)	0 ps Major Groove (A-Form)
100 ps Minor Groove (Still A-Form)	*200* ps Minor Groove (Still A-Form)
300 ps Minor Groove (A-Form)	400 ps Minor Groove (Distorted A-Form)
500 ps Minor Groove (B-Form)	600 ps Minor Groove (B-Form)
700 ps Minor Groove (B-Form)	800 ps Minor Groove (B-Form)
1820 ps Minor Groove (B-Form)	1820 ps Major Groove (B-Form)

As you can see somewhere between step 600 and 800 (300 ps and 400 ps) our structure begins to change from A-form DNA to B-form, you may need to rotate the structure in order for this to become apparent. By step 1,000 (500 ps) the conversion is complete. From this point on the structure of our DNA remains in B-form.

This demonstrates a practical use of such an MD simulation. We have been able to use molecular dynamics to move away from a local minima (our A-form DNA) to a lower minima (B-form DNA).

Finally, lets calculate the average structure over the 1,800 ps of our simulation in which the DNA was not restrained, mdcrd files 2 to 10. Again, for this we will use ptraj. Here is the input file to create an average pdb from all 3600 structures in our constant pressure mdcrd trajectory file:

a-dna_calc_av_struc.ptraj

trajin a-dna_md2.mdcrd.gz
trajin a-dna_md3.mdcrd.gz
trajin a-dna_md4.mdcrd.gz
trajin a-dna_md5.mdcrd.gz
trajin a-dna_md6.mdcrd.gz
trajin a-dna_md7.mdcrd.gz
trajin a-dna_md8.mdcrd.gz
trajin a-dna_md9.mdcrd.gz
trajin a-dna_md10.mdcrd.gz
average a-dna_20-1820ps_average.pdb pdb nowrap

And there you have it, the average structure over the entire constant pressure stage of our simulation.

You now have the necessary knowledge to set-up and run your own simulations. Have a go and see how you get on. To learn about other aspects of AMBER 8.0, including more complex simulations that involve creating your own residues please see the other workshop tutorials (http://www.rosswalker.co.uk/tutorials/amber_workshop/).

Feel free to use what you have learned to experiment with alternative starting geometries. Do they all change to B-DNA if run for long enough? What about in a mixture of 80% water, 20% ethanol?