Tutorial 1: A Standard Protein

In this tutorial, we use a real example to show how to prepare a protein for calculations with pdbtop.

Download Structure

Assume we want to study the protein with PDB ID 5VBM. You can download it from the RCSB PDB database at https://www.rcsb.org/. The file is called 5vbm.pdb, and part of it is shown below:

5vbm.pdb
HEADER    HYDROLASE                                           5VBM
TITLE     CRYSTAL STRUCTURE OF SMALL MOLECULE DISULFIDE 2C07 BOUND TO K-RAS CYS
TITLE    2 LIGHT M72C GDP
KEYWDS    GTPASE,INHIBITOR,GDP,HYDROLASE
EXPDTA    X-RAY DIFFRACTION
REMARK   2
REMARK   2 RESOLUTION.    1.49 ANGSTROMS.
...
ATOM     11  CA  MET A   1     -19.356 -15.943  24.176  1.00 17.37           C
ANISOU   11  CA  MET A   1     1877   1974   2749   -442    959    401       C
ATOM     12  C   MET A   1     -17.969 -16.202  23.628  1.00 16.64           C
ANISOU   12  C   MET A   1     1846   1881   2596   -347    853    470       C
ATOM     13  O   MET A   1     -17.377 -17.264  23.848  1.00 17.10           O
ANISOU   13  O   MET A   1     1987   1897   2613   -333    837    563       O
ATOM     14  CB  MET A   1     -19.217 -14.992  25.375  1.00 18.43           C
ANISOU   14  CB  MET A   1     2017   2205   2782   -486   1034    371       C
ATOM     15  CG  MET A   1     -18.149 -15.406  26.395  1.00 38.06           C
ANISOU   15  CG  MET A   1     4623   4745   5091   -513   1025    485       C
ATOM     16  SD  MET A   1     -17.536 -14.004  27.341  1.00 44.81           S
ANISOU   16  SD  MET A   1     5501   5720   5806   -545   1049    443       S
ATOM     17  CE  MET A   1     -18.860 -13.868  28.537  1.00 67.52           C
ANISOU   17  CE  MET A   1     8364   8638   8651   -676   1205    342       C
ATOM     18  H   MET A   1     -19.406 -17.766  24.902  1.00 22.03           H
ATOM     19  HA  MET A   1     -19.898 -15.510  23.498  1.00 20.85           H
...
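If you prefer to script the download, the following is a minimal sketch. It assumes the RCSB file server's `https://files.rcsb.org/download/<ID>.pdb` URL pattern; the function names `rcsb_pdb_url` and `download_pdb` are illustrative and are not part of pdbtop.

```python
import urllib.request


def rcsb_pdb_url(pdb_id):
    """Build the RCSB download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID, got %r" % pdb_id)
    return "https://files.rcsb.org/download/%s.pdb" % pdb_id


def download_pdb(pdb_id, filename=None):
    """Download the entry (network access required) and return the local filename."""
    filename = filename or pdb_id.strip().lower() + ".pdb"
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), filename)
    return filename


# Example: download_pdb("5VBM") would write 5vbm.pdb to the current directory.
```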

The structure is shown below:

_images/p2.png

We can see that the file contains several water molecules, ions, and ligands. For now we want to study only the protein. The covalently bonded ligand 92V is covered in Tutorial 3: A Protein and its Covalently Bonded Ligand. Here we show how to prepare the protein for calculations.
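The non-protein content can also be listed without a viewer. The sketch below is independent of pdbtop and assumes the standard fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26). It counts the distinct HETATM residues by residue name, so waters show up as HOH. The function name `count_hetero_residues` is illustrative.

```python
from collections import Counter


def count_hetero_residues(lines):
    """Count distinct HETATM residues by residue name (HOH is water)."""
    seen = set()
    for line in lines:
        if line.startswith("HETATM"):
            # Key on (resName, chainID, resSeq + insertion code) so that each
            # residue is counted once, however many atoms it has.
            seen.add((line[17:20].strip(), line[21], line[22:27]))
    return Counter(name for name, _, _ in seen)


# Example:
# with open("5vbm.pdb") as handle:
#     print(count_hetero_residues(handle))
```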

check Structure

This file cannot be used directly in computations because it contains records that are not needed and atoms that need attention, such as nonstandard names and partially occupied positions. So, the first step is to check the structure using the following command:

$ pdbtop check -i 5vbm.pdb -o 5vbm-1

This command tells pdbtop to check the structure in the input (-i) file 5vbm.pdb and write the output (-o) to `5vbm-1.pdb`. The output is shown below:

$ pdbtop check -i 5vbm.pdb -o 5vbm-1
Read: 5VBM.pdb
Warning: The residue name of the 423-th atom is changed from "HIS" to "HSE".
Warning: The residue name of the 424-th atom is changed from "HIS" to "HSE".
...
Warning: The atom name of the 325-th atom is changed from "CD1" to "CD".
Warning: The atom name of the 380-th atom is changed from "CD1" to "CD".
...
Warning: The atom N in residue LYS16 at chain A has an occupancy of 0.490. Probably, only 1 of atom N220 and N221 can be kept!
Warning: The atom N in residue LYS16 at chain A has an occupancy of 0.510.
Warning: The atom CA in residue LYS16 at chain A has an occupancy of 0.490. Probably, only 1 of atom CA222 and CA223 can be kept!
Warning: The atom CA in residue LYS16 at chain A has an occupancy of 0.510.
...
Current molecule:
Molecule:                 5vbm.pdb
Number of atoms:          2849
Number of residues:       276
 Number of amino acids:   168
 Number of nucleic acids: 0
 Number of waters:        105
 Number of ions:          1
 Number of ligands:       2
Write PDB: 5vbm-1.pdb

We strongly recommend that you read every Warning statement carefully:

  1. Warning: The residue name of the 423-th atom is changed from "HIS" to "HSE". pdbtop has renamed the residue that this atom belongs to from "HIS" to "HSE" in the output file.

  2. Warning: The atom name of the 325-th atom is changed from "CD1" to "CD". pdbtop has renamed this atom from "CD1" to "CD" in the output file.

  3. Warning: The atom N in residue LYS16 at chain A has an occupancy of 0.490. Probably, only 1 of atom N220 and N221 can be kept! This warning is very important. It and the lines that follow it mean that residue LYS16 in chain A has two alternate conformations. We can check this in the file 5vbm-1.pdb:

5vbm-1.pdb
...
ATOM    219  HA3 GLY A  15     -11.046   9.692  14.589  1.00 13.71         H H
ATOM    220  N   LYS A  16      -9.204   7.791  16.304  0.49  9.02         N N
ATOM    221  N   LYS A  16      -9.203   7.790  16.301  0.51  8.97         N N
ATOM    222  CA  LYS A  16      -9.168   6.488  16.966  0.49 10.42         C C
ATOM    223  CA  LYS A  16      -9.163   6.488  16.967  0.51 10.37         C C
ATOM    224  C   LYS A  16     -10.013   6.496  18.236  0.49  9.00         C C
ATOM    225  C   LYS A  16     -10.009   6.495  18.237  0.51  9.03         C C
ATOM    226  O   LYS A  16     -10.840   5.600  18.455  0.49 10.68         O O
ATOM    227  O   LYS A  16     -10.835   5.597  18.456  0.51 10.55         O O
ATOM    228  CB  LYS A  16      -7.724   6.108  17.278  0.49 10.05         C C
ATOM    229  CB  LYS A  16      -7.717   6.106  17.274  0.51 10.08         C C
ATOM    230  CG  LYS A  16      -6.871   6.043  16.021  0.49 11.81         C C
ATOM    231  CG  LYS A  16      -6.901   5.904  16.006  0.51 12.28         C C
...
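Rather than scanning the file by eye, a short script can list every residue that has partially occupied atoms. This is a sketch that is independent of pdbtop; it assumes well-formed ATOM and HETATM records with the occupancy in the standard columns 55-60, and the function name `partial_occupancy_residues` is illustrative.

```python
def partial_occupancy_residues(lines, threshold=1.0):
    """Return (chain, residue number, residue name) for each residue that has
    at least one atom with occupancy below `threshold`, in file order."""
    found = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if float(line[54:60]) < threshold:
            residue = (line[21], int(line[22:26]), line[17:20].strip())
            if residue not in found:
                found.append(residue)
    return found


# Example:
# with open("5vbm-1.pdb") as handle:
#     for chain, number, name in partial_occupancy_residues(handle):
#         print(chain, number, name)
```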

Residue LYS16 in chain A has two N atoms, with occupancies of 0.49 and 0.51, and likewise two CA atoms with the same occupancies. Only one conformation should be kept (see check for more information). Note that pdbtop will NOT do this for you; you must do it manually. Here we keep the conformation with occupancy 0.51: delete all atoms with occupancy 0.49 in LYS16 at chain A and save the result to a new file, say 5vbm-2.pdb:

5vbm-2.pdb
...
ATOM    219  HA3 GLY A  15     -11.046   9.692  14.589  1.00 13.71         H H
ATOM    221  N   LYS A  16      -9.203   7.790  16.301  0.51  8.97         N N
ATOM    223  CA  LYS A  16      -9.163   6.488  16.967  0.51 10.37         C C
ATOM    225  C   LYS A  16     -10.009   6.495  18.237  0.51  9.03         C C
ATOM    227  O   LYS A  16     -10.835   5.597  18.456  0.51 10.55         O O
ATOM    229  CB  LYS A  16      -7.717   6.106  17.274  0.51 10.08         C C
ATOM    231  CG  LYS A  16      -6.901   5.904  16.006  0.51 12.28         C C
...

You only need to delete the heavy atoms, since the hydrogen atoms will be removed in the next step.
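For files with many partially occupied residues, the manual edit can be scripted. The sketch below is not a pdbtop feature. For every atom that appears more than once in the same residue, it keeps the copy with the highest occupancy and leaves all other lines untouched. It assumes the standard PDB column layout and that all atoms of one conformation share the same occupancy, as they do for LYS16 here; if occupancies differ from atom to atom, this rule could mix conformations, so inspect the result. The function name `keep_highest_occupancy` is illustrative.

```python
def keep_highest_occupancy(lines):
    """Drop lower-occupancy duplicates of each atom; keep every other line."""
    lines = list(lines)
    best = {}  # atom identity -> (occupancy, line index)
    for index, line in enumerate(lines):
        if line.startswith(("ATOM", "HETATM")):
            # Identity: chain, residue number + insertion code, residue name, atom name.
            key = (line[21], line[22:27], line[17:20], line[12:16])
            occupancy = float(line[54:60])
            # On a tie, the copy that appears first in the file is kept.
            if key not in best or occupancy > best[key][0]:
                best[key] = (occupancy, index)
    keep = {index for _, index in best.values()}
    return [
        line
        for index, line in enumerate(lines)
        if not line.startswith(("ATOM", "HETATM")) or index in keep
    ]


# Example:
# with open("5vbm-1.pdb") as src, open("5vbm-2.pdb", "w") as dst:
#     dst.writelines(keep_highest_occupancy(src))
```

Atom serial numbers are not renumbered, which matches the manually edited 5vbm-2.pdb shown above.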

remove Structure

Now we remove everything except the protein:

$ pdbtop remove -i 5vbm-2.pdb -o 5vbm-3 --waters --ions --ligands --Hs

The additional options are:

  1. --waters: remove water molecules.

  2. --ions: remove ions.

  3. --ligands: remove ligands.

  4. --Hs: remove hydrogen atoms. The hydrogen atoms in this file have nonstandard names, so we need to remove them.

Now the structure looks like this, with no hydrogens, waters, ions, or ligands:

_images/p3.png

Generate topology

Now we generate the topology:

$ pdbtop topol -i 5vbm-3.pdb -o 5vbm-4

The output is shown below:

$ pdbtop topol -i 5vbm-3.pdb -o 5vbm-4
Read: 5vbm-3.pdb
...
Building topology ...
Building topology done.
Patching N-terminus in residue GLY0 at chain A.
Patching C-terminus in residue LYS169 at chain A.
Write PDB: 5vbm-4.pdb
Write PSF: 5vbm-4.psf
Total charge: -5.00000

The output indicates that pdbtop has built the topology and patched the N- and C-termini of each protein chain. The output files are 5vbm-4.pdb and 5vbm-4.psf, shown below:

_images/p4.png

At this stage, with 5vbm-4.pdb and 5vbm-4.psf, one can start to do calculations for the protein.

solvate System

Now we add water to solvate the system and ions to neutralize it. The box size is 70 x 70 x 70 Angstrom^3. pdbtop uses NaCl to neutralize the system. The command is:

$ pdbtop.exe solvate -i 5vbm-4.pdb -t 5vbm-4.psf -o 5vbm-sol --box "70 70 70"

The output is:

$ pdbtop.exe solvate -i 5vbm-4.pdb -t 5vbm-4.psf -o 5vbm-sol --box "70 70 70"
...
Building water box: 70.000 x 70.000 x 70.000 Angstrom^3.
12552 water molecules are added.
Add ions:
 Target charge: 0
 Target ionic strength: 0.010 mol/L
5 cations and 0 anions are added.
Final ionic strength: 0.012 mol/L
Write PDB: 5vbm-sol.pdb
Write PSF: 5vbm-sol.psf
Total charge: -0.00000

You can adjust the ionic strength in mol/L with --ionic-strength:

$ pdbtop.exe solvate -i 5vbm-4.pdb -t 5vbm-4.psf -o 5vbm-5 --box "70 70 70" --ionic-strength 0.02
...
12552 water molecules are added.
Add ions:
 Target charge: 0
 Target ionic strength: 0.020 mol/L
6 cations and 1 anions are added.
Final ionic strength: 0.017 mol/L
Write PDB: 5vbm-5.pdb
Write PSF: 5vbm-5.psf
Total charge: -0.00000

With the higher target, pdbtop adds 6 cations and 1 anion instead of 5 cations and no anions.
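The reported final ionic strengths can be checked by hand with the usual definition I = 1/2 * sum(c_i * z_i^2). The sketch below assumes that only the added monovalent ions are counted and that the concentration is taken over the full 70 x 70 x 70 Angstrom^3 box. These assumptions are inferred from the log above, not from pdbtop documentation. Under them, the two runs give values that round to 0.012 and 0.017 mol/L, in agreement with the output.

```python
AVOGADRO = 6.02214076e23           # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27  # 1 Angstrom^3 = 1e-30 m^3 = 1e-27 L


def ionic_strength(n_cations, n_anions, box, z_cation=1, z_anion=-1):
    """Ionic strength in mol/L for ion counts in a box given in Angstrom."""
    volume_l = box[0] * box[1] * box[2] * LITERS_PER_CUBIC_ANGSTROM
    per_ion = 1.0 / (AVOGADRO * volume_l)  # molarity contributed by one ion
    return 0.5 * per_ion * (n_cations * z_cation ** 2 + n_anions * z_anion ** 2)


# First run: 5 cations, 0 anions. Second run: 6 cations, 1 anion.
# print(ionic_strength(5, 0, (70, 70, 70)))
# print(ionic_strength(6, 1, (70, 70, 70)))
```

Because ions are added in whole numbers, the final value can only approximate the target in a box of this size: one extra monovalent ion changes the ionic strength by about 0.0024 mol/L.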

We now have a solvated, neutralized protein system that is ready for calculations!

_images/p8.png
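Once the alternate conformations have been resolved by hand, the remaining steps can be chained in a script. The following is a sketch that uses only the subcommands and options shown in this tutorial. It assumes that the executable is on your PATH (pass `exe="pdbtop.exe"` if that is how it is installed) and that pdbtop exits with a non-zero status on failure, which this tutorial does not confirm. The function names are illustrative.

```python
import subprocess


def build_commands(stem="5vbm", box="70 70 70", exe="pdbtop"):
    """Return the remove, topol, and solvate commands as argument lists."""
    return [
        [exe, "remove", "-i", stem + "-2.pdb", "-o", stem + "-3",
         "--waters", "--ions", "--ligands", "--Hs"],
        [exe, "topol", "-i", stem + "-3.pdb", "-o", stem + "-4"],
        [exe, "solvate", "-i", stem + "-4.pdb", "-t", stem + "-4.psf",
         "-o", stem + "-sol", "--box", box],
    ]


def run_pipeline(**kwargs):
    """Run each step in order; stop at the first command that fails."""
    for command in build_commands(**kwargs):
        print("$", " ".join(command))
        subprocess.run(command, check=True)


# Example: run_pipeline() expects 5vbm-2.pdb in the current directory.
```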