Tutorial 1: A Standard Protein
In this tutorial, we use a real case to show how to prepare a protein for calculations with pdbtop.
Download Structure
Assume we want to study the protein with PDB ID 5VBM. You can download it from the RCSB PDB database at https://www.rcsb.org/. The file is called 5vbm.pdb, and an excerpt is shown below:
HEADER HYDROLASE 5VBM
TITLE CRYSTAL STRUCTURE OF SMALL MOLECULE DISULFIDE 2C07 BOUND TO K-RAS CYS
TITLE 2 LIGHT M72C GDP
KEYWDS GTPASE,INHIBITOR,GDP,HYDROLASE
EXPDTA X-RAY DIFFRACTION
REMARK 2
REMARK 2 RESOLUTION. 1.49 ANGSTROMS.
...
ATOM 11 CA MET A 1 -19.356 -15.943 24.176 1.00 17.37 C
ANISOU 11 CA MET A 1 1877 1974 2749 -442 959 401 C
ATOM 12 C MET A 1 -17.969 -16.202 23.628 1.00 16.64 C
ANISOU 12 C MET A 1 1846 1881 2596 -347 853 470 C
ATOM 13 O MET A 1 -17.377 -17.264 23.848 1.00 17.10 O
ANISOU 13 O MET A 1 1987 1897 2613 -333 837 563 O
ATOM 14 CB MET A 1 -19.217 -14.992 25.375 1.00 18.43 C
ANISOU 14 CB MET A 1 2017 2205 2782 -486 1034 371 C
ATOM 15 CG MET A 1 -18.149 -15.406 26.395 1.00 38.06 C
ANISOU 15 CG MET A 1 4623 4745 5091 -513 1025 485 C
ATOM 16 SD MET A 1 -17.536 -14.004 27.341 1.00 44.81 S
ANISOU 16 SD MET A 1 5501 5720 5806 -545 1049 443 S
ATOM 17 CE MET A 1 -18.860 -13.868 28.537 1.00 67.52 C
ANISOU 17 CE MET A 1 8364 8638 8651 -676 1205 342 C
ATOM 18 H MET A 1 -19.406 -17.766 24.902 1.00 22.03 H
ATOM 19 HA MET A 1 -19.898 -15.510 23.498 1.00 20.85 H
...
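The download step can also be scripted. The following is a minimal sketch, not part of pdbtop: it assumes the RCSB file server URL pattern https://files.rcsb.org/download/<ID>.pdb, which this tutorial does not mention, and the function names rcsb_pdb_url and download_pdb are hypothetical.

```python
from urllib.request import urlretrieve


def rcsb_pdb_url(pdb_id):
    """Build the download URL for a PDB entry (assumed RCSB URL pattern)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id, destination=None):
    """Save the entry as <id>.pdb (lower case) unless a destination is given."""
    destination = destination or f"{pdb_id.lower()}.pdb"
    urlretrieve(rcsb_pdb_url(pdb_id), destination)
    return destination
```

Calling download_pdb("5VBM") would then write 5vbm.pdb to the current directory, provided the network is reachable.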
The structure is shown below:

We can see that the file contains several water molecules, ions, and ligands. For now, we only want to study the protein. The covalently bonded ligand 92V will be considered in Tutorial 3: A Protein and its Covalently Bonded Ligand. Here we show how to prepare the protein for calculations.
check Structure
This file cannot be used directly in computations, since it contains records that are not needed and atoms that must be fixed first. So, the first step is to check the structure using the following command:
$ pdbtop check -i 5vbm.pdb -o 5vbm-1
This command means that, with the input (-i) file 5vbm.pdb, pdbtop will check the structure and write the output (-o) to 5vbm-1.pdb. The output is shown below:
$ pdbtop check -i 5vbm.pdb -o 5vbm-1
Read: 5VBM.pdb
Warning: The residue name of the 423-th atom is changed from "HIS" to "HSE".
Warning: The residue name of the 424-th atom is changed from "HIS" to "HSE".
...
Warning: The atom name of the 325-th atom is changed from "CD1" to "CD".
Warning: The atom name of the 380-th atom is changed from "CD1" to "CD".
...
Warning: The atom N in residue LYS16 at chain A has an occupancy of 0.490. Probably, only 1 of atom N220 and N221 can be kept!
Warning: The atom N in residue LYS16 at chain A has an occupancy of 0.510.
Warning: The atom CA in residue LYS16 at chain A has an occupancy of 0.490. Probably, only 1 of atom CA222 and CA223 can be kept!
Warning: The atom CA in residue LYS16 at chain A has an occupancy of 0.510.
...
Current molecule:
Molecule: 5vbm.pdb
Number of atoms: 2849
Number of residues: 276
Number of amino acids: 168
Number of nucleic acids: 0
Number of waters: 105
Number of ions: 1
Number of ligands: 2
Write PDB: 5vbm-1.pdb
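Because this log can be long, it may help to collect the occupancy warnings automatically. The sketch below is not part of pdbtop; it assumes the warning wording shown in the log above, and the function name partial_occupancy_residues is hypothetical.

```python
import re
from collections import defaultdict

# Pattern for the occupancy warnings, assuming the wording shown in the log above.
OCCUPANCY_WARNING = re.compile(
    r"Warning: The atom (\S+) in residue (\S+) at chain (\S+) "
    r"has an occupancy of (\d+\.\d+)"
)


def partial_occupancy_residues(log_text):
    """Map (chain, residue) to the sorted occupancies reported for it."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = OCCUPANCY_WARNING.search(line)
        if match:
            _atom, residue, chain, occupancy = match.groups()
            found[(chain, residue)].add(float(occupancy))
    return {key: sorted(values) for key, values in found.items()}
```

Applied to the log above, this reports residue LYS16 at chain A with occupancies 0.49 and 0.51, which is the residue that needs manual attention.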
We strongly recommend that you read all Warning statements carefully:
Warning: The residue name of the 423-th atom is changed from "HIS" to "HSE".
pdbtop has renamed the residue of the 423-th atom from "HIS" to "HSE".
Warning: The atom name of the 325-th atom is changed from "CD1" to "CD".
pdbtop has renamed the 325-th atom from "CD1" to "CD".
Warning: The atom N in residue LYS16 at chain A has an occupancy of 0.490. Probably, only 1 of atom N220 and N221 can be kept!
This is very important. This and the following lines mean that there are 2 sets of conformations for residue LYS16 at chain A. We can check this in the file 5vbm-1.pdb:
...
ATOM 219 HA3 GLY A 15 -11.046 9.692 14.589 1.00 13.71 H H
ATOM 220 N LYS A 16 -9.204 7.791 16.304 0.49 9.02 N N
ATOM 221 N LYS A 16 -9.203 7.790 16.301 0.51 8.97 N N
ATOM 222 CA LYS A 16 -9.168 6.488 16.966 0.49 10.42 C C
ATOM 223 CA LYS A 16 -9.163 6.488 16.967 0.51 10.37 C C
ATOM 224 C LYS A 16 -10.013 6.496 18.236 0.49 9.00 C C
ATOM 225 C LYS A 16 -10.009 6.495 18.237 0.51 9.03 C C
ATOM 226 O LYS A 16 -10.840 5.600 18.455 0.49 10.68 O O
ATOM 227 O LYS A 16 -10.835 5.597 18.456 0.51 10.55 O O
ATOM 228 CB LYS A 16 -7.724 6.108 17.278 0.49 10.05 C C
ATOM 229 CB LYS A 16 -7.717 6.106 17.274 0.51 10.08 C C
ATOM 230 CG LYS A 16 -6.871 6.043 16.021 0.49 11.81 C C
ATOM 231 CG LYS A 16 -6.901 5.904 16.006 0.51 12.28 C C
...
There are 2 N atoms in residue LYS16 at chain A, with occupancies of 0.49 and 0.51, respectively. Likewise, there are 2 CA atoms in the same residue with occupancies of 0.49 and 0.51. Only 1 of these conformations should be kept (see check for more information). Note that pdbtop will NOT do this for you; you must do it manually. Here we keep the conformation with occupancy 0.51. So, delete all the atoms with occupancy 0.49 in LYS16 at chain A and save the result to a new file, say 5vbm-2.pdb:
...
ATOM 219 HA3 GLY A 15 -11.046 9.692 14.589 1.00 13.71 H H
ATOM 221 N LYS A 16 -9.203 7.790 16.301 0.51 8.97 N N
ATOM 223 CA LYS A 16 -9.163 6.488 16.967 0.51 10.37 C C
ATOM 225 C LYS A 16 -10.009 6.495 18.237 0.51 9.03 C C
ATOM 227 O LYS A 16 -10.835 5.597 18.456 0.51 10.55 O O
ATOM 229 CB LYS A 16 -7.717 6.106 17.274 0.51 10.08 C C
ATOM 231 CG LYS A 16 -6.901 5.904 16.006 0.51 12.28 C C
...
You only need to delete the heavy atoms by hand, since the hydrogen atoms will be removed in the next step.
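The manual deletion can also be done with a short script instead of a text editor. This is a sketch under assumptions, not a pdbtop feature: it assumes 5vbm-1.pdb follows the standard fixed-column PDB layout (chain ID in column 22, residue number in columns 23-26, occupancy in columns 55-60), and the function names drop_conformer and filter_file are hypothetical. It removes every ATOM/HETATM line of the chosen residue that carries the unwanted occupancy, hydrogens included, and does not touch other record types.

```python
def drop_conformer(lines, chain, resseq, occupancy, tol=0.005):
    """Return the lines without the atoms of one residue that have the given occupancy."""
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 60:
            # Fixed-column fields of the PDB format (0-based slices).
            in_residue = line[21] == chain and int(line[22:26]) == resseq
            if in_residue and abs(float(line[54:60]) - occupancy) < tol:
                continue  # skip atoms of the unwanted conformation
        kept.append(line)
    return kept


def filter_file(src, dst, chain, resseq, occupancy):
    """Read src, drop one conformation of one residue, and write dst."""
    with open(src) as handle:
        lines = handle.readlines()
    with open(dst, "w") as handle:
        handle.writelines(drop_conformer(lines, chain, resseq, occupancy))
```

For the case above, filter_file("5vbm-1.pdb", "5vbm-2.pdb", "A", 16, 0.49) would keep the 0.51 conformation of LYS16. Other residues with partial occupancy reported by check need their own call.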
remove Structure
Now we remove everything except the protein:
$ pdbtop remove -i 5vbm-2.pdb -o 5vbm-3 --waters --ions --ligands --Hs
The additional options are:
--waters: remove water molecules.
--ions: remove ions.
--ligands: remove ligands.
--Hs: remove hydrogen atoms. The hydrogen atoms in this file have nonstandard names, so we need to remove them.
The structure now looks like this, with no hydrogens, waters, ions, or ligands:

Generate topology
Now we generate the topology:
$ pdbtop topol -i 5vbm-3.pdb -o 5vbm-4
The output is shown below:
$ pdbtop topol -i 5vbm-3.pdb -o 5vbm-4
Read: 5vbm-3.pdb
...
Building topology ...
Building topology done.
Patching N-terminus in residue GLY0 at chain A.
Patching C-terminus in residue LYS169 at chain A.
Write PDB: 5vbm-4.pdb
Write PSF: 5vbm-4.psf
Total charge: -5.00000
The output indicates that pdbtop has built the topology and patched the N- and C-termini of each protein chain. The output files are 5vbm-4.pdb and 5vbm-4.psf, shown below:

At this stage, with 5vbm-4.pdb and 5vbm-4.psf, one can start calculations on the protein.
solvate System
Next, we add water to solvate the system and ions to neutralize it. The box size is 70 x 70 x 70 Angstrom^3, and pdbtop uses NaCl to neutralize the system. The command is:
$ pdbtop solvate -i 5vbm-4.pdb -t 5vbm-4.psf -o 5vbm-sol --box "70 70 70"
The output is:
$ pdbtop solvate -i 5vbm-4.pdb -t 5vbm-4.psf -o 5vbm-sol --box "70 70 70"
...
Building water box: 70.000 x 70.000 x 70.000 Angstrom^3.
12552 water molecules are added.
Add ions:
Target charge: 0
Target ionic strength: 0.010 mol/L
5 cations and 0 anions are added.
Final ionic strength: 0.012 mol/L
Write PDB: 5vbm-sol.pdb
Write PSF: 5vbm-sol.psf
Total charge: -0.00000
You can adjust the target ionic strength (in mol/L) with --ionic-strength:
$ pdbtop solvate -i 5vbm-4.pdb -t 5vbm-4.psf -o 5vbm-5 --box "70 70 70" --ionic-strength 0.02
...
12552 water molecules are added.
Add ions:
Target charge: 0
Target ionic strength: 0.020 mol/L
6 cations and 1 anions are added.
Final ionic strength: 0.017 mol/L
Write PDB: 5vbm-5.pdb
Write PSF: 5vbm-5.psf
Total charge: -0.00000
Compared with the first run, one more cation and one more anion are added.
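The reported "Final ionic strength" values can be checked by hand. The sketch below is an inference from the numbers in the logs, not documented pdbtop behaviour: it assumes the standard definition I = 1/2 * sum(c_i * z_i^2), taken over the added monovalent Na+ and Cl- ions only, with concentrations computed from the full box volume. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23          # particles per mole
LITRES_PER_CUBIC_ANGSTROM = 1e-27


def ionic_strength(n_cations, n_anions, box, z_cation=1, z_anion=-1):
    """Ionic strength in mol/L for ions in a box given in Angstrom (x, y, z)."""
    volume_l = box[0] * box[1] * box[2] * LITRES_PER_CUBIC_ANGSTROM
    per_ion = 1.0 / (AVOGADRO * volume_l)  # mol/L contributed by a single ion
    return 0.5 * per_ion * (n_cations * z_cation**2 + n_anions * z_anion**2)


def net_charge(solute_charge, n_cations, n_anions, z_cation=1, z_anion=-1):
    """Total charge of the solute plus the added ions."""
    return solute_charge + n_cations * z_cation + n_anions * z_anion
```

Under these assumptions, ionic_strength(5, 0, (70, 70, 70)) is about 0.012 mol/L and ionic_strength(6, 1, (70, 70, 70)) is about 0.017 mol/L, in line with the two logs. In both runs, net_charge with the protein charge of -5 gives 0, so the system is neutral.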
We now have a solvated, neutralized protein system that is ready for calculations!
