1. check
In this task, pdbtop checks the input molecule and corrects it where necessary.
1.1. Arguments
- -i, --in
Mandatory: Yes
Argument: PDB filename or XYZ filename
Default: None
Give the input molecule.
- -o, --out
Mandatory: No
Argument: A filename prefix
Default: None
Give the output filename prefix. If not given, the output will be written to x-check.pdb.
Example:
$ pdbtop check -i 3kab.pdb -o new # Read the molecule from a PDB file
It will output the following information:
...
Warning: The residue name of the 1112-th atom is changed from "HIS" to "HSE".
Warning: The atom name of the 1120-th atom is changed from "CD1" to "CD".
Warning: The atom name of the 1128-th atom is changed from "CD1" to "CD".
Warning: The atom name of the 1164-th atom is "OXT". It is probably a non-standard atom in C-terminus and is deleted.
...
Write PDB: new.pdb
The output shows that some residues and atoms were renamed and some atoms were deleted. The corrected molecule is written to new.pdb. pdbtop usually makes these corrections properly. However, if you have special requirements, you have to correct the input molecule manually, so we strongly recommend reading every Warning message carefully.
1.2. Theoretical Background
PDB files obtained from https://www.rcsb.org/ are often generated from X-ray experiments. The PDB file format is not always suitable for computations and often contains ambiguities that CANNOT be resolved automatically, so you have to treat some issues manually. This section explains what pdbtop does in the check task.
1.2.1. Long Atom Names and Location Indicators
Below are two examples of PDB files:
md.pdb:
ATOM 2225 CG LEU A 160 37.975 11.053 20.429 1.00 32.24 C
ATOM 2226 HG LEU A 160 38.934 11.441 20.009 1.00 0.00 H H
ATOM 2227 CD1 LEU A 160 38.210 9.708 21.137 1.00 34.92 C
ATOM 2228 HD11LEU A 160 38.574 8.946 20.415 1.00 0.00 H H
ATOM 2229 HD12LEU A 160 38.969 9.817 21.942 1.00 0.00 H H
ATOM 2230 HD13LEU A 160 37.268 9.334 21.591 1.00 0.00 H H
ATOM 2231 CD2 LEU A 160 36.995 10.852 19.267 1.00 33.22 C
ATOM 2232 HD21LEU A 160 37.422 10.150 18.519 1.00 0.00 H H
ATOM 2233 HD22LEU A 160 36.030 10.429 19.613 1.00 0.00 H H
ATOM 2234 HD23LEU A 160 36.791 11.819 18.759 1.00 0.00 H H
3kab.pdb:
ATOM 812 CA AARG A 119 31.582 16.256 14.775 0.50 35.04 C
ATOM 813 CA BARG A 119 31.512 16.275 14.809 0.50 34.80 C
ATOM 814 C ARG A 119 32.251 16.478 16.124 1.00 35.47 C
ATOM 815 O ARG A 119 32.920 15.565 16.637 1.00 36.21 O
ATOM 816 CB AARG A 119 32.441 15.211 14.037 0.50 34.55 C
ATOM 817 CB BARG A 119 32.145 15.106 14.033 0.50 34.28 C
ATOM 818 CG AARG A 119 31.785 14.470 12.894 0.50 34.42 C
ATOM 819 CG BARG A 119 31.511 14.870 12.670 0.50 33.10 C
ATOM 820 CD AARG A 119 32.831 14.137 11.844 0.50 31.29 C
ATOM 821 CD BARG A 119 32.183 13.779 11.867 0.50 29.12 C
ATOM 822 NE AARG A 119 32.779 15.183 10.842 0.50 30.39 N
ATOM 823 NE BARG A 119 31.239 13.328 10.868 0.50 27.73 N
ATOM 824 CZ AARG A 119 33.598 16.233 10.751 0.50 27.44 C
ATOM 825 CZ BARG A 119 31.217 12.113 10.329 0.50 26.91 C
ATOM 826 NH1AARG A 119 34.654 16.422 11.573 0.50 28.45 N
ATOM 827 NH1BARG A 119 32.123 11.182 10.695 0.50 23.52 N
ATOM 828 NH2AARG A 119 33.357 17.080 9.788 0.50 18.54 N
ATOM 829 NH2BARG A 119 30.263 11.846 9.442 0.50 19.86 N
If you have ever performed molecular dynamics simulations, you may have encountered files like md.pdb. It contains 3 hydrogen atoms named HD11, HD12, and HD13. However, this is NOT a standard format.
In the standard PDB format, the character directly before the residue name (column 17) is not part of the atom name; it is called the location indicator. In the example 3kab.pdb, you can see that residue ARG119 contains two sets of atoms with location indicators A and B. The location indicator distinguishes atoms with the same name in the same residue. Conformations A and B are present in the ratio given by the occupancy factor, here 0.5:0.5. The 2 conformations are shown below.

Therefore, it is sometimes difficult to distinguish a location indicator from the last character of a long atom name. In the example md.pdb, the atom names are actually HD11, HD12, and HD13 without location indicators, but they can also be read as an atom named HD1 with location indicators 1, 2, and 3. Keep this in mind when you work with PDB files from X-ray experiments.
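The ambiguity can be reproduced with a few lines of Python. This is only a minimal sketch, not pdbtop's own parser: the function name split_name_columns is hypothetical, and the exact spacing of the two truncated records is an assumption rebuilt from the listings above. It reads the atom name from columns 13-16, the location indicator from column 17, and the residue name from columns 18-20.

```python
def split_name_columns(line):
    """Read atom name, location indicator, and residue name by fixed columns."""
    name = line[12:16].strip()      # columns 13-16: atom name
    alt_loc = line[16].strip()      # column 17: location indicator ('' if blank)
    res_name = line[17:20].strip()  # columns 18-20: residue name
    return name, alt_loc, res_name


# Truncated records; the spacing is an assumption, not copied from real files.
xray_line = "ATOM    812  CA AARG A 119"
md_line = "ATOM   2228  HD11LEU A 160"

print(split_name_columns(xray_line))  # ('CA', 'A', 'ARG')
print(split_name_columns(md_line))    # ('HD1', '1', 'LEU')
```

A strict column-based reader therefore turns HD11 into the atom HD1 with location indicator 1, which is exactly the misreading described above.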
pdbtop can recognize some location indicators and atom names, and gives some suggestions. For example,
$ pdbtop check -i 3kab.pdb -o new # Read the molecule from a PDB file
...
Warning: The atom CA in resiude ARG119 at chain A has an occupancy of 0.500. Probably, only 1 of atom CA812 and CA813 can be kept!
Warning: The atom CA in resiude ARG119 at chain A has an occupancy of 0.500.
Warning: The atom CB in resiude ARG119 at chain A has an occupancy of 0.500. Probably, only 1 of atom CB816 and CB817 can be kept!
Warning: The atom CB in resiude ARG119 at chain A has an occupancy of 0.500.
Warning: The atom CG in resiude ARG119 at chain A has an occupancy of 0.500. Probably, only 1 of atom CG818 and CG819 can be kept!
Warning: The atom CG in resiude ARG119 at chain A has an occupancy of 0.500.
...
You can see that pdbtop has detected 2 sets of atoms with the same names in the same residue, ARG119. However, pdbtop does NOT modify them; you have to do it manually. For example, for ARG119 we keep only conformation A by deleting the atoms of conformation B:
ATOM 811 N ARG A 119 31.491 17.524 14.015 1.00 34.28 N
ATOM 812 CA ARG A 119 31.582 16.256 14.775 0.50 35.04 C
ATOM 814 C ARG A 119 32.251 16.478 16.124 1.00 35.47 C
ATOM 815 O ARG A 119 32.920 15.565 16.637 1.00 36.21 O
ATOM 816 CB ARG A 119 32.441 15.211 14.037 0.50 34.55 C
ATOM 818 CG ARG A 119 31.785 14.470 12.894 0.50 34.42 C
ATOM 820 CD ARG A 119 32.831 14.137 11.844 0.50 31.29 C
ATOM 822 NE ARG A 119 32.779 15.183 10.842 0.50 30.39 N
ATOM 824 CZ ARG A 119 33.598 16.233 10.751 0.50 27.44 C
ATOM 826 NH1 ARG A 119 34.654 16.422 11.573 0.50 28.45 N
ATOM 828 NH2 ARG A 119 33.357 17.080 9.788 0.50 18.54 N
Note that we also delete the location indicator A.
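If many residues need the same manual edit, it can be scripted. The following is a hedged sketch rather than a pdbtop feature: the helper keep_conformation is hypothetical, and it assumes that column 17 of every ATOM/HETATM record really holds a location indicator. It keeps atoms whose indicator is blank or equal to the chosen letter and then blanks the indicator, as done by hand above.

```python
def keep_conformation(lines, keep="A"):
    """Keep one conformation and blank its location indicator (column 17)."""
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            if line[16] not in (" ", keep):
                continue  # atom belongs to another conformation: drop it
            line = line[:16] + " " + line[17:]  # erase the indicator
        kept.append(line)
    return kept


# Truncated records; the spacing is an assumption, not copied from a real file.
records = [
    "ATOM    812  CA AARG A 119",
    "ATOM    813  CA BARG A 119",
    "ATOM    814  C   ARG A 119",
]
for record in keep_conformation(records):
    print(record)
```

Do not run such a filter on files like md.pdb: there, the last character of HD11 sits in the same column and the atom would be dropped by mistake. Like the manual edit, the sketch leaves the atom numbers and occupancies unchanged.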
1.2.2. Different Name for the Same Thing
For various reasons, some atoms or residues have slightly different names in different PDB files. For example, the residue name of the amino acid “histidine” can be HIS or HSE; the atom name of the \(\delta\) carbon in isoleucine (ILE) can be CD1 or CD. This is not a big issue: pdbtop renames them consistently in the output file.
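Conceptually, such renaming is a table lookup. The sketch below is an illustration only: the two tables contain just the examples mentioned in this section, the names RESIDUE_ALIASES, ATOM_ALIASES, and normalize_names are hypothetical, and the actual tables used by pdbtop are not shown here.

```python
# Illustrative alias tables; they hold only the two examples from this section.
RESIDUE_ALIASES = {"HIS": "HSE"}
ATOM_ALIASES = {("ILE", "CD1"): "CD"}


def normalize_names(res_name, atom_name):
    """Return the (residue name, atom name) pair after applying the aliases."""
    new_atom = ATOM_ALIASES.get((res_name, atom_name), atom_name)
    new_res = RESIDUE_ALIASES.get(res_name, res_name)
    return new_res, new_atom


print(normalize_names("HIS", "CA"))   # ('HSE', 'CA')
print(normalize_names("ILE", "CD1"))  # ('ILE', 'CD')
print(normalize_names("LEU", "CD1"))  # ('LEU', 'CD1')
```

The atom table is keyed by residue and atom name together, so CD1 in LEU is left untouched.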
1.2.3. Element Symbols
In many PDB files, the element symbol is not given. In this case, pdbtop will add the element symbol to each atom. For example,
ATOM 181 HA SER A 18 29.928 -4.431 43.044 1.00 0.00 H H
ATOM 182 C SER A 18 31.411 -5.775 42.354 1.00 60.56 C C
ATOM 183 O SER A 18 30.902 -6.728 42.936 1.00 60.65 O O
Unfortunately, the element symbol CANNOT always be determined from the atom name. For example, the atom name CL can mean carbon or chlorine. In this case, you have to correct the element symbol manually!
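To see why the guess can fail, consider a naive lookup that returns every element consistent with the leading letters of an atom name. This is a sketch under stated assumptions, not the algorithm pdbtop uses: the function element_candidates is hypothetical and the element sets are small illustrative subsets.

```python
# Small illustrative subsets of element symbols (assumptions, not complete).
TWO_LETTER_ELEMENTS = {"CL", "BR", "FE", "ZN", "MG", "NA"}
ONE_LETTER_ELEMENTS = {"H", "C", "N", "O", "S", "P"}


def element_candidates(atom_name):
    """List the element symbols that the atom name could stand for."""
    letters = "".join(ch for ch in atom_name if ch.isalpha()).upper()
    candidates = []
    if letters[:2] in TWO_LETTER_ELEMENTS:
        candidates.append(letters[:2].capitalize())
    if letters[:1] in ONE_LETTER_ELEMENTS:
        candidates.append(letters[:1])
    return candidates


print(element_candidates("HA"))  # ['H']
print(element_candidates("CL"))  # ['Cl', 'C']
```

Whenever more than one candidate is returned, the name alone does not decide the element, and the file has to be checked by hand.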
1.2.4. Models in the Same File
A PDB file can contain multiple models. The models are delimited by MODEL and ENDMDL records. For example,
MODEL 1
ATOM 1 N LYS A 267 26.791 -7.054 -26.130 1.00 0.00 N
ATOM 2 CA LYS A 267 27.796 -6.500 -27.025 1.00 0.00 C
...
ATOM 708 HD3 PRO A 312 5.473 -10.093 -12.553 1.00 0.00 H
TER 709 PRO A 312
ENDMDL
MODEL 2
ATOM 1 N LYS A 267 21.011 -23.102 -13.354 1.00 0.00 N
ATOM 2 CA LYS A 267 19.831 -22.754 -12.570 1.00 0.00 C
...
ATOM 706 HG3 PRO A 312 2.202 -3.813 -8.777 1.00 0.00 H
ATOM 707 HD2 PRO A 312 -0.115 -5.325 -8.685 1.00 0.00 H
ATOM 708 HD3 PRO A 312 1.253 -5.649 -9.765 1.00 0.00 H
TER 709 PRO A 312
ENDMDL
When pdbtop detects several models, it saves each model to a separate file. You can then process the model you need with pdbtop again. For example,
$ pdbtop check -i 2mz7.pdb -o 2mz7-model
Read: 2mz7.pdb
There are 20 models in "2mz7.pdb".
Each of them is saved to "2mz7-model-X.pdb".
$ ls
2mz7-model-1.pdb 2mz7-model-13.pdb 2mz7-model-17.pdb 2mz7-model-20.pdb 2mz7-model-6.pdb 2mz7.pdb
2mz7-model-10.pdb 2mz7-model-14.pdb 2mz7-model-18.pdb 2mz7-model-3.pdb 2mz7-model-7.pdb
2mz7-model-11.pdb 2mz7-model-15.pdb 2mz7-model-19.pdb 2mz7-model-4.pdb 2mz7-model-8.pdb
2mz7-model-12.pdb 2mz7-model-16.pdb 2mz7-model-2.pdb 2mz7-model-5.pdb 2mz7-model-9.pdb
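The splitting itself is simple to picture. The sketch below is not pdbtop's implementation; the functions split_models and write_models are hypothetical, and records outside any MODEL ... ENDMDL block (such as header records) are simply ignored. It collects the records of each model and writes them to files named like the ones listed above.

```python
from pathlib import Path


def split_models(lines):
    """Return {model number: [records]} for each MODEL ... ENDMDL block."""
    models = {}
    current = None
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line[6:].split()[0])
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif current is not None:
            models[current].append(line)
    return models


def write_models(models, prefix):
    """Write each model to '<prefix>-<number>.pdb'."""
    for number, records in models.items():
        Path(f"{prefix}-{number}.pdb").write_text("\n".join(records) + "\n")


example = ["MODEL 1", "ATOM 1 N LYS A 267", "ENDMDL",
           "MODEL 2", "ATOM 1 N LYS A 267", "ATOM 2 CA LYS A 267", "ENDMDL"]
models = split_models(example)
print(sorted(models))  # [1, 2]
```

Calling write_models(models, "2mz7-model") would then create 2mz7-model-1.pdb and 2mz7-model-2.pdb in the current directory.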