1. check

In this task, pdbtop checks the input molecule and corrects it where necessary.

1.1. Arguments

-i, --in

Mandatory: Yes
Argument: PDB filename or XYZ filename
Default: None

Specify the input molecule.

-o, --out

Mandatory: No
Argument: A filename prefix
Default: None

Specify the output filename prefix. If it is not given, the output is written to x-check.pdb.

Example:

$ pdbtop check -i 3kab.pdb -o new # Read the molecule from a PDB file

The command prints output like the following:

...
Warning: The residue name of the 1112-th atom is changed from "HIS" to "HSE".
Warning: The atom name of the 1120-th atom is changed from "CD1" to "CD".
Warning: The atom name of the 1128-th atom is changed from "CD1" to "CD".
Warning: The atom name of the 1164-th atom is "OXT". It is probably a non-standard atom in C-terminus and is deleted.
...
Write PDB: new.pdb

The output shows that some residues and atoms have been renamed and some atoms have been deleted. The corrected molecule is written to new.pdb. pdbtop usually makes these corrections correctly. However, if you have special requirements, you have to correct the input molecule manually. We therefore strongly recommend that you read every warning carefully.

1.2. Theoretical Background

PDB files obtained from https://www.rcsb.org/ are often generated from X-ray experiments. Such files are not always suitable for direct use in computations and often contain ambiguities that CANNOT be resolved automatically, so some issues have to be treated manually. This section explains what pdbtop does in the check task.

1.2.1. Long Atom Names and Location Indicators

Below are two examples of PDB files:

md.pdb, a PDB file from a molecular dynamics simulation
ATOM   2225  CG  LEU A 160      37.975  11.053  20.429  1.00 32.24           C
ATOM   2226  HG  LEU A 160      38.934  11.441  20.009  1.00  0.00         H H
ATOM   2227  CD1 LEU A 160      38.210   9.708  21.137  1.00 34.92           C
ATOM   2228  HD11LEU A 160      38.574   8.946  20.415  1.00  0.00         H H
ATOM   2229  HD12LEU A 160      38.969   9.817  21.942  1.00  0.00         H H
ATOM   2230  HD13LEU A 160      37.268   9.334  21.591  1.00  0.00         H H
ATOM   2231  CD2 LEU A 160      36.995  10.852  19.267  1.00 33.22           C
ATOM   2232  HD21LEU A 160      37.422  10.150  18.519  1.00  0.00         H H
ATOM   2233  HD22LEU A 160      36.030  10.429  19.613  1.00  0.00         H H
ATOM   2234  HD23LEU A 160      36.791  11.819  18.759  1.00  0.00         H H

If you have ever performed molecular dynamics simulations, you may have encountered files like md.pdb. Note the three hydrogen atoms named HD11, HD12, and HD13 (and likewise HD21, HD22, and HD23). Their four-character names start in column 14, so the last character falls into column 17. This is NOT standard PDB format.

In standard PDB format, column 17, the character right after the atom name field (columns 13-16), is the alternate location indicator. In 3kab.pdb, for example, residue ARG119 contains two sets of atoms with location indicators A and B. The location indicator distinguishes atoms that have the same name in the same residue, i.e. alternative conformations. The relative populations of conformations A and B are given by the occupancy factor, here 0.5:0.5. The two conformations are shown below.

[Figure: conformations A and B of ARG119 in 3kab.pdb]

As a result, it is sometimes difficult to tell a location indicator apart from the last character of a long atom name. In md.pdb, the atom names really are HD11, HD12, and HD13 without location indicators, but they could also be read as a single atom HD1 with location indicators 1, 2, and 3. Keep this ambiguity in mind when dealing with PDB files from X-ray experiments.
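To see the ambiguity concretely, the following Python sketch slices the fixed columns of an ATOM record. It assumes the column layout of the standard PDB format (atom name in columns 13-16, location indicator in column 17, occupancy in columns 55-60). `parse_atom_fields` is a hypothetical helper, not part of pdbtop.

```python
def parse_atom_fields(line):
    """Hypothetical helper: slice an ATOM/HETATM record by its fixed PDB columns."""
    return {
        "serial": int(line[6:11]),          # columns 7-11
        "name": line[12:16].strip(),        # columns 13-16
        "altloc": line[16].strip(),         # column 17: location indicator
        "resname": line[17:20].strip(),     # columns 18-20
        "chain": line[21],                  # column 22
        "resseq": int(line[22:26]),         # columns 23-26
        "occupancy": float(line[54:60]),    # columns 55-60
    }


md_line = "ATOM   2228  HD11LEU A 160      38.574   8.946  20.415  1.00  0.00         H H"

strict = parse_atom_fields(md_line)
print(strict["name"], strict["altloc"])    # HD1 1
# Reading column 17 as part of the name instead gives the MD-style name:
print(md_line[12:17].strip())              # HD11
```

A reader that follows the standard sees atom HD1 with location indicator 1, while a reader that includes column 17 in the name sees HD11. Both readings are self-consistent, which is why this case cannot always be resolved automatically.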

pdbtop recognizes some location indicators and atom names and prints suggestions. For example:

$ pdbtop check -i 3kab.pdb -o new # Read the molecule from a PDB file
...
Warning: The atom CA in resiude ARG119 at chain A has an occupancy of 0.500. Probably, only 1 of atom CA812 and CA813 can be kept!
Warning: The atom CA in resiude ARG119 at chain A has an occupancy of 0.500.
Warning: The atom CB in resiude ARG119 at chain A has an occupancy of 0.500. Probably, only 1 of atom CB816 and CB817 can be kept!
Warning: The atom CB in resiude ARG119 at chain A has an occupancy of 0.500.
Warning: The atom CG in resiude ARG119 at chain A has an occupancy of 0.500. Probably, only 1 of atom CG818 and CG819 can be kept!
Warning: The atom CG in resiude ARG119 at chain A has an occupancy of 0.500.
...

pdbtop detects that a residue such as ARG119 contains two sets of atoms with the same names. However, pdbtop does NOT modify them; you have to resolve them manually. For ARG119, for example, we keep only conformation A and delete conformation B:

3kab.pdb (from RCSB PDB), residue ARG119 after keeping only conformation A
ATOM    811  N   ARG A 119      31.491  17.524  14.015  1.00 34.28           N
ATOM    812  CA  ARG A 119      31.582  16.256  14.775  0.50 35.04           C
ATOM    814  C   ARG A 119      32.251  16.478  16.124  1.00 35.47           C
ATOM    815  O   ARG A 119      32.920  15.565  16.637  1.00 36.21           O
ATOM    816  CB  ARG A 119      32.441  15.211  14.037  0.50 34.55           C
ATOM    818  CG  ARG A 119      31.785  14.470  12.894  0.50 34.42           C
ATOM    820  CD  ARG A 119      32.831  14.137  11.844  0.50 31.29           C
ATOM    822  NE  ARG A 119      32.779  15.183  10.842  0.50 30.39           N
ATOM    824  CZ  ARG A 119      33.598  16.233  10.751  0.50 27.44           C
ATOM    826  NH1 ARG A 119      34.654  16.422  11.573  0.50 28.45           N
ATOM    828  NH2 ARG A 119      33.357  17.080   9.788  0.50 18.54           N

Note that we also delete the location indicator A from the atoms we keep, so column 17 is now blank.
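The manual edit above can also be scripted. The following is a minimal sketch, not a pdbtop feature (pdbtop deliberately leaves this decision to you). `find_alternate_atoms` reports atom names that occur more than once within a residue, much like the warnings above, and `keep_altloc` keeps one conformation and blanks column 17. Both helpers are hypothetical, and the B line in the example input is a made-up copy of the A line, used only for illustration.

```python
from collections import defaultdict


def find_alternate_atoms(lines):
    """Hypothetical helper: report atom names that occur more than once in a residue."""
    groups = defaultdict(list)
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        # key: chain ID, residue name, residue number, atom name
        key = (line[21], line[17:20].strip(), int(line[22:26]), line[12:16].strip())
        # value: serial number, location indicator, occupancy
        groups[key].append((int(line[6:11]), line[16], float(line[54:60])))
    return {key: atoms for key, atoms in groups.items() if len(atoms) > 1}


def keep_altloc(lines, keep="A"):
    """Hypothetical helper: keep one alternate location and blank column 17."""
    result = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 16:
            if line[16] not in (" ", keep):
                continue  # drop atoms that belong to another conformation
            line = line[:16] + " " + line[17:]  # clear the location indicator
        result.append(line)
    return result


# Illustrative input: the CA atom of ARG119 with indicator A, plus a made-up
# B copy (same coordinates, serial 813) standing in for the real B atom.
lines = [
    "ATOM    811  N   ARG A 119      31.491  17.524  14.015  1.00 34.28           N",
    "ATOM    812  CA AARG A 119      31.582  16.256  14.775  0.50 35.04           C",
    "ATOM    813  CA BARG A 119      31.582  16.256  14.775  0.50 35.04           C",
]
print(find_alternate_atoms(lines))
# {('A', 'ARG', 119, 'CA'): [(812, 'A', 0.5), (813, 'B', 0.5)]}
for line in keep_altloc(lines, keep="A"):
    print(line)
```

As in the edited 3kab.pdb above, the occupancies of the kept atoms are left unchanged.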

1.2.2. Different Names for the Same Thing

The same atom or residue is sometimes named slightly differently in different PDB files. For example, the amino acid histidine may appear as HIS or HSE, and the δ carbon of isoleucine (ILE) may be named CD1 or CD. This is not a big issue: pdbtop renames them consistently in the output file.
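A lookup keyed by residue and atom name is one simple way to implement such renaming. The sketch below is hypothetical: its tables contain only the two examples mentioned in this section and do not reproduce pdbtop's actual rename rules.

```python
# Hypothetical rename tables built only from the examples in this section;
# pdbtop's real rules are not reproduced here.
RESIDUE_RENAMES = {"HIS": "HSE"}
ATOM_RENAMES = {("ILE", "CD1"): "CD"}   # keyed by (residue, atom)


def normalize_names(resname, atom_name):
    """Map alternative spellings of a residue/atom name to one form."""
    # Look up the atom rename with the ORIGINAL residue name first.
    new_atom = ATOM_RENAMES.get((resname, atom_name), atom_name)
    new_res = RESIDUE_RENAMES.get(resname, resname)
    return new_res, new_atom


print(normalize_names("HIS", "CA"))    # ('HSE', 'CA')
print(normalize_names("ILE", "CD1"))   # ('ILE', 'CD')
print(normalize_names("LEU", "CD1"))   # ('LEU', 'CD1'), unchanged
```

Keying atom renames by residue matters: CD1 is a valid name in leucine (see md.pdb above), so it should only be mapped to CD inside ILE.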

1.2.3. Element Symbols

Many PDB files do not give element symbols. In this case, pdbtop adds an element symbol to each atom. For example:

ATOM    181  HA  SER A  18      29.928  -4.431  43.044  1.00  0.00         H H
ATOM    182  C   SER A  18      31.411  -5.775  42.354  1.00 60.56         C C
ATOM    183  O   SER A  18      30.902  -6.728  42.936  1.00 60.65         O O

Unfortunately, the element symbol CANNOT always be determined from the atom name. For example, an atom named CL may be a carbon or a chlorine. In such cases, you have to correct the element symbol manually!
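The following sketch illustrates why this is ambiguous. `element_candidates` is a hypothetical helper that takes the first letter of the atom name as one candidate element and additionally reports a two-letter element when the name starts with one. The short `TWO_LETTER_ELEMENTS` table is illustrative only. It is neither a complete list nor pdbtop's rule.

```python
# Illustrative table only (hypothetical, not pdbtop's rule and far from complete).
TWO_LETTER_ELEMENTS = {"CL": "Cl", "BR": "Br", "FE": "Fe", "ZN": "Zn", "MG": "Mg"}


def element_candidates(atom_name):
    """Return the possible element symbols for a PDB atom name."""
    # Drop padding and any leading digits (old-style names such as "1HD1").
    name = atom_name.strip().lstrip("0123456789")
    if not name:
        return []
    candidates = [name[0].upper()]          # one-letter reading, e.g. C, H, N, O
    two_letter = TWO_LETTER_ELEMENTS.get(name[:2].upper())
    if two_letter:
        candidates.append(two_letter)       # two-letter reading, e.g. Cl
    return candidates


print(element_candidates(" HA "))   # ['H']
print(element_candidates("CL"))     # ['C', 'Cl'], needs a human decision
```

When more than one candidate is returned, the element column has to be checked by hand. The table deliberately leaves out CA, which would otherwise flag every protein Cα atom as possible calcium.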

1.2.4. Models in the Same File

A PDB file can contain multiple models, each enclosed between MODEL and ENDMDL records. For example:

2mz7.pdb, a PDB file from RCSB PDB
MODEL        1
ATOM      1  N   LYS A 267      26.791  -7.054 -26.130  1.00  0.00           N
ATOM      2  CA  LYS A 267      27.796  -6.500 -27.025  1.00  0.00           C
...
ATOM    708  HD3 PRO A 312       5.473 -10.093 -12.553  1.00  0.00           H
TER     709      PRO A 312
ENDMDL
MODEL        2
ATOM      1  N   LYS A 267      21.011 -23.102 -13.354  1.00  0.00           N
ATOM      2  CA  LYS A 267      19.831 -22.754 -12.570  1.00  0.00           C
...
ATOM    706  HG3 PRO A 312       2.202  -3.813  -8.777  1.00  0.00           H
ATOM    707  HD2 PRO A 312      -0.115  -5.325  -8.685  1.00  0.00           H
ATOM    708  HD3 PRO A 312       1.253  -5.649  -9.765  1.00  0.00           H
TER     709      PRO A 312
ENDMDL

When pdbtop detects several models, it saves each model to a separate file. You can then process the model you need with pdbtop again. For example:

$ pdbtop check -i 2mz7.pdb -o 2mz7-model
Read: 2mz7.pdb
There are 20 models in "2mz7.pdb".
Each of them is saved to "2mz7-model-X.pdb".
$ ls
2mz7-model-1.pdb   2mz7-model-13.pdb  2mz7-model-17.pdb  2mz7-model-20.pdb  2mz7-model-6.pdb  2mz7.pdb
2mz7-model-10.pdb  2mz7-model-14.pdb  2mz7-model-18.pdb  2mz7-model-3.pdb   2mz7-model-7.pdb  2mz7-model-11.pdb
2mz7-model-15.pdb  2mz7-model-19.pdb  2mz7-model-4.pdb   2mz7-model-8.pdb   2mz7-model-12.pdb 2mz7-model-16.pdb
2mz7-model-2.pdb   2mz7-model-5.pdb   2mz7-model-9.pdb
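Splitting a multi-model file as described above can be sketched as follows. This is an illustrative reimplementation, not pdbtop's code. `split_models` collects the records between each MODEL and ENDMDL pair, and `write_models` writes them to files named `<prefix>-<n>.pdb`, mimicking the naming shown in the example. Both are hypothetical helpers. Records outside MODEL blocks (header, END, and so on) are simply dropped here.

```python
def split_models(lines):
    """Hypothetical helper: split PDB lines into models delimited by MODEL ... ENDMDL."""
    models, current = [], None
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = []                # start collecting a new model
        elif record == "ENDMDL":
            if current is not None:
                models.append(current)  # close the current model
            current = None
        elif current is not None:
            current.append(line)        # ATOM, TER, ... inside the current model
    return models


def write_models(models, prefix):
    """Hypothetical helper: write each model to '<prefix>-<n>.pdb', numbering from 1."""
    names = []
    for n, model in enumerate(models, start=1):
        name = f"{prefix}-{n}.pdb"
        with open(name, "w") as handle:
            handle.write("\n".join(model) + "\nEND\n")
        names.append(name)
    return names
```

For instance, `write_models(split_models(open("2mz7.pdb").read().splitlines()), "2mz7-model")` would write one file per model, named like 2mz7-model-1.pdb and 2mz7-model-2.pdb.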