GROMOS拓扑（、坐标、轨迹、能量）相关文件解读&手册第5章阅读笔记II

CocoCream

已于 2022-07-12 18:21:21 修改

阅读量8.2k

点赞数 4

分类专栏： GROMACS 文章标签： gromacs

于 2022-04-03 15:05:03 首次发布

本文链接：https://blog.csdn.net/cococream/article/details/123769268

版权

GROMACS 专栏收录该内容

19 篇文章 79 订阅

订阅专栏

本文将介绍一些借助 gromacs 模拟时生成或运用的文件，更细致的内容可以参考官方手册（https://manual.gromacs.org/current/reference-manual/file-formats.html）

molecule.itp(5.7.2)

有些时候，为了方便运用某一个常用分子的拓扑或者使 topol.top 文件结构更加简洁，可以把它单独写入一个 .itp 文件中。这个文件仅包含一个特定分子的信息，可以被任意引用，这样就避免了每次都需要使用pdb2gmx或者重复地复制粘贴。一般力场中都会提供水分子、离子（有些还提供少许脂质分子）的 .itp 文件；此外有些常用小分子（如ATP）的信息会被储存在立场文件夹的 aminoacids.rtp 中，在pdb2gmx过程中与大分子一并生成拓扑。

ff文件夹中所携带的.itp文件

接下来以 spc 水分子为例简单展示 .itp 的结构：

[ moleculetype ]
; molname	nrexcl
SOL		2

[ atoms ]
;   nr   type  resnr residue  atom   cgnr     charge       mass
#ifndef HEAVY_H
     1     OW      1    SOL     OW      1      -0.82   15.99940
     2      H      1    SOL    HW1      1       0.41    1.00800
     3      H      1    SOL    HW2      1       0.41    1.00800
#else
     1     OW      1    SOL     OW      1      -0.82    9.95140
     2      H      1    SOL    HW1      1       0.41    4.03200
     3      H      1    SOL    HW2      1       0.41    4.03200
#endif

#ifndef FLEXIBLE
[ settles ]
; OW	funct  doh	dhh
1	1	0.1	 0.16330

[ exclusions ]
1	2	3
2	1	3
3	1	2
#else
[ bonds ]
; i	j	funct	length	force.c.
1	2	1	0.1	345000	0.1     345000
1	3	1	0.1	345000	0.1     345000
	
[ angles ]
; i	j	k	funct	angle	force.c.
2	1	3	1	109.47	383	109.47	383
#endif

可以发现，.itp 主要由以下几个部分构成：

[ moleculetype ]：定义分子名称和非键排除（nrexcl，排除相邻nrexcl个键的非键相互作用）；
[ atoms ]、[ settles ]、[ exclusions ]、[ bonds ]、[ angles ]等：这一部分是分子的具体信息，对应于相应的[ moleculetype ]，具体含义与设置请见手册表5.5（中文手册（李继存老师组织翻译版）120页）。

由`pdb2gmx`产生的一类文件

在此我们以 Phi29 DNA聚合酶结合ssDNA复合物（PDB:1XHZ）的 chain A & chain E 为例，用 amber99sb-ildn.ff 生成了他们的拓扑和其他一系列文件：
在这里插入图片描述
接下来将细致的讲解他们各自的内容和功能。

topol.top(5.7.1)

拓扑文件是gromacs运行模拟所必需的文件，它提供了模拟体系中所有分子的拓扑结构、力场文件的引用、约束力参数……；拓扑文件必须包含三个层次：

参数级别；这一部分包括了力场设定（详细请见手册表5.4参数大栏（不过他那个表做的确实不怎么好看）），但一般直接使用#include "力场文件路径/forcefield.itp"进行引用（如下）。

; Include forcefield parameters
#include "amber99sb-ildn.ff/forcefield.itp"

分子定义级别：这一部分包含了一个或多个分子（可以是protein/DNA/小分子；如果是多个分子，在使用pdb2gmx时一般会额外生成每个小分子对应的 .itp 文件，然后再于 .top 中进行引用，详细请见下面的代码示例）的定义，包括了[ moleculetype ]、[ atoms ]、[ bond ]、[ pairs ]……（这些定义中的参数如gd_23具体所指的数据都是由上一部分参数级别给出的，也就是对 forcefield.itp 的内容的调用），以及对分子的约束参数（posre.itp）。实际上，.itp 文件可以看做是 .top 文件分子定义级别（针对每单个分子）单独拿出储存的信息，他们形成了一个嵌套式的引用关系（其实 .itp 文件下也可以再嵌套 .itp 文件，请见下面代码示例），这样做一来节省了空间，二来逻辑更加清晰。

topol.top文件：

; Include chain topologies
#include "topol_Protein_chain_A.itp"
#include "topol_DNA_chain_E.itp"

; Include water topology
#include "amber99sb-ildn.ff/tip3p.itp"

#ifdef POSRES_WATER
; Position restraint for each water oxygen
[ position_restraints ]
;  i funct       fcx        fcy        fcz
   1    1       1000       1000       1000
#endif

; Include topology for ions
#include "amber99sb-ildn.ff/ions.itp"

topol_DNA_chain_E.itp文件：

[ moleculetype ]
; Name            nrexcl
DNA_chain_E         3

[ atoms ]
;   nr       type  resnr residue  atom   cgnr     charge       mass  typeB    chargeB      massB
; residue   1 DT  rtp DT5  q -0.3
     1         OH      1     DT    O5'      1    -0.6318         16   ; qtot -0.6318
     2         HO      1     DT    H5T      2     0.4422      1.008   ; qtot -0.1896
     3         CT      1     DT    C5'      3    -0.0069      12.01   ; qtot -0.1965

;******

; Include Position restraint file
#ifdef POSRES
#include "posre_DNA_chain_E.itp"
#endif

对于这一部分有几点需要注意的：(1) 对于每一个[ moleculetype ]，与其对应的[ atoms ]、[ bond ]、[ pairs ]等必须置于其阅读顺序后方，否则将被判断为其他分子的参数或者毫无意义；(2) 分子中原子应从1开始连续编号，相同电荷组的原子必须连续列出。

体系级别：只包含体系的特定信息（[ system ]&[ molecules ]，信息内容不言自明）。

[ system ]
; Name
Protein

[ molecules ]
; Compound        #mols
Protein_chain_A     1
DNA_chain_E         1

posres.itp

这个文件包含了源自 .pdb 文件中对于重原子的位置限制，这也就意味着由pdb2gmx产生的质子是没有位置限制的。通常作为 .top 文件分子定义级别的一部分进行引用（实际上也可以直接书写在 .top 文件中，这对于比较小的（比如水分子）会方便一些）。

[ position_restraints ]
; atom  type      fx      fy      fz
     1     1  1000  1000  1000
     4     1  1000  1000  1000
     7     1  1000  1000  1000

molecule.gro

.gro 文件中涵盖了从 .pdb 得到的分子的位置信息。文件的第一行是体系的名称（我也不太清楚为什么被自动命名成了这个，一般都会是 Protein in water 之类的）；第二行为原子总数；接下来是每个原子的归属和坐标信息；最后一行标注了盒子的三个 box vectors。

GROtesk MACabre and Sinister
 9506
    5PRO      N    1   4.832  -0.760   7.635
    5PRO     H1    2   4.818  -0.841   7.692
    5PRO     H2    3   4.874  -0.688   7.690
    5PRO     CD    4   4.922  -0.794   7.523
    5PRO    HD1    5   4.972  -0.878   7.543
    5PRO    HD2    6   4.987  -0.720   7.507
;          ******
 9.66620  15.03970  19.83250

由`grompp`&`mdrun`产生的文件(待续

在这里插入图片描述
上图显示了这两步产生的一系列文件，其中 .tpr 是由grompp产生的；其余均由mdrun阶段产生。除了 .log 文件外，其余均由二进制格式书写，所以读取的时候需要借助 gmx 的dump指令（一般我们并不需要细致的查看（除 .log 外）这些文件的具体内容，gmx （或其他软件）中一般均有提取相应信息的指令，也会在相应部分提到（所以你可以不必使用dump阅读文件的所有内容，简单看一看我展示的部分片段就足够了））。

gmx dump [-s [<.tpr/.tpb/...>]] [-f [<.xtc/.trr/...>]] [-e [<.edr>]]
         [-cp [<.cpt>]] [-p [<.top>]] [-mtx [<.mtx>]] [-om [<.mdp>]]
         [-nice ] [-[no]nr] [-[no]sys]

具体各选项含义请自行查看 gromacs 手册。我们可以将dump的输出写到 .txt 里，例如：

gmx dump -s md_0_1.tpr | tee tpr.txt

md.log

这是输出的日志文件，根据 .mdp 文件中nstlog = 500的间隔进行写入。里面包含了 gromacs 一些基本信息、输入参数以及模拟进程，并且如果模拟中发生了致命错误，也会被写入 .log 的末尾（代码块仅摘取了部分内容，并不严格对应于上述模块）。

Log file opened on Tue Jul 12 15:54:24 2022
Host: localhost.localdomain  pid: 22244  rank ID: 0  number of ranks:  1
GROMACS:    gmx mdrun, VERSION 5.0.7

GROMACS is written by:
Emile Apol         Rossen Apostolov   Herman J.C. Berendsen Par Bjelkmar
#……
Gromacs version:    VERSION 5.0.7
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled    
#……
Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 50000000
   init-step                      = 0
#……
Initializing Domain Decomposition on 40 ranks
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.429 nm, LJ-14, atoms 6484 6492
  multi-body bonded interactions: 0.429 nm, Proper Dih., atoms 6484 6492
#……
Started mdrun on rank 0 Tue Jul 12 15:54:25 2022
           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
          Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    1.50593e+04    1.89724e+04    9.68484e+02    7.88669e+03    7.82556e+04
        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.   COM Pull En.
    1.24363e+05   -3.79778e+03   -1.25162e+06    2.40987e+03    3.22628e-14
      Potential    Kinetic En.   Total Energy    Temperature Pres. DC (bar)
   -1.00750e+06    2.03942e+05   -8.03557e+05    3.11421e+02   -7.94998e+01
 Pressure (bar)   Constr. rmsd
    5.19546e+01    2.69715e-05

DD  step 24 load imb.: force 25.4%  pme mesh/force 0.813

At step 25 the performance loss due to force load imbalance is 15.2 %

NOTE: Turning on dynamic load balancing

DD  load balancing is limited by minimum cell size in dimension Y
DD  step 999  vol min/aver 0.743! load imb.: force  1.5%  pme mesh/force 0.952
#……

md.tpr

你可以自己获得相应的 tpr.txt 然后打开观察（如果体系很大的话输出的 .txt 会比二进制的 .tpr 大很多），其中有几个部分：

模拟的初始设置：inputrec、qm-opts等，主要是 .mdp 中输入的；
体系的拓扑信息：topology，读取自 .top 文件；

在大多数情况下我们并不需要查看这个文件的具体内容，除非发生了什么致命错误或者你需要对比两个模拟体系的差异，后者可以使用check指令：

gmx check [-f [<.xtc/.trr/...>]] [-f2 [<.xtc/.trr/...>]]
          [-s1 [<.tpr/.tpb/...>]] [-s2 [<.tpr/.tpb/...>]]
          [-c [<.tpr/.tpb/...>]] [-e [<.edr>]] [-e2 [<.edr>]] [-n [<.ndx>]]
          [-m [<.tex>]] [-nice ] [-vdwfac ] [-bonlo ]
          [-bonhi ] [-[no]rmsd] [-tol ] [-abstol ]
          [-[no]ab] [-lastener ]

像这样：gmx check -s1 top1 -s2 top2

md.xtc & md.trr

这种文件储存了模拟过程中的部分轨迹信息（根据 .mdp 文件中nstxout-compressed = XXX的设置提取相应帧），一般需要一个结构文件（比如 .gro、.pdb）来实现可视化。

其中 .trr （可以）是全精度的轨迹数据，它保留了每个原子确切的坐标值。可以用check进行一个简略的查看：gmx check -f traj.trr

对于 .xtc ，手册是这么说的：“The xtc format is a portable format for trajectories. It uses the xdr routines for writing and reading data which was created for the Unix NFS system.”（xdr 方法请自行百度）。它存储的信息是经过压缩和近似的（可能甚至不是高精度的），主要是通过对坐标缩放（一般是乘以1000）然后四舍五入为整数值；利用原子在序列上接近则通常在空间上也很接近(例如水分子)的特点，对xdr库进行了扩展，使用一个特殊的方式来编写3-D浮点坐标等等。

（似乎可以直接用 C 或 FORTRAN 来处理 xdr，没咋看明白，挖个坑以后在研究

然后可以直接用 VMD 来载入 .xtc 文件查看轨迹。除了配合相应结构文件进行可视化以外，gmx 也有很多指令可以直接操作轨迹文件，比如gmx trjconv：

gmx trjconv [-f [<.xtc/.trr/...>]] [-o [<.xtc/.trr/...>]]
            [-s [<.tpr/.tpb/...>]] [-n [<.ndx>]] [-fr [<.ndx>]]
            [-sub [<.ndx>]] [-drop [<.xvg>]] [-nice ] [-b ]
            [-e ] [-tu ] [-[no]w] [-xvg ] [-skip ]
            [-dt ] [-[no]round] [-dump ] [-t0 ]
            [-timestep ] [-pbc ] [-ur ] [-[no]center]
            [-boxcenter ] [-box ] [-trans ]
            [-shift ] [-fit ] [-ndec ] [-[no]vel]
            [-[no]force] [-trunc ] [-exec ] [-split ]
            [-[no]sep] [-nzero ] [-dropunder ] [-dropover ]
            [-[no]conect]

具体各个选项含义可以在手册里查到，这里就不再赘述，一般常用功能有这么几个：

纠正分子在盒子里的跳跃现象（其中 PBC treatment: none, mol（使分子重心位于盒子内，其余含义类似）, res, atom, cluster, whole, nojump；此处有更详细讲解）：gmx trjconv -s md_0_1.tpr -f md_0_1.xtc -o md_0_1_noPBC.xtc -pbc mol
提取所有帧：gmx trjconv -s md.tpr -f md.xtc -o frame.gro -sep

以及gmx trjcat用来拼接两个轨迹文件（如果你续跑时没有选择将轨迹继续写入先前的轨迹文件的话，这个很有用）：

gmx trjcat [-f [<.xtc/.trr/...> [...]]] [-o [<.xtc/.trr/...> [...]]]
           [-n [<.ndx>]] [-demux [<.xvg>]] [-nice ] [-tu ]
           [-xvg ] [-b ] [-e ] [-dt ] [-[no]vel]
           [-[no]settime] [-[no]sort] [-[no]keeplast] [-[no]overwrite]
           [-[no]cat]

md.cpt

一个暂存文件（check point），可以在模拟结束之后（或者突然终止了）进行续跑。

gmx mdrun -s md_0_1.tpr -cpi md_0_1.cpt -append md_0_2

md.edr

储存了模拟过程中与能量相关的数据（根据 .mdp 文件中nstenergy = 500的间隔进行存储）。转化成 ASCII 之后是这样的：

energy components:
    0  Bond                     (kJ/mol)
    1  U-B                      (kJ/mol)
    2  Proper Dih.              (kJ/mol)
    3  Improper Dih.            (kJ/mol)
    4  CMAP Dih.                (kJ/mol)
    5  LJ-14                    (kJ/mol)
    #……
    #……
                   time:   1.00000e+01         step:          5000
                                             nsteps:          5000
                delta_t:   2.00000e-03    sum steps:            50
               Component        Energy    Av. Energy    Sum Energy
                    Bond   5.61662e+03   6.63002e+05   2.82822e+05
                     U-B   1.61728e+04   2.20891e+06   8.07135e+05
             Proper Dih.   1.04111e+04   4.26090e+05   5.16354e+05
           Improper Dih.   8.96692e+02   1.18386e+05   4.67360e+04
               CMAP Dih.  -2.61315e+03   1.07140e+05  -1.26451e+05
                   LJ-14   4.90394e+03   2.50376e+05   2.47294e+05
    #……

第一部分告诉你所存储信息的组成（你使用gmx energy时提示你进行选择的序号就是从这部分读取的），随后是按所抽提出来的帧的相应数值，文件中已经写得很清晰了。

gmx energy是处理 .edr 文件最主要的命令：

gmx energy [-f [<.edr>]] [-f2 [<.edr>]] [-s [<.tpr/.tpb/...>]] [-o [<.xvg>]]
           [-viol [<.xvg>]] [-pairs [<.xvg>]] [-ora [<.xvg>]] [-ort [<.xvg>]]
           [-oda [<.xvg>]] [-odr [<.xvg>]] [-odt [<.xvg>]] [-oten [<.xvg>]]
           [-corr [<.xvg>]] [-vis [<.xvg>]] [-ravg [<.xvg>]] [-odh [<.xvg>]]
           [-nice ] [-b ] [-e ] [-[no]w] [-xvg ]
           [-[no]fee] [-fetemp ] [-zero ] [-[no]sum] [-[no]dp]
           [-nbmin ] [-nbmax ] [-[no]mutot] [-skip ]
           [-[no]aver] [-nmol ] [-[no]fluct_props] [-[no]driftcorr]
           [-[no]fluc] [-[no]orinst] [-[no]ovec] [-acflen ]
           [-[no]normalize] [-P ] [-fitfn ] [-beginfit ]
           [-endfit ]