GROMACS Tutorial - dcTMD 估算非平衡状态的下自由能

CocoCream

已于 2023-11-09 14:58:52 修改

阅读量161

点赞数

分类专栏： GROMACS 文章标签： Gromacs python

于 2023-10-29 23:11:29 首次发布

GROMACS 专栏收录该内容

19 篇文章 62 订阅

订阅专栏

dcTMD

Introduction
Step 1. Create trajectories with Gromacs
Step 2.1. dcTMD via Work
Step 2.2. dcTMD via Force
Step 2.3. Command Line Interface 指令界面

😇本教程为基于 Multisecond ligand dissociation dynamics from atomistic simulations 第二个模拟体系 benzamidine-trypsin 编写，意在指导读者运行蛋白质-配体的dcTMD（牵引动力学模拟）模拟体系和后续ΔG(x)的估算等。本教程所用到的软件/脚本均可在Stock Lab课题组的网站获得😇。“😇”内的内容为译者添加。

Introduction

该 python 包有助于根据耗散校正的牵引动力学模拟（dcTMD）方法分析目标分子动力学轨迹。TMD模拟通过constraint设置的拉力强制配体与蛋白质发生解结合事件。利用dcTMD，可以估计沿着反应坐标的自由能分布和摩擦系数。有关方法论概述，请参阅原作者的文章 https://pubs.acs.org/doi/full/10.1021/acs.jctc.8b00835。

请注意：

此软件包仍处于测试阶段。如果遇到任何错误，还请予以汇报。同时如果您使用了该软件包，恳请您引用这些文章：

S. Wolf, and G. Stock, Targeted molecular dynamics calculations of free energy profiles using a nonequilibrium friction correction., Journal of chemical theory and computation (2018)

V. Tänzel, and M. Jäger, and S. Wolf, Dissipation Corrected Targeted Molecular Dynamics, in preparation (2022)

您可通过下述指令安装：

python3 -m pip install git+ssh://git@github.com/moldyn/dcTMD.git
#OR
pip install dcTMD

Step 1. Create trajectories with Gromacs

下载此链接的教程包 😇GitHub上的，可能需要魔法😇，解压：

tar -xzvf ./tutorial_files.tar

一般来说。对于您自己项目中的输入结构，我们建议您进行至少10ns长度的初始NPT平衡模拟 😇10ns雀食很保险😇。在这里，我们已经为你做了这件事，并产生了一个平衡结构。现在你将需要一堆具有不同初始速度分布的平衡轨迹（最佳在100-200个之间，但这里我们仅演示10个）。为此，使用以下脚本生成和运行具有不同初始速度分布的 .tpr 文件：

mkdir equib
cd equib/

for i in {000..009}
do
gmx grompp -f ../3ptb_AMBER99SB_ben_pushEQUIBRUN.mdp -c ../3ptb_AMBER99SB_ben.gro -r ../3ptb_AMBER99SB_ben.gro -p ../3ptb_AMBER99SB_ben.top -n ../3ptb_AMBER99SB_ben.ndx -o 3ptb_AMBER99SB_ben_pushEQUIBRUN_"$i".tpr -maxwarn 1 
gmx mdrun -v -deffnm 3ptb_AMBER99SB_ben_pushEQUIBRUN_"$i"
done

由于只有 0.1 ns 长度的模拟，因此它们应该在相当短的时间内完成。然后通过以下方式为生成牵引模拟：

cd ..
mkdir v0.001
cd v0.001/

for i in {000..009}
do
gmx grompp -f ../3ptb_AMBER99SB_ben_pushRUN_v0.001.mdp -c ../equib/3ptb_AMBER99SB_ben_pushEQUIBRUN_"$i".gro -p ../3ptb_AMBER99SB_ben.top -n ../3ptb_AMBER99SB_ben.ndx -o 3ptb_AMBER99SB_ben_pushRUN_0.001_"$i".tpr
gmx mdrun -v -deffnm 3ptb_AMBER99SB_ben_pushRUN_0.001_"$i"
done

请注意，符号_0.001代表以 0.001 nm/ps，即 1 m/s 进行牵引。根据我们目前的经验，这是一个最佳速度，在缓慢拉动和计算工作量最小之间进行了最佳权衡。最终3ptb_AMBER99SB_ben_pushRUN_0.001_*_pullf.xvg内容类似下图：
在这里插入图片描述

Step 2.1. dcTMD via Work

从约束拉动模拟中分析非平衡力-时间轨迹的一种方法是首先计算功，然后估计自由能。该方法在WorkEstimator类中实现，并且在计算上比计算力自相关函数（在ForceEstimator类别中实现）更高效，因为该类别通过力-时间轨迹的积分来处理工作数据，使得在获得相似结果的同时显著降低分辨率。因此，我们建议将这种方法用于大型数据集 😇以下内容请自行在Python环境中编写（例如使用 pycharm 或 jupyter notbook），原教程使用 jupyter notbook 进行演示，此处限于翻译环境不再展示每个 cell 的输出。需要注意的是，这个程序包也有指令界面，如果你更喜欢 bash 指令这样的操作方式，可以简单看看这一部分，然后去看Step 2.3. Command Line Interface 指令界面😇

0. Load packages and define variables

# load packages
import numpy as np
import dcTMD
from dcTMD.storing import WorkSet
from dcTMD.dcTMD import WorkEstimator
from dcTMD.utils import plotting

# define variables
velocity = 0.001
res = 1
verbose = True
temperature = 300

1. Create a work set

要计算自由能和摩擦估计，需要一个用于储存功（work）数据集（set）。它包含综合力-时间轨迹、生成包含文件名的数组。这可以通过函数dcTMD.io.load_pullf()来完成，该函数以 glob 模式或文件名内包含 pullf 的文件作为参数。

# load the files
pullf_files = '../../tests/testdata/pullf_filenames.dat' # or
pullf_files = '../../tests/testdata/*pullf.xvg'
filenames = dcTMD.io.load_pullf(pullf_files)

filenames

随后功集是通过 WorkSet 实例来创建的。分辨率参数控制数据集的步长。在对力-时间轨迹进行积分之后执行缩减。对于长轨迹，例如3500000帧，建议将 resolution 分辨率值设置为 >1000，以确保不要超过您的硬件。

# create WorkSet instance
workset = WorkSet(velocity=velocity,
                  resolution=res,
                  verbose=False,
                  )
workset

# fit/fill workset
workset.fit(filenames)
# save workset
#dcTMD.storing.save('my_workset', workset)

2. Check normality of work distribution

dcTMD 需要满足的主要条件之一是功的正态分布。这可以通过不同的方法进行检查，例如绘制功-时间的轨迹、不同x位置的正态性检查、Kolmogorov-Smirnov 检验、Shapiro-Wilk 检验。注意：如果功的分布不正常，则会影响结果。路径分离是必要的。有关路径分离的理论，请参阅…😇原作者也没给出，您已进入未知领域🦹😇

# plot workset
import matplotlib.pyplot as plt
from dcTMD.utils import plotting

fig, ax = plt.subplots()
plotting.plot_worklines(workset.position_, workset.work_, ax)
plt.show()

# check if work distribution follows a normal distribution
from scipy.stats import kstest, shapiro
from dcTMD.utils import plotting

index = [5000, 15000]  #这个参数对应我们的模拟时间或反应距离（二者是正比关系嘛），这里即为 0.5nm-1.5nm
x = workset.position_

fig, axs = plotting.plot_worknormalitychecks(x, workset.work_, index)

for i, p in enumerate(index):
    # Shapiro-Wilk Test
    shapiro_test = shapiro(workset.work_[:,p])
    print(f'shapiro wilkins results at x={x[p]} is {shapiro_test}')
    # Kolmogorov-Smirnov Test
    kstest_test = kstest(workset.work_[:,p], 'norm')
    print(f'Kolmogorov-Smirnov results at x={x[p]} is {kstest_test}')

结果类似下图：
在这里插入图片描述

3. Derive estimates from workset and get result

此处需要先创建WorkEstimator实例，然后使用WorkEstimator实例拟合以前创建的功集。

# create WorkEstimator instance
workestimator = WorkEstimator(temperature)
# fit existing workset
# or load an existing workset
# workset = dcTMD.storing.load(my_workset)
workestimator.fit(workset)
vars(workestimator) #这句在实际应用中可以不写

在软件包中，有几个简单的绘图函数来帮助你获得结果的大致图像。例如plot_dcTMD_results()。而我们一般还要需要对摩擦估计进行平滑处理，这可以通过dcTMD.utils.smooting.gaussfilter_friction()或dcTMD.WorkEstimator.smooth_friction(sigma, mode)，sigma 是以 nm 为单位的高斯分步的标准偏差😇或者说平滑窗口大小😇，mode 参数决定了如何将输入阵列扩展到其边界之外。注意：使用大数据集和大平滑窗口可能导致计算时间过长。

# simply plot dcTMD results
fig, axs = plotting.plot_dcTMD_results(workestimator.position_,
                                      workestimator,
                                      workestimator.friction_)

plt.show()

# smooth friction and plot results
workestimator.smooth_friction(sigma=0.1, mode='nearest')

fig, axs = plotting.plot_dcTMD_results(workestimator.position_,
                                      workestimator,
                                      workestimator.friction_smooth_)
plt.show()

在这里插入图片描述

使用不同的平滑窗口大小和模式会显著更改结果。由于模式中的边界处理方式不同，mode=‘earest’会导致对右边界的高估（模拟结束），而mode=‘reflect’会导致左边界偏低。以下是一些示例：

# smooth friction and plot results

workestimator.smooth_friction(sigma=0.1, mode='nearest')

fig, axs = plt.subplots()

plotting.plot_Gamma(workestimator.position_,
                    workestimator.friction_smooth_,
                    axs,
                    label='nearest 0.1nm'
                    )

# using different smoothing windows and modes changes the results significantly
smooth_friction = dcTMD.utils.gaussfilter_friction(workestimator.friction_,
                                                   workestimator.position_,
                                                   sigma=.01,
                                                   mode='reflect',
                                                   )
axs.plot(workestimator.position_, smooth_friction, label='reflect .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(workestimator.friction_,
                                                   workestimator.position_,
                                                   sigma=.01, 
                                                   mode='nearest',
                                                   )
axs.plot(workestimator.position_, smooth_friction, label='nearest .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(workestimator.friction_,
                                                   workestimator.position_,
                                                   sigma=.1,
                                                   mode='reflect',
                                                   )
                                                   
axs.plot(workestimator.position_, smooth_friction, label='reflect .1nm')
axs.legend()

plt.show()

😇下图左侧为教程数据，中间为译者重新计算的数据，右侧为原作者已发表的文章中的数据，不知为何教程数据差别挺大的😇

在这里插入图片描述

4. Error estimation

结果的误差估计是通过自举法实现的，下面提供两种不同的模式：

n_resamples = 1000

# bootstrapping error in mode std
mode = 'std'
workestimator.estimate_free_energy_errors(n_resamples, mode)
x = workestimator.position_
dG = workestimator.dG_
s_dG = workestimator.s_dG_

fig, ax = plt.subplots()
plotting.plot_dG(x, dG, ax)
ax.fill_between(x, dG-s_dG, dG+s_dG, alpha=0.5)
plt.show()

# bootstrapping error in with confidence interval 
# this gives a lower and upperbound estimate
confidence_interval = 0.9
mode = confidence_interval
workestimator.estimate_free_energy_errors(n_resamples, mode)

x = workestimator.position_
dG = workestimator.dG_
s_dG = workestimator.s_dG_

fig, ax = plt.subplots()
plotting.plot_dG(x, dG, ax)
plt.fill_between(x, s_dG[0], s_dG[1], alpha=0.5)
plt.show()

结果如下图：

在这里插入图片描述

5. Save and load results

两种保存方式：

# save forceestimator instance
dcTMD.storing.save('my_workestimator', workestimator)

# save data as .npz and .dat file
outname = 'my_workestimator_results'
dcTMD.io.write_output(outname, workestimator)

results = np.load(f'{outname}_N{len(workestimator.names_)}_dG.npz')

results.files

Step 2.2. dcTMD via Force

😇因为两种方法较为相似，所以译者对这一部分进行了整合及概括。首先需要加载包和设定参数。随后类似于使用功作为输入，使用力作为输入时，为了计算自由能和摩擦估计，需要一个包含所有力-时间轨迹的力集。这可以通过函数dcTMD.io.load_pullf()来完成，该函数以 glob 模式或文件名内包含pullf的文件作为参数。随后ForceSet实例来创建力集；利用forceset.fit将此力集拟合为相应的功；再使用ForceEstimator完成自由能等的估算😇

# load packages
import numpy as np
from dcTMD.dcTMD import ForceEstimator
from dcTMD.storing import ForceSet, load
import dcTMD
import matplotlib.pyplot as plt
from dcTMD.utils import plotting

# define variables
velocity = 0.001
res = 1
verbose = True
temperature = 300

# load the files
pullf_files = '../../tests/testdata/pullf_filenames.dat'
# OR
pullf_files = '../../tests/testdata/*pullf.xvg'
filenames = dcTMD.io.load_pullf(pullf_files)

# create ForceSet instance
forceset = ForceSet(velocity=velocity,
                  resolution=res,
                  verbose=verbose,
                  )
# fit/fill workset
forceset.fit(filenames)
# save workset
#dcTMD.storing.save('my_forceset', forceset)

# plot workset
fig, ax = plt.subplots()
plotting.plot_worklines(forceset.position_, forceset.work_, ax)
plt.show()

# check if work distribution follows a normal distribution
from scipy.stats import kstest, shapiro
from dcTMD.utils import plotting
index = [5000, 15000]     #这个参数对应我们的模拟时间或反应距离（二者是正比关系嘛），这里即为 0.5nm-1.5nm
x = forceset.position_
fig, axs = plotting.plot_worknormalitychecks(x, forceset.work_, index)
for i, p in enumerate(index):
    # Shapiro-Wilk Test
    shapiro_test = shapiro(forceset.work_[:,p])
    print(f'shapiro wilkins results at x={x[p]} is {shapiro_test}')
    # Kolmogorov-Smirnov Test
    kstest_test = kstest(forceset.work_[:,p], 'norm')
    print(f'Kolmogorov-Smirnov results at x={x[p]} is {kstest_test}')

# Instantiate a ForceEstimator instance and fit it with the ForceSet instance
forceestimator = ForceEstimator(temperature)
forceestimator.fit(forceset)
vars(forceestimator)

# plot dcTMD results
fig, axs = plotting.plot_dcTMD_results(forceestimator.position_,
                                      forceestimator,
                                      forceestimator.friction_)
plt.show()
# smooth friction and plot results
forceestimator.smooth_friction(sigma=0.1, mode='nearest')
fig, axs = plotting.plot_dcTMD_results(forceestimator.position_,
                                      forceestimator,
                                      forceestimator.friction_smooth_)
plt.show()

# different smoothing setting
fig, axs = plt.subplots()
plotting.plot_Gamma(forceestimator.position_,
                    forceestimator.friction_smooth_,
                    axs,
                    label='nearest 0.1nm'
                    )
# using different smoothing windows and modes changes the results significantly
smooth_friction = dcTMD.utils.gaussfilter_friction(forceestimator.friction_,
                                                   forceestimator.position_,
                                                   sigma=.01,
                                                   mode='reflect',
                                                   )
axs.plot(forceestimator.position_, smooth_friction, label='reflect .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(forceestimator.friction_,
                                                   forceestimator.position_,
                                                   sigma=.01, 
                                                   mode='nearest',
                                                   )
axs.plot(forceestimator.position_, smooth_friction, label='nearest .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(forceestimator.friction_,
                                                   forceestimator.position_,
                                                   sigma=.1,
                                                   mode='reflect',
                                                   )                                            
axs.plot(forceestimator.position_, smooth_friction, label='reflect .1nm')
axs.legend()
plt.show()

# save forceestimator instance
dcTMD.storing.save('my_forceestimator', forceestimator)
# OR save data as .npz and .dat file
outname = 'my_forceestimator_results'
dcTMD.io.write_output(outname, forceestimator)

results = np.load(f'{outname}_N{len(forceestimator.names_)}_dG.npz')
results.files

Step 2.3. Command Line Interface 指令界面

我们的软件包具有一个名为 “dcTMD” 的Python命令行界面（CLI）脚本。脚本加载力文件，生成功/力集，并计算平均功、耗散功、自由能和摩擦系数。输出保存为 .npz 和 .dat 文件，可以使用--save_dataset选项分别保存功/力集。此外，使用--plot选项可以生成结果的概览图。CLI还提供了用于调试和误差分析的选项。要获得所有模块的概述，只需在shell中运行以下命令。

python -m dcTMD

内容如下：

Usage: python -m dcTMD [OPTIONS]

  ------------------------- |         dcTMD         |  ------------------------- 
  Calculate free energy and friction for given constraint force files.

  Analysis tools for dissipation-corrected targeted molecular dynamics, which
  is an enhanced sampling method to enforce rare events in biomolecular
  systems. When publishing results gained with this python package, please
  cite the following publications: (1) Tänzel, Victor and Jäger, Miriam and
  Wolf, Steffen in preparation. (2) Wolf, Steffen, and Gerhard Stock.
  "Targeted molecular dynamics calculations of free energy profiles using a
  nonequilibrium friction correction." Journal of chemical theory and
  computation 14.12 (2018): 6175- 6182.

Options:
  -m, --mode [work|force]  Use either work or force autocovariance function to
                           calculatedcTMD quantities.  [default: work;
                           required]
  -f, --file TEXT          Input: File containing list of all constraint force
                           file namesor glob pattern e.g."*.xvg" to generate a
                           list of all constraint force files using
                           glob.glob()  [required]
  -o, --outname PATH       Output: Path/prefix of output names.
  -T, --temperature FLOAT  Simulation temperature in K.  [required]
  -vel, --velocity FLOAT   Pulling velocity in nm/ps.  [required]
  --res INTEGER            Striding to reduce size of returned free energy and
                           friction  [default: 1]
  -s, --sigma FLOAT        Standard deviation of gaussian filter in nm.
  --resamples INTEGER      Number of resamples used in optional bootstrapping.
                           This is only available in mode work
  -v, --verbose            Enable verbose mode.  [default: False]
  -p, --plot               Plots free energy and smoothed friction.  [default:
                           False]
  -sd, --save_dataset      Save the Work/ForceSet class to file.  [default:
                           False]
  --help                   Show this message and exit.

例如：

python -m dcTMD -f '*.xvg' -o path/to/output/name  -T 290.5 -v 0.001 --verbose --plot --save_dataset

😇意味着将当前文件夹下所有以 .xvg 为后缀的文件导入为约束力文件，输出到path/to/output/name文件夹内，温度设置为 290.5，牵引速度为 0.001，verbose 模式进行平滑和误差分析等（像2.1. 3/4 一样，把所有设置都输出一遍），绘制概略图，保存功/力集，其余选项采用默认参数😇

CocoCream

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
GROMACS Tutorial - dcTMD 估算非平衡状态的下自由能

本教程基于 [Multisecond ligand dissociation dynamics from atomistic simulations](https://www.nature.com/articles/s41467-020-16655-1) ^1^ 第二个模拟体系 benzamidine-trypsin 编写，意在指导读者运行蛋白质-配体的模拟体系，以及dcTMD（牵引动力学模拟）和后续ΔG(x)等的计算。本教程所用到的软件/脚本均可在[Stock Lab课题组的网站]找到
复制链接

扫一扫