dcTMD
😇本教程为基于 Multisecond ligand dissociation dynamics from atomistic simulations 第二个模拟体系 benzamidine-trypsin 编写,意在指导读者运行蛋白质-配体的dcTMD(牵引动力学模拟)模拟体系和后续ΔG(x)的估算等。本教程所用到的软件/脚本均可在Stock Lab课题组的网站获得😇。“😇”内的内容为译者添加。
Introduction
该 python 包有助于根据耗散校正的牵引动力学模拟(dcTMD)方法分析目标分子动力学轨迹。TMD模拟通过constraint
设置的拉力强制配体与蛋白质发生解结合事件。利用dcTMD,可以估计沿着反应坐标的自由能分布和摩擦系数。有关方法论概述,请参阅原作者的文章 https://pubs.acs.org/doi/full/10.1021/acs.jctc.8b00835。
请注意:
此软件包仍处于测试阶段。如果遇到任何错误,还请予以汇报。同时如果您使用了该软件包,恳请您引用这些文章:
S. Wolf, and G. Stock, Targeted molecular dynamics calculations of free energy profiles using a nonequilibrium friction correction., Journal of chemical theory and computation (2018)
V. Tänzel, and M. Jäger, and S. Wolf, Dissipation Corrected Targeted Molecular Dynamics, in preparation (2022)
您可通过下述指令安装:
python3 -m pip install git+ssh://git@github.com/moldyn/dcTMD.git
#OR
pip install dcTMD
Step 1. Create trajectories with Gromacs
下载此链接的教程包 😇GitHub上的,可能需要魔法😇,解压:
tar -xzvf ./tutorial_files.tar
一般来说。对于您自己项目中的输入结构,我们建议您进行至少10ns长度的初始NPT平衡模拟 😇10ns雀食很保险😇。在这里,我们已经为你做了这件事,并产生了一个平衡结构。现在你将需要一堆具有不同初始速度分布的平衡轨迹(最佳在100-200个之间,但这里我们仅演示10个)。为此,使用以下脚本生成和运行具有不同初始速度分布的 .tpr 文件:
mkdir equib
cd equib/
for i in {000..009}
do
gmx grompp -f ../3ptb_AMBER99SB_ben_pushEQUIBRUN.mdp -c ../3ptb_AMBER99SB_ben.gro -r ../3ptb_AMBER99SB_ben.gro -p ../3ptb_AMBER99SB_ben.top -n ../3ptb_AMBER99SB_ben.ndx -o 3ptb_AMBER99SB_ben_pushEQUIBRUN_"$i".tpr -maxwarn 1
gmx mdrun -v -deffnm 3ptb_AMBER99SB_ben_pushEQUIBRUN_"$i"
done
由于只有 0.1 ns 长度的模拟,因此它们应该在相当短的时间内完成。然后通过以下方式为生成牵引模拟:
cd ..
mkdir v0.001
cd v0.001/
for i in {000..009}
do
gmx grompp -f ../3ptb_AMBER99SB_ben_pushRUN_v0.001.mdp -c ../equib/3ptb_AMBER99SB_ben_pushEQUIBRUN_"$i".gro -p ../3ptb_AMBER99SB_ben.top -n ../3ptb_AMBER99SB_ben.ndx -o 3ptb_AMBER99SB_ben_pushRUN_0.001_"$i".tpr
gmx mdrun -v -deffnm 3ptb_AMBER99SB_ben_pushRUN_0.001_"$i"
done
请注意,符号_0.001
代表以 0.001 nm/ps,即 1 m/s 进行牵引。根据我们目前的经验,这是一个最佳速度,在缓慢拉动和计算工作量最小之间进行了最佳权衡。最终3ptb_AMBER99SB_ben_pushRUN_0.001_*_pullf.xvg
内容类似下图:
Step 2.1. dcTMD via Work
从约束拉动模拟中分析非平衡力-时间轨迹的一种方法是首先计算功,然后估计自由能。该方法在WorkEstimator
类中实现,并且在计算上比计算力自相关函数(在ForceEstimator
类别中实现)更高效,因为该类别通过力-时间轨迹的积分来处理工作数据,使得在获得相似结果的同时显著降低分辨率。因此,我们建议将这种方法用于大型数据集 😇以下内容请自行在Python环境中编写(例如使用 pycharm 或 jupyter notbook),原教程使用 jupyter notbook 进行演示,此处限于翻译环境不再展示每个 cell 的输出。需要注意的是,这个程序包也有指令界面,如果你更喜欢 bash 指令这样的操作方式,可以简单看看这一部分,然后去看Step 2.3. Command Line Interface 指令界面😇
0. Load packages and define variables
# load packages
import numpy as np
import dcTMD
from dcTMD.storing import WorkSet
from dcTMD.dcTMD import WorkEstimator
from dcTMD.utils import plotting
# define variables
velocity = 0.001
res = 1
verbose = True
temperature = 300
1. Create a work set
要计算自由能和摩擦估计,需要一个用于储存功(work)数据集(set)。它包含综合力-时间轨迹、生成包含文件名的数组。这可以通过函数dcTMD.io.load_pullf()
来完成,该函数以 glob 模式或文件名内包含 pullf
的文件作为参数。
# load the files
pullf_files = '../../tests/testdata/pullf_filenames.dat' # or
pullf_files = '../../tests/testdata/*pullf.xvg'
filenames = dcTMD.io.load_pullf(pullf_files)
filenames
随后功集是通过 WorkSet
实例来创建的。分辨率参数控制数据集的步长。在对力-时间轨迹进行积分之后执行缩减。对于长轨迹,例如3500000帧,建议将 resolution 分辨率值设置为 >1000,以确保不要超过您的硬件。
# create WorkSet instance
workset = WorkSet(velocity=velocity,
resolution=res,
verbose=False,
)
workset
# fit/fill workset
workset.fit(filenames)
# save workset
#dcTMD.storing.save('my_workset', workset)
2. Check normality of work distribution
dcTMD 需要满足的主要条件之一是功的正态分布。这可以通过不同的方法进行检查,例如绘制功-时间的轨迹、不同x位置的正态性检查、Kolmogorov-Smirnov 检验、Shapiro-Wilk 检验。注意:如果功的分布不正常,则会影响结果。路径分离是必要的。有关路径分离的理论,请参阅…😇原作者也没给出,您已进入未知领域🦹😇
# plot workset
import matplotlib.pyplot as plt
from dcTMD.utils import plotting
fig, ax = plt.subplots()
plotting.plot_worklines(workset.position_, workset.work_, ax)
plt.show()
# check if work distribution follows a normal distribution
from scipy.stats import kstest, shapiro
from dcTMD.utils import plotting
index = [5000, 15000] #这个参数对应我们的模拟时间或反应距离(二者是正比关系嘛),这里即为 0.5nm-1.5nm
x = workset.position_
fig, axs = plotting.plot_worknormalitychecks(x, workset.work_, index)
for i, p in enumerate(index):
# Shapiro-Wilk Test
shapiro_test = shapiro(workset.work_[:,p])
print(f'shapiro wilkins results at x={x[p]} is {shapiro_test}')
# Kolmogorov-Smirnov Test
kstest_test = kstest(workset.work_[:,p], 'norm')
print(f'Kolmogorov-Smirnov results at x={x[p]} is {kstest_test}')
结果类似下图:
3. Derive estimates from workset and get result
此处需要先创建WorkEstimator
实例,然后使用WorkEstimator
实例拟合以前创建的功集。
# create WorkEstimator instance
workestimator = WorkEstimator(temperature)
# fit existing workset
# or load an existing workset
# workset = dcTMD.storing.load(my_workset)
workestimator.fit(workset)
vars(workestimator) #这句在实际应用中可以不写
在软件包中,有几个简单的绘图函数来帮助你获得结果的大致图像。例如plot_dcTMD_results()
。而我们一般还要需要对摩擦估计进行平滑处理,这可以通过dcTMD.utils.smooting.gaussfilter_friction()
或dcTMD.WorkEstimator.smooth_friction(sigma, mode)
,sigma 是以 nm 为单位的高斯分步的标准偏差😇或者说平滑窗口大小😇,mode 参数决定了如何将输入阵列扩展到其边界之外。注意:使用大数据集和大平滑窗口可能导致计算时间过长。
# simply plot dcTMD results
fig, axs = plotting.plot_dcTMD_results(workestimator.position_,
workestimator,
workestimator.friction_)
plt.show()
# smooth friction and plot results
workestimator.smooth_friction(sigma=0.1, mode='nearest')
fig, axs = plotting.plot_dcTMD_results(workestimator.position_,
workestimator,
workestimator.friction_smooth_)
plt.show()
使用不同的平滑窗口大小和模式会显著更改结果。由于模式中的边界处理方式不同,mode=‘earest’
会导致对右边界的高估(模拟结束),而mode=‘reflect’
会导致左边界偏低。以下是一些示例:
# smooth friction and plot results
workestimator.smooth_friction(sigma=0.1, mode='nearest')
fig, axs = plt.subplots()
plotting.plot_Gamma(workestimator.position_,
workestimator.friction_smooth_,
axs,
label='nearest 0.1nm'
)
# using different smoothing windows and modes changes the results significantly
smooth_friction = dcTMD.utils.gaussfilter_friction(workestimator.friction_,
workestimator.position_,
sigma=.01,
mode='reflect',
)
axs.plot(workestimator.position_, smooth_friction, label='reflect .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(workestimator.friction_,
workestimator.position_,
sigma=.01,
mode='nearest',
)
axs.plot(workestimator.position_, smooth_friction, label='nearest .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(workestimator.friction_,
workestimator.position_,
sigma=.1,
mode='reflect',
)
axs.plot(workestimator.position_, smooth_friction, label='reflect .1nm')
axs.legend()
plt.show()
😇下图左侧为教程数据,中间为译者重新计算的数据,右侧为原作者已发表的文章中的数据,不知为何教程数据差别挺大的😇
4. Error estimation
结果的误差估计是通过自举法实现的,下面提供两种不同的模式:
n_resamples = 1000
# bootstrapping error in mode std
mode = 'std'
workestimator.estimate_free_energy_errors(n_resamples, mode)
x = workestimator.position_
dG = workestimator.dG_
s_dG = workestimator.s_dG_
fig, ax = plt.subplots()
plotting.plot_dG(x, dG, ax)
ax.fill_between(x, dG-s_dG, dG+s_dG, alpha=0.5)
plt.show()
# bootstrapping error in with confidence interval
# this gives a lower and upperbound estimate
confidence_interval = 0.9
mode = confidence_interval
workestimator.estimate_free_energy_errors(n_resamples, mode)
x = workestimator.position_
dG = workestimator.dG_
s_dG = workestimator.s_dG_
fig, ax = plt.subplots()
plotting.plot_dG(x, dG, ax)
plt.fill_between(x, s_dG[0], s_dG[1], alpha=0.5)
plt.show()
结果如下图:
5. Save and load results
两种保存方式:
# save forceestimator instance
dcTMD.storing.save('my_workestimator', workestimator)
# save data as .npz and .dat file
outname = 'my_workestimator_results'
dcTMD.io.write_output(outname, workestimator)
results = np.load(f'{outname}_N{len(workestimator.names_)}_dG.npz')
results.files
Step 2.2. dcTMD via Force
😇因为两种方法较为相似,所以译者对这一部分进行了整合及概括。首先需要加载包和设定参数。随后类似于使用功作为输入,使用力作为输入时,为了计算自由能和摩擦估计,需要一个包含所有力-时间轨迹的力集。这可以通过函数dcTMD.io.load_pullf()
来完成,该函数以 glob 模式或文件名内包含pullf
的文件作为参数。随后ForceSet
实例来创建力集;利用forceset.fit
将此力集拟合为相应的功;再使用ForceEstimator
完成自由能等的估算😇
# load packages
import numpy as np
from dcTMD.dcTMD import ForceEstimator
from dcTMD.storing import ForceSet, load
import dcTMD
import matplotlib.pyplot as plt
from dcTMD.utils import plotting
# define variables
velocity = 0.001
res = 1
verbose = True
temperature = 300
# load the files
pullf_files = '../../tests/testdata/pullf_filenames.dat'
# OR
pullf_files = '../../tests/testdata/*pullf.xvg'
filenames = dcTMD.io.load_pullf(pullf_files)
# create ForceSet instance
forceset = ForceSet(velocity=velocity,
resolution=res,
verbose=verbose,
)
# fit/fill workset
forceset.fit(filenames)
# save workset
#dcTMD.storing.save('my_forceset', forceset)
# plot workset
fig, ax = plt.subplots()
plotting.plot_worklines(forceset.position_, forceset.work_, ax)
plt.show()
# check if work distribution follows a normal distribution
from scipy.stats import kstest, shapiro
from dcTMD.utils import plotting
index = [5000, 15000] #这个参数对应我们的模拟时间或反应距离(二者是正比关系嘛),这里即为 0.5nm-1.5nm
x = forceset.position_
fig, axs = plotting.plot_worknormalitychecks(x, forceset.work_, index)
for i, p in enumerate(index):
# Shapiro-Wilk Test
shapiro_test = shapiro(forceset.work_[:,p])
print(f'shapiro wilkins results at x={x[p]} is {shapiro_test}')
# Kolmogorov-Smirnov Test
kstest_test = kstest(forceset.work_[:,p], 'norm')
print(f'Kolmogorov-Smirnov results at x={x[p]} is {kstest_test}')
# Instantiate a ForceEstimator instance and fit it with the ForceSet instance
forceestimator = ForceEstimator(temperature)
forceestimator.fit(forceset)
vars(forceestimator)
# plot dcTMD results
fig, axs = plotting.plot_dcTMD_results(forceestimator.position_,
forceestimator,
forceestimator.friction_)
plt.show()
# smooth friction and plot results
forceestimator.smooth_friction(sigma=0.1, mode='nearest')
fig, axs = plotting.plot_dcTMD_results(forceestimator.position_,
forceestimator,
forceestimator.friction_smooth_)
plt.show()
# different smoothing setting
fig, axs = plt.subplots()
plotting.plot_Gamma(forceestimator.position_,
forceestimator.friction_smooth_,
axs,
label='nearest 0.1nm'
)
# using different smoothing windows and modes changes the results significantly
smooth_friction = dcTMD.utils.gaussfilter_friction(forceestimator.friction_,
forceestimator.position_,
sigma=.01,
mode='reflect',
)
axs.plot(forceestimator.position_, smooth_friction, label='reflect .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(forceestimator.friction_,
forceestimator.position_,
sigma=.01,
mode='nearest',
)
axs.plot(forceestimator.position_, smooth_friction, label='nearest .01nm')
smooth_friction = dcTMD.utils.gaussfilter_friction(forceestimator.friction_,
forceestimator.position_,
sigma=.1,
mode='reflect',
)
axs.plot(forceestimator.position_, smooth_friction, label='reflect .1nm')
axs.legend()
plt.show()
# save forceestimator instance
dcTMD.storing.save('my_forceestimator', forceestimator)
# OR save data as .npz and .dat file
outname = 'my_forceestimator_results'
dcTMD.io.write_output(outname, forceestimator)
results = np.load(f'{outname}_N{len(forceestimator.names_)}_dG.npz')
results.files
Step 2.3. Command Line Interface 指令界面
我们的软件包具有一个名为 “dcTMD” 的Python命令行界面(CLI)脚本。脚本加载力文件,生成功/力集,并计算平均功、耗散功、自由能和摩擦系数。输出保存为 .npz
和 .dat
文件,可以使用--save_dataset
选项分别保存功/力集。此外,使用--plot
选项可以生成结果的概览图。CLI还提供了用于调试和误差分析的选项。要获得所有模块的概述,只需在shell中运行以下命令。
python -m dcTMD
内容如下:
Usage: python -m dcTMD [OPTIONS]
------------------------- | dcTMD | -------------------------
Calculate free energy and friction for given constraint force files.
Analysis tools for dissipation-corrected targeted molecular dynamics, which
is an enhanced sampling method to enforce rare events in biomolecular
systems. When publishing results gained with this python package, please
cite the following publications: (1) Tänzel, Victor and Jäger, Miriam and
Wolf, Steffen in preparation. (2) Wolf, Steffen, and Gerhard Stock.
"Targeted molecular dynamics calculations of free energy profiles using a
nonequilibrium friction correction." Journal of chemical theory and
computation 14.12 (2018): 6175- 6182.
Options:
-m, --mode [work|force] Use either work or force autocovariance function to
calculatedcTMD quantities. [default: work;
required]
-f, --file TEXT Input: File containing list of all constraint force
file namesor glob pattern e.g."*.xvg" to generate a
list of all constraint force files using
glob.glob() [required]
-o, --outname PATH Output: Path/prefix of output names.
-T, --temperature FLOAT Simulation temperature in K. [required]
-vel, --velocity FLOAT Pulling velocity in nm/ps. [required]
--res INTEGER Striding to reduce size of returned free energy and
friction [default: 1]
-s, --sigma FLOAT Standard deviation of gaussian filter in nm.
--resamples INTEGER Number of resamples used in optional bootstrapping.
This is only available in mode work
-v, --verbose Enable verbose mode. [default: False]
-p, --plot Plots free energy and smoothed friction. [default:
False]
-sd, --save_dataset Save the Work/ForceSet class to file. [default:
False]
--help Show this message and exit.
例如:
python -m dcTMD -f '*.xvg' -o path/to/output/name -T 290.5 -v 0.001 --verbose --plot --save_dataset
😇意味着将当前文件夹下所有以 .xvg 为后缀的文件导入为约束力文件,输出到path/to/output/name
文件夹内,温度设置为 290.5,牵引速度为 0.001,verbose 模式进行平滑和误差分析等(像2.1. 3/4 一样,把所有设置都输出一遍),绘制概略图,保存功/力集,其余选项采用默认参数😇