System: CentOS 7.5. RFAA is installed locally with conda, reusing the UniRef30 and BFD databases downloaded for AlphaFold 2.3.2 (AF2).
The work was published in Science: https://www.science.org/doi/10.1126/science.adl2528
GitHub:https://github.com/baker-laboratory/RoseTTAFold-All-Atom
Installation
- Clone the package

```shell
git clone https://github.com/baker-laboratory/RoseTTAFold-All-Atom
cd RoseTTAFold-All-Atom
```
- Create conda environment

```shell
conda env create -f environment.yaml -y
conda activate RFAA
# dllogger may fail to install; if so, delete its line from environment.yaml
# and install it separately:
pip install git+https://github.com/NVIDIA/dllogger.git
# If this fails with "Failed connect to github.com:443; Connection timed out",
# unset the git proxy settings:
# git config --global --unset https.proxy
# git config --global --unset http.proxy
# rf2aa/SE3Transformer/requirements.txt is already covered by environment.yaml
cd rf2aa/SE3Transformer/
python3 setup.py install
cd ../../
```
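A quick sanity check after creating the environment, as a minimal sketch: it tries to import each listed module with a given Python interpreter and reports which ones fail. The helper name is hypothetical, and the module names in the usage line (torch, dllogger, se3_transformer) are assumptions about what environment.yaml and the SE3Transformer setup.py install.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: try to import each module with the given interpreter,
# print "ok: NAME" or "missing: NAME", and return non-zero if any import fails.
check_py_modules() {
  local py="$1"; shift
  local m missing=0
  for m in "$@"; do
    if "$py" -c "import $m" 2>/dev/null; then
      echo "ok: $m"
    else
      echo "missing: $m" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, after `conda activate RFAA`: `check_py_modules python torch dllogger se3_transformer`.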
- Configure signalp6

Go to https://services.healthtech.dtu.dk/services/SignalP-6.0/, choose Downloads > Version 6.0h > fast, and enter an academic email address; the download link (~1.4 GB) arrives by email. Then register the package and rename the model weights:

```shell
signalp6-register signalp-6.0h.fast.tar.gz
mv $CONDA_PREFIX/lib/python3.10/site-packages/signalp/model_weights/distilled_model_signalp6.pt $CONDA_PREFIX/lib/python3.10/site-packages/signalp/model_weights/ensemble_model_signalp6.pt
```
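The mv command above hardcodes python3.10 in the path. A hedged alternative, assuming the signalp package is a regular installed package whose `signalp.__file__` points into site-packages, asks Python for the directory instead. Both helper names are hypothetical; the rename itself is the same one shown above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the model_weights directory of the installed
# signalp package (assumes the active python can import "signalp").
signalp_weights_dir() {
  python -c 'import os, signalp; print(os.path.join(os.path.dirname(signalp.__file__), "model_weights"))'
}

# Hypothetical helper: rename the fast (distilled) weights to the ensemble
# file name inside DIR, i.e. the same rename as the mv command above.
rename_signalp_weights() {
  local dir="$1"
  if [ ! -f "$dir/distilled_model_signalp6.pt" ]; then
    echo "distilled_model_signalp6.pt not found in $dir" >&2
    return 1
  fi
  mv "$dir/distilled_model_signalp6.pt" "$dir/ensemble_model_signalp6.pt"
}
```

Usage, after signalp6-register: `rename_signalp_weights "$(signalp_weights_dir)"`.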
- Install input preparation dependencies

```shell
# This script downloads and installs csblast-2.2.3.tar.gz
bash install_dependencies.sh
```
- Download the model weights

```shell
# ~1.2 GB
wget http://files.ipd.uw.edu/pub/RF-All-Atom/weights/RFAA_paper_weights.pt
```
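- Download sequence databases for MSA and template generation

To reuse the AF2 databases, you can either symlink them into the RFAA folder with `ln -s [source] [link_name]`, or edit the database paths in make_msa.sh directly. Note that AF2 uses UniRef30 version 2021_03 while RFAA uses version 2021_06, so DB_UR30 in make_msa.sh must be changed.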
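Both options can be sketched as small shell helpers. This is a minimal sketch, assuming AF2 2.3.2's default download layout (`uniref30/UniRef30_2021_03_*` and `bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_*`) and assuming make_msa.sh sets DB_UR30 on a line of the form `DB_UR30=...`; check your copy of make_msa.sh first. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: symlink AF2_DATA/bfd to RFAA_DIR/bfd.
link_bfd() {
  local af2_data="$1" rfaa_dir="$2"
  # If RFAA_DIR/bfd is a real directory, ln would create the link inside it
  # instead of replacing it, so refuse and let the user remove it first.
  if [ -d "$rfaa_dir/bfd" ] && [ ! -L "$rfaa_dir/bfd" ]; then
    echo "$rfaa_dir/bfd is a real directory; remove it first" >&2
    return 1
  fi
  ln -sfn "$af2_data/bfd" "$rfaa_dir/bfd"
}

# Hypothetical helper: rewrite the line NAME=... in SCRIPT to NAME="VALUE",
# keeping any leading indentation. VALUE must not contain '|', '&' or
# backslashes, since those are special in the sed replacement below.
set_db_var() {
  local script="$1" name="$2" value="$3"
  sed -i "s|^\([[:space:]]*\)${name}=.*|\1${name}=\"${value}\"|" "$script"
}
```

Usage, run from the RFAA folder: `link_bfd /path/to/af2_data .` and `set_db_var make_msa.sh DB_UR30 /path/to/af2_data/uniref30/UniRef30_2021_03`. Pointing DB_UR30 at the AF2 prefix with an absolute path avoids needing a separate UniRef30 symlink.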
```shell
# UniRef30 [46G]
wget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz
# Unlike the README, make_msa.sh looks for UniRef30 under ./uniclust, so extract it there
mkdir -p uniclust
tar xfz UniRef30_2020_06_hhsuite.tar.gz -C ./uniclust

# BFD [272G]
wget https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz
mkdir -p bfd
tar xfz bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz -C ./bfd

# Structure templates (including *_a3m.ffdata, *_a3m.ffindex)
wget https://files.ipd.uw.edu/pub/RoseTTAFold/pdb100_2021Mar03.tar.gz
tar xfz pdb100_2021Mar03.tar.gz
```
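With archives this large (BFD alone is 272G), an interrupted download is likely. A minimal sketch with hypothetical helper names: `wget -c` resumes a partially downloaded file, and creating the target directory before `tar -C` avoids the error tar gives when that directory does not exist.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: download URL into the current directory,
# resuming a partial file if one is already present.
fetch() {
  wget -c "$1"
}

# Hypothetical helper: extract a .tar.gz ARCHIVE into DEST, creating DEST
# first because tar -C fails when the target directory does not exist.
extract_into() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"
  tar xzf "$archive" -C "$dest"
}
```

Usage: `fetch https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz`, then `extract_into bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz ./bfd`.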
Inference
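The following configs are provided as examples:

```shell
# Predict the structure of the 7u7w protein monomer
python -m rf2aa.run_inference --config-name protein
# Predict the structure of the 7u7w protein-DNA complex
python -m rf2aa.run_inference --config-name nucleic_acid
# Predict the structure of the 7u7w protein-DNA-small molecule complex
python -m rf2aa.run_inference --config-name protein_na_sm
# Predict the structure of the 7qxr protein-small molecule complex
python -m rf2aa.run_inference --config-name protein_sm
# Predict the structure of the 3fap protein dimer in complex with a small molecule
python -m rf2aa.run_inference --config-name protein_complex_sm
```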
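To run all five examples in one go, a small loop over the config names above is enough. In this sketch the helper name and the one-log-per-config layout are hypothetical, and the runner command is passed in as arguments so it can be swapped (for example, `echo` for a dry run).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run each example config with the runner command given
# after LOGDIR, writing stdout and stderr to LOGDIR/<config>.log.
# Keeps going after a failure and returns non-zero if any run failed.
run_all_examples() {
  local logdir="$1"; shift
  local cfg status=0
  mkdir -p "$logdir"
  for cfg in protein nucleic_acid protein_na_sm protein_sm protein_complex_sm; do
    if "$@" --config-name "$cfg" >"$logdir/$cfg.log" 2>&1; then
      echo "done: $cfg"
    else
      echo "failed: $cfg (see $logdir/$cfg.log)" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `run_all_examples logs python -m rf2aa.run_inference`.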
RMSD between each predicted structure and the corresponding experimental structure from the PDB, after alignment in PyMOL:
| Job | RMSD (Å) |
|---|---|
| 7u7w_protein | 1.552 |
| 7u7w_protein_nucleic | 2.115 |
| 7u7w_protein_nucleic_sm | 1.657 |
| 7qxr | 0.503 |
| 3fap | 0.550 |
These RMSD values are higher than those in the reference blog's reproduction.
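For reference, the standard RMSD over $N$ matched atom pairs after optimal superposition (rotation $R$, translation $\mathbf{t}$) is shown below. Which atoms and which PyMOL command were used for the table is not stated. PyMOL's `align` by default runs 5 cycles of outlier rejection with a 2.0 Å cutoff and reports the RMSD over the retained atoms only, so numbers from different commands or settings are not directly comparable.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{x}_i - \left(R\,\mathbf{y}_i + \mathbf{t}\right)\right\rVert^2}
```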
Debug
- 7u7w_protein/A/t000_.msa0.ss2.a3m is empty

Checking ./7u7w_protein/A/log/make_ss.stderr shows that make_ss.sh lacks execute permission. Fix it with:

```shell
chmod +x ./input_prep/make_ss.sh
```
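Other scripts can hit the same missing-permission problem, so a broader check may help. A minimal sketch with hypothetical helper names, relying on GNU find (as shipped with CentOS): one helper lists and fixes `*.sh` files without the user execute bit, the other lists empty `.a3m` files such as the one above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every *.sh under DIR that lacks the user
# execute bit, then add that bit (GNU find accepts symbolic -perm modes).
fix_exec_bits() {
  find "$1" -type f -name '*.sh' ! -perm -u+x -print -exec chmod u+x {} +
}

# Hypothetical helper: list empty .a3m files under a job directory.
find_empty_a3m() {
  find "$1" -type f -name '*.a3m' -empty
}
```

Usage: `fix_exec_bits ./input_prep`, then after rerunning a job, `find_empty_a3m ./7u7w_protein`.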
- sequence ss_pred contains no residues.

Following the reference blog, reinstall BLAST 2.2.26 and add `export BLASTMAT=./blast-2.2.26/data/` to make_msa.sh:

```shell
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz
tar -zxvf blast-2.2.26-x64-linux.tar.gz
```
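A relative BLASTMAT such as ./blast-2.2.26/data/ is resolved against whatever directory make_msa.sh is launched from, so it can silently point nowhere. A minimal sketch, with a hypothetical helper name, that exports an absolute path instead and checks that the BLOSUM62 matrix file is actually there:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: given the BLAST 2.2.26 install root, export BLASTMAT
# as the absolute path of its data/ directory (with a trailing slash, as in
# the line above), after checking that BLOSUM62 exists there.
set_blastmat() {
  local data_dir
  data_dir="$(cd "$1/data" && pwd)" || return 1
  if [ ! -f "$data_dir/BLOSUM62" ]; then
    echo "BLOSUM62 not found in $data_dir" >&2
    return 1
  fi
  export BLASTMAT="$data_dir/"
}
```

Usage, e.g. near the top of make_msa.sh: `set_blastmat /path/to/blast-2.2.26`.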