Reproducing AlphaFold (conda version)


户泽荣


Article link: https://blog.csdn.net/huzerong/article/details/119710008

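AlphaFold is an artificial-intelligence system open-sourced by DeepMind that predicts the three-dimensional shape of proteins with high accuracy. It is mainly applied in healthcare and the life sciences, where it may accelerate drug research and discovery.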

AlphaFold2 source code: https://github.com/deepmind/alphafold

Database of structures predicted by AlphaFold2: https://www.alphafold.ebi.ac.uk/

Test server

CPU: Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
RAM: 128 GB
GPU: NVIDIA GeForce GTX 1080 Ti, 11 GB
OS: CentOS 7
CUDA: 11.4
NVIDIA driver version: 470.57.02

First-time setup

To run AlphaFold, complete the following steps:

1. Create a new conda environment and update conda

conda create --name alphafold python==3.8
conda update -n base conda

2. Activate the conda environment

source activate
conda activate alphafold

3. Install the dependencies

conda install -y -c conda-forge openmm==7.5.1 cudnn==8.2.1.32 cudatoolkit==11.0.3 pdbfixer==1.7
conda install -y -c bioconda hmmer==3.3.2 hhsuite==3.3.0 kalign2==2.04
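
After these installs it can be useful to confirm that the alignment tools are visible inside the activated environment. The following is a minimal sketch, not part of AlphaFold: the helper name `missing_tools` is hypothetical, and the executable names (jackhmmer, hhblits, hhsearch, kalign) are an assumption about what the hmmer, hhsuite and kalign2 packages provide.

```shell
# missing_tools NAME... prints each NAME that cannot be found on PATH.
# Hypothetical helper for illustration only.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Assumed executable names from the hmmer, hhsuite and kalign2 packages.
missing="$(missing_tools jackhmmer hhblits hhsearch kalign)"
if [ -n "$missing" ]; then
  echo "Not found on PATH:"
  echo "$missing"
else
  echo "All alignment tools found."
fi
```

If anything is reported as missing, check that `conda activate alphafold` was run in the current shell before retrying the install.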

4. Download the alphafold git repo

git clone https://github.com/deepmind/alphafold.git
alphafold_path="/path/to/alphafold/git/repo"

5. Download the chemical properties file to the common folder and install the Python dependencies

pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0 numpy==1.19.5 scipy==1.7.0 tensorflow==2.5.0
pip install --upgrade jax jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
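
The commands in this step cover only the pip installs; the chemical properties download named in the step title is not shown. A possible sketch of that download follows, assuming the same URL and target folder that the official AlphaFold Dockerfile uses for `stereo_chemical_props.txt`. Both may have changed since, and the wrapper name `download_chemical_props` is hypothetical.

```shell
# Assumption: alphafold_path points at the cloned repo (step 4);
# the placeholder from that step is used when it is unset.
alphafold_path="${alphafold_path:-/path/to/alphafold/git/repo}"

# Assumption: URL and target folder as in the official AlphaFold Dockerfile.
props_url="https://git.scripps.edu/maxim/stereo_chemical_props/-/raw/master/stereo_chemical_props.txt"
common_dir="$alphafold_path/alphafold/common"

# Hypothetical wrapper: call it once to fetch the file into common_dir.
download_chemical_props() {
  wget -q -P "$common_dir" "$props_url"
}

echo "Target: $common_dir/$(basename "$props_url")"
```

Running `download_chemical_props` after defining it performs the actual download; the sketch itself only prints the intended target path.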

6. Apply the OpenMM patch (use the command that matches your own conda installation)

# $alphafold_path variable is set to the alphafold git repo directory (absolute path)
cd ~/anaconda3/envs/alphafold/lib/python3.8/site-packages/ && patch -p0 < $alphafold_path/docker/openmm.patch
# or
cd ~/miniconda3/envs/alphafold/lib/python3.8/site-packages/ && patch -p0 < $alphafold_path/docker/openmm.patch
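
Both patch commands rely on `$alphafold_path` being an absolute path, because the `cd` changes the working directory before the patch file is read. A small pre-flight check can be sketched as follows; `is_absolute_path` is a hypothetical helper, and the path value is the placeholder from step 4.

```shell
# is_absolute_path PATH succeeds when PATH starts with "/".
# Hypothetical helper for illustration only.
is_absolute_path() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

alphafold_path="/path/to/alphafold/git/repo"   # placeholder from step 4

if is_absolute_path "$alphafold_path" && [ -f "$alphafold_path/docker/openmm.patch" ]; then
  echo "Patch file found: $alphafold_path/docker/openmm.patch"
else
  echo "Check alphafold_path: expected an absolute path containing docker/openmm.patch"
fi
```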

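7. Download the data. Downloading with Xunlei (迅雷) on a local machine, uploading the files to the server, and then extracting them there is recommended.
Once downloaded, the database directory should have the following structure:
[Figure: database directory structure]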
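
A quick way to compare the extracted data against the expected layout is to test for the top-level directories. The sketch below assumes the directory names listed in the official AlphaFold README for the full database download (bfd, mgnify, params, pdb70, pdb_mmcif, uniclust30, uniref90); adjust the list if your download differs. The helper name `missing_subdirs` is hypothetical.

```shell
# missing_subdirs ROOT NAME... prints each NAME that is not a directory under ROOT.
# Hypothetical helper for illustration only.
missing_subdirs() {
  root="$1"
  shift
  for name in "$@"; do
    [ -d "$root/$name" ] || printf '%s\n' "$name"
  done
}

data_dir="./alphafold_data"   # same data directory as in the example run

# Assumed top-level layout, following the official AlphaFold README.
missing="$(missing_subdirs "$data_dir" bfd mgnify params pdb70 pdb_mmcif uniclust30 uniref90)"
if [ -n "$missing" ]; then
  echo "Missing under $data_dir:"
  echo "$missing"
else
  echo "All expected database directories are present."
fi
```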

8. Run AlphaFold with this bash script

Usage: run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir>     Path to directory of supporting data
-o <output_dir>   Path to a directory that will store the results.
-m <model_names>  Names of models to use (a comma separated list)
-f <fasta_path>   Path to a FASTA file containing one sequence
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-n <openmm_threads>   OpenMM threads (default: all available cores)
-b <benchmark>    Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'False')
-g <use_gpu>      Enable NVIDIA runtime to run with GPUs (default: True)
-a <gpu_devices>  Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-p <preset>       Choose preset model configuration - no ensembling and smaller genetic database config (reduced_dbs), no ensembling and full genetic database config (full_dbs) or full genetic database config and 8 model ensemblings (casp14)
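
Since `-t` expects an ISO-8601 date, a wrapper script could check the shape of the value before launching a long run. This is a minimal sketch under that assumption; `is_iso_date` is a hypothetical helper and is not part of run_alphafold.sh. It checks only the YYYY-MM-DD shape, not whether the calendar date exists.

```shell
# is_iso_date VALUE succeeds when VALUE is shaped like YYYY-MM-DD.
# Hypothetical helper; it does not validate the calendar date itself.
is_iso_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

max_template_date="2020-05-14"   # value used in the example run

if is_iso_date "$max_template_date"; then
  echo "max_template_date looks valid: $max_template_date"
else
  echo "max_template_date must be in YYYY-MM-DD format"
fi
```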

Example run

bash run_alphafold.sh -d ./alphafold_data/ -o ./dummy_test/ -m model_1 -f ./example/query.fasta -t 2020-05-14
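
Because `-f` takes a FASTA file with a single sequence, folding several proteins means one invocation per file. The sketch below only prints the commands (a dry run) so they can be reviewed first; the helper name `print_batch_commands` and the `./results` output root are hypothetical, and the model name and template date are copied from the example above.

```shell
# print_batch_commands FASTA_DIR DATA_DIR OUT_ROOT
# Prints one run_alphafold.sh command per *.fasta file in FASTA_DIR.
# Hypothetical helper; it prints commands and does not run them.
print_batch_commands() {
  fasta_dir="$1"
  data_dir="$2"
  out_root="$3"
  for fasta in "$fasta_dir"/*.fasta; do
    [ -e "$fasta" ] || continue   # skip when the glob matches nothing
    name="$(basename "$fasta" .fasta)"
    echo "bash run_alphafold.sh -d $data_dir -o $out_root/$name -m model_1 -f $fasta -t 2020-05-14"
  done
}

print_batch_commands ./example ./alphafold_data ./results
```

Piping the output to `sh` would execute the printed commands one after another; giving each sequence its own output directory keeps the result files from overwriting each other.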

9. Output description

output_dir/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto

The contents of each output file are as follows:
* features.pkl – A pickle file containing the input feature Numpy arrays used by the models to produce the structures.
* unrelaxed_model_*.pdb – A PDB format text file containing the predicted structure, exactly as output by the model.
* relaxed_model_*.pdb – A PDB format text file containing the predicted structure, after performing an Amber relaxation procedure on the unrelaxed structure prediction, see Jumper et al. 2021, Suppl. Methods 1.8.6 for details.
* ranked_*.pdb – A PDB format text file containing the relaxed predicted structures, after reordering by model confidence. Here ranked_0.pdb should contain the prediction with the highest confidence, and ranked_4.pdb the prediction with the lowest confidence. To rank model confidence, we use predicted LDDT (pLDDT), see Jumper et al. 2021, Suppl. Methods 1.9.6 for details.
* ranking_debug.json – A JSON format text file containing the pLDDT values used to perform the model ranking, and a mapping back to the original model names.
* timings.json – A JSON format text file containing the times taken to run each section of the AlphaFold pipeline.
* msas/ – A directory containing the files describing the various genetic tool hits that were used to construct the input MSA.
* result_model_*.pkl – A pickle file containing a nested dictionary of the various Numpy arrays directly produced by the model. In addition to the output of the structure module, this includes auxiliary outputs such as distograms and pLDDT scores. If using the pTM models then the pTM logits will also be contained in this file.
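
To confirm that a run finished completely, the file list above can be turned into a simple presence check. The sketch assumes all five models were run (a run with only `-m model_1` would produce fewer files); the helper names `expected_outputs` and `missing_outputs` are hypothetical.

```shell
# expected_outputs prints the file names listed above, one per line,
# assuming all five models were run. Hypothetical helper.
expected_outputs() {
  echo features.pkl
  echo ranking_debug.json
  echo timings.json
  for i in 0 1 2 3 4; do
    echo "ranked_$i.pdb"
  done
  for i in 1 2 3 4 5; do
    echo "relaxed_model_$i.pdb"
    echo "unrelaxed_model_$i.pdb"
    echo "result_model_$i.pkl"
  done
}

# missing_outputs DIR prints every expected file that is absent from DIR.
missing_outputs() {
  for f in $(expected_outputs); do
    [ -e "$1/$f" ] || echo "$f"
  done
}

output_dir="./dummy_test"   # output directory from the example run
missing="$(missing_outputs "$output_dir")"
if [ -z "$missing" ]; then
  echo "All expected files are present in $output_dir"
else
  echo "Missing from $output_dir:"
  echo "$missing"
fi
```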
