Reproducing AlphaFold (conda version)
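AlphaFold is an artificial intelligence system open-sourced by DeepMind that predicts the three-dimensional shape of proteins more accurately than earlier methods. It is mainly applied in healthcare and the life sciences, where it may speed up drug research and discovery.
AlphaFold2 source code: https://github.com/deepmind/alphafold
Database of structures predicted by AlphaFold2: https://www.alphafold.ebi.ac.uk/
Test server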
CPU: Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
RAM: 128 GB
GPU: NVIDIA GeForce GTX 1080 Ti, 11 GB
OS: CentOS 7
CUDA: 11.4
NVIDIA driver version: 470.57.02
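First-time setup
To run AlphaFold, complete the following steps:
1. Create a new conda environment and update conda
`conda create --name alphafold python==3.8`
`conda update -n base conda`
2. Activate the conda environment
source activate
conda activate alphafold
3. Install the dependencies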
conda install -y -c conda-forge openmm==7.5.1 cudnn==8.2.1.32 cudatoolkit==11.0.3 pdbfixer==1.7
conda install -y -c bioconda hmmer==3.3.2 hhsuite==3.3.0 kalign2==2.04
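Before moving on, it can help to confirm that the command-line tools from these packages are actually visible in the activated environment. The following is a minimal sketch: `missing_tools` is a hypothetical helper name, not part of AlphaFold, and it only checks that each command can be found on `PATH`.

```shell
# Print every command from the argument list that cannot be found on PATH.
# Returns 0 when all commands are available, 1 otherwise.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools jackhmmer hhblits hhsearch kalign` prints nothing when the binaries shipped with hmmer, hhsuite and kalign2 are all installed.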
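4. Download the alphafold git repo
git clone https://github.com/deepmind/alphafold.git
alphafold_path="/path/to/alphafold/git/repo"
5. Download the chemical properties file (stereo_chemical_props.txt) into the repo's alphafold/common/ folder, then install the Python packages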
pip install absl-py==0.13.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0 numpy==1.19.5 scipy==1.7.0 tensorflow==2.5.0
pip install --upgrade jax jaxlib==0.1.69+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
6. Apply the OpenMM patch (adjust the path to match your own conda installation)
# $alphafold_path variable is set to the alphafold git repo directory (absolute path)
cd ~/anaconda3/envs/alphafold/lib/python3.8/site-packages/ && patch -p0 < $alphafold_path/docker/openmm.patch
# or
cd ~/miniconda3/envs/alphafold/lib/python3.8/site-packages/ && patch -p0 < $alphafold_path/docker/openmm.patch
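Instead of choosing between the anaconda3 and miniconda3 paths by hand, the site-packages directory can be derived from the active environment. This is a sketch under two assumptions: the environment has been activated (so conda has set `CONDA_PREFIX`), and it uses Python 3.8 as created in step 1. The function names `site_packages_dir` and `apply_openmm_patch` are made up for this illustration.

```shell
# Print the site-packages directory of the currently active conda environment.
# Assumes CONDA_PREFIX is set by `conda activate` and the env uses Python 3.8.
site_packages_dir() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "CONDA_PREFIX is not set; activate the conda environment first" >&2
        return 1
    fi
    printf '%s\n' "$CONDA_PREFIX/lib/python3.8/site-packages"
}

# Apply a patch file (absolute path) inside site-packages.
# Any extra arguments, such as --dry-run, are passed through to `patch`.
apply_openmm_patch() {
    local patch_file="$1"
    shift
    local target
    target="$(site_packages_dir)" || return 1
    (cd "$target" && patch -p0 "$@" < "$patch_file")
}
```

Running `apply_openmm_patch "$alphafold_path/docker/openmm.patch" --dry-run` first previews the result without modifying any file; dropping `--dry-run` applies the patch.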
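7. Download the data. Using Xunlei (Thunder) is recommended: download the files locally, upload them to the server, and then extract them.
After downloading and extracting, the databases should be laid out in the expected directory structure.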
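A quick way to catch an incomplete upload or extraction is to check for the expected subdirectories before starting a run. The sketch below takes the list of names as arguments, because the exact layout is not reproduced here; `missing_dirs` is a hypothetical helper, and the names in the usage note are an assumption to verify against the upstream download scripts.

```shell
# Print each expected subdirectory that is missing from the data directory.
# Usage: missing_dirs <data_dir> <name>...
# Returns 0 when nothing is missing, 1 otherwise.
missing_dirs() {
    local data_dir="$1"
    shift
    local status=0 name
    for name in "$@"; do
        if [ ! -d "$data_dir/$name" ]; then
            printf '%s\n' "$name"
            status=1
        fi
    done
    return "$status"
}
```

A call such as `missing_dirs ./alphafold_data params uniref90 mgnify bfd pdb70 pdb_mmcif uniclust30` lists whatever is still absent; those directory names are assumed here, not taken from this post.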
8. Run AlphaFold with the run_alphafold.sh bash script
Usage: run_alphafold.sh <OPTIONS>
Required Parameters:
-d <data_dir> Path to directory of supporting data
-o <output_dir> Path to a directory that will store the results.
-m <model_names> Names of models to use (a comma separated list)
-f <fasta_path> Path to a FASTA file containing one sequence
-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets
Optional Parameters:
-n <openmm_threads> OpenMM threads (default: all available cores)
-b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'False')
-g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: True)
-a <gpu_devices> Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)
-p <preset> Choose preset model configuration - no ensembling and smaller genetic database config (reduced_dbs), no ensembling and full genetic database config (full_dbs) or full genetic database config and 8 model ensemblings (casp14)
Example run
bash run_alphafold.sh -d ./alphafold_data/ -o ./dummy_test/ -m model_1 -f ./example/query.fasta -t 2020-05-14
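Because `-f` takes a FASTA file with a single sequence, folding several proteins means one invocation per file. The sketch below only prints the commands so they can be reviewed before anything runs. `alphafold_cmds` is a hypothetical helper, and giving each target its own output folder named after the FASTA file is a choice made for this example; it assumes paths without spaces and files ending in `.fasta`.

```shell
# Print one run_alphafold.sh command per FASTA file (dry run; nothing is executed).
# Usage: alphafold_cmds <data_dir> <out_root> <models> <max_template_date> <fasta>...
# Each target gets its own output folder under <out_root>, named after the FASTA file.
alphafold_cmds() {
    local data_dir="$1" out_root="$2" models="$3" max_date="$4"
    shift 4
    local fasta name
    for fasta in "$@"; do
        name="$(basename "$fasta" .fasta)"
        printf 'bash run_alphafold.sh -d %s -o %s -m %s -f %s -t %s\n' \
            "$data_dir" "$out_root/$name" "$models" "$fasta" "$max_date"
    done
}
```

Once the printed commands look right, `alphafold_cmds ./alphafold_data ./results model_1 2020-05-14 ./example/*.fasta | bash` runs them one after another.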
9. Output files
output_dir/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniclust_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto
The contents of each output file are as follows:
* features.pkl – A pickle file containing the input feature NumPy arrays used by the models to produce the structures.
* unrelaxed_model_*.pdb – A PDB format text file containing the predicted structure, exactly as outputted by the model.
* relaxed_model_*.pdb – A PDB format text file containing the predicted structure, after performing an Amber relaxation procedure on the unrelaxed structure prediction, see Jumper et al. 2021, Suppl. Methods 1.8.6 for details.
* ranked_*.pdb – A PDB format text file containing the relaxed predicted structures, after reordering by model confidence. Here ranked_0.pdb should contain the prediction with the highest confidence, and ranked_4.pdb the prediction with the lowest confidence. To rank model confidence, we use predicted LDDT (pLDDT), see Jumper et al. 2021, Suppl. Methods 1.9.6 for details.
* ranking_debug.json – A JSON format text file containing the pLDDT values used to perform the model ranking, and a mapping back to the original model names.
* timings.json – A JSON format text file containing the times taken to run each section of the AlphaFold pipeline.
* msas/ – A directory containing the files describing the various genetic tool hits that were used to construct the input MSA.
* result_model_*.pkl – A pickle file containing a nested dictionary of the various NumPy arrays directly produced by the model. In addition to the output of the structure module, this includes auxiliary outputs such as distograms and pLDDT scores. If using the pTM models then the pTM logits will also be contained in this file.
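Since ranked_0.pdb holds the highest-confidence prediction, a common follow-up is to gather that file from several runs into one place. The sketch below assumes a results folder containing one subfolder per target, each with its own ranked_0.pdb; that layout and the helper name `collect_best` are assumptions for illustration, not something AlphaFold provides.

```shell
# Copy the top-ranked structure of every target into a single folder.
# Usage: collect_best <results_root> <dest_dir>
# Prints the name of each target whose ranked_0.pdb was copied.
collect_best() {
    local results_root="$1" dest="$2" dir target
    mkdir -p "$dest"
    for dir in "$results_root"/*/; do
        # Skip folders without a finished prediction (also covers an unmatched glob).
        [ -f "${dir}ranked_0.pdb" ] || continue
        target="$(basename "$dir")"
        cp "${dir}ranked_0.pdb" "$dest/${target}_ranked_0.pdb"
        printf '%s\n' "$target"
    done
}
```

For instance, `collect_best ./results ./best_models` leaves files such as `./best_models/query_ranked_0.pdb`, and targets that have not finished are skipped.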