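AlphaFold2 Docker Environment Setup and Visualization

AlphaFold2 Docker Environment Setup

Building the AlphaFold2 environment from the Dockerfile runs into a variety of problems. The installation process is described below.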

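Contents

AlphaFold2 Docker Environment Setup

Prerequisites

Downloading the required data files with the provided script

Setting DOWNLOAD_DIR in docker/run_docker.py to the data path

Dockerfile walkthrough (with a modified HH-suite installation step)

Docker environment build steps

Visualization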



Prerequisites

Download the source code from GitHub (deepmind/alphafold: Open source code for AlphaFold, github.com) and complete the following prerequisites:

  1. Install Docker.

  2. Install NVIDIA Container Toolkit for GPU support.

  3. Set up running Docker as a non-root user.

  4. Download genetic databases (see below).

  5. Download model parameters (see below).

  6. Check that AlphaFold will be able to use a GPU by running:

    docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
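The GPU check from step 6 can also be scripted, so that a missing tool is reported before Docker is invoked. The following is a minimal sketch, not part of AlphaFold: the helper names gpu_check_command and missing_tools are hypothetical, and the choice of tools to look for on PATH is an assumption.

```python
import shutil


def gpu_check_command(image="nvidia/cuda:11.0-base"):
    """Return the argv for the GPU visibility check from step 6."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def missing_tools(tools=("docker", "nvidia-smi"), which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH.

    `which` is injectable so the function can be exercised without
    Docker or an NVIDIA driver installed.
    """
    return [tool for tool in tools if which(tool) is None]
```

If missing_tools() returns an empty list, the command can be run with subprocess.run(gpu_check_command(), check=True); a non-zero exit status then suggests that the container cannot see the GPU.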

Use the provided script to download the required data files. The resulting data directory is laid out as follows:

$DOWNLOAD_DIR/                             # Total: ~ 2.2 TB (download: 438 GB)
    bfd/                                   # ~ 1.7 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                                # ~ 64 GB (download: 32.9 GB)
        mgy_clusters_2018_12.fa
    params/                                # ~ 3.5 GB (download: 3.5 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # LICENSE,
        # = 11 files.
    pdb70/                                 # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                             # ~ 206 GB (download: 46 GB)
        mmcif_files/
            # About 180,000 .cif files.
        obsolete.dat
    small_bfd/                             # ~ 17 GB (download: 9.6 GB)
        bfd-first_non_consensus_sequences.fasta
    uniclust30/                            # ~ 86 GB (download: 24.9 GB)
        uniclust30_2018_08/
            # 13 files.
    uniref90/                              # ~ 58 GB (download: 29.7 GB)
        uniref90.fasta
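Because the download is large and can be interrupted, it may help to confirm that the top-level directories from the listing above exist before building the image. This is a minimal sketch: the function name missing_database_dirs is hypothetical, and it only checks that the directories are present, not that their contents are complete.

```python
from pathlib import Path

# Top-level directories taken from the listing above.
EXPECTED_DIRS = (
    "bfd", "mgnify", "params", "pdb70",
    "pdb_mmcif", "small_bfd", "uniclust30", "uniref90",
)


def missing_database_dirs(download_dir, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that are absent under download_dir."""
    root = Path(download_dir)
    return [name for name in expected if not (root / name).is_dir()]
```

A setup that only downloaded the reduced databases would likely not have bfd/, so in that case a shorter tuple can be passed as expected.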

In docker/run_docker.py, set DOWNLOAD_DIR to the path where the data is stored.
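The edit can be made by hand, but when the setup is repeated on several machines a small script may be convenient. The sketch below assumes that run_docker.py contains a single top-level assignment of the form DOWNLOAD_DIR = '...'; that form and the helper name set_download_dir are assumptions, so check the file before relying on it.

```python
import re


def set_download_dir(source, new_path):
    """Return `source` with its top-level DOWNLOAD_DIR assignment set to new_path.

    Assumes a line starting with `DOWNLOAD_DIR =` at column 0 (an assumption
    about the file's layout); raises ValueError if no such line exists.
    """
    pattern = re.compile(r"^DOWNLOAD_DIR\s*=.*$", flags=re.MULTILINE)
    replacement = f"DOWNLOAD_DIR = {new_path!r}"
    # A callable replacement avoids backslash interpretation in the path.
    updated, count = pattern.subn(lambda _match: replacement, source, count=1)
    if count == 0:
        raise ValueError("no top-level DOWNLOAD_DIR assignment found")
    return updated
```

To apply it, read docker/run_docker.py as text, pass the contents through set_download_dir, and write the result back.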

Dockerfile walkthrough (the HH-suite installation step has been modified)

# Copyright 2021 DeepMind Technologies Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set the CUDA version.
ARG CUDA=11.0
FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu18.04
# FROM directive resets ARGS, so we specify again (the value is retained if
# previously set).
ARG CUDA

# Use bash to support string substitution.
SHELL ["/bin/bash", "-c"]

# Install the required system packages, then clear the apt cache.
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
      build-essential \
      cmake \
      cuda-command-line-tools-${CUDA/./-} \
      git \
      hmmer \
      kalign \
      tzdata \
      wget \
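    && rm -rf /var/lib/apt/lists/*

# Compile HH-suite from source. Cloning over https kept failing here, so the
# clone was switched to the git:// protocol (the original https line is kept
# below, commented out). GitHub has since disabled unauthenticated git://
# access, so on a current setup the https line is the one to use.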
#RUN git clone --branch v3.3.0 https://github.com/soedinglab/hh-suite.git /tmp/hh-suite \
RUN git clone --branch v3.3.0 git://github.com/soedinglab/hh-suite.git /tmp/hh-suite \
    && mkdir /tmp/hh-suite/build \
    && pushd /tmp/hh-suite/build \
    && cmake -DCMAKE_INSTALL_PREFIX=/opt/hhsuite .. \
    && make -j 4 && make install \
    && ln -s /opt/hhsuite/bin/* /usr/bin \
    && popd \
    && rm -rf /tmp/hh-suite

# Install Miniconda package manager.
RUN wget -q -P /tmp \
  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
    && rm /tmp/Miniconda3-latest-Linux-x86_64.sh

# Install conda packages.
ENV PATH="/opt/conda/bin:$PATH"
RUN conda update -qy conda \
    && conda install -y -c conda-forge \
      openmm=7.5.1 \
      cudatoolkit==${CUDA_VERSION} \
      pdbfixer \
      pip \
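      python=3.7

# Copy the source tree and fetch stereo_chemical_props.txt, which holds
# parameters such as bond lengths and bond angles for the standard amino acids.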
COPY . /app/alphafold
RUN wget -q -P /app/alphafold/alphafold/common/ \
  https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt

# Install pip packages.
RUN pip3 install --upgrade pip \
    && pip3 install -r /app/alphafold/requirements.txt \
    && pip3 install --upgrade jax jaxlib==0.1.69+cuda${CUDA/./} -f \
      https://storage.googleapis.com/jax-releases/jax_releases.html

# Apply OpenMM patch (a small set of modifications to OpenMM).
WORKDIR /opt/conda/lib/python3.7/site-packages
RUN patch -p0 < /app/alphafold/docker/openmm.patch

# Set the container entrypoint.
# We need to run `ldconfig` first to ensure GPUs are visible, due to some quirk
# with Debian. See https://github.com/NVIDIA/nvidia-docker/issues/1399 for
# details.
# ENTRYPOINT does not support easily running multiple commands, so instead we
# write a shell script to wrap them up.
WORKDIR /app/alphafold
RUN echo $'#!/bin/bash\n\
ldconfig\n\
python /app/alphafold/run_alphafold.py "$@"' > /app/run_alphafold.sh \
  && chmod +x /app/run_alphafold.sh
ENTRYPOINT ["/app/run_alphafold.sh"]
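The Dockerfile relies on bash string substitution in two places: ${CUDA/./-} in the apt package name and ${CUDA/./} in the jaxlib version pin. To make the resulting names explicit, the sketch below mimics that substitution in Python for CUDA=11.0. The helper bash_replace_first is hypothetical and only covers a literal pattern, which is all these two cases need.

```python
def bash_replace_first(value, pattern, replacement):
    """Mimic bash ${value/pattern/replacement} for a literal pattern.

    The single-slash form replaces only the first match.
    """
    return value.replace(pattern, replacement, 1)


cuda = "11.0"  # value of ARG CUDA in the Dockerfile

# cuda-command-line-tools-${CUDA/./-}
apt_package = "cuda-command-line-tools-" + bash_replace_first(cuda, ".", "-")
# jaxlib==0.1.69+cuda${CUDA/./}
jaxlib_pin = "jaxlib==0.1.69+cuda" + bash_replace_first(cuda, ".", "")

print(apt_package)  # cuda-command-line-tools-11-0
print(jaxlib_pin)   # jaxlib==0.1.69+cuda110
```

This is why changing ARG CUDA affects both the apt package and the jaxlib build that get installed; a matching apt package and jaxlib wheel must exist for the chosen version.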

Docker environment build steps:

  1. Build the image: docker build -f docker/Dockerfile -t alphafold .

  2. Create a conda virtual environment: conda create -n alphafold python=3.7

  3. Activate the environment: conda activate alphafold

  4. Install the launcher's dependencies: pip3 install -r docker/requirements.txt

  5. Run AlphaFold2. --fasta_paths is the path to the FASTA file, --max_template_date sets the cutoff date for the templates that may be used, --preset selects the run mode, and --gpu_devices selects which GPUs to use.

    python3 docker/run_docker.py --fasta_paths=/data8t/zhengrongtao/test/P94485.fasta --max_template_date=2020-05-14 --preset=reduced_dbs --gpu_devices 1

    The --preset options

    You can control the speed/quality trade-off by adding --preset=reduced_dbs, --preset=full_dbs or --preset=casp14 to the run command. The following presets are provided:

    • reduced_dbs: This preset is optimized for speed and lower hardware requirements. It runs with a reduced version of the BFD database and with no ensembling. It requires 8 CPU cores (vCPUs), 8 GB of RAM, and 600 GB of disk space.

    • full_dbs: The model in this preset is 8 times faster than the casp14 preset with a very minor quality drop (-0.1 average GDT drop on CASP14 domains). It runs with all genetic databases and with no ensembling.

    • casp14: This preset uses the same settings as were used in CASP14. It runs with all genetic databases and with 8 ensemblings.
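When several sequences are run in a batch, a mistyped flag value is easier to catch before the container starts. The following minimal sketch assembles the run command from step 5 and validates the preset and the date format first. The function build_run_command is hypothetical; the flag names follow the command shown above, with the template cutoff passed as --max_template_date.

```python
from datetime import datetime

# Presets listed in the text above.
PRESETS = ("reduced_dbs", "full_dbs", "casp14")


def build_run_command(fasta_paths, max_template_date, preset, gpu_devices=None):
    """Return the argv for docker/run_docker.py, validating inputs first."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    # Raises ValueError if the date is not in YYYY-MM-DD form.
    datetime.strptime(max_template_date, "%Y-%m-%d")
    command = [
        "python3",
        "docker/run_docker.py",
        "--fasta_paths=" + ",".join(fasta_paths),
        f"--max_template_date={max_template_date}",
        f"--preset={preset}",
    ]
    if gpu_devices is not None:
        command.append(f"--gpu_devices={gpu_devices}")
    return command
```

The returned list can be passed to subprocess.run(..., check=True) from the repository root, inside the conda environment created in steps 2 to 4.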

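Visualization

The predicted structures can be visualized with UCSF Chimera (download page: Download UCSF Chimera).

Click Open to load a .pdb file.

Alternatively, ChimeraX can be used to run predictions (download page: Download UCSF ChimeraX).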
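Before opening a result in a viewer, a quick sanity check of the .pdb file can be useful, for example to confirm that the number of residues matches the input sequence. The sketch below reads the C-alpha atoms using the fixed-column layout of PDB ATOM records; the function name read_ca_atoms is hypothetical, and the parser deliberately ignores everything except ATOM lines.

```python
def read_ca_atoms(pdb_text):
    """Return (chain, residue_number, residue_name, (x, y, z)) per C-alpha atom.

    Uses the fixed columns of PDB ATOM records: atom name in columns 13-16,
    residue name in 18-20, chain in 22, residue number in 23-26, and the
    x, y, z coordinates in 31-38, 39-46, 47-54 (1-based columns).
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), line[17:20].strip(), coords))
    return atoms
```

For a single-chain prediction, len(read_ca_atoms(text)) should equal the length of the sequence in the input FASTA file.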

 
