Installing HiC-Pro

hzau_t

Article link: https://blog.csdn.net/hzau_t/article/details/132981672


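cd /public/home/myname/software
# save the archive under the file name that tar expects
wget -O HiC-Pro-3.1.0.tar.gz https://github.com/nservant/HiC-Pro/archive/refs/tags/v3.1.0.tar.gz
tar -zxvf HiC-Pro-3.1.0.tar.gz
# -p sets the full path of the new environment
conda env create -f /public/home/myname/software/HiC-Pro-3.1.0/environment.yml -p /public/home/myname/miniconda3/envs/HiC-Pro
conda activate HiC-Pro
cd HiC-Pro-3.1.0
make configure
make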

HiC-Pro repository: https://github.com/nservant/HiC-Pro

According to the installation guide in the HiC-Pro GitHub repository, the following dependencies are required:

The HiC-Pro pipeline requires the following dependencies :

The bowtie2 mapper
Python (>3.7) with pysam (>=0.15.4), bx-python(>=0.8.8), numpy(>=1.18.1), and scipy(>=1.4.1) libraries.
Note that the current version no longer supports python 2
R with the RColorBrewer and ggplot2 (>2.2.1) packages
g++ compiler
samtools (>1.9)
Unix sort (which supports the -V option) is required! Mac OS users should install the GNU core utilities.

Note that Bowtie >2.2.2 is strongly recommended for allele-specific analysis.

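Installing these dependencies one by one is tedious, so the repository provides an environment.yml file that sets up the whole environment with conda: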

#   conda env create -f environment.yml
name: HiC-Pro_v3.1.0
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - conda-forge::python=3.8.10=h49503c6_1_cpython
  - conda-forge::scipy=1.7.0=py38h7b17777_1
  - conda-forge::numpy=1.21.1=py38h9894fe3_0
  - bioconda::iced=0.5.10=py38h803c66d_0
  - bioconda::bx-python=0.8.11=py38h024e602_1
  - bioconda::pysam=0.16.0.1=py38hf7546f9_3
  - bioconda::cooler=0.8.11=pyh3252c3a_0

  - conda-forge::r-base=4.0.3=h349a78a_8
  - conda-forge::r-ggplot2=3.3.5=r40hc72bb7e_0
  - conda-forge::r-rcolorbrewer=1.1_2=r40h785f33e_1003
  - conda-forge::r-gridbase=0.4_7=r40hc72bb7e_1003
  
  - conda-forge::tbb=2020.2=hc9558a2_0
  - bioconda::bowtie2=2.4.4=py38h72fc82f_0
  - bioconda::samtools=1.12=h9aed4be_1
  - bioconda::multiqc=1.11=pyhdfd78af_0

This environment provides the following:

Python 3.8.10 with scipy, numpy, iced, bx-python, pysam, and cooler

R 4.0.3 with ggplot2, rcolorbrewer, and gridbase

tbb 2020.2

bowtie2 2.4.4

samtools 1.12

multiqc 1.11

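If you install the dependencies manually instead, you need to specify their paths in config-install.txt.

Use the which command to find each path, for example:

which R

You can also add a tool's bin directory to PATH for convenience: export PATH=/../bin:$PATH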
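
Rather than running which once per tool, the lookup can be wrapped in a small loop. The following is only a sketch: check_tools is a hypothetical helper name, and the list of tools passed to it is taken from the dependency list above, not from anything HiC-Pro itself ships.

```shell
# Report where each required tool is found on PATH; return non-zero if any is missing.
# check_tools is a hypothetical helper, not part of HiC-Pro.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool"); then
            printf '%s\t%s\n' "$tool" "$location"
        else
            printf '%s\tNOT FOUND\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_tools bowtie2 samtools R python g++ sort || echo "Some dependencies are missing from PATH" >&2
```

The paths it prints are the ones you would copy into config-install.txt.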

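Using HiC-Pro

Convert the downloaded .sra files to fastq or fastq.gz with fastq-dump and save them in the rawdata folder. HiC-Pro expects one subfolder per sample inside this folder.

How to prepare the three files HiC-Pro requires:

Bowtie2 index of the genome: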
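
The conversion step might look like the sketch below. It assumes paired-end runs, one .sra file per sample, and an output layout of one folder per sample; sra_to_fastq and the DRY_RUN switch are hypothetical names introduced here for illustration, and SRR0000001 is a made-up accession.

```shell
# Convert each .sra file into gzipped paired FASTQ files, one output folder per sample.
# Set DRY_RUN=1 to print the commands instead of running them.
sra_to_fastq() {
    sradir=$1; rawdir=$2
    for sra in "$sradir"/*.sra; do
        [ -e "$sra" ] || continue
        sample=$(basename "$sra" .sra)
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "fastq-dump --split-files --gzip -O $rawdir/$sample $sra"
        else
            mkdir -p "$rawdir/$sample"
            fastq-dump --split-files --gzip -O "$rawdir/$sample" "$sra"
        fi
    done
}

# Dry run on a mock directory (SRR0000001 is a made-up accession).
mock=$(mktemp -d)
touch "$mock/SRR0000001.sra"
DRY_RUN=1 sra_to_fastq "$mock" rawdata
```

With --split-files, the two mates of each run are written to files ending in _1 and _2, which matters later when the read-pair suffixes are set in the HiC-Pro configuration.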

bowtie2-build genome.fa genome

Restriction fragment file:

/home/lixingze/software/HiC-Pro-3.0.0/bin/utils/digest_genome.py genome.fa -r dpnii -o genome_dpnii.bed

Chromosome sizes file:

samtools faidx genome.fa
awk '{print $1"\t" $2}' genome.fa.fai >genome.sizes
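
To see what the awk step keeps, here is a toy run on a hand-written index. The two-sequence .fai content below is invented for illustration; a real one comes from samtools faidx. The first two columns of a .fai file are the sequence name and its length, which is exactly what the sizes file needs.

```shell
workdir=$(mktemp -d)
# Hand-written stand-in for genome.fa.fai:
# name, length, byte offset, bases per line, bytes per line.
printf 'chr1\t1000\t6\t60\t61\nchr2\t500\t1029\t60\t61\n' > "$workdir/genome.fa.fai"

# Same command as above: keep only name and length.
awk '{print $1"\t"$2}' "$workdir/genome.fa.fai" > "$workdir/genome.sizes"
cat "$workdir/genome.sizes"
# chr1	1000
# chr2	500
```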

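Edit the config-hicpro.txt file:

BOWTIE2_IDX_PATH = # directory containing the bowtie2 index, downloaded in advance or built with bowtie2-build
REFERENCE_GENOME = # base name of the bowtie2 index
GENOME_SIZE = # chromosome sizes file, which can be downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/
GENOME_FRAGMENT = # BED file with the restriction fragment positions, usually found in HiC-Pro's annotation folder
LIGATION_SITE = # sequence produced when the digested restriction sites are re-ligated
# Note: GENOME_FRAGMENT and LIGATION_SITE depend entirely on the restriction enzyme used. HindIII is common, but check the description of the data source carefully.
PAIR1_EXT = 
PAIR2_EXT = 
# Based on the names of the files converted from your .sra data, enter the part of the file name that identifies each mate of the pair, without the file extension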
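
If you prefer to script these edits instead of opening the file in an editor, something like the following could work. It is a sketch under assumptions: set_conf is a hypothetical helper, the key must already exist on its own line, and all values shown are placeholders. The two ligation sequences in the comments are the commonly documented ones for HindIII (AAGCTAGCTT) and DpnII/MboI (GATCGATC).

```shell
# Set KEY to VALUE in a "KEY = VALUE" style config file.
# Assumes the key already exists on its own line and that VALUE
# contains no "|", "&" or backslash characters.
set_conf() {
    key=$1; value=$2; file=$3
    tmp=$(mktemp)
    sed "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a throwaway file with placeholder values.
conf=$(mktemp)
printf 'BOWTIE2_IDX_PATH = \nREFERENCE_GENOME = hg19\nLIGATION_SITE = AAGCTAGCTT\n' > "$conf"
set_conf BOWTIE2_IDX_PATH /path/to/bowtie2_index "$conf"
set_conf REFERENCE_GENOME genome "$conf"
set_conf LIGATION_SITE GATCGATC "$conf"   # DpnII/MboI; HindIII would be AAGCTAGCTT
cat "$conf"
```

On a real run you would point set_conf at your copy of config-hicpro.txt rather than a temporary file.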

If the annotation folder does not contain a fragment file for your genome and enzyme, generate one with digest_genome.py:

digest_genome.py -r A^AGCTT -o HindIII_resfrag_hg19.bed hg19_rCRSchrm.fa
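
The pattern A^AGCTT means that HindIII recognizes AAGCTT and cuts after the first A. The toy function below illustrates how that turns a sequence into fragment coordinates. It is not how digest_genome.py is implemented, it handles a single short sequence and a single hard-coded site, and the file the real script writes may contain additional columns; digest_toy and chrT are made-up names.

```shell
# Toy illustration only: list the HindIII (A^AGCTT) fragments of one sequence
# as 0-based, half-open BED intervals.
digest_toy() {
    # $1 = sequence name, $2 = uppercase sequence
    awk -v chr="$1" -v seq="$2" 'BEGIN {
        site = "AAGCTT"; offset = 1      # the cut falls after the first base of the site
        start = 0; pos = 0; rest = seq
        while ((i = index(rest, site)) > 0) {
            cut = pos + i - 1 + offset   # 0-based coordinate of the cut
            printf "%s\t%d\t%d\n", chr, start, cut
            start = cut
            pos += i                     # continue searching one base past this match
            rest = substr(rest, i + 1)
        }
        printf "%s\t%d\t%d\n", chr, start, length(seq)
    }'
}

digest_toy chrT CCAAGCTTGGGGAAGCTTCC
# chrT	0	3
# chrT	3	13
# chrT	13	20
```

The 20-base toy sequence contains the site twice, so it yields three fragments.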

Finally, run the pipeline:

HiC-Pro -c config-hicpro.txt -i [rawdata directory] -o [output directory]
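
Before launching a long run, it can be worth checking that every sample folder really contains both mates. The sketch below assumes the layout rawdata/<sample>/<prefix><suffix>.fastq or .fastq.gz, with the suffixes matching PAIR1_EXT and PAIR2_EXT in the config file. check_pairs and the sample and accession names are hypothetical.

```shell
# Report FASTQ files whose mate is missing.
# Assumed layout: RAWDIR/<sample>/<prefix><ext1>.fastq[.gz] plus a matching <prefix><ext2> file.
check_pairs() {
    rawdir=$1; ext1=$2; ext2=$3
    rc=0
    for r1 in "$rawdir"/*/*"$ext1".fastq*; do
        [ -e "$r1" ] || { echo "no files matching *${ext1}.fastq* under $rawdir" >&2; return 1; }
        dir=${r1%/*}; name=${r1##*/}
        prefix=${name%%"$ext1".fastq*}      # file name up to the mate suffix
        suffix=${name#"$prefix$ext1"}       # .fastq or .fastq.gz
        r2="$dir/$prefix$ext2$suffix"
        [ -e "$r2" ] || { echo "missing mate for $r1" >&2; rc=1; }
    done
    return "$rc"
}

# Demonstration on a mock layout with empty files (names are made up).
raw=$(mktemp -d)
mkdir -p "$raw/sampleA" "$raw/sampleB"
touch "$raw/sampleA/SRR0000001_1.fastq.gz" "$raw/sampleA/SRR0000001_2.fastq.gz"
touch "$raw/sampleB/SRR0000002_1.fastq.gz"
check_pairs "$raw" _1 _2 || echo "fix the layout before running HiC-Pro"
```

Here sampleB is reported because its second mate is absent.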