wget -O HiC-Pro-3.1.0.tar.gz https://github.com/nservant/HiC-Pro/archive/refs/tags/v3.1.0.tar.gz
tar -zxvf HiC-Pro-3.1.0.tar.gz
conda env create -f /public/home/myname/software/HiC-Pro-3.1.0/environment.yml -p /public/home/myname/miniconda3/envs/HiC-Pro
conda activate HiC-Pro
cd /public/home/myname/software/HiC-Pro-3.1.0
make configure
make install
HiC-Pro homepage: https://github.com/nservant/HiC-Pro
The installation guide in the HiC-Pro GitHub repository lists the dependencies as follows:
The HiC-Pro pipeline requires the following dependencies:
- The bowtie2 mapper
- Python (>3.7) with pysam (>=0.15.4), bx-python(>=0.8.8), numpy(>=1.18.1), and scipy(>=1.4.1) libraries.
Note that the current version no longer supports python 2
- R with the RColorBrewer and ggplot2 (>2.2.1) packages
- g++ compiler
- samtools (>1.9)
- Unix sort (which supports the -V option) is required! For Mac OS users, please install the GNU core utilities!
Note that Bowtie >2.2.2 is strongly recommended for allele specific analysis.
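Installing these dependencies one by one is tedious, so the repository provides an environment.yml file that sets up the whole environment: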
# conda env create -f environment.yml
name: HiC-Pro_v3.1.0
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- conda-forge::python=3.8.10=h49503c6_1_cpython
- conda-forge::scipy=1.7.0=py38h7b17777_1
- conda-forge::numpy=1.21.1=py38h9894fe3_0
- bioconda::iced=0.5.10=py38h803c66d_0
- bioconda::bx-python=0.8.11=py38h024e602_1
- bioconda::pysam=0.16.0.1=py38hf7546f9_3
- bioconda::cooler=0.8.11=pyh3252c3a_0
- conda-forge::r-base=4.0.3=h349a78a_8
- conda-forge::r-ggplot2=3.3.5=r40hc72bb7e_0
- conda-forge::r-rcolorbrewer=1.1_2=r40h785f33e_1003
- conda-forge::r-gridbase=0.4_7=r40hc72bb7e_1003
- conda-forge::tbb=2020.2=hc9558a2_0
- bioconda::bowtie2=2.4.4=py38h72fc82f_0
- bioconda::samtools=1.12=h9aed4be_1
- bioconda::multiqc=1.11=pyhdfd78af_0
As the file shows, this environment provides:
python=3.8.10 with scipy, numpy, iced, bx-python, pysam and cooler
R=4.0.3 with ggplot2, rcolorbrewer and gridbase
tbb=2020.2
bowtie2=2.4.4
samtools=1.12
multiqc=1.11
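If you install the dependencies manually instead, you need to specify their paths in config-install.txt.
Use the which command to look up each path,
for example: which R
For convenience, you can also add a bin directory to PATH: export PATH=/../bin:$PATH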
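The path lookup can be done for all tools in one pass. The following is a minimal sketch, not part of HiC-Pro: report_tool_paths is a hypothetical helper name, and the tool list is simply taken from the dependency list quoted earlier.

```shell
# Print the resolved location of each tool, or flag it as missing.
# report_tool_paths is a hypothetical helper, not a HiC-Pro command.
report_tool_paths() {
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%s\t%s\n' "$tool" "$path"
        else
            printf '%s\tNOT FOUND\n' "$tool"
        fi
    done
}

# Tools named in the dependency list
report_tool_paths bowtie2 samtools R python g++ sort
```

Any tool reported as NOT FOUND is either not installed or not on PATH yet. The directory part of each reported path is what you would enter in config-install.txt.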
Using HiC-Pro
Convert the downloaded .sra data to fastq or fastq.gz with fastq-dump and save the files under the rawdata folder, with one subfolder per sample.
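A possible way to do the conversion is sketched below. It assumes paired-end data and that fastq-dump from the SRA Toolkit is on PATH; sra_to_fastq and the accession SRR0000001 are hypothetical names used only for illustration. Each sample gets its own subfolder under rawdata.

```shell
# Convert one paired-end .sra file into gzipped FASTQ files inside a
# per-sample folder: rawdata/<sample>/<sample>_1.fastq.gz and _2.fastq.gz
# sra_to_fastq is a hypothetical helper, not part of HiC-Pro or the SRA Toolkit.
sra_to_fastq() {
    sra_file=$1                      # e.g. SRR0000001.sra (hypothetical name)
    rawdata_dir=${2:-rawdata}        # defaults to ./rawdata
    sample=$(basename "$sra_file" .sra)
    mkdir -p "$rawdata_dir/$sample"
    # --split-files writes read 1 and read 2 to separate files
    fastq-dump --split-files --gzip --outdir "$rawdata_dir/$sample" "$sra_file"
}

# Usage (hypothetical accession):
# sra_to_fastq SRR0000001.sra rawdata
```

With --split-files the two output files end in _1 and _2, which is what the pair keywords in config-hicpro.txt (such as PAIR2_EXT) would then need to match.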
How to obtain the three files HiC-Pro requires:
Bowtie2 index of the genome:
bowtie2-build genome.fa genome
Restriction fragment file:
/public/home/myname/software/HiC-Pro-3.1.0/bin/utils/digest_genome.py genome.fa -r dpnii -o genome_dpnii.bed
Sequence (chromosome) sizes file:
samtools faidx genome.fa
awk '{print $1"\t" $2}' genome.fa.fai >genome.sizes
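To see what the awk step keeps, here is a small worked example on a toy index. The file names toy.fa.fai and toy.sizes and the two sequences are made up for illustration. A .fai index has five tab-separated columns (name, length, byte offset, bases per line, bytes per line), and only the first two end up in the sizes file.

```shell
# A toy .fai index: name, length, offset, bases per line, bytes per line
printf 'chr1\t1000\t6\t60\t61\nchr2\t500\t1029\t60\t61\n' > toy.fa.fai

# Same awk command as for the real genome: keep name and length only
awk '{print $1"\t"$2}' toy.fa.fai > toy.sizes

cat toy.sizes
# chr1	1000
# chr2	500
```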
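Edit the config-hicpro.txt file:
BOWTIE2_IDX_PATH = # directory containing the bowtie2 index files; download the index in advance or build it with bowtie2-build
REFERENCE_GENOME = # basename (prefix) of the bowtie2 index files
GENOME_SIZE = # chromosome sizes file; can be downloaded from http://hgdownload.cse.ucsc.edu/goldenPath/
GENOME_FRAGMENT = # BED file with the positions of the restriction fragments; usually found in HiC-Pro's annotation folder
LIGATION_SITE = # sequence produced when the digested ends are re-ligated
# Note: GENOME_FRAGMENT and LIGATION_SITE depend entirely on the restriction enzyme used. HindIII is common, but check the description of the data source carefully.
PAIR1_EXT =
PAIR2_EXT =
# Keywords that identify the read 1 and read 2 files, matching your file names (derived from the .sra names); do not include the file extension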
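The ligation sequence can be worked out from the enzyme's cut site. The sketch below assumes a palindromic site that leaves a 5' overhang, filled in before blunt-end ligation (as with HindIII, DpnII or MboI). It does not cover other protocols or enzymes that leave 3' overhangs. ligation_site is a hypothetical helper, and ^ marks the cut position.

```shell
# Derive the ligation junction from a cut-site pattern such as A^AGCTT.
# Assumes a palindromic site with a 5' overhang that is filled in
# before blunt-end ligation. ligation_site is a hypothetical helper.
ligation_site() {
    awk -v site="$1" 'BEGIN {
        cut = index(site, "^") - 1      # number of bases left of the cut
        gsub(/\^/, "", site)            # drop the cut marker
        n = length(site)
        # filled-in end of one fragment followed by the start of the next
        print substr(site, 1, n - cut) substr(site, cut + 1)
    }'
}

ligation_site 'A^AGCTT'   # HindIII      -> AAGCTAGCTT
ligation_site '^GATC'     # DpnII / MboI -> GATCGATC
```

For HindIII the filled-in end is AAGCT and the next fragment starts with AGCTT, which gives AAGCTAGCTT.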
If the annotation folder has no fragment file for your genome and enzyme, generate one with digest_genome.py:
digest_genome.py -r A^AGCTT -o HindIII_resfrag_hg19.bed hg19_rCRSchrm.fa
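After generating a fragment file, it can be worth checking that its chromosome names match the chromosome sizes file, since both should describe the same reference. The following is only a sketch: missing_chroms is a hypothetical helper, and it assumes both files are tab-separated with the chromosome name in the first column.

```shell
# List chromosome names that occur in a BED file but are missing from a
# chromosome sizes file. Prints nothing when the two files agree.
# missing_chroms is a hypothetical helper, not part of HiC-Pro.
missing_chroms() {
    sizes_file=$1   # e.g. genome.sizes
    bed_file=$2     # e.g. HindIII_resfrag_hg19.bed
    awk 'NR == FNR { known[$1] = 1; next }            # first file: sizes
         !($1 in known) && !seen[$1]++ { print $1 }   # second file: BED
        ' "$sizes_file" "$bed_file"
}

# Usage:
# missing_chroms genome.sizes HindIII_resfrag_hg19.bed
```

An empty result means every chromosome in the fragment file also has an entry in the sizes file.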
Finally, run the pipeline:
HiC-Pro -c config-hicpro.txt -i [rawdata directory] -o [output directory]