deepTools User Guide
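deepTools is a suite of Python-based tools for efficiently processing and analyzing high-throughput sequencing data such as ChIP-seq, RNA-seq, and MNase-seq. Here it is installed with conda on a Linux server.
deepTools takes aligned BAM files or converted bigWig files as input. It can process BAM files and run quality control, perform correlation analyses with plots, and draw heatmaps or density plots over the regions supplied in a BED file.
To summarize how genome-wide peaks are distributed over gene features, first compute a score matrix with computeMatrix, then visualize the coverage as a heatmap with plotHeatmap or as a line plot with plotProfile.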
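The computeMatrix, plotHeatmap, plotProfile chain can be sketched as a small Python helper that only assembles the command lines. This is a minimal sketch, not the author's pipeline: the file names (`ChIP.bw`, `genes.bed`), the output prefix, the 3000 bp flank, and the choice of the `reference-point` mode around the TSS are all assumptions made for illustration.

```python
import shlex


def build_profile_commands(bigwig, regions_bed, prefix, flank=3000):
    """Return the three deepTools command lines as argument lists.

    All file names are hypothetical placeholders; adjust them to your data.
    """
    matrix = f"{prefix}.matrix.gz"
    compute = [
        "computeMatrix", "reference-point",
        "--referencePoint", "TSS",   # anchor each region at its start site
        "-S", bigwig,                # bigWig score file
        "-R", regions_bed,           # regions to plot (BED)
        "-b", str(flank),            # bases before the reference point
        "-a", str(flank),            # bases after the reference point
        "-o", matrix,                # gzipped matrix read by the plot tools
    ]
    heatmap = ["plotHeatmap", "-m", matrix, "-out", f"{prefix}.heatmap.png"]
    profile = ["plotProfile", "-m", matrix, "-out", f"{prefix}.profile.png"]
    return [compute, heatmap, profile]


def show_commands(commands):
    """Print each command as a shell-ready string (dry run, nothing is executed)."""
    for cmd in commands:
        print(shlex.join(cmd))


show_commands(build_profile_commands("ChIP.bw", "genes.bed", "ChIP_TSS"))
```

The printed lines can be pasted into a shell once deepTools is installed and the input files exist. The matrix is computed once and reused by both plotting tools.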
The three main functions of deepTools
- BAM & bigWig file processing
- Tools for QC
- Heatmap and summary plot
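1. Software installation
As before, deepTools is installed on the server with miniconda; see the earlier article "RNA-seq流程学习笔记(3)" (RNA-seq workflow study notes, part 3).
2. Overview of the deepTools tools
deepTools contains many tools, and each one is invoked separately by its own name.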
[ Tools for BAM and bigWig file processing ]
multiBamSummary         compute read coverages over bam files. Output used for plotCorrelation or plotPCA
multiBigwigSummary      extract scores from bigwig files. Output used for plotCorrelation or plotPCA
correctGCBias           corrects GC bias from bam file. Don't use it with ChIP data
bamCoverage             computes read coverage per bins or regions
bamCompare              computes log2 ratio and other operations of read coverage of two samples per bins or regions
bigwigCompare           computes log2 ratio and other operations from bigwig scores of two samples per bins or regions
computeMatrix           prepares the data from bigwig scores for plotting with plotHeatmap or plotProfile
alignmentSieve          filters BAM alignments according to specified parameters, optionally producing a BEDPE file
[ Tools for QC ]
plotCorrelation         plots heatmaps or scatterplots of data correlation
plotPCA                 plots PCA
plotFingerprint         plots the distribution of enriched regions
bamPEFragmentSize       returns the read length and paired-end distance from a bam file
computeGCBias           computes and plots the GC bias of a sample
plotCoverage            plots a histogram of read coverage
estimateReadFiltering   estimates the number of reads that will be filtered from a BAM file or files given certain criteria
[ Heatmaps and summary plots ]
plotHeatmap             plots one or multiple heatmaps of user selected regions over different genomic scores
plotProfile             plots the average profile of user selected regions over different genomic scores
plotEnrichment          plots the read/fragment coverage of one or more sets of regions
3. Function 1: BAM and bigWig file processing tools
1. The bamCoverage command (converting a BAM file to a bigWig file)
This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output. The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. It is possible to extend the length of the reads to better reflect the actual fragment length. bamCoverage offers normalization by scaling factor, Reads Per Kilobase per Million mapped reads (RPKM), counts per million (CPM), bins per million mapped reads (BPM) and 1x depth (reads per genome coverage, RPGC).
In short, bamCoverage converts alignment results into read coverage across genomic regions. The size of the counting window (bin) is user-defined, and the normalization methods listed above are built in.
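To make the per-bin counting and the normalization options concrete, here is a toy Python sketch. It is not how bamCoverage is implemented: reads are assigned to a bin by start position only (an assumption made for brevity, whereas the real tool counts reads overlapping each bin), and the formulas follow my reading of the deepTools documentation. The function names and the example numbers are hypothetical.

```python
def bin_counts(read_starts, chrom_length, bin_size):
    """Count reads per consecutive bin, assigning each read by its start position."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start in read_starts:
        counts[start // bin_size] += 1
    return counts


def normalize(counts, bin_size, method, total_mapped_reads,
              fragment_length=None, effective_genome_size=None):
    """Scale per-bin counts with one of the methods named in the bamCoverage help."""
    if method == "CPM":
        # reads per bin / mapped reads (in millions)
        factor = 1.0 / (total_mapped_reads / 1e6)
    elif method == "RPKM":
        # reads per bin / (mapped reads in millions * bin length in kb)
        factor = 1.0 / ((total_mapped_reads / 1e6) * (bin_size / 1000))
    elif method == "BPM":
        # reads per bin / sum of all per-bin reads (in millions)
        factor = 1.0 / (sum(counts) / 1e6)
    elif method == "RPGC":
        # reads per bin / 1x scaling factor, where the scaling factor is
        # (mapped reads * fragment length) / effective genome size
        factor = effective_genome_size / (total_mapped_reads * fragment_length)
    else:
        raise ValueError(f"unknown method: {method}")
    return [c * factor for c in counts]


# Toy data: five reads on a 100 bp "chromosome", 10 bp bins.
counts = bin_counts([5, 12, 18, 47, 95], chrom_length=100, bin_size=10)
print(counts)
print(normalize(counts, 10, "CPM", total_mapped_reads=5))
print(normalize(counts, 10, "RPGC", total_mapped_reads=5,
                fragment_length=20, effective_genome_size=100))
```

In this toy case CPM and BPM coincide because every read falls in exactly one bin; with real data, reads spanning several bins make the two differ.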
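bamCoverage can convert a BAM file to a bigWig file, and the binSize parameter controls the resolution of the track. When comparing data from different batches, the --normalizeTo1x option can normalize coverage to a fixed value (usually the effective genome size of the species) so that samples are comparable; in deepTools 3.x this is done with --normalizeUsing RPGC together with --effectiveGenomeSize. The tool can also be used simply as a BAM-to-bigWig converter.
The input to bamCoverage is the sorted BAM file produced from the bowtie2 alignment (ChIP.bam.sort here), together with its index file of the same name. Create the index by running samtools index after the bowtie2 alignment step, and keep the command's default output name rather than choosing your own, so that the index can be found next to the BAM file.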
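The input requirements above can be checked before launching the conversion. The sketch below assumes that samtools index writes its default output as the BAM path plus `.bai`; the file names and helper names are hypothetical, and nothing is executed, only the command lines are printed.

```python
import os
import shlex


def index_path(sorted_bam):
    """Default index location assumed for `samtools index`: <bam>.bai."""
    return sorted_bam + ".bai"


def missing_inputs(sorted_bam):
    """List the required input files that are not present on disk."""
    required = (sorted_bam, index_path(sorted_bam))
    return [path for path in required if not os.path.exists(path)]


def build_commands(sorted_bam, out_bigwig, bin_size=50):
    """Return the indexing and coverage commands as argument lists."""
    index_cmd = ["samtools", "index", sorted_bam]
    coverage_cmd = [
        "bamCoverage",
        "-b", sorted_bam,             # sorted, indexed BAM
        "-o", out_bigwig,             # output track
        "-of", "bigwig",              # output format (bigwig or bedgraph)
        "--binSize", str(bin_size),   # counting window in bases
    ]
    return index_cmd, coverage_cmd


index_cmd, coverage_cmd = build_commands("ChIP.sorted.bam", "ChIP.bw")
print(shlex.join(index_cmd))
print(shlex.join(coverage_cmd))
print("missing:", missing_inputs("ChIP.sorted.bam"))
```

A smaller `bin_size` gives a finer track at the cost of a larger output file.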
- Software description
# bamCoverage converts read alignment results into read coverage over genomic regions.
usage: An example usage is: $ bamCoverage -b reads.bam -o coverage.bw
# Input option -b: the input BAM file
Required arguments:
--bam BAM file, -b BAM file
# Output options: output file name (-o) and output format (-of, default bigwig)
Output:
--outFileName FILENAME, -o FILENAME
Output file name. (default: None)
--outFileFormat {bigwig,bedgraph}, -of {bigwig,bedgraph}
Output file type. Either "bigwig" or "bedgraph".
(default: bigwig)
# Optional arguments
Optional argume