deeptools | bam to BigWig, 并使用IGV可视化峰图差异

本文介绍了如何使用deeptools的bamCoverage将BAM文件转换为bigWig格式,以提高在IGV中加载和查看基因组覆盖度的效率。通过调整参数如--binSize、--scaleFactor和--ignoreDuplicates,可以优化转换过程并影响最终的峰值形状。此外,还展示了如何按DNA链生成单独的bigWig文件,并比较了不同设置下产生的峰形差异。
摘要由CSDN通过智能技术生成

直接把 bam 导入IGV看峰图太慢,把 bam 转为 bigWig 格式后,文件会小很多,载入也很快了。

1. 安装软件

启发: GEO 看到一句:
Bigwig files were obtained with the bamCoverage function from deepTools. Values represent number of reads per bin. We used a bin size of 50.

https://deeptools.readthedocs.io/en/develop/content/list_of_tools.html

安装软件

$ pip3 install deeptools 

$ deeptools --version
deeptools 3.5.1

$ bamCoverage --help
usage: An example usage is:$ bamCoverage -b reads.bam -o coverage.bw

This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output. 

The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. It is possible to extended the length of the reads to better reflect the actual fragment length. 

*bamCoverage* offers normalization by scaling factor, Reads Per Kilobase per Million mapped reads (RPKM), counts per million (CPM), bins per million mapped reads (BPM) and 1x depth (reads per genome coverage, RPGC).

2. 测试

对于基因 EEF1D, 10x bam 最大366;

(1) 基础测试

# v1 # 去掉的过多了! 最大是 75
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.bigWig \
--outFileFormat bigwig \
--ignoreDuplicates \
-p 12 \
--scaleFactor 0.5
其中 -p 12 是多进程,测试机器只有12个核;


# v2 no --ignoreDuplicates \  #最大是 3851
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v2.bigWig \
--outFileFormat bigwig \
-p 12 \
--scaleFactor 0.5
--ignoreDuplicates    If set, reads that have the same orientation and start position will be considered only once. 
	If reads are paired, the mate's position also has to coincide to ignore a read. (default: False)
	# 方向一致,起点一致就算一次! 如果是配对的,另一个位置也要一致才可以忽略一个 read。 默认 F

# v3 no --ignoreDuplicates , no --scaleFactor 0.5 #最大是 v2 的2倍,7701
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v3.bigWig \
--outFileFormat bigwig \
-p 12

(2) --binSize 1 更精细的变化, 默认 50

$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v4.bigWig \
--outFileFormat bigwig \
--binSize 10 \
-p 12

--binSize INT bp, -bs INT bp
     Size of the bins, in bases, for the output of the bigwig/bedgraph file. (Default: 50)
	 分区间的大小,默认 50

当改为1bp分辨率时,生成时间突然变长了:
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v4_1.bigWig \
--outFileFormat bigwig \
--binSize 1 \
-p 12


$ ls -lth bam/bw
total 29M
-rw-rw-r-- 1 wangjl wangjl  13M May 23 13:42 BMMC_donor1_cluster6.v4_1.bigWig
-rw-rw-r-- 1 wangjl wangjl 4.6M May 23 13:34 BMMC_donor1_cluster6.v4.bigWig
-rw-rw-r-- 1 wangjl wangjl 1.9M May 23 13:19 BMMC_donor1_cluster6.v3.bigWig
-rw-rw-r-- 1 wangjl wangjl 1.9M May 23 11:30 BMMC_donor1_cluster6.v2.bigWig
-rw-rw-r-- 1 wangjl wangjl 1.9M May 22 22:44 BMMC_donor1_cluster6.bigWig
-rw-rw-r-- 1 wangjl wangjl 3.2M May 22 22:44 BMMC_donor1_cluster1.bigWig
-rw-rw-r-- 1 wangjl wangjl 2.5M May 22 22:42 BMMC_donor1_cluster3.bigWig

(3) --smoothLength 平滑化,巨慢

会特别慢 13:54–>14:11 还没结束,算了,强制退出。其他都是1-2min结束。

$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v5.bigWig \
--outFileFormat bigwig \
--binSize 10 \
--smoothLength 20
-p 12

--smoothLength INT bp
	The smooth length defines a window, larger than the binSize, to average the number of reads. 
	For example, if the --binSize is set to 20 and the --smoothLength is set to 60, then, for each bin, the average of the bin and its left and right neighbors is considered.
	Any value smaller than --binSize will be ignored and no smoothing will be applied. (default: None)
	平滑长度,低于 --binSize 则忽略,不做平滑化。默认不做平滑化。

(4) 按照正负链生成2个bw文件 --filterRNAstrand forward or --filterRNAstrand reverse

起初,这个无脑做出来的bw文件在 igv上看是反的,仔细看参数,是过滤掉 forward,那么剩下的是 reverse。同理 filterRNAstrand reverse 得到的是 forward 链的reads。

下面是正确的命名:

$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v6r.bigWig \
--outFileFormat bigwig \
--binSize 1 \
--filterRNAstrand forward \
-p 12

$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v6f.bigWig \
--outFileFormat bigwig \
--binSize 1 \
--filterRNAstrand reverse \
-p 12

总结:按方向生成 1bp 分辨率的bw,再使用 igv 查看。

效果图:

EEF1D gene

load BigWig to IGV
看图中,cluster6的bam(最大366)和后面的v6 (最大5791) 最高值差异很大,可能 bam 还是有抽样?或者 bw 值偏大?

换几个基因看看:

CTCF gene

在这里插入图片描述

仅留下 bam 和 bw bin=1的

在这里插入图片描述
BigWig 和 原bam 的峰形状为什么不一样呢? //todo 不知道。
这不是 igv.js 特有的,igv 桌面版也是这样:
在这里插入图片描述

附录

网页版 IGV.js 的代码:

<meta charset=utf8>
<span>
help: https://github.com/igvteam/igv.js/wiki 
| 
test3: show tracks using BigWig file as input
</span>

<script src="js/igv.min.js"></script>
<div id="igv-div"></div>

<script>
	var igvDiv = document.getElementById("igv-div");

    var options =
	{
		genome: "hg19",
		locus: "chr8:144,661,785-144,662,802",
		
		tracks: [
			{
				"name": "Cluster1_NK",
				"url": "bam/BMMC_donor1_cluster1.bam",
				"indexURL": "bam/BMMC_donor1_cluster1.bam.bai",
				"format": "bam",
				"type": "alignment",
				
				"color": "red", //set colors

				showAlignments: false,

				height:50, //total height: default 300
				coverageTrackHeight:50, //coverage height: default 50
				alignmentRowHeight:0, //height of reads, default 14 per reads;
			},
			
			{
				"name": "Cluster3_B",
				"url": "bam/BMMC_donor1_cluster3.bam",
				"indexURL": "bam/BMMC_donor1_cluster3.bam.bai",
				"format": "bam",
				"type": "alignment", //"annotation",
				"color": "blue", //set colors
				showAlignments: false,
				height:50,
				coverageTrackHeight:50,
				//visibilityWindow: -1,
				alignmentRowHeight:0, 
			},
			
			{
				"name": "Cluster6_mono",
				"url": "bam/BMMC_donor1_cluster6.bam",
				"indexURL": "bam/BMMC_donor1_cluster6.bam.bai",
				"format": "bam",
				"type": "alignment", //"annotation",
				"color": "purple", //set colors
				showAlignments: false,
				height:50,
				coverageTrackHeight:50,
				alignmentRowHeight:0, 
			},

// bw
			{
				type: "wig",
				name: "v1-bin50-noDuplicate-scale0.5",
				url: "bam/bw/BMMC_donor1_cluster6.bigWig",
				height:50,
				color: "#FFAEC9",
				min: "0",
				//max: "30",
				//color: "rgb(0, 0, 150)",
				guideLines: [
					{color: 'green', dotted: true, y: 2500}, 
					{color: 'red', dotted: false, y: 5}
				]
			},

			{
				type: "wig",
				name: "v2-bin50-scale0.5",
				url: "bam/bw/BMMC_donor1_cluster6.v2.bigWig",
				height:50,
				color: "#FFC90E",
				min: "0",
				//max: "30",
				//color: "rgb(0, 0, 150)",
				guideLines: [
					{color: 'green', dotted: true, y: 2500}, 
					{color: 'red', dotted: false, y: 5}
				]
			},

			{
				type: "wig",
				name: "v3-bin50-scale1",
				url: "bam/bw/BMMC_donor1_cluster6.v3.bigWig",
				height:50,
				color: "#B5E61D",
				min: "0",
				//max: "30",
				//color: "rgb(0, 0, 150)",
				guideLines: [
					{color: 'green', dotted: true, y: 2500}, 
					{color: 'red', dotted: false, y: 5}
				]
			},

			{
				type: "wig",
				name: "v4-bin10",
				url: "bam/bw/BMMC_donor1_cluster6.v4.bigWig",
				height:50,
				color: "#99D9EA",
				min: "0",
				//max: "30",
				//color: "rgb(0, 0, 150)",
				guideLines: [
					{color: 'green', dotted: true, y: 2500}, 
					{color: 'red', dotted: false, y: 5}
				]
			},

			{
				type: "wig",
				name: "v4_1-bin1",
				url: "bam/bw/BMMC_donor1_cluster6.v4_1.bigWig",
				height:50,
				color: "#22B14C",
				min: "0",
				//max: "30",
				//color: "rgb(0, 0, 150)",
				guideLines: [
					{color: 'green', dotted: true, y: 2500}, 
					{color: 'red', dotted: false, y: 5}
				]
			},

			{
				type: "wig",
				name: "v6-forward", //binSize 1, default:50
				url: "bam/bw/BMMC_donor1_cluster6.v6f.bigWig",
				height:50,
				color: "orange",
				min: "0",
				//max: "30",
				//color: "rgb(0, 0, 150)",
				guideLines: [
					{color: 'green', dotted: true, y: 2500}, 
					{color: 'red', dotted: false, y: 5}
				]
			},

			{
				type: "wig",
				name: "v6-reverse", //binSize 1, default:50
				url: "bam/bw/BMMC_donor1_cluster6.v6r.bigWig",
				height:50,
				color: "#FF7F27",
				min: "0",
				//max: "30",
				//color: "rgb(0, 0, 150)",
				guideLines: [
					{color: 'green', dotted: true, y: 2500}, 
					{color: 'red', dotted: false, y: 5}
				]
			},
		]
	};

	igv.createBrowser(igvDiv, options)
		.then(function (browser) {
			console.log("Created IGV browser");	
		})
</script>

refer

目前,该issue还没有收到回复:https://github.com/deeptools/deepTools/issues/1139

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值