Sambamba: process your BAM data faster!

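Sambamba is a tool for fast processing of BAM data and is especially suited to large datasets. It provides faster sorting and filtering than samtools and supports JSON output. Sambamba includes operations such as view, sort, merge, slice, flagstat, and markdup, and is used for BAM file processing in bioinformatics.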

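  For very large BAM files (>100 GB), sorting is slow: it often takes a day or more, and the result can still fail, as in the error below. In my tests Sambamba performed better and saved a lot of time. As I work with more and more data, even simple tasks take a long time, and it is not just a matter of having more data.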

[bam_sort_core] merging from 3288 files...
[E::hts_open_format] fail to open file 'CS_RNA_seq.sorted.bam.tmp.1020.bam'
[bam_merge_core] fail to open file CS_RNA_seq.sorted.bam.tmp.1020.bam
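
The log above shows a merge over thousands of temporary files failing. As a rough sketch only, the snippet below assembles a `sambamba sort` command line from Python. The input and output file names, the thread count, the memory limit, and the temporary directory are all hypothetical values chosen for illustration. The flags used (`-t`, `-m`, `--tmpdir`, `-o`) are the ones I believe `sambamba sort` accepts; check `sambamba sort --help` for your version before relying on them.

```python
import shlex


def build_sort_command(input_bam, output_bam, threads=8,
                       memory_limit="16GB", tmpdir="/tmp"):
    """Assemble an argument list for `sambamba sort`.

    All default values here are illustrative assumptions, not recommendations.
    """
    return [
        "sambamba", "sort",
        "-t", str(threads),        # number of threads
        "-m", memory_limit,        # approximate total memory limit
        f"--tmpdir={tmpdir}",      # where temporary chunks are written
        "-o", output_bam,          # output file
        input_bam,
    ]


if __name__ == "__main__":
    # Hypothetical file names modelled on the log above.
    cmd = build_sort_command("CS_RNA_seq.bam", "CS_RNA_seq.sorted.bam")
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(cmd))
```

Building the command as a list keeps it safe to pass to `subprocess.run(cmd, check=True)` without shell quoting problems.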

1. Download

https://github.com/lomereiter/sambamba/releases

2. Installation

Copy the downloaded binary into a directory on your PATH.
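
To confirm that the binary is actually visible after copying it, a small check like the following may help. It is a minimal sketch that assumes the executable is named `sambamba`; release binaries often carry a version suffix, in which case you would rename or symlink the file first.

```python
import shutil


def find_executable(name="sambamba"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("sambamba not found on PATH")
    else:
        print(f"sambamba found at {path}")
```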

3. Usage
4. view

sambamba-view - tool for extracting information from SAM/BAM/CRAM files

sambamba view [OPTIONS] <input.bam | input.sam | input.cram> [region1 […]]

sambamba view efficiently filters SAM/BAM/CRAM files for alignments satisfying various conditions, and gives access to the SAM header and information about reference sequences. To make these data readily available to scripts in Perl/Python/Ruby, JSON output is provided.

By default, the tool expects a BAM file as input. To work with CRAM, specify -C; for SAM, specify -S|--sam-input as a command-line option. The tool does NOT try to guess the file format from the extension. Beware that when reading SAM, the tool will skip tags that don't conform to the SAM/BAM specification and set invalid fields to their default values.
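
Because the tool does not infer the format from the extension, a wrapper script has to pass the right flag itself. The sketch below shows one way to do that; the helper name `build_view_command` and the extension-to-flag table are my own illustration, using only the `-S` and `-C` options mentioned above.

```python
from pathlib import Path

# Flags needed per input extension; BAM is the default and needs none.
FORMAT_FLAGS = {
    ".bam": [],
    ".sam": ["-S"],
    ".cram": ["-C"],
}


def build_view_command(input_path, extra_args=()):
    """Return an argument list for `sambamba view` with the format flag set.

    `build_view_command` is a hypothetical helper, not part of sambamba.
    """
    suffix = Path(input_path).suffix.lower()
    if suffix not in FORMAT_FLAGS:
        raise ValueError(f"unsupported extension: {suffix!r}")
    return ["sambamba", "view", *FORMAT_FLAGS[suffix], *extra_args, input_path]
```

For example, `build_view_command("reads.sam")` yields a command containing `-S`, while a `.bam` input gets no format flag.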

FILTERING

Filtering is available in two ways. First, you can specify a condition with the -F option, using a special filtering language described at

https://github.com/lomereiter/sambamba/wiki/%5Bsambamba-view%5D-Filter-expression-syntax

Second, if you have an indexed BAM file, several regions can be specified as well. The syntax for regions is the same as in samtools: chr:beg-end, where beg and end are the 1-based start and end of a closed interval on the reference chr.
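
The two filtering routes can be combined in a single call. The following is a sketch under stated assumptions: the helper names are hypothetical, and the filter expression used in the usage note (`mapping_quality >= 30 and not duplicate`) is my recollection of the filter language, so check it against the wiki page linked above.

```python
def format_region(chrom, beg, end):
    """Format a 1-based closed interval as chr:beg-end."""
    if beg < 1 or end < beg:
        raise ValueError("expected 1 <= beg <= end")
    return f"{chrom}:{beg}-{end}"


def build_filtered_view(bam_path, filter_expr=None, regions=()):
    """Return an argument list for `sambamba view` with -F and regions.

    `regions` is an iterable of (chrom, beg, end) tuples; region queries
    require the BAM file to be indexed.
    """
    cmd = ["sambamba", "view"]
    if filter_expr:
        cmd += ["-F", filter_expr]
    cmd.append(bam_path)
    cmd += [format_region(*region) for region in regions]
    return cmd
```

A call such as `build_filtered_view("sample.bam", "mapping_quality >= 30 and not duplicate", [("chr1", 1000, 2000)])` produces a list that can be handed to `subprocess.run`; `sample.bam` is a placeholder name.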

JSON

The JSON representation of an alignment record is a hash with keys 'qname', 'flag', 'rname', 'pos', 'mapq', 'cigar', 'rnext', 'pnext', 'tlen', 'seq', 'qual', 'tags', e.g.

{"qname":"EAS56_57:6:190:289:82","flag":69,"rname":"chr1","pos":100,
"mapq":0,"cigar":"*","rnext":"=","pnext":100,"tlen":0,
"seq":"CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA",
"qual":[27,27,27,22,27,27,27,26,27,27,27,27,27,27,27,27,23,26,26,27,
22,26,19,27,26,27,26,26,26,26,26,24,19,27,26],"tags":{"MF":192}}

The JSON representation mimics the SAM format, except that quality is given as an array of integers.
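
As an illustration of consuming such a record from a script, the sketch below parses the example record with Python's standard `json` module and converts the integer quality array back into the Phred+33 string that SAM uses. The helper names are hypothetical. It assumes the JSON output contains one record per line, which I have not verified for every sambamba version.

```python
import json

# The example record from the text, written with plain ASCII quotes.
RECORD_JSON = (
    '{"qname":"EAS56_57:6:190:289:82","flag":69,"rname":"chr1","pos":100,'
    '"mapq":0,"cigar":"*","rnext":"=","pnext":100,"tlen":0,'
    '"seq":"CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA",'
    '"qual":[27,27,27,22,27,27,27,26,27,27,27,27,27,27,27,27,23,26,26,27,'
    '22,26,19,27,26,27,26,26,26,26,26,24,19,27,26],"tags":{"MF":192}}'
)


def parse_record(line):
    """Parse one JSON alignment record into a dict."""
    return json.loads(line)


def qual_to_phred33(quals):
    """Convert integer base qualities to a SAM-style Phred+33 string."""
    return "".join(chr(q + 33) for q in quals)


def is_unmapped(record):
    """True if the 0x4 (segment unmapped) bit is set in the SAM flag."""
    return bool(record["flag"] & 0x4)


if __name__ == "__main__":
    rec = parse_record(RECORD_JSON)
    print(rec["qname"], rec["rname"], rec["pos"])
    print("unmapped:", is_unmapped(rec))
    print(qual_to_phred33(rec["qual"]))
```

In a real pipeline the same `parse_record` function could be applied to each line read from the tool's standard output.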

Postprocessing JSON output is best accomplish
