Sambamba: process your BAM data faster!
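For very large BAM files (>100 GB), sorting is slow: it often takes a day or more, and the run can still fail with an error like the one below. In my tests Sambamba performed better and saved a lot of time. As I work with more and more data, even seemingly simple tasks take a long time, and the growing data volume is not the only reason!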
[bam_sort_core] merging from 3288 files...
[E::hts_open_format] fail to open file 'CS_RNA_seq.sorted.bam.tmp.1020.bam'
[bam_merge_core] fail to open file CS_RNA_seq.sorted.bam.tmp.1020.bam
1. Download
https://github.com/lomereiter/sambamba/releases
2. Installation
Just copy the executable into a directory on your PATH.
3. Usage
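As a minimal sketch of the sorting use case described at the top of this post, the helper below only builds a `sambamba sort` command line and prints it. The input name `CS_RNA_seq.bam`, the thread count, the memory limit, and the temporary directory are all assumptions chosen for illustration. The options used (`-t`, `-m`, `--tmpdir`, `-o`) should be checked against `sambamba sort --help` for your installed version.

```python
import shlex


def sambamba_sort_cmd(in_bam, out_bam, threads=8, memory="16G", tmpdir="/tmp"):
    """Build an argument list for `sambamba sort` (nothing is executed here)."""
    return [
        "sambamba", "sort",
        "-t", str(threads),       # number of worker threads (assumed value)
        "-m", memory,             # approximate memory limit (assumed value)
        f"--tmpdir={tmpdir}",     # where temporary chunks are written
        "-o", out_bam,            # sorted output file
        in_bam,
    ]


if __name__ == "__main__":
    # Hypothetical file names, loosely based on the error log above.
    cmd = sambamba_sort_cmd("CS_RNA_seq.bam", "CS_RNA_seq.sorted.bam",
                            tmpdir="/scratch/tmp")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Pointing the temporary directory at a disk with plenty of free space is usually worthwhile for inputs of this size, since the sort writes intermediate chunks there.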
4. view
sambamba-view - tool for extracting information from SAM/BAM/CRAM files
sambamba view [OPTIONS] <input.bam | input.sam | input.cram> [region1 […]]
sambamba view allows you to efficiently filter SAM/BAM/CRAM files for alignments satisfying various conditions, as well as access the SAM header and information about reference sequences. To make these data readily available for consumption by scripts in Perl/Python/Ruby, JSON output is provided.
By default, the tool expects a BAM file as input. To work with CRAM, specify -C; for SAM, specify -S|--sam-input as a command-line option, because the tool does NOT try to guess the file format from the extension. Beware that when reading SAM, the tool will skip tags that don't conform to the SAM/BAM specification and set invalid fields to their default values.
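Because the tool does not infer the format from the file name, a wrapper script has to pass the flag itself. The sketch below is one way to do that in Python. The helper names `input_format_flags` and `view_cmd` are hypothetical, and mapping `.sam` and `.cram` extensions to `-S` and `-C` is an assumption about how your files are named, not something sambamba does.

```python
from pathlib import Path


def input_format_flags(path):
    """Return the sambamba view input-format flag implied by the extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".sam":
        return ["-S"]   # SAM input must be requested explicitly
    if suffix == ".cram":
        return ["-C"]   # CRAM input must be requested explicitly
    return []           # BAM is the default, no flag needed


def view_cmd(path):
    """Build an argument list for `sambamba view` on the given file."""
    return ["sambamba", "view", *input_format_flags(path), path]


if __name__ == "__main__":
    for name in ("reads.bam", "reads.sam", "reads.cram"):
        print(" ".join(view_cmd(name)))
```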
FILTERING
Filtering is available in two ways. First, you can specify a condition with the -F option, using a special filtering language described at
https://github.com/lomereiter/sambamba/wiki/%5Bsambamba-view%5D-Filter-expression-syntax
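To illustrate the -F option, here is a small sketch that assembles a filtered `sambamba view` call. The expression `mapping_quality >= 30 and not duplicate` is an example written in the style of the wiki page linked above and should be verified there. The function name, the file name `sample.bam`, and the use of `-c` to print only the number of matching records are assumptions made for this example.

```python
import shlex


def filtered_view_cmd(bam, filter_expr, count_only=False):
    """Build an argument list for `sambamba view` with a -F filter expression."""
    cmd = ["sambamba", "view", "-F", filter_expr]
    if count_only:
        cmd.append("-c")  # print only the count of matching records
    cmd.append(bam)
    return cmd


if __name__ == "__main__":
    # Hypothetical input file; the filter keeps confidently mapped non-duplicates.
    cmd = filtered_view_cmd("sample.bam",
                            "mapping_quality >= 30 and not duplicate",
                            count_only=True)
    # shlex.join quotes the expression so it survives a shell copy-paste.
    print(shlex.join(cmd))
```

Passing the expression as a single list element avoids shell quoting problems when the command is later run with `subprocess.run(cmd, check=True)`.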
Second, if you have an indexed BAM file, several regions can be specified as well. The syntax for regions is the same as in samtools: chr:beg-end where beg and end are 1-based start and end of a closed-end interval on the reference chr.
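The 1-based closed interval convention is easy to get wrong when mixing tools, so a short sketch may help. The hypothetical `parse_region` helper below parses a `chr:beg-end` string and converts it to the 0-based half-open coordinates that Python slicing and BED files use. It deliberately handles only the full `chr:beg-end` form and is not a complete samtools region parser.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


_REGION_RE = re.compile(r"^(?P<chrom>.+):(?P<beg>\d+)-(?P<end>\d+)$")


def parse_region(text):
    """Convert a 1-based closed 'chr:beg-end' region to 0-based half-open."""
    match = _REGION_RE.match(text)
    if match is None:
        raise ValueError(f"not a chr:beg-end region: {text!r}")
    beg, end = int(match["beg"]), int(match["end"])
    if beg < 1 or end < beg:
        raise ValueError(f"invalid coordinates in region: {text!r}")
    # Only the start shifts by one; the closed end equals the exclusive end.
    return Region(match["chrom"], beg - 1, end)


if __name__ == "__main__":
    region = parse_region("chr1:100-200")
    print(region, "length:", region.end - region.start)
```

A region such as chr1:100-200 therefore covers 101 bases, which is what `end - start` yields after the conversion.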
JSON
The JSON representation of an alignment record is a hash with keys 'qname', 'flag', 'rname', 'pos', 'mapq', 'cigar', 'rnext', 'pnext', 'tlen', 'seq', 'qual', 'tags', e.g.
{"qname":"EAS56_57:6:190:289:82","flag":69,"rname":"chr1","pos":100,
"mapq":0,"cigar":"*","rnext":"=","pnext":100,"tlen":0,
"seq":"CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA",
"qual":[27,27,27,22,27,27,27,26,27,27,27,27,27,27,27,27,23,26,26,27,
22,26,19,27,26,27,26,26,26,26,26,24,19,27,26],"tags":{"MF":192}}
The JSON representation mimics the SAM format, except that quality is given as an array of integers.
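As a sketch of how such a record might be consumed from Python, the snippet below parses the example record above with the standard `json` module, checks the unmapped bit of the flag, and converts the integer quality array back to the Phred+33 string that a SAM line would carry. The helper name `phred33` is hypothetical. Reading one record per line from `sambamba view` output is left out and would be an assumption about the output layout.

```python
import json

# The example record from above, with plain ASCII quotes.
RECORD_JSON = """
{"qname":"EAS56_57:6:190:289:82","flag":69,"rname":"chr1","pos":100,
 "mapq":0,"cigar":"*","rnext":"=","pnext":100,"tlen":0,
 "seq":"CTCAAGGTTGTTGCAAGGGGGTCTATGTGAACAAA",
 "qual":[27,27,27,22,27,27,27,26,27,27,27,27,27,27,27,27,23,26,26,27,
         22,26,19,27,26,27,26,26,26,26,26,24,19,27,26],"tags":{"MF":192}}
"""


def phred33(quals):
    """Encode integer base qualities as a Phred+33 ASCII string (SAM QUAL)."""
    return "".join(chr(q + 33) for q in quals)


record = json.loads(RECORD_JSON)
is_unmapped = bool(record["flag"] & 0x4)   # 0x4 is the SAM "segment unmapped" bit
qual_string = phred33(record["qual"])
mean_quality = sum(record["qual"]) / len(record["qual"])

if __name__ == "__main__":
    print("read:", record["qname"])
    print("unmapped:", is_unmapped)
    print("QUAL:", qual_string)
    print("mean base quality: %.1f" % mean_quality)
```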
Postprocessing JSON output is best accomplish