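Counting the reads produced by sequencing is how gene expression is quantified. Each read is assigned to a gene according to the overlap between the read's aligned position and the gene's position, and the reads assigned to each gene are then tallied to produce a counts matrix.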
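To make the overlap idea concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions, not how htseq-count itself is implemented: the gene coordinates, the reads, and the `no_feature` / `ambiguous` labels are all made up for this example, and strand, exon structure, and mapping quality are ignored.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 450, 900),
    "geneC": ("chr2", 100, 300),
}


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_reads(reads, genes):
    """Tally reads per gene by positional overlap.

    A read that overlaps exactly one gene is counted for that gene.
    Reads overlapping no gene or more than one gene are tallied under
    the illustrative labels 'no_feature' and 'ambiguous'.
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Hypothetical aligned reads: (chromosome, start, end).
READS = [
    ("chr1", 120, 170),  # falls inside geneA only
    ("chr1", 460, 510),  # overlaps both geneA and geneB
    ("chr1", 600, 650),  # falls inside geneB only
    ("chr2", 400, 450),  # overlaps no gene
    ("chr2", 250, 300),  # falls inside geneC
]

COUNTS = count_reads(READS, GENES)
for name, n in COUNTS.items():
    print(name, n)
```

Running the same tally once per sample and placing the per-sample columns side by side gives the counts matrix mentioned above.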
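Today's topics
1. Counting reads with HTSeq-count
2. Merging the counts matrices in R
1. Counting reads with HTSeq-count
First, a look at how htseq-count is used. Its options are described below: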
usage: htseq-count [options] alignment_file gff_file
positional arguments:
samfilenames Path to the SAM/BAM files containing the mapped reads.
If '-' is selected, read from standard input
featuresfilename Path to the file containing the features
optional arguments:
-h, --help show this help message and exit
-f {sam,bam}, --format {sam,bam}
type of <alignment_file> data, either 'sam' or 'bam'
(default: sam)
Format of the input file, sam or bam; defaults to sam.
-r {pos,name}, --order {pos,name}
'pos' or 'name'. Sorting order of <alignment_file>
(default: name). Paired-end sequencing data must be
sorted either by position or by read name, and the
sorting order must be specified. Ignored for single-
end data.
Sort order of the input file; defaults to sorting by read name.
--max-reads-in-buffer MAX_BUFFER_SIZE
When <alignment_file> is paired end sorted by
position, allow only so many reads to stay in memory
until the mates are found (raising this number will
use more memory). Has no effect for single end or
paired end sorted by name
-s {yes,no,reverse}, --stranded {yes,no,reverse}
whether the data is from a strand-specific assay.
Specify 'yes', 'no', or 'reverse' (default: yes).
'reverse' means 'yes' with reversed strand
interpretation
-a MINAQUAL, --minaqual MINAQUAL
skip all reads with alignment quality lower than the
given minimum value (default 10)
Discards reads whose mapping quality is below the threshold.
-t FEATURETYPE, --type FEATURETYPE