cellranger atac notes 3: interpreting the count output files (2-2)

3. Per-barcode QC metrics: singlecell.csv

 

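This file holds fragment-level QC information for every barcode: the signal within each barcode, the number of fragments overlapping TSSs, and a range of other metrics.

The columns of singlecell.csv are barcode, total, duplicate, chimeric, unmapped, lowmapq, mitochondrial, nonprimary, passed_filters, is__cell_barcode, excluded_reason, TSS_fragments, DNase_sensitive_region_fragments, enhancer_region_fragments, promoter_region_fragments, on_target_fragments, blacklist_region_fragments, peak_region_fragments and peak_region_cutsites, i.e. the barcode key plus 18 metrics. Each row is one barcode sequence. Barcodes that did not pass QC are included, so the file has more than 400,000 rows in total. Note that the number of metrics differs somewhat between pipelines.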

# File contents
head singlecell.csv 
barcode,total,duplicate,chimeric,unmapped,lowmapq,mitochondrial,nonprimary,passed_filters,is__cell_barcode,excluded_reason,TSS_fragments,DNase_sensitive_region_fragments,enhancer_region_fragments,promoter_region_fragments,on_target_fragments,blacklist_region_fragments,peak_region_fragments,peak_region_cutsites
NO_BARCODE,8692958,1673643,1625,1220361,1114210,0,14527,4668592,0,0,0,0,0,0,0,0,0,0
AAACGAAAGAAAGCAG-1,4,1,0,0,0,0,0,3,0,3,0,0,0,0,0,0,2,4
AAACGAAAGAAAGGGT-1,1,0,0,0,0,0,0,1,0,3,0,0,0,0,0,0,1,2
AAACGAAAGAAATACC-1,3,0,0,0,0,0,0,3,0,3,0,0,0,0,0,0,2,3
AAACGAAAGAAATGGG-1,1200,644,0,109,238,0,6,203,0,0,15,0,0,0,15,0,39,76
AAACGAAAGAAATTCG-1,316,184,0,21,50,0,0,61,0,0,4,0,0,0,4,0,13,25
AAACGAAAGAACAGGA-1,1,0,0,0,0,0,0,1,0,3,0,0,0,0,0,0,0,0
AAACGAAAGAACCCGA-1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
AAACGAAAGAACGACC-1,3,0,0,1,0,0,0,2,0,0,1,0,0,0,1,0,2,4
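As a quick sanity check on how the read-pair columns relate to each other, the following is a minimal sketch that parses a few of the rows shown above and tests whether `total` equals the sum of the seven read classes (duplicate through passed_filters). The rows are copied from the `head` output; the helper name `load_rows` is made up for illustration, and the fact that the sum matches is an observation on these sample rows, not a documented guarantee.

```python
import csv
import io

# Rows copied from the `head singlecell.csv` output above.
SAMPLE = """\
barcode,total,duplicate,chimeric,unmapped,lowmapq,mitochondrial,nonprimary,passed_filters,is__cell_barcode,excluded_reason,TSS_fragments,DNase_sensitive_region_fragments,enhancer_region_fragments,promoter_region_fragments,on_target_fragments,blacklist_region_fragments,peak_region_fragments,peak_region_cutsites
NO_BARCODE,8692958,1673643,1625,1220361,1114210,0,14527,4668592,0,0,0,0,0,0,0,0,0,0
AAACGAAAGAAAGCAG-1,4,1,0,0,0,0,0,3,0,3,0,0,0,0,0,0,2,4
AAACGAAAGAAATGGG-1,1200,644,0,109,238,0,6,203,0,0,15,0,0,0,15,0,39,76
AAACGAAAGAAATTCG-1,316,184,0,21,50,0,0,61,0,0,4,0,0,0,4,0,13,25
"""

# The read-pair classes that, in the sample rows, add up to `total`.
READ_CLASSES = [
    "duplicate", "chimeric", "unmapped", "lowmapq",
    "mitochondrial", "nonprimary", "passed_filters",
]


def load_rows(handle):
    """Read singlecell.csv-style rows, converting every metric to int."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"barcode": record["barcode"]}
        row.update({k: int(v) for k, v in record.items() if k != "barcode"})
        rows.append(row)
    return rows


# For a real run, replace the StringIO with: open("singlecell.csv", newline="")
rows = load_rows(io.StringIO(SAMPLE))

for row in rows:
    parts = sum(row[name] for name in READ_CLASSES)
    print(row["barcode"], row["total"], parts, row["total"] == parts)
```

The `NO_BARCODE` row aggregates read pairs without a valid barcode, so it is usually dropped before any per-cell analysis.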

 

Meaning of each metric:

| Column | Type | Description |
| --- | --- | --- |
| barcode | key | barcodes present in input data |
| total | sequencing | total read-pairs |
| duplicate | mapping | number of duplicate read-pairs |
| chimeric | mapping | number of chimerically mapped read-pairs |
| unmapped | mapping | number of read-pairs with at least one end not mapped |
| lowmapq | mapping | number of read-pairs with <30 mapq on at least one end |
| mitochondrial | mapping | number of read-pairs mapping to mitochondria and non-nuclear contigs |
| nonprimary | mapping | the number of reads that map to non-primary contigs |
| passed_filters | mapping | number of non-duplicate, usable read-pairs i.e. "fragments" |
| is_cell_barcode (written is__cell_barcode in the file header above) | cell calling | binary indicator of whether barcode is associated with a cell |
| excluded_reason | cell calling | 0: barcode was not excluded; 1: barcode was excluded because it is a gel bead doublet; 2: barcode was excluded because it is low-targeting; 3: barcode was excluded because it is a barcode multiplet |
| TSS_fragments | targeting | number of fragments overlapping with TSS regions |
| DNase_sensitive_region_fragments | targeting | number of fragments overlapping with DNase sensitive regions |
| enhancer_region_fragments | targeting | number of fragments overlapping enhancer regions |
| promoter_region_fragments | targeting | number of fragments overlapping promoter regions |
| on_target_fragments | targeting | number of fragments overlapping any of TSS, enhancer, promoter and DNase hypersensitivity sites (counted with multiplicity) |
| blacklist_region_fragments | targeting | number of fragments overlapping blacklisted regions |
| peak_region_fragments | denovo targeting | number of fragments overlapping peaks |
| peak_region_cutsites | denovo targeting | number of ends of fragments in peak regions |
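The metrics above are raw counts, and per-barcode QC usually works with ratios derived from them. The sketch below computes three such ratios for one barcode. The choice of denominators (total read pairs for the duplicate fraction, passed_filters for the peak and TSS fractions) is a commonly used convention and an assumption here, not the definition used in cellranger's own summary; the function name `qc_ratios` is likewise made up for illustration.

```python
def qc_ratios(row):
    """Derive simple per-barcode ratios from singlecell.csv counts.

    `row` is a dict of integer counts keyed by column name.
    Denominators are an assumed convention, see the text above.
    """
    total = row["total"]
    usable = row["passed_filters"]
    return {
        # Share of all read pairs flagged as duplicates.
        "duplicate_fraction": row["duplicate"] / total if total else 0.0,
        # Share of usable fragments that overlap a called peak.
        "fraction_in_peaks": row["peak_region_fragments"] / usable if usable else 0.0,
        # Share of usable fragments that overlap a TSS region.
        "fraction_tss": row["TSS_fragments"] / usable if usable else 0.0,
    }


# Counts taken from the AAACGAAAGAAATGGG-1 row of the sample output.
example = {
    "total": 1200,
    "duplicate": 644,
    "passed_filters": 203,
    "TSS_fragments": 15,
    "peak_region_fragments": 39,
}

for name, value in qc_ratios(example).items():
    print(f"{name}: {value:.3f}")
```

Barcodes with zero usable fragments get 0.0 rather than raising a division error, which keeps the function safe to map over the full file.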

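4. BAM and .bam.bai files

The BAM file is sorted by alignment position. Its format differs slightly from the standard SAM/BAM format; see the earlier post for details:

Interpreting the Cell Ranger count (gene expression) output files, 韩建刚 (CAAS-UCD) blog, CSDN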

5. Fragment file: fragments.tsv.gz and fragments.tsv.gz.tbi

In fragments.tsv.gz each row is one fragment, and the columns are the following five attributes:

| Name | Description |
| --- | --- |
| chrom | Reference genome chromosome of fragment |
| chromStart | Adjusted start position of fragment on chromosome. |
| chromEnd | Adjusted end position of fragment on chromosome. The end position is exclusive, so represents the position immediately following the fragment interval. |
| barcode | The 10x cell barcode of this fragment. This corresponds to the CB tag attached to the corresponding BAM file records for this fragment. |
| readSupport | The total number of read pairs associated with this fragment. This includes the read pair marked unique and all duplicate read pairs. |

zcat fragments.tsv.gz | tail
19	55423327	55423435	GATTAGCTCAAGAGAT-1	1
19	55423327	55423435	TCAAGACCATGCGCTG-1	11
19	55423327	55423448	ACGTGGCCAGGTTATC-1	1
19	55423327	55423448	TCAGGTACAGGGCTTC-1	5
19	55423327	55423453	CCTGCTAAGCGTCTGC-1	13
19	55423327	55423453	GGTCATATCAGTGGTT-1	5
19	55423327	55423457	ACCGGGTTCGGGACAA-1	4
19	55423327	55423457	AGTTACGTCGCAACTA-1	2
19	55423327	55423458	GACCGACGTTACGGAG-1	6
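Because chromEnd is exclusive and readSupport counts the unique read pair plus its duplicates, two quantities follow directly from each row: the fragment length (chromEnd minus chromStart) and the number of duplicate read pairs (readSupport minus 1). The sketch below works on the rows shown above; the function name `parse_fragment` is made up for illustration, and skipping lines that start with `#` is a defensive assumption in case a file carries header comments.

```python
from collections import defaultdict

# Rows copied from the fragments.tsv.gz output above (tab-separated).
SAMPLE = """\
19\t55423327\t55423435\tGATTAGCTCAAGAGAT-1\t1
19\t55423327\t55423435\tTCAAGACCATGCGCTG-1\t11
19\t55423327\t55423448\tACGTGGCCAGGTTATC-1\t1
19\t55423327\t55423448\tTCAGGTACAGGGCTTC-1\t5
19\t55423327\t55423453\tCCTGCTAAGCGTCTGC-1\t13
19\t55423327\t55423453\tGGTCATATCAGTGGTT-1\t5
19\t55423327\t55423457\tACCGGGTTCGGGACAA-1\t4
19\t55423327\t55423457\tAGTTACGTCGCAACTA-1\t2
19\t55423327\t55423458\tGACCGACGTTACGGAG-1\t6
"""


def parse_fragment(line):
    """Split one fragments.tsv line into typed fields."""
    chrom, start, end, barcode, support = line.rstrip("\n").split("\t")
    return chrom, int(start), int(end), barcode, int(support)


lengths = []
fragments_per_barcode = defaultdict(int)
total_read_pairs = 0

# For a real file, iterate over gzip.open("fragments.tsv.gz", "rt") instead.
for line in SAMPLE.splitlines():
    if not line or line.startswith("#"):
        continue  # assumed: tolerate blank lines and header comments
    chrom, start, end, barcode, support = parse_fragment(line)
    lengths.append(end - start)          # end is exclusive, so no +1
    fragments_per_barcode[barcode] += 1  # one row is one unique fragment
    total_read_pairs += support

duplicate_read_pairs = total_read_pairs - len(lengths)

print("fragment lengths:", lengths)
print("read pairs:", total_read_pairs, "duplicates:", duplicate_read_pairs)
```

Summing rows per barcode over the whole file should give the same kind of count that singlecell.csv reports as passed_filters, although that correspondence is not checked here.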

6. Peaks file: peaks.bed

Each peak is represented by an interval of the genome; its start and end positions each correspond to an enzyme cut event.

| Column Number | Name | Description |
| --- | --- | --- |
| 1 | chrom | Reference genome chromosome of peak |
| 2 | chromStart | Start position of peak on chromosome. |
| 3 | chromEnd | End position of peak on chromosome. The end position is exclusive, so represents the position immediately following the peak interval. |
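The exclusive end coordinate matters whenever peaks are intersected with fragments or with each other. The sketch below shows the half-open overlap test on one peak interval. The peak coordinates are borrowed from the first row of the peak_annotation.tsv sample later in this post, on the assumption that it matches a peaks.bed entry; the two fragment intervals and the helper names are hypothetical and only illustrate the boundary case.

```python
def parse_bed3(line):
    """Read the first three BED columns as (chrom, start, end)."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)


def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


peak = parse_bed3("1\t12116\t12985\n")
width = peak[2] - peak[1]  # end is exclusive, so width is end - start

# Hypothetical fragments: one reaching into the peak, one that only
# starts at the peak's chromEnd and therefore does not overlap it.
inside = ("1", 12900, 13100)
touching = ("1", 12985, 13100)

print("width:", width)
print("inside overlaps:", overlaps(peak, inside))
print("touching overlaps:", overlaps(peak, touching))
```

Using strict inequalities on both sides is what makes an interval starting exactly at chromEnd count as adjacent rather than overlapping.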

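7. Peak annotation

7.1 Annotation strategy: (1) a peak can be assigned to multiple genes; (2) a peak can only be either a promoter peak or a distal peak, not both; (3) only protein-coding genes are used for annotation.

7.2 Annotation procedure:
(1) If the peak falls in the promoter region of a gene (-1000 bp to +100 bp around the TSS), it is annotated as a promoter peak.
(2) If the peak is within 200 kb of the TSS but was not annotated as a promoter peak, it is annotated as a distal peak.
(3) If the peak lies inside a transcript (within the gene body) and is neither a promoter peak nor a distal peak, it is still labelled a distal peak, but its distance is set to 0.
(4) If none of the three steps above assigns the peak to any gene, it is labelled an intergenic peak.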
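The rules in 7.2 can be sketched as a small classifier. This is an illustration under several assumptions, not cellranger's implementation: peaks are half-open intervals, the promoter window and the sign of the distance are mirrored for genes on the minus strand, the signed distance follows the convention described for the distance column of peak_annotation.tsv, and all function and field names are made up. The TSS value 26097 is inferred from the sample rows of peak_annotation.tsv shown later in this post, and the transcript end is a placeholder.

```python
PROMOTER_UPSTREAM = 1000    # bp upstream of the TSS
PROMOTER_DOWNSTREAM = 100   # bp downstream of the TSS
DISTAL_LIMIT = 200_000      # bp


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open interval overlap."""
    return a_start < b_end and b_start < a_end


def signed_distance(peak_start, peak_end, tss, strand):
    """0 if the peak covers the TSS, else offset of the nearest peak edge."""
    if peak_start <= tss < peak_end:
        return 0
    if peak_start > tss:
        offset = peak_start - tss        # peak at higher coordinates
    else:
        offset = (peak_end - 1) - tss    # last base of the peak, negative
    return offset if strand == "+" else -offset  # assumed strand handling


def annotate_against_gene(peak_start, peak_end, gene):
    """Return (peak_type, distance) for one gene, or None if unrelated."""
    tss, strand = gene["tss"], gene["strand"]
    if strand == "+":
        window = (tss - PROMOTER_UPSTREAM, tss + PROMOTER_DOWNSTREAM + 1)
    else:
        window = (tss - PROMOTER_DOWNSTREAM, tss + PROMOTER_UPSTREAM + 1)
    distance = signed_distance(peak_start, peak_end, tss, strand)
    if overlaps(peak_start, peak_end, *window):               # rule 1
        return ("promoter", distance)
    if abs(distance) <= DISTAL_LIMIT:                         # rule 2
        return ("distal", distance)
    if overlaps(peak_start, peak_end, gene["tx_start"], gene["tx_end"]):
        return ("distal", 0)                                  # rule 3
    return None


def annotate_peak(peak_start, peak_end, genes):
    """Collect (gene, peak_type, distance) hits; intergenic if none (rule 4)."""
    hits = []
    for gene in genes:
        result = annotate_against_gene(peak_start, peak_end, gene)
        if result is not None:
            hits.append((gene["name"], result[0], result[1]))
    return hits or [("", "intergenic", None)]


# TSS inferred from the sample rows; tx_end is a hypothetical placeholder.
gene = {"name": "ENSOARG00020000038", "tss": 26097, "strand": "+",
        "tx_start": 26097, "tx_end": 27000}

print(annotate_peak(12116, 12985, [gene]))
print(annotate_peak(29918, 30788, [gene]))
```

With the inferred TSS, the two calls reproduce the distances -13113 and 3821 listed for these peaks in the sample file, which is some evidence that the distance convention is read correctly for plus-strand genes.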

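7.3 Format of peak_annotation.tsv

The file has six columns. The first three give the chromosomal position of the peak, the fourth is the name of the annotated gene, and the fifth is the distance from the peak to the gene's TSS: a positive value means the start of the peak lies downstream of the TSS, a negative value means the end of the peak lies upstream of the TSS, and 0 means the peak overlaps the TSS or lies within the transcript body of the gene. The sixth column is the peak type.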

| Name | Description |
| --- | --- |
| chrom | Contig that contains the peak |
| start | Peak start location |
| end | Peak end location |
| gene | Gene symbol based on the gene annotation in the reference. |
| distance | Distance of peak from TSS of gene. Positive distance means the start of the peak is downstream of the position of the TSS, whereas negative distance means the end of the peak is upstream of the TSS. Zero distance means the peak overlaps with the TSS or the peak overlaps with the transcript body of the gene. |
| peak_type | Can be "promoter", "distal" or "intergenic". |

head peak_annotation.tsv 
chrom	start	end	gene	distance	peak_type
1	12116	12985	ENSOARG00020000038	-13113	distal
1	29918	30788	ENSOARG00020000038	3821	distal
1	34037	35000	ENSOARG00020000038	7940	distal
1	36768	37257	ENSOARG00020000038	10671	distal
1	37368	38200	ENSOARG00020000038	11271	distal
1	46853	47720	ENSOARG00020000038	20756	distal
1	49556	50414	ENSOARG00020000038	23459	distal
1	59206	60122	FAM240C	-27312	distal
1	63961	64830	FAM240C	-22604	distal
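A typical first look at this file is to count peaks per type and per gene. The sketch below does that for the rows shown above using only the standard library. Treating an empty distance field as "no distance" is an assumption made so that rows without a gene would not break the parsing; the sample itself contains no such rows.

```python
import csv
import io
from collections import Counter

# Rows copied from the `head peak_annotation.tsv` output above.
SAMPLE = """\
chrom\tstart\tend\tgene\tdistance\tpeak_type
1\t12116\t12985\tENSOARG00020000038\t-13113\tdistal
1\t29918\t30788\tENSOARG00020000038\t3821\tdistal
1\t34037\t35000\tENSOARG00020000038\t7940\tdistal
1\t36768\t37257\tENSOARG00020000038\t10671\tdistal
1\t37368\t38200\tENSOARG00020000038\t11271\tdistal
1\t46853\t47720\tENSOARG00020000038\t20756\tdistal
1\t49556\t50414\tENSOARG00020000038\t23459\tdistal
1\t59206\t60122\tFAM240C\t-27312\tdistal
1\t63961\t64830\tFAM240C\t-22604\tdistal
"""

# For a real run, replace the StringIO with open("peak_annotation.tsv", newline="").
records = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))

type_counts = Counter(rec["peak_type"] for rec in records)
peaks_per_gene = Counter(rec["gene"] for rec in records if rec["gene"])

# Assumed: an empty distance field means the row has no gene distance.
distances = [int(rec["distance"]) if rec["distance"] else None for rec in records]
upstream = sum(1 for d in distances if d is not None and d < 0)

print(dict(type_counts))
print(dict(peaks_per_gene))
print("peaks upstream of their gene's TSS:", upstream)
```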
