I am determined to get this problem solved once and for all. Here is a summary of where things stand:
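(1) Switch back to htseq-count for the counting step, and look more closely at how these off-the-shelf tools actually do their computation. I am resolved to fix this completely!! (No giving up!)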
(base) [xxzhang@mu02 chr1]$ htseq-count -f bam result_chr1.bam hg38.gtf >counts2.txt
[E::idx_find_and_load] Could not retrieve index file for 'result_chr1.bam'
[Errno 2] No such file or directory: 'hg38.gtf'
[Exception type: FileNotFoundError, raised in utils.py:38]
samtools index -b result_chr1.bam
Warning: No features of type 'exon' found.
Warning: Read A00928:207:HYLCHDSXY:2:1442:26793:7654 claims to have an aligned mate which could not be found in an adjacent line.
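The "No features of type 'exon' found" warning means that no row in the GTF has the value htseq-count looks for by default in column 3 (its -t option defaults to exon). A minimal sketch for checking which feature types the file really contains is below. The function name count_feature_types and the sample rows are made up for illustration; they are not taken from the real annotation.

```python
from collections import Counter


def count_feature_types(gtf_lines):
    """Tally the feature type (column 3) of every GTF record.

    Comment lines (starting with '#') and blank lines are skipped.
    """
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Hypothetical rows, only to show the shape of the input.
sample = [
    "#!genome-build made-up\n",
    'chr1\trmsk\texon\t100\t200\t.\t+\t.\tgene_id "L1";\n',
    'chr1\trmsk\tCDS\t1\t5\t.\t+\t.\tgene_id "x";\n',
    'chr1\trmsk\texon\t300\t400\t.\t-\t.\tgene_id "L1";\n',
]

for feature_type, n in count_feature_types(sample).most_common():
    print(feature_type, n, sep="\t")
```

To run it on a real file, pass an open file handle instead of the list, for example count_feature_types(open("repeatfamily_v3.gtf")). Whatever type dominates the tally is a candidate value for -t.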
Oddly enough, I ran into the same problem as before:
4700000 GFF lines processed.
4800000 GFF lines processed.
4900000 GFF lines processed.
5000000 GFF lines processed.
5100000 GFF lines processed.
5200000 GFF lines processed.
5300000 GFF lines processed.
start too small
[Exception type: IndexError, raised in _HTSeq.pyx:376]
htseq-count -f bam result_chr1.bam repeatfamily_v3.gtf >count4.txt
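This data failed at the same position again, and the cause is still unknown. I suspect it is either a problem with the tags in the GTF file, or the fact that I have not sorted the GTF file. Very frustrating.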
5000000 GFF lines processed.
5100000 GFF lines processed.
5200000 GFF lines processed.
5300000 GFF lines processed.
start too small
[Exception type: IndexError, raised in _HTSeq.pyx:376]
What on earth is causing this?
This time the error showed up earlier.
(base) [xxzhang@fat02 hg38]$ htseq-count -f bam result_chr1.bam repeatfamily_v4.gtf >counts3.txt
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
start too small
[Exception type: IndexError, raised in _HTSeq.pyx:376]
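One hypothesis worth ruling out, which the logs above do not confirm: GTF coordinates are 1-based, so a record whose start is 0 (for example, rows copied from a BED-style table without adding 1) or whose end is smaller than its start would describe an invalid interval, and that might be what "start too small" is complaining about. The sketch below scans for such records. The function name find_suspect_intervals and the sample rows are hypothetical.

```python
def find_suspect_intervals(gtf_lines, max_hits=10):
    """Return (line_number, reason, line) for records with odd coordinates.

    Checks only columns 4 (start) and 5 (end); stops after max_hits findings.
    """
    hits = []
    for line_number, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        reason = None
        if len(fields) < 5:
            reason = "fewer than 5 columns"
        else:
            try:
                start, end = int(fields[3]), int(fields[4])
            except ValueError:
                reason = "non-integer coordinate"
            else:
                if start < 1:
                    reason = "start < 1"
                elif end < start:
                    reason = "end < start"
        if reason is not None:
            hits.append((line_number, reason, line.rstrip("\n")))
            if len(hits) >= max_hits:
                break
    return hits


# Hypothetical rows: the second has start 0, the third has end < start.
sample = [
    'chr1\trmsk\texon\t100\t200\t.\t+\t.\tgene_id "a";\n',
    'chr1\trmsk\texon\t0\t50\t.\t+\t.\tgene_id "b";\n',
    'chr1\trmsk\texon\t300\t250\t.\t-\t.\tgene_id "c";\n',
]

for line_number, reason, text in find_suspect_intervals(sample):
    print(line_number, reason, text, sep=" | ")
```

Since the progress message is printed every 100000 lines, the offending record in each run probably sits shortly after the last count shown (after line 5300000 in the first two runs, after line 400000 in the third), so the line numbers reported here can be compared against those windows.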
Next, try changing the feature type from exon to CDS and see what happens.
htseq-count -f bam -t CDS result_chr1.bam repeatfamily_v5.gtf >counts3.txt