Lab Notes | Strategies for Reducing Computation Time (3)

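I am determined to see this through, so here is a summary of where things stand:
(1) Switch back to htseq-count for read counting, and look more closely at how the existing tools actually do the computation. I am set on solving this once and for all! (No giving up!)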

(base) [xxzhang@mu02 chr1]$ htseq-count -f bam result_chr1.bam hg38.gtf >counts2.txt
[E::idx_find_and_load] Could not retrieve index file for 'result_chr1.bam'
  [Errno 2] No such file or directory: 'hg38.gtf'
  [Exception type: FileNotFoundError, raised in utils.py:38]
samtools index -b result_chr1.bam
Warning: No features of type 'exon' found.
Warning: Read A00928:207:HYLCHDSXY:2:1442:26793:7654 claims to have an aligned mate which could not be found in an adjacent line.
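
The first two errors in the log above (missing BAM index, missing `hg38.gtf`) can be caught before launching a long run. A minimal Python sketch of such a pre-flight check follows; the helper name `preflight` is hypothetical, and it only looks for a BAI index next to the BAM, which is what `samtools index -b` writes.

```python
from pathlib import Path


def preflight(bam_path, gtf_path):
    """Return a list of problems found; an empty list means the inputs are present."""
    bam, gtf = Path(bam_path), Path(gtf_path)
    problems = []

    if not bam.is_file():
        problems.append(f"BAM file not found: {bam}")
    else:
        # Assumption: the index is a BAI file named either
        # result.bam.bai or result.bai, sitting beside the BAM.
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(c.is_file() for c in candidates):
            problems.append(f"no BAI index found for {bam}; run samtools index first")

    if not gtf.is_file():
        problems.append(f"annotation file not found: {gtf}")

    return problems
```

Calling `preflight("result_chr1.bam", "hg38.gtf")` in the working directory should list anything that needs fixing before htseq-count is started.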

Strangely, I ran into the same problem as before:

4700000 GFF lines processed.
4800000 GFF lines processed.
4900000 GFF lines processed.
5000000 GFF lines processed.
5100000 GFF lines processed.
5200000 GFF lines processed.
5300000 GFF lines processed.
start too small
  [Exception type: IndexError, raised in _HTSeq.pyx:376]
htseq-count -f bam result_chr1.bam repeatfamily_v3.gtf >count4.txt

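This data fails at the same position again, and the cause is still unknown. I suspect it is either a problem with the labels in the GTF file, or that I have not sorted the GTF file. Really frustrating.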
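
One way to test these guesses is to scan the GTF directly. GTF coordinates are 1-based and inclusive, so a start below 1 or an end smaller than the start marks a malformed record. The Python sketch below flags such lines and also reports whether records are sorted by start within each chromosome. Whether either condition is what actually triggers `start too small` here is an assumption and is not confirmed by the log; the name `scan_gtf` is hypothetical.

```python
from collections import namedtuple

Problem = namedtuple("Problem", "line_no reason text")


def scan_gtf(lines):
    """Return (problems, is_sorted) for an iterable of GTF lines."""
    problems = []
    is_sorted = True
    last_start = {}  # chromosome -> start of the previous record on it

    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            problems.append(Problem(line_no, "fewer than 9 tab-separated fields", line))
            continue
        chrom = fields[0]
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            problems.append(Problem(line_no, "non-integer coordinate", line))
            continue

        if start < 1:
            problems.append(Problem(line_no, "start is below 1", line))
        elif end < start:
            problems.append(Problem(line_no, "end is smaller than start", line))

        # Sortedness is judged per chromosome, by start coordinate only.
        if chrom in last_start and start < last_start[chrom]:
            is_sorted = False
        last_start[chrom] = start

    return problems, is_sorted
```

Passing an open file handle for `repeatfamily_v3.gtf` to `scan_gtf` and printing the first few problems shows their line numbers, which can then be compared with the line count at which htseq-count stops (between 5,300,000 and 5,400,000 in the log above).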

5000000 GFF lines processed.
5100000 GFF lines processed.
5200000 GFF lines processed.
5300000 GFF lines processed.
  start too small
  [Exception type: IndexError, raised in _HTSeq.pyx:376]

So what is actually causing this?
This time the error showed up even earlier.

(base) [xxzhang@fat02 hg38]$ htseq-count -f bam result_chr1.bam repeatfamily_v4.gtf >counts3.txt
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
  start too small
  [Exception type: IndexError, raised in _HTSeq.pyx:376]


Try changing the feature type from exon to CDS and see what happens.

 htseq-count -f bam -t CDS result_chr1.bam repeatfamily_v5.gtf >counts3.txt
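
Since `-t` selects records by the third column of the GTF, it may be quicker to list which feature types the file actually contains than to guess between exon and CDS. A minimal Python sketch; `feature_type_counts` is a hypothetical helper name.

```python
from collections import Counter


def feature_type_counts(lines):
    """Count how often each feature type (GTF column 3) occurs."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

Running this on `repeatfamily_v5.gtf` and printing `counts.most_common()` shows the available types; if neither exon nor CDS is among them, the value passed to `-t` would need to be one that is.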
