Experiment Log | Strategies for Speeding Up Computation (3)

今天也是个妖精头子呀

Published 2021-09-26 10:10:16

Category: Lineage Tracing. Tags: Experiment Log

Article link: https://blog.csdn.net/weixin_40640700/article/details/120456388

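Summary: while counting gene expression with htseq-count, the author ran into errors and repeatedly tried switching tools and adjusting parameters, paying particular attention to the GTF attribute tags, sorting, and indexing. This post records the error messages and the attempted fixes in detail, including switching to a different feature type and sorting the files.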

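I have made up my mind to get this sorted out, so here is a summary of where things stand:
(1) Switch back to htseq-count for the counting step once more, and look more closely at how the existing tools actually do the computation. I am determined to solve this for good! (No giving up!)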

(base) [xxzhang@mu02 chr1]$ htseq-count -f bam result_chr1.bam hg38.gtf >counts2.txt
[E::idx_find_and_load] Could not retrieve index file for 'result_chr1.bam'
  [Errno 2] No such file or directory: 'hg38.gtf'
  [Exception type: FileNotFoundError, raised in utils.py:38]
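The log above shows two separate problems: a message that no index could be found for the BAM file, and a FileNotFoundError because hg38.gtf is not in the current directory. A small pre-flight check can catch both before htseq-count starts. This is only a sketch: check_inputs is a hypothetical helper name, and the index file names it looks for (.bam.bai and .bai) are the usual conventions, assumed here rather than taken from the log.

```shell
# Hypothetical helper: report which htseq-count inputs are missing.
# Prints one line per problem and returns non-zero if anything is missing.
check_inputs() {
  bam=$1
  gtf=$2
  status=0
  [ -f "$bam" ] || { echo "missing BAM: $bam"; status=1; }
  [ -f "$gtf" ] || { echo "missing GTF: $gtf"; status=1; }
  # Assumption: the index sits next to the BAM as <name>.bam.bai or <name>.bai
  if [ ! -f "$bam.bai" ] && [ ! -f "${bam%.bam}.bai" ]; then
    echo "missing BAM index for: $bam"
    status=1
  fi
  return $status
}

# Example use with the file names from the log:
# check_inputs result_chr1.bam hg38.gtf && htseq-count -f bam result_chr1.bam hg38.gtf > counts2.txt
```

Running the check first means a wrong path fails immediately instead of after the BAM has been opened.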

samtools index -b result_chr1.bam
Warning: No features of type 'exon' found.
Warning: Read A00928:207:HYLCHDSXY:2:1442:26793:7654 claims to have an aligned mate which could not be found in an adjacent line.
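The warning "No features of type 'exon' found" means that nothing in column 3 of the GTF matched the feature type htseq-count was looking for (exon is its default for -t). Listing the feature types that are actually present shows what value -t could take. A minimal sketch, with feature_types as a hypothetical helper name:

```shell
# Hypothetical helper: count how often each feature type (GTF column 3) occurs.
# Output: "<count> <type>", most frequent first.
feature_types() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print n[t], t }' "$1" |
    sort -k1,1nr -k2,2
}

# Example use (file name taken from the commands in this post):
# feature_types repeatfamily_v3.gtf
```

The mate warning is a separate matter. htseq-count has a -r/--order option (name or pos) that tells it how paired-end alignments are sorted. Since samtools index only works on coordinate-sorted files, -r pos may be the right setting here, but that is an assumption the log does not confirm.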

Strangely, I ran into the same problem as before:

4700000 GFF lines processed.
4800000 GFF lines processed.
4900000 GFF lines processed.
5000000 GFF lines processed.
5100000 GFF lines processed.
5200000 GFF lines processed.
5300000 GFF lines processed.
start too small
  [Exception type: IndexError, raised in _HTSeq.pyx:376]

htseq-count -f bam result_chr1.bam repeatfamily_v3.gtf   >count4.txt

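This data set fails at the same position again, and the cause is still unclear. I suspect it is either a problem with the attribute tags, or that I have not sorted the GTF file. Very frustrating.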
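To test the sorting hypothesis, the GTF can be ordered by chromosome and then by start and end position. The sketch below uses sort_gtf as a hypothetical helper name. Nothing in the log establishes that sorting resolves the error, so treat it as a way to rule the hypothesis in or out, not as a fix.

```shell
# Hypothetical helper: sort a GTF by chromosome, then numerically by start and end.
# Comment lines (starting with '#') are kept at the top of the output.
sort_gtf() {
  grep '^#' "$1" || true
  grep -v '^#' "$1" | sort -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n
}

# Example use (the output file name is an assumption):
# sort_gtf repeatfamily_v3.gtf > repeatfamily_v3.sorted.gtf
```

If the error still shows up after the same number of processed lines in the sorted file, ordering is probably not the cause.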

5000000 GFF lines processed.
5100000 GFF lines processed.
5200000 GFF lines processed.
5300000 GFF lines processed.
  start too small
  [Exception type: IndexError, raised in _HTSeq.pyx:376]

What on earth is causing this?
This time the failure came even earlier.

(base) [xxzhang@fat02 hg38]$ htseq-count -f bam result_chr1.bam repeatfamily_v4.gtf >counts3.txt
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
  start too small
  [Exception type: IndexError, raised in _HTSeq.pyx:376]
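Because the progress counter is printed every 100000 lines, the failure in this last run falls somewhere after line 400000 and before line 500000 of repeatfamily_v4.gtf, assuming the counter roughly tracks file line numbers. One possible cause of "start too small" is a feature whose start coordinate is below 1 (for example a 0-based start carried over from a BED-style file). That is an assumption, not something the log confirms. The sketch below scans a line window for such records; suspicious_coords is a hypothetical helper name.

```shell
# Hypothetical helper: print GTF lines inside a line-number window whose
# coordinates look invalid (non-numeric, start < 1, or end < start).
# Usage: suspicious_coords <gtf> <first_line> <last_line>
suspicious_coords() {
  awk -F'\t' -v lo="$2" -v hi="$3" '
    NR < lo || NR > hi || /^#/ { next }
    $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ || $4 + 0 < 1 || $5 + 0 < $4 + 0 {
      print NR ": " $0
    }' "$1"
}

# Example use for the run above:
# suspicious_coords repeatfamily_v4.gtf 400001 500000
```

An empty result would suggest the coordinates in that window are well formed and the cause lies elsewhere.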

Next, try changing the feature type from exon to CDS and see what happens.

 htseq-count -f bam -t CDS result_chr1.bam repeatfamily_v5.gtf >counts3.txt
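Before running with -t CDS, it may help to confirm that the GTF contains CDS records at all and that they carry the attribute htseq-count uses as the feature ID (gene_id is its default for -i). A minimal sketch, with type_attr_counts as a hypothetical helper name:

```shell
# Hypothetical helper: for one feature type, print
# "<lines of that type> <lines of that type that carry the attribute>".
# Usage: type_attr_counts <gtf> <feature_type> <attribute>
type_attr_counts() {
  awk -F'\t' -v type="$2" -v attr="$3" '
    !/^#/ && $3 == type {
      total++
      if ($9 ~ ("(^|[; ])" attr " ")) with_attr++
    }
    END { print total + 0, with_attr + 0 }' "$1"
}

# Example use for the command above:
# type_attr_counts repeatfamily_v5.gtf CDS gene_id
```

A first number of 0 would mean the feature type is absent, and the run would end with the same "No features of type" warning as before.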