2021-08-04

垚垚爸爱学习

Column: Operation Log; Tag: database

Article link: https://blog.csdn.net/xiaomotong123/article/details/119391061

Reference articles:
Metatranscriptome analysis: identifying and filtering rRNA with SortMeRNA
SILVA Databases for ARB
Removing rRNA with SortMeRNA
Analyzing rRNA content with sortmerna

1. Introduction to ribosomal RNA databases

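The SILVA rRNA database (http://www.arb-silva.de/) is used to check and align rRNA sequences. It covers both the small subunit (SSU, 16S/18S) and the large subunit (LSU, 23S/28S) for Bacteria, Archaea and Eukarya. It is also the official database of the ARB software.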

LSU: Large subunit (23S/28S ribosomal RNAs)
SSU: Small subunit (16S/18S ribosomal RNAs)

1.1 Downloading the files for ARB

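As of 2015-02-04, the latest downloadable release was Release 119. The web version had already reached release 121, but that release was not yet available for download.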

Download overview: http://www.arb-silva.de/download/arb-files/

Download location: http://www.arb-silva.de/no_cache/download/archive/release_119/ARB_files/

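1.2 Downloading only the FASTA files

Download location: http://www.arb-silva.de/no_cache/download/archive/release_119/Exports/

Choose the files to match your needs. Pick LSU for 23S/28S ribosomal RNAs or SSU for 16S/18S ribosomal RNAs. Then decide whether to use the non-redundant set (I chose the non-redundant one, Nr99). Next, decide whether to use truncated (trunc) sequences, in which the unaligned nucleotides have been removed. Finally, decide whether to include the full alignment. The export files are described as follows: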

*_tax_silva.fasta.gz

Multi-FASTA files of the SSU/LSU databases, including the SILVA taxonomy for Bacteria, Archaea and Eukaryotes in the header.

REMARK: The sequences in these files are NOT truncated to the effective LSU or SSU genes. They contain the full entries as they have been deposited in the public repositories (ENA/GenBank/DDBJ).

FASTA header:

accession_number.start_position.stop_position, followed by the taxonomic path and the organism name

*_tax_silva_full_align_trunc.fasta.gz

Multi-FASTA files of the SSU/LSU databases, including the SILVA taxonomy for Bacteria, Archaea and Eukaryotes in the header (including the FULL alignment).

REMARK: The sequences in these files have been truncated. This means that all nucleotides that have not been aligned were removed from the sequence.

*_tax_silva_trunc.fasta.gz

Multi-FASTA files of the SSU/LSU databases, including the SILVA taxonomy for Bacteria, Archaea and Eukaryotes in the header.

REMARK: The sequences in these files have been truncated. This means that all nucleotides that have not been aligned were removed from the sequence.
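A short awk program can split the header layout described above into columns. This is a minimal bash sketch, not an official SILVA tool. It assumes that the ID and the taxonomy are separated by the first space, and that the last two dot-separated parts of the ID are the start and stop positions. The function name silva_headers_to_tsv is illustrative.

```shell
# Print accession, start, stop and taxonomy (tab-separated) for every header
# of a SILVA *_tax_silva FASTA file. Reads the named file, or stdin when no
# argument is given, so a .gz export can be piped in through zcat.
silva_headers_to_tsv() {
  awk '/^>/ {
    id  = substr($1, 2)                  # ID without the leading ">"
    tax = substr($0, length($1) + 2)     # everything after the first space
    n = split(id, part, "[.]")
    acc = part[1]                        # re-join in case the accession itself contains dots
    for (i = 2; i <= n - 2; i++) acc = acc "." part[i]
    print acc "\t" part[n - 1] "\t" part[n] "\t" tax
  }' "${1:--}"
}
```

For a compressed export, `zcat <file>_tax_silva.fasta.gz | silva_headers_to_tsv | head` prints the first few parsed headers.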

SILVA reference files formatted for mothur: http://blog.mothur.org/2014/08/08/SILVA-v119-reference-files/

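The SortMeRNA package ships with eight rRNA databases: bacterial 16S, bacterial 23S, archaeal 16S, archaeal 23S, eukaryotic 18S and eukaryotic 28S rRNA, plus 5S and 5.8S rRNA from the Rfam database. Together, these eight databases can be used to identify rRNA reads in metatranscriptome sequencing data.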

2. Checking data integrity

(base) lizexing@bio:~/projects/xindi$ ll
total 6494620
drwxrwxr-x 2 lizexing lizexing       4096 Aug 24 10:12 ./
drwxrwxr-x 6 lizexing lizexing       4096 Aug  4 18:59 ../
-rw-rw-r-- 1 lizexing lizexing 5914142720 Aug 24 10:11 Data.tar
-rw-rw-r-- 1 lizexing lizexing         43 Aug 24 10:12 Data.tar.md5
-rw-rw-r-- 1 lizexing lizexing  736318953 Aug 24 10:12 Summary.tar.gz
-rw-rw-r-- 1 lizexing lizexing         49 Aug 24 10:12 Summary.tar.gz.md5
(base) lizexing@bio:~/projects/xindi$ cat Data.tar.md5 > check_md5sum.txt && md5sum -c check_md5sum.txt
Data.tar: OK
(base) lizexing@bio:~/projects/xindi$ cat Summary.tar.gz.md5 > check_md5sum_Summary.txt && md5sum -c check_md5sum_Summary.txt
Summary.tar.gz: OK
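You do not need to copy the .md5 file with cat before checking it, because md5sum -c reads the checksum file directly. Below is a small bash sketch that checks every .md5 file in a data directory in one go. The function name verify_md5_files is illustrative.

```shell
# Check every *.md5 file in a directory (default: current directory).
# md5sum -c resolves the file names listed in a checksum file relative to
# the current directory, hence the cd inside a subshell.
verify_md5_files() {
  local dir=${1:-.}
  (
    cd "$dir" || exit 1
    status=0
    for f in *.md5; do
      [ -e "$f" ] || continue      # no match: the glob stays literal
      md5sum -c "$f" || status=1   # prints "<file>: OK" or "<file>: FAILED"
    done
    exit "$status"
  )
}
```

For example, `verify_md5_files ~/projects/xindi` checks both Data.tar and Summary.tar.gz. It returns a non-zero status if any file fails.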

3. Building indexes for the eight databases

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./indexdb_rna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-db:./rRNA_databases/silva-arc-16s-id95.fasta,./index/silva-arc-16s-db:./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-db:./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-db:./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s:./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-db:./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s-db -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Parameters summary:
    K-mer size: 19
    K-mer interval: 1
    Maximum positions to store per unique K-mer: 10000

  Total number of databases to index: 8

  Begin indexing file ./rRNA_databases/silva-bac-16s-id90.fasta under index name ./index/silva-bac-16s-db:
  Collecting sequence distribution statistics ..  done  [0.114872 sec]

  start index part # 0:
    (1/3) building burst tries .. done  [10.128288 sec]
    (2/3) building CMPH hash .. done  [29.993126 sec]
    (3/3) building position lookup tables .. done [37.032142 sec]
    total number of sequences in this part = 12798
      temporary file was here: /tmp/sortmerna_keys_690389.txt
      writing kmer data to ./index/silva-bac-16s-db.kmer_0.dat
      writing burst tries to ./index/silva-bac-16s-db.bursttrie_0.dat
      writing position lookup table to ./index/silva-bac-16s-db.pos_0.dat
      writing nucleotide distribution statistics to ./index/silva-bac-16s-db.stats
    done.

  Begin indexing file ./rRNA_databases/silva-bac-23s-id98.fasta under index name ./index/silva-bac-23s-db:
  Collecting sequence distribution statistics ..  done  [0.241604 sec]

  start index part # 0:
    (1/3) building burst tries .. done  [4.956894 sec]
    (2/3) building CMPH hash .. done  [4.041567 sec]
    (3/3) building position lookup tables .. done [11.533700 sec]
    total number of sequences in this part = 4488
      temporary file was here: /tmp/sortmerna_keys_690389.txt
      writing kmer data to ./index/silva-bac-23s-db.kmer_0.dat
      writing burst tries to ./index/silva-bac-23s-db.bursttrie_0.dat
      writing position lookup table to ./index/silva-bac-23s-db.pos_0.dat
      writing nucleotide distribution statistics to ./index/silva-bac-23s-db.stats
    done.

  Begin indexing file ./rRNA_databases/silva-arc-16s-id95.fasta under index name ./index/silva-arc-16s-db:
  Collecting sequence distribution statistics ..  done  [0.168354 sec]

  start index part # 0:
    (1/3) building burst tries .. done  [1.096340 sec]
    (2/3) building CMPH hash .. done  [1.835844 sec]
    (3/3) building position lookup tables .. done [2.659358 sec]
    total number of sequences in this part = 3193
      temporary file was here: /tmp/sortmerna_keys_690389.txt
      writing kmer data to ./index/silva-arc-16s-db.kmer_0.dat
      writing burst tries to ./index/silva-arc-16s-db.bursttrie_0.dat
      writing position lookup table to ./index/silva-arc-16s-db.pos_0.dat
      writing nucleotide distribution statistics to ./index/silva-arc-16s-db.stats
    done.

  Begin indexing file ./rRNA_databases/silva-arc-23s-id98.fasta under index name ./index/silva-arc-23s-db:
  Collecting sequence distribution statistics ..  done  [0.004250 sec]

  start index part # 0:
    (1/3) building burst tries .. done  [0.205082 sec]
    (2/3) building CMPH hash .. done  [0.988803 sec]
    (3/3) building position lookup tables .. done [0.372066 sec]
    total number of sequences in this part = 251
      temporary file was here: /tmp/sortmerna_keys_690389.txt
      writing kmer data to ./index/silva-arc-23s-db.kmer_0.dat
      writing burst tries to ./index/silva-arc-23s-db.bursttrie_0.dat
      writing position lookup table to ./index/silva-arc-23s-db.pos_0.dat
      writing nucleotide distribution statistics to ./index/silva-arc-23s-db.stats
    done.

  Begin indexing file ./rRNA_databases/silva-euk-18s-id95.fasta under index name ./index/silva-euk-18s-db:
  Collecting sequence distribution statistics ..  done  [0.147665 sec]

  start index part # 0:
    (1/3) building burst tries .. done  [6.123623 sec]
    (2/3) building CMPH hash .. done  [5.220644 sec]
    (3/3) building position lookup tables .. done [21.785766 sec]
    total number of sequences in this part = 7348
      temporary file was here: /tmp/sortmerna_keys_690389.txt
      writing kmer data to ./index/silva-euk-18s-db.kmer_0.dat
      writing burst tries to ./index/silva-euk-18s-db.bursttrie_0.dat
      writing position lookup table to ./index/silva-euk-18s-db.pos_0.dat
      writing nucleotide distribution statistics to ./index/silva-euk-18s-db.stats
    done.

  Begin indexing file ./rRNA_databases/silva-euk-28s-id98.fasta under index name ./index/silva-euk-28s:
  Collecting sequence distribution statistics ..  done  [0.112021 sec]

  start index part # 0:
    (1/3) building burst tries .. done  [6.403803 sec]
    (2/3) building CMPH hash .. done  [5.814894 sec]
    (3/3) building position lookup tables .. done [16.681088 sec]
    total number of sequences in this part = 4935
      temporary file was here: /tmp/sortmerna_keys_690389.txt
      writing kmer data to ./index/silva-euk-28s.kmer_0.dat
      writing burst tries to ./index/silva-euk-28s.bursttrie_0.dat
      writing position lookup table to ./index/silva-euk-28s.pos_0.dat
      writing nucleotide distribution statistics to ./index/silva-euk-28s.stats
    done.

  Begin indexing file ./rRNA_databases/rfam-5s-database-id98.fasta under index name ./index/rfam-5s-db:
  Collecting sequence distribution statistics ..  done  [0.076209 sec]

  start index part # 0:
    (1/3) building burst tries .. done  [2.160063 sec]
    (2/3) building CMPH hash .. done  [4.935749 sec]
    (3/3) building position lookup tables .. done [9.965298 sec]
    total number of sequences in this part = 59513
      temporary file was here: /tmp/sortmerna_keys_690389.txt
      writing kmer data to ./index/rfam-5s-db.kmer_0.dat
      writing burst tries to ./index/rfam-5s-db.bursttrie_0.dat
      writing position lookup table to ./index/rfam-5s-db.pos_0.dat
      writing nucleotide distribution statistics to ./index/rfam-5s-db.stats
    done.

  Begin indexing file ./rRNA_databases/rfam-5.8s-database-id98.fasta under index name ./index/rfam-5.8s-db:
  Collecting sequence distribution statistics ..  done  [0.014170 sec]

  start index part # 0:
    (1/3) building burst tries .. done  [0.403497 sec]
    (2/3) building CMPH hash .. done  [0.326261 sec]
    (3/3) building position lookup tables .. done [1.708450 sec]
    total number of sequences in this part = 13034
      temporary file was here: /tmp/sortmerna_keys_690389.txt
      writing kmer data to ./index/rfam-5.8s-db.kmer_0.dat
      writing burst tries to ./index/rfam-5.8s-db.bursttrie_0.dat
      writing position lookup table to ./index/rfam-5.8s-db.pos_0.dat
      writing nucleotide distribution statistics to ./index/rfam-5.8s-db.stats
    done.

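After the run, the index files are written to the software's ./index/ directory. For each database there is a .kmer_0.dat, a .bursttrie_0.dat, a .pos_0.dat and a .stats file, as listed in the log above.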
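The --ref value above is a ':'-separated list of "reference FASTA,index prefix" pairs, which is hard to read on a single line. The sketch below assembles the same string from a readable list. build_ref_arg, DB, IDX and REF are hypothetical names, not part of SortMeRNA.

```shell
# Join "reference FASTA,index prefix" pairs with ":" as expected by --ref.
# Arguments alternate: FASTA1 INDEX1 FASTA2 INDEX2 ...
build_ref_arg() {
  local out="" sep=""
  while [ "$#" -ge 2 ]; do
    out+="${sep}$1,$2"
    sep=":"
    shift 2
  done
  printf '%s\n' "$out"
}

DB=./rRNA_databases
IDX=./index
REF=$(build_ref_arg \
  "$DB/silva-bac-16s-id90.fasta"      "$IDX/silva-bac-16s-db" \
  "$DB/silva-bac-23s-id98.fasta"      "$IDX/silva-bac-23s-db" \
  "$DB/silva-arc-16s-id95.fasta"      "$IDX/silva-arc-16s-db" \
  "$DB/silva-arc-23s-id98.fasta"      "$IDX/silva-arc-23s-db" \
  "$DB/silva-euk-18s-id95.fasta"      "$IDX/silva-euk-18s-db" \
  "$DB/silva-euk-28s-id98.fasta"      "$IDX/silva-euk-28s" \
  "$DB/rfam-5s-database-id98.fasta"   "$IDX/rfam-5s-db" \
  "$DB/rfam-5.8s-database-id98.fasta" "$IDX/rfam-5.8s-db")

# Run from ~/software/sortmerna-2.1b, as in the command above:
# ./indexdb_rna --ref "$REF" -v
```

Keeping one pair per line makes it easy to spot inconsistencies, such as the silva-euk-28s index prefix, which lacks the -db suffix used by the others.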

4. Interleaving the paired-end reads of the three samples (293T/HCT116/HeLa) with the bundled merge-paired-reads.sh script

(base) lizexing@bio:~/software/sortmerna-2.1b/scripts$ ./merge-paired-reads.sh /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1.fq /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2.fq /Data/lizexing/projects/xindi/Data/CleanData/293T.fq
   Processing /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1.fq ..
   Processing /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2.fq ..
   Interleaving /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1.fq and /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2.fq ..
   Done.

(base) lizexing@bio:~/software/sortmerna-2.1b/scripts$ ./merge-paired-reads.sh /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1.fq /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2.fq /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq
   Processing /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1.fq ..
   Processing /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2.fq ..
   Interleaving /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1.fq and /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2.fq ..
   Done.

(base) lizexing@bio:~/software/sortmerna-2.1b/scripts$ ./merge-paired-reads.sh /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq
   Processing /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq ..
   Processing /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq ..
   Interleaving /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq and /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq ..
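   Done.

Note: the HeLa command above passes T_HeLa_T_PAR_CLIP_Clean_Data1.fq as both the first and the second input, so the log shows Data1.fq being interleaved with itself. Following the pattern of the other two samples, the second argument should be T_HeLa_T_PAR_CLIP_Clean_Data2.fq. The HeLa results in steps 9 and 10 were obtained from the HeLa.fq produced by this command.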
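As its "Interleaving" messages show, merge-paired-reads.sh writes the two mates of each pair one after the other into a single file. The same idea can be sketched in bash with paste and tr. This is an illustration, not the script's own code. The function name interleave_fastq is made up. The sketch assumes 4-line FASTQ records in the same order in both files, and it also refuses to run when both inputs are the same file, which guards against passing one file twice.

```shell
# Interleave two paired-end FASTQ files: R1 record, R2 record, R1, R2, ...
# Assumes 4-line records, the same read order in both files and no tab
# characters inside the reads (paste uses tabs as a temporary separator).
interleave_fastq() {
  local r1=$1 r2=$2 out=$3 n1 n2
  if [ "$(readlink -f "$r1")" = "$(readlink -f "$r2")" ]; then
    echo "error: R1 and R2 are the same file: $r1" >&2
    return 1
  fi
  n1=$(wc -l < "$r1")
  n2=$(wc -l < "$r2")
  if [ "$n1" -ne "$n2" ]; then
    echo "error: $r1 and $r2 have different line counts ($n1 vs $n2)" >&2
    return 1
  fi
  # paste - - - - turns each 4-line record into one tab-separated line;
  # pasting both streams side by side and turning tabs back into newlines
  # yields each R1 record followed by its mate from R2.
  paste <(paste - - - - < "$r1") <(paste - - - - < "$r2") | tr '\t' '\n' > "$out"
}
```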

5. Identifying eukaryotic 18S rRNA in 293T.fq with SortMeRNA

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-18s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.18s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.18s --log -a 16 -m 6000 -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [45.93 sec]
  size of reads file: 13479394926 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [10.57 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.612551
    Gumbel K = 0.339810
    Minimal SW score based on E-value = 61
    Loading index part 1/1 ...  done [0.95 sec]
    Begin index search ...  done [964.83 sec]
    Freeing index ...  done [0.24 sec]
    Total number of reads mapped (incl. all reads file sections searched): 5170898
    Writing aligned FASTA/FASTQ ...  done [33.36 sec]
    Writing not-aligned FASTA/FASTQ ...  done [82.08 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.63 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
    Loading index part 1/1 ...  done [1.17 sec]
    Begin index search ...  done [954.55 sec]
    Freeing index ...  done [0.22 sec]
    Total number of reads mapped (incl. all reads file sections searched): 10323233
    Writing aligned FASTA/FASTQ ...  done [32.73 sec]
    Writing not-aligned FASTA/FASTQ ...  done [82.38 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [1.26 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
    Loading index part 1/1 ...  done [1.10 sec]
    Begin index search ...  done [153.42 sec]
    Freeing index ...  done [0.23 sec]
    Total number of reads mapped (incl. all reads file sections searched): 11057673
    Writing aligned FASTA/FASTQ ...  done [4.68 sec]
    Writing not-aligned FASTA/FASTQ ...  done [11.73 sec]

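SortMeRNA produces four output files:
1) a .log file with the run statistics;
2) a .sam file with the alignment details;
3) a FASTQ file of the reads that matched the database (named after the --aligned prefix, here 293T.fq.18s);
4) a FASTQ file of the reads that did not match (named after the --other prefix, here 293T.fq.non.18s).

The log file contains the following statistics: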

 Results:
    Total reads = 39137348
    Total reads passing E-value threshold = 11057673 (28.25%)
    Total reads failing E-value threshold = 28079675 (71.75%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta		28.25%

Conclusion: of the reads in the 293T.fq transcriptome data, 11057673 match eukaryotic 18S rRNA, i.e. 28.25% of the total; the remaining 71.75% do not match eukaryotic 18S rRNA.
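The percentage can be recomputed from the raw counts in the log, which is a quick sanity check when you collect results from several runs. Below is a minimal awk-based sketch. It assumes the "Results:" block layout shown above, and the function name sortmerna_log_summary is illustrative.

```shell
# Summarise the "Results:" block of a SortMeRNA 2.x log file and check that
# passing + failing adds up to the total.
sortmerna_log_summary() {
  awk '
    /Total reads = /                           { total = $NF }
    /Total reads passing E-value threshold = / { pass  = $(NF - 1) }
    /Total reads failing E-value threshold = / { fail  = $(NF - 1) }
    END {
      if (total == 0) { print "no Results block found" > "/dev/stderr"; exit 1 }
      if (pass + fail != total) { print "passing + failing != total" > "/dev/stderr"; exit 1 }
      printf "total=%d passing=%d failing=%d rRNA=%.2f%%\n", total, pass, fail, 100 * pass / total
    }' "$1"
}
```

Running it on the .log file of the 293T 18S run should print the same 28.25% as the log.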

6. Identifying eukaryotic 28S rRNA in 293T.fq with SortMeRNA

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-28s: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.28s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.28s --log -a 16 -m 6000 -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr


  Computing read file statistics ... done [42.96 sec]
  size of reads file: 13479394926 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.62 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.612082
    Gumbel K = 0.345772
    Minimal SW score based on E-value = 61
    Loading index part 1/1 ...  done [1.35 sec]
    Begin index search ...  done [1072.42 sec]
    Freeing index ...  done [0.23 sec]
    Total number of reads mapped (incl. all reads file sections searched): 7773873
    Writing aligned FASTA/FASTQ ...  done [49.31 sec]
    Writing not-aligned FASTA/FASTQ ...  done [66.23 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.48 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
    Loading index part 1/1 ...  done [1.26 sec]
    Begin index search ...  done [1103.96 sec]
    Freeing index ...  done [0.24 sec]
    Total number of reads mapped (incl. all reads file sections searched): 15604331
    Writing aligned FASTA/FASTQ ...  done [51.19 sec]
    Writing not-aligned FASTA/FASTQ ...  done [67.92 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [1.22 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
    Loading index part 1/1 ...  done [1.16 sec]
    Begin index search ...  done [143.66 sec]
    Freeing index ...  done [0.23 sec]
    Total number of reads mapped (incl. all reads file sections searched): 16723991
    Writing aligned FASTA/FASTQ ...  done [7.01 sec]
    Writing not-aligned FASTA/FASTQ ...  done [9.25 sec]

The log file contains the following statistics:

Results:
    Total reads = 39137348
    Total reads passing E-value threshold = 16723991 (42.73%)
    Total reads failing E-value threshold = 22413357 (57.27%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta		42.73%

Conclusion: of the reads in the 293T.fq transcriptome data, 16723991 match eukaryotic 28S rRNA, i.e. 42.73% of the total; the remaining 57.27% do not match eukaryotic 28S rRNA.
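Steps 5 and 6 screen the same 293T.fq against the 18S and 28S databases in two separate runs. Adding 28.25% and 42.73% therefore gives only an upper bound on the combined rRNA fraction, because a read that passed in both runs would be counted twice. One way to get the exact figure is to count distinct reads across the two aligned-read files. The bash sketch below does this. It assumes the aligned output is 4-line FASTQ with a unique header line per read (including each mate of a pair), and the function name is illustrative.

```shell
# Count distinct reads across several aligned-read FASTQ files.
# Usage: count_distinct_rrna_reads TOTAL_READS aligned1 [aligned2 ...]
count_distinct_rrna_reads() {
  local total=$1 n
  shift
  # Take line 1 of every record (FNR restarts for each file); sort -u can
  # spill to disk, so tens of millions of headers need not fit in memory.
  n=$(awk 'FNR % 4 == 1' "$@" | LC_ALL=C sort -u | wc -l)
  awk -v n="$n" -v total="$total" \
    'BEGIN { printf "%d distinct reads (%.2f%% of %d)\n", n, 100 * n / total, total }'
}
```

For 293T, call it with the total of 39137348 reads from the logs above, followed by the aligned-read files of the 18S and 28S runs.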

7. Identifying eukaryotic 18S rRNA in HCT116.fq with SortMeRNA

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-18s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.18s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.18s --log -a 32 -m 6000 -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr


  Computing read file statistics ... done [59.57 sec]
  size of reads file: 16549783582 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 32

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [9.14 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.612551
    Gumbel K = 0.339810
    Minimal SW score based on E-value = 61
    Loading index part 1/1 ...  done [0.85 sec]
    Begin index search ...  done [586.28 sec]
    Freeing index ...  done [0.21 sec]
    Total number of reads mapped (incl. all reads file sections searched): 5737092
    Writing aligned FASTA/FASTQ ...  done [37.03 sec]
    Writing not-aligned FASTA/FASTQ ...  done [78.46 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.53 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
    Loading index part 1/1 ...  done [4.16 sec]
    Begin index search ...  done [610.00 sec]
    Freeing index ...  done [0.20 sec]
    Total number of reads mapped (incl. all reads file sections searched): 11457300
    Writing aligned FASTA/FASTQ ...  done [36.40 sec]
    Writing not-aligned FASTA/FASTQ ...  done [78.48 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [6.54 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
    Loading index part 1/1 ...  done [1.12 sec]
    Begin index search ...  done [364.15 sec]
    Freeing index ...  done [0.22 sec]
    Total number of reads mapped (incl. all reads file sections searched): 15059846
    Writing aligned FASTA/FASTQ ...  done [24.18 sec]
    Writing not-aligned FASTA/FASTQ ...  done [49.55 sec]

The log file contains the following statistics:

Results:
    Total reads = 48208862
    Total reads passing E-value threshold = 15059846 (31.24%)
    Total reads failing E-value threshold = 33149016 (68.76%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta		31.24%

Conclusion: of the reads in the HCT116.fq transcriptome data, 15059846 match eukaryotic 18S rRNA, i.e. 31.24% of the total; the remaining 68.76% do not match eukaryotic 18S rRNA.

8. Identifying eukaryotic 28S rRNA in HCT116.fq with SortMeRNA

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-28s: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.28s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.28s --log -a 32 -m 6000 -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [59.92 sec]
  size of reads file: 16549783582 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 32

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.59 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.612082
    Gumbel K = 0.345772
    Minimal SW score based on E-value = 61
    Loading index part 1/1 ...  done [0.94 sec]
    Begin index search ...  done [702.14 sec]
    Freeing index ...  done [0.24 sec]
    Total number of reads mapped (incl. all reads file sections searched): 7580001
    Writing aligned FASTA/FASTQ ...  done [50.30 sec]
    Writing not-aligned FASTA/FASTQ ...  done [69.12 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.77 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
    Loading index part 1/1 ...  done [1.23 sec]
    Begin index search ...  done [661.03 sec]
    Freeing index ...  done [0.21 sec]
    Total number of reads mapped (incl. all reads file sections searched): 15188940
    Writing aligned FASTA/FASTQ ...  done [53.96 sec]
    Writing not-aligned FASTA/FASTQ ...  done [74.60 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [5.82 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
    Loading index part 1/1 ...  done [1.28 sec]
    Begin index search ...  done [446.02 sec]
    Freeing index ...  done [0.20 sec]
    Total number of reads mapped (incl. all reads file sections searched): 20009110
    Writing aligned FASTA/FASTQ ...  done [30.87 sec]
    Writing not-aligned FASTA/FASTQ ...  done [43.13 sec]

The log file contains the following statistics:

Results:
    Total reads = 48208862
    Total reads passing E-value threshold = 20009110 (41.51%)
    Total reads failing E-value threshold = 28199752 (58.49%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta		41.51%

Conclusion: of the reads in the HCT116.fq transcriptome data, 20009110 match eukaryotic 28S rRNA, i.e. 41.51% of the total; the remaining 58.49% do not match eukaryotic 28S rRNA.
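Steps 5 to 10 run the same sortmerna command for every sample and database. Only the paths change, plus the thread count, which is 16 for 293T and 32 otherwise. The loop below is a sketch that generates these commands from the paths and options used above. run_sortmerna, DRY_RUN and THREADS are illustrative names, and by default the loop only prints the commands. The trailing ':' after the index prefix is kept exactly as in the original commands.

```shell
SORTMERNA=/Data/lizexing/software/sortmerna-2.1b
DATA=/Data/lizexing/projects/xindi/Data/CleanData

# Usage: run_sortmerna SAMPLE LABEL REF_FASTA INDEX_PREFIX
# Prints the command (DRY_RUN=1, the default) or runs it (DRY_RUN=0).
run_sortmerna() {
  local sample=$1 label=$2 fasta=$3 index=$4
  local reads="$DATA/$sample.fq"
  local cmd=(
    "$SORTMERNA/sortmerna"
    --ref "$SORTMERNA/rRNA_databases/$fasta,$SORTMERNA/index/$index:"
    --reads "$reads"
    --aligned "$reads.$label" --sam --num_alignments 1 --fastx
    --other "$reads.non.$label" --log
    -a "${THREADS:-32}" -m 6000 -v
  )
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

for sample in 293T HCT116 HeLa; do
  run_sortmerna "$sample" 18s silva-euk-18s-id95.fasta silva-euk-18s-db
  run_sortmerna "$sample" 28s silva-euk-28s-id98.fasta silva-euk-28s
done
```

Review the printed commands, then set DRY_RUN=0 to execute them. Set THREADS=16 to match the 293T runs.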

9. Identifying eukaryotic 18S rRNA in HeLa.fq with SortMeRNA

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-18s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.18s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.18s --log -a 32 -m 6000 -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [56.90 sec]
  size of reads file: 16920775684 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 32

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [9.49 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.612551
    Gumbel K = 0.339810
    Minimal SW score based on E-value = 61
    Loading index part 1/1 ...  done [0.86 sec]
    Begin index search ...  done [1060.73 sec]
    Freeing index ...  done [0.23 sec]
    Total number of reads mapped (incl. all reads file sections searched): 5923594
    Writing aligned FASTA/FASTQ ...  done [37.81 sec]
    Writing not-aligned FASTA/FASTQ ...  done [77.04 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [12.64 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
    Loading index part 1/1 ...  done [1.12 sec]
    Begin index search ...  done [1005.13 sec]
    Freeing index ...  done [0.22 sec]
    Total number of reads mapped (incl. all reads file sections searched): 11825938
    Writing aligned FASTA/FASTQ ...  done [39.92 sec]
    Writing not-aligned FASTA/FASTQ ...  done [77.70 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [8.70 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
    Loading index part 1/1 ...  done [1.14 sec]
    Begin index search ...  done [705.21 sec]
    Freeing index ...  done [0.22 sec]
    Total number of reads mapped (incl. all reads file sections searched): 15887708
    Writing aligned FASTA/FASTQ ...  done [25.90 sec]
    Writing not-aligned FASTA/FASTQ ...  done [53.30 sec]

The log file contains the following statistics:

Results:
    Total reads = 48978212
    Total reads passing E-value threshold = 15887708 (32.44%)
    Total reads failing E-value threshold = 33090504 (67.56%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 138
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta		32.44%

Conclusion: of the reads in the HeLa.fq transcriptome data, 15887708 match eukaryotic 18S rRNA, i.e. 32.44% of the total; the remaining 67.56% do not match eukaryotic 18S rRNA.

10. Identifying eukaryotic 28S rRNA in HeLa.fq with SortMeRNA

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-28s: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.28s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.28s --log -a 32 -m 6000 -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [55.48 sec]
  size of reads file: 16920775684 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 32

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.65 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.612082
    Gumbel K = 0.345772
    Minimal SW score based on E-value = 61
    Loading index part 1/1 ...  done [0.93 sec]
    Begin index search ...  done [1031.49 sec]
    Freeing index ...  done [0.27 sec]
    Total number of reads mapped (incl. all reads file sections searched): 7873708
    Writing aligned FASTA/FASTQ ...  done [49.52 sec]
    Writing not-aligned FASTA/FASTQ ...  done [65.20 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.59 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
    Loading index part 1/1 ...  done [1.14 sec]
    Begin index search ...  done [864.75 sec]
    Freeing index ...  done [0.26 sec]
    Total number of reads mapped (incl. all reads file sections searched): 15792824
    Writing aligned FASTA/FASTQ ...  done [49.99 sec]
    Writing not-aligned FASTA/FASTQ ...  done [64.71 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [5.92 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
    Loading index part 1/1 ...  done [1.16 sec]
    Begin index search ...  done [594.52 sec]
    Freeing index ...  done [0.23 sec]
    Total number of reads mapped (incl. all reads file sections searched): 21265376
    Writing aligned FASTA/FASTQ ...  done [34.44 sec]
    Writing not-aligned FASTA/FASTQ ...  done [44.39 sec]

The log file shows the following statistics:

 Results:
    Total reads = 48978212
    Total reads passing E-value threshold = 21265376 (43.42%)
    Total reads failing E-value threshold = 27712836 (56.58%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 138
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta		43.42%

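From these statistics: 21265376 reads in the HeLa.fq transcriptome data, 43.42% of all reads, pass the E-value threshold against the eukaryotic 28S rRNA database and are attributed to 28S rRNA; the remaining 56.58% did not match eukaryotic 28S rRNA.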
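The "Total number of reads mapped (incl. all reads file sections searched)" lines are running totals. After section 3 the value equals the final "Total reads passing E-value threshold" (21265376). Differencing the running totals gives the number of reads mapped within each section. The function name in this small sketch is illustrative:

```python
def per_section_counts(cumulative):
    """Convert cumulative 'Total number of reads mapped' values into per-section counts."""
    counts = []
    previous = 0
    for total in cumulative:
        if total < previous:
            raise ValueError("cumulative totals must not decrease")
        counts.append(total - previous)
        previous = total
    return counts


# The three cumulative values printed during the HeLa 28S run above.
HELA_28S_CUMULATIVE = [7873708, 15792824, 21265376]

if __name__ == "__main__":
    print(per_section_counts(HELA_28S_CUMULATIVE))  # [7873708, 7919116, 5472552]
```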

11. Building an index for the 45S rRNA sequence

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./indexdb_rna --ref ./rRNA_databases/U13369.1.fasta,./index/U13369.1-db -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Parameters summary:
    K-mer size: 19
    K-mer interval: 1
    Maximum positions to store per unique K-mer: 10000

  Total number of databases to index: 1

  Begin indexing file ./rRNA_databases/U13369.1.fasta under index name ./index/U13369.1-db:
  Collecting sequence distribution statistics ..  done  [0.000744 sec]

  start index part # 0:
    (1/3) building burst tries .. done  [0.027380 sec]
    (2/3) building CMPH hash .. done  [0.014562 sec]
    (3/3) building position lookup tables .. done [0.012918 sec]
    total number of sequences in this part = 1
      temporary file was here: /tmp/sortmerna_keys_969085.txt
      writing kmer data to ./index/U13369.1-db.kmer_0.dat
      writing burst tries to ./index/U13369.1-db.bursttrie_0.dat
      writing position lookup table to ./index/U13369.1-db.pos_0.dat
      writing nucleotide distribution statistics to ./index/U13369.1-db.stats
    done.
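The indexdb_rna parameters above control which substrings of the reference are indexed: K-mer size 19, K-mer interval 1, and at most 10000 stored positions per unique K-mer. With interval 1, a reference of length L yields L - 19 + 1 overlapping 19-mers. The toy sketch below only illustrates what those three numbers mean, using a plain dictionary. The real index is built from burst tries, a CMPH hash and position lookup tables, as the log shows, and its format is not reproduced here. The function name and the repetitive test sequence are hypothetical.

```python
def kmer_positions(seq, k=19, interval=1, max_positions=10000):
    """Map each k-mer of `seq` to its start positions (toy index, not SortMeRNA's format).

    k, interval and max_positions mirror 'K-mer size', 'K-mer interval' and
    'Maximum positions to store per unique K-mer' in the indexdb_rna log.
    """
    seq = seq.upper()
    index = {}
    for start in range(0, len(seq) - k + 1, interval):
        positions = index.setdefault(seq[start:start + k], [])
        if len(positions) < max_positions:  # cap the stored positions per k-mer
            positions.append(start)
    return index


if __name__ == "__main__":
    toy = "ACGT" * 10  # 40 nt repetitive toy sequence, not real rRNA
    idx = kmer_positions(toy)
    print(len(idx), idx["ACGT" * 4 + "ACG"])  # 4 [0, 4, 8, 12, 16, 20]
```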

12. Using sortmerna to identify eukaryotic 45S rRNA in the 293T.fq data

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta,/Data/lizexing/software/sortmerna-2.1b/index/U13369.1-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.45s --sam --SQ --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.45s --log -a 32 -m 6000 -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [43.55 sec]
  size of reads file: 13479394926 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are output
    Number of threads = 32

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.59 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.580217
    Gumbel K = 0.309932
    Minimal SW score based on E-value = 54
    Loading index part 1/1 ...  done [0.06 sec]
    Begin index search ...  done [168.79 sec]
    Freeing index ...  done [0.01 sec]
    Total number of reads mapped (incl. all reads file sections searched): 13141967
    Writing aligned FASTA/FASTQ ...  done [82.47 sec]
    Writing not-aligned FASTA/FASTQ ...  done [32.56 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.66 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
    Loading index part 1/1 ...  done [0.04 sec]
    Begin index search ...  done [171.32 sec]
    Freeing index ...  done [0.01 sec]
    Total number of reads mapped (incl. all reads file sections searched): 26318920
    Writing aligned FASTA/FASTQ ...  done [82.95 sec]
    Writing not-aligned FASTA/FASTQ ...  done [32.56 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [1.67 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
    Loading index part 1/1 ...  done [0.05 sec]
    Begin index search ...  done [23.57 sec]
    Freeing index ...  done [0.00 sec]
    Total number of reads mapped (incl. all reads file sections searched): 28199824
    Writing aligned FASTA/FASTQ ...  done [11.78 sec]
    Writing not-aligned FASTA/FASTQ ...  done [4.57 sec]

The log file shows the following statistics:

Results:
    Total reads = 39137348
    Total reads passing E-value threshold = 28199824 (72.05%)
    Total reads failing E-value threshold = 10937524 (27.95%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta		72.05%

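From these statistics: 28199824 reads in the 293T.fq transcriptome data, 72.05% of all reads, pass the E-value threshold against the 45S rRNA reference and are attributed to 45S rRNA; the remaining 27.95% did not match 45S rRNA.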
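The log line "partial section(s) to be executed: 3 of size 6291456000 bytes" follows from `-m 6000`, because 6000 × 1024 × 1024 = 6291456000. The value therefore appears to be read as MiB. This sketch assumes the reads file is cut into consecutive sections of that size, which is inferred from these logs and not from the SortMeRNA source. Under that assumption, the number and size of the sections can be estimated before a run. `mmap_sections` is a hypothetical helper.

```python
def mmap_sections(file_size, mem_mb):
    """Split a file of `file_size` bytes into sections of `mem_mb` MiB (last one shorter)."""
    if file_size <= 0 or mem_mb <= 0:
        raise ValueError("file_size and mem_mb must be positive")
    section = mem_mb * 1024 * 1024          # assumption: -m is given in MiB
    n_sections = -(-file_size // section)   # ceiling division on integers
    sizes = [section] * (n_sections - 1)
    sizes.append(file_size - section * (n_sections - 1))
    return section, sizes


if __name__ == "__main__":
    # 293T.fq size and -m value from the run above
    section, sizes = mmap_sections(13479394926, 6000)
    print(section, len(sizes), sizes[-1])  # 6291456000 3 896482926
```

For 293T.fq this gives a last section of roughly 0.9 GB. That is consistent with section 3 finishing its index search far faster (23.57 sec) than sections 1 and 2 (about 170 sec).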

13. Using sortmerna to identify eukaryotic 45S rRNA in the HCT116.fq data

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta,/Data/lizexing/software/sortmerna-2.1b/index/U13369.1-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.45s --sam --SQ --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.45s --log -a 16 -m 6000 -v
  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr


  Computing read file statistics ... done [52.94 sec]
  size of reads file: 16549783582 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.60 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.580217
    Gumbel K = 0.309932
    Minimal SW score based on E-value = 55
    Loading index part 1/1 ...  done [0.04 sec]
    Begin index search ...  done [149.36 sec]
    Freeing index ...  done [0.01 sec]
    Total number of reads mapped (incl. all reads file sections searched): 13211959
    Writing aligned FASTA/FASTQ ...  done [82.95 sec]
    Writing not-aligned FASTA/FASTQ ...  done [32.57 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.60 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
    Loading index part 1/1 ...  done [0.04 sec]
    Begin index search ...  done [150.39 sec]
    Freeing index ...  done [0.00 sec]
    Total number of reads mapped (incl. all reads file sections searched): 26433911
    Writing aligned FASTA/FASTQ ...  done [83.37 sec]
    Writing not-aligned FASTA/FASTQ ...  done [32.61 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [5.45 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
    Loading index part 1/1 ...  done [0.04 sec]
    Begin index search ...  done [85.70 sec]
    Freeing index ...  done [0.00 sec]
    Total number of reads mapped (incl. all reads file sections searched): 34790270
    Writing aligned FASTA/FASTQ ...  done [52.70 sec]
    Writing not-aligned FASTA/FASTQ ...  done [20.33 sec]

The log file shows the following statistics:

 Results:
    Total reads = 48208862
    Total reads passing E-value threshold = 34790270 (72.17%)
    Total reads failing E-value threshold = 13418592 (27.83%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta		72.17%

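From these statistics: 34790270 reads in the HCT116.fq transcriptome data, 72.17% of all reads, pass the E-value threshold against the 45S rRNA reference and are attributed to 45S rRNA; the remaining 27.83% did not match 45S rRNA.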
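The per-sample, per-database commands in this post all follow one template, so they can be generated instead of typed by hand. The sketch below reproduces that template, with paths and flags copied from the commands above. Some assumptions apply:

- Threads default to 16, although some runs used `-a 32`.
- The 18S database is left out, because its index prefix does not appear in this part of the post.
- The function and constant names are hypothetical.

The sketch only prints the commands.

```python
import shlex

SORTMERNA_DIR = "/Data/lizexing/software/sortmerna-2.1b"
DATA_DIR = "/Data/lizexing/projects/xindi/Data/CleanData"

# (reference FASTA, index prefix, output tag) for databases used in this post
DATABASES = [
    ("silva-euk-28s-id98.fasta", "silva-euk-28s", "28s"),
    ("U13369.1.fasta", "U13369.1-db", "45s"),
    ("rfam-5s-database-id98.fasta", "rfam-5s-db", "5s"),
    ("rfam-5.8s-database-id98.fasta", "rfam-5.8s-db", "5.8s"),
]
SAMPLES = ["293T", "HCT116", "HeLa"]


def sortmerna_cmd(sample, fasta, index, tag, threads=16, mem_mb=6000, sq=False):
    """Build a SortMeRNA 2.1b argument list shaped like the commands above."""
    reads = f"{DATA_DIR}/{sample}.fq"
    ref = f"{SORTMERNA_DIR}/rRNA_databases/{fasta},{SORTMERNA_DIR}/index/{index}:"
    cmd = ["./sortmerna", "--ref", ref, "--reads", reads,
           "--aligned", f"{reads}.{tag}", "--sam"]
    if sq:
        cmd.append("--SQ")  # the 45S runs above also requested SQ tags
    cmd += ["--num_alignments", "1", "--fastx",
            "--other", f"{reads}.non.{tag}",
            "--log", "-a", str(threads), "-m", str(mem_mb), "-v"]
    return cmd


if __name__ == "__main__":
    for sample in SAMPLES:
        for fasta, index, tag in DATABASES:
            # Print only; subprocess.run(cmd, cwd=SORTMERNA_DIR, check=True) would execute it.
            print(shlex.join(sortmerna_cmd(sample, fasta, index, tag, sq=(tag == "45s"))))
```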

14. Using sortmerna to identify eukaryotic 45S rRNA in the HeLa.fq data

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta,/Data/lizexing/software/sortmerna-2.1b/index/U13369.1-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.45s --sam --SQ --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.45s --log -a 16 -m 6000 -v
  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr


  Computing read file statistics ... done [54.58 sec]
  size of reads file: 16920775684 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.63 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.580217
    Gumbel K = 0.309932
    Minimal SW score based on E-value = 55
    Loading index part 1/1 ...  done [0.04 sec]
    Begin index search ...  done [169.61 sec]
    Freeing index ...  done [0.01 sec]
    Total number of reads mapped (incl. all reads file sections searched): 14067778
    Writing aligned FASTA/FASTQ ...  done [107.14 sec]
    Writing not-aligned FASTA/FASTQ ...  done [26.67 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.72 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
    Loading index part 1/1 ...  done [0.06 sec]
    Begin index search ...  done [178.96 sec]
    Freeing index ...  done [0.01 sec]
    Total number of reads mapped (incl. all reads file sections searched): 28155528
    Writing aligned FASTA/FASTQ ...  done [88.82 sec]
    Writing not-aligned FASTA/FASTQ ...  done [26.61 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [6.31 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
    Loading index part 1/1 ...  done [0.04 sec]
    Begin index search ...  done [110.25 sec]
    Freeing index ...  done [0.01 sec]
    Total number of reads mapped (incl. all reads file sections searched): 37871016
    Writing aligned FASTA/FASTQ ...  done [61.17 sec]
    Writing not-aligned FASTA/FASTQ ...  done [18.17 sec]

The log file shows the following statistics:

 Results:
    Total reads = 48978212
    Total reads passing E-value threshold = 37871016 (77.32%)
    Total reads failing E-value threshold = 11107196 (22.68%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 138
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta		77.32%

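From these statistics: 37871016 reads in the HeLa.fq transcriptome data, 77.32% of all reads, pass the E-value threshold against the 45S rRNA reference and are attributed to 45S rRNA; the remaining 22.68% did not match 45S rRNA.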
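The three 45S runs can be compared side by side. This sketch only tabulates numbers copied from the logs above; no new data is involved, and `summary_table` is a hypothetical helper.

```python
# (sample, total reads, reads passing the E-value threshold) from the 45S logs above
RESULTS_45S = [
    ("293T", 39137348, 28199824),
    ("HCT116", 48208862, 34790270),
    ("HeLa", 48978212, 37871016),
]


def summary_table(rows):
    """Format totals, 45S-matching reads, their percentage, and the remaining reads."""
    lines = [f"{'sample':<8}{'total':>12}{'45S':>12}{'45S %':>8}{'other':>12}"]
    for sample, total, passing in rows:
        pct = 100 * passing / total
        lines.append(f"{sample:<8}{total:>12}{passing:>12}{pct:>8.2f}{total - passing:>12}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summary_table(RESULTS_45S))
```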

15. Using sortmerna to identify eukaryotic 5S rRNA in the 293T.fq data

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.5s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.5s --log -a 16 -m 6000 -v
  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [48.46 sec]
  size of reads file: 13479394926 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.67 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.616694
    Gumbel K = 0.342032
    Minimal SW score based on E-value = 59
    Loading index part 1/1 ...  done [0.82 sec]
    Begin index search ...  done [62.16 sec]
    Freeing index ...  done [0.10 sec]
    Total number of reads mapped (incl. all reads file sections searched): 45578
    Writing aligned FASTA/FASTQ ...  done [0.49 sec]
    Writing not-aligned FASTA/FASTQ ...  done [113.84 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.76 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
    Loading index part 1/1 ...  done [0.62 sec]
    Begin index search ...  done [63.52 sec]
    Freeing index ...  done [0.08 sec]
    Total number of reads mapped (incl. all reads file sections searched): 91268
    Writing aligned FASTA/FASTQ ...  done [0.50 sec]
    Writing not-aligned FASTA/FASTQ ...  done [114.07 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [1.41 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
    Loading index part 1/1 ...  done [0.61 sec]
    Begin index search ...  done [9.02 sec]
    Freeing index ...  done [0.08 sec]
    Total number of reads mapped (incl. all reads file sections searched): 97847
    Writing aligned FASTA/FASTQ ...  done [0.05 sec]
    Writing not-aligned FASTA/FASTQ ...  done [16.16 sec]

The log file shows the following statistics:

 Results:
    Total reads = 39137348
    Total reads passing E-value threshold = 97847 (0.25%)
    Total reads failing E-value threshold = 39039501 (99.75%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta		0.25%

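From these statistics: 97847 reads in the 293T.fq transcriptome data, 0.25% of all reads, pass the E-value threshold against the 5S rRNA database and are attributed to 5S rRNA; the remaining 99.75% did not match 5S rRNA.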
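With `--fastx` and `--other`, matched reads and remaining reads go to separate files. Counting the records in both files is therefore a quick cross-check against the log. This assumes each read lands in exactly one file and that the files are uncompressed 4-line FASTQ. The helper names are hypothetical, and so are the file names in the usage comments; check the output directory for the names SortMeRNA actually wrote.

```python
def count_fastq_lines(lines):
    """Count FASTQ records in an iterable of lines, assuming 4 lines per record."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def count_fastq_records(path):
    """Count records in an uncompressed FASTQ file."""
    with open(path) as handle:
        return count_fastq_lines(handle)


# Hypothetical file names; adjust to what SortMeRNA wrote in CleanData/:
# aligned = count_fastq_records("293T.fq.5s.fq")
# other = count_fastq_records("293T.fq.non.5s.fq")
# print(aligned, other, aligned + other)  # compare with the log's Results block
```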

15. Using sortmerna to identify eukaryotic 5S rRNA in the HCT116.fq data

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.5s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.5s --log -a 16 -m 6000 -v
  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [53.93 sec]
  size of reads file: 16549783582 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.64 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.616694
    Gumbel K = 0.342032
    Minimal SW score based on E-value = 59
    Loading index part 1/1 ...  done [0.51 sec]
    Begin index search ...  done [58.81 sec]
    Freeing index ...  done [0.11 sec]
    Total number of reads mapped (incl. all reads file sections searched): 30875
    Writing aligned FASTA/FASTQ ...  done [0.30 sec]
    Writing not-aligned FASTA/FASTQ ...  done [114.13 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.79 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
    Loading index part 1/1 ...  done [0.63 sec]
    Begin index search ...  done [59.62 sec]
    Freeing index ...  done [0.09 sec]
    Total number of reads mapped (incl. all reads file sections searched): 61531
    Writing aligned FASTA/FASTQ ...  done [0.32 sec]
    Writing not-aligned FASTA/FASTQ ...  done [114.20 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [5.51 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
    Loading index part 1/1 ...  done [0.62 sec]
    Begin index search ...  done [36.67 sec]
    Freeing index ...  done [0.09 sec]
    Total number of reads mapped (incl. all reads file sections searched): 80619
    Writing aligned FASTA/FASTQ ...  done [0.16 sec]
    Writing not-aligned FASTA/FASTQ ...  done [71.82 sec]

The log file shows the following statistics:

 Results:
    Total reads = 48208862
    Total reads passing E-value threshold = 80619 (0.17%)
    Total reads failing E-value threshold = 48128243 (99.83%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta		0.17%

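From these statistics: 80619 reads in the HCT116.fq transcriptome data, 0.17% of all reads, pass the E-value threshold against the 5S rRNA database and are attributed to 5S rRNA; the remaining 99.83% did not match 5S rRNA.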
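Each run's log also reports "Gumbel lambda", "Gumbel K" and a "Minimal SW score based on E-value". The standard Karlin-Altschul relation E = K·m·n·e^(−λS) links these values. For search-space sizes m and n, the minimal score is the smallest S whose E-value does not exceed the threshold. The log does not show how SortMeRNA defines m and n, or which threshold it used. In the sketch below, m, n and the threshold of 1.0 are therefore hypothetical, and the result is not expected to reproduce the logged value of 59.

```python
import math


def evalue(score, lam, K, m, n):
    """Karlin-Altschul expected number of chance hits scoring at least `score`."""
    return K * m * n * math.exp(-lam * score)


def min_score(lam, K, m, n, threshold=1.0):
    """Smallest integer score whose E-value is <= threshold."""
    if min(lam, K, m, n, threshold) <= 0:
        raise ValueError("all parameters must be positive")
    s = math.ceil(math.log(K * m * n / threshold) / lam)
    # Nudge s to absorb floating-point error right at the boundary.
    while evalue(s, lam, K, m, n) > threshold:
        s += 1
    while evalue(s - 1, lam, K, m, n) <= threshold:
        s -= 1
    return s


if __name__ == "__main__":
    lam, K = 0.616694, 0.342032          # Gumbel parameters from the HCT116 5S run
    m, n = 1_000_000, 6_000_000_000      # HYPOTHETICAL search-space sizes
    s = min_score(lam, K, m, n)
    print(s, evalue(s, lam, K, m, n))
```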

16. Using sortmerna to identify eukaryotic 5S rRNA in the HeLa.fq data

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.5s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.5s --log -a 16 -m 6000 -v
  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [54.73 sec]
  size of reads file: 16920775684 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.58 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.616694
    Gumbel K = 0.342032
    Minimal SW score based on E-value = 59
    Loading index part 1/1 ...  done [0.46 sec]
    Begin index search ...  done [53.51 sec]
    Freeing index ...  done [0.09 sec]
    Total number of reads mapped (incl. all reads file sections searched): 31476
    Writing aligned FASTA/FASTQ ...  done [0.26 sec]
    Writing not-aligned FASTA/FASTQ ...  done [114.45 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.94 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
    Loading index part 1/1 ...  done [0.61 sec]
    Begin index search ...  done [54.88 sec]
    Freeing index ...  done [0.09 sec]
    Total number of reads mapped (incl. all reads file sections searched): 63234
    Writing aligned FASTA/FASTQ ...  done [0.34 sec]
    Writing not-aligned FASTA/FASTQ ...  done [114.15 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [5.90 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
    Loading index part 1/1 ...  done [0.61 sec]
    Begin index search ...  done [39.22 sec]
    Freeing index ...  done [0.09 sec]
    Total number of reads mapped (incl. all reads file sections searched): 85138
    Writing aligned FASTA/FASTQ ...  done [0.21 sec]
    Writing not-aligned FASTA/FASTQ ...  done [78.33 sec]

The log file shows the following statistics:

 Results:
    Total reads = 48978212
    Total reads passing E-value threshold = 85138 (0.17%)
    Total reads failing E-value threshold = 48893074 (99.83%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 138
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta		0.17%

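From these statistics: 85138 reads in the HeLa.fq transcriptome data, 0.17% of all reads, pass the E-value threshold against the 5S rRNA database and are attributed to 5S rRNA; the remaining 99.83% did not match 5S rRNA.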
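Each run above searched the complete read set against a single database. The per-database percentages for one sample therefore cannot simply be added. For HeLa, the figures reported in this post add up to more than 100%, as this short sketch shows (names are illustrative):

```python
# Reads passing the E-value threshold in each HeLa run reported in this post
HELA_TOTAL = 48978212
HELA_PASSING = {
    "18S": 15887708,
    "28S": 21265376,
    "45S": 37871016,
    "5S": 85138,
}


def percent(count, total=HELA_TOTAL):
    """Percentage of all HeLa reads."""
    return 100 * count / total


if __name__ == "__main__":
    for db, count in HELA_PASSING.items():
        print(f"{db:>4}: {percent(count):6.2f}%")
    # Adding separate runs double-counts reads that match several databases.
    print(f"naive sum: {percent(sum(HELA_PASSING.values())):.2f}%")
```

Together, the 18S and 28S counts (75.86%) come close to the 45S figure (77.32%). This is consistent with the 45S precursor containing the 18S, 5.8S and 28S sequences.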

17. Using sortmerna to identify eukaryotic 5.8S rRNA in the 293T.fq data

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5.8s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.5.8s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.5.8s --log -a 16 -m 6000 -v
  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [43.59 sec]
  size of reads file: 13479394926 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.60 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.617555
    Gumbel K = 0.343861
    Minimal SW score based on E-value = 57
    Loading index part 1/1 ...  done [0.50 sec]
    Begin index search ...  done [36.39 sec]
    Freeing index ...  done [0.03 sec]
    Total number of reads mapped (incl. all reads file sections searched): 316326
    Writing aligned FASTA/FASTQ ...  done [2.19 sec]
    Writing not-aligned FASTA/FASTQ ...  done [112.72 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.83 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
    Loading index part 1/1 ...  done [0.21 sec]
    Begin index search ...  done [35.90 sec]
    Freeing index ...  done [0.04 sec]
    Total number of reads mapped (incl. all reads file sections searched): 632442
    Writing aligned FASTA/FASTQ ...  done [2.15 sec]
    Writing not-aligned FASTA/FASTQ ...  done [112.38 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [1.49 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
    Loading index part 1/1 ...  done [0.21 sec]
    Begin index search ...  done [5.57 sec]
    Freeing index ...  done [0.03 sec]
    Total number of reads mapped (incl. all reads file sections searched): 677733
    Writing aligned FASTA/FASTQ ...  done [0.31 sec]
    Writing not-aligned FASTA/FASTQ ...  done [16.08 sec]

The log file shows the following statistics:

 Results:
    Total reads = 39137348
    Total reads passing E-value threshold = 677733 (1.73%)
    Total reads failing E-value threshold = 38459615 (98.27%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta		1.73%

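From these statistics: 677733 reads in the 293T.fq transcriptome data, 1.73% of all reads, pass the E-value threshold against the 5.8S rRNA database and are attributed to 5.8S rRNA; the remaining 98.27% did not match 5.8S rRNA.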
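To count the distinct reads that match at least one rRNA database, one option is to take the union of read IDs across the aligned FASTQ files from the separate runs. Another option is a single SortMeRNA run that lists several `fasta,index` pairs joined by `:` in `--ref`, which the 2.x syntax allows. The sketch below shows the union approach. The function names are hypothetical, the code assumes 4-line FASTQ records, and holding tens of millions of IDs in a Python set may need several GB of RAM.

```python
def fastq_ids(lines):
    """Yield read IDs (header text after '@' up to the first whitespace)."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each 4-line record
            if not line.startswith("@"):
                raise ValueError(f"expected '@' header at line {i + 1}")
            fields = line[1:].split()
            yield fields[0] if fields else ""


def union_ids(line_sources):
    """Return the set of read IDs present in any of the given FASTQ line iterables."""
    ids = set()
    for lines in line_sources:
        ids.update(fastq_ids(lines))
    return ids


def union_from_files(paths):
    """Same as union_ids, reading each uncompressed FASTQ file in turn."""
    ids = set()
    for path in paths:
        with open(path) as handle:
            ids.update(fastq_ids(handle))
    return ids
```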

18. Using sortmerna to identify eukaryotic 5.8S rRNA in the HCT116.fq data

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5.8s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.5.8s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.5.8s --log -a 16 -m 6000 -v
  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [53.45 sec]
  size of reads file: 16549783582 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.58 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.617555
    Gumbel K = 0.343861
    Minimal SW score based on E-value = 57
    Loading index part 1/1 ...  done [0.15 sec]
    Begin index search ...  done [34.64 sec]
    Freeing index ...  done [0.03 sec]
    Total number of reads mapped (incl. all reads file sections searched): 356496
    Writing aligned FASTA/FASTQ ...  done [2.51 sec]
    Writing not-aligned FASTA/FASTQ ...  done [112.73 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.61 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
    Loading index part 1/1 ...  done [0.21 sec]
    Begin index search ...  done [34.83 sec]
    Freeing index ...  done [0.03 sec]
    Total number of reads mapped (incl. all reads file sections searched): 713138
    Writing aligned FASTA/FASTQ ...  done [2.50 sec]
    Writing not-aligned FASTA/FASTQ ...  done [112.49 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [5.43 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
    Loading index part 1/1 ...  done [0.20 sec]
    Begin index search ...  done [22.51 sec]
    Freeing index ...  done [0.03 sec]
    Total number of reads mapped (incl. all reads file sections searched): 937341
    Writing aligned FASTA/FASTQ ...  done [1.52 sec]
    Writing not-aligned FASTA/FASTQ ...  done [70.75 sec]

Opening the log file shows the following statistics:

 Results:
    Total reads = 48208862
    Total reads passing E-value threshold = 937341 (1.94%)
    Total reads failing E-value threshold = 47271521 (98.06%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 137
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta		1.94%

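Summary: in the HCT116.fq transcriptome data, 937341 reads pass the E-value threshold against the eukaryotic 5.8S rRNA database, so 1.94% of all reads derive from 5.8S rRNA. The remaining 98.06% are not 5.8S rRNA reads.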

19. Identify eukaryotic 5.8S rRNA in the HeLa.fq sequencing data with SortMeRNA

(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5.8s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.5.8s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.5.8s --log -a 16 -m 6000 -v

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [54.99 sec]
  size of reads file: 16920775684 bytes
  partial section(s) to be executed: 3 of size 6291456000 bytes
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 16

  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [8.51 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.617555
    Gumbel K = 0.343861
    Minimal SW score based on E-value = 57
    Loading index part 1/1 ...  done [0.15 sec]
    Begin index search ...  done [32.53 sec]
    Freeing index ...  done [0.04 sec]
    Total number of reads mapped (incl. all reads file sections searched): 368962
    Writing aligned FASTA/FASTQ ...  done [2.52 sec]
    Writing not-aligned FASTA/FASTQ ...  done [111.97 sec]

  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [8.67 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
    Loading index part 1/1 ...  done [0.21 sec]
    Begin index search ...  done [32.81 sec]
    Freeing index ...  done [0.03 sec]
    Total number of reads mapped (incl. all reads file sections searched): 735202
    Writing aligned FASTA/FASTQ ...  done [2.49 sec]
    Writing not-aligned FASTA/FASTQ ...  done [112.03 sec]

  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [6.10 sec]

  Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
    Loading index part 1/1 ...  done [0.21 sec]
    Begin index search ...  done [22.22 sec]
    Freeing index ...  done [0.03 sec]
    Total number of reads mapped (incl. all reads file sections searched): 988048
    Writing aligned FASTA/FASTQ ...  done [1.70 sec]
    Writing not-aligned FASTA/FASTQ ...  done [77.04 sec]

Opening the log file shows the following statistics:

 Results:
    Total reads = 48978212
    Total reads passing E-value threshold = 988048 (2.02%)
    Total reads failing E-value threshold = 47990164 (97.98%)
    Minimum read length = 92
    Maximum read length = 141
    Mean read length = 138
 By database:
    /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta		2.02%

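Summary: in the HeLa.fq transcriptome data, 988048 reads pass the E-value threshold against the eukaryotic 5.8S rRNA database, so 2.02% of all reads derive from 5.8S rRNA. The remaining 97.98% are not 5.8S rRNA reads.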
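The three SortMeRNA runs above use the same command and differ only in the sample name. The sketch below wraps them in one loop, reusing the paths and flags from those commands, including the trailing ":" in the --ref value. The run helper, the DRY_RUN switch and the function name are hypothetical additions. With DRY_RUN=1 set, the commands are printed instead of executed.

```bash
#!/usr/bin/env bash
set -euo pipefail

SMR_HOME=/Data/lizexing/software/sortmerna-2.1b
DATA=/Data/lizexing/projects/xindi/Data/CleanData
DB="${SMR_HOME}/rRNA_databases/rfam-5.8s-database-id98.fasta"
IDX="${SMR_HOME}/index/rfam-5.8s-db"

# run CMD...: execute CMD, or only print it when DRY_RUN is set (hypothetical helper)
run() { if [[ -n "${DRY_RUN:-}" ]]; then echo "$*"; else "$@"; fi; }

# Screen every sample against the Rfam 5.8S database with the options used above.
sortmerna_5_8s_all() {
  local s
  for s in 293T HCT116 HeLa; do
    # --ref value copied from the commands above, trailing ":" included
    run "${SMR_HOME}/sortmerna" \
      --ref "${DB},${IDX}:" \
      --reads "${DATA}/${s}.fq" \
      --aligned "${DATA}/${s}.fq.5.8s" \
      --sam --num_alignments 1 --fastx \
      --other "${DATA}/${s}.fq.non.5.8s" \
      --log -a 16 -m 6000 -v
  done
}
```

Run DRY_RUN=1 sortmerna_5_8s_all first to review the three commands. Then call sortmerna_5_8s_all, for example from a script started with nohup in the background.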

20. Convert the three datasets to bigWig (bw) format with the following script, script_1

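#!/bin/bash
# The line above declares that this script uses bash syntax, so bash and its environment settings are loaded when it runs.
# Program
#     This program is used for Xindi data analysis.
# History
#     2021/08/26       zexing            First release

# Stop at the first failing command instead of continuing with missing files
set -euo pipefail

# ${dir}: the working directory used throughout
dir=/Data/lizexing/projects/xindi/Data/CleanData

# Process each sample in turn
for i in 293T.fq.45s HCT116.fq.45s HeLa.fq.45s
do
# Convert SAM to BAM (-b: BAM output, -1: fast compression; -S is ignored by recent samtools, which detect SAM input automatically)
samtools view -@ 16 -S "${dir}/${i}.sam" -1b -o "${dir}/${i}.bam"

# Sort the alignments by coordinate
samtools sort -@ 16 -l 5 -o "${dir}/${i}.bam.sort" "${dir}/${i}.bam"

# Build the BAM index (.bai) that bamCoverage needs
samtools index -@ 16 "${dir}/${i}.bam.sort"

# Convert the sorted BAM into a bigWig coverage track with deepTools bamCoverage
bamCoverage -p 16 -v -b "${dir}/${i}.bam.sort" -o "${dir}/${i}.bam.sort.bw"

done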

Run script_1 in the background:

nohup bash script_1 > script_1_log &

21. Quality-control the three datasets with Trim Galore, discarding reads shorter than 20 bp

Reference: Trim Galore, a QC tool that detects adapters automatically
Parameters:

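-q/--quality: Phred quality score threshold for trimming low-quality bases; default 20.
--phred33: use --phred33 or --phred64 to match the Phred quality encoding of the sequencing platform.
--adapter: adapter sequence to trim. It can be omitted, in which case Trim Galore! auto-detects the most likely adapter. Auto-detection chooses among three adapter types, which can also be set explicitly with --illumina, --nextera or --small_rna.
--stringency: minimum overlap (in bp) with the adapter sequence required before a read end is trimmed. The default of 1 is very stringent. It can be relaxed moderately, since the sequencer almost never reads into the adapter at the far end.
--length: minimum read length; reads shorter than this after trimming are discarded.
-e: maximum allowed error rate when matching the adapter; default 0.1.
-j/--cores: number of cores used for trimming.
--paired: for paired-end data, if one read of a pair is discarded, its mate is discarded too, whether or not it passes the filters.
--retain_unpaired: for paired-end data, if one read of a pair passes the filters but its mate is discarded, the passing read is saved to a separate file.
--gzip / --dont_gzip: compress, or do not compress, the cleaned output files.
--output_dir: output directory. Create it beforehand, otherwise the run may fail.
--trim-n: remove N bases from either end of each read.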

1. Process the HeLa data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data2.fq.gz

2. Process the HCT116 data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2.fq.gz

3. Process the 293T data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 --paired /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2.fq.gz
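The three trim_galore calls above differ only in the sample name, although the 293T call omits -j 16. The sketch below runs the same step as a loop, assuming the file naming shown above. The run helper, the DRY_RUN switch and the function name are hypothetical. --output_dir is added here so that the *_val_1.fq.gz and *_val_2.fq.gz files land in the CleanData directory that the STAR steps read from.

```bash
#!/usr/bin/env bash
set -euo pipefail

DATA=/Data/lizexing/projects/xindi/Data/new/Data/CleanData

# run CMD...: execute CMD, or only print it when DRY_RUN is set (hypothetical helper)
run() { if [[ -n "${DRY_RUN:-}" ]]; then echo "$*"; else "$@"; fi; }

# Quality and adapter trimming per paired-end sample; with --paired, a pair is
# dropped when either read ends up shorter than --length (20 bp here).
trim_all() {
  local s
  for s in HeLa HCT116 293T; do
    run trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 \
      --output_dir "$DATA" \
      --paired "${DATA}/T_${s}_T_PAR_CLIP_Clean_Data1.fq.gz" \
               "${DATA}/T_${s}_T_PAR_CLIP_Clean_Data2.fq.gz"
  done
}
```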

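22. Convert the 45S and 5S rRNA GFF3 annotation files to GTF with gffread 0.12.1

Reference: gffcompare and gffread. In the commands below, -T writes GTF instead of the default GFF3, and -o sets the output file.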

Usage: gffread <input_gff> [-g <genomic_seqs_fasta> | <dir>][-s <seq_info.fsize>]
 [-o <outfile>] [-t <trackname>] [-r [[<strand>]<chr>:]<start>..<end> [-R]]
 [-CTVNJMKQAFPGUBHZWTOLE] [-w <exons.fa>] [-x <cds.fa>] [-y <tr_cds.fa>]
 [-i <maxintron>] [--stream] [--bed] [--table <attrlist>] [--sort-by <ref.lst>]

(base) lizexing@bio:~/reference/h_45S_rDNA$ gffread U13369.1.gff3 -T -o U13369.1.gtf
(base) lizexing@bio:~/reference/h_5S_rDNA$ gffread NR_023363.1.gff3 -T -o NR_023363.1.gtf
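STAR reads annotated splice junctions from the exon lines of the GTF. The feature name is set by --sjdbGTFfeatureExon (default exon), and exons are grouped by the transcript_id attribute (--sjdbGTFtagExonParentTranscript). The hypothetical helper below is a quick sanity check on the gffread output. It counts the exon lines and how many of them carry a transcript_id.

```bash
#!/usr/bin/env bash
set -euo pipefail

# gtf_exon_summary GTF   (hypothetical helper)
# Prints "<exon lines> <exon lines with a transcript_id attribute>".
# A result of "0 0" would mean STAR finds no annotated exons in the file.
gtf_exon_summary() {
  awk -F'\t' '
    /^#/ { next }                        # skip comment lines
    $3 == "exon" {
      n++
      if ($9 ~ /transcript_id "/) t++
    }
    END { printf "%d %d\n", n, t }' "$1"
}
```

For example, run gtf_exon_summary /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf before passing the file to --sjdbGTFfile.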

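23. Align the three datasets to 45S rRNA with STAR

Reference: Using the STAR aligner

Step 1 - Build a 45S rRNA index

--runThreadN: number of CPU threads to use;
--genomeDir: output directory for the index files;
--genomeFastaFiles: path to the genome (reference) FASTA file(s)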

(base) lizexing@bio:~$ STAR  --runMode genomeGenerate --runThreadN 16 --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --genomeFastaFiles /Data/lizexing/reference/h_45S_rDNA/U13369.1.fasta
Sep 05 14:14:23 ..... started STAR run
Sep 05 14:14:23 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=42999, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 6
Sep 05 14:14:23 ... starting to sort Suffix Array. This may take a long time...
Sep 05 14:14:23 ... sorting Suffix Array chunks and saving them to disk...
Sep 05 14:14:23 ... loading chunks from disk, packing SA...
Sep 05 14:14:23 ... finished generating suffix array
Sep 05 14:14:23 ... generating Suffix Array index
Sep 05 14:14:26 ... completed Suffix Array index
Sep 05 14:14:26 ... writing Genome to disk ...
Sep 05 14:14:26 ... writing Suffix Array to disk ...
Sep 05 14:14:26 ... writing SAindex to disk
Sep 05 14:14:28 ..... finished successfully
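The warning above recommends rebuilding with --genomeSAindexNbases 6, because this index was generated with the default of 14. For small genomes, the STAR manual recommends min(14, log2(GenomeLength)/2 - 1). That formula gives 6 for the 42,999 bp U13369.1 sequence and 2 for the 121 bp 5S rRNA reference used later, matching both warnings. The two helper functions below are hypothetical sketches of that calculation.

```bash
#!/usr/bin/env bash
set -euo pipefail

# fasta_length FASTA: total number of bases, header lines (">") excluded
fasta_length() {
  awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) } END { printf "%.0f\n", total }' "$1"
}

# sa_index_nbases LENGTH: min(14, floor(log2(LENGTH) / 2 - 1)), the STAR manual's
# recommendation for --genomeSAindexNbases on small genomes
sa_index_nbases() {
  awk -v L="$1" 'BEGIN {
    n = int(log(L) / log(2) / 2 - 1)   # awk log() is the natural log, so divide by log(2)
    if (n > 14) n = 14
    print n
  }'
}
```

Pass the result of sa_index_nbases "$(fasta_length U13369.1.fasta)" to --genomeSAindexNbases when re-running the genomeGenerate command above.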

Step 2 - STAR alignment usage and output files

Usage: STAR  [options]... --genomeDir /path/to/genome/index/   --readFilesIn R1.fq R2.fq
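--runThreadN 40: number of threads.
--runMode alignReads: alignment mode.
--readFilesCommand zcat: tells STAR that the FASTQ files are gzip-compressed (.gz). Without it, STAR reports an error.
--quantMode TranscriptomeSAM GeneCounts: TranscriptomeSAM also writes alignments projected onto transcript sequences (Aligned.toTranscriptome.out.bam). GeneCounts counts reads per gene (ReadsPerGene.out.tab).
--sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf: the matching annotation file.
--twopassMode Basic: align once against the index, insert the novel splice junctions found in the first pass into the index, then align a second time. This gives more accurate alignments but costs more time and memory.
--outSAMtype BAM Unsorted: write an unsorted BAM file. Without this option STAR writes SAM only.
--outSAMunmapped None: controls output of unmapped reads in the SAM/BAM file. None (the default) omits them; Within keeps them in the main alignment file.
--genomeDir /gpfs/home/fangy04/downloads/STAR_index/GRCh38/: index directory.
--readFilesIn /gpfs/home/fangy04/downloads/SRR8112732_1.fastq.gz /gpfs/home/fangy04/downloads/SRR8112732_2.fastq.gz: the two FASTQ files of a paired-end run.
--outFileNamePrefix DRB_TT_seq_SRR8112732: prefix for the output file names.
--outReadsUnmapped Fastx: write unmapped and partially mapped reads (i.e. only one mate of a pair mapped) to separate FASTA/FASTQ files, Unmapped.out.mate1/2.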
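The sketch below applies the alignment options above to the three trimmed samples in one loop, using the input paths from Steps 3-5. In a real shell command a # comment cannot follow a line-continuation backslash, so the options are documented above the function instead. The run helper, the DRY_RUN switch, the function name and its arguments are hypothetical. The output prefix is made absolute here so that the results land in CleanData regardless of the current directory.

```bash
#!/usr/bin/env bash
set -euo pipefail

DATA=/Data/lizexing/projects/xindi/Data/new/Data/CleanData

# run CMD...: execute CMD, or only print it when DRY_RUN is set (hypothetical helper)
run() { if [[ -n "${DRY_RUN:-}" ]]; then echo "$*"; else "$@"; fi; }

# star_align_all INDEX_DIR GTF TAG   (hypothetical)
#   INDEX_DIR: STAR index built with --runMode genomeGenerate
#   GTF:       annotation converted with gffread
#   TAG:       inserted into the output prefix ("" gives 293T-val, "-5S" gives 293T-5S-val)
star_align_all() {
  local index_dir=$1 gtf=$2 tag=$3 s
  for s in 293T HCT116 HeLa; do
    run STAR --runThreadN 40 --runMode alignReads \
      --readFilesCommand zcat \
      --quantMode TranscriptomeSAM GeneCounts \
      --sjdbGTFfile "$gtf" \
      --twopassMode Basic \
      --outSAMtype BAM Unsorted \
      --genomeDir "$index_dir" \
      --readFilesIn "${DATA}/T_${s}_T_PAR_CLIP_Clean_Data1_val_1.fq.gz" \
                    "${DATA}/T_${s}_T_PAR_CLIP_Clean_Data2_val_2.fq.gz" \
      --outFileNamePrefix "${DATA}/${s}${tag}-val" \
      --outReadsUnmapped Fastx
  done
}
```

For example, star_align_all /Data/lizexing/reference/h_45S_rDNA/ /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf "" covers the 45S runs. For 5S, pass the h_5S_rDNA directory, NR_023363.1.gtf and -5S.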

9216920116 Jun 28 17:06 DRB_TT_seq_SRR8112732Aligned.out.bam # the main output; used later for duplicate removal and sorting
1166235552 Jun 28 17:06 DRB_TT_seq_SRR8112732Aligned.toTranscriptome.out.bam # reads aligned to transcripts, in transcript coordinates
2034 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.final.out
20188 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.out
2571 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.progress.out
1585521 Jun 28 17:06 DRB_TT_seq_SRR8112732ReadsPerGene.out.tab # per-gene read counts (--quantMode GeneCounts)
6732305 Jun 28 17:06 DRB_TT_seq_SRR8112732SJ.out.tab # splice junctions
8192 Jun 28 16:51 DRB_TT_seq_SRR8112732_STARgenome
8192 Jun 28 16:51 DRB_TT_seq_SRR8112732_STARpass1

Step 3 - Align the 293T sequencing data

(base) lizexing@bio:~$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix 293T-val --outReadsUnmapped Fastx
Sep 11 15:57:29 ..... started STAR run
Sep 11 15:57:29 ..... loading genome
Sep 11 15:57:31 ..... processing annotations GTF
Sep 11 15:57:31 ..... started 1st pass mapping
Sep 11 16:01:20 ..... finished 1st pass mapping
Sep 11 16:01:20 ..... inserting junctions into the genome indices
Sep 11 16:02:05 ..... started mapping
Sep 11 16:17:08 ..... finished mapping
Sep 11 16:17:08 ..... finished successfully

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T-valLog.final.out
                                 Started job on |       Sep 05 15:10:34
                             Started mapping on |       Sep 05 15:13:46
                                    Finished on |       Sep 05 15:22:49
       Mapping speed, Million of reads per hour |       129.71

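                          Number of input reads |       19564161           # reads in the input FASTQ files (read pairs for paired-end data)
                      Average input read length |       275                # read length (both mates combined for paired-end data)
                                    UNIQUE READS:                          # uniquely mapped reads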
                   Uniquely mapped reads number |       11634481     
                        Uniquely mapped reads % |       59.47%
                          Average mapped length |       273.27
                       Number of splices: Total |       371950             # number of splices
            Number of splices: Annotated (sjdb) |       303947
                       Number of splices: GT/AG |       248969
                       Number of splices: GC/AG |       10039
                       Number of splices: AT/AC |       62
               Number of splices: Non-canonical |       112880              # non-canonical splices
                      Mismatch rate per base, % |       0.33%
                         Deletion rate per base |       0.06%
                        Deletion average length |       1.16
                        Insertion rate per base |       0.20%
                       Insertion average length |       3.50
                             MULTI-MAPPING READS:                           # multi-mapping reads
        Number of reads mapped to multiple loci |       1184998
             % of reads mapped to multiple loci |       6.06%
        Number of reads mapped to too many loci |       1350
             % of reads mapped to too many loci |       0.01%
                                  UNMAPPED READS:                           # unmapped reads
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       847845
                 % of reads unmapped: too short |       4.33%
                Number of reads unmapped: other |       5895487
                     % of reads unmapped: other |       30.13%
                                  CHIMERIC READS:                           # chimeric reads
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

Step 4 - Align the HCT116 sequencing data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HCT116-val --outReadsUnmapped Fastx
Sep 05 15:29:57 ..... started STAR run
Sep 05 15:29:57 ..... loading genome
Sep 05 15:29:58 ..... processing annotations GTF
Sep 05 15:29:58 ..... started 1st pass mapping
Sep 05 15:32:59 ..... finished 1st pass mapping
Sep 05 15:32:59 ..... inserting junctions into the genome indices
Sep 05 15:33:44 ..... started mapping
Sep 05 15:44:53 ..... finished mapping
Sep 05 15:44:53 ..... finished successfully

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116-valLog.final.out
                                 Started job on |       Sep 05 15:29:57
                             Started mapping on |       Sep 05 15:33:44
                                    Finished on |       Sep 05 15:44:53
       Mapping speed, Million of reads per hour |       129.69

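                          Number of input reads |       24101481         # reads in the input FASTQ files (read pairs for paired-end data)
                      Average input read length |       274              # read length (both mates combined for paired-end data)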
                                    UNIQUE READS:
                   Uniquely mapped reads number |       14599303         # uniquely mapped reads
                        Uniquely mapped reads % |       60.57%
                          Average mapped length |       272.26
                       Number of splices: Total |       554747           # number of splices
            Number of splices: Annotated (sjdb) |       458962
                       Number of splices: GT/AG |       478482
                       Number of splices: GC/AG |       16214
                       Number of splices: AT/AC |       123
               Number of splices: Non-canonical |       59928            # non-canonical splices
                      Mismatch rate per base, % |       0.34%
                         Deletion rate per base |       0.06%
                        Deletion average length |       1.12
                        Insertion rate per base |       0.20%
                       Insertion average length |       3.30
                             MULTI-MAPPING READS:                         # multi-mapping reads
        Number of reads mapped to multiple loci |       1191559
             % of reads mapped to multiple loci |       4.94%
        Number of reads mapped to too many loci |       2491
             % of reads mapped to too many loci |       0.01%
                                  UNMAPPED READS:                         # unmapped reads
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       1180944
                 % of reads unmapped: too short |       4.90%
                Number of reads unmapped: other |       7127184
                     % of reads unmapped: other |       29.57%
                                  CHIMERIC READS:                         # chimeric reads
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

Step 5 - Align the HeLa sequencing data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HeLa-val --outReadsUnmapped Fastx
Sep 05 15:53:18 ..... started STAR run
Sep 05 15:53:18 ..... loading genome
Sep 05 15:53:19 ..... processing annotations GTF
Sep 05 15:53:19 ..... started 1st pass mapping
Sep 05 15:56:12 ..... finished 1st pass mapping
Sep 05 15:56:12 ..... inserting junctions into the genome indices
Sep 05 15:56:57 ..... started mapping
Sep 05 16:08:07 ..... finished mapping
Sep 05 16:08:07 ..... finished successfully

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa-valLog.final.out
                                 Started job on |       Sep 05 15:53:18
                             Started mapping on |       Sep 05 15:56:57
                                    Finished on |       Sep 05 16:08:07
       Mapping speed, Million of reads per hour |       131.55

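                          Number of input reads |       24483719        # reads in the input FASTQ files (read pairs for paired-end data)
                      Average input read length |       276             # read length (both mates combined for paired-end data)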
                                    UNIQUE READS:
                   Uniquely mapped reads number |       16030941        # uniquely mapped reads
                        Uniquely mapped reads % |       65.48%
                          Average mapped length |       274.45
                       Number of splices: Total |       550274          # number of splices
            Number of splices: Annotated (sjdb) |       422196
                       Number of splices: GT/AG |       430913
                       Number of splices: GC/AG |       7196
                       Number of splices: AT/AC |       41
               Number of splices: Non-canonical |       112124          # non-canonical splices
                      Mismatch rate per base, % |       0.31%
                         Deletion rate per base |       0.05%
                        Deletion average length |       1.11
                        Insertion rate per base |       0.18%
                       Insertion average length |       3.11
                             MULTI-MAPPING READS:                        # multi-mapping reads
        Number of reads mapped to multiple loci |       1519317
             % of reads mapped to multiple loci |       6.21%
        Number of reads mapped to too many loci |       1252
             % of reads mapped to too many loci |       0.01%
                                  UNMAPPED READS:                        # unmapped reads
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       1008044
                 % of reads unmapped: too short |       4.12%
                Number of reads unmapped: other |       5924165
                     % of reads unmapped: other |       24.20%
                                  CHIMERIC READS:                        # chimeric reads
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%
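Log.final.out is a two-column text file separated by "|", so the headline numbers for each sample can be collected with awk. The sketch below assumes the field labels shown above and recomputes the percentages from the raw counts with more decimal places. That helps when STAR rounds a value to 0.00%, as happens in the 5S alignments later. The function name star_summary is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# star_summary LOG_FINAL_OUT   (hypothetical helper)
# Prints: input reads, uniquely mapped reads, unique %, multi-mapped reads, multi %.
star_summary() {
  awk -F'|' '
    {
      key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key)   # field label, trimmed
      val = $2 + 0                                 # leading number of the value column
    }
    key == "Number of input reads"                   { total = val }
    key == "Uniquely mapped reads number"            { uniq  = val }
    key == "Number of reads mapped to multiple loci" { multi = val }
    END {
      if (total == 0) exit 1   # labels not found
      printf "%d\t%d\t%.6f%%\t%d\t%.6f%%\n", total, uniq, 100 * uniq / total, multi, 100 * multi / total
    }' "$1"
}
```

For example, call it in turn on 293T-valLog.final.out, HCT116-valLog.final.out and HeLa-valLog.final.out to get one summary line per sample.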

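23. Align the three datasets to 5S rRNA with STAR

Reference: Using the STAR aligner

Step 1 - Build a 5S rRNA index

--runThreadN: number of CPU threads to use;
--genomeDir: output directory for the index files;
--genomeFastaFiles: path to the genome (reference) FASTA file(s)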

(base) lizexing@bio:~$ STAR  --runMode genomeGenerate --runThreadN 32 --genomeDir /Data/lizexing/reference/h_5S_rDNA/ --genomeFastaFiles /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.fasta
Sep 13 12:16:03 ..... started STAR run
Sep 13 12:16:03 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=121, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 2
Sep 13 12:16:03 ... starting to sort Suffix Array. This may take a long time...
Sep 13 12:16:03 ... sorting Suffix Array chunks and saving them to disk...
Sep 13 12:16:03 ... loading chunks from disk, packing SA...
Sep 13 12:16:03 ... finished generating suffix array
Sep 13 12:16:03 ... generating Suffix Array index
Sep 13 12:16:06 ... completed Suffix Array index
Sep 13 12:16:06 ... writing Genome to disk ...
Sep 13 12:16:06 ... writing Suffix Array to disk ...
Sep 13 12:16:06 ... writing SAindex to disk
Sep 13 12:16:07 ..... finished successfully

Step 2 - STAR alignment usage and output files

The options and output files are the same as in Step 2 of the 45S alignment above. Only the annotation and index change: --sjdbGTFfile /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.gtf and --genomeDir /Data/lizexing/reference/h_5S_rDNA/.

Step 3 - Align the 293T sequencing data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_5S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix 293T-5S-val --outReadsUnmapped Fastx
Sep 13 12:18:59 ..... started STAR run
Sep 13 12:18:59 ..... loading genome
Sep 13 12:19:00 ..... processing annotations GTF
Sep 13 12:19:00 ..... started 1st pass mapping
Sep 13 12:20:19 ..... finished 1st pass mapping
Sep 13 12:20:20 ..... started mapping
Sep 13 12:21:47 ..... finished mapping
Sep 13 12:21:47 ..... finished successfully
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T-5S-valLog.final.out
                                 Started job on |       Sep 13 12:18:59
                             Started mapping on |       Sep 13 12:20:20
                                    Finished on |       Sep 13 12:21:47
       Mapping speed, Million of reads per hour |       809.55

                          Number of input reads |       19564161
                      Average input read length |       275
                                    UNIQUE READS:
                   Uniquely mapped reads number |       119
                        Uniquely mapped reads % |       0.00%
                          Average mapped length |       194.09
                       Number of splices: Total |       0
            Number of splices: Annotated (sjdb) |       0
                       Number of splices: GT/AG |       0
                       Number of splices: GC/AG |       0
                       Number of splices: AT/AC |       0
               Number of splices: Non-canonical |       0
                      Mismatch rate per base, % |       0.18%
                         Deletion rate per base |       0.00%
                        Deletion average length |       0.00
                        Insertion rate per base |       0.01%
                       Insertion average length |       1.00
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       0
             % of reads mapped to multiple loci |       0.00%
        Number of reads mapped to too many loci |       0
             % of reads mapped to too many loci |       0.00%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       1548
                 % of reads unmapped: too short |       0.01%
                Number of reads unmapped: other |       19562494
                     % of reads unmapped: other |       99.99%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

Step 4 - Align the HCT116 sequencing data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_5S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HCT116-5S-val --outReadsUnmapped Fastx
Sep 13 12:22:29 ..... started STAR run
Sep 13 12:22:29 ..... loading genome
Sep 13 12:22:30 ..... processing annotations GTF
Sep 13 12:22:30 ..... started 1st pass mapping
Sep 13 12:24:24 ..... finished 1st pass mapping
Sep 13 12:24:25 ..... started mapping
Sep 13 12:26:41 ..... finished mapping
Sep 13 12:26:41 ..... finished successfully

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116-5S-valLog.final.out
                                 Started job on |       Sep 13 12:22:29
                             Started mapping on |       Sep 13 12:24:25
                                    Finished on |       Sep 13 12:26:41
       Mapping speed, Million of reads per hour |       637.98

                          Number of input reads |       24101481
                      Average input read length |       274
                                    UNIQUE READS:
                   Uniquely mapped reads number |       155
                        Uniquely mapped reads % |       0.00%
                          Average mapped length |       190.32
                       Number of splices: Total |       0
            Number of splices: Annotated (sjdb) |       0
                       Number of splices: GT/AG |       0
                       Number of splices: GC/AG |       0
                       Number of splices: AT/AC |       0
               Number of splices: Non-canonical |       0
                      Mismatch rate per base, % |       0.06%
                         Deletion rate per base |       0.01%
                        Deletion average length |       2.00
                        Insertion rate per base |       0.00%
                       Insertion average length |       0.00
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       0
             % of reads mapped to multiple loci |       0.00%
        Number of reads mapped to too many loci |       0
             % of reads mapped to too many loci |       0.00%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       2531
                 % of reads unmapped: too short |       0.01%
                Number of reads unmapped: other |       24098795
                     % of reads unmapped: other |       99.99%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%

Step 5 - Align the HeLa sequencing data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_5S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HeLa-5S-val --outReadsUnmapped Fastx
Sep 13 12:22:50 ..... started STAR run
Sep 13 12:22:50 ..... loading genome
Sep 13 12:22:51 ..... processing annotations GTF
Sep 13 12:22:51 ..... started 1st pass mapping
Sep 13 12:24:51 ..... finished 1st pass mapping
Sep 13 12:24:51 ..... started mapping
Sep 13 12:27:02 ..... finished mapping
Sep 13 12:27:02 ..... finished successfully

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa-5S-valLog.final.out
                                 Started job on |       Sep 13 12:22:50
                             Started mapping on |       Sep 13 12:24:51
                                    Finished on |       Sep 13 12:27:02
       Mapping speed, Million of reads per hour |       672.84

                          Number of input reads |       24483719
                      Average input read length |       276
                                    UNIQUE READS:
                   Uniquely mapped reads number |       148
                        Uniquely mapped reads % |       0.00%
                          Average mapped length |       190.82
                       Number of splices: Total |       0
            Number of splices: Annotated (sjdb) |       0
                       Number of splices: GT/AG |       0
                       Number of splices: GC/AG |       0
                       Number of splices: AT/AC |       0
               Number of splices: Non-canonical |       0
                      Mismatch rate per base, % |       0.13%
                         Deletion rate per base |       0.00%
                        Deletion average length |       0.00
                        Insertion rate per base |       0.00%
                       Insertion average length |       0.00
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |       1
             % of reads mapped to multiple loci |       0.00%
        Number of reads mapped to too many loci |       0
             % of reads mapped to too many loci |       0.00%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       1670
                 % of reads unmapped: too short |       0.01%
                Number of reads unmapped: other |       24481900
                     % of reads unmapped: other |       99.99%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%
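With one Log.final.out per sample, it is easier to compare the headline STAR metrics side by side than to `cat` each file. Below is a minimal awk-based sketch. It assumes the `label | value` layout shown above. The helper name `star_summary` and its output column names are hypothetical and are not part of STAR.

```shell
#!/usr/bin/env bash
# star_summary (hypothetical helper): print a header plus one tab-separated line
# per STAR Log.final.out file: sample name, number of input reads,
# uniquely mapped %, multi-mapped % and "unmapped: other" %.
# Assumes the "label | value" layout that STAR writes to Log.final.out.
star_summary() {
  local f name
  printf 'sample\tinput_reads\tunique_pct\tmulti_pct\tunmapped_other_pct\n'
  for f in "$@"; do
    name=$(basename "$f" Log.final.out)
    awk -F'|' -v name="$name" '
      # Strip the padding STAR uses to right-align labels and values
      function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
      { label = trim($1); value = trim($2) }
      label == "Number of input reads"              { input = value }
      label == "Uniquely mapped reads %"            { uniq = value }
      label == "% of reads mapped to multiple loci" { multi = value }
      label == "% of reads unmapped: other"         { other = value }
      END { printf "%s\t%s\t%s\t%s\t%s\n", name, input, uniq, multi, other }
    ' "$f"
  done
}
```

Run it in the CleanData directory as `star_summary *-5S-valLog.final.out`. For the two 5S runs above it would report 0.00% uniquely mapped and 99.99% unmapped (other) for each sample.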

25. Sort the three datasets with Samtools

# Sort the 293T data
samtools sort -@ 32 -l 5 -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam

# Sort the HCT116 data
samtools sort -@ 32 -l 5 -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam

# Sort the HeLa data
samtools sort -@ 32 -l 5 -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam
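The three commands above differ only in the sample name, so they can be written as one loop. This is a minimal sketch that assumes the `<sample>-valAligned.out.bam` naming used above. The function name `sort_samples` and the `DRY_RUN` switch are hypothetical additions for previewing the commands before running them.

```shell
#!/usr/bin/env bash
# sort_samples (hypothetical helper): run the same samtools sort command as above
# for every sample name given after the directory argument.
# With DRY_RUN=1 the commands are only printed, not executed.
sort_samples() {
  local dir=$1; shift
  local s bam sorted
  for s in "$@"; do
    bam="$dir/${s}-valAligned.out.bam"
    sorted="$dir/${s}-valAligned.out.bam.sort"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo samtools sort -@ 32 -l 5 -o "$sorted" "$bam"
    else
      # -@ 32: sorting/compression threads, -l 5: compression level, -o: output file
      samtools sort -@ 32 -l 5 -o "$sorted" "$bam" || return 1
    fi
  done
}
```

Usage: `DRY_RUN=1 sort_samples /Data/lizexing/projects/xindi/Data/new/Data/CleanData 293T HCT116 HeLa` prints the three commands. Drop `DRY_RUN=1` to actually run them.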

26. Read summarization of the three datasets with featureCounts

Reference article: featureCounts usage notes

Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...
-T <int>            Number of the threads. 1 by default.
-a <string>         Name of an annotation file. GTF/GFF format by default. See
                    -F option for more format information. Inbuilt annotations
                    (SAF format) is available in 'annotation' directory of the
                    package. Gzipped file is also accepted.
-o <string>         Name of output file including read counts. A separate file
                    including summary statistics of counting results is also
                    included in the output ('<string>.summary'). Both files
                    are in tab delimited format.
-p                  If specified, fragments (or templates) will be counted
                    instead of reads. This option is only applicable for
                    paired-end reads; single-end reads are always counted as
                    reads.
-B                  Only count read pairs that have both ends aligned.
-P                  Check validity of paired-end distance when counting read
                    pairs. Use -d and -D to set thresholds.
-d <int>            Minimum fragment/template length, 50 by default.
-D <int>            Maximum fragment/template length, 600 by default.
-C                  Do not count read pairs that have their two ends mapping
                    to different chromosomes or mapping to same chromosome
                    but on different strands.
--donotsort         Do not sort reads in BAM/SAM input. Note that reads from
                    the same pair are required to be located next to each
                    other in the input.
-f                  Perform read counting at feature level (eg. counting
                    reads for exons rather than genes).
-t <string>         Specify feature type(s) in a GTF annotation. If multiple
                    types are provided, they should be separated by ',' with
                    no space in between. 'exon' by default. Rows in the
                    annotation with a matched feature will be extracted and
                    used for read mapping.
-g <string>         Specify attribute type in GTF annotation. 'gene_id' by
                    default. Meta-features used for read counting will be
                    extracted from annotation using the provided value.
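The options above are the ones used in the counting steps below: `-p -B -C -f -t exon -g gene_id`, plus `-M` in the second 293T run. The following minimal sketch runs the same featureCounts command for each sample, assuming the file names used in this log. The function name `count_samples` and the `MULTI`/`DRY_RUN` switches are hypothetical. The `_2` output suffix for `-M` runs simply mirrors the `293T_2.read.count` name used below.

```shell
#!/usr/bin/env bash
# count_samples (hypothetical helper): run featureCounts with the options used in
# this log for each sample. MULTI=1 adds -M and writes <sample>_2.read.count;
# DRY_RUN=1 only prints the commands.
count_samples() {
  local dir=$1 gtf=$2; shift 2
  local s out
  local -a extra=() cmd
  if [ "${MULTI:-0}" = 1 ]; then extra=(-M); fi
  for s in "$@"; do
    if [ "${MULTI:-0}" = 1 ]; then
      out="$dir/${s}_2.read.count"
    else
      out="$dir/${s}.read.count"
    fi
    # -p: count fragments instead of reads (v2.0.1 behaviour, see help above)
    # -B: both ends aligned, -C: skip chimeric pairs, -f: one count per bin
    cmd=(featureCounts -T 32 "${extra[@]}" -a "$gtf" -p -B -C -f -t exon -g gene_id
         -o "$out" "$dir/${s}-valAligned.out.bam.sort")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}
```

For example, `MULTI=1 count_samples "$PWD" /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf 293T` would reproduce the `-M` command of the 293T step.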

Step 1 - Count the 293T sequencing data (bin sizes: 5.8S rRNA = 10 bp, 3' ETS = 100 bp, all other regions = 200 bp)
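The bin annotation `U13369.1.2.gtf` itself is not shown in this log. As a hedged sketch only, fixed-width bins using the `rna-U13369.1:start..end` naming seen in the count tables could be generated with awk. The function name `make_bins`, the `binning` source column, and the rule of simply truncating the last bin at the region end are all assumptions. The author's file evidently handled some region ends differently, for example the 256-bp bin 3401..3656 and the 11-bp bin 6623..6633.

```shell
#!/usr/bin/env bash
# make_bins (hypothetical helper): print GTF "exon" lines that tile the region
# [start, end] of one sequence into bins of a fixed width. The last bin is
# simply truncated at the region end (an assumption, see the note above).
# Arguments: seqname start end bin_width
make_bins() {
  awk -v chr="$1" -v start="$2" -v end="$3" -v w="$4" 'BEGIN {
    for (s = start; s <= end; s += w) {
      e = s + w - 1
      if (e > end) e = end
      id = "rna-" chr ":" s ".." e
      # GTF columns: seqname source feature start end score strand frame attributes
      printf "%s\tbinning\texon\t%d\t%d\t.\t+\t.\tgene_id \"%s\"; transcript_id \"%s\";\n", chr, s, e, id, id
    }
  }'
}
```

For example, `make_bins U13369.1 6623 6779 10` tiles the 5.8S region into 16 bins of 10 bp, the last one being 6773..6779.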

# Multi-mapping reads: not counted (default, without -M)
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           o 293T-valAligned.out.bam.sort                   ||
||                                                                            ||
||             Output file : 293T_2.read.count                                ||
||                 Summary : 293T_2.read.count.summary                        ||
||              Annotation : U13369.1.2.gtf (GTF)                             ||
||      Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
||                                                                            ||
||                 Threads : 32                                               ||
||                   Level : feature level                                    ||
||              Paired-end : yes                                              ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file U13369.1.2.gtf ...                                    ||
||    Features : 84                                                           ||
||    Meta-features : 84                                                      ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file 293T-valAligned.out.bam.sort...                           ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 14730856                                             ||
||    Successfully assigned alignments : 6321600 (42.9%)                      ||
||    Running time : 0.41 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/Data/lizexing/projects  ||
|| /xindi/Data/new/Data/CleanData/293T_2.read.count.summary"                  ||
||                                                                            ||
\\============================================================================//
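featureCounts also writes a `.summary` file with one row per assignment status (Assigned, Unassigned_...). This minimal sketch prints each non-zero status with its share of the sum over all status rows. It assumes the two-column layout (status, count) of a single-BAM run. The helper name `fc_summary_pct` is hypothetical.

```shell
#!/usr/bin/env bash
# fc_summary_pct (hypothetical helper): for a featureCounts .summary file from a
# single-BAM run, print each non-zero status with its count and its percentage
# of the sum over all status rows.
fc_summary_pct() {
  awk -F'\t' '
    NR == 1 { next }                       # skip the "Status <bam file>" header
    { status[NR] = $1; n[NR] = $2; total += $2 }
    END {
      for (i = 2; i <= NR; i++)
        if (n[i] > 0) printf "%s\t%d\t%.1f%%\n", status[i], n[i], 100 * n[i] / total
    }' "$1"
}
```

Usage: `fc_summary_pct HCT116.read.count.summary`.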

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort"
Geneid  Chr     Start   End     Strand  Length  /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
rna-U13369.1:1..200     U13369.1        1       200     +       200     149
rna-U13369.1:201..400   U13369.1        201     400     +       200     33
rna-U13369.1:401..600   U13369.1        401     600     +       200     3415
rna-U13369.1:601..800   U13369.1        601     800     +       200     1689
rna-U13369.1:801..1000  U13369.1        801     1000    +       200     11097
rna-U13369.1:1001..1200 U13369.1        1001    1200    +       200     6832
rna-U13369.1:1201..1400 U13369.1        1201    1400    +       200     5593
rna-U13369.1:1401..1600 U13369.1        1401    1600    +       200     1309
rna-U13369.1:1601..1800 U13369.1        1601    1800    +       200     2973
rna-U13369.1:1801..2000 U13369.1        1801    2000    +       200     4920
rna-U13369.1:2001..2200 U13369.1        2001    2200    +       200     9107
rna-U13369.1:2201..2400 U13369.1        2201    2400    +       200     1132
rna-U13369.1:2401..2600 U13369.1        2401    2600    +       200     1263
rna-U13369.1:2601..2800 U13369.1        2601    2800    +       200     573
rna-U13369.1:2801..3000 U13369.1        2801    3000    +       200     292
rna-U13369.1:3001..3200 U13369.1        3001    3200    +       200     477
rna-U13369.1:3201..3400 U13369.1        3201    3400    +       200     162
rna-U13369.1:3401..3656 U13369.1        3401    3656    +       256     6298
rna-U13369.1:3657..3857 U13369.1        3657    3857    +       201     906561
rna-U13369.1:3858..4057 U13369.1        3858    4057    +       200     296332
rna-U13369.1:4058..4257 U13369.1        4058    4257    +       200     225795
rna-U13369.1:4258..4457 U13369.1        4258    4457    +       200     292400
rna-U13369.1:4458..4657 U13369.1        4458    4657    +       200     253866
rna-U13369.1:4658..4857 U13369.1        4658    4857    +       200     258050
rna-U13369.1:4858..5057 U13369.1        4858    5057    +       200     463669
rna-U13369.1:5058..5257 U13369.1        5058    5257    +       200     281272
rna-U13369.1:5258..5457 U13369.1        5258    5457    +       200     98735
rna-U13369.1:5458..5527 U13369.1        5458    5527    +       70      8547
rna-U13369.1:5528..5728 U13369.1        5528    5728    +       201     17658
rna-U13369.1:5729..5928 U13369.1        5729    5928    +       200     2861
rna-U13369.1:5929..6128 U13369.1        5929    6128    +       200     3083
rna-U13369.1:6129..6328 U13369.1        6129    6328    +       200     255
rna-U13369.1:6329..6528 U13369.1        6329    6528    +       200     1463
rna-U13369.1:6529..6622 U13369.1        6529    6622    +       94      402
rna-U13369.1:6623..6633 U13369.1        6623    6633    +       11      2
rna-U13369.1:6634..6643 U13369.1        6634    6643    +       10      10
rna-U13369.1:6644..6653 U13369.1        6644    6653    +       10      2
rna-U13369.1:6654..6663 U13369.1        6654    6663    +       10      0
rna-U13369.1:6664..6673 U13369.1        6664    6673    +       10      0
rna-U13369.1:6674..6683 U13369.1        6674    6683    +       10      0
rna-U13369.1:6684..6693 U13369.1        6684    6693    +       10      2
rna-U13369.1:6694..6703 U13369.1        6694    6703    +       10      1
rna-U13369.1:6704..6713 U13369.1        6704    6713    +       10      2
rna-U13369.1:6714..6723 U13369.1        6714    6723    +       10      0
rna-U13369.1:6724..6733 U13369.1        6724    6733    +       10      16
rna-U13369.1:6734..6743 U13369.1        6734    6743    +       10      35
rna-U13369.1:6744..6753 U13369.1        6744    6753    +       10      3
rna-U13369.1:6754..6763 U13369.1        6754    6763    +       10      6
rna-U13369.1:6764..6779 U13369.1        6764    6779    +       16      58
rna-U13369.1:6780..6980 U13369.1        6780    6980    +       201     2891
rna-U13369.1:6981..7180 U13369.1        6981    7180    +       200     4185
rna-U13369.1:7181..7380 U13369.1        7181    7380    +       200     626
rna-U13369.1:7381..7580 U13369.1        7381    7580    +       200     203
rna-U13369.1:7581..7780 U13369.1        7581    7780    +       200     1585
rna-U13369.1:7781..7934 U13369.1        7781    7934    +       154     8768
rna-U13369.1:7935..8134 U13369.1        7935    8134    +       200     151778
rna-U13369.1:8135..8334 U13369.1        8135    8334    +       200     222844
rna-U13369.1:8335..8534 U13369.1        8335    8534    +       200     98417
rna-U13369.1:8535..8734 U13369.1        8535    8734    +       200     34508
rna-U13369.1:8735..8934 U13369.1        8735    8934    +       200     46975
rna-U13369.1:8935..9134 U13369.1        8935    9134    +       200     16012
rna-U13369.1:9135..9334 U13369.1        9135    9334    +       200     11752
rna-U13369.1:9335..9534 U13369.1        9335    9534    +       200     212911
rna-U13369.1:9535..9734 U13369.1        9535    9734    +       200     471042
rna-U13369.1:9735..9934 U13369.1        9735    9934    +       200     206908
rna-U13369.1:9935..10134        U13369.1        9935    10134   +       200     18494
rna-U13369.1:10135..10334       U13369.1        10135   10334   +       200     255204
rna-U13369.1:10335..10534       U13369.1        10335   10534   +       200     135319
rna-U13369.1:10535..10734       U13369.1        10535   10734   +       200     226546
rna-U13369.1:10735..10934       U13369.1        10735   10934   +       200     46115
rna-U13369.1:10935..11134       U13369.1        10935   11134   +       200     200
rna-U13369.1:11135..11334       U13369.1        11135   11343   +       209     1722
rna-U13369.1:11335..11534       U13369.1        11335   11534   +       200     18274
rna-U13369.1:11535..11734       U13369.1        11535   11734   +       200     358728
rna-U13369.1:11735..11934       U13369.1        11735   11934   +       200     70452
rna-U13369.1:11935..12134       U13369.1        11935   12134   +       200     97312
rna-U13369.1:12135..12334       U13369.1        12135   12334   +       200     119347
rna-U13369.1:12335..12534       U13369.1        12335   12534   +       200     156098
rna-U13369.1:12535..12734       U13369.1        12535   12734   +       200     10497
rna-U13369.1:12735..12969       U13369.1        12735   12969   +       235     147459
rna-U13369.1:12970..13069       U13369.1        12970   13069   +       100     15
rna-U13369.1:13070..13169       U13369.1        13070   13169   +       100     2
rna-U13369.1:13170..13269       U13369.1        13170   13269   +       100     11
rna-U13369.1:13270..13314       U13369.1        13270   13314   +       45      0
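The bins are meant to tile the 45S rDNA without gaps or overlaps, so it is worth checking the Start/End columns of the count table. On the table above, this check flags one bin: its ID says 11135..11334, but its End column is 11343, so it overlaps the next bin starting at 11335. Reads in that overlap may be left unassigned as ambiguous, because multi-overlapping reads are not counted. The minimal awk sketch below (hypothetical name `check_bins`) assumes a single contig with rows ordered by Start, as here.

```shell
#!/usr/bin/env bash
# check_bins (hypothetical helper): scan a featureCounts count table (Geneid, Chr,
# Start, End, ...) and report (a) bins whose Geneid coordinates differ from the
# Start/End columns and (b) gaps or overlaps with the previous bin.
# Assumes one contig and rows ordered by Start, as in the tables here.
check_bins() {
  awk '
    /^#/ || $1 == "Geneid" { next }             # skip the command line and header
    {
      id = $1; sub(/^.*:/, "", id)              # "rna-U13369.1:11135..11334" -> "11135..11334"
      split(id, c, /\.\./)
      if (c[1] != $3 || c[2] != $4)
        printf "ID/coordinate mismatch: %s has Start=%s End=%s\n", $1, $3, $4
      if (seen && $3 != prev_end + 1)
        printf "gap/overlap: %s starts at %s, previous bin ends at %s\n", $1, $3, prev_end
      prev_end = $4; seen = 1
    }' "$1"
}
```

Usage: `check_bins 293T.read.count`.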

# Multi-mapping reads: counted (-M, every reported alignment of a multi-mapping read is counted)
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T_2.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           o 293T-valAligned.out.bam.sort                   ||
||                                                                            ||
||             Output file : 293T_2.read.count                                ||
||                 Summary : 293T_2.read.count.summary                        ||
||              Annotation : U13369.1.2.gtf (GTF)                             ||
||      Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
||                                                                            ||
||                 Threads : 32                                               ||
||                   Level : feature level                                    ||
||              Paired-end : yes                                              ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file U13369.1.2.gtf ...                                    ||
||    Features : 84                                                           ||
||    Meta-features : 84                                                      ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file 293T-valAligned.out.bam.sort...                           ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 14730856                                             ||
||    Successfully assigned alignments : 8382987 (56.9%)                      ||
||    Running time : 0.44 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/Data/lizexing/projects  ||
|| /xindi/Data/new/Data/CleanData/293T_2.read.count.summary"                  ||
||                                                                            ||
\\============================================================================//

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T_2.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T_2.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort"
Geneid  Chr     Start   End     Strand  Length  /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
rna-U13369.1:1..200     U13369.1        1       200     +       200     717
rna-U13369.1:201..400   U13369.1        201     400     +       200     160
rna-U13369.1:401..600   U13369.1        401     600     +       200     3526
rna-U13369.1:601..800   U13369.1        601     800     +       200     1858
rna-U13369.1:801..1000  U13369.1        801     1000    +       200     11511
rna-U13369.1:1001..1200 U13369.1        1001    1200    +       200     6904
rna-U13369.1:1201..1400 U13369.1        1201    1400    +       200     5684
rna-U13369.1:1401..1600 U13369.1        1401    1600    +       200     1465
rna-U13369.1:1601..1800 U13369.1        1601    1800    +       200     3777
rna-U13369.1:1801..2000 U13369.1        1801    2000    +       200     5876
rna-U13369.1:2001..2200 U13369.1        2001    2200    +       200     9861
rna-U13369.1:2201..2400 U13369.1        2201    2400    +       200     1194
rna-U13369.1:2401..2600 U13369.1        2401    2600    +       200     1321
rna-U13369.1:2601..2800 U13369.1        2601    2800    +       200     575
rna-U13369.1:2801..3000 U13369.1        2801    3000    +       200     391
rna-U13369.1:3001..3200 U13369.1        3001    3200    +       200     698
rna-U13369.1:3201..3400 U13369.1        3201    3400    +       200     174
rna-U13369.1:3401..3656 U13369.1        3401    3656    +       256     7791
rna-U13369.1:3657..3857 U13369.1        3657    3857    +       201     916288
rna-U13369.1:3858..4057 U13369.1        3858    4057    +       200     303417
rna-U13369.1:4058..4257 U13369.1        4058    4257    +       200     233973
rna-U13369.1:4258..4457 U13369.1        4258    4457    +       200     295043
rna-U13369.1:4458..4657 U13369.1        4458    4657    +       200     257954
rna-U13369.1:4658..4857 U13369.1        4658    4857    +       200     267275
rna-U13369.1:4858..5057 U13369.1        4858    5057    +       200     525310
rna-U13369.1:5058..5257 U13369.1        5058    5257    +       200     288945
rna-U13369.1:5258..5457 U13369.1        5258    5457    +       200     112451
rna-U13369.1:5458..5527 U13369.1        5458    5527    +       70      8717
rna-U13369.1:5528..5728 U13369.1        5528    5728    +       201     26265
rna-U13369.1:5729..5928 U13369.1        5729    5928    +       200     3537
rna-U13369.1:5929..6128 U13369.1        5929    6128    +       200     3503
rna-U13369.1:6129..6328 U13369.1        6129    6328    +       200     274
rna-U13369.1:6329..6528 U13369.1        6329    6528    +       200     1495
rna-U13369.1:6529..6622 U13369.1        6529    6622    +       94      415
rna-U13369.1:6623..6633 U13369.1        6623    6633    +       11      2
rna-U13369.1:6634..6643 U13369.1        6634    6643    +       10      10
rna-U13369.1:6644..6653 U13369.1        6644    6653    +       10      2
rna-U13369.1:6654..6663 U13369.1        6654    6663    +       10      0
rna-U13369.1:6664..6673 U13369.1        6664    6673    +       10      0
rna-U13369.1:6674..6683 U13369.1        6674    6683    +       10      0
rna-U13369.1:6684..6693 U13369.1        6684    6693    +       10      4
rna-U13369.1:6694..6703 U13369.1        6694    6703    +       10      7
rna-U13369.1:6704..6713 U13369.1        6704    6713    +       10      2
rna-U13369.1:6714..6723 U13369.1        6714    6723    +       10      0
rna-U13369.1:6724..6733 U13369.1        6724    6733    +       10      28
rna-U13369.1:6734..6743 U13369.1        6734    6743    +       10      37
rna-U13369.1:6744..6753 U13369.1        6744    6753    +       10      3
rna-U13369.1:6754..6763 U13369.1        6754    6763    +       10      8
rna-U13369.1:6764..6779 U13369.1        6764    6779    +       16      58
rna-U13369.1:6780..6980 U13369.1        6780    6980    +       201     4722
rna-U13369.1:6981..7180 U13369.1        6981    7180    +       200     4818
rna-U13369.1:7181..7380 U13369.1        7181    7380    +       200     745
rna-U13369.1:7381..7580 U13369.1        7381    7580    +       200     262
rna-U13369.1:7581..7780 U13369.1        7581    7780    +       200     1799
rna-U13369.1:7781..7934 U13369.1        7781    7934    +       154     9013
rna-U13369.1:7935..8134 U13369.1        7935    8134    +       200     1179817
rna-U13369.1:8135..8334 U13369.1        8135    8334    +       200     473198
rna-U13369.1:8335..8534 U13369.1        8335    8534    +       200     120391
rna-U13369.1:8535..8734 U13369.1        8535    8734    +       200     56538
rna-U13369.1:8735..8934 U13369.1        8735    8934    +       200     109817
rna-U13369.1:8935..9134 U13369.1        8935    9134    +       200     17849
rna-U13369.1:9135..9334 U13369.1        9135    9334    +       200     12844
rna-U13369.1:9335..9534 U13369.1        9335    9534    +       200     235242
rna-U13369.1:9535..9734 U13369.1        9535    9734    +       200     476249
rna-U13369.1:9735..9934 U13369.1        9735    9934    +       200     210133
rna-U13369.1:9935..10134        U13369.1        9935    10134   +       200     23893
rna-U13369.1:10135..10334       U13369.1        10135   10334   +       200     259262
rna-U13369.1:10335..10534       U13369.1        10335   10534   +       200     137294
rna-U13369.1:10535..10734       U13369.1        10535   10734   +       200     284736
rna-U13369.1:10735..10934       U13369.1        10735   10934   +       200     65392
rna-U13369.1:10935..11134       U13369.1        10935   11134   +       200     732
rna-U13369.1:11135..11334       U13369.1        11135   11343   +       209     2628
rna-U13369.1:11335..11534       U13369.1        11335   11534   +       200     39032
rna-U13369.1:11535..11734       U13369.1        11535   11734   +       200     366368
rna-U13369.1:11735..11934       U13369.1        11735   11934   +       200     95675
rna-U13369.1:11935..12134       U13369.1        11935   12134   +       200     128072
rna-U13369.1:12135..12334       U13369.1        12135   12334   +       200     128169
rna-U13369.1:12335..12534       U13369.1        12335   12534   +       200     187062
rna-U13369.1:12535..12734       U13369.1        12535   12734   +       200     250071
rna-U13369.1:12735..12969       U13369.1        12735   12969   +       235     192602
rna-U13369.1:12970..13069       U13369.1        12970   13069   +       100     34
rna-U13369.1:13070..13169       U13369.1        13070   13169   +       100     2
rna-U13369.1:13170..13269       U13369.1        13170   13269   +       100     20
rna-U13369.1:13270..13314       U13369.1        13270   13314   +       45      0
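Comparing the two 293T tables bin by bin shows where counting multi-mapping reads (`-M`) adds the most. For example, bin 7935..8134 rises from 151778 to 1179817. The minimal sketch below joins two count tables on Geneid and prints both counts and their ratio. The helper name `compare_counts` is hypothetical, and it assumes both tables come from the same GTF with the count in column 7.

```shell
#!/usr/bin/env bash
# compare_counts (hypothetical helper): join two featureCounts tables produced
# from the same annotation (e.g. without and with -M) on Geneid and print
# Geneid, count_a, count_b and count_b / count_a ("NA" when count_a is 0).
compare_counts() {
  awk '
    FNR == 1 { file++ }                              # 1 = first table, 2 = second
    /^#/ || $1 == "Geneid" { next }                  # skip command line and header
    file == 1 { a[$1] = $7; next }
    ($1 in a) {
      ratio = (a[$1] + 0 > 0) ? sprintf("%.2f", $7 / a[$1]) : "NA"
      printf "%s\t%s\t%s\t%s\n", $1, a[$1], $7, ratio
    }' "$1" "$2"
}
```

Usage: `compare_counts 293T.read.count 293T_2.read.count`.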

Step 2 - Count the HCT116 sequencing data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           o HCT116-valAligned.out.bam.sort                 ||
||                                                                            ||
||             Output file : HCT116.read.count                                ||
||                 Summary : HCT116.read.count.summary                        ||
||              Annotation : U13369.1.2.gtf (GTF)                             ||
||      Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
||                                                                            ||
||                 Threads : 32                                               ||
||                   Level : feature level                                    ||
||              Paired-end : yes                                              ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file U13369.1.2.gtf ...                                    ||
||    Features : 84                                                           ||
||    Meta-features : 84                                                      ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file HCT116-valAligned.out.bam.sort...                         ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 17788363                                             ||
||    Successfully assigned alignments : 7826386 (44.0%)                      ||
||    Running time : 0.53 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/Data/lizexing/projects  ||
|| /xindi/Data/new/Data/CleanData/HCT116.read.count.summary"                  ||
||                                                                            ||
\\============================================================================//

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort"
Geneid  Chr     Start   End     Strand  Length  /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort
rna-U13369.1:1..200     U13369.1        1       200     +       200     161
rna-U13369.1:201..400   U13369.1        201     400     +       200     35
rna-U13369.1:401..600   U13369.1        401     600     +       200     1965
rna-U13369.1:601..800   U13369.1        601     800     +       200     678
rna-U13369.1:801..1000  U13369.1        801     1000    +       200     5357
rna-U13369.1:1001..1200 U13369.1        1001    1200    +       200     3591
rna-U13369.1:1201..1400 U13369.1        1201    1400    +       200     3153
rna-U13369.1:1401..1600 U13369.1        1401    1600    +       200     748
rna-U13369.1:1601..1800 U13369.1        1601    1800    +       200     1431
rna-U13369.1:1801..2000 U13369.1        1801    2000    +       200     3266
rna-U13369.1:2001..2200 U13369.1        2001    2200    +       200     4859
rna-U13369.1:2201..2400 U13369.1        2201    2400    +       200     567
rna-U13369.1:2401..2600 U13369.1        2401    2600    +       200     664
rna-U13369.1:2601..2800 U13369.1        2601    2800    +       200     313
rna-U13369.1:2801..3000 U13369.1        2801    3000    +       200     203
rna-U13369.1:3001..3200 U13369.1        3001    3200    +       200     302
rna-U13369.1:3201..3400 U13369.1        3201    3400    +       200     57
rna-U13369.1:3401..3656 U13369.1        3401    3656    +       256     3622
rna-U13369.1:3657..3857 U13369.1        3657    3857    +       201     1334016
rna-U13369.1:3858..4057 U13369.1        3858    4057    +       200     360644
rna-U13369.1:4058..4257 U13369.1        4058    4257    +       200     275503
rna-U13369.1:4258..4457 U13369.1        4258    4457    +       200     298265
rna-U13369.1:4458..4657 U13369.1        4458    4657    +       200     362694
rna-U13369.1:4658..4857 U13369.1        4658    4857    +       200     300733
rna-U13369.1:4858..5057 U13369.1        4858    5057    +       200     647910
rna-U13369.1:5058..5257 U13369.1        5058    5257    +       200     322107
rna-U13369.1:5258..5457 U13369.1        5258    5457    +       200     111634
rna-U13369.1:5458..5527 U13369.1        5458    5527    +       70      4907
rna-U13369.1:5528..5728 U13369.1        5528    5728    +       201     12576
rna-U13369.1:5729..5928 U13369.1        5729    5928    +       200     1583
rna-U13369.1:5929..6128 U13369.1        5929    6128    +       200     2051
rna-U13369.1:6129..6328 U13369.1        6129    6328    +       200     153
rna-U13369.1:6329..6528 U13369.1        6329    6528    +       200     1035
rna-U13369.1:6529..6622 U13369.1        6529    6622    +       94      313
rna-U13369.1:6623..6633 U13369.1        6623    6633    +       11      7
rna-U13369.1:6634..6643 U13369.1        6634    6643    +       10      7
rna-U13369.1:6644..6653 U13369.1        6644    6653    +       10      0
rna-U13369.1:6654..6663 U13369.1        6654    6663    +       10      0
rna-U13369.1:6664..6673 U13369.1        6664    6673    +       10      0
rna-U13369.1:6674..6683 U13369.1        6674    6683    +       10      0
rna-U13369.1:6684..6693 U13369.1        6684    6693    +       10      1
rna-U13369.1:6694..6703 U13369.1        6694    6703    +       10      0
rna-U13369.1:6704..6713 U13369.1        6704    6713    +       10      0
rna-U13369.1:6714..6723 U13369.1        6714    6723    +       10      0
rna-U13369.1:6724..6733 U13369.1        6724    6733    +       10      11
rna-U13369.1:6734..6743 U13369.1        6734    6743    +       10      2
rna-U13369.1:6744..6753 U13369.1        6744    6753    +       10      0
rna-U13369.1:6754..6763 U13369.1        6754    6763    +       10      3
rna-U13369.1:6764..6779 U13369.1        6764    6779    +       16      62
rna-U13369.1:6780..6980 U13369.1        6780    6980    +       201     2860
rna-U13369.1:6981..7180 U13369.1        6981    7180    +       200     4370
rna-U13369.1:7181..7380 U13369.1        7181    7380    +       200     650
rna-U13369.1:7381..7580 U13369.1        7381    7580    +       200     124
rna-U13369.1:7581..7780 U13369.1        7581    7780    +       200     1822
rna-U13369.1:7781..7934 U13369.1        7781    7934    +       154     9807
rna-U13369.1:7935..8134 U13369.1        7935    8134    +       200     220091
rna-U13369.1:8135..8334 U13369.1        8135    8334    +       200     274925
rna-U13369.1:8335..8534 U13369.1        8335    8534    +       200     138838
rna-U13369.1:8535..8734 U13369.1        8535    8734    +       200     37984
rna-U13369.1:8735..8934 U13369.1        8735    8934    +       200     57781
rna-U13369.1:8935..9134 U13369.1        8935    9134    +       200     20546
rna-U13369.1:9135..9334 U13369.1        9135    9334    +       200     14483
rna-U13369.1:9335..9534 U13369.1        9335    9534    +       200     224465
rna-U13369.1:9535..9734 U13369.1        9535    9734    +       200     566130
rna-U13369.1:9735..9934 U13369.1        9735    9934    +       200     224644
rna-U13369.1:9935..10134        U13369.1        9935    10134   +       200     23861
rna-U13369.1:10135..10334       U13369.1        10135   10334   +       200     324401
rna-U13369.1:10335..10534       U13369.1        10335   10534   +       200     139550
rna-U13369.1:10535..10734       U13369.1        10535   10734   +       200     286406
rna-U13369.1:10735..10934       U13369.1        10735   10934   +       200     44183
rna-U13369.1:10935..11134       U13369.1        10935   11134   +       200     206
rna-U13369.1:11135..11334       U13369.1        11135   11343   +       209     2581
rna-U13369.1:11335..11534       U13369.1        11335   11534   +       200     21753
rna-U13369.1:11535..11734       U13369.1        11535   11734   +       200     397492
rna-U13369.1:11735..11934       U13369.1        11735   11934   +       200     76640
rna-U13369.1:11935..12134       U13369.1        11935   12134   +       200     101386
rna-U13369.1:12135..12334       U13369.1        12135   12334   +       200     131966
rna-U13369.1:12335..12534       U13369.1        12335   12534   +       200     169786
rna-U13369.1:12535..12734       U13369.1        12535   12734   +       200     85848
rna-U13369.1:12735..12969       U13369.1        12735   12969   +       235     153568
rna-U13369.1:12970..13069       U13369.1        12970   13069   +       100     35
rna-U13369.1:13070..13169       U13369.1        13070   13169   +       100     2
rna-U13369.1:13170..13269       U13369.1        13170   13269   +       100     13
rna-U13369.1:13270..13314       U13369.1        13270   13314   +       45      0

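The count table written by featureCounts is plain whitespace-separated text. Line 1 is the `# Program:` comment, line 2 is the column header, and every following row is one feature, with the read count in column 7. To see which 200-bp bins of U13369.1 dominate a sample, a minimal POSIX awk sketch like the one below can print each bin's share of the table's total. The function name `bin_share` is hypothetical and not part of featureCounts or Subread. It assumes exactly the layout shown above.

```shell
# Hypothetical helper (not part of featureCounts): print each feature's count
# and its share (%) of the total count in one featureCounts count table.
# Assumes line 1 is the "# Program:" comment, line 2 is the column header,
# and column 7 holds the count, as in the output shown above.
bin_share() {
    awk 'FNR > 2 { id[FNR] = $1; n[FNR] = $7; total += $7; last = FNR }
         END {
             printf "Geneid\tcount\tpercent\n"
             for (i = 3; i <= last; i++)
                 printf "%s\t%d\t%.2f\n", id[i], n[i], (total > 0 ? 100 * n[i] / total : 0)
         }' "$1"
}

# Usage (file name from the run above):
#   bin_share HCT116.read.count
```

The total here is simply the sum of column 7 of the table, so the percentages describe how the assigned reads are spread across the rDNA bins, not across the whole library.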
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116_2.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           o HCT116-valAligned.out.bam.sort                 ||
||                                                                            ||
||             Output file : HCT116_2.read.count                              ||
||                 Summary : HCT116_2.read.count.summary                      ||
||              Annotation : U13369.1.2.gtf (GTF)                             ||
||      Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
||                                                                            ||
||                 Threads : 32                                               ||
||                   Level : feature level                                    ||
||              Paired-end : yes                                              ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file U13369.1.2.gtf ...                                    ||
||    Features : 84                                                           ||
||    Meta-features : 84                                                      ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file HCT116-valAligned.out.bam.sort...                         ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 17788363                                             ||
||    Successfully assigned alignments : 9851054 (55.4%)                      ||
||    Running time : 0.55 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/Data/lizexing/projects  ||
|| /xindi/Data/new/Data/CleanData/HCT116_2.read.count.summary"                ||
||                                                                            ||
\\============================================================================//

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116_2.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116_2.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort"
Geneid  Chr     Start   End     Strand  Length  /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort
rna-U13369.1:1..200     U13369.1        1       200     +       200     481
rna-U13369.1:201..400   U13369.1        201     400     +       200     103
rna-U13369.1:401..600   U13369.1        401     600     +       200     2047
rna-U13369.1:601..800   U13369.1        601     800     +       200     775
rna-U13369.1:801..1000  U13369.1        801     1000    +       200     5514
rna-U13369.1:1001..1200 U13369.1        1001    1200    +       200     3626
rna-U13369.1:1201..1400 U13369.1        1201    1400    +       200     3202
rna-U13369.1:1401..1600 U13369.1        1401    1600    +       200     833
rna-U13369.1:1601..1800 U13369.1        1601    1800    +       200     2054
rna-U13369.1:1801..2000 U13369.1        1801    2000    +       200     3627
rna-U13369.1:2001..2200 U13369.1        2001    2200    +       200     5013
rna-U13369.1:2201..2400 U13369.1        2201    2400    +       200     642
rna-U13369.1:2401..2600 U13369.1        2401    2600    +       200     720
rna-U13369.1:2601..2800 U13369.1        2601    2800    +       200     331
rna-U13369.1:2801..3000 U13369.1        2801    3000    +       200     233
rna-U13369.1:3001..3200 U13369.1        3001    3200    +       200     349
rna-U13369.1:3201..3400 U13369.1        3201    3400    +       200     61
rna-U13369.1:3401..3656 U13369.1        3401    3656    +       256     5055
rna-U13369.1:3657..3857 U13369.1        3657    3857    +       201     1351748
rna-U13369.1:3858..4057 U13369.1        3858    4057    +       200     371164
rna-U13369.1:4058..4257 U13369.1        4058    4257    +       200     279012
rna-U13369.1:4258..4457 U13369.1        4258    4457    +       200     317474
rna-U13369.1:4458..4657 U13369.1        4458    4657    +       200     367276
rna-U13369.1:4658..4857 U13369.1        4658    4857    +       200     312708
rna-U13369.1:4858..5057 U13369.1        4858    5057    +       200     687565
rna-U13369.1:5058..5257 U13369.1        5058    5257    +       200     343641
rna-U13369.1:5258..5457 U13369.1        5258    5457    +       200     133654
rna-U13369.1:5458..5527 U13369.1        5458    5527    +       70      5054
rna-U13369.1:5528..5728 U13369.1        5528    5728    +       201     18264
rna-U13369.1:5729..5928 U13369.1        5729    5928    +       200     1893
rna-U13369.1:5929..6128 U13369.1        5929    6128    +       200     2161
rna-U13369.1:6129..6328 U13369.1        6129    6328    +       200     164
rna-U13369.1:6329..6528 U13369.1        6329    6528    +       200     1064
rna-U13369.1:6529..6622 U13369.1        6529    6622    +       94      336
rna-U13369.1:6623..6633 U13369.1        6623    6633    +       11      7
rna-U13369.1:6634..6643 U13369.1        6634    6643    +       10      7
rna-U13369.1:6644..6653 U13369.1        6644    6653    +       10      0
rna-U13369.1:6654..6663 U13369.1        6654    6663    +       10      0
rna-U13369.1:6664..6673 U13369.1        6664    6673    +       10      0
rna-U13369.1:6674..6683 U13369.1        6674    6683    +       10      0
rna-U13369.1:6684..6693 U13369.1        6684    6693    +       10      1
rna-U13369.1:6694..6703 U13369.1        6694    6703    +       10      0
rna-U13369.1:6704..6713 U13369.1        6704    6713    +       10      0
rna-U13369.1:6714..6723 U13369.1        6714    6723    +       10      0
rna-U13369.1:6724..6733 U13369.1        6724    6733    +       10      35
rna-U13369.1:6734..6743 U13369.1        6734    6743    +       10      10
rna-U13369.1:6744..6753 U13369.1        6744    6753    +       10      0
rna-U13369.1:6754..6763 U13369.1        6754    6763    +       10      3
rna-U13369.1:6764..6779 U13369.1        6764    6779    +       16      62
rna-U13369.1:6780..6980 U13369.1        6780    6980    +       201     4299
rna-U13369.1:6981..7180 U13369.1        6981    7180    +       200     4973
rna-U13369.1:7181..7380 U13369.1        7181    7380    +       200     754
rna-U13369.1:7381..7580 U13369.1        7381    7580    +       200     172
rna-U13369.1:7581..7780 U13369.1        7581    7780    +       200     2117
rna-U13369.1:7781..7934 U13369.1        7781    7934    +       154     10309
rna-U13369.1:7935..8134 U13369.1        7935    8134    +       200     1384909
rna-U13369.1:8135..8334 U13369.1        8135    8334    +       200     511358
rna-U13369.1:8335..8534 U13369.1        8335    8534    +       200     186408
rna-U13369.1:8535..8734 U13369.1        8535    8734    +       200     58757
rna-U13369.1:8735..8934 U13369.1        8735    8934    +       200     129140
rna-U13369.1:8935..9134 U13369.1        8935    9134    +       200     25706
rna-U13369.1:9135..9334 U13369.1        9135    9334    +       200     16391
rna-U13369.1:9335..9534 U13369.1        9335    9534    +       200     233665
rna-U13369.1:9535..9734 U13369.1        9535    9734    +       200     583439
rna-U13369.1:9735..9934 U13369.1        9735    9934    +       200     233122
rna-U13369.1:9935..10134        U13369.1        9935    10134   +       200     43808
rna-U13369.1:10135..10334       U13369.1        10135   10334   +       200     375750
rna-U13369.1:10335..10534       U13369.1        10335   10534   +       200     141254
rna-U13369.1:10535..10734       U13369.1        10535   10734   +       200     293024
rna-U13369.1:10735..10934       U13369.1        10735   10934   +       200     60600
rna-U13369.1:10935..11134       U13369.1        10935   11134   +       200     689
rna-U13369.1:11135..11334       U13369.1        11135   11343   +       209     3478
rna-U13369.1:11335..11534       U13369.1        11335   11534   +       200     53933
rna-U13369.1:11535..11734       U13369.1        11535   11734   +       200     403297
rna-U13369.1:11735..11934       U13369.1        11735   11934   +       200     102217
rna-U13369.1:11935..12134       U13369.1        11935   12134   +       200     126812
rna-U13369.1:12135..12334       U13369.1        12135   12334   +       200     144705
rna-U13369.1:12335..12534       U13369.1        12335   12534   +       200     178697
rna-U13369.1:12535..12734       U13369.1        12535   12734   +       200     121242
rna-U13369.1:12735..12969       U13369.1        12735   12969   +       235     187945
rna-U13369.1:12970..13069       U13369.1        12970   13069   +       100     98
rna-U13369.1:13070..13169       U13369.1        13070   13169   +       100     2
rna-U13369.1:13170..13269       U13369.1        13170   13269   +       100     13
rna-U13369.1:13270..13314       U13369.1        13270   13314   +       45      2

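The only difference between the two HCT116 runs is `-M` ("Multimapping reads : counted" in the second settings block). A minimal awk sketch can line up the two tables by Geneid and show how many extra reads each bin gains when multi-mapping reads are counted. The function name `compare_multimap` is hypothetical. It assumes both tables come from the same BAM and GTF and use the layout shown above (count in column 7).

```shell
# Hypothetical helper: compare a featureCounts table produced without -M
# (first argument) with one produced with -M (second argument).
# Rows are matched by Geneid (column 1); column 7 is the count.
compare_multimap() {
    awk 'FNR == NR { if (FNR > 2) uniq[$1] = $7; next }
         FNR == 1  { printf "Geneid\tunique\twith_M\textra\n" }
         FNR > 2   { printf "%s\t%d\t%d\t%d\n", $1, uniq[$1], $7, $7 - uniq[$1] }' "$1" "$2"
}

# Usage (file names from the runs above):
#   compare_multimap HCT116.read.count HCT116_2.read.count
```

On the two HCT116 tables above, for example, `rna-U13369.1:7935..8134` rises from 220091 (unique only) to 1384909 with `-M`, by far the largest change of any bin, while the overall number of successfully assigned alignments rises from 7826386 (44.0%) to 9851054 (55.4%).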
Step 3 - Count reads in the HeLa sequencing data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           o HeLa-valAligned.out.bam.sort                   ||
||                                                                            ||
||             Output file : HeLa.read.count                                  ||
||                 Summary : HeLa.read.count.summary                          ||
||              Annotation : U13369.1.2.gtf (GTF)                             ||
||      Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
||                                                                            ||
||                 Threads : 32                                               ||
||                   Level : feature level                                    ||
||              Paired-end : yes                                              ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file U13369.1.2.gtf ...                                    ||
||    Features : 84                                                           ||
||    Meta-features : 84                                                      ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file HeLa-valAligned.out.bam.sort...                           ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 19960509                                             ||
||    Successfully assigned alignments : 8942151 (44.8%)                      ||
||    Running time : 0.70 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/Data/lizexing/projects  ||
|| /xindi/Data/new/Data/CleanData/HeLa.read.count.summary"                    ||
||                                                                            ||
\\============================================================================//

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort"
Geneid  Chr     Start   End     Strand  Length  /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
rna-U13369.1:1..200     U13369.1        1       200     +       200     425
rna-U13369.1:201..400   U13369.1        201     400     +       200     199
rna-U13369.1:401..600   U13369.1        401     600     +       200     1364
rna-U13369.1:601..800   U13369.1        601     800     +       200     820
rna-U13369.1:801..1000  U13369.1        801     1000    +       200     3497
rna-U13369.1:1001..1200 U13369.1        1001    1200    +       200     3721
rna-U13369.1:1201..1400 U13369.1        1201    1400    +       200     3341
rna-U13369.1:1401..1600 U13369.1        1401    1600    +       200     606
rna-U13369.1:1601..1800 U13369.1        1601    1800    +       200     1830
rna-U13369.1:1801..2000 U13369.1        1801    2000    +       200     2561
rna-U13369.1:2001..2200 U13369.1        2001    2200    +       200     4613
rna-U13369.1:2201..2400 U13369.1        2201    2400    +       200     639
rna-U13369.1:2401..2600 U13369.1        2401    2600    +       200     566
rna-U13369.1:2601..2800 U13369.1        2601    2800    +       200     357
rna-U13369.1:2801..3000 U13369.1        2801    3000    +       200     186
rna-U13369.1:3001..3200 U13369.1        3001    3200    +       200     335
rna-U13369.1:3201..3400 U13369.1        3201    3400    +       200     55
rna-U13369.1:3401..3656 U13369.1        3401    3656    +       256     3545
rna-U13369.1:3657..3857 U13369.1        3657    3857    +       201     1555726
rna-U13369.1:3858..4057 U13369.1        3858    4057    +       200     447916
rna-U13369.1:4058..4257 U13369.1        4058    4257    +       200     325556
rna-U13369.1:4258..4457 U13369.1        4258    4457    +       200     397070
rna-U13369.1:4458..4657 U13369.1        4458    4657    +       200     377027
rna-U13369.1:4658..4857 U13369.1        4658    4857    +       200     389659
rna-U13369.1:4858..5057 U13369.1        4858    5057    +       200     681022
rna-U13369.1:5058..5257 U13369.1        5058    5257    +       200     429943
rna-U13369.1:5258..5457 U13369.1        5258    5457    +       200     140888
rna-U13369.1:5458..5527 U13369.1        5458    5527    +       70      3194
rna-U13369.1:5528..5728 U13369.1        5528    5728    +       201     9285
rna-U13369.1:5729..5928 U13369.1        5729    5928    +       200     2524
rna-U13369.1:5929..6128 U13369.1        5929    6128    +       200     2160
rna-U13369.1:6129..6328 U13369.1        6129    6328    +       200     234
rna-U13369.1:6329..6528 U13369.1        6329    6528    +       200     1125
rna-U13369.1:6529..6622 U13369.1        6529    6622    +       94      394
rna-U13369.1:6623..6633 U13369.1        6623    6633    +       11      4
rna-U13369.1:6634..6643 U13369.1        6634    6643    +       10      11
rna-U13369.1:6644..6653 U13369.1        6644    6653    +       10      0
rna-U13369.1:6654..6663 U13369.1        6654    6663    +       10      1
rna-U13369.1:6664..6673 U13369.1        6664    6673    +       10      0
rna-U13369.1:6674..6683 U13369.1        6674    6683    +       10      0
rna-U13369.1:6684..6693 U13369.1        6684    6693    +       10      2
rna-U13369.1:6694..6703 U13369.1        6694    6703    +       10      1
rna-U13369.1:6704..6713 U13369.1        6704    6713    +       10      1
rna-U13369.1:6714..6723 U13369.1        6714    6723    +       10      0
rna-U13369.1:6724..6733 U13369.1        6724    6733    +       10      20
rna-U13369.1:6734..6743 U13369.1        6734    6743    +       10      19
rna-U13369.1:6744..6753 U13369.1        6744    6753    +       10      0
rna-U13369.1:6754..6763 U13369.1        6754    6763    +       10      8
rna-U13369.1:6764..6779 U13369.1        6764    6779    +       16      143
rna-U13369.1:6780..6980 U13369.1        6780    6980    +       201     5770
rna-U13369.1:6981..7180 U13369.1        6981    7180    +       200     6087
rna-U13369.1:7181..7380 U13369.1        7181    7380    +       200     903
rna-U13369.1:7381..7580 U13369.1        7381    7580    +       200     185
rna-U13369.1:7581..7780 U13369.1        7581    7780    +       200     2597
rna-U13369.1:7781..7934 U13369.1        7781    7934    +       154     9364
rna-U13369.1:7935..8134 U13369.1        7935    8134    +       200     71274
rna-U13369.1:8135..8334 U13369.1        8135    8334    +       200     259450
rna-U13369.1:8335..8534 U13369.1        8335    8534    +       200     120077
rna-U13369.1:8535..8734 U13369.1        8535    8734    +       200     30903
rna-U13369.1:8735..8934 U13369.1        8735    8934    +       200     70267
rna-U13369.1:8935..9134 U13369.1        8935    9134    +       200     24900
rna-U13369.1:9135..9334 U13369.1        9135    9334    +       200     17232
rna-U13369.1:9335..9534 U13369.1        9335    9534    +       200     290496
rna-U13369.1:9535..9734 U13369.1        9535    9734    +       200     601137
rna-U13369.1:9735..9934 U13369.1        9735    9934    +       200     278377
rna-U13369.1:9935..10134        U13369.1        9935    10134   +       200     26631
rna-U13369.1:10135..10334       U13369.1        10135   10334   +       200     279741
rna-U13369.1:10335..10534       U13369.1        10335   10534   +       200     197391
rna-U13369.1:10535..10734       U13369.1        10535   10734   +       200     346754
rna-U13369.1:10735..10934       U13369.1        10735   10934   +       200     54592
rna-U13369.1:10935..11134       U13369.1        10935   11134   +       200     255
rna-U13369.1:11135..11334       U13369.1        11135   11343   +       209     2547
rna-U13369.1:11335..11534       U13369.1        11335   11534   +       200     19006
rna-U13369.1:11535..11734       U13369.1        11535   11734   +       200     468765
rna-U13369.1:11735..11934       U13369.1        11735   11934   +       200     104017
rna-U13369.1:11935..12134       U13369.1        11935   12134   +       200     127648
rna-U13369.1:12135..12334       U13369.1        12135   12334   +       200     170154
rna-U13369.1:12335..12534       U13369.1        12335   12534   +       200     249635
rna-U13369.1:12535..12734       U13369.1        12535   12734   +       200     132354
rna-U13369.1:12735..12969       U13369.1        12735   12969   +       235     180979
rna-U13369.1:12970..13069       U13369.1        12970   13069   +       100     51
rna-U13369.1:13070..13169       U13369.1        13070   13169   +       100     3
rna-U13369.1:13170..13269       U13369.1        13170   13269   +       100     16
rna-U13369.1:13270..13314       U13369.1        13270   13314   +       45      0

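HCT116 and HeLa have different numbers of assigned reads (7826386 vs 8942151 without `-M`), so their raw bin counts are not directly comparable. One simple option, sketched below under the assumption that a per-million scaling is adequate for a quick look, is to divide each bin by its own table's total and multiply by 10^6. The function name `compare_samples` is hypothetical, and this is only an illustrative normalization, not a substitute for a proper differential analysis.

```shell
# Hypothetical helper: put two featureCounts tables (e.g. HCT116 and HeLa,
# both counted without -M) on a common scale. Each bin is divided by its
# table's total count and multiplied by 1e6 (a CPM-style scaling that uses
# only the rDNA bins in the table, not the whole library).
# cpm_1 comes from the first argument, cpm_2 from the second.
compare_samples() {
    awk 'FNR == NR { if (FNR > 2) { a[$1] = $7; ta += $7 }; next }
         FNR > 2   { b[$1] = $7; tb += $7; order[++k] = $1 }
         END {
             printf "Geneid\tcpm_1\tcpm_2\n"
             for (i = 1; i <= k; i++) {
                 g = order[i]
                 printf "%s\t%.1f\t%.1f\n", g, (ta > 0 ? 1e6 * a[g] / ta : 0), (tb > 0 ? 1e6 * b[g] / tb : 0)
             }
         }' "$1" "$2"
}

# Usage (file names from the runs above):
#   compare_samples HCT116.read.count HeLa.read.count
```

Rows are printed in the order of the second table; since both tables were built from the same GTF (`U13369.1.2.gtf`), the bins are the same in both.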
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa_2.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           o HeLa-valAligned.out.bam.sort                   ||
||                                                                            ||
||             Output file : HeLa_2.read.count                                ||
||                 Summary : HeLa_2.read.count.summary                        ||
||              Annotation : U13369.1.2.gtf (GTF)                             ||
||      Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
||                                                                            ||
||                 Threads : 32                                               ||
||                   Level : feature level                                    ||
||              Paired-end : yes                                              ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file U13369.1.2.gtf ...                                    ||
||    Features : 84                                                           ||
||    Meta-features : 84                                                      ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file HeLa-valAligned.out.bam.sort...                           ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 19960509                                             ||
||    Successfully assigned alignments : 11652948 (58.4%)                     ||
||    Running time : 0.74 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/Data/lizexing/projects  ||
|| /xindi/Data/new/Data/CleanData/HeLa_2.read.count.summary"                  ||
||                                                                            ||
\\============================================================================//


(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa_2.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa_2.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort"
Geneid  Chr     Start   End     Strand  Length  /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
rna-U13369.1:1..200     U13369.1        1       200     +       200     2419
rna-U13369.1:201..400   U13369.1        201     400     +       200     532
rna-U13369.1:401..600   U13369.1        401     600     +       200     1485
rna-U13369.1:601..800   U13369.1        601     800     +       200     881
rna-U13369.1:801..1000  U13369.1        801     1000    +       200     3626
rna-U13369.1:1001..1200 U13369.1        1001    1200    +       200     3773
rna-U13369.1:1201..1400 U13369.1        1201    1400    +       200     3406
rna-U13369.1:1401..1600 U13369.1        1401    1600    +       200     878
rna-U13369.1:1601..1800 U13369.1        1601    1800    +       200     3043
rna-U13369.1:1801..2000 U13369.1        1801    2000    +       200     2980
rna-U13369.1:2001..2200 U13369.1        2001    2200    +       200     5493
rna-U13369.1:2201..2400 U13369.1        2201    2400    +       200     661
rna-U13369.1:2401..2600 U13369.1        2401    2600    +       200     604
rna-U13369.1:2601..2800 U13369.1        2601    2800    +       200     363
rna-U13369.1:2801..3000 U13369.1        2801    3000    +       200     252
rna-U13369.1:3001..3200 U13369.1        3001    3200    +       200     431
rna-U13369.1:3201..3400 U13369.1        3201    3400    +       200     64
rna-U13369.1:3401..3656 U13369.1        3401    3656    +       256     4233
rna-U13369.1:3657..3857 U13369.1        3657    3857    +       201     1578385
rna-U13369.1:3858..4057 U13369.1        3858    4057    +       200     477469
rna-U13369.1:4058..4257 U13369.1        4058    4257    +       200     332416
rna-U13369.1:4258..4457 U13369.1        4258    4457    +       200     405998
rna-U13369.1:4458..4657 U13369.1        4458    4657    +       200     385290
rna-U13369.1:4658..4857 U13369.1        4658    4857    +       200     398064
rna-U13369.1:4858..5057 U13369.1        4858    5057    +       200     761293
rna-U13369.1:5058..5257 U13369.1        5058    5257    +       200     442319
rna-U13369.1:5258..5457 U13369.1        5258    5457    +       200     166490
rna-U13369.1:5458..5527 U13369.1        5458    5527    +       70      3361
rna-U13369.1:5528..5728 U13369.1        5528    5728    +       201     16190
rna-U13369.1:5729..5928 U13369.1        5729    5928    +       200     2965
rna-U13369.1:5929..6128 U13369.1        5929    6128    +       200     2208
rna-U13369.1:6129..6328 U13369.1        6129    6328    +       200     239
rna-U13369.1:6329..6528 U13369.1        6329    6528    +       200     1151
rna-U13369.1:6529..6622 U13369.1        6529    6622    +       94      399
rna-U13369.1:6623..6633 U13369.1        6623    6633    +       11      4
rna-U13369.1:6634..6643 U13369.1        6634    6643    +       10      11
rna-U13369.1:6644..6653 U13369.1        6644    6653    +       10      0
rna-U13369.1:6654..6663 U13369.1        6654    6663    +       10      1
rna-U13369.1:6664..6673 U13369.1        6664    6673    +       10      0
rna-U13369.1:6674..6683 U13369.1        6674    6683    +       10      0
rna-U13369.1:6684..6693 U13369.1        6684    6693    +       10      8
rna-U13369.1:6694..6703 U13369.1        6694    6703    +       10      5
rna-U13369.1:6704..6713 U13369.1        6704    6713    +       10      3
rna-U13369.1:6714..6723 U13369.1        6714    6723    +       10      0
rna-U13369.1:6724..6733 U13369.1        6724    6733    +       10      32
rna-U13369.1:6734..6743 U13369.1        6734    6743    +       10      22
rna-U13369.1:6744..6753 U13369.1        6744    6753    +       10      0
rna-U13369.1:6754..6763 U13369.1        6754    6763    +       10      16
rna-U13369.1:6764..6779 U13369.1        6764    6779    +       16      145
rna-U13369.1:6780..6980 U13369.1        6780    6980    +       201     7721
rna-U13369.1:6981..7180 U13369.1        6981    7180    +       200     6618
rna-U13369.1:7181..7380 U13369.1        7181    7380    +       200     1050
rna-U13369.1:7381..7580 U13369.1        7381    7580    +       200     245
rna-U13369.1:7581..7780 U13369.1        7581    7780    +       200     2829
rna-U13369.1:7781..7934 U13369.1        7781    7934    +       154     9438
rna-U13369.1:7935..8134 U13369.1        7935    8134    +       200     1647767
rna-U13369.1:8135..8334 U13369.1        8135    8334    +       200     591411
rna-U13369.1:8335..8534 U13369.1        8335    8534    +       200     142470
rna-U13369.1:8535..8734 U13369.1        8535    8734    +       200     62351
rna-U13369.1:8735..8934 U13369.1        8735    8934    +       200     130670
rna-U13369.1:8935..9134 U13369.1        8935    9134    +       200     25930
rna-U13369.1:9135..9334 U13369.1        9135    9334    +       200     19534
rna-U13369.1:9335..9534 U13369.1        9335    9534    +       200     301434
rna-U13369.1:9535..9734 U13369.1        9535    9734    +       200     637212
rna-U13369.1:9735..9934 U13369.1        9735    9934    +       200     302240
rna-U13369.1:9935..10134        U13369.1        9935    10134   +       200     34581
rna-U13369.1:10135..10334       U13369.1        10135   10334   +       200     317622
rna-U13369.1:10335..10534       U13369.1        10335   10534   +       200     200022
rna-U13369.1:10535..10734       U13369.1        10535   10734   +       200     406134
rna-U13369.1:10735..10934       U13369.1        10735   10934   +       200     65522
rna-U13369.1:10935..11134       U13369.1        10935   11134   +       200     1342
rna-U13369.1:11135..11334       U13369.1        11135   11343   +       209     4619
rna-U13369.1:11335..11534       U13369.1        11335   11534   +       200     45959
rna-U13369.1:11535..11734       U13369.1        11535   11734   +       200     498389
rna-U13369.1:11735..11934       U13369.1        11735   11934   +       200     147233
rna-U13369.1:11935..12134       U13369.1        11935   12134   +       200     160008
rna-U13369.1:12135..12334       U13369.1        12135   12334   +       200     184780
rna-U13369.1:12335..12534       U13369.1        12335   12534   +       200     266084
rna-U13369.1:12535..12734       U13369.1        12535   12734   +       200     208201
rna-U13369.1:12735..12969       U13369.1        12735   12969   +       235     213777
rna-U13369.1:12970..13069       U13369.1        12970   13069   +       100     104
rna-U13369.1:13070..13169       U13369.1        13070   13169   +       100     5
rna-U13369.1:13170..13269       U13369.1        13170   13269   +       100     38
rna-U13369.1:13270..13314       U13369.1        13270   13314   +       45      0
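To compare samples by rDNA region rather than by bin, the bin counts can be summed per region. The boundaries below are the positions where the bins restart in the tables of this post: 5'ETS 1-3656, 18S 3657-5527, ITS1 5528-6622, 5.8S 6623-6779, ITS2 6780-7934, 28S 7935-12969, and 3'ETS 12970-13314. This is a hedged sketch, and region_totals is a hypothetical name. Because of -M (and -O in the later runs), one fragment can be counted in several bins. The totals are therefore alignment-level and can exceed the number of fragments.

```shell
#!/usr/bin/env bash
# Sketch: collapse a featureCounts bin table into rDNA region totals.
# Boundaries (U13369.1 coordinates) are where the bins restart in this post's tables;
# each bin is assigned to a region by its Start. Labels 5ETS/3ETS stand for 5'/3' ETS.
region_totals() {
  awk -F'\t' '
    /^#/ || $1 == "Geneid" { next }          # skip the command line and the header
    {
      s = $3 + 0
      if      (s <= 3656)  r = "5ETS"
      else if (s <= 5527)  r = "18S"
      else if (s <= 6622)  r = "ITS1"
      else if (s <= 6779)  r = "5.8S"
      else if (s <= 7934)  r = "ITS2"
      else if (s <= 12969) r = "28S"
      else                 r = "3ETS"
      sum[r] += $7                           # column 7 holds the count for the BAM file
    }
    END {
      n = split("5ETS 18S ITS1 5.8S ITS2 28S 3ETS", order, " ")
      for (i = 1; i <= n; i++) printf "%s\t%d\n", order[i], sum[order[i]]
    }' "$1"
}
```

Usage: region_totals HeLa_2.read.count prints one line per region in 5'-to-3' order.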

Step 4 - Count the 293T sequencing data: 5.8S_RNA_bin=10bp, 3’ETS_RNA_bin=100bp, others_RNA_bin=100bp
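The annotation U13369.1.3.gtf used below is not shown. A GTF that follows these bin sizes could be generated along the lines of the sketch below, which is not the author's script. It makes several assumptions. Region boundaries are taken from where the bins restart in the count tables. The last bin of each region keeps the remainder. The names make_bins and U13369.1.bins.gtf are hypothetical. The feature type "exon" and the gene_id attribute match the -t exon -g gene_id options passed to featureCounts.

```shell
#!/usr/bin/env bash
# Sketch: write a binned GTF for the 45S rDNA unit of U13369.1.
# make_bins SEQID START END BIN_SIZE prints one "exon" line per bin; the last
# bin of a region keeps whatever is left over, so it may be shorter than BIN_SIZE.
make_bins() {
  local seqid=$1 start=$2 end=$3 bin=$4 s e
  for (( s = start; s <= end; s += bin )); do
    e=$(( s + bin - 1 ))
    if (( e > end )); then e=$end; fi
    # gene_id follows the "rna-U13369.1:START..END" naming seen in the count tables
    printf '%s\tsketch\texon\t%d\t%d\t.\t+\t.\tgene_id "rna-%s:%d..%d"; transcript_id "rna-%s:%d..%d";\n' \
      "$seqid" "$s" "$e" "$seqid" "$s" "$e" "$seqid" "$s" "$e"
  done
}

# Region boundaries read off the count tables; 10 bp bins for 5.8S, 100 bp elsewhere.
{
  make_bins U13369.1 1     3656  100   # 5' ETS
  make_bins U13369.1 3657  5527  100   # 18S
  make_bins U13369.1 5528  6622  100   # ITS1
  make_bins U13369.1 6623  6779  10    # 5.8S
  make_bins U13369.1 6780  7934  100   # ITS2
  make_bins U13369.1 7935  12969 100   # 28S
  make_bins U13369.1 12970 13314 100   # 3' ETS
} > U13369.1.bins.gtf
```

With these boundaries the sketch writes 150 features. featureCounts reports 147 for U13369.1.3.gtf, which is consistent with the 293T table below. That table has no bins for 201..400 and opens the 5.8S region with an 11 bp bin (6623..6633), so its exact boundaries differ from this sketch.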

# Multimapping reads: counted (-M); multi-overlapping reads: counted (-O)
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf -O -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T_4.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           o 293T-valAligned.out.bam.sort                   ||
||                                                                            ||
||             Output file : 293T_4.read.count                                ||
||                 Summary : 293T_4.read.count.summary                        ||
||              Annotation : U13369.1.3.gtf (GTF)                             ||
||      Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
||                                                                            ||
||                 Threads : 32                                               ||
||                   Level : feature level                                    ||
||              Paired-end : yes                                              ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file U13369.1.3.gtf ...                                    ||
||    Features : 147                                                          ||
||    Meta-features : 147                                                     ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file 293T-valAligned.out.bam.sort...                           ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 14730856                                             ||
||    Successfully assigned alignments : 14729340 (100.0%)                    ||
||    Running time : 0.41 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/Data/lizexing/projects  ||
|| /xindi/Data/new/Data/CleanData/293T_4.read.count.summary"                  ||
||                                                                            ||
\\============================================================================//

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T_4.read.count

# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf" "-O" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T_4.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort"
Geneid  Chr     Start   End     Strand  Length  /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
rna-U13369.1:1..100     U13369.1        1       100     +       100     1187
rna-U13369.1:101..200   U13369.1        101     200     +       100     1503
rna-U13369.1:401..500   U13369.1        401     500     +       100     5363
rna-U13369.1:501..600   U13369.1        501     600     +       100     6986
rna-U13369.1:601..700   U13369.1        601     700     +       100     9949
rna-U13369.1:701..800   U13369.1        701     800     +       100     98122
rna-U13369.1:801..900   U13369.1        801     900     +       100     16717
rna-U13369.1:901..1000  U13369.1        901     1000    +       100     20405
rna-U13369.1:1001..1100 U13369.1        1001    1100    +       100     23766
rna-U13369.1:1101..1200 U13369.1        1101    1200    +       100     17966
rna-U13369.1:1201..1300 U13369.1        1201    1300    +       100     19775
rna-U13369.1:1301..1400 U13369.1        1301    1400    +       100     16180
rna-U13369.1:1401..1500 U13369.1        1401    1500    +       100     14001
rna-U13369.1:1501..1600 U13369.1        1501    1600    +       100     6822
rna-U13369.1:1601..1700 U13369.1        1601    1700    +       100     8094
rna-U13369.1:1701..1800 U13369.1        1701    1800    +       100     16134
rna-U13369.1:1801..1900 U13369.1        1801    1900    +       100     94548
rna-U13369.1:1901..2000 U13369.1        1901    2000    +       100     22717
rna-U13369.1:2001..2100 U13369.1        2001    2100    +       100     18312
rna-U13369.1:2101..2200 U13369.1        2101    2200    +       100     20072
rna-U13369.1:2201..2300 U13369.1        2201    2300    +       100     17985
rna-U13369.1:2301..2400 U13369.1        2301    2400    +       100     4152
rna-U13369.1:2401..2500 U13369.1        2401    2500    +       100     7319
rna-U13369.1:2501..2600 U13369.1        2501    2600    +       100     32616
rna-U13369.1:2601..2700 U13369.1        2601    2700    +       100     7969
rna-U13369.1:2701..2800 U13369.1        2701    2800    +       100     4480
rna-U13369.1:2801..2900 U13369.1        2801    2900    +       100     3119
rna-U13369.1:2901..3000 U13369.1        2901    3000    +       100     2616
rna-U13369.1:3001..3100 U13369.1        3001    3100    +       100     101809
rna-U13369.1:3101..3200 U13369.1        3101    3200    +       100     1170
rna-U13369.1:3201..3300 U13369.1        3201    3300    +       100     15880
rna-U13369.1:3301..3400 U13369.1        3301    3400    +       100     13653
rna-U13369.1:3401..3500 U13369.1        3401    3500    +       100     7847
rna-U13369.1:3501..3600 U13369.1        3501    3600    +       100     16769
rna-U13369.1:3601..3656 U13369.1        3601    3656    +       56      15274
rna-U13369.1:3657..3756 U13369.1        3657    3756    +       100     983156
rna-U13369.1:3757..3856 U13369.1        3757    3856    +       100     1165318
rna-U13369.1:3857..3956 U13369.1        3857    3956    +       100     632269
rna-U13369.1:3957..4056 U13369.1        3957    4056    +       100     748895
rna-U13369.1:4057..4156 U13369.1        4057    4156    +       100     744331
rna-U13369.1:4157..4256 U13369.1        4157    4256    +       100     557704
rna-U13369.1:4257..4356 U13369.1        4257    4356    +       100     687009
rna-U13369.1:4357..4456 U13369.1        4357    4456    +       100     667541
rna-U13369.1:4457..4556 U13369.1        4457    4556    +       100     923120
rna-U13369.1:4557..4656 U13369.1        4557    4656    +       100     899692
rna-U13369.1:4657..4756 U13369.1        4657    4756    +       100     702785
rna-U13369.1:4757..4856 U13369.1        4757    4856    +       100     481957
rna-U13369.1:4857..4956 U13369.1        4857    4956    +       100     1077524
rna-U13369.1:4957..5056 U13369.1        4957    5056    +       100     1065799
rna-U13369.1:5057..5156 U13369.1        5057    5156    +       100     1171111
rna-U13369.1:5157..5256 U13369.1        5157    5256    +       100     775546
rna-U13369.1:5257..5356 U13369.1        5257    5356    +       100     695770
rna-U13369.1:5357..5456 U13369.1        5357    5456    +       100     385095
rna-U13369.1:5457..5527 U13369.1        5457    5527    +       71      203279
rna-U13369.1:5528..5627 U13369.1        5528    5627    +       100     84858
rna-U13369.1:5628..5727 U13369.1        5628    5727    +       100     29179
rna-U13369.1:5728..5827 U13369.1        5728    5827    +       100     23726
rna-U13369.1:5828..5927 U13369.1        5828    5927    +       100     11274
rna-U13369.1:5928..6027 U13369.1        5928    6027    +       100     11847
rna-U13369.1:6028..6127 U13369.1        6028    6127    +       100     17874
rna-U13369.1:6128..6227 U13369.1        6128    6227    +       100     7161
rna-U13369.1:6228..6327 U13369.1        6228    6327    +       100     2865
rna-U13369.1:6328..6427 U13369.1        6328    6427    +       100     7905
rna-U13369.1:6428..6527 U13369.1        6428    6527    +       100     7148
rna-U13369.1:6528..6623 U13369.1        6528    6623    +       96      6709
rna-U13369.1:6623..6633 U13369.1        6623    6633    +       11      6315
rna-U13369.1:6634..6643 U13369.1        6634    6643    +       10      18897
rna-U13369.1:6644..6653 U13369.1        6644    6653    +       10      19357
rna-U13369.1:6654..6663 U13369.1        6654    6663    +       10      21411
rna-U13369.1:6664..6673 U13369.1        6664    6673    +       10      23597
rna-U13369.1:6674..6683 U13369.1        6674    6683    +       10      24246
rna-U13369.1:6684..6693 U13369.1        6684    6693    +       10      25318
rna-U13369.1:6694..6703 U13369.1        6694    6703    +       10      25594
rna-U13369.1:6704..6713 U13369.1        6704    6713    +       10      26630
rna-U13369.1:6714..6723 U13369.1        6714    6723    +       10      27362
rna-U13369.1:6724..6733 U13369.1        6724    6733    +       10      27816
rna-U13369.1:6734..6743 U13369.1        6734    6743    +       10      27702
rna-U13369.1:6744..6753 U13369.1        6744    6753    +       10      27890
rna-U13369.1:6754..6763 U13369.1        6754    6763    +       10      28045
rna-U13369.1:6764..6779 U13369.1        6764    6779    +       16      28987
rna-U13369.1:6780..6879 U13369.1        6780    6879    +       100     28190
rna-U13369.1:6880..6979 U13369.1        6880    6979    +       100     12940
rna-U13369.1:6980..7079 U13369.1        6980    7079    +       100     9960
rna-U13369.1:7080..7179 U13369.1        7080    7179    +       100     7999
rna-U13369.1:7180..7279 U13369.1        7180    7279    +       100     5306
rna-U13369.1:7280..7379 U13369.1        7280    7379    +       100     1722
rna-U13369.1:7380..7479 U13369.1        7380    7479    +       100     5870
rna-U13369.1:7480..7579 U13369.1        7480    7579    +       100     2894
rna-U13369.1:7580..7679 U13369.1        7580    7679    +       100     4368
rna-U13369.1:7680..7779 U13369.1        7680    7779    +       100     11435
rna-U13369.1:7780..7879 U13369.1        7780    7879    +       100     38017
rna-U13369.1:7880..7934 U13369.1        7880    7934    +       55      46852
rna-U13369.1:7935..8034 U13369.1        7935    8034    +       100     1345085
rna-U13369.1:8035..8134 U13369.1        8035    8134    +       100     1611691
rna-U13369.1:8135..8234 U13369.1        8135    8234    +       100     1232707
rna-U13369.1:8235..8334 U13369.1        8235    8334    +       100     1302705
rna-U13369.1:8335..8434 U13369.1        8335    8434    +       100     840667
rna-U13369.1:8435..8534 U13369.1        8435    8534    +       100     525412
rna-U13369.1:8535..8634 U13369.1        8535    8634    +       100     84223
rna-U13369.1:8635..8734 U13369.1        8635    8734    +       100     250692
rna-U13369.1:8735..8834 U13369.1        8735    8834    +       100     257462
rna-U13369.1:8835..8934 U13369.1        8835    8934    +       100     280610
rna-U13369.1:8935..9034 U13369.1        8935    9034    +       100     121052
rna-U13369.1:9035..9134 U13369.1        9035    9134    +       100     58300
rna-U13369.1:9135..9234 U13369.1        9135    9234    +       100     66976
rna-U13369.1:9235..9334 U13369.1        9235    9334    +       100     358926
rna-U13369.1:9335..9434 U13369.1        9335    9434    +       100     685549
rna-U13369.1:9435..9534 U13369.1        9435    9534    +       100     999236
rna-U13369.1:9535..9634 U13369.1        9535    9634    +       100     1314841
rna-U13369.1:9635..9734 U13369.1        9635    9734    +       100     1507189
rna-U13369.1:9735..9834 U13369.1        9735    9834    +       100     1172520
rna-U13369.1:9835..9934 U13369.1        9835    9934    +       100     1040687
rna-U13369.1:9935..10034        U13369.1        9935    10034   +       100     355227
rna-U13369.1:10035..10134       U13369.1        10035   10134   +       100     108283
rna-U13369.1:10135..10234       U13369.1        10135   10234   +       100     356888
rna-U13369.1:10235..10334       U13369.1        10235   10334   +       100     579757
rna-U13369.1:10335..10434       U13369.1        10335   10434   +       100     432036
rna-U13369.1:10435..10534       U13369.1        10435   10534   +       100     401304
rna-U13369.1:10535..10634       U13369.1        10535   10634   +       100     482139
rna-U13369.1:10635..10734       U13369.1        10635   10734   +       100     542858
rna-U13369.1:10735..10834       U13369.1        10735   10834   +       100     419599
rna-U13369.1:10835..10934       U13369.1        10835   10934   +       100     103019
rna-U13369.1:10935..11034       U13369.1        10935   11034   +       100     45063
rna-U13369.1:11035..11134       U13369.1        11035   11134   +       100     14769
rna-U13369.1:11135..11234       U13369.1        11135   11234   +       100     14929
rna-U13369.1:11235..11334       U13369.1        11235   11334   +       100     17393
rna-U13369.1:11335..11434       U13369.1        11335   11434   +       100     58989
rna-U13369.1:11435..11534       U13369.1        11435   11534   +       100     135358
rna-U13369.1:11535..11634       U13369.1        11535   11634   +       100     563636
rna-U13369.1:11635..11734       U13369.1        11635   11734   +       100     806291
rna-U13369.1:11735..11834       U13369.1        11735   11834   +       100     852255
rna-U13369.1:11835..11934       U13369.1        11835   11934   +       100     397908
rna-U13369.1:11935..12034       U13369.1        11935   12034   +       100     297926
rna-U13369.1:12035..12134       U13369.1        12035   12134   +       100     325780
rna-U13369.1:12135..12234       U13369.1        12135   12234   +       100     299787
rna-U13369.1:12235..12334       U13369.1        12235   12334   +       100     275376
rna-U13369.1:12335..12434       U13369.1        12335   12434   +       100     326802
rna-U13369.1:12435..12534       U13369.1        12435   12534   +       100     695120
rna-U13369.1:12535..12634       U13369.1        12535   12634   +       100     958579
rna-U13369.1:12635..12734       U13369.1        12635   12734   +       100     903318
rna-U13369.1:12735..12834       U13369.1        12735   12834   +       100     715592
rna-U13369.1:12835..12934       U13369.1        12835   12934   +       100     544458
rna-U13369.1:12735..12969       U13369.1        12935   12969   +       35      46803
rna-U13369.1:12970..13069       U13369.1        12970   13069   +       100     1202
rna-U13369.1:13070..13169       U13369.1        13070   13169   +       100     562
rna-U13369.1:13170..13269       U13369.1        13170   13269   +       100     255
rna-U13369.1:13270..13314       U13369.1        13270   13314   +       45      89
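The bins now have different widths: 10 bp over 5.8S, 100 bp elsewhere, and shorter remainder bins at region ends. Raw counts are therefore not directly comparable between bins, or with the 200 bp HeLa_2 table. Dividing each count by the Length column gives a per-base density. This is a sketch only. The name counts_per_bp is hypothetical, and the input is assumed to be the tab-separated table written by featureCounts.

```shell
#!/usr/bin/env bash
# Sketch: per-bin density (count / bin Length) from a featureCounts table, so that
# 10 bp, 100 bp and 200 bp bins can be compared on the same scale.
counts_per_bp() {
  awk -F'\t' '
    /^#/ || $1 == "Geneid" { next }          # skip the command line and the header
    $6 > 0 { printf "%s\t%s\t%s\t%.2f\n", $1, $3, $4, $7 / $6 }
  ' "$1"
}
```

For example, counts_per_bp 293T_4.read.count turns the 11 bp bin 6623..6633 (6315) into about 574 per bp.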

Step 5 - Count the HCT116 sequencing data (same annotation and options as Step 4)

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf -O -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116_4.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           o HCT116-valAligned.out.bam.sort                 ||
||                                                                            ||
||             Output file : HCT116_4.read.count                              ||
||                 Summary : HCT116_4.read.count.summary                      ||
||              Annotation : U13369.1.3.gtf (GTF)                             ||
||      Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
||                                                                            ||
||                 Threads : 32                                               ||
||                   Level : feature level                                    ||
||              Paired-end : yes                                              ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file U13369.1.3.gtf ...                                    ||
||    Features : 147                                                          ||
||    Meta-features : 147                                                     ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file HCT116-valAligned.out.bam.sort...                         ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 17788363                                             ||
||    Successfully assigned alignments : 17787396 (100.0%)                    ||
||    Running time : 0.51 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/Data/lizexing/projects  ||
|| /xindi/Data/new/Data/CleanData/HCT116_4.read.count.summary"                ||
||                                                                            ||
\\============================================================================//
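Steps 4 and 5 run the same featureCounts command, and only the sample name changes, so the call can be wrapped in a function. The flag comments follow the settings echoed in the featureCounts logs above. The logs also show a difference in assignment rates. The HeLa_2 run without -O assigned 58.4% of alignments, while the runs with -O assigned 100.0%. A plausible but unverified explanation is that many fragments span two adjacent bins and are left unassigned as ambiguous without -O. The Unassigned_Ambiguity line in HeLa_2.read.count.summary would confirm or rule this out. The samples also differ, so this is not a controlled comparison. This is a sketch only. The names count_sample and DRY_RUN are hypothetical, and the paths are copied from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: the Step 4/5 featureCounts call wrapped so every sample gets identical options.
# Set DRY_RUN=1 to print the command instead of running it.
REF=/Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf
DIR=/Data/lizexing/projects/xindi/Data/new/Data/CleanData

count_sample() {
  local sample=$1
  # -M: count multi-mapping reads; -O: count reads overlapping several bins
  # -p -B -C: paired-end, both ends must be mapped, chimeric fragments not counted
  # -f -t exon -g gene_id: count each bin (feature) separately
  ${DRY_RUN:+echo} featureCounts -T 32 -M -a "$REF" -O -p -B -C -f -t exon -g gene_id \
    -o "$DIR/${sample}_4.read.count" \
    "$DIR/${sample}-valAligned.out.bam.sort"
}
```

Usage: for s in 293T HCT116; do count_sample "$s"; done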


(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116_4.read.count

# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf" "-O" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116_4.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort"
Geneid  Chr     Start   End     Strand  Length  /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort
rna-U13369.1:1..100     U13369.1        1       100     +       100     899
rna-U13369.1:101..200   U13369.1        101     200     +       100     3809
rna-U13369.1:401..500   U13369.1        401     500     +       100     3321
rna-U13369.1:501..600   U13369.1        501     600     +       100     4314
rna-U13369.1:601..700   U13369.1        601     700     +       100     5879
rna-U13369.1:701..800   U13369.1        701     800     +       100     85608
rna-U13369.1:801..900   U13369.1        801     900     +       100     65096
rna-U13369.1:901..1000  U13369.1        901     1000    +       100     10504
rna-U13369.1:1001..1100 U13369.1        1001    1100    +       100     16884
rna-U13369.1:1101..1200 U13369.1        1101    1200    +       100     10355
rna-U13369.1:1201..1300 U13369.1        1201    1300    +       100     20986
rna-U13369.1:1301..1400 U13369.1        1301    1400    +       100     9680
rna-U13369.1:1401..1500 U13369.1        1401    1500    +       100     8206
rna-U13369.1:1501..1600 U13369.1        1501    1600    +       100     4067
rna-U13369.1:1601..1700 U13369.1        1601    1700    +       100     6487
rna-U13369.1:1701..1800 U13369.1        1701    1800    +       100     9367
rna-U13369.1:1801..1900 U13369.1        1801    1900    +       100     68566
rna-U13369.1:1901..2000 U13369.1        1901    2000    +       100     13355
rna-U13369.1:2001..2100 U13369.1        2001    2100    +       100     11643
rna-U13369.1:2101..2200 U13369.1        2101    2200    +       100     27520
rna-U13369.1:2201..2300 U13369.1        2201    2300    +       100     11418
rna-U13369.1:2301..2400 U13369.1        2301    2400    +       100     3173
rna-U13369.1:2401..2500 U13369.1        2401    2500    +       100     5298
rna-U13369.1:2501..2600 U13369.1        2501    2600    +       100     29929
rna-U13369.1:2601..2700 U13369.1        2601    2700    +       100     5183
rna-U13369.1:2701..2800 U13369.1        2701    2800    +       100     2800
rna-U13369.1:2801..2900 U13369.1        2801    2900    +       100     3356
rna-U13369.1:2901..3000 U13369.1        2901    3000    +       100     2090
rna-U13369.1:3001..3100 U13369.1        3001    3100    +       100     98066
rna-U13369.1:3101..3200 U13369.1        3101    3200    +       100     706
rna-U13369.1:3201..3300 U13369.1        3201    3300    +       100     17733
rna-U13369.1:3301..3400 U13369.1        3301    3400    +       100     14367
rna-U13369.1:3401..3500 U13369.1        3401    3500    +       100     5321
rna-U13369.1:3501..3600 U13369.1        3501    3600    +       100     10711
rna-U13369.1:3601..3656 U13369.1        3601    3656    +       56      11391
rna-U13369.1:3657..3756 U13369.1        3657    3756    +       100     1435560
rna-U13369.1:3757..3856 U13369.1        3757    3856    +       100     1677882
rna-U13369.1:3857..3956 U13369.1        3857    3956    +       100     817967
rna-U13369.1:3957..4056 U13369.1        3957    4056    +       100     976286
rna-U13369.1:4057..4156 U13369.1        4057    4156    +       100     930196
rna-U13369.1:4157..4256 U13369.1        4157    4256    +       100     671001
rna-U13369.1:4257..4356 U13369.1        4257    4356    +       100     792943
rna-U13369.1:4357..4456 U13369.1        4357    4456    +       100     827521
rna-U13369.1:4457..4556 U13369.1        4457    4556    +       100     1200888
rna-U13369.1:4557..4656 U13369.1        4557    4656    +       100     1209052
rna-U13369.1:4657..4756 U13369.1        4657    4756    +       100     922498
rna-U13369.1:4757..4856 U13369.1        4757    4856    +       100     587374
rna-U13369.1:4857..4956 U13369.1        4857    4956    +       100     1420225
rna-U13369.1:4957..5056 U13369.1        4957    5056    +       100     1408911
rna-U13369.1:5057..5156 U13369.1        5057    5156    +       100     1493089
rna-U13369.1:5157..5256 U13369.1        5157    5256    +       100     989229
rna-U13369.1:5257..5356 U13369.1        5257    5356    +       100     883788
rna-U13369.1:5357..5456 U13369.1        5357    5456    +       100     444330
rna-U13369.1:5457..5527 U13369.1        5457    5527    +       71      218191
rna-U13369.1:5528..5627 U13369.1        5528    5627    +       100     57031
rna-U13369.1:5628..5727 U13369.1        5628    5727    +       100     20749
rna-U13369.1:5728..5827 U13369.1        5728    5827    +       100     16409
rna-U13369.1:5828..5927 U13369.1        5828    5927    +       100     6236
rna-U13369.1:5928..6027 U13369.1        5928    6027    +       100     7847
rna-U13369.1:6028..6127 U13369.1        6028    6127    +       100     5461
rna-U13369.1:6128..6227 U13369.1        6128    6227    +       100     3546
rna-U13369.1:6228..6327 U13369.1        6228    6327    +       100     2065
rna-U13369.1:6328..6427 U13369.1        6328    6427    +       100     7038
rna-U13369.1:6428..6527 U13369.1        6428    6527    +       100     6780
rna-U13369.1:6528..6623 U13369.1        6528    6623    +       96      7432
rna-U13369.1:6623..6633 U13369.1        6623    6633    +       11      5597
rna-U13369.1:6634..6643 U13369.1        6634    6643    +       10      22858
rna-U13369.1:6644..6653 U13369.1        6644    6653    +       10      23986
rna-U13369.1:6654..6663 U13369.1        6654    6663    +       10      26156
rna-U13369.1:6664..6673 U13369.1        6664    6673    +       10      29023
rna-U13369.1:6674..6683 U13369.1        6674    6683    +       10      29723
rna-U13369.1:6684..6693 U13369.1        6684    6693    +       10      30940
rna-U13369.1:6694..6703 U13369.1        6694    6703    +       10      31858
rna-U13369.1:6704..6713 U13369.1        6704    6713    +       10      33187
rna-U13369.1:6714..6723 U13369.1        6714    6723    +       10      34336
rna-U13369.1:6724..6733 U13369.1        6724    6733    +       10      35221
rna-U13369.1:6734..6743 U13369.1        6734    6743    +       10      34947
rna-U13369.1:6744..6753 U13369.1        6744    6753    +       10      34952
rna-U13369.1:6754..6763 U13369.1        6754    6763    +       10      35173
rna-U13369.1:6764..6779 U13369.1        6764    6779    +       16      36173
rna-U13369.1:6780..6879 U13369.1        6780    6879    +       100     34981
rna-U13369.1:6880..6979 U13369.1        6880    6979    +       100     13275
rna-U13369.1:6980..7079 U13369.1        6980    7079    +       100     10386
rna-U13369.1:7080..7179 U13369.1        7080    7179    +       100     8528
rna-U13369.1:7180..7279 U13369.1        7180    7279    +       100     5894
rna-U13369.1:7280..7379 U13369.1        7280    7379    +       100     1842
rna-U13369.1:7380..7479 U13369.1        7380    7479    +       100     5546
rna-U13369.1:7480..7579 U13369.1        7480    7579    +       100     2276
rna-U13369.1:7580..7679 U13369.1        7580    7679    +       100     4771
rna-U13369.1:7680..7779 U13369.1        7680    7779    +       100     48226
rna-U13369.1:7780..7879 U13369.1        7780    7879    +       100     46267
rna-U13369.1:7880..7934 U13369.1        7880    7934    +       55      51235
rna-U13369.1:7935..8034 U13369.1        7935    8034    +       100     1572032
rna-U13369.1:8035..8134 U13369.1        8035    8134    +       100     1877287
rna-U13369.1:8135..8234 U13369.1        8135    8234    +       100     1370282
rna-U13369.1:8235..8334 U13369.1        8235    8334    +       100     1601031
rna-U13369.1:8335..8434 U13369.1        8335    8434    +       100     1143657
rna-U13369.1:8435..8534 U13369.1        8435    8534    +       100     792499
rna-U13369.1:8535..8634 U13369.1        8535    8634    +       100     82913
rna-U13369.1:8635..8734 U13369.1        8635    8734    +       100     280190
rna-U13369.1:8735..8834 U13369.1        8735    8834    +       100     293053
rna-U13369.1:8835..8934 U13369.1        8835    8934    +       100     334811
rna-U13369.1:8935..9034 U13369.1        8935    9034    +       100     153342
rna-U13369.1:9035..9134 U13369.1        9035    9134    +       100     83329
rna-U13369.1:9135..9234 U13369.1        9135    9234    +       100     96309
rna-U13369.1:9235..9334 U13369.1        9235    9334    +       100     363386
rna-U13369.1:9335..9434 U13369.1        9335    9434    +       100     749820
rna-U13369.1:9435..9534 U13369.1        9435    9534    +       100     1192671
rna-U13369.1:9535..9634 U13369.1        9535    9634    +       100     1724150
rna-U13369.1:9635..9734 U13369.1        9635    9734    +       100     1845135
rna-U13369.1:9735..9834 U13369.1        9735    9834    +       100     1365528
rna-U13369.1:9835..9934 U13369.1        9835    9934    +       100     1178376
rna-U13369.1:9935..10034        U13369.1        9935    10034   +       100     384939
rna-U13369.1:10035..10134       U13369.1        10035   10134   +       100     170031
rna-U13369.1:10135..10234       U13369.1        10135   10234   +       100     535143
rna-U13369.1:10235..10334       U13369.1        10235   10334   +       100     768793
rna-U13369.1:10335..10434       U13369.1        10335   10434   +       100     494930
rna-U13369.1:10435..10534       U13369.1        10435   10534   +       100     419520
rna-U13369.1:10535..10634       U13369.1        10535   10634   +       100     518426
rna-U13369.1:10635..10734       U13369.1        10635   10734   +       100     580807
rna-U13369.1:10735..10834       U13369.1        10735   10834   +       100     472139
rna-U13369.1:10835..10934       U13369.1        10835   10934   +       100     103825
rna-U13369.1:10935..11034       U13369.1        10935   11034   +       100     37848
rna-U13369.1:11035..11134       U13369.1        11035   11134   +       100     17181
rna-U13369.1:11135..11234       U13369.1        11135   11234   +       100     18790
rna-U13369.1:11235..11334       U13369.1        11235   11334   +       100     22928
rna-U13369.1:11335..11434       U13369.1        11335   11434   +       100     76066
rna-U13369.1:11435..11534       U13369.1        11435   11534   +       100     154582
rna-U13369.1:11535..11634       U13369.1        11535   11634   +       100     640212
rna-U13369.1:11635..11734       U13369.1        11635   11734   +       100     905400
rna-U13369.1:11735..11834       U13369.1        11735   11834   +       100     946617
rna-U13369.1:11835..11934       U13369.1        11835   11934   +       100     403651
rna-U13369.1:11935..12034       U13369.1        11935   12034   +       100     314676
rna-U13369.1:12035..12134       U13369.1        12035   12134   +       100     351003
rna-U13369.1:12135..12234       U13369.1        12135   12234   +       100     333742
rna-U13369.1:12235..12334       U13369.1        12235   12334   +       100     287773
rna-U13369.1:12335..12434       U13369.1        12335   12434   +       100     318202
rna-U13369.1:12435..12534       U13369.1        12435   12534   +       100     578552
rna-U13369.1:12535..12634       U13369.1        12535   12634   +       100     671803
rna-U13369.1:12635..12734       U13369.1        12635   12734   +       100     672357
rna-U13369.1:12735..12834       U13369.1        12735   12834   +       100     561538
rna-U13369.1:12835..12934       U13369.1        12835   12934   +       100     521403
rna-U13369.1:12735..12969       U13369.1        12935   12969   +       35      38125
rna-U13369.1:12970..13069       U13369.1        12970   13069   +       100     2304
rna-U13369.1:13070..13169       U13369.1        13070   13169   +       100     703
rna-U13369.1:13170..13269       U13369.1        13170   13269   +       100     323
rna-U13369.1:13270..13314       U13369.1        13270   13314   +       45      82
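Two rows in this table do not agree with their own IDs. `rna-U13369.1:12735..12969` has Start 12935, and its Length of 35 matches 12935..12969, so the ID string looks mistyped in the GTF. Bins `6528..6623` and `6623..6633` share base 6623, so with `-O` a read covering that base is counted in both. There is also no bin for 201..400. The HeLa table below shows the same rows, since both runs use the same `U13369.1.3.gtf`.

Below is a minimal awk sketch for flagging such rows in a featureCounts table. `check_bin_ids` is a hypothetical helper. It assumes the bins are sorted by Start and that each Geneid ends in `:START..END`, as they are here.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): sanity-check the bin rows of a featureCounts table.
# Assumes bins are sorted by Start and each Geneid ends in ":START..END".
check_bin_ids() {
    awk '
        # Skip the "# Program:..." line and the column header
        /^#/ || $1 == "Geneid" { next }
        {
            id = $1
            sub(/^.*:/, "", id)              # "rna-U13369.1:1..100" -> "1..100"
            split(id, p, "[.][.]")           # p[1] = START, p[2] = END
            # Geneid range must match the Start/End columns
            if (p[1] + 0 != $3 + 0 || p[2] + 0 != $4 + 0)
                printf "id/coords mismatch: %s Start=%s End=%s\n", $1, $3, $4
            # Compare this bin with the end of the previous one
            if (seen && $3 + 0 <= prev_end)
                printf "overlap: %s starts at %s, previous bin ends at %s\n", $1, $3, prev_end
            else if (seen && $3 + 0 > prev_end + 1)
                printf "gap: no bin covers %d..%d\n", prev_end + 1, $3 - 1
            prev_end = $4 + 0
            seen = 1
        }
    ' "$1"
}
```

Run it as `check_bin_ids HCT116_4.read.count`. On the tables shown here it should report the 201..400 gap, the overlap at 6623 and the 12735..12969 mismatch.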

Step 6 - Count reads for the HeLa sequencing data

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf -O -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa_4.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.1

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                           o HeLa-valAligned.out.bam.sort                   ||
||                                                                            ||
||             Output file : HeLa_4.read.count                                ||
||                 Summary : HeLa_4.read.count.summary                        ||
||              Annotation : U13369.1.3.gtf (GTF)                             ||
||      Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
||                                                                            ||
||                 Threads : 32                                               ||
||                   Level : feature level                                    ||
||              Paired-end : yes                                              ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : counted                                          ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : not counted                                      ||
||        Both ends mapped : required                                         ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file U13369.1.3.gtf ...                                    ||
||    Features : 147                                                          ||
||    Meta-features : 147                                                     ||
||    Chromosomes/contigs : 1                                                 ||
||                                                                            ||
|| Process BAM file HeLa-valAligned.out.bam.sort...                           ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 19960509                                             ||
||    Successfully assigned alignments : 19955679 (100.0%)                    ||
||    Running time : 0.67 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
|| Summary of counting results can be found in file "/Data/lizexing/projects  ||
|| /xindi/Data/new/Data/CleanData/HeLa_4.read.count.summary"                  ||
||                                                                            ||
\\============================================================================//


(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa_4.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf" "-O" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa_4.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort"
Geneid  Chr     Start   End     Strand  Length  /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
rna-U13369.1:1..100     U13369.1        1       100     +       100     3370
rna-U13369.1:101..200   U13369.1        101     200     +       100     4585
rna-U13369.1:401..500   U13369.1        401     500     +       100     3682
rna-U13369.1:501..600   U13369.1        501     600     +       100     2958
rna-U13369.1:601..700   U13369.1        601     700     +       100     4011
rna-U13369.1:701..800   U13369.1        701     800     +       100     119401
rna-U13369.1:801..900   U13369.1        801     900     +       100     6803
rna-U13369.1:901..1000  U13369.1        901     1000    +       100     8265
rna-U13369.1:1001..1100 U13369.1        1001    1100    +       100     8832
rna-U13369.1:1101..1200 U13369.1        1101    1200    +       100     9611
rna-U13369.1:1201..1300 U13369.1        1201    1300    +       100     12128
rna-U13369.1:1301..1400 U13369.1        1301    1400    +       100     8338
rna-U13369.1:1401..1500 U13369.1        1401    1500    +       100     10285
rna-U13369.1:1501..1600 U13369.1        1501    1600    +       100     4272
rna-U13369.1:1601..1700 U13369.1        1601    1700    +       100     5897
rna-U13369.1:1701..1800 U13369.1        1701    1800    +       100     9179
rna-U13369.1:1801..1900 U13369.1        1801    1900    +       100     108051
rna-U13369.1:1901..2000 U13369.1        1901    2000    +       100     11789
rna-U13369.1:2001..2100 U13369.1        2001    2100    +       100     9692
rna-U13369.1:2101..2200 U13369.1        2101    2200    +       100     10434
rna-U13369.1:2201..2300 U13369.1        2201    2300    +       100     9701
rna-U13369.1:2301..2400 U13369.1        2301    2400    +       100     2507
rna-U13369.1:2401..2500 U13369.1        2401    2500    +       100     6246
rna-U13369.1:2501..2600 U13369.1        2501    2600    +       100     9081
rna-U13369.1:2601..2700 U13369.1        2601    2700    +       100     4592
rna-U13369.1:2701..2800 U13369.1        2701    2800    +       100     3166
rna-U13369.1:2801..2900 U13369.1        2801    2900    +       100     2292
rna-U13369.1:2901..3000 U13369.1        2901    3000    +       100     4861
rna-U13369.1:3001..3100 U13369.1        3001    3100    +       100     131649
rna-U13369.1:3101..3200 U13369.1        3101    3200    +       100     867
rna-U13369.1:3201..3300 U13369.1        3201    3300    +       100     19146
rna-U13369.1:3301..3400 U13369.1        3301    3400    +       100     15038
rna-U13369.1:3401..3500 U13369.1        3401    3500    +       100     4407
rna-U13369.1:3501..3600 U13369.1        3501    3600    +       100     8069
rna-U13369.1:3601..3656 U13369.1        3601    3656    +       56      7406
rna-U13369.1:3657..3756 U13369.1        3657    3756    +       100     1678242
rna-U13369.1:3757..3856 U13369.1        3757    3856    +       100     1986846
rna-U13369.1:3857..3956 U13369.1        3857    3956    +       100     990258
rna-U13369.1:3957..4056 U13369.1        3957    4056    +       100     1094256
rna-U13369.1:4057..4156 U13369.1        4057    4156    +       100     1047087
rna-U13369.1:4157..4256 U13369.1        4157    4256    +       100     755495
rna-U13369.1:4257..4356 U13369.1        4257    4356    +       100     981482
rna-U13369.1:4357..4456 U13369.1        4357    4456    +       100     896156
rna-U13369.1:4457..4556 U13369.1        4457    4556    +       100     1282005
rna-U13369.1:4557..4656 U13369.1        4557    4656    +       100     1247009
rna-U13369.1:4657..4756 U13369.1        4657    4756    +       100     1055064
rna-U13369.1:4757..4856 U13369.1        4757    4856    +       100     713463
rna-U13369.1:4857..4956 U13369.1        4857    4956    +       100     1597757
rna-U13369.1:4957..5056 U13369.1        4957    5056    +       100     1634894
rna-U13369.1:5057..5156 U13369.1        5057    5156    +       100     1753994
rna-U13369.1:5157..5256 U13369.1        5157    5256    +       100     1155004
rna-U13369.1:5257..5356 U13369.1        5257    5356    +       100     976121
rna-U13369.1:5357..5456 U13369.1        5357    5456    +       100     504963
rna-U13369.1:5457..5527 U13369.1        5457    5527    +       71      248275
rna-U13369.1:5528..5627 U13369.1        5528    5627    +       100     46379
rna-U13369.1:5628..5727 U13369.1        5628    5727    +       100     18738
rna-U13369.1:5728..5827 U13369.1        5728    5827    +       100     15797
rna-U13369.1:5828..5927 U13369.1        5828    5927    +       100     9599
rna-U13369.1:5928..6027 U13369.1        5928    6027    +       100     9276
rna-U13369.1:6028..6127 U13369.1        6028    6127    +       100     13331
rna-U13369.1:6128..6227 U13369.1        6128    6227    +       100     6199
rna-U13369.1:6228..6327 U13369.1        6228    6327    +       100     2589
rna-U13369.1:6328..6427 U13369.1        6328    6427    +       100     7461
rna-U13369.1:6428..6527 U13369.1        6428    6527    +       100     5157
rna-U13369.1:6528..6623 U13369.1        6528    6623    +       96      4707
rna-U13369.1:6623..6633 U13369.1        6623    6633    +       11      6846
rna-U13369.1:6634..6643 U13369.1        6634    6643    +       10      39098
rna-U13369.1:6644..6653 U13369.1        6644    6653    +       10      41476
rna-U13369.1:6654..6663 U13369.1        6654    6663    +       10      46462
rna-U13369.1:6664..6673 U13369.1        6664    6673    +       10      51148
rna-U13369.1:6674..6683 U13369.1        6674    6683    +       10      52316
rna-U13369.1:6684..6693 U13369.1        6684    6693    +       10      54006
rna-U13369.1:6694..6703 U13369.1        6694    6703    +       10      54944
rna-U13369.1:6704..6713 U13369.1        6704    6713    +       10      57223
rna-U13369.1:6714..6723 U13369.1        6714    6723    +       10      58545
rna-U13369.1:6724..6733 U13369.1        6724    6733    +       10      59322
rna-U13369.1:6734..6743 U13369.1        6734    6743    +       10      59622
rna-U13369.1:6744..6753 U13369.1        6744    6753    +       10      59937
rna-U13369.1:6754..6763 U13369.1        6754    6763    +       10      60432
rna-U13369.1:6764..6779 U13369.1        6764    6779    +       16      61770
rna-U13369.1:6780..6879 U13369.1        6780    6879    +       100     56446
rna-U13369.1:6880..6979 U13369.1        6880    6979    +       100     22820
rna-U13369.1:6980..7079 U13369.1        6980    7079    +       100     14861
rna-U13369.1:7080..7179 U13369.1        7080    7179    +       100     10487
rna-U13369.1:7180..7279 U13369.1        7180    7279    +       100     9092
rna-U13369.1:7280..7379 U13369.1        7280    7379    +       100     2582
rna-U13369.1:7380..7479 U13369.1        7380    7479    +       100     7284
rna-U13369.1:7480..7579 U13369.1        7480    7579    +       100     3379
rna-U13369.1:7580..7679 U13369.1        7580    7679    +       100     6159
rna-U13369.1:7680..7779 U13369.1        7680    7779    +       100     13247
rna-U13369.1:7780..7879 U13369.1        7780    7879    +       100     44625
rna-U13369.1:7880..7934 U13369.1        7880    7934    +       55      54921
rna-U13369.1:7935..8034 U13369.1        7935    8034    +       100     1866486
rna-U13369.1:8035..8134 U13369.1        8035    8134    +       100     2191163
rna-U13369.1:8135..8234 U13369.1        8135    8234    +       100     1661526
rna-U13369.1:8235..8334 U13369.1        8235    8334    +       100     1697478
rna-U13369.1:8335..8434 U13369.1        8335    8434    +       100     1111833
rna-U13369.1:8435..8534 U13369.1        8435    8534    +       100     769359
rna-U13369.1:8535..8634 U13369.1        8535    8634    +       100     97781
rna-U13369.1:8635..8734 U13369.1        8635    8734    +       100     295272
rna-U13369.1:8735..8834 U13369.1        8735    8834    +       100     316382
rna-U13369.1:8835..8934 U13369.1        8835    8934    +       100     350558
rna-U13369.1:8935..9034 U13369.1        8935    9034    +       100     160379
rna-U13369.1:9035..9134 U13369.1        9035    9134    +       100     82634
rna-U13369.1:9135..9234 U13369.1        9135    9234    +       100     85191
rna-U13369.1:9235..9334 U13369.1        9235    9334    +       100     442108
rna-U13369.1:9335..9434 U13369.1        9335    9434    +       100     897183
rna-U13369.1:9435..9534 U13369.1        9435    9534    +       100     1310775
rna-U13369.1:9535..9634 U13369.1        9535    9634    +       100     1774057
rna-U13369.1:9635..9734 U13369.1        9635    9734    +       100     2031500
rna-U13369.1:9735..9834 U13369.1        9735    9834    +       100     1633367
rna-U13369.1:9835..9934 U13369.1        9835    9934    +       100     1455707
rna-U13369.1:9935..10034        U13369.1        9935    10034   +       100     496206
rna-U13369.1:10035..10134       U13369.1        10035   10134   +       100     138555
rna-U13369.1:10135..10234       U13369.1        10135   10234   +       100     439098
rna-U13369.1:10235..10334       U13369.1        10235   10334   +       100     737591
rna-U13369.1:10335..10434       U13369.1        10335   10434   +       100     584975
rna-U13369.1:10435..10534       U13369.1        10435   10534   +       100     558475
rna-U13369.1:10535..10634       U13369.1        10535   10634   +       100     695322
rna-U13369.1:10635..10734       U13369.1        10635   10734   +       100     727567
rna-U13369.1:10735..10834       U13369.1        10735   10834   +       100     565347
rna-U13369.1:10835..10934       U13369.1        10835   10934   +       100     182889
rna-U13369.1:10935..11034       U13369.1        10935   11034   +       100     108785
rna-U13369.1:11035..11134       U13369.1        11035   11134   +       100     20672
rna-U13369.1:11135..11234       U13369.1        11135   11234   +       100     21352
rna-U13369.1:11235..11334       U13369.1        11235   11334   +       100     28617
rna-U13369.1:11335..11434       U13369.1        11335   11434   +       100     72170
rna-U13369.1:11435..11534       U13369.1        11435   11534   +       100     170104
rna-U13369.1:11535..11634       U13369.1        11535   11634   +       100     749326
rna-U13369.1:11635..11734       U13369.1        11635   11734   +       100     1032113
rna-U13369.1:11735..11834       U13369.1        11735   11834   +       100     1113544
rna-U13369.1:11835..11934       U13369.1        11835   11934   +       100     541404
rna-U13369.1:11935..12034       U13369.1        11935   12034   +       100     431856
rna-U13369.1:12035..12134       U13369.1        12035   12134   +       100     473294
rna-U13369.1:12135..12234       U13369.1        12135   12234   +       100     397396
rna-U13369.1:12235..12334       U13369.1        12235   12334   +       100     388342
rna-U13369.1:12335..12434       U13369.1        12335   12434   +       100     433350
rna-U13369.1:12435..12534       U13369.1        12435   12534   +       100     742054
rna-U13369.1:12535..12634       U13369.1        12535   12634   +       100     914501
rna-U13369.1:12635..12734       U13369.1        12635   12734   +       100     901947
rna-U13369.1:12735..12834       U13369.1        12735   12834   +       100     734796
rna-U13369.1:12835..12934       U13369.1        12835   12934   +       100     721153
rna-U13369.1:12735..12969       U13369.1        12935   12969   +       35      41187
rna-U13369.1:12970..13069       U13369.1        12970   13069   +       100     2698
rna-U13369.1:13070..13169       U13369.1        13070   13169   +       100     645
rna-U13369.1:13170..13269       U13369.1        13170   13269   +       100     504
rna-U13369.1:13270..13314       U13369.1        13270   13314   +       45      196
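The bin edges in both tables appear to follow the standard U13369.1 annotation: 18S rRNA 3657..5527, 5.8S 6623..6779 and 28S 7935..12969, with the 5' ETS, ITS1, ITS2 and 3' ETS in between. This makes it easy to roll the bins up into regions, and the sketch below does that with awk. The region boundaries are assumptions hard-coded from those coordinates rather than read from the GTF, and `summarize_rdna_regions` is a hypothetical helper. The counts were produced with `-M -O`, so one fragment can be counted in several adjacent bins. The sums are therefore totals of bin counts, not numbers of reads.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): sum featureCounts bin counts (column 7) per rDNA region.
# ASSUMED region boundaries on U13369.1, inferred from the bin edges above:
#   5ETS 1-3656, 18S 3657-5527, ITS1 5528-6622, 5.8S 6623-6779,
#   ITS2 6780-7934, 28S 7935-12969, 3ETS 12970 onward.
# Each bin is assigned to a region by its Start column.
summarize_rdna_regions() {
    awk '
        BEGIN { n = split("5ETS 18S ITS1 5.8S ITS2 28S 3ETS", order, " ") }
        # Skip the "# Program:..." line and the column header
        /^#/ || $1 == "Geneid" { next }
        {
            s = $3 + 0
            if      (s < 3657)  r = "5ETS"
            else if (s < 5528)  r = "18S"
            else if (s < 6623)  r = "ITS1"
            else if (s < 6780)  r = "5.8S"
            else if (s < 7935)  r = "ITS2"
            else if (s < 12970) r = "28S"
            else                r = "3ETS"
            sum[r] += $7
            total += $7
        }
        END {
            # Region, summed bin counts, percent of all bin counts in the table
            for (i = 1; i <= n; i++) {
                pct = (total > 0) ? 100 * sum[order[i]] / total : 0
                printf "%s\t%.0f\t%.2f\n", order[i], sum[order[i]], pct
            }
        }
    ' "$1"
}
```

For example, `summarize_rdna_regions HeLa_4.read.count` prints one line per region with three fields: the region name, the summed bin count, and the percent of all bin counts in the file.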

27. Align the unmapped reads of the three datasets to the genome with STAR

Step 1 - Build a GRCh38.p13 genome index

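--runMode genomeGenerate: build a genome index instead of aligning reads;
--runThreadN: the number of CPU threads to use;
--genomeDir: the output directory for the index files;
--genomeFastaFiles: the path to your genome FASTA file.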

(base) lizexing@bio:~$ STAR --runMode genomeGenerate --runThreadN 60 --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --genomeFastaFiles /Data/lizexing/reference/UCSC_hg38/GRCh38.p13.genome.fa
Sep 11 18:45:25 ..... started STAR run
Sep 11 18:45:25 ... starting to generate Genome files
Sep 11 18:46:45 ... starting to sort Suffix Array. This may take a long time...
Sep 11 18:47:07 ... sorting Suffix Array chunks and saving them to disk...
Sep 11 19:08:32 ... loading chunks from disk, packing SA...
Sep 11 19:10:08 ... finished generating suffix array
Sep 11 19:10:08 ... generating Suffix Array index
Sep 11 19:14:38 ... completed Suffix Array index
Sep 11 19:14:38 ... writing Genome to disk ...
Sep 11 19:14:40 ... writing Suffix Array to disk ...
Sep 11 19:15:08 ... writing SAindex to disk
Sep 11 19:15:11 ..... finished successfully
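Before running the alignments in Step 2, it can be worth confirming that the directory passed to `--genomeDir` was written completely. The following is a minimal sketch. `check_star_index` is a hypothetical helper. The file list (`Genome`, `SA`, `SAindex`, `chrName.txt`, `chrLength.txt`, `chrNameLength.txt`, `genomeParameters.txt`) is an assumption based on what `STAR --runMode genomeGenerate` typically writes, so adjust it to your STAR version.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): verify that a STAR index directory looks complete.
# The expected file names are an ASSUMPTION based on typical genomeGenerate output.
check_star_index() {
    for f in Genome SA SAindex chrName.txt chrLength.txt chrNameLength.txt genomeParameters.txt; do
        # -s: the file exists and is non-empty
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $1/$f" >&2
            return 1
        fi
    done
    echo "STAR index in $1 looks complete"
}
```

Usage: `check_star_index /Data/lizexing/reference/UCSC_hg38/star_index/`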

Step 2 - Align the unmapped reads of 293T, HCT116 and HeLa (mate1 and mate2 separately)

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valUnmapped.out.mate1 --outFileNamePrefix 293T-genome_mate1
Sep 12 11:40:44 ..... started STAR run
Sep 12 11:40:44 ..... loading genome
Sep 12 11:41:05 ..... processing annotations GTF
Sep 12 11:41:25 ..... inserting junctions into the genome indices
Sep 12 11:45:01 ..... started mapping
Sep 12 11:46:13 ..... finished mapping
Sep 12 11:46:16 ..... finished successfully

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valUnmapped.out.mate2 --outFileNamePrefix 293T-genome_mate2

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valUnmapped.out.mate1 --outFileNamePrefix HCT116-genome_mate1

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valUnmapped.out.mate2 --outFileNamePrefix HCT116-genome_mate2

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valUnmapped.out.mate1 --outFileNamePrefix HeLa-genome_mate1

(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valUnmapped.out.mate2 --outFileNamePrefix HeLa-genome_mate2
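The six alignment commands above differ only in the sample name and the mate file. The sketch below generates them in one loop. The paths and options are copied from the log, and `align_unmapped_mates` and `run_or_print` are hypothetical helper names. Setting `DRY_RUN=1` only prints the commands, so they can be compared against the log before anything runs.

```bash
#!/usr/bin/env bash
# Sketch: run (or, with DRY_RUN=1, just print) the six STAR alignments above.
# Paths and options are copied from the commands in the log.
GENOME_DIR=/Data/lizexing/reference/UCSC_hg38/star_index/
GTF=/Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf
DATA_DIR=/Data/lizexing/projects/xindi/Data/new/Data/CleanData

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: one single-end alignment per sample and mate, as in the log
align_unmapped_mates() {
    for sample in 293T HCT116 HeLa; do
        for mate in mate1 mate2; do
            run_or_print STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx \
                --quantMode TranscriptomeSAM GeneCounts \
                --sjdbGTFfile "$GTF" --outSAMtype BAM Unsorted \
                --genomeDir "$GENOME_DIR" \
                --readFilesIn "$DATA_DIR/${sample}-valUnmapped.out.${mate}" \
                --outFileNamePrefix "${sample}-genome_${mate}" || return 1
        done
    done
}
```

As in the log, mate1 and mate2 are aligned separately as single-end reads. STAR also accepts both files in one `--readFilesIn` (mate1 first, then mate2) for paired-end alignment, which would give different results. The loop keeps the choice made in the log.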
