参考文章:
宏转录组分析:SortMeRNA鉴定过滤rRNA
SILVA Databases for ARB
SortMeRNA 去除rRNA
sortmerna分析rRNA含量
1. Ribosome RNA数据库介绍
sliva rRNA数据库(http://www.arb-silva.de/)用来检查和比对RNA序列,既可以针对16S/18S,SSU,也可以针对23S/28S, LSU,包括了Bacteria, Archaea and Eukarya。同时也是ARB的官方指定数据库。
LSU: Large subunit (23S/28Sribosomal RNAs)
SSU: Small subunit (16S/18Sribosomal RNAs)
1.1 针对arb的下载
到目前(2015.2.4,最新的数据库为Realease119,网页版的已经到121版本了,但是现在不提供下载)
下载介绍http://www.arb-silva.de/download/arb-files/
下载地址:http://www.arb-silva.de/no_cache/download/archive/release_119/ARB_files/
1.2 仅仅是下载fasta文件
下载地址:http://www.arb-silva.de/no_cache/download/archive/release_119/Exports/
根据下载的需求,选择针对23S/28Sribosomal RNAs的LSU或者是针对16S/18Sribosomal RNAs的SSU;然后选择是否去冗余的,我选择去,即Nr99;然后选择是否trunc,即是否对名字缩写;选择是否全长比对结果;
*_tax_silva.fasta.gz
Multi FASTA files of the SSU/LSU databases including the SILVAtaxonomy for
Bacteria, Archaea and Eukaryotes in the header.
REMARK: The sequences in the files are NOT truncated to theeffective LSU or
SSU genes. They contain the full entries as they have been deposited in the
public repositories (ENA/GenBank/DDBJ).
Fasta header:
accession_number.start_position.stop_position taxonomic pathorganism name
*_tax_silva_full_align_trunc.fasta.gz
Multi FASTA files of the SSU/LSU databases including the SILVAtaxonomy for
Bacteria, Archaea and Eukaryotes in the header (including the FULLalignment).
REMARK: Sequences in these files haven been truncated. This meansthat all
nucleotides that have not been aligned were removed from thesequence.
*_tax_silva_trunc.fasta.gz
Multi FASTA files of the SSU/LSU database including the SILVAtaxonomy for
Bacteria, Archaea and Eukaryotes in the header.
REMARK: Sequences in these files haven been truncated. This meansthat all
nucleotides that have not been aligned were removed from thesequence.
生成使用与Mothur的silva数据库:http://blog.mothur.org/2014/08/08/SILVA-v119-reference-files/
SortMeRNA软件包自带细菌16s rRNA,细菌23s rRNA,古菌16s rRNA, 古菌23s rRNA,真核生物18s rRNA, 真核生物28s rRNA,rfam数据库中的5s rRNA和5.8s rRNA数据。8大数据库可一起帮您鉴定宏转录组测序数据中的rRNA序列。
2. 检查数据完整性
(base) lizexing@bio:~/projects/xindi$ ll
总用量 6494620
drwxrwxr-x 2 lizexing lizexing 4096 8月 24 10:12 ./
drwxrwxr-x 6 lizexing lizexing 4096 8月 4 18:59 ../
-rw-rw-r-- 1 lizexing lizexing 5914142720 8月 24 10:11 Data.tar
-rw-rw-r-- 1 lizexing lizexing 43 8月 24 10:12 Data.tar.md5
-rw-rw-r-- 1 lizexing lizexing 736318953 8月 24 10:12 Summary.tar.gz
-rw-rw-r-- 1 lizexing lizexing 49 8月 24 10:12 Summary.tar.gz.md5
(base) lizexing@bio:~/projects/xindi$ cat Data.tar.md5 > check_md5sum.txt && md5sum -c check_md5sum.txt
Data.tar: 成功
(base) lizexing@bio:~/projects/xindi$ cat Summary.tar.gz.md5 > check_md5sum_Summary.txt && md5sum -c check_md5sum_Summary.txt
Summary.tar.gz: 成功
3. 为8大数据库建索引
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./indexdb_rna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-db:./rRNA_databases/silva-arc-16s-id95.fasta,./index/silva-arc-16s-db:./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-db:./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-db:./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s:./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-db:./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s-db -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Parameters summary:
K-mer size: 19
K-mer interval: 1
Maximum positions to store per unique K-mer: 10000
Total number of databases to index: 8
Begin indexing file ./rRNA_databases/silva-bac-16s-id90.fasta under index name ./index/silva-bac-16s-db:
Collecting sequence distribution statistics .. done [0.114872 sec]
start index part # 0:
(1/3) building burst tries .. done [10.128288 sec]
(2/3) building CMPH hash .. done [29.993126 sec]
(3/3) building position lookup tables .. done [37.032142 sec]
total number of sequences in this part = 12798
temporary file was here: /tmp/sortmerna_keys_690389.txt
writing kmer data to ./index/silva-bac-16s-db.kmer_0.dat
writing burst tries to ./index/silva-bac-16s-db.bursttrie_0.dat
writing position lookup table to ./index/silva-bac-16s-db.pos_0.dat
writing nucleotide distribution statistics to ./index/silva-bac-16s-db.stats
done.
Begin indexing file ./rRNA_databases/silva-bac-23s-id98.fasta under index name ./index/silva-bac-23s-db:
Collecting sequence distribution statistics .. done [0.241604 sec]
start index part # 0:
(1/3) building burst tries .. done [4.956894 sec]
(2/3) building CMPH hash .. done [4.041567 sec]
(3/3) building position lookup tables .. done [11.533700 sec]
total number of sequences in this part = 4488
temporary file was here: /tmp/sortmerna_keys_690389.txt
writing kmer data to ./index/silva-bac-23s-db.kmer_0.dat
writing burst tries to ./index/silva-bac-23s-db.bursttrie_0.dat
writing position lookup table to ./index/silva-bac-23s-db.pos_0.dat
writing nucleotide distribution statistics to ./index/silva-bac-23s-db.stats
done.
Begin indexing file ./rRNA_databases/silva-arc-16s-id95.fasta under index name ./index/silva-arc-16s-db:
Collecting sequence distribution statistics .. done [0.168354 sec]
start index part # 0:
(1/3) building burst tries .. done [1.096340 sec]
(2/3) building CMPH hash .. done [1.835844 sec]
(3/3) building position lookup tables .. done [2.659358 sec]
total number of sequences in this part = 3193
temporary file was here: /tmp/sortmerna_keys_690389.txt
writing kmer data to ./index/silva-arc-16s-db.kmer_0.dat
writing burst tries to ./index/silva-arc-16s-db.bursttrie_0.dat
writing position lookup table to ./index/silva-arc-16s-db.pos_0.dat
writing nucleotide distribution statistics to ./index/silva-arc-16s-db.stats
done.
Begin indexing file ./rRNA_databases/silva-arc-23s-id98.fasta under index name ./index/silva-arc-23s-db:
Collecting sequence distribution statistics .. done [0.004250 sec]
start index part # 0:
(1/3) building burst tries .. done [0.205082 sec]
(2/3) building CMPH hash .. done [0.988803 sec]
(3/3) building position lookup tables .. done [0.372066 sec]
total number of sequences in this part = 251
temporary file was here: /tmp/sortmerna_keys_690389.txt
writing kmer data to ./index/silva-arc-23s-db.kmer_0.dat
writing burst tries to ./index/silva-arc-23s-db.bursttrie_0.dat
writing position lookup table to ./index/silva-arc-23s-db.pos_0.dat
writing nucleotide distribution statistics to ./index/silva-arc-23s-db.stats
done.
Begin indexing file ./rRNA_databases/silva-euk-18s-id95.fasta under index name ./index/silva-euk-18s-db:
Collecting sequence distribution statistics .. done [0.147665 sec]
start index part # 0:
(1/3) building burst tries .. done [6.123623 sec]
(2/3) building CMPH hash .. done [5.220644 sec]
(3/3) building position lookup tables .. done [21.785766 sec]
total number of sequences in this part = 7348
temporary file was here: /tmp/sortmerna_keys_690389.txt
writing kmer data to ./index/silva-euk-18s-db.kmer_0.dat
writing burst tries to ./index/silva-euk-18s-db.bursttrie_0.dat
writing position lookup table to ./index/silva-euk-18s-db.pos_0.dat
writing nucleotide distribution statistics to ./index/silva-euk-18s-db.stats
done.
Begin indexing file ./rRNA_databases/silva-euk-28s-id98.fasta under index name ./index/silva-euk-28s:
Collecting sequence distribution statistics .. done [0.112021 sec]
start index part # 0:
(1/3) building burst tries .. done [6.403803 sec]
(2/3) building CMPH hash .. done [5.814894 sec]
(3/3) building position lookup tables .. done [16.681088 sec]
total number of sequences in this part = 4935
temporary file was here: /tmp/sortmerna_keys_690389.txt
writing kmer data to ./index/silva-euk-28s.kmer_0.dat
writing burst tries to ./index/silva-euk-28s.bursttrie_0.dat
writing position lookup table to ./index/silva-euk-28s.pos_0.dat
writing nucleotide distribution statistics to ./index/silva-euk-28s.stats
done.
Begin indexing file ./rRNA_databases/rfam-5s-database-id98.fasta under index name ./index/rfam-5s-db:
Collecting sequence distribution statistics .. done [0.076209 sec]
start index part # 0:
(1/3) building burst tries .. done [2.160063 sec]
(2/3) building CMPH hash .. done [4.935749 sec]
(3/3) building position lookup tables .. done [9.965298 sec]
total number of sequences in this part = 59513
temporary file was here: /tmp/sortmerna_keys_690389.txt
writing kmer data to ./index/rfam-5s-db.kmer_0.dat
writing burst tries to ./index/rfam-5s-db.bursttrie_0.dat
writing position lookup table to ./index/rfam-5s-db.pos_0.dat
writing nucleotide distribution statistics to ./index/rfam-5s-db.stats
done.
Begin indexing file ./rRNA_databases/rfam-5.8s-database-id98.fasta under index name ./index/rfam-5.8s-db:
Collecting sequence distribution statistics .. done [0.014170 sec]
start index part # 0:
(1/3) building burst tries .. done [0.403497 sec]
(2/3) building CMPH hash .. done [0.326261 sec]
(3/3) building position lookup tables .. done [1.708450 sec]
total number of sequences in this part = 13034
temporary file was here: /tmp/sortmerna_keys_690389.txt
writing kmer data to ./index/rfam-5.8s-db.kmer_0.dat
writing burst tries to ./index/rfam-5.8s-db.bursttrie_0.dat
writing position lookup table to ./index/rfam-5.8s-db.pos_0.dat
writing nucleotide distribution statistics to ./index/rfam-5.8s-db.stats
done.
运行后在软件的./index/下面生成如下索引:
4. 利用软件自带merge-paired-reads.sh脚本将293T/HTC116/HeLa三组双端测序合并
(base) lizexing@bio:~/software/sortmerna-2.1b/scripts$ ./merge-paired-reads.sh /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1.fq /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2.fq /Data/lizexing/projects/xindi/Data/CleanData/293T.fq
Processing /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1.fq ..
Processing /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2.fq ..
Interleaving /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1.fq and /Data/lizexing/projects/xindi/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2.fq ..
Done.
(base) lizexing@bio:~/software/sortmerna-2.1b/scripts$ ./merge-paired-reads.sh /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1.fq /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2.fq /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq
Processing /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1.fq ..
Processing /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2.fq ..
Interleaving /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1.fq and /Data/lizexing/projects/xindi/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2.fq ..
Done.
(base) lizexing@bio:~/software/sortmerna-2.1b/scripts$ ./merge-paired-reads.sh /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq
Processing /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq ..
Processing /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq ..
Interleaving /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq and /Data/lizexing/projects/xindi/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq ..
Done.
5. 使用sortmerna鉴定293T.fq测序结果中的真核18SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-18s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.18s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.18s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [45.93 sec]
size of reads file: 13479394926 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [10.57 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612551
Gumbel K = 0.339810
Minimal SW score based on E-value = 61
Loading index part 1/1 ... done [0.95 sec]
Begin index search ... done [964.83 sec]
Freeing index ... done [0.24 sec]
Total number of reads mapped (incl. all reads file sections searched): 5170898
Writing aligned FASTA/FASTQ ... done [33.36 sec]
Writing not-aligned FASTA/FASTQ ... done [82.08 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.63 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
Loading index part 1/1 ... done [1.17 sec]
Begin index search ... done [954.55 sec]
Freeing index ... done [0.22 sec]
Total number of reads mapped (incl. all reads file sections searched): 10323233
Writing aligned FASTA/FASTQ ... done [32.73 sec]
Writing not-aligned FASTA/FASTQ ... done [82.38 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [1.26 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
Loading index part 1/1 ... done [1.10 sec]
Begin index search ... done [153.42 sec]
Freeing index ... done [0.23 sec]
Total number of reads mapped (incl. all reads file sections searched): 11057673
Writing aligned FASTA/FASTQ ... done [4.68 sec]
Writing not-aligned FASTA/FASTQ ... done [11.73 sec]
SortMeRNA分析完后后会产生四个文件:
1)工作日志.log文件;
2)数据库匹配详情.sam文件;
3)匹配到数据库的.16s.fastq文件;
4)未匹配到数据库的.non.16s.fastq文件
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 39137348
Total reads passing E-value threshold = 11057673 (28.25%)
Total reads failing E-value threshold = 28079675 (71.75%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta 28.25%
根据该信息总结后可以得到结论:293T.fq转录组测序数据中有11057673条序列来源于真核18s rRNA,这些序列占总序列的28.25%;剩下71.25%的序列为非真菌18s rRNA。
6. 使用sortmerna鉴定293T.fq测序结果中的真核28SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-28s: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.28s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.28s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [42.96 sec]
size of reads file: 13479394926 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.62 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612082
Gumbel K = 0.345772
Minimal SW score based on E-value = 61
Loading index part 1/1 ... done [1.35 sec]
Begin index search ... done [1072.42 sec]
Freeing index ... done [0.23 sec]
Total number of reads mapped (incl. all reads file sections searched): 7773873
Writing aligned FASTA/FASTQ ... done [49.31 sec]
Writing not-aligned FASTA/FASTQ ... done [66.23 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.48 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
Loading index part 1/1 ... done [1.26 sec]
Begin index search ... done [1103.96 sec]
Freeing index ... done [0.24 sec]
Total number of reads mapped (incl. all reads file sections searched): 15604331
Writing aligned FASTA/FASTQ ... done [51.19 sec]
Writing not-aligned FASTA/FASTQ ... done [67.92 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [1.22 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
Loading index part 1/1 ... done [1.16 sec]
Begin index search ... done [143.66 sec]
Freeing index ... done [0.23 sec]
Total number of reads mapped (incl. all reads file sections searched): 16723991
Writing aligned FASTA/FASTQ ... done [7.01 sec]
Writing not-aligned FASTA/FASTQ ... done [9.25 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 39137348
Total reads passing E-value threshold = 16723991 (42.73%)
Total reads failing E-value threshold = 22413357 (57.27%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta 42.73%
根据该信息总结后可以得到结论:293T.fq转录组测序数据中有16723991条序列来源于真核28s rRNA,这些序列占总序列的42.73%;剩下57.27%的序列为非真菌28s rRNA。
7. 使用sortmerna鉴定HCT116.fq测序结果中的真核18SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-18s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.18s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.18s --log -a 32 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [59.57 sec]
size of reads file: 16549783582 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 32
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [9.14 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612551
Gumbel K = 0.339810
Minimal SW score based on E-value = 61
Loading index part 1/1 ... done [0.85 sec]
Begin index search ... done [586.28 sec]
Freeing index ... done [0.21 sec]
Total number of reads mapped (incl. all reads file sections searched): 5737092
Writing aligned FASTA/FASTQ ... done [37.03 sec]
Writing not-aligned FASTA/FASTQ ... done [78.46 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.53 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
Loading index part 1/1 ... done [4.16 sec]
Begin index search ... done [610.00 sec]
Freeing index ... done [0.20 sec]
Total number of reads mapped (incl. all reads file sections searched): 11457300
Writing aligned FASTA/FASTQ ... done [36.40 sec]
Writing not-aligned FASTA/FASTQ ... done [78.48 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [6.54 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
Loading index part 1/1 ... done [1.12 sec]
Begin index search ... done [364.15 sec]
Freeing index ... done [0.22 sec]
Total number of reads mapped (incl. all reads file sections searched): 15059846
Writing aligned FASTA/FASTQ ... done [24.18 sec]
Writing not-aligned FASTA/FASTQ ... done [49.55 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48208862
Total reads passing E-value threshold = 15059846 (31.24%)
Total reads failing E-value threshold = 33149016 (68.76%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta 31.24%
根据该信息总结后可以得到结论:HCT116.fq转录组测序数据中有15059846条序列来源于真核18s rRNA,这些序列占总序列的31.24%;剩下68.76%的序列为非真菌18s rRNA。
8. 使用sortmerna鉴定HCT116.fq测序结果中的真核28SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-28s: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.28s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.28s --log -a 32 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [59.92 sec]
size of reads file: 16549783582 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 32
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.59 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612082
Gumbel K = 0.345772
Minimal SW score based on E-value = 61
Loading index part 1/1 ... done [0.94 sec]
Begin index search ... done [702.14 sec]
Freeing index ... done [0.24 sec]
Total number of reads mapped (incl. all reads file sections searched): 7580001
Writing aligned FASTA/FASTQ ... done [50.30 sec]
Writing not-aligned FASTA/FASTQ ... done [69.12 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.77 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
Loading index part 1/1 ... done [1.23 sec]
Begin index search ... done [661.03 sec]
Freeing index ... done [0.21 sec]
Total number of reads mapped (incl. all reads file sections searched): 15188940
Writing aligned FASTA/FASTQ ... done [53.96 sec]
Writing not-aligned FASTA/FASTQ ... done [74.60 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [5.82 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
Loading index part 1/1 ... done [1.28 sec]
Begin index search ... done [446.02 sec]
Freeing index ... done [0.20 sec]
Total number of reads mapped (incl. all reads file sections searched): 20009110
Writing aligned FASTA/FASTQ ... done [30.87 sec]
Writing not-aligned FASTA/FASTQ ... done [43.13 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48208862
Total reads passing E-value threshold = 20009110 (41.51%)
Total reads failing E-value threshold = 28199752 (58.49%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta 41.51%
根据该信息总结后可以得到结论:HCT116.fq转录组测序数据中有20009110条序列来源于真核28s rRNA,这些序列占总序列的41.51%;剩下58.49%的序列为非真菌28s rRNA。
9. 使用sortmerna鉴定HeLa.fq测序结果中的真核18SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-18s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.18s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.18s --log -a 32 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [56.90 sec]
size of reads file: 16920775684 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 32
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [9.49 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612551
Gumbel K = 0.339810
Minimal SW score based on E-value = 61
Loading index part 1/1 ... done [0.86 sec]
Begin index search ... done [1060.73 sec]
Freeing index ... done [0.23 sec]
Total number of reads mapped (incl. all reads file sections searched): 5923594
Writing aligned FASTA/FASTQ ... done [37.81 sec]
Writing not-aligned FASTA/FASTQ ... done [77.04 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [12.64 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
Loading index part 1/1 ... done [1.12 sec]
Begin index search ... done [1005.13 sec]
Freeing index ... done [0.22 sec]
Total number of reads mapped (incl. all reads file sections searched): 11825938
Writing aligned FASTA/FASTQ ... done [39.92 sec]
Writing not-aligned FASTA/FASTQ ... done [77.70 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [8.70 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta
Loading index part 1/1 ... done [1.14 sec]
Begin index search ... done [705.21 sec]
Freeing index ... done [0.22 sec]
Total number of reads mapped (incl. all reads file sections searched): 15887708
Writing aligned FASTA/FASTQ ... done [25.90 sec]
Writing not-aligned FASTA/FASTQ ... done [53.30 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48978212
Total reads passing E-value threshold = 15887708 (32.44%)
Total reads failing E-value threshold = 33090504 (67.56%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 138
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-18s-id95.fasta 32.44%
根据该信息总结后可以得到结论:HeLa.fq转录组测序数据中有15887708条序列来源于真核18s rRNA,这些序列占总序列的32.44%;剩下67.56%的序列为非真菌18s rRNA。
10. 使用sortmerna鉴定HeLa.fq测序结果中的真核28SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/silva-euk-28s: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.28s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.28s --log -a 32 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [55.48 sec]
size of reads file: 16920775684 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 32
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.65 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.612082
Gumbel K = 0.345772
Minimal SW score based on E-value = 61
Loading index part 1/1 ... done [0.93 sec]
Begin index search ... done [1031.49 sec]
Freeing index ... done [0.27 sec]
Total number of reads mapped (incl. all reads file sections searched): 7873708
Writing aligned FASTA/FASTQ ... done [49.52 sec]
Writing not-aligned FASTA/FASTQ ... done [65.20 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.59 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
Loading index part 1/1 ... done [1.14 sec]
Begin index search ... done [864.75 sec]
Freeing index ... done [0.26 sec]
Total number of reads mapped (incl. all reads file sections searched): 15792824
Writing aligned FASTA/FASTQ ... done [49.99 sec]
Writing not-aligned FASTA/FASTQ ... done [64.71 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [5.92 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta
Loading index part 1/1 ... done [1.16 sec]
Begin index search ... done [594.52 sec]
Freeing index ... done [0.23 sec]
Total number of reads mapped (incl. all reads file sections searched): 21265376
Writing aligned FASTA/FASTQ ... done [34.44 sec]
Writing not-aligned FASTA/FASTQ ... done [44.39 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48978212
Total reads passing E-value threshold = 21265376 (43.42%)
Total reads failing E-value threshold = 27712836 (56.58%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 138
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/silva-euk-28s-id98.fasta 43.42%
根据该信息总结后可以得到结论:HeLa.fq转录组测序数据中有21265376条序列来源于真核28s rRNA,这些序列占总序列的43.42%;剩下56.58%的序列为非真菌28s rRNA。
11. 为45SrRNA序列建索引
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./indexdb_rna --ref ./rRNA_databases/U13369.1.fasta,./index/U13369.1-db -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Parameters summary:
K-mer size: 19
K-mer interval: 1
Maximum positions to store per unique K-mer: 10000
Total number of databases to index: 1
Begin indexing file ./rRNA_databases/U13369.1.fasta under index name ./index/U13369.1-db:
Collecting sequence distribution statistics .. done [0.000744 sec]
start index part # 0:
(1/3) building burst tries .. done [0.027380 sec]
(2/3) building CMPH hash .. done [0.014562 sec]
(3/3) building position lookup tables .. done [0.012918 sec]
total number of sequences in this part = 1
temporary file was here: /tmp/sortmerna_keys_969085.txt
writing kmer data to ./index/U13369.1-db.kmer_0.dat
writing burst tries to ./index/U13369.1-db.bursttrie_0.dat
writing position lookup table to ./index/U13369.1-db.pos_0.dat
writing nucleotide distribution statistics to ./index/U13369.1-db.stats
done.
12. 使用sortmerna鉴定293T.fq测序结果中的真核45SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta,/Data/lizexing/software/sortmerna-2.1b/index/U13369.1-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.45s --sam --SQ --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.45s --log -a 32 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [43.55 sec]
size of reads file: 13479394926 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are output
Number of threads = 32
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.59 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.580217
Gumbel K = 0.309932
Minimal SW score based on E-value = 54
Loading index part 1/1 ... done [0.06 sec]
Begin index search ... done [168.79 sec]
Freeing index ... done [0.01 sec]
Total number of reads mapped (incl. all reads file sections searched): 13141967
Writing aligned FASTA/FASTQ ... done [82.47 sec]
Writing not-aligned FASTA/FASTQ ... done [32.56 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.66 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
Loading index part 1/1 ... done [0.04 sec]
Begin index search ... done [171.32 sec]
Freeing index ... done [0.01 sec]
Total number of reads mapped (incl. all reads file sections searched): 26318920
Writing aligned FASTA/FASTQ ... done [82.95 sec]
Writing not-aligned FASTA/FASTQ ... done [32.56 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [1.67 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
Loading index part 1/1 ... done [0.05 sec]
Begin index search ... done [23.57 sec]
Freeing index ... done [0.00 sec]
Total number of reads mapped (incl. all reads file sections searched): 28199824
Writing aligned FASTA/FASTQ ... done [11.78 sec]
Writing not-aligned FASTA/FASTQ ... done [4.57 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 39137348
Total reads passing E-value threshold = 28199824 (72.05%)
Total reads failing E-value threshold = 10937524 (27.95%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta 72.05%
根据该信息总结后可以得到结论:293T.fq转录组测序数据中有28199824条序列来源于真核45s rRNA,这些序列占总序列的72.05%;剩下27.95%的序列为非真菌45s rRNA。
13. 使用sortmerna鉴定HCT116.fq测序结果中的真核45SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta,/Data/lizexing/software/sortmerna-2.1b/index/U13369.1-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.45s --sam --SQ --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.45s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [52.94 sec]
size of reads file: 16549783582 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.60 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.580217
Gumbel K = 0.309932
Minimal SW score based on E-value = 55
Loading index part 1/1 ... done [0.04 sec]
Begin index search ... done [149.36 sec]
Freeing index ... done [0.01 sec]
Total number of reads mapped (incl. all reads file sections searched): 13211959
Writing aligned FASTA/FASTQ ... done [82.95 sec]
Writing not-aligned FASTA/FASTQ ... done [32.57 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.60 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
Loading index part 1/1 ... done [0.04 sec]
Begin index search ... done [150.39 sec]
Freeing index ... done [0.00 sec]
Total number of reads mapped (incl. all reads file sections searched): 26433911
Writing aligned FASTA/FASTQ ... done [83.37 sec]
Writing not-aligned FASTA/FASTQ ... done [32.61 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [5.45 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
Loading index part 1/1 ... done [0.04 sec]
Begin index search ... done [85.70 sec]
Freeing index ... done [0.00 sec]
Total number of reads mapped (incl. all reads file sections searched): 34790270
Writing aligned FASTA/FASTQ ... done [52.70 sec]
Writing not-aligned FASTA/FASTQ ... done [20.33 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48208862
Total reads passing E-value threshold = 34790270 (72.17%)
Total reads failing E-value threshold = 13418592 (27.83%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta 72.17%
根据该信息总结后可以得到结论:HCT116.fq转录组测序数据中有34790270条序列来源于真核45s rRNA,这些序列占总序列的72.17%;剩下27.83%的序列为非真菌45s rRNA。
14. 使用sortmerna鉴定HeLa.fq测序结果中的真核45SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta,/Data/lizexing/software/sortmerna-2.1b/index/U13369.1-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.45s --sam --SQ --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.45s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [54.58 sec]
size of reads file: 16920775684 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.63 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.580217
Gumbel K = 0.309932
Minimal SW score based on E-value = 55
Loading index part 1/1 ... done [0.04 sec]
Begin index search ... done [169.61 sec]
Freeing index ... done [0.01 sec]
Total number of reads mapped (incl. all reads file sections searched): 14067778
Writing aligned FASTA/FASTQ ... done [107.14 sec]
Writing not-aligned FASTA/FASTQ ... done [26.67 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.72 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
Loading index part 1/1 ... done [0.06 sec]
Begin index search ... done [178.96 sec]
Freeing index ... done [0.01 sec]
Total number of reads mapped (incl. all reads file sections searched): 28155528
Writing aligned FASTA/FASTQ ... done [88.82 sec]
Writing not-aligned FASTA/FASTQ ... done [26.61 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [6.31 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta
Loading index part 1/1 ... done [0.04 sec]
Begin index search ... done [110.25 sec]
Freeing index ... done [0.01 sec]
Total number of reads mapped (incl. all reads file sections searched): 37871016
Writing aligned FASTA/FASTQ ... done [61.17 sec]
Writing not-aligned FASTA/FASTQ ... done [18.17 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48978212
Total reads passing E-value threshold = 37871016 (77.32%)
Total reads failing E-value threshold = 11107196 (22.68%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 138
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/U13369.1.fasta 77.32%
根据该信息总结后可以得到结论:HeLa.fq转录组测序数据中有37871016条序列来源于真核45s rRNA,这些序列占总序列的77.32%;剩下22.68%的序列为非真菌45s rRNA。
15. 使用sortmerna鉴定293T.fq测序结果中的真核5SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.5s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.5s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [48.46 sec]
size of reads file: 13479394926 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.67 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.616694
Gumbel K = 0.342032
Minimal SW score based on E-value = 59
Loading index part 1/1 ... done [0.82 sec]
Begin index search ... done [62.16 sec]
Freeing index ... done [0.10 sec]
Total number of reads mapped (incl. all reads file sections searched): 45578
Writing aligned FASTA/FASTQ ... done [0.49 sec]
Writing not-aligned FASTA/FASTQ ... done [113.84 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.76 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
Loading index part 1/1 ... done [0.62 sec]
Begin index search ... done [63.52 sec]
Freeing index ... done [0.08 sec]
Total number of reads mapped (incl. all reads file sections searched): 91268
Writing aligned FASTA/FASTQ ... done [0.50 sec]
Writing not-aligned FASTA/FASTQ ... done [114.07 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [1.41 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
Loading index part 1/1 ... done [0.61 sec]
Begin index search ... done [9.02 sec]
Freeing index ... done [0.08 sec]
Total number of reads mapped (incl. all reads file sections searched): 97847
Writing aligned FASTA/FASTQ ... done [0.05 sec]
Writing not-aligned FASTA/FASTQ ... done [16.16 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 39137348
Total reads passing E-value threshold = 97847 (0.25%)
Total reads failing E-value threshold = 39039501 (99.75%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta 0.25%
根据该信息总结后可以得到结论:293T.fq转录组测序数据中有97847条序列来源于真核5s rRNA,这些序列占总序列的0.25%;剩下99.75%的序列为非真菌5s rRNA。
15. 使用sortmerna鉴定HCT116.fq测序结果中的真核5SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.5s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.5s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [53.93 sec]
size of reads file: 16549783582 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.64 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.616694
Gumbel K = 0.342032
Minimal SW score based on E-value = 59
Loading index part 1/1 ... done [0.51 sec]
Begin index search ... done [58.81 sec]
Freeing index ... done [0.11 sec]
Total number of reads mapped (incl. all reads file sections searched): 30875
Writing aligned FASTA/FASTQ ... done [0.30 sec]
Writing not-aligned FASTA/FASTQ ... done [114.13 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.79 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
Loading index part 1/1 ... done [0.63 sec]
Begin index search ... done [59.62 sec]
Freeing index ... done [0.09 sec]
Total number of reads mapped (incl. all reads file sections searched): 61531
Writing aligned FASTA/FASTQ ... done [0.32 sec]
Writing not-aligned FASTA/FASTQ ... done [114.20 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [5.51 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
Loading index part 1/1 ... done [0.62 sec]
Begin index search ... done [36.67 sec]
Freeing index ... done [0.09 sec]
Total number of reads mapped (incl. all reads file sections searched): 80619
Writing aligned FASTA/FASTQ ... done [0.16 sec]
Writing not-aligned FASTA/FASTQ ... done [71.82 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48208862
Total reads passing E-value threshold = 80619 (0.17%)
Total reads failing E-value threshold = 48128243 (99.83%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta 0.17%
根据该信息总结后可以得到结论:HCT116.fq转录组测序数据中有80619条序列来源于真核5s rRNA,这些序列占总序列的0.17%;剩下99.83%的序列为非真菌5s rRNA。
16. 使用sortmerna鉴定HeLa.fq测序结果中的真核5SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.5s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.5s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [54.73 sec]
size of reads file: 16920775684 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.58 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.616694
Gumbel K = 0.342032
Minimal SW score based on E-value = 59
Loading index part 1/1 ... done [0.46 sec]
Begin index search ... done [53.51 sec]
Freeing index ... done [0.09 sec]
Total number of reads mapped (incl. all reads file sections searched): 31476
Writing aligned FASTA/FASTQ ... done [0.26 sec]
Writing not-aligned FASTA/FASTQ ... done [114.45 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.94 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
Loading index part 1/1 ... done [0.61 sec]
Begin index search ... done [54.88 sec]
Freeing index ... done [0.09 sec]
Total number of reads mapped (incl. all reads file sections searched): 63234
Writing aligned FASTA/FASTQ ... done [0.34 sec]
Writing not-aligned FASTA/FASTQ ... done [114.15 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [5.90 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta
Loading index part 1/1 ... done [0.61 sec]
Begin index search ... done [39.22 sec]
Freeing index ... done [0.09 sec]
Total number of reads mapped (incl. all reads file sections searched): 85138
Writing aligned FASTA/FASTQ ... done [0.21 sec]
Writing not-aligned FASTA/FASTQ ... done [78.33 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48978212
Total reads passing E-value threshold = 85138 (0.17%)
Total reads failing E-value threshold = 48893074 (99.83%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 138
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5s-database-id98.fasta 0.17%
根据该信息总结后可以得到结论:HeLa.fq转录组测序数据中有85138条序列来源于真核5s rRNA,这些序列占总序列的0.17%;剩下99.83%的序列为非真菌5s rRNA。
17. 使用sortmerna鉴定293T.fq测序结果中的真核5.8SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5.8s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/293T.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.5.8s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/293T.fq.non.5.8s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [43.59 sec]
size of reads file: 13479394926 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.60 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.617555
Gumbel K = 0.343861
Minimal SW score based on E-value = 57
Loading index part 1/1 ... done [0.50 sec]
Begin index search ... done [36.39 sec]
Freeing index ... done [0.03 sec]
Total number of reads mapped (incl. all reads file sections searched): 316326
Writing aligned FASTA/FASTQ ... done [2.19 sec]
Writing not-aligned FASTA/FASTQ ... done [112.72 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.83 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
Loading index part 1/1 ... done [0.21 sec]
Begin index search ... done [35.90 sec]
Freeing index ... done [0.04 sec]
Total number of reads mapped (incl. all reads file sections searched): 632442
Writing aligned FASTA/FASTQ ... done [2.15 sec]
Writing not-aligned FASTA/FASTQ ... done [112.38 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [1.49 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
Loading index part 1/1 ... done [0.21 sec]
Begin index search ... done [5.57 sec]
Freeing index ... done [0.03 sec]
Total number of reads mapped (incl. all reads file sections searched): 677733
Writing aligned FASTA/FASTQ ... done [0.31 sec]
Writing not-aligned FASTA/FASTQ ... done [16.08 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 39137348
Total reads passing E-value threshold = 677733 (1.73%)
Total reads failing E-value threshold = 38459615 (98.27%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta 1.73%
根据该信息总结后可以得到结论:293T.fq转录组测序数据中有677733条序列来源于真核5.8s rRNA,这些序列占总序列的1.73%;剩下98.27%的序列为非真菌5.8s rRNA。
18. 使用sortmerna鉴定HCT116.fq测序结果中的真核5.8SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5.8s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.5.8s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HCT116.fq.non.5.8s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [53.45 sec]
size of reads file: 16549783582 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.58 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.617555
Gumbel K = 0.343861
Minimal SW score based on E-value = 57
Loading index part 1/1 ... done [0.15 sec]
Begin index search ... done [34.64 sec]
Freeing index ... done [0.03 sec]
Total number of reads mapped (incl. all reads file sections searched): 356496
Writing aligned FASTA/FASTQ ... done [2.51 sec]
Writing not-aligned FASTA/FASTQ ... done [112.73 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.61 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
Loading index part 1/1 ... done [0.21 sec]
Begin index search ... done [34.83 sec]
Freeing index ... done [0.03 sec]
Total number of reads mapped (incl. all reads file sections searched): 713138
Writing aligned FASTA/FASTQ ... done [2.50 sec]
Writing not-aligned FASTA/FASTQ ... done [112.49 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [5.43 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
Loading index part 1/1 ... done [0.20 sec]
Begin index search ... done [22.51 sec]
Freeing index ... done [0.03 sec]
Total number of reads mapped (incl. all reads file sections searched): 937341
Writing aligned FASTA/FASTQ ... done [1.52 sec]
Writing not-aligned FASTA/FASTQ ... done [70.75 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48208862
Total reads passing E-value threshold = 937341 (1.94%)
Total reads failing E-value threshold = 47271521 (98.06%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 137
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta 1.94%
根据该信息总结后可以得到结论:HCT116.fq转录组测序数据中有937341条序列来源于真核5.8s rRNA,这些序列占总序列的1.94%;剩下98.06%的序列为非真菌5.8s rRNA。
19. 使用sortmerna鉴定HeLa.fq测序结果中的真核5.8SrRNA
(base) lizexing@bio:~/software/sortmerna-2.1b$ ./sortmerna --ref /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta,/Data/lizexing/software/sortmerna-2.1b/index/rfam-5.8s-db: --reads /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq --aligned /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.5.8s --sam --num_alignments 1 --fastx --other /Data/lizexing/projects/xindi/Data/CleanData/HeLa.fq.non.5.8s --log -a 16 -m 6000 -v
Program: SortMeRNA version 2.1b, 03/03/2016
Copyright: 2012-16 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
2014-16 Knight Lab:
Department of Pediatrics, UCSD, La Jolla,
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contact: Evguenia Kopylova, jenya.kopylov@gmail.com
Laurent Noé, laurent.noe@lifl.fr
Hélène Touzet, helene.touzet@lifl.fr
Computing read file statistics ... done [54.99 sec]
size of reads file: 16920775684 bytes
partial section(s) to be executed: 3 of size 6291456000 bytes
Parameters summary:
Number of seeds = 2
Edges = 4 (as integer)
SW match = 2
SW mismatch = -3
SW gap open penalty = 5
SW gap extend penalty = 2
SW ambiguous nucleotide = -3
SQ tags are not output
Number of threads = 16
Begin mmap reads section # 1:
Time to mmap reads and set up pointers [8.51 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
Seed length = 18
Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
Gumbel lambda = 0.617555
Gumbel K = 0.343861
Minimal SW score based on E-value = 57
Loading index part 1/1 ... done [0.15 sec]
Begin index search ... done [32.53 sec]
Freeing index ... done [0.04 sec]
Total number of reads mapped (incl. all reads file sections searched): 368962
Writing aligned FASTA/FASTQ ... done [2.52 sec]
Writing not-aligned FASTA/FASTQ ... done [111.97 sec]
Begin mmap reads section # 2:
Time to mmap reads and set up pointers [8.67 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
Loading index part 1/1 ... done [0.21 sec]
Begin index search ... done [32.81 sec]
Freeing index ... done [0.03 sec]
Total number of reads mapped (incl. all reads file sections searched): 735202
Writing aligned FASTA/FASTQ ... done [2.49 sec]
Writing not-aligned FASTA/FASTQ ... done [112.03 sec]
Begin mmap reads section # 3:
Time to mmap reads and set up pointers [6.10 sec]
Begin analysis of: /Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta
Loading index part 1/1 ... done [0.21 sec]
Begin index search ... done [22.22 sec]
Freeing index ... done [0.03 sec]
Total number of reads mapped (incl. all reads file sections searched): 988048
Writing aligned FASTA/FASTQ ... done [1.70 sec]
Writing not-aligned FASTA/FASTQ ... done [77.04 sec]
打开log文件可以查看到如下的统计信息:
Results:
Total reads = 48978212
Total reads passing E-value threshold = 988048 (2.02%)
Total reads failing E-value threshold = 47990164 (97.98%)
Minimum read length = 92
Maximum read length = 141
Mean read length = 138
By database:
/Data/lizexing/software/sortmerna-2.1b/rRNA_databases/rfam-5.8s-database-id98.fasta 2.02%
根据该信息总结后可以得到结论:HeLa.fq转录组测序数据中有988048条序列来源于真核5.8s rRNA,这些序列占总序列的2.02%;剩下97.98%的序列为非真菌5.8s rRNA。
20. 使用如下脚本script_1对三组数据转换为bw格式
#!/bin/bash
# 上面一行宣告这个script的语法使用bash语法,当程序被执行时,能够载入bash的相关环境配置文件。
# Program
# This program is used for Xindi data analysis.
# History
# 2021/08/26 zexing First release
# 设置变量${dir}为常用目录
dir=/Data/lizexing/projects/xindi/Data/CleanData
# 利用for循环进行后续操作
for i in 293T.fq.45s HCT116.fq.45s HeLa.fq.45s
do
# 对数据进行格式转换
samtools view -@ 16 -S ${dir}/${i}.sam -1b -o ${dir}/${i}.bam
# 对数据进行排序
samtools sort -@ 16 -l 5 -o ${dir}/${i}.bam.sort ${dir}/${i}.bam
# 对数据生成目录
samtools index -@ 16 ${dir}/${i}.bam.sort
# bamCoverage命令转换文件格式
bamCoverage -p 16 -v -b ${dir}/${i}.bam.sort -o ${dir}/${i}.bam.sort.bw
done
在后台运script_1:
nohup bash script_1 > script_1_log &
21. 使用Trim Galore软件对三组数据进行质控,去掉20bp以下的reads
参考文章:Trim Galore ——自动检测adapter的质控软件
参数说明:
--quality:设定Phred quality score阈值,默认为20。
--phred33::选择-phred33或者-phred64,表示测序平台使用的Phred quality score。
--adapter:输入adapter序列。也可以不输入,Trim Galore!会自动寻找可能性最高的平台对应的adapter。自动搜选的平台三个,也直接显式输入这三种平台,即--illumina、--nextera和--small_rna。
--stringency:设定可以忍受的前后adapter重叠的碱基数,默认为1(非常苛刻)。可以适度放宽,因为后一个adapter几乎不可能被测序仪读到。
--length:设定输出reads长度阈值,小于设定值会被抛弃。
--paired:对于双端测序结果,一对reads中,如果有一个被剔除,那么另一个会被同样抛弃,而不管是否达到标准。
--retain_unpaired:对于双端测序结果,一对reads中,如果一个read达到标准,但是对应的另一个要被抛弃,达到标准的read会被单独保存为一个文件。
--gzip和--dont_gzip:清洗后的数据zip打包或者不打包。
--output_dir:输入目录。需要提前建立目录,否则运行会报错。
-- trim-n : 移除read一端的reads
1.对HeLa细胞数据进行处理
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data2.fq.gz
2.对HCT116细胞数据进行处理
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 -j 16 --paired /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2.fq.gz
3.对293T细胞数据进行处理
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ trim_galore -q 20 --phred33 --stringency 3 --length 20 -e 0.1 --paired /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2.fq.gz
22. 使用gffread-0.12.1软件将45S rRNA的GFF3注释文件转换为GTF格式
参考文章:gffcompare和gffread
Usage: gffread <input_gff> [-g <genomic_seqs_fasta> | <dir>][-s <seq_info.fsize>]
[-o <outfile>] [-t <trackname>] [-r [[<strand>]<chr>:]<start>..<end> [-R]]
[-CTVNJMKQAFPGUBHZWTOLE] [-w <exons.fa>] [-x <cds.fa>] [-y <tr_cds.fa>]
[-i <maxintron>] [--stream] [--bed] [--table <attrlist>] [--sort-by <ref.lst>]
(base) lizexing@bio:~/reference/h_45S_rDNA$ gffread U13369.1.gff3 -T -o U13369.1.gtf
(base) lizexing@bio:~/reference/h_5S_rDNA$ gffread NR_023363.1.gff3 -T -o NR_023363.1.gtf
23. 使用STAR软件对三组数据与45S rRNA进行比对
参考文章:比对软件STAR的使用
Step 1 - Build a 45S rRNA index构建索引
--runThreadN是指你要用几个cpu来运行;
--genomeDir构建索引输出文件的目录;
--genomeFastaFiles你的基因组fasta文件所在的目录
(base) lizexing@bio:~$ STAR --runMode genomeGenerate --runThreadN 16 --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --genomeFastaFiles /Data/lizexing/reference/h_45S_rDNA/U13369.1.fasta
Sep 05 14:14:23 ..... started STAR run
Sep 05 14:14:23 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=42999, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 6
Sep 05 14:14:23 ... starting to sort Suffix Array. This may take a long time...
Sep 05 14:14:23 ... sorting Suffix Array chunks and saving them to disk...
Sep 05 14:14:23 ... loading chunks from disk, packing SA...
Sep 05 14:14:23 ... finished generating suffix array
Sep 05 14:14:23 ... generating Suffix Array index
Sep 05 14:14:26 ... completed Suffix Array index
Sep 05 14:14:26 ... writing Genome to disk ...
Sep 05 14:14:26 ... writing Suffix Array to disk ...
Sep 05 14:14:26 ... writing SAindex to disk
Sep 05 14:14:28 ..... finished successfully
Step 2 - STAR比对用法和结果说明
Usage: STAR [options]... --genomeDir /path/to/genome/index/ --readFilesIn R1.fq R2.fq
--runThreadN 40 \ #线程数
--runMode alignReads \ #比对模式
--readFilesCommand zcat \ #说明你的fastq文件是压缩形式的,就是.gz结尾的,不加的话会报错
--quantMode TranscriptomeSAM GeneCounts \ #将reads比对至转录本序列
--sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf #加入对应的注释文件
--twopassMode Basic \ #先按索引进行第一次比对,而后把第一次比对发现的新剪切位点信息加入到索引中进行第二次比对。这个参数可以保证更精准的比对情况,但是费时也费内存。
--outSAMtype BAM Unsorted \ #输出BAM文件,不进行排序。如果不加这一行,只输出SAM文件。
--outSAMunmapped None \
--genomeDir /gpfs/home/fangy04/downloads/STAR_index/GRCh38/ \ #索引文件目录
--readFilesIn /gpfs/home/fangy04/downloads/SRR8112732_1.fastq.gz /gpfs/home/fangy04/downloads/SRR8112732_2.fastq.gz \ #两个fastq文件目录
--outFileNamePrefix DRB_TT_seq_SRR8112732 #输出文件前缀
--outReadsUnmapped # output of unmapped and partially mapped (i.e. mapped only one mate of a paired end read) reads in separate file(s). Fastx ... output in separate fasta/fastq files, Unmapped.out.mate1/2
--outSAMunmapped # output of unmapped reads in the SAM format
9216920116 Jun 28 17:06 DRB_TT_seq_SRR8112732Aligned.out.bam #这个文件是最重要的,用来后续进行remove duplicates和sort
1166235552 Jun 28 17:06 DRB_TT_seq_SRR8112732Aligned.toTranscriptome.out.bam #这个文件是那些比对到转录本上的reads组成的bam文件
2034 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.final.out
20188 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.out
2571 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.progress.out
1585521 Jun 28 17:06 DRB_TT_seq_SRR8112732ReadsPerGene.out.tab
6732305 Jun 28 17:06 DRB_TT_seq_SRR8112732SJ.out.tab #剪切的信息
8192 Jun 28 16:51 DRB_TT_seq_SRR8112732_STARgenome
8192 Jun 28 16:51 DRB_TT_seq_SRR8112732_STARpass1
Step 3 - 对293T测序数据进行比对
(base) lizexing@bio:~$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix 293T-val --outReadsUnmapped Fastx
Sep 11 15:57:29 ..... started STAR run
Sep 11 15:57:29 ..... loading genome
Sep 11 15:57:31 ..... processing annotations GTF
Sep 11 15:57:31 ..... started 1st pass mapping
Sep 11 16:01:20 ..... finished 1st pass mapping
Sep 11 16:01:20 ..... inserting junctions into the genome indices
Sep 11 16:02:05 ..... started mapping
Sep 11 16:17:08 ..... finished mapping
Sep 11 16:17:08 ..... finished successfully
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T-valLog.final.out
Started job on | Sep 05 15:10:34
Started mapping on | Sep 05 15:13:46
Finished on | Sep 05 15:22:49
Mapping speed, Million of reads per hour | 129.71
Number of input reads | 19564161 # fastq文件的信息
Average input read length | 275 # read长度
UNIQUE READS: # 唯一比对上的reads数量
Uniquely mapped reads number | 11634481
Uniquely mapped reads % | 59.47%
Average mapped length | 273.27
Number of splices: Total | 371950 # 剪切数
Number of splices: Annotated (sjdb) | 303947
Number of splices: GT/AG | 248969
Number of splices: GC/AG | 10039
Number of splices: AT/AC | 62
Number of splices: Non-canonical | 112880 # 非典型剪切数
Mismatch rate per base, % | 0.33%
Deletion rate per base | 0.06%
Deletion average length | 1.16
Insertion rate per base | 0.20%
Insertion average length | 3.50
MULTI-MAPPING READS: # 多重比对数
Number of reads mapped to multiple loci | 1184998
% of reads mapped to multiple loci | 6.06%
Number of reads mapped to too many loci | 1350
% of reads mapped to too many loci | 0.01%
UNMAPPED READS: # 未比对上的reads
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 847845
% of reads unmapped: too short | 4.33%
Number of reads unmapped: other | 5895487
% of reads unmapped: other | 30.13%
CHIMERIC READS: # 嵌合的reads数
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Step 4 - 对HCT116测序数据进行比对
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HCT116-val --outReadsUnmapped Fastx
Sep 05 15:29:57 ..... started STAR run
Sep 05 15:29:57 ..... loading genome
Sep 05 15:29:58 ..... processing annotations GTF
Sep 05 15:29:58 ..... started 1st pass mapping
Sep 05 15:32:59 ..... finished 1st pass mapping
Sep 05 15:32:59 ..... inserting junctions into the genome indices
Sep 05 15:33:44 ..... started mapping
Sep 05 15:44:53 ..... finished mapping
Sep 05 15:44:53 ..... finished successfully
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116-valLog.final.out
Started job on | Sep 05 15:29:57
Started mapping on | Sep 05 15:33:44
Finished on | Sep 05 15:44:53
Mapping speed, Million of reads per hour | 129.69
Number of input reads | 24101481 # fastq文件的信息
Average input read length | 274 # read长度
UNIQUE READS:
Uniquely mapped reads number | 14599303 # 唯一比对上的reads数量
Uniquely mapped reads % | 60.57%
Average mapped length | 272.26
Number of splices: Total | 554747
Number of splices: Annotated (sjdb) | 458962 # 剪切数
Number of splices: GT/AG | 478482
Number of splices: GC/AG | 16214
Number of splices: AT/AC | 123
Number of splices: Non-canonical | 59928 # 非典型剪切数
Mismatch rate per base, % | 0.34%
Deletion rate per base | 0.06%
Deletion average length | 1.12
Insertion rate per base | 0.20%
Insertion average length | 3.30
MULTI-MAPPING READS: # 多重比对数
Number of reads mapped to multiple loci | 1191559
% of reads mapped to multiple loci | 4.94%
Number of reads mapped to too many loci | 2491
% of reads mapped to too many loci | 0.01%
UNMAPPED READS: # 未比对上的reads
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 1180944
% of reads unmapped: too short | 4.90%
Number of reads unmapped: other | 7127184
% of reads unmapped: other | 29.57%
CHIMERIC READS: # 嵌合的reads数
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Step 5 - 对HeLa测序数据进行比对
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_45S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HeLa-val --outReadsUnmapped Fastx
Sep 05 15:53:18 ..... started STAR run
Sep 05 15:53:18 ..... loading genome
Sep 05 15:53:19 ..... processing annotations GTF
Sep 05 15:53:19 ..... started 1st pass mapping
Sep 05 15:56:12 ..... finished 1st pass mapping
Sep 05 15:56:12 ..... inserting junctions into the genome indices
Sep 05 15:56:57 ..... started mapping
Sep 05 16:08:07 ..... finished mapping
Sep 05 16:08:07 ..... finished successfully
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa-valLog.final.out
Started job on | Sep 05 15:53:18
Started mapping on | Sep 05 15:56:57
Finished on | Sep 05 16:08:07
Mapping speed, Million of reads per hour | 131.55
Number of input reads | 24483719 # fastq文件的信息
Average input read length | 276 # read长度
UNIQUE READS:
Uniquely mapped reads number | 16030941 # 唯一比对上的reads数量
Uniquely mapped reads % | 65.48%
Average mapped length | 274.45
Number of splices: Total | 550274
Number of splices: Annotated (sjdb) | 422196 # 剪切数
Number of splices: GT/AG | 430913
Number of splices: GC/AG | 7196
Number of splices: AT/AC | 41
Number of splices: Non-canonical | 112124 # 非典型剪切数
Mismatch rate per base, % | 0.31%
Deletion rate per base | 0.05%
Deletion average length | 1.11
Insertion rate per base | 0.18%
Insertion average length | 3.11
MULTI-MAPPING READS: # 多重比对数
Number of reads mapped to multiple loci | 1519317
% of reads mapped to multiple loci | 6.21%
Number of reads mapped to too many loci | 1252
% of reads mapped to too many loci | 0.01%
UNMAPPED READS: # 未比对上的reads
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 1008044
% of reads unmapped: too short | 4.12%
Number of reads unmapped: other | 5924165
% of reads unmapped: other | 24.20%
CHIMERIC READS: # 嵌合的reads数
Number of chimeric reads | 0
% of chimeric reads | 0.00%
23. 使用STAR软件对三组数据与5S rRNA进行比对
参考文章:比对软件STAR的使用
Step 1 - Build a 5S rRNA index构建索引
--runThreadN是指你要用几个cpu来运行;
--genomeDir构建索引输出文件的目录;
--genomeFastaFiles你的基因组fasta文件所在的目录
(base) lizexing@bio:~$ STAR --runMode genomeGenerate --runThreadN 32 --genomeDir /Data/lizexing/reference/h_5S_rDNA/ --genomeFastaFiles /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.fasta
Sep 13 12:16:03 ..... started STAR run
Sep 13 12:16:03 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=121, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 2
Sep 13 12:16:03 ... starting to sort Suffix Array. This may take a long time...
Sep 13 12:16:03 ... sorting Suffix Array chunks and saving them to disk...
Sep 13 12:16:03 ... loading chunks from disk, packing SA...
Sep 13 12:16:03 ... finished generating suffix array
Sep 13 12:16:03 ... generating Suffix Array index
Sep 13 12:16:06 ... completed Suffix Array index
Sep 13 12:16:06 ... writing Genome to disk ...
Sep 13 12:16:06 ... writing Suffix Array to disk ...
Sep 13 12:16:06 ... writing SAindex to disk
Sep 13 12:16:07 ..... finished successfully
Step 2 - STAR比对用法和结果说明
Usage: STAR [options]... --genomeDir /path/to/genome/index/ --readFilesIn R1.fq R2.fq
--runThreadN 40 \ #线程数
--runMode alignReads \ #比对模式
--readFilesCommand zcat \ #说明你的fastq文件是压缩形式的,就是.gz结尾的,不加的话会报错
--quantMode TranscriptomeSAM GeneCounts \ #将reads比对至转录本序列
--sjdbGTFfile /Data/lizexing/reference/h_45S_rDNA/U13369.1.gtf #加入对应的注释文件
--twopassMode Basic \ #先按索引进行第一次比对,而后把第一次比对发现的新剪切位点信息加入到索引中进行第二次比对。这个参数可以保证更精准的比对情况,但是费时也费内存。
--outSAMtype BAM Unsorted \ #输出BAM文件,不进行排序。如果不加这一行,只输出SAM文件。
--outSAMunmapped None \
--genomeDir /gpfs/home/fangy04/downloads/STAR_index/GRCh38/ \ #索引文件目录
--readFilesIn /gpfs/home/fangy04/downloads/SRR8112732_1.fastq.gz /gpfs/home/fangy04/downloads/SRR8112732_2.fastq.gz \ #两个fastq文件目录
--outFileNamePrefix DRB_TT_seq_SRR8112732 #输出文件前缀
--outReadsUnmapped # output of unmapped and partially mapped (i.e. mapped only one mate of a paired end read) reads in separate file(s). Fastx ... output in separate fasta/fastq files, Unmapped.out.mate1/2
--outSAMunmapped # output of unmapped reads in the SAM format
9216920116 Jun 28 17:06 DRB_TT_seq_SRR8112732Aligned.out.bam #这个文件是最重要的,用来后续进行remove duplicates和sort
1166235552 Jun 28 17:06 DRB_TT_seq_SRR8112732Aligned.toTranscriptome.out.bam #这个文件是那些比对到转录本上的reads组成的bam文件
2034 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.final.out
20188 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.out
2571 Jun 28 17:06 DRB_TT_seq_SRR8112732Log.progress.out
1585521 Jun 28 17:06 DRB_TT_seq_SRR8112732ReadsPerGene.out.tab
6732305 Jun 28 17:06 DRB_TT_seq_SRR8112732SJ.out.tab #剪切的信息
8192 Jun 28 16:51 DRB_TT_seq_SRR8112732_STARgenome
8192 Jun 28 16:51 DRB_TT_seq_SRR8112732_STARpass1
Step 3 - 对293T测序数据进行比对
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_5S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_293T_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix 293T-5S-val --outReadsUnmapped Fastx
fix 293T-5S-val --outReadsUnmapped Fastx
Sep 13 12:18:59 ..... started STAR run
Sep 13 12:18:59 ..... loading genome
Sep 13 12:19:00 ..... processing annotations GTF
Sep 13 12:19:00 ..... started 1st pass mapping
Sep 13 12:20:19 ..... finished 1st pass mapping
Sep 13 12:20:20 ..... started mapping
Sep 13 12:21:47 ..... finished mapping
Sep 13 12:21:47 ..... finished successfully
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T-5S-valLog.final.out
Started job on | Sep 13 12:18:59
Started mapping on | Sep 13 12:20:20
Finished on | Sep 13 12:21:47
Mapping speed, Million of reads per hour | 809.55
Number of input reads | 19564161
Average input read length | 275
UNIQUE READS:
Uniquely mapped reads number | 119
Uniquely mapped reads % | 0.00%
Average mapped length | 194.09
Number of splices: Total | 0
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 0
Number of splices: GC/AG | 0
Number of splices: AT/AC | 0
Number of splices: Non-canonical | 0
Mismatch rate per base, % | 0.18%
Deletion rate per base | 0.00%
Deletion average length | 0.00
Insertion rate per base | 0.01%
Insertion average length | 1.00
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 0
% of reads mapped to multiple loci | 0.00%
Number of reads mapped to too many loci | 0
% of reads mapped to too many loci | 0.00%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 1548
% of reads unmapped: too short | 0.01%
Number of reads unmapped: other | 19562494
% of reads unmapped: other | 99.99%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Step 4 - 对HCT116测序数据进行比对
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_5S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HCT116_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HCT116-5S-val --outReadsUnmapped Fastx
Sep 13 12:22:29 ..... started STAR run
Sep 13 12:22:29 ..... loading genome
Sep 13 12:22:30 ..... processing annotations GTF
Sep 13 12:22:30 ..... started 1st pass mapping
Sep 13 12:24:24 ..... finished 1st pass mapping
Sep 13 12:24:25 ..... started mapping
Sep 13 12:26:41 ..... finished mapping
Sep 13 12:26:41 ..... finished successfully
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116-5S-valLog.final.out
Started job on | Sep 13 12:22:29
Started mapping on | Sep 13 12:24:25
Finished on | Sep 13 12:26:41
Mapping speed, Million of reads per hour | 637.98
Number of input reads | 24101481
Average input read length | 274
UNIQUE READS:
Uniquely mapped reads number | 155
Uniquely mapped reads % | 0.00%
Average mapped length | 190.32
Number of splices: Total | 0
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 0
Number of splices: GC/AG | 0
Number of splices: AT/AC | 0
Number of splices: Non-canonical | 0
Mismatch rate per base, % | 0.06%
Deletion rate per base | 0.01%
Deletion average length | 2.00
Insertion rate per base | 0.00%
Insertion average length | 0.00
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 0
% of reads mapped to multiple loci | 0.00%
Number of reads mapped to too many loci | 0
% of reads mapped to too many loci | 0.00%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 2531
% of reads unmapped: too short | 0.01%
Number of reads unmapped: other | 24098795
% of reads unmapped: other | 99.99%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Step 5 - 对HeLa测序数据进行比对
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesCommand zcat --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/h_5S_rDNA/NR_023363.1.gtf --twopassMode Basic --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/h_5S_rDNA/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data1_val_1.fq.gz /Data/lizexing/projects/xindi/Data/new/Data/CleanData/T_HeLa_T_PAR_CLIP_Clean_Data2_val_2.fq.gz --outFileNamePrefix HeLa-5S-val --outReadsUnmapped Fastx
Sep 13 12:22:50 ..... started STAR run
Sep 13 12:22:50 ..... loading genome
Sep 13 12:22:51 ..... processing annotations GTF
Sep 13 12:22:51 ..... started 1st pass mapping
Sep 13 12:24:51 ..... finished 1st pass mapping
Sep 13 12:24:51 ..... started mapping
Sep 13 12:27:02 ..... finished mapping
Sep 13 12:27:02 ..... finished successfully
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa-5S-valLog.final.out
Started job on | Sep 13 12:22:50
Started mapping on | Sep 13 12:24:51
Finished on | Sep 13 12:27:02
Mapping speed, Million of reads per hour | 672.84
Number of input reads | 24483719
Average input read length | 276
UNIQUE READS:
Uniquely mapped reads number | 148
Uniquely mapped reads % | 0.00%
Average mapped length | 190.82
Number of splices: Total | 0
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 0
Number of splices: GC/AG | 0
Number of splices: AT/AC | 0
Number of splices: Non-canonical | 0
Mismatch rate per base, % | 0.13%
Deletion rate per base | 0.00%
Deletion average length | 0.00
Insertion rate per base | 0.00%
Insertion average length | 0.00
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 1
% of reads mapped to multiple loci | 0.00%
Number of reads mapped to too many loci | 0
% of reads mapped to too many loci | 0.00%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 1670
% of reads unmapped: too short | 0.01%
Number of reads unmapped: other | 24481900
% of reads unmapped: other | 99.99%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
25. 使用Samtools软件对三组数据进行排序
# 对293T数据进行排序
samtools sort -@ 32 -l 5 -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam
# 对HCT116数据进行排序
samtools sort -@ 32 -l 5 -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam
# 对HeLa数据进行排序
samtools sort -@ 32 -l 5 -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam
26. 使用featureCounts软件对三组数据read summarization
参考文章:featurecounts的使用说明
Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...
-T <int> Number of the threads. 1 by default.
-a <string> Name of an annotation file. GTF/GFF format by default. See
-F option for more format information. Inbuilt annotations
(SAF format) is available in 'annotation' directory of the
package. Gzipped file is also accepted.
-o <string> Name of output file including read counts. A separate file
including summary statistics of counting results is also
included in the output ('<string>.summary'). Both files
are in tab delimited format.
-p If specified, fragments (or templates) will be counted
instead of reads. This option is only applicable for
paired-end reads; single-end reads are always counted as
reads.
-B Only count read pairs that have both ends aligned.
-P Check validity of paired-end distance when counting read
pairs. Use -d and -D to set thresholds.
-d <int> Minimum fragment/template length, 50 by default.
-D <int> Maximum fragment/template length, 600 by default.
-C Do not count read pairs that have their two ends mapping
to different chromosomes or mapping to same chromosome
but on different strands.
--donotsort Do not sort reads in BAM/SAM input. Note that reads from
the same pair are required to be located next to each
other in the input.
-f Perform read counting at feature level (eg. counting
reads for exons rather than genes).
-t <string> Specify feature type(s) in a GTF annotation. If multiple
types are provided, they should be separated by ',' with
no space in between. 'exon' by default. Rows in the
annotation with a matched feature will be extracted and
used for read mapping.
-g <string> Specify attribute type in GTF annotation. 'gene_id' by
default. Meta-features used for read counting will be
extracted from annotation using the provided value.
Step 1 - 对293T测序数据进行计数:5.8S_RNA_bin=10bp, 3’ETS_RNA_bin=100=bp, others_RNA_bin=200bp
# Multimapping reads : not counted
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o 293T-valAligned.out.bam.sort ||
|| ||
|| Output file : 293T_2.read.count ||
|| Summary : 293T_2.read.count.summary ||
|| Annotation : U13369.1.2.gtf (GTF) ||
|| Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
|| ||
|| Threads : 32 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : required ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file U13369.1.2.gtf ... ||
|| Features : 84 ||
|| Meta-features : 84 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file 293T-valAligned.out.bam.sort... ||
|| Paired-end reads are included. ||
|| Total alignments : 14730856 ||
|| Successfully assigned alignments : 6321600 (42.9%) ||
|| Running time : 0.41 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/Data/lizexing/projects ||
|| /xindi/Data/new/Data/CleanData/293T_2.read.count.summary" ||
|| ||
\\============================================================================//
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort"
Geneid Chr Start End Strand Length /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
rna-U13369.1:1..200 U13369.1 1 200 + 200 149
rna-U13369.1:201..400 U13369.1 201 400 + 200 33
rna-U13369.1:401..600 U13369.1 401 600 + 200 3415
rna-U13369.1:601..800 U13369.1 601 800 + 200 1689
rna-U13369.1:801..1000 U13369.1 801 1000 + 200 11097
rna-U13369.1:1001..1200 U13369.1 1001 1200 + 200 6832
rna-U13369.1:1201..1400 U13369.1 1201 1400 + 200 5593
rna-U13369.1:1401..1600 U13369.1 1401 1600 + 200 1309
rna-U13369.1:1601..1800 U13369.1 1601 1800 + 200 2973
rna-U13369.1:1801..2000 U13369.1 1801 2000 + 200 4920
rna-U13369.1:2001..2200 U13369.1 2001 2200 + 200 9107
rna-U13369.1:2201..2400 U13369.1 2201 2400 + 200 1132
rna-U13369.1:2401..2600 U13369.1 2401 2600 + 200 1263
rna-U13369.1:2601..2800 U13369.1 2601 2800 + 200 573
rna-U13369.1:2801..3000 U13369.1 2801 3000 + 200 292
rna-U13369.1:3001..3200 U13369.1 3001 3200 + 200 477
rna-U13369.1:3201..3400 U13369.1 3201 3400 + 200 162
rna-U13369.1:3401..3656 U13369.1 3401 3656 + 256 6298
rna-U13369.1:3657..3857 U13369.1 3657 3857 + 201 906561
rna-U13369.1:3858..4057 U13369.1 3858 4057 + 200 296332
rna-U13369.1:4058..4257 U13369.1 4058 4257 + 200 225795
rna-U13369.1:4258..4457 U13369.1 4258 4457 + 200 292400
rna-U13369.1:4458..4657 U13369.1 4458 4657 + 200 253866
rna-U13369.1:4658..4857 U13369.1 4658 4857 + 200 258050
rna-U13369.1:4858..5057 U13369.1 4858 5057 + 200 463669
rna-U13369.1:5058..5257 U13369.1 5058 5257 + 200 281272
rna-U13369.1:5258..5457 U13369.1 5258 5457 + 200 98735
rna-U13369.1:5458..5527 U13369.1 5458 5527 + 70 8547
rna-U13369.1:5528..5728 U13369.1 5528 5728 + 201 17658
rna-U13369.1:5729..5928 U13369.1 5729 5928 + 200 2861
rna-U13369.1:5929..6128 U13369.1 5929 6128 + 200 3083
rna-U13369.1:6129..6328 U13369.1 6129 6328 + 200 255
rna-U13369.1:6329..6528 U13369.1 6329 6528 + 200 1463
rna-U13369.1:6529..6622 U13369.1 6529 6622 + 94 402
rna-U13369.1:6623..6633 U13369.1 6623 6633 + 11 2
rna-U13369.1:6634..6643 U13369.1 6634 6643 + 10 10
rna-U13369.1:6644..6653 U13369.1 6644 6653 + 10 2
rna-U13369.1:6654..6663 U13369.1 6654 6663 + 10 0
rna-U13369.1:6664..6673 U13369.1 6664 6673 + 10 0
rna-U13369.1:6674..6683 U13369.1 6674 6683 + 10 0
rna-U13369.1:6684..6693 U13369.1 6684 6693 + 10 2
rna-U13369.1:6694..6703 U13369.1 6694 6703 + 10 1
rna-U13369.1:6704..6713 U13369.1 6704 6713 + 10 2
rna-U13369.1:6714..6723 U13369.1 6714 6723 + 10 0
rna-U13369.1:6724..6733 U13369.1 6724 6733 + 10 16
rna-U13369.1:6734..6743 U13369.1 6734 6743 + 10 35
rna-U13369.1:6744..6753 U13369.1 6744 6753 + 10 3
rna-U13369.1:6754..6763 U13369.1 6754 6763 + 10 6
rna-U13369.1:6764..6779 U13369.1 6764 6779 + 16 58
rna-U13369.1:6780..6980 U13369.1 6780 6980 + 201 2891
rna-U13369.1:6981..7180 U13369.1 6981 7180 + 200 4185
rna-U13369.1:7181..7380 U13369.1 7181 7380 + 200 626
rna-U13369.1:7381..7580 U13369.1 7381 7580 + 200 203
rna-U13369.1:7581..7780 U13369.1 7581 7780 + 200 1585
rna-U13369.1:7781..7934 U13369.1 7781 7934 + 154 8768
rna-U13369.1:7935..8134 U13369.1 7935 8134 + 200 151778
rna-U13369.1:8135..8334 U13369.1 8135 8334 + 200 222844
rna-U13369.1:8335..8534 U13369.1 8335 8534 + 200 98417
rna-U13369.1:8535..8734 U13369.1 8535 8734 + 200 34508
rna-U13369.1:8735..8934 U13369.1 8735 8934 + 200 46975
rna-U13369.1:8935..9134 U13369.1 8935 9134 + 200 16012
rna-U13369.1:9135..9334 U13369.1 9135 9334 + 200 11752
rna-U13369.1:9335..9534 U13369.1 9335 9534 + 200 212911
rna-U13369.1:9535..9734 U13369.1 9535 9734 + 200 471042
rna-U13369.1:9735..9934 U13369.1 9735 9934 + 200 206908
rna-U13369.1:9935..10134 U13369.1 9935 10134 + 200 18494
rna-U13369.1:10135..10334 U13369.1 10135 10334 + 200 255204
rna-U13369.1:10335..10534 U13369.1 10335 10534 + 200 135319
rna-U13369.1:10535..10734 U13369.1 10535 10734 + 200 226546
rna-U13369.1:10735..10934 U13369.1 10735 10934 + 200 46115
rna-U13369.1:10935..11134 U13369.1 10935 11134 + 200 200
rna-U13369.1:11135..11334 U13369.1 11135 11343 + 209 1722
rna-U13369.1:11335..11534 U13369.1 11335 11534 + 200 18274
rna-U13369.1:11535..11734 U13369.1 11535 11734 + 200 358728
rna-U13369.1:11735..11934 U13369.1 11735 11934 + 200 70452
rna-U13369.1:11935..12134 U13369.1 11935 12134 + 200 97312
rna-U13369.1:12135..12334 U13369.1 12135 12334 + 200 119347
rna-U13369.1:12335..12534 U13369.1 12335 12534 + 200 156098
rna-U13369.1:12535..12734 U13369.1 12535 12734 + 200 10497
rna-U13369.1:12735..12969 U13369.1 12735 12969 + 235 147459
rna-U13369.1:12970..13069 U13369.1 12970 13069 + 100 15
rna-U13369.1:13070..13169 U13369.1 13070 13169 + 100 2
rna-U13369.1:13170..13269 U13369.1 13170 13269 + 100 11
rna-U13369.1:13270..13314 U13369.1 13270 13314 + 45 0
# Multimapping reads : yes
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T_2.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o 293T-valAligned.out.bam.sort ||
|| ||
|| Output file : 293T_2.read.count ||
|| Summary : 293T_2.read.count.summary ||
|| Annotation : U13369.1.2.gtf (GTF) ||
|| Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
|| ||
|| Threads : 32 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : required ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file U13369.1.2.gtf ... ||
|| Features : 84 ||
|| Meta-features : 84 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file 293T-valAligned.out.bam.sort... ||
|| Paired-end reads are included. ||
|| Total alignments : 14730856 ||
|| Successfully assigned alignments : 8382987 (56.9%) ||
|| Running time : 0.44 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/Data/lizexing/projects ||
|| /xindi/Data/new/Data/CleanData/293T_2.read.count.summary" ||
|| ||
\\============================================================================//
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T_2.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T_2.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort"
Geneid Chr Start End Strand Length /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
rna-U13369.1:1..200 U13369.1 1 200 + 200 717
rna-U13369.1:201..400 U13369.1 201 400 + 200 160
rna-U13369.1:401..600 U13369.1 401 600 + 200 3526
rna-U13369.1:601..800 U13369.1 601 800 + 200 1858
rna-U13369.1:801..1000 U13369.1 801 1000 + 200 11511
rna-U13369.1:1001..1200 U13369.1 1001 1200 + 200 6904
rna-U13369.1:1201..1400 U13369.1 1201 1400 + 200 5684
rna-U13369.1:1401..1600 U13369.1 1401 1600 + 200 1465
rna-U13369.1:1601..1800 U13369.1 1601 1800 + 200 3777
rna-U13369.1:1801..2000 U13369.1 1801 2000 + 200 5876
rna-U13369.1:2001..2200 U13369.1 2001 2200 + 200 9861
rna-U13369.1:2201..2400 U13369.1 2201 2400 + 200 1194
rna-U13369.1:2401..2600 U13369.1 2401 2600 + 200 1321
rna-U13369.1:2601..2800 U13369.1 2601 2800 + 200 575
rna-U13369.1:2801..3000 U13369.1 2801 3000 + 200 391
rna-U13369.1:3001..3200 U13369.1 3001 3200 + 200 698
rna-U13369.1:3201..3400 U13369.1 3201 3400 + 200 174
rna-U13369.1:3401..3656 U13369.1 3401 3656 + 256 7791
rna-U13369.1:3657..3857 U13369.1 3657 3857 + 201 916288
rna-U13369.1:3858..4057 U13369.1 3858 4057 + 200 303417
rna-U13369.1:4058..4257 U13369.1 4058 4257 + 200 233973
rna-U13369.1:4258..4457 U13369.1 4258 4457 + 200 295043
rna-U13369.1:4458..4657 U13369.1 4458 4657 + 200 257954
rna-U13369.1:4658..4857 U13369.1 4658 4857 + 200 267275
rna-U13369.1:4858..5057 U13369.1 4858 5057 + 200 525310
rna-U13369.1:5058..5257 U13369.1 5058 5257 + 200 288945
rna-U13369.1:5258..5457 U13369.1 5258 5457 + 200 112451
rna-U13369.1:5458..5527 U13369.1 5458 5527 + 70 8717
rna-U13369.1:5528..5728 U13369.1 5528 5728 + 201 26265
rna-U13369.1:5729..5928 U13369.1 5729 5928 + 200 3537
rna-U13369.1:5929..6128 U13369.1 5929 6128 + 200 3503
rna-U13369.1:6129..6328 U13369.1 6129 6328 + 200 274
rna-U13369.1:6329..6528 U13369.1 6329 6528 + 200 1495
rna-U13369.1:6529..6622 U13369.1 6529 6622 + 94 415
rna-U13369.1:6623..6633 U13369.1 6623 6633 + 11 2
rna-U13369.1:6634..6643 U13369.1 6634 6643 + 10 10
rna-U13369.1:6644..6653 U13369.1 6644 6653 + 10 2
rna-U13369.1:6654..6663 U13369.1 6654 6663 + 10 0
rna-U13369.1:6664..6673 U13369.1 6664 6673 + 10 0
rna-U13369.1:6674..6683 U13369.1 6674 6683 + 10 0
rna-U13369.1:6684..6693 U13369.1 6684 6693 + 10 4
rna-U13369.1:6694..6703 U13369.1 6694 6703 + 10 7
rna-U13369.1:6704..6713 U13369.1 6704 6713 + 10 2
rna-U13369.1:6714..6723 U13369.1 6714 6723 + 10 0
rna-U13369.1:6724..6733 U13369.1 6724 6733 + 10 28
rna-U13369.1:6734..6743 U13369.1 6734 6743 + 10 37
rna-U13369.1:6744..6753 U13369.1 6744 6753 + 10 3
rna-U13369.1:6754..6763 U13369.1 6754 6763 + 10 8
rna-U13369.1:6764..6779 U13369.1 6764 6779 + 16 58
rna-U13369.1:6780..6980 U13369.1 6780 6980 + 201 4722
rna-U13369.1:6981..7180 U13369.1 6981 7180 + 200 4818
rna-U13369.1:7181..7380 U13369.1 7181 7380 + 200 745
rna-U13369.1:7381..7580 U13369.1 7381 7580 + 200 262
rna-U13369.1:7581..7780 U13369.1 7581 7780 + 200 1799
rna-U13369.1:7781..7934 U13369.1 7781 7934 + 154 9013
rna-U13369.1:7935..8134 U13369.1 7935 8134 + 200 1179817
rna-U13369.1:8135..8334 U13369.1 8135 8334 + 200 473198
rna-U13369.1:8335..8534 U13369.1 8335 8534 + 200 120391
rna-U13369.1:8535..8734 U13369.1 8535 8734 + 200 56538
rna-U13369.1:8735..8934 U13369.1 8735 8934 + 200 109817
rna-U13369.1:8935..9134 U13369.1 8935 9134 + 200 17849
rna-U13369.1:9135..9334 U13369.1 9135 9334 + 200 12844
rna-U13369.1:9335..9534 U13369.1 9335 9534 + 200 235242
rna-U13369.1:9535..9734 U13369.1 9535 9734 + 200 476249
rna-U13369.1:9735..9934 U13369.1 9735 9934 + 200 210133
rna-U13369.1:9935..10134 U13369.1 9935 10134 + 200 23893
rna-U13369.1:10135..10334 U13369.1 10135 10334 + 200 259262
rna-U13369.1:10335..10534 U13369.1 10335 10534 + 200 137294
rna-U13369.1:10535..10734 U13369.1 10535 10734 + 200 284736
rna-U13369.1:10735..10934 U13369.1 10735 10934 + 200 65392
rna-U13369.1:10935..11134 U13369.1 10935 11134 + 200 732
rna-U13369.1:11135..11334 U13369.1 11135 11343 + 209 2628
rna-U13369.1:11335..11534 U13369.1 11335 11534 + 200 39032
rna-U13369.1:11535..11734 U13369.1 11535 11734 + 200 366368
rna-U13369.1:11735..11934 U13369.1 11735 11934 + 200 95675
rna-U13369.1:11935..12134 U13369.1 11935 12134 + 200 128072
rna-U13369.1:12135..12334 U13369.1 12135 12334 + 200 128169
rna-U13369.1:12335..12534 U13369.1 12335 12534 + 200 187062
rna-U13369.1:12535..12734 U13369.1 12535 12734 + 200 250071
rna-U13369.1:12735..12969 U13369.1 12735 12969 + 235 192602
rna-U13369.1:12970..13069 U13369.1 12970 13069 + 100 34
rna-U13369.1:13070..13169 U13369.1 13070 13169 + 100 2
rna-U13369.1:13170..13269 U13369.1 13170 13269 + 100 20
rna-U13369.1:13270..13314 U13369.1 13270 13314 + 45 0
Step 2 - 对HCT116测序数据进行计数
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o HCT116-valAligned.out.bam.sort ||
|| ||
|| Output file : HCT116.read.count ||
|| Summary : HCT116.read.count.summary ||
|| Annotation : U13369.1.2.gtf (GTF) ||
|| Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
|| ||
|| Threads : 32 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : required ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file U13369.1.2.gtf ... ||
|| Features : 84 ||
|| Meta-features : 84 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file HCT116-valAligned.out.bam.sort... ||
|| Paired-end reads are included. ||
|| Total alignments : 17788363 ||
|| Successfully assigned alignments : 7826386 (44.0%) ||
|| Running time : 0.53 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/Data/lizexing/projects ||
|| /xindi/Data/new/Data/CleanData/HCT116.read.count.summary" ||
|| ||
\\============================================================================//
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort"
Geneid Chr Start End Strand Length /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort
rna-U13369.1:1..200 U13369.1 1 200 + 200 161
rna-U13369.1:201..400 U13369.1 201 400 + 200 35
rna-U13369.1:401..600 U13369.1 401 600 + 200 1965
rna-U13369.1:601..800 U13369.1 601 800 + 200 678
rna-U13369.1:801..1000 U13369.1 801 1000 + 200 5357
rna-U13369.1:1001..1200 U13369.1 1001 1200 + 200 3591
rna-U13369.1:1201..1400 U13369.1 1201 1400 + 200 3153
rna-U13369.1:1401..1600 U13369.1 1401 1600 + 200 748
rna-U13369.1:1601..1800 U13369.1 1601 1800 + 200 1431
rna-U13369.1:1801..2000 U13369.1 1801 2000 + 200 3266
rna-U13369.1:2001..2200 U13369.1 2001 2200 + 200 4859
rna-U13369.1:2201..2400 U13369.1 2201 2400 + 200 567
rna-U13369.1:2401..2600 U13369.1 2401 2600 + 200 664
rna-U13369.1:2601..2800 U13369.1 2601 2800 + 200 313
rna-U13369.1:2801..3000 U13369.1 2801 3000 + 200 203
rna-U13369.1:3001..3200 U13369.1 3001 3200 + 200 302
rna-U13369.1:3201..3400 U13369.1 3201 3400 + 200 57
rna-U13369.1:3401..3656 U13369.1 3401 3656 + 256 3622
rna-U13369.1:3657..3857 U13369.1 3657 3857 + 201 1334016
rna-U13369.1:3858..4057 U13369.1 3858 4057 + 200 360644
rna-U13369.1:4058..4257 U13369.1 4058 4257 + 200 275503
rna-U13369.1:4258..4457 U13369.1 4258 4457 + 200 298265
rna-U13369.1:4458..4657 U13369.1 4458 4657 + 200 362694
rna-U13369.1:4658..4857 U13369.1 4658 4857 + 200 300733
rna-U13369.1:4858..5057 U13369.1 4858 5057 + 200 647910
rna-U13369.1:5058..5257 U13369.1 5058 5257 + 200 322107
rna-U13369.1:5258..5457 U13369.1 5258 5457 + 200 111634
rna-U13369.1:5458..5527 U13369.1 5458 5527 + 70 4907
rna-U13369.1:5528..5728 U13369.1 5528 5728 + 201 12576
rna-U13369.1:5729..5928 U13369.1 5729 5928 + 200 1583
rna-U13369.1:5929..6128 U13369.1 5929 6128 + 200 2051
rna-U13369.1:6129..6328 U13369.1 6129 6328 + 200 153
rna-U13369.1:6329..6528 U13369.1 6329 6528 + 200 1035
rna-U13369.1:6529..6622 U13369.1 6529 6622 + 94 313
rna-U13369.1:6623..6633 U13369.1 6623 6633 + 11 7
rna-U13369.1:6634..6643 U13369.1 6634 6643 + 10 7
rna-U13369.1:6644..6653 U13369.1 6644 6653 + 10 0
rna-U13369.1:6654..6663 U13369.1 6654 6663 + 10 0
rna-U13369.1:6664..6673 U13369.1 6664 6673 + 10 0
rna-U13369.1:6674..6683 U13369.1 6674 6683 + 10 0
rna-U13369.1:6684..6693 U13369.1 6684 6693 + 10 1
rna-U13369.1:6694..6703 U13369.1 6694 6703 + 10 0
rna-U13369.1:6704..6713 U13369.1 6704 6713 + 10 0
rna-U13369.1:6714..6723 U13369.1 6714 6723 + 10 0
rna-U13369.1:6724..6733 U13369.1 6724 6733 + 10 11
rna-U13369.1:6734..6743 U13369.1 6734 6743 + 10 2
rna-U13369.1:6744..6753 U13369.1 6744 6753 + 10 0
rna-U13369.1:6754..6763 U13369.1 6754 6763 + 10 3
rna-U13369.1:6764..6779 U13369.1 6764 6779 + 16 62
rna-U13369.1:6780..6980 U13369.1 6780 6980 + 201 2860
rna-U13369.1:6981..7180 U13369.1 6981 7180 + 200 4370
rna-U13369.1:7181..7380 U13369.1 7181 7380 + 200 650
rna-U13369.1:7381..7580 U13369.1 7381 7580 + 200 124
rna-U13369.1:7581..7780 U13369.1 7581 7780 + 200 1822
rna-U13369.1:7781..7934 U13369.1 7781 7934 + 154 9807
rna-U13369.1:7935..8134 U13369.1 7935 8134 + 200 220091
rna-U13369.1:8135..8334 U13369.1 8135 8334 + 200 274925
rna-U13369.1:8335..8534 U13369.1 8335 8534 + 200 138838
rna-U13369.1:8535..8734 U13369.1 8535 8734 + 200 37984
rna-U13369.1:8735..8934 U13369.1 8735 8934 + 200 57781
rna-U13369.1:8935..9134 U13369.1 8935 9134 + 200 20546
rna-U13369.1:9135..9334 U13369.1 9135 9334 + 200 14483
rna-U13369.1:9335..9534 U13369.1 9335 9534 + 200 224465
rna-U13369.1:9535..9734 U13369.1 9535 9734 + 200 566130
rna-U13369.1:9735..9934 U13369.1 9735 9934 + 200 224644
rna-U13369.1:9935..10134 U13369.1 9935 10134 + 200 23861
rna-U13369.1:10135..10334 U13369.1 10135 10334 + 200 324401
rna-U13369.1:10335..10534 U13369.1 10335 10534 + 200 139550
rna-U13369.1:10535..10734 U13369.1 10535 10734 + 200 286406
rna-U13369.1:10735..10934 U13369.1 10735 10934 + 200 44183
rna-U13369.1:10935..11134 U13369.1 10935 11134 + 200 206
rna-U13369.1:11135..11334 U13369.1 11135 11343 + 209 2581
rna-U13369.1:11335..11534 U13369.1 11335 11534 + 200 21753
rna-U13369.1:11535..11734 U13369.1 11535 11734 + 200 397492
rna-U13369.1:11735..11934 U13369.1 11735 11934 + 200 76640
rna-U13369.1:11935..12134 U13369.1 11935 12134 + 200 101386
rna-U13369.1:12135..12334 U13369.1 12135 12334 + 200 131966
rna-U13369.1:12335..12534 U13369.1 12335 12534 + 200 169786
rna-U13369.1:12535..12734 U13369.1 12535 12734 + 200 85848
rna-U13369.1:12735..12969 U13369.1 12735 12969 + 235 153568
rna-U13369.1:12970..13069 U13369.1 12970 13069 + 100 35
rna-U13369.1:13070..13169 U13369.1 13070 13169 + 100 2
rna-U13369.1:13170..13269 U13369.1 13170 13269 + 100 13
rna-U13369.1:13270..13314 U13369.1 13270 13314 + 45 0
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116_2.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o HCT116-valAligned.out.bam.sort ||
|| ||
|| Output file : HCT116_2.read.count ||
|| Summary : HCT116_2.read.count.summary ||
|| Annotation : U13369.1.2.gtf (GTF) ||
|| Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
|| ||
|| Threads : 32 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : required ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file U13369.1.2.gtf ... ||
|| Features : 84 ||
|| Meta-features : 84 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file HCT116-valAligned.out.bam.sort... ||
|| Paired-end reads are included. ||
|| Total alignments : 17788363 ||
|| Successfully assigned alignments : 9851054 (55.4%) ||
|| Running time : 0.55 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/Data/lizexing/projects ||
|| /xindi/Data/new/Data/CleanData/HCT116_2.read.count.summary" ||
|| ||
\\============================================================================//
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116_2.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116_2.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort"
Geneid Chr Start End Strand Length /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort
rna-U13369.1:1..200 U13369.1 1 200 + 200 481
rna-U13369.1:201..400 U13369.1 201 400 + 200 103
rna-U13369.1:401..600 U13369.1 401 600 + 200 2047
rna-U13369.1:601..800 U13369.1 601 800 + 200 775
rna-U13369.1:801..1000 U13369.1 801 1000 + 200 5514
rna-U13369.1:1001..1200 U13369.1 1001 1200 + 200 3626
rna-U13369.1:1201..1400 U13369.1 1201 1400 + 200 3202
rna-U13369.1:1401..1600 U13369.1 1401 1600 + 200 833
rna-U13369.1:1601..1800 U13369.1 1601 1800 + 200 2054
rna-U13369.1:1801..2000 U13369.1 1801 2000 + 200 3627
rna-U13369.1:2001..2200 U13369.1 2001 2200 + 200 5013
rna-U13369.1:2201..2400 U13369.1 2201 2400 + 200 642
rna-U13369.1:2401..2600 U13369.1 2401 2600 + 200 720
rna-U13369.1:2601..2800 U13369.1 2601 2800 + 200 331
rna-U13369.1:2801..3000 U13369.1 2801 3000 + 200 233
rna-U13369.1:3001..3200 U13369.1 3001 3200 + 200 349
rna-U13369.1:3201..3400 U13369.1 3201 3400 + 200 61
rna-U13369.1:3401..3656 U13369.1 3401 3656 + 256 5055
rna-U13369.1:3657..3857 U13369.1 3657 3857 + 201 1351748
rna-U13369.1:3858..4057 U13369.1 3858 4057 + 200 371164
rna-U13369.1:4058..4257 U13369.1 4058 4257 + 200 279012
rna-U13369.1:4258..4457 U13369.1 4258 4457 + 200 317474
rna-U13369.1:4458..4657 U13369.1 4458 4657 + 200 367276
rna-U13369.1:4658..4857 U13369.1 4658 4857 + 200 312708
rna-U13369.1:4858..5057 U13369.1 4858 5057 + 200 687565
rna-U13369.1:5058..5257 U13369.1 5058 5257 + 200 343641
rna-U13369.1:5258..5457 U13369.1 5258 5457 + 200 133654
rna-U13369.1:5458..5527 U13369.1 5458 5527 + 70 5054
rna-U13369.1:5528..5728 U13369.1 5528 5728 + 201 18264
rna-U13369.1:5729..5928 U13369.1 5729 5928 + 200 1893
rna-U13369.1:5929..6128 U13369.1 5929 6128 + 200 2161
rna-U13369.1:6129..6328 U13369.1 6129 6328 + 200 164
rna-U13369.1:6329..6528 U13369.1 6329 6528 + 200 1064
rna-U13369.1:6529..6622 U13369.1 6529 6622 + 94 336
rna-U13369.1:6623..6633 U13369.1 6623 6633 + 11 7
rna-U13369.1:6634..6643 U13369.1 6634 6643 + 10 7
rna-U13369.1:6644..6653 U13369.1 6644 6653 + 10 0
rna-U13369.1:6654..6663 U13369.1 6654 6663 + 10 0
rna-U13369.1:6664..6673 U13369.1 6664 6673 + 10 0
rna-U13369.1:6674..6683 U13369.1 6674 6683 + 10 0
rna-U13369.1:6684..6693 U13369.1 6684 6693 + 10 1
rna-U13369.1:6694..6703 U13369.1 6694 6703 + 10 0
rna-U13369.1:6704..6713 U13369.1 6704 6713 + 10 0
rna-U13369.1:6714..6723 U13369.1 6714 6723 + 10 0
rna-U13369.1:6724..6733 U13369.1 6724 6733 + 10 35
rna-U13369.1:6734..6743 U13369.1 6734 6743 + 10 10
rna-U13369.1:6744..6753 U13369.1 6744 6753 + 10 0
rna-U13369.1:6754..6763 U13369.1 6754 6763 + 10 3
rna-U13369.1:6764..6779 U13369.1 6764 6779 + 16 62
rna-U13369.1:6780..6980 U13369.1 6780 6980 + 201 4299
rna-U13369.1:6981..7180 U13369.1 6981 7180 + 200 4973
rna-U13369.1:7181..7380 U13369.1 7181 7380 + 200 754
rna-U13369.1:7381..7580 U13369.1 7381 7580 + 200 172
rna-U13369.1:7581..7780 U13369.1 7581 7780 + 200 2117
rna-U13369.1:7781..7934 U13369.1 7781 7934 + 154 10309
rna-U13369.1:7935..8134 U13369.1 7935 8134 + 200 1384909
rna-U13369.1:8135..8334 U13369.1 8135 8334 + 200 511358
rna-U13369.1:8335..8534 U13369.1 8335 8534 + 200 186408
rna-U13369.1:8535..8734 U13369.1 8535 8734 + 200 58757
rna-U13369.1:8735..8934 U13369.1 8735 8934 + 200 129140
rna-U13369.1:8935..9134 U13369.1 8935 9134 + 200 25706
rna-U13369.1:9135..9334 U13369.1 9135 9334 + 200 16391
rna-U13369.1:9335..9534 U13369.1 9335 9534 + 200 233665
rna-U13369.1:9535..9734 U13369.1 9535 9734 + 200 583439
rna-U13369.1:9735..9934 U13369.1 9735 9934 + 200 233122
rna-U13369.1:9935..10134 U13369.1 9935 10134 + 200 43808
rna-U13369.1:10135..10334 U13369.1 10135 10334 + 200 375750
rna-U13369.1:10335..10534 U13369.1 10335 10534 + 200 141254
rna-U13369.1:10535..10734 U13369.1 10535 10734 + 200 293024
rna-U13369.1:10735..10934 U13369.1 10735 10934 + 200 60600
rna-U13369.1:10935..11134 U13369.1 10935 11134 + 200 689
rna-U13369.1:11135..11334 U13369.1 11135 11343 + 209 3478
rna-U13369.1:11335..11534 U13369.1 11335 11534 + 200 53933
rna-U13369.1:11535..11734 U13369.1 11535 11734 + 200 403297
rna-U13369.1:11735..11934 U13369.1 11735 11934 + 200 102217
rna-U13369.1:11935..12134 U13369.1 11935 12134 + 200 126812
rna-U13369.1:12135..12334 U13369.1 12135 12334 + 200 144705
rna-U13369.1:12335..12534 U13369.1 12335 12534 + 200 178697
rna-U13369.1:12535..12734 U13369.1 12535 12734 + 200 121242
rna-U13369.1:12735..12969 U13369.1 12735 12969 + 235 187945
rna-U13369.1:12970..13069 U13369.1 12970 13069 + 100 98
rna-U13369.1:13070..13169 U13369.1 13070 13169 + 100 2
rna-U13369.1:13170..13269 U13369.1 13170 13269 + 100 13
rna-U13369.1:13270..13314 U13369.1 13270 13314 + 45 2
Step 3 - 对HeLa测序数据进行计数
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o HeLa-valAligned.out.bam.sort ||
|| ||
|| Output file : HeLa.read.count ||
|| Summary : HeLa.read.count.summary ||
|| Annotation : U13369.1.2.gtf (GTF) ||
|| Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
|| ||
|| Threads : 32 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : required ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file U13369.1.2.gtf ... ||
|| Features : 84 ||
|| Meta-features : 84 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file HeLa-valAligned.out.bam.sort... ||
|| Paired-end reads are included. ||
|| Total alignments : 19960509 ||
|| Successfully assigned alignments : 8942151 (44.8%) ||
|| Running time : 0.70 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/Data/lizexing/projects ||
|| /xindi/Data/new/Data/CleanData/HeLa.read.count.summary" ||
|| ||
\\============================================================================//
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort"
Geneid Chr Start End Strand Length /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
rna-U13369.1:1..200 U13369.1 1 200 + 200 425
rna-U13369.1:201..400 U13369.1 201 400 + 200 199
rna-U13369.1:401..600 U13369.1 401 600 + 200 1364
rna-U13369.1:601..800 U13369.1 601 800 + 200 820
rna-U13369.1:801..1000 U13369.1 801 1000 + 200 3497
rna-U13369.1:1001..1200 U13369.1 1001 1200 + 200 3721
rna-U13369.1:1201..1400 U13369.1 1201 1400 + 200 3341
rna-U13369.1:1401..1600 U13369.1 1401 1600 + 200 606
rna-U13369.1:1601..1800 U13369.1 1601 1800 + 200 1830
rna-U13369.1:1801..2000 U13369.1 1801 2000 + 200 2561
rna-U13369.1:2001..2200 U13369.1 2001 2200 + 200 4613
rna-U13369.1:2201..2400 U13369.1 2201 2400 + 200 639
rna-U13369.1:2401..2600 U13369.1 2401 2600 + 200 566
rna-U13369.1:2601..2800 U13369.1 2601 2800 + 200 357
rna-U13369.1:2801..3000 U13369.1 2801 3000 + 200 186
rna-U13369.1:3001..3200 U13369.1 3001 3200 + 200 335
rna-U13369.1:3201..3400 U13369.1 3201 3400 + 200 55
rna-U13369.1:3401..3656 U13369.1 3401 3656 + 256 3545
rna-U13369.1:3657..3857 U13369.1 3657 3857 + 201 1555726
rna-U13369.1:3858..4057 U13369.1 3858 4057 + 200 447916
rna-U13369.1:4058..4257 U13369.1 4058 4257 + 200 325556
rna-U13369.1:4258..4457 U13369.1 4258 4457 + 200 397070
rna-U13369.1:4458..4657 U13369.1 4458 4657 + 200 377027
rna-U13369.1:4658..4857 U13369.1 4658 4857 + 200 389659
rna-U13369.1:4858..5057 U13369.1 4858 5057 + 200 681022
rna-U13369.1:5058..5257 U13369.1 5058 5257 + 200 429943
rna-U13369.1:5258..5457 U13369.1 5258 5457 + 200 140888
rna-U13369.1:5458..5527 U13369.1 5458 5527 + 70 3194
rna-U13369.1:5528..5728 U13369.1 5528 5728 + 201 9285
rna-U13369.1:5729..5928 U13369.1 5729 5928 + 200 2524
rna-U13369.1:5929..6128 U13369.1 5929 6128 + 200 2160
rna-U13369.1:6129..6328 U13369.1 6129 6328 + 200 234
rna-U13369.1:6329..6528 U13369.1 6329 6528 + 200 1125
rna-U13369.1:6529..6622 U13369.1 6529 6622 + 94 394
rna-U13369.1:6623..6633 U13369.1 6623 6633 + 11 4
rna-U13369.1:6634..6643 U13369.1 6634 6643 + 10 11
rna-U13369.1:6644..6653 U13369.1 6644 6653 + 10 0
rna-U13369.1:6654..6663 U13369.1 6654 6663 + 10 1
rna-U13369.1:6664..6673 U13369.1 6664 6673 + 10 0
rna-U13369.1:6674..6683 U13369.1 6674 6683 + 10 0
rna-U13369.1:6684..6693 U13369.1 6684 6693 + 10 2
rna-U13369.1:6694..6703 U13369.1 6694 6703 + 10 1
rna-U13369.1:6704..6713 U13369.1 6704 6713 + 10 1
rna-U13369.1:6714..6723 U13369.1 6714 6723 + 10 0
rna-U13369.1:6724..6733 U13369.1 6724 6733 + 10 20
rna-U13369.1:6734..6743 U13369.1 6734 6743 + 10 19
rna-U13369.1:6744..6753 U13369.1 6744 6753 + 10 0
rna-U13369.1:6754..6763 U13369.1 6754 6763 + 10 8
rna-U13369.1:6764..6779 U13369.1 6764 6779 + 16 143
rna-U13369.1:6780..6980 U13369.1 6780 6980 + 201 5770
rna-U13369.1:6981..7180 U13369.1 6981 7180 + 200 6087
rna-U13369.1:7181..7380 U13369.1 7181 7380 + 200 903
rna-U13369.1:7381..7580 U13369.1 7381 7580 + 200 185
rna-U13369.1:7581..7780 U13369.1 7581 7780 + 200 2597
rna-U13369.1:7781..7934 U13369.1 7781 7934 + 154 9364
rna-U13369.1:7935..8134 U13369.1 7935 8134 + 200 71274
rna-U13369.1:8135..8334 U13369.1 8135 8334 + 200 259450
rna-U13369.1:8335..8534 U13369.1 8335 8534 + 200 120077
rna-U13369.1:8535..8734 U13369.1 8535 8734 + 200 30903
rna-U13369.1:8735..8934 U13369.1 8735 8934 + 200 70267
rna-U13369.1:8935..9134 U13369.1 8935 9134 + 200 24900
rna-U13369.1:9135..9334 U13369.1 9135 9334 + 200 17232
rna-U13369.1:9335..9534 U13369.1 9335 9534 + 200 290496
rna-U13369.1:9535..9734 U13369.1 9535 9734 + 200 601137
rna-U13369.1:9735..9934 U13369.1 9735 9934 + 200 278377
rna-U13369.1:9935..10134 U13369.1 9935 10134 + 200 26631
rna-U13369.1:10135..10334 U13369.1 10135 10334 + 200 279741
rna-U13369.1:10335..10534 U13369.1 10335 10534 + 200 197391
rna-U13369.1:10535..10734 U13369.1 10535 10734 + 200 346754
rna-U13369.1:10735..10934 U13369.1 10735 10934 + 200 54592
rna-U13369.1:10935..11134 U13369.1 10935 11134 + 200 255
rna-U13369.1:11135..11334 U13369.1 11135 11343 + 209 2547
rna-U13369.1:11335..11534 U13369.1 11335 11534 + 200 19006
rna-U13369.1:11535..11734 U13369.1 11535 11734 + 200 468765
rna-U13369.1:11735..11934 U13369.1 11735 11934 + 200 104017
rna-U13369.1:11935..12134 U13369.1 11935 12134 + 200 127648
rna-U13369.1:12135..12334 U13369.1 12135 12334 + 200 170154
rna-U13369.1:12335..12534 U13369.1 12335 12534 + 200 249635
rna-U13369.1:12535..12734 U13369.1 12535 12734 + 200 132354
rna-U13369.1:12735..12969 U13369.1 12735 12969 + 235 180979
rna-U13369.1:12970..13069 U13369.1 12970 13069 + 100 51
rna-U13369.1:13070..13169 U13369.1 13070 13169 + 100 3
rna-U13369.1:13170..13269 U13369.1 13170 13269 + 100 16
rna-U13369.1:13270..13314 U13369.1 13270 13314 + 45 0
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa_2.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o HeLa-valAligned.out.bam.sort ||
|| ||
|| Output file : HeLa_2.read.count ||
|| Summary : HeLa_2.read.count.summary ||
|| Annotation : U13369.1.2.gtf (GTF) ||
|| Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
|| ||
|| Threads : 32 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : required ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file U13369.1.2.gtf ... ||
|| Features : 84 ||
|| Meta-features : 84 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file HeLa-valAligned.out.bam.sort... ||
|| Paired-end reads are included. ||
|| Total alignments : 19960509 ||
|| Successfully assigned alignments : 11652948 (58.4%) ||
|| Running time : 0.74 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/Data/lizexing/projects ||
|| /xindi/Data/new/Data/CleanData/HeLa_2.read.count.summary" ||
|| ||
\\============================================================================//
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa_2.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.2.gtf" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa_2.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort"
Geneid Chr Start End Strand Length /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
rna-U13369.1:1..200 U13369.1 1 200 + 200 2419
rna-U13369.1:201..400 U13369.1 201 400 + 200 532
rna-U13369.1:401..600 U13369.1 401 600 + 200 1485
rna-U13369.1:601..800 U13369.1 601 800 + 200 881
rna-U13369.1:801..1000 U13369.1 801 1000 + 200 3626
rna-U13369.1:1001..1200 U13369.1 1001 1200 + 200 3773
rna-U13369.1:1201..1400 U13369.1 1201 1400 + 200 3406
rna-U13369.1:1401..1600 U13369.1 1401 1600 + 200 878
rna-U13369.1:1601..1800 U13369.1 1601 1800 + 200 3043
rna-U13369.1:1801..2000 U13369.1 1801 2000 + 200 2980
rna-U13369.1:2001..2200 U13369.1 2001 2200 + 200 5493
rna-U13369.1:2201..2400 U13369.1 2201 2400 + 200 661
rna-U13369.1:2401..2600 U13369.1 2401 2600 + 200 604
rna-U13369.1:2601..2800 U13369.1 2601 2800 + 200 363
rna-U13369.1:2801..3000 U13369.1 2801 3000 + 200 252
rna-U13369.1:3001..3200 U13369.1 3001 3200 + 200 431
rna-U13369.1:3201..3400 U13369.1 3201 3400 + 200 64
rna-U13369.1:3401..3656 U13369.1 3401 3656 + 256 4233
rna-U13369.1:3657..3857 U13369.1 3657 3857 + 201 1578385
rna-U13369.1:3858..4057 U13369.1 3858 4057 + 200 477469
rna-U13369.1:4058..4257 U13369.1 4058 4257 + 200 332416
rna-U13369.1:4258..4457 U13369.1 4258 4457 + 200 405998
rna-U13369.1:4458..4657 U13369.1 4458 4657 + 200 385290
rna-U13369.1:4658..4857 U13369.1 4658 4857 + 200 398064
rna-U13369.1:4858..5057 U13369.1 4858 5057 + 200 761293
rna-U13369.1:5058..5257 U13369.1 5058 5257 + 200 442319
rna-U13369.1:5258..5457 U13369.1 5258 5457 + 200 166490
rna-U13369.1:5458..5527 U13369.1 5458 5527 + 70 3361
rna-U13369.1:5528..5728 U13369.1 5528 5728 + 201 16190
rna-U13369.1:5729..5928 U13369.1 5729 5928 + 200 2965
rna-U13369.1:5929..6128 U13369.1 5929 6128 + 200 2208
rna-U13369.1:6129..6328 U13369.1 6129 6328 + 200 239
rna-U13369.1:6329..6528 U13369.1 6329 6528 + 200 1151
rna-U13369.1:6529..6622 U13369.1 6529 6622 + 94 399
rna-U13369.1:6623..6633 U13369.1 6623 6633 + 11 4
rna-U13369.1:6634..6643 U13369.1 6634 6643 + 10 11
rna-U13369.1:6644..6653 U13369.1 6644 6653 + 10 0
rna-U13369.1:6654..6663 U13369.1 6654 6663 + 10 1
rna-U13369.1:6664..6673 U13369.1 6664 6673 + 10 0
rna-U13369.1:6674..6683 U13369.1 6674 6683 + 10 0
rna-U13369.1:6684..6693 U13369.1 6684 6693 + 10 8
rna-U13369.1:6694..6703 U13369.1 6694 6703 + 10 5
rna-U13369.1:6704..6713 U13369.1 6704 6713 + 10 3
rna-U13369.1:6714..6723 U13369.1 6714 6723 + 10 0
rna-U13369.1:6724..6733 U13369.1 6724 6733 + 10 32
rna-U13369.1:6734..6743 U13369.1 6734 6743 + 10 22
rna-U13369.1:6744..6753 U13369.1 6744 6753 + 10 0
rna-U13369.1:6754..6763 U13369.1 6754 6763 + 10 16
rna-U13369.1:6764..6779 U13369.1 6764 6779 + 16 145
rna-U13369.1:6780..6980 U13369.1 6780 6980 + 201 7721
rna-U13369.1:6981..7180 U13369.1 6981 7180 + 200 6618
rna-U13369.1:7181..7380 U13369.1 7181 7380 + 200 1050
rna-U13369.1:7381..7580 U13369.1 7381 7580 + 200 245
rna-U13369.1:7581..7780 U13369.1 7581 7780 + 200 2829
rna-U13369.1:7781..7934 U13369.1 7781 7934 + 154 9438
rna-U13369.1:7935..8134 U13369.1 7935 8134 + 200 1647767
rna-U13369.1:8135..8334 U13369.1 8135 8334 + 200 591411
rna-U13369.1:8335..8534 U13369.1 8335 8534 + 200 142470
rna-U13369.1:8535..8734 U13369.1 8535 8734 + 200 62351
rna-U13369.1:8735..8934 U13369.1 8735 8934 + 200 130670
rna-U13369.1:8935..9134 U13369.1 8935 9134 + 200 25930
rna-U13369.1:9135..9334 U13369.1 9135 9334 + 200 19534
rna-U13369.1:9335..9534 U13369.1 9335 9534 + 200 301434
rna-U13369.1:9535..9734 U13369.1 9535 9734 + 200 637212
rna-U13369.1:9735..9934 U13369.1 9735 9934 + 200 302240
rna-U13369.1:9935..10134 U13369.1 9935 10134 + 200 34581
rna-U13369.1:10135..10334 U13369.1 10135 10334 + 200 317622
rna-U13369.1:10335..10534 U13369.1 10335 10534 + 200 200022
rna-U13369.1:10535..10734 U13369.1 10535 10734 + 200 406134
rna-U13369.1:10735..10934 U13369.1 10735 10934 + 200 65522
rna-U13369.1:10935..11134 U13369.1 10935 11134 + 200 1342
rna-U13369.1:11135..11334 U13369.1 11135 11343 + 209 4619
rna-U13369.1:11335..11534 U13369.1 11335 11534 + 200 45959
rna-U13369.1:11535..11734 U13369.1 11535 11734 + 200 498389
rna-U13369.1:11735..11934 U13369.1 11735 11934 + 200 147233
rna-U13369.1:11935..12134 U13369.1 11935 12134 + 200 160008
rna-U13369.1:12135..12334 U13369.1 12135 12334 + 200 184780
rna-U13369.1:12335..12534 U13369.1 12335 12534 + 200 266084
rna-U13369.1:12535..12734 U13369.1 12535 12734 + 200 208201
rna-U13369.1:12735..12969 U13369.1 12735 12969 + 235 213777
rna-U13369.1:12970..13069 U13369.1 12970 13069 + 100 104
rna-U13369.1:13070..13169 U13369.1 13070 13169 + 100 5
rna-U13369.1:13170..13269 U13369.1 13170 13269 + 100 38
rna-U13369.1:13270..13314 U13369.1 13270 13314 + 45 0
Step 4 - 对293T测序数据进行计数:5.8S_RNA_bin=10bp, 3’ETS_RNA_bin=100=bp, others_RNA_bin=100bp
# Multimapping reads : yes
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf -O -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T_4.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o 293T-valAligned.out.bam.sort ||
|| ||
|| Output file : 293T_4.read.count ||
|| Summary : 293T_4.read.count.summary ||
|| Annotation : U13369.1.3.gtf (GTF) ||
|| Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
|| ||
|| Threads : 32 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : required ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file U13369.1.3.gtf ... ||
|| Features : 147 ||
|| Meta-features : 147 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file 293T-valAligned.out.bam.sort... ||
|| Paired-end reads are included. ||
|| Total alignments : 14730856 ||
|| Successfully assigned alignments : 14729340 (100.0%) ||
|| Running time : 0.41 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/Data/lizexing/projects ||
|| /xindi/Data/new/Data/CleanData/293T_4.read.count.summary" ||
|| ||
\\============================================================================//
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat 293T_4.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf" "-O" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T_4.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort"
Geneid Chr Start End Strand Length /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valAligned.out.bam.sort
rna-U13369.1:1..100 U13369.1 1 100 + 100 1187
rna-U13369.1:101..200 U13369.1 101 200 + 100 1503
rna-U13369.1:401..500 U13369.1 401 500 + 100 5363
rna-U13369.1:501..600 U13369.1 501 600 + 100 6986
rna-U13369.1:601..700 U13369.1 601 700 + 100 9949
rna-U13369.1:701..800 U13369.1 701 800 + 100 98122
rna-U13369.1:801..900 U13369.1 801 900 + 100 16717
rna-U13369.1:901..1000 U13369.1 901 1000 + 100 20405
rna-U13369.1:1001..1100 U13369.1 1001 1100 + 100 23766
rna-U13369.1:1101..1200 U13369.1 1101 1200 + 100 17966
rna-U13369.1:1201..1300 U13369.1 1201 1300 + 100 19775
rna-U13369.1:1301..1400 U13369.1 1301 1400 + 100 16180
rna-U13369.1:1401..1500 U13369.1 1401 1500 + 100 14001
rna-U13369.1:1501..1600 U13369.1 1501 1600 + 100 6822
rna-U13369.1:1601..1700 U13369.1 1601 1700 + 100 8094
rna-U13369.1:1701..1800 U13369.1 1701 1800 + 100 16134
rna-U13369.1:1801..1900 U13369.1 1801 1900 + 100 94548
rna-U13369.1:1901..2000 U13369.1 1901 2000 + 100 22717
rna-U13369.1:2001..2100 U13369.1 2001 2100 + 100 18312
rna-U13369.1:2101..2200 U13369.1 2101 2200 + 100 20072
rna-U13369.1:2201..2300 U13369.1 2201 2300 + 100 17985
rna-U13369.1:2301..2400 U13369.1 2301 2400 + 100 4152
rna-U13369.1:2401..2500 U13369.1 2401 2500 + 100 7319
rna-U13369.1:2501..2600 U13369.1 2501 2600 + 100 32616
rna-U13369.1:2601..2700 U13369.1 2601 2700 + 100 7969
rna-U13369.1:2701..2800 U13369.1 2701 2800 + 100 4480
rna-U13369.1:2801..2900 U13369.1 2801 2900 + 100 3119
rna-U13369.1:2901..3000 U13369.1 2901 3000 + 100 2616
rna-U13369.1:3001..3100 U13369.1 3001 3100 + 100 101809
rna-U13369.1:3101..3200 U13369.1 3101 3200 + 100 1170
rna-U13369.1:3201..3300 U13369.1 3201 3300 + 100 15880
rna-U13369.1:3301..3400 U13369.1 3301 3400 + 100 13653
rna-U13369.1:3401..3500 U13369.1 3401 3500 + 100 7847
rna-U13369.1:3501..3600 U13369.1 3501 3600 + 100 16769
rna-U13369.1:3601..3656 U13369.1 3601 3656 + 56 15274
rna-U13369.1:3657..3756 U13369.1 3657 3756 + 100 983156
rna-U13369.1:3757..3856 U13369.1 3757 3856 + 100 1165318
rna-U13369.1:3857..3956 U13369.1 3857 3956 + 100 632269
rna-U13369.1:3957..4056 U13369.1 3957 4056 + 100 748895
rna-U13369.1:4057..4156 U13369.1 4057 4156 + 100 744331
rna-U13369.1:4157..4256 U13369.1 4157 4256 + 100 557704
rna-U13369.1:4257..4356 U13369.1 4257 4356 + 100 687009
rna-U13369.1:4357..4456 U13369.1 4357 4456 + 100 667541
rna-U13369.1:4457..4556 U13369.1 4457 4556 + 100 923120
rna-U13369.1:4557..4656 U13369.1 4557 4656 + 100 899692
rna-U13369.1:4657..4756 U13369.1 4657 4756 + 100 702785
rna-U13369.1:4757..4856 U13369.1 4757 4856 + 100 481957
rna-U13369.1:4857..4956 U13369.1 4857 4956 + 100 1077524
rna-U13369.1:4957..5056 U13369.1 4957 5056 + 100 1065799
rna-U13369.1:5057..5156 U13369.1 5057 5156 + 100 1171111
rna-U13369.1:5157..5256 U13369.1 5157 5256 + 100 775546
rna-U13369.1:5257..5356 U13369.1 5257 5356 + 100 695770
rna-U13369.1:5357..5456 U13369.1 5357 5456 + 100 385095
rna-U13369.1:5457..5527 U13369.1 5457 5527 + 71 203279
rna-U13369.1:5528..5627 U13369.1 5528 5627 + 100 84858
rna-U13369.1:5628..5727 U13369.1 5628 5727 + 100 29179
rna-U13369.1:5728..5827 U13369.1 5728 5827 + 100 23726
rna-U13369.1:5828..5927 U13369.1 5828 5927 + 100 11274
rna-U13369.1:5928..6027 U13369.1 5928 6027 + 100 11847
rna-U13369.1:6028..6127 U13369.1 6028 6127 + 100 17874
rna-U13369.1:6128..6227 U13369.1 6128 6227 + 100 7161
rna-U13369.1:6228..6327 U13369.1 6228 6327 + 100 2865
rna-U13369.1:6328..6427 U13369.1 6328 6427 + 100 7905
rna-U13369.1:6428..6527 U13369.1 6428 6527 + 100 7148
rna-U13369.1:6528..6623 U13369.1 6528 6623 + 96 6709
rna-U13369.1:6623..6633 U13369.1 6623 6633 + 11 6315
rna-U13369.1:6634..6643 U13369.1 6634 6643 + 10 18897
rna-U13369.1:6644..6653 U13369.1 6644 6653 + 10 19357
rna-U13369.1:6654..6663 U13369.1 6654 6663 + 10 21411
rna-U13369.1:6664..6673 U13369.1 6664 6673 + 10 23597
rna-U13369.1:6674..6683 U13369.1 6674 6683 + 10 24246
rna-U13369.1:6684..6693 U13369.1 6684 6693 + 10 25318
rna-U13369.1:6694..6703 U13369.1 6694 6703 + 10 25594
rna-U13369.1:6704..6713 U13369.1 6704 6713 + 10 26630
rna-U13369.1:6714..6723 U13369.1 6714 6723 + 10 27362
rna-U13369.1:6724..6733 U13369.1 6724 6733 + 10 27816
rna-U13369.1:6734..6743 U13369.1 6734 6743 + 10 27702
rna-U13369.1:6744..6753 U13369.1 6744 6753 + 10 27890
rna-U13369.1:6754..6763 U13369.1 6754 6763 + 10 28045
rna-U13369.1:6764..6779 U13369.1 6764 6779 + 16 28987
rna-U13369.1:6780..6879 U13369.1 6780 6879 + 100 28190
rna-U13369.1:6880..6979 U13369.1 6880 6979 + 100 12940
rna-U13369.1:6980..7079 U13369.1 6980 7079 + 100 9960
rna-U13369.1:7080..7179 U13369.1 7080 7179 + 100 7999
rna-U13369.1:7180..7279 U13369.1 7180 7279 + 100 5306
rna-U13369.1:7280..7379 U13369.1 7280 7379 + 100 1722
rna-U13369.1:7380..7479 U13369.1 7380 7479 + 100 5870
rna-U13369.1:7480..7579 U13369.1 7480 7579 + 100 2894
rna-U13369.1:7580..7679 U13369.1 7580 7679 + 100 4368
rna-U13369.1:7680..7779 U13369.1 7680 7779 + 100 11435
rna-U13369.1:7780..7879 U13369.1 7780 7879 + 100 38017
rna-U13369.1:7880..7934 U13369.1 7880 7934 + 55 46852
rna-U13369.1:7935..8034 U13369.1 7935 8034 + 100 1345085
rna-U13369.1:8035..8134 U13369.1 8035 8134 + 100 1611691
rna-U13369.1:8135..8234 U13369.1 8135 8234 + 100 1232707
rna-U13369.1:8235..8334 U13369.1 8235 8334 + 100 1302705
rna-U13369.1:8335..8434 U13369.1 8335 8434 + 100 840667
rna-U13369.1:8435..8534 U13369.1 8435 8534 + 100 525412
rna-U13369.1:8535..8634 U13369.1 8535 8634 + 100 84223
rna-U13369.1:8635..8734 U13369.1 8635 8734 + 100 250692
rna-U13369.1:8735..8834 U13369.1 8735 8834 + 100 257462
rna-U13369.1:8835..8934 U13369.1 8835 8934 + 100 280610
rna-U13369.1:8935..9034 U13369.1 8935 9034 + 100 121052
rna-U13369.1:9035..9134 U13369.1 9035 9134 + 100 58300
rna-U13369.1:9135..9234 U13369.1 9135 9234 + 100 66976
rna-U13369.1:9235..9334 U13369.1 9235 9334 + 100 358926
rna-U13369.1:9335..9434 U13369.1 9335 9434 + 100 685549
rna-U13369.1:9435..9534 U13369.1 9435 9534 + 100 999236
rna-U13369.1:9535..9634 U13369.1 9535 9634 + 100 1314841
rna-U13369.1:9635..9734 U13369.1 9635 9734 + 100 1507189
rna-U13369.1:9735..9834 U13369.1 9735 9834 + 100 1172520
rna-U13369.1:9835..9934 U13369.1 9835 9934 + 100 1040687
rna-U13369.1:9935..10034 U13369.1 9935 10034 + 100 355227
rna-U13369.1:10035..10134 U13369.1 10035 10134 + 100 108283
rna-U13369.1:10135..10234 U13369.1 10135 10234 + 100 356888
rna-U13369.1:10235..10334 U13369.1 10235 10334 + 100 579757
rna-U13369.1:10335..10434 U13369.1 10335 10434 + 100 432036
rna-U13369.1:10435..10534 U13369.1 10435 10534 + 100 401304
rna-U13369.1:10535..10634 U13369.1 10535 10634 + 100 482139
rna-U13369.1:10635..10734 U13369.1 10635 10734 + 100 542858
rna-U13369.1:10735..10834 U13369.1 10735 10834 + 100 419599
rna-U13369.1:10835..10934 U13369.1 10835 10934 + 100 103019
rna-U13369.1:10935..11034 U13369.1 10935 11034 + 100 45063
rna-U13369.1:11035..11134 U13369.1 11035 11134 + 100 14769
rna-U13369.1:11135..11234 U13369.1 11135 11234 + 100 14929
rna-U13369.1:11235..11334 U13369.1 11235 11334 + 100 17393
rna-U13369.1:11335..11434 U13369.1 11335 11434 + 100 58989
rna-U13369.1:11435..11534 U13369.1 11435 11534 + 100 135358
rna-U13369.1:11535..11634 U13369.1 11535 11634 + 100 563636
rna-U13369.1:11635..11734 U13369.1 11635 11734 + 100 806291
rna-U13369.1:11735..11834 U13369.1 11735 11834 + 100 852255
rna-U13369.1:11835..11934 U13369.1 11835 11934 + 100 397908
rna-U13369.1:11935..12034 U13369.1 11935 12034 + 100 297926
rna-U13369.1:12035..12134 U13369.1 12035 12134 + 100 325780
rna-U13369.1:12135..12234 U13369.1 12135 12234 + 100 299787
rna-U13369.1:12235..12334 U13369.1 12235 12334 + 100 275376
rna-U13369.1:12335..12434 U13369.1 12335 12434 + 100 326802
rna-U13369.1:12435..12534 U13369.1 12435 12534 + 100 695120
rna-U13369.1:12535..12634 U13369.1 12535 12634 + 100 958579
rna-U13369.1:12635..12734 U13369.1 12635 12734 + 100 903318
rna-U13369.1:12735..12834 U13369.1 12735 12834 + 100 715592
rna-U13369.1:12835..12934 U13369.1 12835 12934 + 100 544458
rna-U13369.1:12735..12969 U13369.1 12935 12969 + 35 46803
rna-U13369.1:12970..13069 U13369.1 12970 13069 + 100 1202
rna-U13369.1:13070..13169 U13369.1 13070 13169 + 100 562
rna-U13369.1:13170..13269 U13369.1 13170 13269 + 100 255
rna-U13369.1:13270..13314 U13369.1 13270 13314 + 45 89
Step 5 - 对HCT116测序数据进行计数
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf -O -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116_4.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o HCT116-valAligned.out.bam.sort ||
|| ||
|| Output file : HCT116_4.read.count ||
|| Summary : HCT116_4.read.count.summary ||
|| Annotation : U13369.1.3.gtf (GTF) ||
|| Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
|| ||
|| Threads : 32 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : required ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file U13369.1.3.gtf ... ||
|| Features : 147 ||
|| Meta-features : 147 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file HCT116-valAligned.out.bam.sort... ||
|| Paired-end reads are included. ||
|| Total alignments : 17788363 ||
|| Successfully assigned alignments : 17787396 (100.0%) ||
|| Running time : 0.51 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/Data/lizexing/projects ||
|| /xindi/Data/new/Data/CleanData/HCT116_4.read.count.summary" ||
|| ||
\\============================================================================//
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HCT116_4.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf" "-O" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116_4.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort"
Geneid Chr Start End Strand Length /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valAligned.out.bam.sort
rna-U13369.1:1..100 U13369.1 1 100 + 100 899
rna-U13369.1:101..200 U13369.1 101 200 + 100 3809
rna-U13369.1:401..500 U13369.1 401 500 + 100 3321
rna-U13369.1:501..600 U13369.1 501 600 + 100 4314
rna-U13369.1:601..700 U13369.1 601 700 + 100 5879
rna-U13369.1:701..800 U13369.1 701 800 + 100 85608
rna-U13369.1:801..900 U13369.1 801 900 + 100 65096
rna-U13369.1:901..1000 U13369.1 901 1000 + 100 10504
rna-U13369.1:1001..1100 U13369.1 1001 1100 + 100 16884
rna-U13369.1:1101..1200 U13369.1 1101 1200 + 100 10355
rna-U13369.1:1201..1300 U13369.1 1201 1300 + 100 20986
rna-U13369.1:1301..1400 U13369.1 1301 1400 + 100 9680
rna-U13369.1:1401..1500 U13369.1 1401 1500 + 100 8206
rna-U13369.1:1501..1600 U13369.1 1501 1600 + 100 4067
rna-U13369.1:1601..1700 U13369.1 1601 1700 + 100 6487
rna-U13369.1:1701..1800 U13369.1 1701 1800 + 100 9367
rna-U13369.1:1801..1900 U13369.1 1801 1900 + 100 68566
rna-U13369.1:1901..2000 U13369.1 1901 2000 + 100 13355
rna-U13369.1:2001..2100 U13369.1 2001 2100 + 100 11643
rna-U13369.1:2101..2200 U13369.1 2101 2200 + 100 27520
rna-U13369.1:2201..2300 U13369.1 2201 2300 + 100 11418
rna-U13369.1:2301..2400 U13369.1 2301 2400 + 100 3173
rna-U13369.1:2401..2500 U13369.1 2401 2500 + 100 5298
rna-U13369.1:2501..2600 U13369.1 2501 2600 + 100 29929
rna-U13369.1:2601..2700 U13369.1 2601 2700 + 100 5183
rna-U13369.1:2701..2800 U13369.1 2701 2800 + 100 2800
rna-U13369.1:2801..2900 U13369.1 2801 2900 + 100 3356
rna-U13369.1:2901..3000 U13369.1 2901 3000 + 100 2090
rna-U13369.1:3001..3100 U13369.1 3001 3100 + 100 98066
rna-U13369.1:3101..3200 U13369.1 3101 3200 + 100 706
rna-U13369.1:3201..3300 U13369.1 3201 3300 + 100 17733
rna-U13369.1:3301..3400 U13369.1 3301 3400 + 100 14367
rna-U13369.1:3401..3500 U13369.1 3401 3500 + 100 5321
rna-U13369.1:3501..3600 U13369.1 3501 3600 + 100 10711
rna-U13369.1:3601..3656 U13369.1 3601 3656 + 56 11391
rna-U13369.1:3657..3756 U13369.1 3657 3756 + 100 1435560
rna-U13369.1:3757..3856 U13369.1 3757 3856 + 100 1677882
rna-U13369.1:3857..3956 U13369.1 3857 3956 + 100 817967
rna-U13369.1:3957..4056 U13369.1 3957 4056 + 100 976286
rna-U13369.1:4057..4156 U13369.1 4057 4156 + 100 930196
rna-U13369.1:4157..4256 U13369.1 4157 4256 + 100 671001
rna-U13369.1:4257..4356 U13369.1 4257 4356 + 100 792943
rna-U13369.1:4357..4456 U13369.1 4357 4456 + 100 827521
rna-U13369.1:4457..4556 U13369.1 4457 4556 + 100 1200888
rna-U13369.1:4557..4656 U13369.1 4557 4656 + 100 1209052
rna-U13369.1:4657..4756 U13369.1 4657 4756 + 100 922498
rna-U13369.1:4757..4856 U13369.1 4757 4856 + 100 587374
rna-U13369.1:4857..4956 U13369.1 4857 4956 + 100 1420225
rna-U13369.1:4957..5056 U13369.1 4957 5056 + 100 1408911
rna-U13369.1:5057..5156 U13369.1 5057 5156 + 100 1493089
rna-U13369.1:5157..5256 U13369.1 5157 5256 + 100 989229
rna-U13369.1:5257..5356 U13369.1 5257 5356 + 100 883788
rna-U13369.1:5357..5456 U13369.1 5357 5456 + 100 444330
rna-U13369.1:5457..5527 U13369.1 5457 5527 + 71 218191
rna-U13369.1:5528..5627 U13369.1 5528 5627 + 100 57031
rna-U13369.1:5628..5727 U13369.1 5628 5727 + 100 20749
rna-U13369.1:5728..5827 U13369.1 5728 5827 + 100 16409
rna-U13369.1:5828..5927 U13369.1 5828 5927 + 100 6236
rna-U13369.1:5928..6027 U13369.1 5928 6027 + 100 7847
rna-U13369.1:6028..6127 U13369.1 6028 6127 + 100 5461
rna-U13369.1:6128..6227 U13369.1 6128 6227 + 100 3546
rna-U13369.1:6228..6327 U13369.1 6228 6327 + 100 2065
rna-U13369.1:6328..6427 U13369.1 6328 6427 + 100 7038
rna-U13369.1:6428..6527 U13369.1 6428 6527 + 100 6780
rna-U13369.1:6528..6623 U13369.1 6528 6623 + 96 7432
rna-U13369.1:6623..6633 U13369.1 6623 6633 + 11 5597
rna-U13369.1:6634..6643 U13369.1 6634 6643 + 10 22858
rna-U13369.1:6644..6653 U13369.1 6644 6653 + 10 23986
rna-U13369.1:6654..6663 U13369.1 6654 6663 + 10 26156
rna-U13369.1:6664..6673 U13369.1 6664 6673 + 10 29023
rna-U13369.1:6674..6683 U13369.1 6674 6683 + 10 29723
rna-U13369.1:6684..6693 U13369.1 6684 6693 + 10 30940
rna-U13369.1:6694..6703 U13369.1 6694 6703 + 10 31858
rna-U13369.1:6704..6713 U13369.1 6704 6713 + 10 33187
rna-U13369.1:6714..6723 U13369.1 6714 6723 + 10 34336
rna-U13369.1:6724..6733 U13369.1 6724 6733 + 10 35221
rna-U13369.1:6734..6743 U13369.1 6734 6743 + 10 34947
rna-U13369.1:6744..6753 U13369.1 6744 6753 + 10 34952
rna-U13369.1:6754..6763 U13369.1 6754 6763 + 10 35173
rna-U13369.1:6764..6779 U13369.1 6764 6779 + 16 36173
rna-U13369.1:6780..6879 U13369.1 6780 6879 + 100 34981
rna-U13369.1:6880..6979 U13369.1 6880 6979 + 100 13275
rna-U13369.1:6980..7079 U13369.1 6980 7079 + 100 10386
rna-U13369.1:7080..7179 U13369.1 7080 7179 + 100 8528
rna-U13369.1:7180..7279 U13369.1 7180 7279 + 100 5894
rna-U13369.1:7280..7379 U13369.1 7280 7379 + 100 1842
rna-U13369.1:7380..7479 U13369.1 7380 7479 + 100 5546
rna-U13369.1:7480..7579 U13369.1 7480 7579 + 100 2276
rna-U13369.1:7580..7679 U13369.1 7580 7679 + 100 4771
rna-U13369.1:7680..7779 U13369.1 7680 7779 + 100 48226
rna-U13369.1:7780..7879 U13369.1 7780 7879 + 100 46267
rna-U13369.1:7880..7934 U13369.1 7880 7934 + 55 51235
rna-U13369.1:7935..8034 U13369.1 7935 8034 + 100 1572032
rna-U13369.1:8035..8134 U13369.1 8035 8134 + 100 1877287
rna-U13369.1:8135..8234 U13369.1 8135 8234 + 100 1370282
rna-U13369.1:8235..8334 U13369.1 8235 8334 + 100 1601031
rna-U13369.1:8335..8434 U13369.1 8335 8434 + 100 1143657
rna-U13369.1:8435..8534 U13369.1 8435 8534 + 100 792499
rna-U13369.1:8535..8634 U13369.1 8535 8634 + 100 82913
rna-U13369.1:8635..8734 U13369.1 8635 8734 + 100 280190
rna-U13369.1:8735..8834 U13369.1 8735 8834 + 100 293053
rna-U13369.1:8835..8934 U13369.1 8835 8934 + 100 334811
rna-U13369.1:8935..9034 U13369.1 8935 9034 + 100 153342
rna-U13369.1:9035..9134 U13369.1 9035 9134 + 100 83329
rna-U13369.1:9135..9234 U13369.1 9135 9234 + 100 96309
rna-U13369.1:9235..9334 U13369.1 9235 9334 + 100 363386
rna-U13369.1:9335..9434 U13369.1 9335 9434 + 100 749820
rna-U13369.1:9435..9534 U13369.1 9435 9534 + 100 1192671
rna-U13369.1:9535..9634 U13369.1 9535 9634 + 100 1724150
rna-U13369.1:9635..9734 U13369.1 9635 9734 + 100 1845135
rna-U13369.1:9735..9834 U13369.1 9735 9834 + 100 1365528
rna-U13369.1:9835..9934 U13369.1 9835 9934 + 100 1178376
rna-U13369.1:9935..10034 U13369.1 9935 10034 + 100 384939
rna-U13369.1:10035..10134 U13369.1 10035 10134 + 100 170031
rna-U13369.1:10135..10234 U13369.1 10135 10234 + 100 535143
rna-U13369.1:10235..10334 U13369.1 10235 10334 + 100 768793
rna-U13369.1:10335..10434 U13369.1 10335 10434 + 100 494930
rna-U13369.1:10435..10534 U13369.1 10435 10534 + 100 419520
rna-U13369.1:10535..10634 U13369.1 10535 10634 + 100 518426
rna-U13369.1:10635..10734 U13369.1 10635 10734 + 100 580807
rna-U13369.1:10735..10834 U13369.1 10735 10834 + 100 472139
rna-U13369.1:10835..10934 U13369.1 10835 10934 + 100 103825
rna-U13369.1:10935..11034 U13369.1 10935 11034 + 100 37848
rna-U13369.1:11035..11134 U13369.1 11035 11134 + 100 17181
rna-U13369.1:11135..11234 U13369.1 11135 11234 + 100 18790
rna-U13369.1:11235..11334 U13369.1 11235 11334 + 100 22928
rna-U13369.1:11335..11434 U13369.1 11335 11434 + 100 76066
rna-U13369.1:11435..11534 U13369.1 11435 11534 + 100 154582
rna-U13369.1:11535..11634 U13369.1 11535 11634 + 100 640212
rna-U13369.1:11635..11734 U13369.1 11635 11734 + 100 905400
rna-U13369.1:11735..11834 U13369.1 11735 11834 + 100 946617
rna-U13369.1:11835..11934 U13369.1 11835 11934 + 100 403651
rna-U13369.1:11935..12034 U13369.1 11935 12034 + 100 314676
rna-U13369.1:12035..12134 U13369.1 12035 12134 + 100 351003
rna-U13369.1:12135..12234 U13369.1 12135 12234 + 100 333742
rna-U13369.1:12235..12334 U13369.1 12235 12334 + 100 287773
rna-U13369.1:12335..12434 U13369.1 12335 12434 + 100 318202
rna-U13369.1:12435..12534 U13369.1 12435 12534 + 100 578552
rna-U13369.1:12535..12634 U13369.1 12535 12634 + 100 671803
rna-U13369.1:12635..12734 U13369.1 12635 12734 + 100 672357
rna-U13369.1:12735..12834 U13369.1 12735 12834 + 100 561538
rna-U13369.1:12835..12934 U13369.1 12835 12934 + 100 521403
rna-U13369.1:12735..12969 U13369.1 12935 12969 + 35 38125
rna-U13369.1:12970..13069 U13369.1 12970 13069 + 100 2304
rna-U13369.1:13070..13169 U13369.1 13070 13169 + 100 703
rna-U13369.1:13170..13269 U13369.1 13170 13269 + 100 323
rna-U13369.1:13270..13314 U13369.1 13270 13314 + 45 82
Step 6 - 对HeLa测序数据进行计数
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ featureCounts -T 32 -M -a /Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf -O -p -B -C -f -t exon -g gene_id -o /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa_4.read.count /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.1
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| o HeLa-valAligned.out.bam.sort ||
|| ||
|| Output file : HeLa_4.read.count ||
|| Summary : HeLa_4.read.count.summary ||
|| Annotation : U13369.1.3.gtf (GTF) ||
|| Dir for temp files : /Data/lizexing/projects/xindi/Data/new/Data/ ... ||
|| ||
|| Threads : 32 ||
|| Level : feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : required ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file U13369.1.3.gtf ... ||
|| Features : 147 ||
|| Meta-features : 147 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file HeLa-valAligned.out.bam.sort... ||
|| Paired-end reads are included. ||
|| Total alignments : 19960509 ||
|| Successfully assigned alignments : 19955679 (100.0%) ||
|| Running time : 0.67 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/Data/lizexing/projects ||
|| /xindi/Data/new/Data/CleanData/HeLa_4.read.count.summary" ||
|| ||
\\============================================================================//
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ cat HeLa_4.read.count
# Program:featureCounts v2.0.1; Command:"featureCounts" "-T" "32" "-M" "-a" "/Data/lizexing/reference/h_45S_rDNA/U13369.1.3.gtf" "-O" "-p" "-B" "-C" "-f" "-t" "exon" "-g" "gene_id" "-o" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa_4.read.count" "/Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort"
Geneid Chr Start End Strand Length /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valAligned.out.bam.sort
rna-U13369.1:1..100 U13369.1 1 100 + 100 3370
rna-U13369.1:101..200 U13369.1 101 200 + 100 4585
rna-U13369.1:401..500 U13369.1 401 500 + 100 3682
rna-U13369.1:501..600 U13369.1 501 600 + 100 2958
rna-U13369.1:601..700 U13369.1 601 700 + 100 4011
rna-U13369.1:701..800 U13369.1 701 800 + 100 119401
rna-U13369.1:801..900 U13369.1 801 900 + 100 6803
rna-U13369.1:901..1000 U13369.1 901 1000 + 100 8265
rna-U13369.1:1001..1100 U13369.1 1001 1100 + 100 8832
rna-U13369.1:1101..1200 U13369.1 1101 1200 + 100 9611
rna-U13369.1:1201..1300 U13369.1 1201 1300 + 100 12128
rna-U13369.1:1301..1400 U13369.1 1301 1400 + 100 8338
rna-U13369.1:1401..1500 U13369.1 1401 1500 + 100 10285
rna-U13369.1:1501..1600 U13369.1 1501 1600 + 100 4272
rna-U13369.1:1601..1700 U13369.1 1601 1700 + 100 5897
rna-U13369.1:1701..1800 U13369.1 1701 1800 + 100 9179
rna-U13369.1:1801..1900 U13369.1 1801 1900 + 100 108051
rna-U13369.1:1901..2000 U13369.1 1901 2000 + 100 11789
rna-U13369.1:2001..2100 U13369.1 2001 2100 + 100 9692
rna-U13369.1:2101..2200 U13369.1 2101 2200 + 100 10434
rna-U13369.1:2201..2300 U13369.1 2201 2300 + 100 9701
rna-U13369.1:2301..2400 U13369.1 2301 2400 + 100 2507
rna-U13369.1:2401..2500 U13369.1 2401 2500 + 100 6246
rna-U13369.1:2501..2600 U13369.1 2501 2600 + 100 9081
rna-U13369.1:2601..2700 U13369.1 2601 2700 + 100 4592
rna-U13369.1:2701..2800 U13369.1 2701 2800 + 100 3166
rna-U13369.1:2801..2900 U13369.1 2801 2900 + 100 2292
rna-U13369.1:2901..3000 U13369.1 2901 3000 + 100 4861
rna-U13369.1:3001..3100 U13369.1 3001 3100 + 100 131649
rna-U13369.1:3101..3200 U13369.1 3101 3200 + 100 867
rna-U13369.1:3201..3300 U13369.1 3201 3300 + 100 19146
rna-U13369.1:3301..3400 U13369.1 3301 3400 + 100 15038
rna-U13369.1:3401..3500 U13369.1 3401 3500 + 100 4407
rna-U13369.1:3501..3600 U13369.1 3501 3600 + 100 8069
rna-U13369.1:3601..3656 U13369.1 3601 3656 + 56 7406
rna-U13369.1:3657..3756 U13369.1 3657 3756 + 100 1678242
rna-U13369.1:3757..3856 U13369.1 3757 3856 + 100 1986846
rna-U13369.1:3857..3956 U13369.1 3857 3956 + 100 990258
rna-U13369.1:3957..4056 U13369.1 3957 4056 + 100 1094256
rna-U13369.1:4057..4156 U13369.1 4057 4156 + 100 1047087
rna-U13369.1:4157..4256 U13369.1 4157 4256 + 100 755495
rna-U13369.1:4257..4356 U13369.1 4257 4356 + 100 981482
rna-U13369.1:4357..4456 U13369.1 4357 4456 + 100 896156
rna-U13369.1:4457..4556 U13369.1 4457 4556 + 100 1282005
rna-U13369.1:4557..4656 U13369.1 4557 4656 + 100 1247009
rna-U13369.1:4657..4756 U13369.1 4657 4756 + 100 1055064
rna-U13369.1:4757..4856 U13369.1 4757 4856 + 100 713463
rna-U13369.1:4857..4956 U13369.1 4857 4956 + 100 1597757
rna-U13369.1:4957..5056 U13369.1 4957 5056 + 100 1634894
rna-U13369.1:5057..5156 U13369.1 5057 5156 + 100 1753994
rna-U13369.1:5157..5256 U13369.1 5157 5256 + 100 1155004
rna-U13369.1:5257..5356 U13369.1 5257 5356 + 100 976121
rna-U13369.1:5357..5456 U13369.1 5357 5456 + 100 504963
rna-U13369.1:5457..5527 U13369.1 5457 5527 + 71 248275
rna-U13369.1:5528..5627 U13369.1 5528 5627 + 100 46379
rna-U13369.1:5628..5727 U13369.1 5628 5727 + 100 18738
rna-U13369.1:5728..5827 U13369.1 5728 5827 + 100 15797
rna-U13369.1:5828..5927 U13369.1 5828 5927 + 100 9599
rna-U13369.1:5928..6027 U13369.1 5928 6027 + 100 9276
rna-U13369.1:6028..6127 U13369.1 6028 6127 + 100 13331
rna-U13369.1:6128..6227 U13369.1 6128 6227 + 100 6199
rna-U13369.1:6228..6327 U13369.1 6228 6327 + 100 2589
rna-U13369.1:6328..6427 U13369.1 6328 6427 + 100 7461
rna-U13369.1:6428..6527 U13369.1 6428 6527 + 100 5157
rna-U13369.1:6528..6623 U13369.1 6528 6623 + 96 4707
rna-U13369.1:6623..6633 U13369.1 6623 6633 + 11 6846
rna-U13369.1:6634..6643 U13369.1 6634 6643 + 10 39098
rna-U13369.1:6644..6653 U13369.1 6644 6653 + 10 41476
rna-U13369.1:6654..6663 U13369.1 6654 6663 + 10 46462
rna-U13369.1:6664..6673 U13369.1 6664 6673 + 10 51148
rna-U13369.1:6674..6683 U13369.1 6674 6683 + 10 52316
rna-U13369.1:6684..6693 U13369.1 6684 6693 + 10 54006
rna-U13369.1:6694..6703 U13369.1 6694 6703 + 10 54944
rna-U13369.1:6704..6713 U13369.1 6704 6713 + 10 57223
rna-U13369.1:6714..6723 U13369.1 6714 6723 + 10 58545
rna-U13369.1:6724..6733 U13369.1 6724 6733 + 10 59322
rna-U13369.1:6734..6743 U13369.1 6734 6743 + 10 59622
rna-U13369.1:6744..6753 U13369.1 6744 6753 + 10 59937
rna-U13369.1:6754..6763 U13369.1 6754 6763 + 10 60432
rna-U13369.1:6764..6779 U13369.1 6764 6779 + 16 61770
rna-U13369.1:6780..6879 U13369.1 6780 6879 + 100 56446
rna-U13369.1:6880..6979 U13369.1 6880 6979 + 100 22820
rna-U13369.1:6980..7079 U13369.1 6980 7079 + 100 14861
rna-U13369.1:7080..7179 U13369.1 7080 7179 + 100 10487
rna-U13369.1:7180..7279 U13369.1 7180 7279 + 100 9092
rna-U13369.1:7280..7379 U13369.1 7280 7379 + 100 2582
rna-U13369.1:7380..7479 U13369.1 7380 7479 + 100 7284
rna-U13369.1:7480..7579 U13369.1 7480 7579 + 100 3379
rna-U13369.1:7580..7679 U13369.1 7580 7679 + 100 6159
rna-U13369.1:7680..7779 U13369.1 7680 7779 + 100 13247
rna-U13369.1:7780..7879 U13369.1 7780 7879 + 100 44625
rna-U13369.1:7880..7934 U13369.1 7880 7934 + 55 54921
rna-U13369.1:7935..8034 U13369.1 7935 8034 + 100 1866486
rna-U13369.1:8035..8134 U13369.1 8035 8134 + 100 2191163
rna-U13369.1:8135..8234 U13369.1 8135 8234 + 100 1661526
rna-U13369.1:8235..8334 U13369.1 8235 8334 + 100 1697478
rna-U13369.1:8335..8434 U13369.1 8335 8434 + 100 1111833
rna-U13369.1:8435..8534 U13369.1 8435 8534 + 100 769359
rna-U13369.1:8535..8634 U13369.1 8535 8634 + 100 97781
rna-U13369.1:8635..8734 U13369.1 8635 8734 + 100 295272
rna-U13369.1:8735..8834 U13369.1 8735 8834 + 100 316382
rna-U13369.1:8835..8934 U13369.1 8835 8934 + 100 350558
rna-U13369.1:8935..9034 U13369.1 8935 9034 + 100 160379
rna-U13369.1:9035..9134 U13369.1 9035 9134 + 100 82634
rna-U13369.1:9135..9234 U13369.1 9135 9234 + 100 85191
rna-U13369.1:9235..9334 U13369.1 9235 9334 + 100 442108
rna-U13369.1:9335..9434 U13369.1 9335 9434 + 100 897183
rna-U13369.1:9435..9534 U13369.1 9435 9534 + 100 1310775
rna-U13369.1:9535..9634 U13369.1 9535 9634 + 100 1774057
rna-U13369.1:9635..9734 U13369.1 9635 9734 + 100 2031500
rna-U13369.1:9735..9834 U13369.1 9735 9834 + 100 1633367
rna-U13369.1:9835..9934 U13369.1 9835 9934 + 100 1455707
rna-U13369.1:9935..10034 U13369.1 9935 10034 + 100 496206
rna-U13369.1:10035..10134 U13369.1 10035 10134 + 100 138555
rna-U13369.1:10135..10234 U13369.1 10135 10234 + 100 439098
rna-U13369.1:10235..10334 U13369.1 10235 10334 + 100 737591
rna-U13369.1:10335..10434 U13369.1 10335 10434 + 100 584975
rna-U13369.1:10435..10534 U13369.1 10435 10534 + 100 558475
rna-U13369.1:10535..10634 U13369.1 10535 10634 + 100 695322
rna-U13369.1:10635..10734 U13369.1 10635 10734 + 100 727567
rna-U13369.1:10735..10834 U13369.1 10735 10834 + 100 565347
rna-U13369.1:10835..10934 U13369.1 10835 10934 + 100 182889
rna-U13369.1:10935..11034 U13369.1 10935 11034 + 100 108785
rna-U13369.1:11035..11134 U13369.1 11035 11134 + 100 20672
rna-U13369.1:11135..11234 U13369.1 11135 11234 + 100 21352
rna-U13369.1:11235..11334 U13369.1 11235 11334 + 100 28617
rna-U13369.1:11335..11434 U13369.1 11335 11434 + 100 72170
rna-U13369.1:11435..11534 U13369.1 11435 11534 + 100 170104
rna-U13369.1:11535..11634 U13369.1 11535 11634 + 100 749326
rna-U13369.1:11635..11734 U13369.1 11635 11734 + 100 1032113
rna-U13369.1:11735..11834 U13369.1 11735 11834 + 100 1113544
rna-U13369.1:11835..11934 U13369.1 11835 11934 + 100 541404
rna-U13369.1:11935..12034 U13369.1 11935 12034 + 100 431856
rna-U13369.1:12035..12134 U13369.1 12035 12134 + 100 473294
rna-U13369.1:12135..12234 U13369.1 12135 12234 + 100 397396
rna-U13369.1:12235..12334 U13369.1 12235 12334 + 100 388342
rna-U13369.1:12335..12434 U13369.1 12335 12434 + 100 433350
rna-U13369.1:12435..12534 U13369.1 12435 12534 + 100 742054
rna-U13369.1:12535..12634 U13369.1 12535 12634 + 100 914501
rna-U13369.1:12635..12734 U13369.1 12635 12734 + 100 901947
rna-U13369.1:12735..12834 U13369.1 12735 12834 + 100 734796
rna-U13369.1:12835..12934 U13369.1 12835 12934 + 100 721153
rna-U13369.1:12735..12969 U13369.1 12935 12969 + 35 41187
rna-U13369.1:12970..13069 U13369.1 12970 13069 + 100 2698
rna-U13369.1:13070..13169 U13369.1 13070 13169 + 100 645
rna-U13369.1:13170..13269 U13369.1 13170 13269 + 100 504
rna-U13369.1:13270..13314 U13369.1 13270 13314 + 45 196
27. 使用STAR软件对三组数据未比对上的序列与Genome比对
Step 1 - Build a GPCh38.p13 genome index构建索引
--runThreadN是指你要用几个cpu来运行;
--genomeDir构建索引输出文件的目录;
--genomeFastaFiles你的基因组fasta文件所在的目录
(base) lizexing@bio:~$ STAR --runMode genomeGenerate --runThreadN 60 --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --genomeFastaFiles /Data/lizexing/reference/UCSC_hg38/GRCh38.p13.genome.fa
Sep 11 18:45:25 ..... started STAR run
Sep 11 18:45:25 ... starting to generate Genome files
Sep 11 18:46:45 ... starting to sort Suffix Array. This may take a long time...
Sep 11 18:47:07 ... sorting Suffix Array chunks and saving them to disk...
Sep 11 19:08:32 ... loading chunks from disk, packing SA...
Sep 11 19:10:08 ... finished generating suffix array
Sep 11 19:10:08 ... generating Suffix Array index
Sep 11 19:14:38 ... completed Suffix Array index
Sep 11 19:14:38 ... writing Genome to disk ...
Sep 11 19:14:40 ... writing Suffix Array to disk ...
Sep 11 19:15:08 ... writing SAindex to disk
Sep 11 19:15:11 ..... finished successfully
Step 2 - 对293T测序数据进行比对
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valUnmapped.out.mate1 --outFileNamePrefix 293T-genome_mate1
Sep 12 11:40:44 ..... started STAR run
Sep 12 11:40:44 ..... loading genome
Sep 12 11:41:05 ..... processing annotations GTF
Sep 12 11:41:25 ..... inserting junctions into the genome indices
Sep 12 11:45:01 ..... started mapping
Sep 12 11:46:13 ..... finished mapping
Sep 12 11:46:16 ..... finished successfully
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/293T-valUnmapped.out.mate2 --outFileNamePrefix 293T-genome_mate2
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valUnmapped.out.mate1 --outFileNamePrefix HCT116-genome_mate1
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HCT116-valUnmapped.out.mate2 --outFileNamePrefix HCT116-genome_mate2
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valUnmapped.out.mate1 --outFileNamePrefix HeLa-genome_mate1
(base) lizexing@bio:~/projects/xindi/Data/new/Data/CleanData$ STAR --runThreadN 40 --runMode alignReads --readFilesType Fastx --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile /Data/lizexing/reference/UCSC_hg38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf --outSAMtype BAM Unsorted --genomeDir /Data/lizexing/reference/UCSC_hg38/star_index/ --readFilesIn /Data/lizexing/projects/xindi/Data/new/Data/CleanData/HeLa-valUnmapped.out.mate2 --outFileNamePrefix HeLa-genome_mate2