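RNA-seq workflow: sequence alignment with hisat2 (without a loop & with a loop) (to be continued)
This walkthrough aligns ky's files in paired-end mode, using the _1.clean.fq.gz and _2.clean.fq.gz file of each sample.
I. Alignment without a loop
1. Before aligning, change into the directory that contains the .fq.gz files.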
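# Every line of the command except the last must end with \, otherwise the shell runs the command as soon as you press Enter. This applies to the -1 and -2 lines as well.
# The leading > on the continuation lines is the shell's continuation prompt; it is not typed.
hisat2 -t -p 8 -x /f/xudonglab/yuanye/reference/UCSC_mm10/hisat2_index/hisat2_index_mm10 \
> -1 /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/clean_data/DIPG4_2_3_1.clean.fq.gz \
> -2 /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/clean_data/DIPG4_2_3_2.clean.fq.gz \
> -S /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/sam/DIPG4_2_3.sam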
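The run should produce a .sam file. When running the commands one by one without a loop, take care not to skip or mix up any group or replicate.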
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/clean_data$ hisat2 -t -p 8 -x /f/xudonglab/yuanye/reference/UCSC_mm10/hisat2_index/hisat2_index_mm10 \
> -1 /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/clean_data/DIPG4_2_4_1.clean.fq.gz \
> -2 /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/clean_data/DIPG4_2_4_2.clean.fq.gz \
> -S /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/sam/DIPG4_2_4.sam
Names and extensions of the alignment output files
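To lower the risk of skipping or mixing up replicates when typing the commands by hand, it can help to list the sample names first and confirm that each one has both mates. The following is a minimal sketch, assuming the _1.clean.fq.gz / _2.clean.fq.gz naming used above; the function name list_paired_samples is hypothetical and not part of any tool.

```bash
# list_paired_samples DIR
# Prints one sample name per line for every *_1.clean.fq.gz in DIR that has a
# matching *_2.clean.fq.gz, and warns on stderr about samples missing read 2.
# (Hypothetical helper; it does not detect samples that only have read 2.)
list_paired_samples() {
    dir=$1
    for r1 in "$dir"/*_1.clean.fq.gz; do
        [ -e "$r1" ] || continue                  # the glob matched nothing
        sample=$(basename "$r1" _1.clean.fq.gz)   # e.g. DIPG4_2_3
        if [ -e "$dir/${sample}_2.clean.fq.gz" ]; then
            echo "$sample"
        else
            echo "WARNING: $sample has no _2.clean.fq.gz" >&2
        fi
    done
}
```

Run from inside the clean_data directory, `list_paired_samples .` prints the sample names that are ready to align.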
II. Alignment with a loop
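One way the manual commands above could be wrapped in a loop is sketched here. It is an illustration under assumptions, not necessarily the method the author intended: it assumes the _1.clean.fq.gz / _2.clean.fq.gz naming and reuses the same hisat2 options (-t -p 8 -x -1 -2 -S). The function name align_all and its three arguments are hypothetical.

```bash
# align_all CLEAN_DIR INDEX SAM_DIR
# Runs hisat2 once per sample found in CLEAN_DIR, with the same options as the
# manual commands. (Hypothetical helper; all paths are passed as arguments.)
#
# Example call with the paths used in this post:
#   align_all /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/clean_data \
#       /f/xudonglab/yuanye/reference/UCSC_mm10/hisat2_index/hisat2_index_mm10 \
#       /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/sam
align_all() {
    clean_dir=$1
    index=$2
    sam_dir=$3
    mkdir -p "$sam_dir"
    for r1 in "$clean_dir"/*_1.clean.fq.gz; do
        [ -e "$r1" ] || continue                  # no input files found
        sample=$(basename "$r1" _1.clean.fq.gz)   # e.g. DIPG4_2_3
        r2="$clean_dir/${sample}_2.clean.fq.gz"
        if [ ! -e "$r2" ]; then
            echo "skipping $sample: missing $r2" >&2
            continue
        fi
        hisat2 -t -p 8 -x "$index" -1 "$r1" -2 "$r2" -S "$sam_dir/$sample.sam"
    done
}
```

Deriving the sample name from the read 1 file keeps the -1, -2 and -S arguments in sync, which is the main source of mistakes when each command is typed separately.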
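III. Converting .sam files to .bam format
Why BAM: it is the compressed binary form of SAM, so converting large .sam files saves a great deal of disk space.
First change into the sam directory, then run the conversion.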
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/sam$ samtools view -@ 8 -S DIPG4_2_3.sam -b -o DIPG4_2_3.bam
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/sam$ samtools view -@ 8 -S DIPG4_2_4.sam -b -o DIPG4_2_4.bam
The resulting files are written to the sam directory; they can then be moved to the bam directory.
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/sam$ mv DIPG4_2_3.bam /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/bam
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/sam$ mv DIPG4_2_4.bam /f/xudonglab/yuanye/projects/kongyu/RNA_seq/2021_02_22/bam
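The conversion and the move can also be combined by writing each .bam straight into the bam directory. This is a minimal sketch that reuses the same samtools view options as above; the function name sam_to_bam and its arguments are hypothetical.

```bash
# sam_to_bam SAM_DIR BAM_DIR
# Converts every .sam file in SAM_DIR to a .bam file of the same name in
# BAM_DIR, so no separate mv step is needed. (Hypothetical helper.)
sam_to_bam() {
    sam_dir=$1
    bam_dir=$2
    mkdir -p "$bam_dir"
    for sam in "$sam_dir"/*.sam; do
        [ -e "$sam" ] || continue                 # no .sam files found
        sample=$(basename "$sam" .sam)            # e.g. DIPG4_2_3
        samtools view -@ 8 -S "$sam" -b -o "$bam_dir/$sample.bam"
    done
}
```

From inside the sam directory this could be called as `sam_to_bam . ../bam`.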
IV. Sorting (by default, reads are sorted by chromosomal position)
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/bam$ samtools sort -@ 8 -l 5 -o DIPG4_2_3.bam.sort DIPG4_2_3.bam
[bam_sort_core] merging from 16 files and 8 in-memory blocks...
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/bam$ samtools sort -@ 8 -l 5 -o DIPG4_2_4.bam.sort DIPG4_2_4.bam
[bam_sort_core] merging from 16 files and 8 in-memory blocks...
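Sorting is slow on large files, so when several samples are processed it can be convenient to skip those that already have a sorted output. The sketch below keeps the .bam.sort naming and the samtools sort options used above; the function name sort_all is hypothetical. Note the assumption that an existing output is complete: a file left behind by an interrupted run would also be skipped and should be deleted first.

```bash
# sort_all BAM_DIR
# Sorts every .bam file in BAM_DIR into <name>.bam.sort, skipping files whose
# sorted output already exists. (Hypothetical helper.)
sort_all() {
    bam_dir=$1
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue                 # no .bam files found
        out="$bam.sort"                           # same naming as above
        if [ -e "$out" ]; then
            echo "skipping $bam: $out already exists" >&2
            continue
        fi
        samtools sort -@ 8 -l 5 -o "$out" "$bam"
    done
}
```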
Inspect the sorted file:
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/bam$ samtools flagstat DIPG4_2_3.bam.sort
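Output of flagstat (alignment statistics for the sorted file; note that only 1.82% of reads are mapped, which is unusually low for RNA-seq and worth investigating before continuing):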
41319661 + 0 in total (QC-passed reads + QC-failed reads)
68199 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
752962 + 0 mapped (1.82% : N/A)
41251462 + 0 paired in sequencing
20625731 + 0 read1
20625731 + 0 read2
242066 + 0 properly paired (0.59% : N/A)
249836 + 0 with itself and mate mapped
434927 + 0 singletons (1.05% : N/A)
6516 + 0 with mate mapped to a different chr
4941 + 0 with mate mapped to a different chr (mapQ>=5)
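When several samples are checked, the mapping rate can be pulled out of the flagstat text instead of being read by eye. This is a minimal sketch that assumes the output layout shown above, where the fourth whitespace-separated field of the relevant line is the word mapped; the function name mapped_percent is hypothetical.

```bash
# mapped_percent
# Reads `samtools flagstat` output on stdin and prints the percentage from the
# "N + 0 mapped (X% : N/A)" line, without the % sign. (Hypothetical helper.)
mapped_percent() {
    awk '$4 == "mapped" { gsub(/[(%]/, "", $5); print $5 }'
}
```

For example, `samtools flagstat DIPG4_2_3.bam.sort | mapped_percent` prints 1.82 for the output shown above.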
V. Inspecting the read-count files (*.count / *.tab)
1. Count the lines of the result files with wc
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/bam/ballgown/DIPG4_2_4$ wc -l *.tab
24566 DIPG4_2_4.gene.tab
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/bam/ballgown/DIPG4_2_3$ wc -l *.tab
24566 DIPG4_2_3.gene.tab
# -l prints the number of lines
# Every file from the same sequencing batch has the same number of lines
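The check that all files from one batch have the same number of lines can be automated rather than compared by eye. A minimal sketch follows; the function name same_line_count is hypothetical.

```bash
# same_line_count FILE...
# Prints "<lines> <file>" for each file and returns non-zero as soon as a file
# has a different line count from the first one. (Hypothetical helper.)
same_line_count() {
    first=""
    for f in "$@"; do
        n=$(wc -l < "$f" | tr -d ' ')
        echo "$n $f"
        if [ -z "$first" ]; then
            first=$n                              # reference count
        elif [ "$n" != "$first" ]; then
            echo "line count differs: $f" >&2
            return 1
        fi
    done
    return 0
}
```

From the bam directory, `same_line_count ballgown/*/*.gene.tab` would compare the .gene.tab files of all samples at once.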
2. View the first and last lines of the results with head/tail
(base) yuanye@DNA:~/projects/kongyu/RNA_seq/2021_02_22/bam$ htseq-count -r pos -f bam DIPG4_2_4.bam.sort /f/xudonglab/yuanye/reference/UCSC_mm10/mm10_genes.gtf -c DIPG4_2_4.count
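Once the .count file exists, head and tail can be used to look at both ends of it. The sketch below assumes the usual htseq-count layout of one tab-separated "name, count" row per line, with the summary rows that htseq-count normally places at the end (names starting with __, such as __no_feature and __ambiguous). The function names peek_counts and gene_rows are hypothetical.

```bash
# peek_counts FILE [N]
# Prints the first and last N lines of FILE (N defaults to 5).
peek_counts() {
    file=$1
    n=${2:-5}
    echo "== first $n lines =="
    head -n "$n" "$file"
    echo "== last $n lines =="
    tail -n "$n" "$file"
}

# gene_rows FILE
# Prints only the per-gene rows, dropping summary rows whose name starts
# with "__". (Hypothetical helper.)
gene_rows() {
    grep -v '^__' "$1"
}
```

For example, `peek_counts DIPG4_2_4.count` shows both ends of the file, and `gene_rows DIPG4_2_4.count | wc -l` counts the genes without the summary rows.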