RNA病毒基因组的重头组装-内含tophat2报错的快速解决办法-CPIV3数据分析-2023-07-13

1、使用Trim Galore软件对两次数据进行质控,去掉20bp以下的reads

vim新建RNA_seq_script_1对CPIV3测序数据进行质控分析

#!/bin/bash
# 上面一行宣告这个script的语法使用bash语法,当程序被执行时,能够载入bash的相关环境配置文件。
# Program
#     This program is used for RNA-seq data analysis.
# History
#     2023/07/13       zexing            First release
# 设置变量${dir}为常用目录
dir=/home/customer/lizexing/projects/CPIV3/

# 使用fastqc软件对数据进行质控分析
# fastqc -t 8 -o ${dir}/fastqc_report/ ${dir}/raw_data/*.fq.gz

# 对数据利用Trim_galore去掉20bp以下的接头
trim_galore -q 8 --phred33 --stringency 3 --length 20 -e 0.1 -j 4 --paired \
${dir}/CleanData/C3_1.clean.fq \
${dir}/CleanData/C3_2.clean.fq \
-o ${dir}/clean_data/

后台运行RNA_seq_script_1:

nohup bash RNA_seq_script_1 > RNA_seq_script_1_log &

2、 使用Bowtie2软件对牛的基因组[bosTau9]构建索引

vim新建RNA_seq_script_2对牛的基因组构建索引

# 参数说明
# bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>]
#  <bt2-idx>  Index filename prefix (minus trailing .X.bt2).
#            NOTE: Bowtie 1 and Bowtie 2 indexes are not compatible.
#  <m1>       Files with #1 mates, paired with files in <m2>.
#            Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
#  <m2>       Files with #2 mates, paired with files in <m1>.
#             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
#  <r>        Files with unpaired reads.
#             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
#  <i>        Files with interleaved paired-end FASTQ/FASTA reads
#             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
#  <bam>      Files are unaligned BAM sorted by read name.
#  <sam>      File for SAM output (default: stdout)
#  <m1>, <m2>, <r> can be comma-separated lists (no whitespace) and can be  specified many times.  E.g. '-U file1.fq,file2.fq -U file3.fq'.
# bowtie2-build 基因组序列.fq 输出Index前缀 -p CPU数目
bowtie2-build GCF_002263795.2_ARS-UCD1.3_genomic.fna Bos -p 8

运行如下脚本:

nohup bash RNA_seq_script_2 > RNA_seq_script_2_log &

3、使用Tophat2软件对测序数据与牛的基因组进行比对

之前没有使用过Tophat2进行序列比对,这次因为第一次做病毒基因组从头组装,参考RNA病毒基因组组装指南TopHat的使用以及输出文件安装、使用Tophat2软件进行比对。

(1)、先从Tophat官网下载相应的软件包进行安装;
(2)、使用Tophat需要提供bowtie软件提供的Index,官网已经提供了很多物种的Index及GTF格式文件,我自己还弄了半天;
(3)、tophat2报错:软件解压缩后,运行依赖于python2才可以运行,而现在的服务器默认的多为python3,看了网上很多办法,基本都是通过conda建立一个python2的环境后再运行Tophat2,我这边因为只能连接校园网,还没跟清华镜像联网,导致conda无法建立新环境(实属无奈),上网查到这个简便方法:

# 在tophat文件夹中打开tophat脚本
vim tophat

# 对该脚本的第一行进行python版本的指定
原命令为:#!/usr/bin/env python
新命令为:#!/usr/bin/env python2
# 仅添加了一个2的版本,省却了我很大的麻烦,必须记录、推广一下

(4)、通过help命令查看软件运行情况

(base) [customer@node01 tophat]$ bash tophat2 -h
Usage:
    tophat [options] <bowtie_index> <reads1[,reads2,...]> [reads1[,reads2,...]] \
                                    [quals1,[quals2,...]] [quals1[,quals2,...]]

(5)、按照默认参数对测序结果进行比对,命令如下:

#!/bin/bash
# 上面一行宣告这个script的语法使用bash语法,当程序被执行时,能够载入bash的相关环境配置文件。
#     This program is used for RNA-seq data analysis.
# History
#     2023/07/13       zexing            First release

# 设置变量${dir}为常用目录
# dir=/home/customer/lizexing/projects/CPIV3/

# tophat [options] <bowtie_index> <reads1[,reads2,...]> [reads1[,reads2,...]] [quals1,[quals2,...]] [quals1[,quals2,...]]

#-o 输出文件位置
#-p 几个CPU
#-G gtf文件
#index文件
#reads1
#reads2

tophat2 -o /home/customer/lizexing/projects/CPIV3/clean_data/mapped \
-p 8 \
-G /home/customer/lizexing/references/NCBI/Bostaurus/GCF_002263795.2.gtf \
/home/customer/lizexing/references/NCBI/Bostaurus/bt2_index/Bos \
/home/customer/lizexing/projects/CPIV3/clean_data/C3_1.clean_val_1.fq \
/home/customer/lizexing/projects/CPIV3/clean_data/C3_2.clean_val_2.fq 

运行的时间比较长,我是利用nohup命令将其放到了后台运行。

# 软件运行记录如下:
[2023-07-14 13:32:23] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2023-07-14 13:32:23] Checking for Bowtie
                  Bowtie version:        2.4.2.0
[2023-07-14 13:32:23] Checking for Bowtie index files (genome)..
[2023-07-14 13:32:23] Checking for reference FASTA file
        Warning: Could not find FASTA file /home/customer/lizexing/references/NCBI/Bostaurus/bt2_index/Bos.fa
[2023-07-14 13:32:23] Reconstituting reference FASTA file from Bowtie index
  Executing: /home/customer/.conda/envs/RNA-ChIP-Seq/bin/bowtie2-inspect /home/customer/lizexing/references/NCBI/Bostaurus/bt2_index/Bos > /home/customer/lizexing/projects/CPIV3/tophat_out/tmp/Bos.fa
[2023-07-14 13:34:02] Generating SAM header for /home/customer/lizexing/references/NCBI/Bostaurus/bt2_index/Bos
[2023-07-14 13:34:04] Reading known junctions from GTF file
[2023-07-14 13:34:24] Preparing reads
         left reads: min. length=20, max. length=150, 15779194 kept reads (18 discarded)
        right reads: min. length=20, max. length=150, 15779192 kept reads (20 discarded)
[2023-07-14 13:43:14] Building transcriptome data files /home/customer/lizexing/projects/CPIV3/tophat_out/tmp/GCF_002263795.2
[2023-07-14 13:44:09] Building Bowtie index from GCF_002263795.2.fa
[2023-07-14 14:39:34] Mapping left_kept_reads to transcriptome GCF_002263795.2 with Bowtie2
[2023-07-14 15:08:02] Mapping right_kept_reads to transcriptome GCF_002263795.2 with Bowtie2
[2023-07-14 15:39:45] Resuming TopHat pipeline with unmapped reads
[2023-07-14 15:39:46] Mapping left_kept_reads.m2g_um to genome Bos with Bowtie2
[2023-07-14 15:49:49] Mapping left_kept_reads.m2g_um_seg1 to genome Bos with Bowtie2 (1/6)
[2023-07-14 15:50:58] Mapping left_kept_reads.m2g_um_seg2 to genome Bos with Bowtie2 (2/6)
[2023-07-14 15:52:20] Mapping left_kept_reads.m2g_um_seg3 to genome Bos with Bowtie2 (3/6)
[2023-07-14 15:53:30] Mapping left_kept_reads.m2g_um_seg4 to genome Bos with Bowtie2 
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值