- Data locations
- Path
- path: /opt/data/assembly
- Raw data
- /opt/data/assembly/test_1.fastq
- /opt/data/assembly/test_2.fastq
- SARS-CoV-2 Reference - NC_045512.2
- /opt/data/assembly/refseq/2019_nCoV.fasta
- SARS-CoV-2 blastdb-nucl
- /opt/data/assembly/refseq/2019_nCoV
- Paths
- Software paths
- Installed with conda and symlinked into /usr/bin, so each tool can be called directly by name
- python3
- /opt/software/miniconda3/bin/python3.8
- fastqc
- /opt/software/miniconda3/bin/fastqc
- trimmomatic
- /opt/software/miniconda3/bin/trimmomatic
- MultiQC
- /opt/software/miniconda3/bin/multiqc
- megahit
- /opt/software/miniconda3/bin/megahit
- QUAST
- /opt/software/miniconda3/envs/quast/bin/quast
- blastn
- /opt/software/ncbi-blast-2.10.0+/bin/blastn
- Sequence extraction script
- /opt/software/extract_seq_f_fa.py
- bowtie2
- /opt/software/miniconda3/bin/bowtie2
- /opt/software/miniconda3/bin/bowtie2-build
- samtools
- /opt/software/miniconda3/bin/samtools
- weeSAM
- /opt/software/weeSAM.py
- VAPiD
- /opt/software/VAPiD-master/vapid3.py
- /opt/data/assembly/annotation/test.sbt
- /opt/data/assembly/annotation/test_metadata.csv
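- Edit these yourself to match your assembly results
- Workflow
- raw data
- Next-generation sequencing data, VLP
- Raw FASTQ files
- Raw Data QC - FastQC
- Quality check of the raw data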
- software: FastQC
- version: 0.11.9
- command:
- mkdir 1_raw_fastqc
- fastqc -o /home/test01/test/1_raw_fastqc/ /opt/data/assembly/*.fastq > /home/test01/test/0_logs/1_raw_qc.log 2>&1
- Quality Control - Trimmomatic
- Read cleaning: remove low-quality bases and adapters
- software: Trimmomatic
- version: 0.39
- adapter: /opt/software/miniconda3/share/trimmomatic-0.39-1/adapters/TruSeq3-PE-2.fa
- Main parameters
- TruSeq3-PE-2.fa: the most commonly used adapter file
- https://www.jianshu.com/p/a8935adebaae
- command:
- mkdir 2_trim
- trimmomatic PE -summary /home/test01/test/0_logs/2_trim_summary.txt /opt/data/assembly/test_1.fastq /opt/data/assembly/test_2.fastq -baseout /home/test01/test/2_trim/test.fastq.gz ILLUMINACLIP:/opt/software/miniconda3/share/trimmomatic-0.39-1/adapters/TruSeq3-PE-2.fa:2:30:10 SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:36 >/home/test01/test/0_logs/2_trim.log 2>&1
- Clean Data QC - FastQC
- Quality check of the cleaned (high-quality) data
- software: FastQC
- version: 0.11.9
- command:
- mkdir 3_clean_fastqc
- fastqc -o /home/test01/test/3_clean_fastqc/ /home/test01/test/2_trim/*P.fastq.gz > /home/test01/test/0_logs/3_clean_qc.log 2>&1
- de novo Assembly - MEGAHIT
- Reference-free (de novo) assembly
- software: MEGAHIT
- version: 1.2.9
- command:
- Do not create the output directory beforehand; MEGAHIT stops with an error if it already exists
- megahit -1 /home/test01/test/2_trim/test_1P.fastq.gz -2 /home/test01/test/2_trim/test_2P.fastq.gz --min-contig-len 500 -o 4_assembly/ > /home/test01/test/0_logs/4_assembly.log 2>&1
- Assembly Statistics - QUAST
- Assembly statistics
- software: QUAST
- version: 5.0.2
- command:
- mkdir 5_quast
- quast /home/test01/test/4_assembly/final.contigs.fa -o /home/test01/test/5_quast/ > /home/test01/test/0_logs/5_quast.log 2>&1
- Blast to Reference Genome (NC_045512.2) - BLASTN
- Align the assembled contigs to the SARS-CoV-2 reference genome and keep the contigs that match
- software: blastn
- version: 2.10.0+
- refseq: NC_045512.2
- Column headers of the tabular (outfmt 6) output
- query acc.ver, query length, subject acc.ver, subject length, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, subject title
- command:
- mkdir 6_viral_contigs
- blastn -query /home/test01/test/4_assembly/final.contigs.fa -db /opt/data/assembly/refseq/SARS-CoV-2 -outfmt '6 qaccver qlen saccver slen pident length mismatch gapopen qstart qend sstart send evalue bitscore stitle' -qcov_hsp_perc 50 -out /home/test01/test/6_viral_contigs/nCoV_blastn_6.tsv
- -qcov_hsp_perc 50: report only HSPs that cover at least 50% of the query (contig) length
- Select the contigs that matched
- Pull them out of final.contigs.fa by contig ID
- cat /home/test01/test/6_viral_contigs/nCoV_blastn_6.tsv | cut -f 1 | sort -u > /home/test01/test/6_viral_contigs/viral_list.txt
- /opt/software/extract_seq_f_fa.py /home/test01/test/4_assembly/final.contigs.fa /home/test01/test/6_viral_contigs/viral_list.txt /home/test01/test/6_viral_contigs/viral_contigs.fa
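The outfmt 6 table is written without a header line, so the column names listed above have to be kept in mind when reading it. A minimal sketch of one way to attach them, assuming the 15 fields requested in the blastn command; `add_blast_header` is a hypothetical helper name and the demo row is made up for illustration:

```shell
# add_blast_header TSV: print the BLAST table preceded by a header line
# that matches the fields requested with -outfmt in the command above.
# (Hypothetical helper, not part of BLAST+.)
add_blast_header() {
    printf 'qaccver\tqlen\tsaccver\tslen\tpident\tlength\tmismatch\tgapopen\tqstart\tqend\tsstart\tsend\tevalue\tbitscore\tstitle\n'
    cat "$1"
}

# Demo on a made-up one-row table (values are illustrative only).
demo=$(mktemp -d)
printf 'k141_2\t29800\tNC_045512.2\t29903\t99.9\t29800\t30\t0\t1\t29800\t50\t29849\t0.0\t54000\tdemo subject title\n' > "$demo/nCoV_blastn_6.tsv"
add_blast_header "$demo/nCoV_blastn_6.tsv"
```

The output can be redirected to a new file if a table with headers is wanted for a spreadsheet; the original TSV is left untouched.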
- Read distribution
- Map the reads back to the viral contigs to check read utilization and the read coverage of each contig
- software: Bowtie2
- version: 2.4.1
- software: samtools
- version: 1.11
- weeSAM
- /opt/software/weeSAM.py
- command:
- mkdir -p 7_bowtie2/index
- cp /home/test01/test/6_viral_contigs/viral_contigs.fa /home/test01/test/7_bowtie2/index/
- Build the index
- bowtie2-build /home/test01/test/7_bowtie2/index/viral_contigs.fa /home/test01/test/7_bowtie2/index/viral_contigs > /home/test01/test/0_logs/7_bowtie2_build.log 2>&1
- reads mapping
- bowtie2 -x /home/test01/test/7_bowtie2/index/viral_contigs -1 /home/test01/test/2_trim/test_1P.fastq.gz -2 /home/test01/test/2_trim/test_2P.fastq.gz | samtools sort -O bam -o - > /home/test01/test/7_bowtie2/viral_contigs_sorted.bam
- Mapping rate statistics
- samtools flagstat /home/test01/test/7_bowtie2/viral_contigs_sorted.bam > /home/test01/test/0_logs/7_bowtie2_stats.txt
- Per-contig coverage statistics
- samtools coverage /home/test01/test/7_bowtie2/viral_contigs_sorted.bam > /home/test01/test/0_logs/7_contigs_coverage.txt
- OR
- /opt/software/weeSAM.py --bam /home/test01/test/7_bowtie2/viral_contigs_sorted.bam --out /home/test01/test/7_bowtie2/contigs_coverage_ws.tsv --html /home/test01/test/7_bowtie2/viral_contigs > /home/test01/test/0_logs/7_weesam.log 2>&1
- SARS-CoV-2 Genome Annotation - VAPiD
- Annotate the SARS-CoV-2 genome with VAPiD; pick out the complete viral contigs and use them for annotation
- software: VAPiD
- version: 1.6.6
- /opt/data/assembly/annotation/test.sbt
- /opt/data/assembly/annotation/test_metadata.csv
- command:
- mkdir 8_annotation && cd 8_annotation
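- Edit test_metadata.csv to match your own sequences
- Reference file: /opt/data/assembly/annotation/test_metadata.csv
- Change strain (the name of your contig), collection-date (your choice), country (your choice), coverage (Avg_Depth from the weeSAM output), and full_name (your choice)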
- /home/test01/test/8_annotation/test_metadata.csv
- /opt/software/VAPiD-master/vapid3.py /home/test01/test/6_viral_contigs/viral_contigs.fa /opt/data/assembly/annotation/test.sbt --metadata_loc /home/test01/test/8_annotation/test_metadata.csv --db /opt/data/assembly/refseq/SARS-CoV-2 > /home/test01/test/0_logs/8_annotation.log 2>&1
- Summary report
- Use MultiQC to aggregate some of the logs and reports produced by the workflow
- software: MultiQC
- version: 1.9
- command:
- mkdir /home/test01/test/9_all_report
- multiqc /home/test01/test/ -o /home/test01/test/9_all_report/ -n all_report > /home/test01/test/0_logs/9_multiqc.log 2>&1
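- Step-by-step procedure
(In the commands below, replace the path "/home/test01/test" with your own working directory, for example "/home/usr01/zhangsan/Assembly". If you follow the directory layout of this document exactly, you can replace "/home/test01/test" with "." instead. Note that step 8 changes into 8_annotation, so from that point on "." no longer refers to the project directory; use ".." there, or run cd .. before step 9.)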
-
- Create the directories
cd /home/usrxx/xxxxx/Assembly
mkdir test && cd test
mkdir 0_logs
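Rather than editing "/home/test01/test" by hand in every command, the project root could be held in a shell variable. This is only a sketch: `WORKDIR` is a name introduced here, not something the original commands use.

```shell
# Hypothetical variable: WORKDIR stands in for /home/test01/test.
# If it is not already set, default to a "test" folder under the current directory.
WORKDIR="${WORKDIR:-$PWD/test}"

# -p creates missing parent folders and does not fail if the folder exists.
mkdir -p "$WORKDIR/0_logs"

# Later commands can then be written against the variable, for example:
#   fastqc -o "$WORKDIR/1_raw_fastqc/" /opt/data/assembly/*.fastq > "$WORKDIR/0_logs/1_raw_qc.log" 2>&1
echo "project root: $WORKDIR"
```

Because the variable holds an absolute path, the commands keep working after a `cd` into a subfolder.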
-
- 1. raw data qc
mkdir 1_raw_fastqc
fastqc -o /home/test01/test/1_raw_fastqc/ /opt/data/assembly/*.fastq > /home/test01/test/0_logs/1_raw_qc.log 2>&1
-
- 2. trim
mkdir 2_trim
trimmomatic PE -summary /home/test01/test/0_logs/2_trim_summary.txt /opt/data/assembly/test_1.fastq /opt/data/assembly/test_2.fastq -baseout /home/test01/test/2_trim/test.fastq.gz ILLUMINACLIP:/opt/software/miniconda3/share/trimmomatic-0.39-1/adapters/TruSeq3-PE-2.fa:2:30:10 SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:36 >/home/test01/test/0_logs/2_trim.log 2>&1
-
- 3. clean data qc
mkdir 3_clean_fastqc
fastqc -o /home/test01/test/3_clean_fastqc/ /home/test01/test/2_trim/*P.fastq.gz > /home/test01/test/0_logs/3_clean_qc.log 2>&1
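Alongside the FastQC reports, a quick read count before and after trimming shows how many pairs survived. A minimal sketch, assuming standard four-line FASTQ records; `count_reads` is a hypothetical helper and the demo reads are made up:

```shell
# count_reads FILE: print the number of records in a FASTQ file.
# Handles plain and gzip-compressed files; assumes 4 lines per record.
count_reads() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print int(NR / 4) }'
}

# Demo on a made-up two-read file, once plain and once gzipped.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > "$demo/demo.fastq"
gzip -c "$demo/demo.fastq" > "$demo/demo.fastq.gz"
count_reads "$demo/demo.fastq"      # prints 2
count_reads "$demo/demo.fastq.gz"   # prints 2
```

In this workflow the comparison would be between /opt/data/assembly/test_1.fastq and 2_trim/test_1P.fastq.gz.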
-
- 4. de novo assembly
- Do not create the output directory beforehand
megahit -1 /home/test01/test/2_trim/test_1P.fastq.gz -2 /home/test01/test/2_trim/test_2P.fastq.gz --min-contig-len 500 -o /home/test01/test/4_assembly/ >/home/test01/test/0_logs/4_assembly.log 2>&1
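Since the assembly step expects an output directory that does not exist yet, a re-run after a failed attempt needs the old folder removed or renamed first. A small guard could make that explicit; `check_fresh_outdir` is a hypothetical helper, and the megahit call is left as a comment:

```shell
# check_fresh_outdir DIR: succeed only if DIR does not exist yet.
check_fresh_outdir() {
    if [ -e "$1" ]; then
        echo "output directory already exists: $1" >&2
        return 1
    fi
    return 0
}

# Intended use (not executed here):
#   check_fresh_outdir 4_assembly && megahit ... -o 4_assembly/

# Demo in a temporary folder.
demo=$(mktemp -d)
check_fresh_outdir "$demo/4_assembly" && echo "ok to run the assembly"
mkdir "$demo/4_assembly"
check_fresh_outdir "$demo/4_assembly" || echo "remove or rename the old folder first"
```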
-
- 5. QUAST
mkdir 5_quast
quast /home/test01/test/4_assembly/final.contigs.fa -o /home/test01/test/5_quast/ > /home/test01/test/0_logs/5_quast.log 2>&1
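One of the headline numbers in the QUAST report is N50: the contig length at which the contigs of that length or longer together hold at least half of the assembled bases. The sketch below recomputes it with awk to show what the number means; `n50` is a hypothetical helper, the demo contigs are made up, and QUAST remains the tool to rely on.

```shell
# n50 FASTA: print the N50 contig length of a FASTA file.
n50() {
    awk '
        /^>/ { if (len) print len; len = 0; next }   # new record: emit previous length
             { len += length($0) }                    # add up wrapped sequence lines
        END  { if (len) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}

# Demo: made-up contigs of length 10, 8, 6 (wrapped over two lines), 4 and 2.
demo=$(mktemp -d)
printf '>c1\nAAAAAAAAAA\n>c2\nCCCCCCCC\n>c3\nGGG\nGGG\n>c4\nTTTT\n>c5\nAC\n' > "$demo/contigs.fa"
n50 "$demo/contigs.fa"   # prints 8
```

The total is 30 bases; the two longest contigs (10 + 8 = 18) are the first to reach half of it, so the N50 is 8.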
-
- 6. blastn
mkdir 6_viral_contigs
blastn -query /home/test01/test/4_assembly/final.contigs.fa -db /opt/data/assembly/refseq/SARS-CoV-2 -outfmt '6 qaccver qlen saccver slen pident length mismatch gapopen qstart qend sstart send evalue bitscore stitle' -qcov_hsp_perc 50 -out /home/test01/test/6_viral_contigs/nCoV_blastn_6.tsv
cat /home/test01/test/6_viral_contigs/nCoV_blastn_6.tsv | cut -f 1 | sort -u > /home/test01/test/6_viral_contigs/viral_list.txt
/opt/software/extract_seq_f_fa.py /home/test01/test/4_assembly/final.contigs.fa /home/test01/test/6_viral_contigs/viral_list.txt /home/test01/test/6_viral_contigs/viral_contigs.fa
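The contents of extract_seq_f_fa.py are not shown in this document. As a rough illustration of what selecting contigs by ID involves, here is an awk stand-in; `extract_by_id` is a hypothetical helper, not the script itself, and it assumes the ID is the header text up to the first whitespace:

```shell
# extract_by_id FASTA ID_LIST: print the FASTA records whose ID is in ID_LIST.
# The ID is the header text after ">" up to the first whitespace.
extract_by_id() {
    awk '
        NR == FNR { if ($1 != "") want[$1] = 1; next }   # first file: the ID list
        /^>/      { id = substr($1, 2); keep = (id in want) }
        keep                                              # print lines of wanted records
    ' "$2" "$1"
}

# Demo with made-up MEGAHIT-style headers.
demo=$(mktemp -d)
printf '>k141_1 flag=1 multi=2.0000 len=8\nACGTACGT\n>k141_2 flag=1 multi=5.0000 len=4\nGGCC\n>k141_3 flag=0 multi=1.0000 len=4\nTTAA\n' > "$demo/final.contigs.fa"
printf 'k141_2\n' > "$demo/viral_list.txt"
extract_by_id "$demo/final.contigs.fa" "$demo/viral_list.txt"
```

The argument order (FASTA first, then the ID list) mirrors the script call above; the result goes to standard output instead of a third argument.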
-
- 7. reads mapping
mkdir -p 7_bowtie2/index
cp /home/test01/test/6_viral_contigs/viral_contigs.fa /home/test01/test/7_bowtie2/index/
bowtie2-build /home/test01/test/7_bowtie2/index/viral_contigs.fa /home/test01/test/7_bowtie2/index/viral_contigs >/home/test01/test/0_logs/7_bowtie2_build.log 2>&1
bowtie2 -x /home/test01/test/7_bowtie2/index/viral_contigs -1 /home/test01/test/2_trim/test_1P.fastq.gz -2 /home/test01/test/2_trim/test_2P.fastq.gz | samtools sort -O bam -o - > /home/test01/test/7_bowtie2/viral_contigs_sorted.bam
-
-
- Mapping rate statistics
-
samtools flagstat /home/test01/test/7_bowtie2/viral_contigs_sorted.bam > /home/test01/test/0_logs/7_bowtie2_stats.txt
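The read utilization mentioned for this step is the share of reads that mapped to the viral contigs. It can be read off the flagstat text by eye or pulled out with awk. This sketch assumes the usual "N + 0 in total" and "N + 0 mapped (...)" line layout; `mapping_rate` is a hypothetical helper and the numbers in the demo are made up:

```shell
# mapping_rate FLAGSTAT_TXT: print mapped reads as a percentage of all reads.
mapping_rate() {
    awk '
        / in total /          { total = $1 }
        / mapped \(/ && !seen { mapped = $1; seen = 1 }   # first "mapped" line only
        END { if (total > 0) printf "%.2f\n", 100 * mapped / total }
    ' "$1"
}

# Demo on made-up flagstat output.
demo=$(mktemp -d)
cat > "$demo/7_bowtie2_stats.txt" <<'EOF'
2000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
1500 + 0 mapped (75.00% : N/A)
2000 + 0 paired in sequencing
EOF
mapping_rate "$demo/7_bowtie2_stats.txt"   # prints 75.00
```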
-
-
- Per-contig coverage statistics
-
/opt/software/weeSAM.py --bam /home/test01/test/7_bowtie2/viral_contigs_sorted.bam --out /home/test01/test/7_bowtie2/contigs_coverage_ws.tsv --html /home/test01/test/7_bowtie2/viral_contigs > /home/test01/test/0_logs/7_weesam.log 2>&1
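If the `samtools coverage` table from the outline is used instead of the weeSAM report, the mean depth of a contig can be looked up by column name. A sketch under that assumption: `mean_depth` is a hypothetical helper, the demo table is made up, and the weeSAM Avg_Depth value remains the one the annotation step refers to.

```shell
# mean_depth COVERAGE_TSV CONTIG: print the meandepth value for one contig
# from the tab-separated table written by samtools coverage.
mean_depth() {
    awk -F'\t' -v contig="$2" '
        NR == 1 {                                   # header line starts with "#rname"
            for (i = 1; i <= NF; i++) if ($i == "meandepth") col = i
            next
        }
        $1 == contig && col { print $col }
    ' "$1"
}

# Demo on a made-up table with one contig.
demo=$(mktemp -d)
printf '#rname\tstartpos\tendpos\tnumreads\tcovbases\tcoverage\tmeandepth\tmeanbaseq\tmeanmapq\n' > "$demo/7_contigs_coverage.txt"
printf 'k141_2\t1\t29800\t150000\t29800\t100\t745.2\t36.1\t42\n' >> "$demo/7_contigs_coverage.txt"
mean_depth "$demo/7_contigs_coverage.txt" k141_2   # prints 745.2
```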
-
- 8. SARS-CoV-2 Genome Annotation
mkdir 8_annotation && cd 8_annotation
cp /opt/data/assembly/annotation/test_metadata.csv ./
-
-
- Edit test_metadata.csv to match your own sequences
- Reference file: /opt/data/assembly/annotation/test_metadata.csv
- Change strain (the name of your contig), collection-date (your choice), country (your choice), coverage (Avg_Depth from the weeSAM output), and full_name (your choice)
-
/opt/software/VAPiD-master/vapid3.py /home/test01/test/6_viral_contigs/viral_contigs.fa /opt/data/assembly/annotation/test.sbt --metadata_loc /home/test01/test/8_annotation/test_metadata.csv --db /opt/data/assembly/refseq/SARS-CoV-2 > /home/test01/test/0_logs/8_annotation.log 2>&1
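The metadata edit for this step is usually done by hand before VAPiD is run, but the coverage field could also be filled in with awk. This sketch assumes a simple comma-separated file with no quoted commas and header names `strain` and `coverage` as listed above; `set_coverage` is a hypothetical helper and the demo row is made up:

```shell
# set_coverage CSV STRAIN DEPTH: print the CSV with the coverage field
# of the row whose strain equals STRAIN replaced by DEPTH.
set_coverage() {
    awk -F',' -v OFS=',' -v strain="$2" -v depth="$3" '
        NR == 1 {                                   # locate columns by header name
            for (i = 1; i <= NF; i++) {
                if ($i == "strain")   s = i
                if ($i == "coverage") c = i
            }
            print; next
        }
        s && c && $s == strain { $c = depth }
        { print }
    ' "$1"
}

# Demo on a made-up metadata file.
demo=$(mktemp -d)
printf 'strain,collection-date,country,coverage,full_name\nk141_2,2020-01-01,China,0,demo sample\n' > "$demo/test_metadata.csv"
set_coverage "$demo/test_metadata.csv" k141_2 745.2
```

The result is written to standard output; redirect it to a new file and check it before passing it to --metadata_loc.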
-
- 9. Summary report (FastQC, Trimmomatic, QUAST, samtools flagstat)
mkdir /home/test01/test/9_all_report
multiqc /home/test01/test/ -o /home/test01/test/9_all_report/ -n all_report > /home/test01/test/0_logs/9_multiqc.log 2>&1