Genome Assembly --- Nanopore Data Evaluation (nanoQC and the NanoPlot tool suite)

Author: 我是大南瓜

Last modified 2022-09-16 15:49:24


Columns: 基因组组装 (Genome Assembly), 科研软件 (Research Software); tags: linux

First published 2022-09-15 23:02:51

Article link: https://blog.csdn.net/cfc424/article/details/126880924


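Genome Assembly --- Nanopore Data Evaluation (Arabidopsis thaliana Nanopore reads)

1. Installing the software

Create a conda environment, install nanoQC, NanoPlot and NanoStat, and run the commands below: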

## Nanopore QC softwares
conda create -n nanoqc
conda activate nanoqc
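mamba install -c conda-forge -c bioconda nanoqc   ## needs mamba in the base env; plain "conda install" takes the same channel options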

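## The NanoPlot author also develops several filtering/comparison tools: NanoFilt, NanoStat, NanoLyse and NanoComp
## Installing them with pip inside the conda environment is easier; installing them with conda kept failing with errors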
pip install NanoPlot ## plot
pip install nanostat ## stat report
pip install NanoFilt ## filter nanopore reads
pip install NanoLyse ## Remove reads mapping to the lambda phage genome from a fastq file.
## Run NanoStat and NanoPlot on the raw reads in the background (output goes to nohup.out)
nohup NanoStat  --fastq ../CRR302667.fastq.gz -t 10   --tsv  --outdir 01.StatReports -n stat &
nohup NanoPlot -t 10 --fastq ../CRR302667.fastq.gz --plots hex dot kde -o 01.Nanoplot -p Ath -cm Viridis &

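NanoStat can be used just to compute statistics on the raw Nanopore reads; NanoFilt can then be used for the subsequent filtering. NanoFilt mainly filters on:
quality, length and GC content

Perform quality and/or length and/or GC filtering of (long read) fastq data.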
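To make the three filtering criteria concrete, here is a minimal shell/awk sketch that prints one line per read with its ID, length, GC fraction and mean Phred quality. The function name `fastq_metrics` is hypothetical and is not part of NanoStat or NanoFilt. The sketch assumes plain 4-line FASTQ records with Phred+33 qualities. It averages the mean quality as error probabilities and then converts it back to Phred. My understanding is that the NanoPlot tools average read quality this way, but treat that as an assumption.

```shell
# Hypothetical helper (not part of NanoStat/NanoFilt): one TSV line per read with
# read ID, length, GC fraction and mean Phred quality.
# Assumes 4-line FASTQ records and Phred+33 encoded quality strings.
fastq_metrics() {
  awk '
    BEGIN {
      # lookup table: printable quality character -> ASCII code
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    NR % 4 == 1 { id = substr($1, 2) }            # header line: drop the leading "@"
    NR % 4 == 2 { seq = toupper($0) }             # sequence line
    NR % 4 == 0 {                                 # quality line
      len = length(seq)
      gc = gsub(/[GC]/, "", seq)                  # gsub returns the number of G/C bases
      perr = 0
      for (i = 1; i <= len; i++) {
        q = ord[substr($0, i, 1)] - 33            # Phred+33 character -> Phred score
        perr += exp(-q * log(10) / 10)            # Phred score -> error probability
      }
      # mean error probability converted back to the Phred scale
      meanq = (len > 0) ? -10 * log(perr / len) / log(10) : 0
      printf "%s\t%d\t%.3f\t%.2f\n", id, len, (len > 0 ? gc / len : 0), meanq
    }
  '
}
```

Usage: `zcat ../CRR302667.fastq.gz | fastq_metrics > read_metrics.tsv`. Averaging error probabilities lets a few bad bases pull the mean down. The result is therefore never higher than the arithmetic mean of the Phred scores: a read with qualities 10 and 20 gets about 12.6, not 15.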

2. 软件使用

(1) nanoQC

nanoQC's usage is shown below; the main parameter to set is -l:

usage: nanoQC [-h] [-v] [-o OUTDIR] [--rna] [-l MINLEN] fastq
Investigate nucleotide composition and base quality.
positional arguments:
  fastq                 Reads data in fastq.gz format.

options:
  -h, --help            show this help message and exit
  -v, --version         Print version and exit.
  -o OUTDIR, --outdir OUTDIR
                        Specify directory in which output has to be created.
  --rna                 Fastq is from direct RNA-seq and contains U nucleotides.
  -l MINLEN, --minlen MINLEN
                        Filters the reads on a minimal length of the given range. Also plots the given length/2 of the
                        begin and end of the reads.

Commands used:

## nanoQC
## -l sets the minimum read length
nohup nanoQC ../CRR302667.fastq.gz -o 01.nanoQC_res -l 1000 &   
nohup nanoQC ../CRR302667.fastq.gz -o 01.nanoQC_res2k -l 2000 &
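The help text says `-l` drops reads shorter than MINLEN and plots the first and last MINLEN/2 bases of the remaining reads. The awk sketch below illustrates that idea by counting A/C/G/T at each of those positions, which is the kind of per-position composition the nanoQC report draws. The function name `position_composition` is my own, and it is not nanoQC's code. It assumes 4-line FASTQ records and keeps reads that are exactly MINLEN long, which nanoQC may handle differently.

```shell
# Hypothetical sketch (not nanoQC itself): per-position base counts over the
# first and last MINLEN/2 bases of reads with length >= MINLEN.
# Usage: position_composition MINLEN < reads.fastq ; assumes 4-line FASTQ records.
position_composition() {
  awk -v minlen="$1" '
    BEGIN { minlen += 0; half = int(minlen / 2) }
    NR % 4 == 2 && length($0) >= minlen {
      seq = toupper($0); n = length(seq)
      for (i = 1; i <= half; i++) {
        head[i, substr(seq, i, 1)]++              # i-th base from the 5-prime end
        tail[i, substr(seq, n - half + i, 1)]++   # aligned so that i = half is the last base
      }
    }
    END {
      print "region\tpos\tA\tC\tG\tT"
      for (i = 1; i <= half; i++)
        printf "start\t%d\t%d\t%d\t%d\t%d\n", i, head[i, "A"], head[i, "C"], head[i, "G"], head[i, "T"]
      for (i = 1; i <= half; i++)
        printf "end\t%d\t%d\t%d\t%d\t%d\n", i, tail[i, "A"], tail[i, "C"], tail[i, "G"], tail[i, "T"]
    }
  '
}
```

For example, `zcat ../CRR302667.fastq.gz | position_composition 2000 > composition_2k.tsv` gives the counts behind a 2 kb run. Each position's counts can then be divided by the number of retained reads to get fractions.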

The output consists of a log file and an HTML report (the HTML report is the part to look at):

-rw-r--r-- 1 debian debian 164097 Aug 30 21:14 nanoQC.html
-rw-r--r-- 1 debian debian    385 Aug 30 21:14 NanoQC.log

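The report contains read length, base composition and base quality plots.
With -l 2000 the results do look better, especially the base composition and base quality plots, which improve noticeably.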

(2) NanoPlot

NanoPlot has many parameters; the official usage examples are:

EXAMPLES:
NanoPlot --summary sequencing_summary.txt --loglength -o summary-plots-log-transformed  
NanoPlot -t 2 --fastq reads1.fastq.gz reads2.fastq.gz --maxlength 40000 --plots dot --legacy hex
NanoPlot -t 12 --color yellow --bam alignment1.bam alignment2.bam alignment3.bam --downsample 10000 -o bamplots_downsampled

Running NanoPlot:

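## The hex plot was not produced; the reason is given on GitHub:
## --plots uses the plotly package to plot kde and dot plots. Hex option will be ignored.
## A hex plot can be requested with the --legacy hex option instead
## --downsample randomly subsamples the dataset to the given number of reads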
nohup NanoPlot -t 10 --fastq ../CRR302667.fastq.gz --plots hex dot kde -o 01.Nanoplot -p Ath -cm Viridis &
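`--downsample N` makes NanoPlot work on a random subset of N reads. To downsample a FASTQ yourself before plotting, one option is reservoir sampling, sketched below in awk. This is not how NanoPlot samples internally. The function `downsample_fastq` and its optional seed argument are hypothetical, and 4-line FASTQ records are assumed.

```shell
# Hypothetical sketch (not NanoPlot code): keep N randomly chosen reads from a
# FASTQ stream using reservoir sampling (Algorithm R), reading the input once.
# Usage: downsample_fastq N [SEED] < reads.fastq ; assumes 4-line FASTQ records.
downsample_fastq() {
  awk -v n="$1" -v seed="${2:-42}" '
    BEGIN { n += 0; srand(seed) }
    { rec = (NR % 4 == 1) ? $0 : (rec "\n" $0) }  # accumulate the 4 lines of a record
    NR % 4 == 0 {
      k = NR / 4                                  # number of complete records seen so far
      if (k <= n) {
        keep[k] = rec                             # fill the reservoir with the first n records
      } else {
        j = int(rand() * k) + 1                   # uniform integer in 1..k
        if (j <= n) keep[j] = rec                 # replace a slot with probability n/k
      }
    }
    END {
      total = int(NR / 4)
      if (total > n) total = n
      for (i = 1; i <= total; i++) print keep[i]
    }
  '
}
```

For example, `zcat ../CRR302667.fastq.gz | downsample_fastq 100000 7 > sub.fastq` keeps 100,000 reads. Only N records are held in memory, so the full 3-million-read file never has to be loaded. Fixing the seed makes the subset reproducible.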

PNG output:

-rw-r--r-- 1 debian debian 46966 Aug 30 23:20 AthLengthvsQualityScatterPlot_dot.png
-rw-r--r-- 1 debian debian 99265 Aug 30 23:20 AthLengthvsQualityScatterPlot_kde.png
-rw-r--r-- 1 debian debian 27722 Aug 30 23:20 AthNon_weightedHistogramReadlength.png
-rw-r--r-- 1 debian debian 37099 Aug 30 23:20 AthNon_weightedLogTransformed_HistogramReadlength.png
-rw-r--r-- 1 debian debian 34650 Aug 30 23:20 AthWeightedHistogramReadlength.png
-rw-r--r-- 1 debian debian 38920 Aug 30 23:20 AthWeightedLogTransformed_HistogramReadlength.png
-rw-r--r-- 1 debian debian 36923 Aug 30 23:20 AthYield_By_Length.png
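The `Non_weighted` and `Weighted` read-length histograms differ in what each read contributes. As I understand it, the non-weighted version counts every read once. The weighted version weights each read by its length, so it shows where the bases (not the reads) are. `Yield_By_Length` shows how much sequence sits in reads of at least a given length. The awk sketch below illustrates both calculations. The function names and the one-length-per-line input format are my own assumptions, and this is not NanoPlot's code.

```shell
# Hypothetical sketch: bin read lengths two ways (reads counted once vs. weighted
# by length, i.e. summed bases). Input: one read length per line; BIN must be > 0.
length_histogram() {
  awk -v bin="$1" '
    BEGIN { bin += 0 }
    {
      b = int($1 / bin) * bin        # lower edge of the bin this read falls into
      reads[b]++                     # non-weighted: count reads
      bases[b] += $1                 # weighted: sum bases
      if (b > maxb) maxb = b
    }
    END {
      print "bin_start\treads\tbases"
      for (b = 0; b <= maxb; b += bin)
        printf "%d\t%d\t%.0f\n", b, reads[b], bases[b]
    }
  '
}

# Cumulative yield: for each read length L (longest first), the total bases in
# reads at least L long. Input: one read length per line.
yield_by_length() {
  sort -rn | awk '{ total += $1; printf "%d\t%.0f\n", $1, total }'
}
```

Read lengths can be taken straight from a FASTQ, e.g. `awk 'NR % 4 == 2 { print length($0) }' reads.fastq | length_histogram 5000`. Long reads count in proportion to their length, so the weighted histogram is shifted toward longer bins than the non-weighted one.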

HTML and log files are also written; the overall report is AthNanoPlot-report.html, which contains the statistics summary and the plots:

-rw-r--r-- 1 debian debian  486051 Aug 30 23:20 AthLengthvsQualityScatterPlot_dot.html
-rw-r--r-- 1 debian debian  723285 Aug 30 23:20 AthLengthvsQualityScatterPlot_kde.html
-rw-r--r-- 1 debian debian    2693 Aug 30 23:20 AthNanoPlot_20220830_2151.log
-rw-r--r-- 1 debian debian 1540597 Aug 30 23:20 AthNanoPlot-report.html
-rw-r--r-- 1 debian debian   29207 Aug 30 23:20 AthNon_weightedHistogramReadlength.html
-rw-r--r-- 1 debian debian   30051 Aug 30 23:20 AthNon_weightedLogTransformed_HistogramReadlength.html
-rw-r--r-- 1 debian debian   32743 Aug 30 23:20 AthWeightedHistogramReadlength.html
-rw-r--r-- 1 debian debian   39886 Aug 30 23:20 AthWeightedLogTransformed_HistogramReadlength.html
-rw-r--r-- 1 debian debian  189660 Aug 30 23:20 AthYield_By_Length.html

There is also a text file, AthNanoStats.txt, with summary statistics for the whole dataset:

General summary:         
Mean read length:                 18,541.3
Mean read quality:                    11.1
Median read length:                7,818.0
Median read quality:                  11.2
Number of reads:               3,064,191.0
Read length N50:                  46,452.0
STDEV read length:                26,536.0
Total bases:              56,814,196,989.0
Number, percentage and megabases of reads above quality cutoffs
>Q5:	3064191 (100.0%) 56814.2Mb
>Q7:	3064123 (100.0%) 56814.2Mb
>Q10:	2168595 (70.8%) 40456.1Mb
>Q12:	1055916 (34.5%) 19383.7Mb
>Q15:	6640 (0.2%) 12.3Mb
Top 5 highest mean basecall quality scores and their read lengths
1:	21.0 (1)
2:	19.0 (1)
3:	19.0 (1)
4:	19.0 (1)
5:	18.9 (358)
Top 5 longest reads and their mean basecall quality score
1:	495032 (12.4)
2:	457760 (8.7)
3:	439434 (9.1)
4:	438143 (8.7)
5:	431286 (9.7)
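As a quick sanity check, Total bases / Number of reads = 56,814,196,989 / 3,064,191 ≈ 18,541.3, which matches the reported mean read length. The N50 is the length L such that reads of length ≥ L contain at least half of all bases. The awk sketch below recomputes these summary numbers from a two-column table `read_length<TAB>mean_quality`, one read per line. It is not NanoStat's code, and the function name is hypothetical. Numbers are printed without thousands separators. I also assume the quality cutoffs are strict `>` comparisons, as the report's "above quality cutoffs" wording suggests.

```shell
# Hypothetical sketch of NanoStat-style summary numbers (not NanoStat's code).
# Input: "read_length<TAB>mean_quality" per line (tabs or spaces both work).
length_summary() {
  sort -k1,1nr | awk '
    { len[NR] = $1; q[NR] = $2; total += $1 }     # lengths arrive sorted longest first
    END {
      if (NR == 0) exit 1
      printf "Number of reads:\t%d\n", NR
      printf "Total bases:\t%.0f\n", total
      printf "Mean read length:\t%.1f\n", total / NR
      # the median is the middle value (or mean of the two middle values)
      m = (NR % 2) ? len[(NR + 1) / 2] : (len[NR / 2] + len[NR / 2 + 1]) / 2
      printf "Median read length:\t%.1f\n", m
      # N50: walk from the longest read until half of all bases are covered
      for (i = 1; i <= NR; i++) {
        cum += len[i]
        if (cum >= total / 2) { printf "Read length N50:\t%d\n", len[i]; break }
      }
      # reads, percentage and megabases strictly above each quality cutoff
      split("5 7 10 12 15", cut, " ")
      for (c = 1; c <= 5; c++) {
        nq = 0; nb = 0
        for (i = 1; i <= NR; i++) if (q[i] > cut[c]) { nq++; nb += len[i] }
        printf ">Q%d:\t%d (%.1f%%) %.1fMb\n", cut[c], nq, 100 * nq / NR, nb / 1e6
      }
    }
  '
}
```

In this dataset the median (7,818) sits far below the mean (18,541.3), and the N50 is 46,452. Together these show a skewed distribution: many shorter reads plus a long tail that carries much of the yield.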

NanoFilt usage examples:

EXAMPLES:
  gunzip -c reads.fastq.gz | NanoFilt -q 10 -l 500 --headcrop 50 | minimap2 genome.fa - | samtools sort -O BAM -@24 -o alignment.bam -
  gunzip -c reads.fastq.gz | NanoFilt -q 12 --headcrop 75 | gzip > trimmed-reads.fastq.gz
  gunzip -c reads.fastq.gz | NanoFilt -q 10 | gzip > highQuality-reads.fastq.gz
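To show what the filtering step in these pipelines does, here is an awk stand-in. `filter_fastq MINQ MINLEN [HEADCROP]` is my own hypothetical function, not NanoFilt's interface. It first trims HEADCROP bases from the start of each read. It then keeps reads whose trimmed length is at least MINLEN and whose mean quality, averaged as error probabilities, is at least MINQ. NanoFilt's own order of operations and its `>` vs `>=` comparisons may differ, so use NanoFilt itself on real data; this only illustrates the logic. Assumes 4-line FASTQ records with Phred+33 qualities.

```shell
# Hypothetical NanoFilt-like filter (illustration only, not NanoFilt):
# crop HEADCROP bases, then keep reads with length >= MINLEN and mean quality >= MINQ.
# Usage: filter_fastq MINQ MINLEN [HEADCROP] < in.fastq > out.fastq
filter_fastq() {
  awk -v minq="$1" -v minlen="$2" -v headcrop="${3:-0}" '
    BEGIN {
      minq += 0; minlen += 0; headcrop += 0
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i   # quality char -> ASCII code
    }
    NR % 4 == 1 { hdr = $0 }
    NR % 4 == 2 { seq = substr($0, headcrop + 1) }          # crop the start of the sequence
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      qual = substr($0, headcrop + 1)                       # crop the qualities to match
      len = length(seq)
      if (len == 0 || len < minlen) next
      perr = 0
      for (i = 1; i <= len; i++)
        perr += exp(-(ord[substr(qual, i, 1)] - 33) * log(10) / 10)  # Phred -> error prob
      meanq = -10 * log(perr / len) / log(10)               # mean error prob -> Phred
      if (meanq >= minq) print hdr "\n" seq "\n" plus "\n" qual
    }
  '
}
```

Usage, mirroring the first example above: `gunzip -c reads.fastq.gz | filter_fastq 10 500 50 | gzip > filtered-reads.fastq.gz`.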

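Personally, I find the stats output the most useful part: it shows the mean read length, the quality distribution and so on.
The PNG plots are only good for a quick look; their image quality is not great.
(In my runs the tool needed an internet connection; without one, no PNG images were produced.)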

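Overall, the sequencing quality of this Arabidopsis Nanopore dataset is not great, and it cannot compare with second-generation (short-read) sequencing.
I am also somewhat skeptical of how this tool computes its quality statistics, so I asked the author in an issue on the GitHub repository.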

MinIONQC can also evaluate Nanopore data, but it works from basecaller output (basecalled from fast5), so it cannot be used if you only have FASTQ files:

The benefit of MinIONQC is that it works directly with the sequencing_summary.txt files produced by ONT's Albacore or Guppy base callers.

References:
https://github.com/wdecoster/nanoQC
https://github.com/wdecoster/NanoPlot
