用Mothur制作OTUtable

最新推荐文章于 2024-02-20 11:43:53 发布

马志远的生信笔记

最新推荐文章于 2024-02-20 11:43:53 发布

阅读量2.9k

点赞数 1

分类专栏： 16s 文章标签： 16s Mothur

本文链接：https://blog.csdn.net/weixin_42480153/article/details/81040885

版权

16s 专栏收录该内容

13 篇文章 8 订阅

订阅专栏

数据类型：rawdata。因为下机数据是用qiime拆分的，最好还是从拼接步骤开始做吧。数据和上篇的是一批的：https://blog.csdn.net/weixin_42480153/article/details/81036510

提前准备的内容包括：greengene序列文件，greengene注释文件，greengene的align（该怎么翻译，对齐吗？）文件。数据库文件提前下载好，放在固定文件夹中，为了引用方便，alias一下

alias ggfa=/home/ruminant/Documents/database/tax_seqs/greengene/gg_13_8_99.fasta
alias ggtax=/home/ruminant/Documents/database/tax_seqs/greengene/gg_13_8_99.tax
ggalign=/home/ruminant/Documents/database/align/gggg_13_8_99.refalign

另外建立一个TSV格式列表（list.txt），内容为合并前后文件名

conbine.2.fq	raw.split.2.1.fq	raw.split.2.2.fq
conbine.4.fq	raw.split.4.1.fq	raw.split.4.2.fq
conbine.7.fq	raw.split.7.1.fq	raw.split.7.2.fq
conbine.11.fq	raw.split.11.1.fq	raw.split.11.2.fq
conbine.14.fq	raw.split.14.1.fq	raw.split.14.2.fq
conbine.18.fq	raw.split.18.1.fq	raw.split.18.2.fq
conbine.19.fq	raw.split.19.1.fq	raw.split.19.2.fq
conbine.23.fq	raw.split.23.1.fq	raw.split.23.2.fq
conbine.25.fq	raw.split.25.1.fq	raw.split.25.2.fq
conbine.28.fq	raw.split.28.1.fq	raw.split.28.2.fq
conbine.31.fq	raw.split.31.1.fq	raw.split.31.2.fq
conbine.35.fq	raw.split.35.1.fq	raw.split.35.2.fq
conbine.37.fq	raw.split.37.1.fq	raw.split.37.2.fq
conbine.41.fq	raw.split.41.1.fq	raw.split.41.2.fq
conbine.44.fq	raw.split.44.1.fq	raw.split.44.2.fq

1. 拼接两端测序数据

make.contigs(file=list)

trim.seqs(fasta=current, qfile=current, qaverage=28, qthreshold=25,maxambig=0) #做一个简单的修剪

测序质量还不错，只删除了49条序列。

...
sequence name mismatch btwn fasta: M02419_322_000000000-BDP8B_1_2110_25931_10060 and qual file: M02419_322_000000000-BDP8B_1_2102_16995_20947
2
It took 9 secs to trim 49 sequences.

看一下序列的信息

summary.seqs()

长度分布在300左右，文件包含了barcode和引物序列

2. 删除引物和barcode数据

准备oligos文件

primer	GTGYCAGCMGCCGCGGTAA（上游引物）	GGACTACNVGGGTWTCTAAT（下游引物）	
primer	GGACTACNVGGGTWTCTAAT（下游引物）		GTGYCAGCMGCCGCGGTAA（上游引物）	
barcode	CTGTA(上游barcode)	TACAT(下游barcode)	2
barcode	GAGATA	AGTGGA	4
barcode	ACGACTAC	GGATTGGT	7
barcode	TAGCGGA	CATAAGT	11
barcode	CTAGC	TCACC	14
...
barcode	TACAT(下游barcode)	CTGTA(上游barcode)	2
barcode	AGTGGA	GAGATA	4
barcode	GGATTGGT	ACGACTAC	7
barcode	CATAAGT	TAGCGGA	11
barcode	TCACC	CTAGC	14
...

trim.seqs(fasta=current,oligos=yourfile)

经过此步骤后序列长度为253左右，比公司给出的序列短20bp左右，公司未删除引物序列

3. 非冗余序列（唯一序列）

得到的序列很多都是重复的，运算过程中浪费资源，所以要唯一化

unique.seqs(fasta=current)
count.seqs(name=current,group=current)
summary.seqs(fasta=current, count=current)

4.序列对齐

align.seqs(fasta=current,reference=ggalign)
summary.seqs(fasta=current,count=current) 
summary.seqs(fasta=current,count=current) 
screen.seqs(fasta=current,count=current, summary=current, start=(97.5%value), end=(2.5%value), maxhomop=8)

5.质量控制

unique.seqs(fasta=current,count=current)
pre.cluster(fasta=current,count=current, diffs=2)
chimera.vsearch(fasta=current,count=current) #Mothur3.8以上
remove.seqs(fasta=current,accnos=current,count=current)
classify.seqs(fasta=current,count=current,reference=RDP.fa, taxonomy=RDP.tax,cutoff=80)
summary.tax(taxonomy=current, count=current) #发现unknown的序列最多，以下步骤不删除unknown
remove.lineage(fasta=current,count=current,taxonomy=current, taxon=unknown-Archaea-Eukaryotes-Rhizaria-Plantae-Eukaryota_kgd_Incertae_sedis)

6. OTU生成

cluster.split(fasta=current,count=current, taxonomy=current, splitmethod=classify, cutoff=0.03)
make.shared(list=current, count=current, label=0.03)

生成了4900多个OTU，接近qiime水平

7.物种注释

get.oturep(list=final.an.unique_list, label=0.03, fasta=final.fasta, column=final.dist, count=final.count_table)
classify.seqs(fasta=yourrepfasta,references=ggfa, taxonomy=ggtax, cutoff=80) #为了和qiime结果一致，选择greengene数据库

8.手动建立Otu表

方法见点击打开链接

9.也可以利用以下方法建立OTU表，这样的结果是RDP数据库的结果（推荐）

classify.otu(list=current, count=current, taxonomy=current)
create.database(shared=current,constaxonomy=current)

马志远的生信笔记

关注

1
点赞
踩
10

收藏

觉得还不错? 一键收藏
7
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

专栏目录