SMURF: the new 16S pipeline used in a Science cover article


zd200572

Category: Bioinformatics. Tags: SMURF, 16S

Article link: https://blog.csdn.net/zd200572/article/details/113039048

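Gut microbes have been a research hotspot for the past couple of years, yet the paper that made the cover of Science last year was a study of microbes inside tumors. It was eye-opening: even tumors with no connection to the outside environment turn out to harbor microbes. Outsiders enjoy the spectacle; insiders want to see exactly how the work was done.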
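First, the method. It is not metagenomics but 16S, probably because tumors contain so few microbes that most samples cannot supply enough DNA for a metagenomic library. Second, the authors did not use conventional 16S. They amplified and sequenced five variable (V) regions, covering 68% of the gene's length, and analyzed them jointly. They call this pipeline SMURF (Short MUltiple Regions Framework), argue that its taxonomic resolution approaches that of full-length 16S from third-generation sequencing, and cite a reference for it. The method can be applied to any set of amplicons, which makes efficient downstream analysis possible.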
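Since third-generation sequencing is still expensive, this method has a real niche: results close to third-generation quality at second-generation bargain prices. Why not? With such a good learning resource available, it is worth studying. The library preparation poses no great difficulty, so the focus here is on the data analysis.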
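The algorithm can be run in two ways: inside MATLAB, much as you would run a script inside R, or as a standalone program that depends only on the MCR (MATLAB Compiler Runtime) libraries and does not need MATLAB installed. I chose the latter, since MATLAB is paid software.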

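The project also has a super cute icon. It does not show up on GitHub; I only noticed it after pulling the repository to a mirror in China.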

Software installation and configuration

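A warning up front: this script fails with an error and cannot be used as is. If you want to use the method, use the qiime2 plugin instead. Everything below can be ignored.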


# Download the MATLAB MCR platform that SMURF depends on (the authors wrote the analysis software in MATLAB)
mkdir MCR && cd MCR
wget https://www.mathworks.com/supportfiles/downloads/R2014a/deployment_files/R2014a/installers/glnxa64/MCR_R2014a_glnxa64_installer.zip
unzip MCR_R2014a_glnxa64_installer.zip
# Run the installer; it needs a graphical session to install successfully
sudo ./install
# Set the environment variables. These exports are temporary, so add them again after exiting the terminal
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/MATLAB/MATLAB_Compiler_Runtime/v83/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Compiler_Runtime/v83/bin/glnxa64:/usr/local/MATLAB/MATLAB_Compiler_Runtime/v83/sys/os/glnxa64
export XAPPLRESDIR=/usr/local/MATLAB/MATLAB_Compiler_Runtime/v83/X11/app-defaults
# The error below appeared because no graphical display could be found; the algorithm does not need one, so it should not matter
#Exception in thread "main" java.lang.InternalError: Can't connect to X11 window server using ':0' as the value of the DISPLAY variable.
cd ..
# Download the code
# https://github.com/NoamShental/SMURF.git
# I mirrored it to gitee, where cloning is much faster; that matters here because the repository is quite large
git clone https://github.com/NoamShental/SMURF.git
cd SMURF
# Prepare the database: join the split archive parts, then decompress
cat ./Green_Genes_201305/unique_up_to_3_ambiguous_16S/GreenGenes_201305_unique_up_to_3_ambiguous_16S.fasta.gz* > ./Green_Genes_201305/unique_up_to_3_ambiguous_16S/Green_Genes_201305_unique_up_to_3_ambiguous_16S.fasta.gz
gunzip ./Green_Genes_201305/unique_up_to_3_ambiguous_16S/Green_Genes_201305_unique_up_to_3_ambiguous_16S.fasta.gz
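
The long `LD_LIBRARY_PATH` export above is easy to mistype, and when the variable starts out empty it leaves a leading `:` in the path, which the dynamic linker treats as the current directory. Here is a minimal sketch of a helper that builds the same value from the MCR install root. The function name `mcr_library_path` and the `MCR_ROOT` variable are my own and hypothetical, not part of SMURF or the MCR.

```bash
# Hypothetical helper (not part of SMURF or the MCR): print the library path
# the MCR needs, given the MCR install root as the first argument.
mcr_library_path() {
    local root="$1"   # e.g. /usr/local/MATLAB/MATLAB_Compiler_Runtime/v83
    local mcr_path="$root/runtime/glnxa64:$root/bin/glnxa64:$root/sys/os/glnxa64"
    # Add the ':' separator only when LD_LIBRARY_PATH already has a value,
    # so that no empty entry ends up in the result.
    printf '%s\n' "${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$mcr_path"
}

# Usage (not run here):
#   MCR_ROOT=/usr/local/MATLAB/MATLAB_Compiler_Runtime/v83
#   export LD_LIBRARY_PATH="$(mcr_library_path "$MCR_ROOT")"
#   export XAPPLRESDIR="$MCR_ROOT/X11/app-defaults"
```

Putting the function and the two exports in `~/.bashrc` would make the setting survive new terminal sessions, at the cost of changing the library path for every program started from that shell.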

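Running and results

Running it seems to take a single command, provided the primers and other parameters have been configured.
Parameters that need changing: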

% ********************** GENERAL PARAMETERS ********************
base_samples_dir = '/';
...
% ********************** SAMPLE PREP PARAMETERS ********************
% Set the 16S reference DB
uniS16_dir = './Green_Genes_201305/unique_up_to_3_ambiguous_16S';
db_filename = 'Green_Genes_201305_unique_up_to_3_ambiguous_16S';
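# Other parameters; I am not sure whether these need changing
vi Configs/db_params_script.m
# Replace ../ with ./ here, or run from the Standalone folder, in which case no change is needed
vi Configs/adhoc_db_params_script.m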
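
If you would rather not make the `../` to `./` change by hand in `vi`, it can be scripted with `sed`. This is only a sketch, and it assumes that the paths in the params script are single-quoted strings beginning with `'../`, which I have not checked against every line of the file. The function name `use_repo_root_paths` is hypothetical.

```bash
# Hypothetical helper: rewrite path strings that start with '../ so they start
# with './ instead, editing the file in place and keeping a .bak copy.
# Assumption: paths in the params script are single-quoted MATLAB strings.
use_repo_root_paths() {
    local params_file="$1"
    sed -i.bak "s#'\.\./#'./#g" "$params_file"
}

# Usage (not run here):
#   use_repo_root_paths Configs/adhoc_db_params_script.m
#   diff Configs/adhoc_db_params_script.m.bak Configs/adhoc_db_params_script.m
```

The `diff` against the `.bak` copy shows exactly which lines changed, so an unintended substitution is easy to spot and revert.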

Now run it:

chmod +x ./StandaloneVersion/SMURF_lin
 time ./StandaloneVersion/SMURF_lin ./Configs/compiled_params_script.m
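
A run takes several minutes, and a failure can be buried at the very end of the output. In the log further down, MATLAB runtime errors show up as lines beginning with `Error using` or `Error in`. Since I cannot tell from this run whether the exit status of `SMURF_lin` reflects such failures, one simple option is to save the output and scan it. The sketch below does that; the function name `smurf_log_ok` and the file name `smurf.log` are hypothetical.

```bash
# Hypothetical helper: succeed only if a saved SMURF log contains no MATLAB
# error lines; print any error lines found to stderr.
smurf_log_ok() {
    local log_file="$1"
    # MATLAB runtime errors in the log start with "Error using" or "Error in".
    if grep -Eq '^Error (using|in) ' "$log_file"; then
        grep -E '^Error (using|in) ' "$log_file" >&2
        return 1
    fi
    return 0
}

# Usage (not run here):
#   ./StandaloneVersion/SMURF_lin ./Configs/compiled_params_script.m 2>&1 | tee smurf.log
#   smurf_log_ok smurf.log || echo "SMURF reported an error; see smurf.log"
```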

Naturally, the bundled example files should not throw errors, and results should come out easily.


```bash
time  ./StandaloneVersion/SMURF_lin ./Configs/compiled_params_script.m
Doing quality filters
Part 1/1 - Block 1/5
Part 1/1 - Block 2/5
Part 1/1 - Block 3/5
Part 1/1 - Block 4/5
Part 1/1 - Block 5/5
Number of reads: 472350
Percent of long enough reads: 0.94713
Percent of good reads: 0.91592
Counting fasta write: 1
Elapsed time is 9.831863 seconds.
Mapped to primers 82% of unique reads
Mapped to primers 97% of read counts
regions_files = 
6x1 struct array with fields:
    name
    date
    bytes
    isdir
    datenum
ans =
./Green_Genes_201305/unique_up_to_3_ambiguous_16S_amp6Regions_2mm_RL130/GreenGenes_201305_unique_up_to_3_ambiguous_16S_amp6Regions_2mm_RL130_region1.mat
Loading bacterial DB for region 1 out of 6 from original region 1
Loading bacterial DB for region 2 out of 6 from original region 2
Loading bacterial DB for region 3 out of 6 from original region 3
Loading bacterial DB for region 4 out of 6 from original region 4
Loading bacterial DB for region 5 out of 6 from original region 5
Loading bacterial DB for region 6 out of 6 from original region 6
Region 1 out of 6
Keep high freq: 28% of reads
Keep high freq: 91% of counts
Building matrix M
Building matrix Q
 --------------------------------------------
...
--------------------------------------------
Region 6 out of 6
Keep high freq: 2% of reads
Keep high freq: 89% of counts
Building matrix M
Building matrix Q
--------------------------------------------
Region 1 out of 6
Keeping reads matched to DB: 95% of reads
Keeping reads matched to DB: 98% of counts
--------------------------------------------
...
--------------------------------------------
Region 6 out of 6
Keeping reads matched to DB: 97% of reads
Keeping reads matched to DB: 100% of counts
--------------------------------------------
Filter out columns (bacteria)
Normalize frequency counts
Build matrix A_L2
Iter:4674. Error reduction of X (L1 norm): 9.7149e-07
Total iterations time: 60.4761
Error using main_multiple_regions (line 34)
Not enough input arguments.
Error in main_smurf (line 36)

MATLAB:minrhs

real	9m1.616s
user	11m32.462s
sys	0m38.527s
```

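So it did fail with an error; there is a problem in this code. However, I found another implementation on GitHub, the qiime2 Sidle plugin, which can do the same thing. Time to switch tools and keep learning.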
