<Next-generation sequencing> Batch-downloading NCBI SRA files

Latest version of this article:
http://blog.csdn.net/tanzuozhev/article/details/51078460

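A previous post
http://blog.csdn.net/tanzuozhev/article/details/51077222
explained how to download SRA files with sra-toolkit. But what if you want to download every sample in a whole project at once? Below is a short introduction based on some replies on the Biostars forum.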

The SRAdb package for R

Reference:
https://www.biostars.org/p/93494/

```r
# Install. Older Bioconductor releases used
#   source('http://bioconductor.org/biocLite.R'); biocLite('SRAdb')
# but current releases install packages through BiocManager:
if (!requireNamespace('BiocManager', quietly = TRUE))
    install.packages('BiocManager')
BiocManager::install('SRAdb')

# Usage
library(SRAdb)
# Download and unpack the SRAmetadb SQLite file (a large download)
srafile <- getSRAdbFile()
con <- dbConnect(SQLite(), srafile)
# List every run under project SRP026197 together with its FTP download URL
listSRAfile('SRP026197', con)
```
```text
    study     sample    experiment run       ftp
1   SRP026197 SRS449410 SRX311638  SRR913951 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311638/SRR913951/SRR913951.sra
2   SRP026197 SRS449476 SRX311704  SRR914066 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311704/SRR914066/SRR914066.sra
3   SRP026197 SRS449408 SRX311636  SRR913949 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311636/SRR913949/SRR913949.sra
…
247 SRP026197 SRS449508 SRX311735  SRR914158 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311735/SRR914158/SRR914158.sra
248 SRP026197 SRS449460 SRX311688  SRR914006 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311688/SRR914006/SRR914006.sra
249 SRP026197 SRS449509 SRX311736  SRR914160 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311736/SRR914160/SRR914160.sra
```
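If you would rather do the downloading outside R, one option is to save this listing to a file and pull out a single column in the shell. The sketch below assumes the listing was written as a tab-separated file with a header row, for example with `write.table(listSRAfile('SRP026197', con), 'SRP026197_files.tsv', sep = '\t', quote = FALSE, row.names = FALSE)`; the file name and the helper `listing_column` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one named column from a tab-separated table
# that has a header row (assumed to be a saved SRAdb listing).
# $1 = column name, e.g. run or ftp; the table is read from stdin.
listing_column() {
    awk -F '\t' -v col="$1" '
        BEGIN { c = 0 }
        # On the header line, remember which field matches the requested name
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
        # Print that field for every data line (nothing if the name was not found)
        c { print $c }
    '
}
```

For example, `listing_column run < SRP026197_files.tsv > runs.txt` keeps one run accession per line, and `listing_column ftp` gives the URLs instead. The sra-instant FTP paths in a listing like the one above may no longer be served by NCBI, so the run accessions are usually the more durable thing to keep.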

```r
# Download the .sra files of every run in project SRP026197
getSRAfile('SRP026197', con, fileType = 'sra')
# Close the database connection when finished
dbDisconnect(con)
```
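The same batch download can also be sketched outside R by looping over run accessions with `prefetch` from sra-toolkit (covered in the previous post). This is a minimal sketch, assuming `prefetch` is on your PATH and the accessions arrive one per line on standard input; the function name `fetch_runs` and the `DRY_RUN` switch are hypothetical, added so the loop can be checked without downloading anything.

```shell
#!/bin/sh
# Hypothetical helper: download SRA runs one by one with prefetch.
# Reads run accessions (one per line) from stdin.
# With DRY_RUN=1 it only prints the commands it would run.
fetch_runs() {
    failed=0
    while IFS= read -r acc; do
        # Skip blank lines, header lines and anything that is not an SRR/ERR/DRR accession
        case "$acc" in
            [SED]RR[0-9]*) ;;
            *) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'prefetch %s\n' "$acc"
        else
            # Redirect stdin so prefetch cannot consume the accession list
            prefetch "$acc" < /dev/null || { printf 'failed: %s\n' "$acc" >&2; failed=1; }
        fi
    done
    # Non-zero if any download failed
    return "$failed"
}
```

Usage: `fetch_runs < runs.txt`, where `runs.txt` holds one accession per line (for instance the `run` column of the listing above). Failed accessions are reported on standard error and the function returns non-zero, so they can be collected and retried.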

Command-line tools

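First you need NCBI's E-utilities tools. E-utilities is the API to all NCBI databases and is very feature-rich: it can search every NCBI database, and I used it earlier for text mining PubMed. Its command-line client is Entrez Direct (EDirect), which is what the installation steps below set up.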
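Underneath, the E-utilities are plain HTTP endpoints under `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/`. As a sketch, the helper below (hypothetical name `esearch_url`) builds an ESearch query URL from the standard `db`, `term` and `retmax` parameters; it assumes the search term needs no URL-encoding, which holds for a bare accession such as SRP026197.

```shell
#!/bin/sh
# Hypothetical helper: build an E-utilities ESearch URL.
# $1 = database (e.g. sra), $2 = search term, $3 = max IDs to return (ESearch default is 20).
# Assumes the term contains no characters that need URL-encoding.
esearch_url() {
    db=$1
    term=$2
    retmax=${3:-20}
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=%s&term=%s&retmax=%s\n' \
        "$db" "$term" "$retmax"
}
```

For example, `curl -s "$(esearch_url sra SRP026197 500)"` should return an XML document whose `IdList` holds the numeric SRA UIDs matching the term; EDirect wraps calls like this one so you do not have to assemble URLs by hand.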

Installing E-utilities (this no longer seems to work; it is easier to just use the R package above)

Official documentation: http://www.ncbi.nlm.nih.gov/books/NBK179288/
Works on Linux and macOS; not tested on Windows.

Installation

```shell
# You don't have to work in ~; if you unpack elsewhere, change the PATH line to match.
# Setting PATH is also optional, but then the tools must be called by their full path.
cd ~
perl -MNet::FTP -e \
    '$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1); $ftp->login;
     $ftp->binary; $ftp->get("/entrez/entrezdirect/edirect.zip");'
unzip -u -q edirect.zip
rm edirect.zip
export PATH=$PATH:$HOME/edirect
./edirect/setup.sh
# Newer versions of the official documentation install EDirect with:
#   sh -c "$(curl -fsSL https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh)"
```

Reposted from: http://blog.csdn.net/tanzuozhev/article/details/51078460
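Once EDirect is installed, a common pattern is to search SRA for the project, fetch its run table in `runinfo` (CSV) format, and keep only the run accessions. This is a minimal sketch, assuming the first CSV column is `Run` as in NCBI's RunInfo tables; the helper name `runinfo_runs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract run accessions from RunInfo CSV read on stdin.
# Assumes the accession is the first comma-separated field.
# Header lines, blank lines and duplicate accessions are dropped; input order is kept.
runinfo_runs() {
    awk -F ',' '$1 ~ /^[SED]RR[0-9]+$/ && !seen[$1]++ { print $1 }'
}
```

Usage: `esearch -db sra -query SRP026197 | efetch -format runinfo | runinfo_runs > runs.txt` writes one accession per line, and each can then be passed to sra-toolkit's `prefetch` or `fastq-dump`.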