First, downloading with fastq-dump:
Besides downloading data directly from the NCBI website, you can use the SRA Toolkit to download and process it.
1. Download and install the SRA Toolkit
http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
Choose the version that matches your system; I used the Mac version.
After downloading, extract the archive, then enter the following in a terminal:
export PATH=$PATH:/Users/YOU/sratoolkit.2.5.7-mac64/bin
and you can start using the tools (in the current terminal session only).
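Before going further, it can help to confirm that the shell actually finds the tools. A minimal bash sketch; `check_sra_tools` is a hypothetical helper name, and it only checks for the two commands used in this post:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether prefetch and fastq-dump are on PATH.
# Returns non-zero if either one is missing.
check_sra_tools() {
  local tool missing=0
  for tool in prefetch fastq-dump; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found $tool: $(command -v "$tool")"
    else
      echo "not found on PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_sra_tools` in the same terminal right after the export line; a "not found" message usually means the path in the export line does not match where you extracted the toolkit.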
To make this permanent, add it to ~/.bash_profile:
vim ~/.bash_profile
Then add the line:
export PATH=$PATH:/Users/YOU/sratoolkit.2.5.7-mac64/bin
Save and quit (Esc, :wq), then run:
source ~/.bash_profile
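Editing the profile by hand twice leaves a duplicate export line behind. A small sketch that appends the line only if it is not already there; `add_sra_to_path` is a hypothetical name, and the default profile path assumes bash on a Mac, as above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the SRA Toolkit export line to a profile file,
# but only if that exact line is not already present (safe to re-run).
add_sra_to_path() {
  local bin_dir="$1"                        # e.g. /Users/YOU/sratoolkit.2.5.7-mac64/bin
  local profile="${2:-$HOME/.bash_profile}" # defaults to ~/.bash_profile
  # \$PATH keeps the literal text "$PATH", expanded when the profile is sourced
  local line="export PATH=\$PATH:${bin_dir}"
  touch "$profile"
  # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
  if ! grep -qxF "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
}
```

Usage: `add_sra_to_path /Users/YOU/sratoolkit.2.5.7-mac64/bin && source ~/.bash_profile`.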
2. Download SRA data
Find the dataset you want to download, e.g. SRP057998:
http://www.ncbi.nlm.nih.gov/sra/?term=SRP057998
I only wanted two of its runs, SRR2007490 and SRR2007493.
Note that you must use the run accessions, which start with SRR (not the SRP project accession)!
Then enter in the terminal:
prefetch -v SRR2007490
prefetch -v SRR2007493
Each run is then downloaded in turn; the speed depends on your network connection.
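With more than a couple of runs, a loop saves typing. A minimal sketch, assuming only SRR run accessions should be passed to prefetch (the rule stated above); `is_srr` and `fetch_runs` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Only SRR run accessions are passed to prefetch, following the rule above;
# anything else (such as the SRP project ID) is reported and skipped.
is_srr() {
  [[ "$1" =~ ^SRR[0-9]+$ ]]
}

fetch_runs() {
  local acc
  for acc in "$@"; do
    if is_srr "$acc"; then
      prefetch -v "$acc"
    else
      echo "skipping $acc: not an SRR run accession" >&2
    fi
  done
}
```

Usage: `fetch_runs SRR2007490 SRR2007493`. Passing SRP057998 by mistake prints a warning instead of calling prefetch.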
The downloaded SRA files are saved in:
/Users/YOU/ncbi/public/sra/
If needed, you can move them to your own folder: cd into the destination folder and enter:
mv /Users/YOU/ncbi/public/sra/SRR2007490.sra .
mv /Users/YOU/ncbi/public/sra/SRR2007493.sra .
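The same move for a list of accessions, with a warning when a file is not where expected. A sketch only; `move_sra` is a hypothetical name, and it assumes the default download directory shown above (other toolkit versions may store files elsewhere):

```shell
#!/usr/bin/env bash
# Hypothetical helper: move <accession>.sra files from a source directory
# into the current directory, warning about any that are missing.
move_sra() {
  local src_dir="$1"
  shift
  local acc
  for acc in "$@"; do
    if [[ -f "$src_dir/$acc.sra" ]]; then
      mv "$src_dir/$acc.sra" .
    else
      echo "missing: $src_dir/$acc.sra" >&2
    fi
  done
}
```

Usage, from inside your target folder: `move_sra /Users/YOU/ncbi/public/sra SRR2007490 SRR2007493`.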
3. Convert SRA to FASTQ
fastq-dump --split-files SRR2007490.sra
fastq-dump --split-files SRR2007493.sra
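Be sure to add the --split-files parameter; otherwise the two reads of each pair end up together in a single FASTQ file. The data in this experiment are paired-end, so --split-files writes the mates to separate files (e.g. SRR2007490_1.fastq and SRR2007490_2.fastq).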
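The conversion step as a loop, with a check that the second mate file really appeared. A minimal sketch; `dump_pairs` is a hypothetical name, and it assumes fastq-dump writes `<accession>_1.fastq` / `<accession>_2.fastq` into the current directory, as described above:

```shell
#!/usr/bin/env bash
# Convert each .sra file with --split-files, then warn if the second mate
# file is missing or empty (which would suggest a single-end run).
dump_pairs() {
  local sra base
  for sra in "$@"; do
    base=$(basename "$sra" .sra)   # SRR2007490.sra -> SRR2007490
    fastq-dump --split-files "$sra" || return 1
    if [[ ! -s "${base}_2.fastq" ]]; then
      echo "warning: ${base}_2.fastq missing or empty; run may be single-end" >&2
    fi
  done
}
```

Usage: `dump_pairs SRR2007490.sra SRR2007493.sra`.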
When I used prefetch to download SRR3656745, it reported an error because the file is too large (31 GB according to the log):
Example:
./sratoolkit.2.8.2-1-ubuntu64/bin/prefetch -v SRR3656745
2017-11-07T02:44:08 prefetch.2.8.2: Using 'ascp'
2017-11-07T02:44:08 prefetch.2.8.2: Using 'ascp'
2017-11-07T02:44:08 prefetch.2.8.2: Using '/usr/bin/ascp'
2017-11-07T02:44:08 prefetch.2.8.2: Using '/usr/bin/ascp'
2017-11-07T02:44:09 prefetch.2.8.2: KClientHttpOpen - connected to www.ncbi.nlm.nih.gov
2017-11-07T02:44:10 prefetch.2.8.2: KClientHttpOpen - verifying CA cert
2017-11-07T02:44:12 prefetch.2.8.2: KClientHttpOpen - connected to sra-download.ncbi.nlm.nih.gov
2017-11-07T02:44:13 prefetch.2.8.2: KClientHttpOpen - verifying CA cert
2017-11-07T02:44:14 prefetch.2.8.2 warn: Maximum file size download limit is 20GB
2017-11-07T02:44:14 prefetch.2.8.2: 1) 'SRR3656745' (31GB) is larger than maximum allowed: skipped
2017-11-07T02:44:14 prefetch.2.8.2: '(null)' is not recognized as a database or a table
2017-11-07T02:44:15 prefetch.2.8.2: KClientHttpOpen - connected to sra-download.ncbi.nlm.nih.gov
2017-11-07T02:44:16 prefetch.2.8.2: KClientHttpOpen - verifying CA cert
2017-11-07T02:44:17 prefetch.2.8.2: KClientHttpOpen - connected to sra-download.ncbi.nlm.nih.gov
2017-11-07T02:44:18 prefetch.2.8.2: KClientHttpOpen - verifying CA cert
2017-11-07T02:44:19 prefetch.2.8.2: KClientHttpOpen - connected to sra-download.ncbi.nlm.nih.gov
2017-11-07T02:44:20 prefetch.2.8.2: KClientHttpOpen - verifying CA cert
2017-11-07T02:44:21 prefetch.2.8.2: KClientHttpOpen - connected to sra-download.ncbi.nlm.nih.gov
2017-11-07T02:44:22 prefetch.2.8.2: KClientHttpOpen - verifying CA cert
2017-11-07T02:44:24 prefetch.2.8.2: KClientHttpOpen - connected to sra-download.ncbi.nlm.nih.gov
2017-11-07T02:44:25 prefetch.2.8.2: KClientHttpOpen - verifying CA cert
2017-11-07T02:44:27 prefetch.2.8.2: KClientHttpOpen - connected to sra-download.ncbi.nlm.nih.gov
2017-11-07T02:44:28 prefetch.2.8.2: KClientHttpOpen - verifying CA cert
2017-11-07T02:44:29 prefetch.2.8.2: KClientHttpOpen - connected to sra-download.ncbi.nlm.nih.gov
2017-11-07T02:44:30 prefetch.2.8.2: KClientHttpOpen - verifying CA cert
2017-11-07T02:44:31 prefetch.2.8.2: 'SRR3656745' has no remote vdbcache
Download of some files was skipped because they are too large
You can change size download limit by setting --min-size and --max-size command line arguments
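When prefetching many runs, it is easy to miss which ones were skipped. A hedged sketch: save the output with `prefetch -v SRR3656745 2>&1 | tee prefetch.log`, pull out the accessions from the "is larger than maximum allowed: skipped" lines, and hand them to fastq-dump, the fallback used below. The helper names are hypothetical, and the sed pattern assumes the exact message format in the prefetch 2.8.2 log above, with each message on one line; other versions may word it differently.

```shell
#!/usr/bin/env bash
# Print accessions that prefetch skipped for exceeding its size limit,
# based on the message format in the prefetch 2.8.2 log above.
skipped_for_size() {
  sed -n "s/.*'\([A-Z]*RR[0-9]*\)' ([^)]*) is larger than maximum allowed: skipped.*/\1/p" "$1"
}

# Fall back to fastq-dump for every skipped run (same options as used below).
dump_skipped() {
  local acc
  # Accessions contain no spaces, so plain word splitting is safe here.
  for acc in $(skipped_for_size "$1"); do
    fastq-dump -I --split-files "$acc"
  done
}
```

Usage: `dump_skipped prefetch.log`.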
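So I used fastq-dump to download it directly instead (as the log notes, the limit can also be changed with --min-size and --max-size). Here -I (--readids) appends the read number to each read name, and --split-files writes the two mates to separate files: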
./sratoolkit.2.8.2-1-ubuntu64/bin/fastq-dump -I --split-files SRR3656745
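Because this converts the whole run in one go, a quick sanity check afterwards is that both mate files contain the same number of reads. A minimal sketch; `fastq_reads` and `check_pairs` are hypothetical names, and the read count assumes fastq-dump's usual four lines per record (no wrapped sequence lines):

```shell
#!/usr/bin/env bash
# Count reads in a FASTQ file, assuming the 4-line-per-record layout.
fastq_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Check that <acc>_1.fastq and <acc>_2.fastq hold the same number of reads.
check_pairs() {
  local acc="$1" n1 n2 f
  for f in "${acc}_1.fastq" "${acc}_2.fastq"; do
    [[ -f "$f" ]] || { echo "missing: $f" >&2; return 1; }
  done
  n1=$(fastq_reads "${acc}_1.fastq")
  n2=$(fastq_reads "${acc}_2.fastq")
  if (( n1 != n2 )); then
    echo "$acc: mate counts differ ($n1 vs $n2)" >&2
    return 1
  fi
  echo "$acc: $n1 read pairs"
}
```

Usage: `check_pairs SRR3656745`, run in the folder where fastq-dump wrote its output.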
For downloading with wget, see this blog post: http://blog.sina.com.cn/s/blog_72512a1d0102x9nl.html