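Download: https://github.com/bbuchfink/diamond
Key feature: speed. DIAMOND is reported to run up to 20,000 times faster than blastx.
Basic usage (aligning nucleotide queries against a protein database):
Build the database: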
diamond makedb --in nr.fa -d nr
--in : reference protein sequences (FASTA format)
-d : prefix (base name) of the database file to create
Align:
diamond blastx -e 1e-5 --db $ref/nr -q $query.fa -o $out.diamond -p 20 -f 6 qseqid qlen qstart qend qcovhsp slen sstart send score evalue positive length ppos sseqid stitle nident mismatch gaps gapopen bitscore pident
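-e : maximum E-value for an alignment to be reported (1e-5 here)
--db : the DIAMOND database built with makedb (short form: -d)
-q : query sequence file
-p : number of CPU threads
-f : output format (6 = BLAST tabular)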
Value 6 may be followed by a space-separated list of these keywords:
qseqid means Query Seq-id
qlen means Query sequence length
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
slen means Subject sequence length
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
qframe means Query frame
stitle means Subject Title
salltitles means All Subject Title(s), separated by a '<>'
qcovhsp means Query Coverage Per HSP
Default: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
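
Format 6 output has no header line, so the column positions depend entirely on the keyword list passed to -f. As a rough sketch, the helpers below assume the 21-column order used in the blastx command above (qseqid in column 1, bitscore in column 20). The names FIELDS, add_header and best_hits are made up for illustration and are not part of DIAMOND.

```shell
# Column order assumed to match the -f 6 keyword list of the blastx command above.
FIELDS="qseqid qlen qstart qend qcovhsp slen sstart send score evalue positive length ppos sseqid stitle nident mismatch gaps gapopen bitscore pident"

# Hypothetical helper: print a tab-separated header line, then the table itself.
add_header() {
    printf '%s\n' "$FIELDS" | tr ' ' '\t'
    cat "$1"
}

# Hypothetical helper: keep only the highest-bitscore row for each query.
# Splits on tabs only, because stitle (column 15) may contain spaces.
best_hits() {
    awk -F'\t' '
        !($1 in best) || $20 + 0 > best[$1] { best[$1] = $20 + 0; row[$1] = $0 }
        END { for (q in row) print row[q] }
    ' "$1" | sort -t "$(printf '\t')" -k1,1
}
```

For example, best_hits "$out.diamond" > "$out.best.tsv" would write one row per query, sorted by query id. If the keyword list after -f 6 is changed, the column numbers in best_hits must be changed to match.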