[Bioinformatics] Sequence Alignment with DIAMOND

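This post introduces DIAMOND, a fast sequence aligner designed for large-scale sequence data analysis. It walks through installing DIAMOND, building a database, and running blastp for protein alignment, with particular attention to choosing a sensitivity mode and reading the alignment results.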


DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of large sequence data sets.

Official documentation: Home · bbuchfink/diamond Wiki (github.com)

1 Installing DIAMOND

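# Create a conda environment named diamond and install DIAMOND into it
# (DIAMOND is distributed through the bioconda channel)
conda create --name diamond -c conda-forge -c bioconda diamond
# Activate the diamond environment
conda activate diamond
# Check the installed DIAMOND version
diamond --version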

2 Protein alignment

  1. Download the example data. The data set is a FASTA file containing 14,323 protein sequences.

    wget https://scop.berkeley.edu/downloads/scopeseq-2.07/astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa
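
    After the download it can be worth confirming that the file holds the expected number of sequences. The sketch below counts FASTA header lines; `count_fasta_records` is a hypothetical helper name, and the two-record sample file is made up so the snippet runs without the real download.

    ```shell
    # Count FASTA records by counting header lines (lines that start with '>').
    count_fasta_records() {
        grep -c '^>' "$1"
    }

    # Tiny stand-in file with made-up sequences, so the sketch runs on its own.
    sample_fa=$(mktemp)
    printf '>seqA demo\nMKTAYIAK\n>seqB demo\nGSHMLEDP\nAAK\n' > "$sample_fa"

    n_records=$(count_fasta_records "$sample_fa")
    echo "records: $n_records"
    rm -f "$sample_fa"
    ```

    Pointing the same function at the downloaded ASTRAL file should report the sequence count quoted in this step.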

  2. Use diamond makedb to convert the downloaded file into a DIAMOND database file, which the alignment steps below will search against.

    diamond makedb --in astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa -d astral40

  3. Search the database, using the same file as the query

    diamond blastp -q astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa -d astral40 -o out.tsv --very-sensitive

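    Parameters:

    -q the query file to search with

    -d the database file built in the previous step

    -o the output file for the search results

    DIAMOND offers several sensitivity settings for different applications. The default mode is the fastest and is tailored to finding homologs with >70% sequence identity, the --sensitive mode is tailored to hits with >40% identity, and the --very-sensitive and --ultra-sensitive modes provide high sensitivity across the whole range of pairwise alignments. Higher sensitivity recovers more distant homologs, at the cost of a longer run time.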
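
    As a rough illustration of the thresholds above, the sketch below maps the lowest identity you still want to detect to a sensitivity flag. `pick_sensitivity` and `query.fa` are hypothetical names introduced here, the cut-offs are only the rule of thumb from the paragraph above, and the command is printed rather than executed so the snippet works without DIAMOND installed.

    ```shell
    # Map the lowest sequence identity (whole-number percent) you still want
    # to detect to a DIAMOND sensitivity flag.
    # pick_sensitivity is a hypothetical helper, not part of DIAMOND.
    pick_sensitivity() {
        min_identity=$1
        if [ "$min_identity" -gt 70 ]; then
            echo ""                    # default mode, no extra flag needed
        elif [ "$min_identity" -gt 40 ]; then
            echo "--sensitive"
        else
            echo "--very-sensitive"
        fi
    }

    mode=$(pick_sensitivity 30)
    # Build and print the command instead of running it.
    cmd="diamond blastp -q query.fa -d astral40 -o out.tsv $mode"
    echo "$cmd"
    ```

    In practice the choice also depends on how much run time you can afford, so treat the mapping as a starting point rather than a fixed rule.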

  4. Interpreting the results

    Partial output:

    d1dlwa_ d1dlwa_ 100     116     0       0       1       116     1       116     6.42e-77        220
    d1dlwa_ d2gkma_ 35.4    113     73      0       1       113     13      125     1.43e-21        80.9
    d1dlwa_ d4i0va_ 31.9    119     75      2       1       113     2       120     9.11e-13        58.2
    d2gkma_ d2gkma_ 100     127     0       0       1       127     1       127     1.51e-87        248
    d2gkma_ d1dlwa_ 34.8    115     75      0       13      127     1       115     6.90e-23        84.3
    d2gkma_ d4i0va_ 33.6    110     69      1       13      118     2       111     1.35e-18        73.6
    d2gkma_ d6bmea_ 35.5    110     67      1       13      118     2       111     1.32e-16        68.6
    d2gkma_ d2bkma_ 37.3    67      38      2       13      76      5       70      5.18e-06        40.8
    d1ngka_ d1ngka_ 100     126     0       0       1       126     1       126     4.34e-91        257
    d1ngka_ d2bkma_ 38.4    125     73      2       1       125     4       124     1.42e-24        89.0

    Meaning of each column:

    1. Query accession: the accession of the sequence that was the search query against the database, as specified in the input FASTA file after the > character until the first blank.

    2. Target accession: the accession of the target database sequence (also called subject) that the query was aligned against.

    3. Sequence identity: The percentage of identical amino acid residues that were aligned against each other in the local alignment.

    4. Length: The total length of the local alignment, which includes matching and mismatching positions of query and subject, as well as gap positions in the query and subject.

    5. Mismatches: The number of non-identical amino acid residues aligned against each other.

    6. Gap openings: The number of gap openings.

    7. Query start: The starting coordinate of the local alignment in the query (1-based).

    8. Query end: The ending coordinate of the local alignment in the query (1-based).

    9. Target start: The starting coordinate of the local alignment in the target (1-based).

    10. Target end: The ending coordinate of the local alignment in the target (1-based).

    11. E-value: The expected value of the hit quantifies the number of alignments of similar or better quality that you expect to find searching this query against a database of random sequences the same size as the actual target database. This number is most useful for measuring the significance of a hit. By default, DIAMOND will report all alignments with e-value < 0.001, meaning that a hit of this quality will be found by chance on average once per 1,000 queries.

    12. Bit score: The bit score is a scoring-matrix-independent measure of the (local) similarity of the two aligned sequences, with higher numbers meaning more similar. It is always >= 0 for local Smith-Waterman alignments.
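
    Because the query file and the database are the same here, every sequence also hits itself. One possible way to post-process the table, sketched below with awk, is to drop those self-hits and keep only hits below an E-value cut-off. The cut-off of 1e-10 is an arbitrary choice for illustration, and the three input rows are copied from the partial output above into a temporary file so the snippet runs on its own.

    ```shell
    # Three rows copied from the partial output above, written tab-separated.
    hits=$(mktemp)
    printf 'd1dlwa_\td1dlwa_\t100\t116\t0\t0\t1\t116\t1\t116\t6.42e-77\t220\n' >  "$hits"
    printf 'd1dlwa_\td2gkma_\t35.4\t113\t73\t0\t1\t113\t13\t125\t1.43e-21\t80.9\n' >> "$hits"
    printf 'd2gkma_\td2bkma_\t37.3\t67\t38\t2\t13\t76\t5\t70\t5.18e-06\t40.8\n' >> "$hits"

    # Drop self-hits (column 1 equals column 2) and keep hits whose E-value
    # (column 11) is below 1e-10; print query, target, identity, and bit score.
    filtered=$(awk -F'\t' -v OFS='\t' \
        '$1 != $2 && $11 + 0 < 1e-10 { print $1, $2, $3, $12 }' "$hits")
    echo "$filtered"
    rm -f "$hits"
    ```

    Replacing the temporary file with out.tsv applies the same filter to the full result table.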
