Swiss-Prot Annotation

SicongFu

1. Download the Swiss-Prot protein sequences and build a BLAST database

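Entries in the Swiss-Prot database are manually reviewed, and much of their functional annotation is backed by experimental evidence, so the annotation is reliable. Compared with Nr, however, it contains far fewer proteins: only about 540,000 entries at the time of writing. Because the database is small, it is well suited to Swiss-Prot annotation with a local BLAST installation.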

(1) Download the Swiss-Prot protein sequences:

$wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz

(On Windows, the file can be downloaded from http://www.uniprot.org/downloads)

(2) Decompress the downloaded file:

$gzip -d uniprot_sprot.fasta.gz
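A quick sanity check after decompression is to count the sequences in the FASTA file and compare the number with the roughly 540,000 entries mentioned above. The sketch below counts header lines (lines starting with `>`); the function name `count_fasta_records` is a hypothetical helper of mine, not part of BLAST or UniProt.

```shell
# Count FASTA records by counting header lines (those starting with '>').
# count_fasta_records is a hypothetical helper name, not a BLAST tool.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example call (assumes the file is in the current directory):
#   count_fasta_records uniprot_sprot.fasta
```

The count will differ between UniProt releases, so treat the 540,000 figure only as a rough guide.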

(3) Build the database:

$makeblastdb -in uniprot_sprot.fasta -dbtype prot -title uniprot_sprot -parse_seqids -out uniprot_sprot -logfile uniprot_sprot.log

$cat uniprot_sprot.log

(Before this step I had already added makeblastdb to my PATH, along with the blastp used below.)

2. Swiss-Prot annotation with blastp

$blastp -query proteins.fasta -out swiss-prot.tab -db uniprot_sprot -evalue 1e-5 -outfmt 7

$cat swiss-prot.tab

The annotation results look like this:

# BLASTP 2.2.30+
# Query: sp|Q197F8|002R_IIV3 Uncharacterized protein 002R OS=Invertebrate iridescent virus 3 GN=IIV3-002R PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q197F8|002R_IIV3   sp|Q197F8|002R_IIV3   100.00   458   0   0   1   458   1   458   0.0      949
# BLASTP 2.2.30+
# Query: sp|Q197F7|003L_IIV3 Uncharacterized protein 003L OS=Invertebrate iridescent virus 3 GN=IIV3-003L PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q197F7|003L_IIV3   sp|Q197F7|003L_IIV3   100.00   156   0   0   1   156   1   156   1e-111      320
# BLASTP 2.2.30+
# Query: sp|Q6GZX2|003R_FRG3G Uncharacterized protein 3R OS=Frog virus 3 (isolate Goorha) GN=FV3-003R PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q6GZX2|003R_FRG3G   sp|Q6GZX2|003R_FRG3G   100.00   438   0   0   1   438   1   438   0.0      900
# BLASTP 2.2.30+
# Query: sp|Q6GZX1|004R_FRG3G Uncharacterized protein 004R OS=Frog virus 3 (isolate Goorha) GN=FV3-004R PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q6GZX1|004R_FRG3G   sp|Q6GZX1|004R_FRG3G   100.00   60   0   0   1   60   1   60   3e-36      121
# BLASTP 2.2.30+
# Query: sp|Q197F5|005L_IIV3 Uncharacterized protein 005L OS=Invertebrate iridescent virus 3 GN=IIV3-005L PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q197F5|005L_IIV3   sp|Q197F5|005L_IIV3   99.08   217   2   0   1   217   1   217   2e-156      439
# BLAST processed 5 queries
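For annotation, usually only the best hit per query is needed. A minimal sketch, assuming the `-outfmt 7` layout shown above (comment lines starting with `#`, twelve whitespace-separated fields, bit score in column 12): the awk filter below keeps the highest-scoring line for each query. `best_hits` is a hypothetical helper name of mine, not a BLAST+ tool.

```shell
# best_hits: print one line per query, the hit with the highest bit score.
# Hypothetical helper; reads -outfmt 7 (or 6) text from files or stdin.
best_hits() {
    awk '
        /^#/ || NF < 12 { next }                 # skip comments and incomplete lines
        !($1 in best) || $12 + 0 > score[$1] {
            if (!($1 in best)) order[++n] = $1   # remember first-seen query order
            best[$1]  = $0
            score[$1] = $12 + 0
        }
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$@"
}

# Example call (assumes the result file from the blastp run above):
#   best_hits swiss-prot.tab > swiss-prot.best.tab
```

Ties are resolved in favor of the hit that appears first, which matches the order BLAST reports them in.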

3. Swiss-Prot annotation practice

$mkdir -p /home/train/swiss-prot

$cd /home/train/swiss-prot

$blast.pl blastp uniprot_sprot ../proteins.fasta 1e-5 4 uniprot_sprot 5

(I always get stuck at this step; still investigating.)

$parsing_blast_result.pl uniprot_sprot.xml 20 1e-5 0.2 > uniprot_sprot.xls

--------------------------------------------------

Fix for the error "bash: /home/sicong/blast/bin/parsing_blast_result.pl: Permission denied":
$cd /home/sicong/blast/bin/
$chmod 755 parsing_blast_result.pl

--------------------------------------------------

Next, I used blastx, which aligns nucleotide sequences against a protein database, in this case Swiss-Prot.

$makeblastdb -in uniprot_sprot.fasta -dbtype prot -title uniprot_sprot -parse_seqids -out uniprot_sprot -logfile uniprot_sprot.log

$cat uniprot_sprot.log

$blastx -help

$blastx -query Trinity.fasta -out swiss-prot_.tab -db uniprot_sprot -evalue 1e-5 -outfmt 7

$cat swiss-prot_.tab

# BLASTX 2.2.30+
# Query: TRINITY_DN105_c0_g1_i1 len=201 path=[179:0-200] [-1, 179, -2]
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 280 hits found
TRINITY_DN105_c0_g1_i1   sp|P46595|UBC4_SCHPO   100.00   67   0   0   1   201   35   101   9e-43      141
TRINITY_DN105_c0_g1_i1   sp|Q9UVR2|UBC1_MAGO7   97.01   67   2   0   1   201   35   101   4e-42      139
TRINITY_DN105_c0_g1_i1   sp|O74196|UBC1_COLGL   95.52   67   3   0   1   201   35   101   2e-41      137
TRINITY_DN105_c0_g1_i1   sp|P15732|UBC5_YEAST   89.55   67   7   0   1   201   36   102   1e-39      133
TRINITY_DN105_c0_g1_i1   sp|P15731|UBC4_YEAST   88.06   67   8   0   1   201   36   102   2e-39      132
TRINITY_DN105_c0_g1_i1   sp|P61078|UB2D3_RAT   92.54   67   5   0   1   201   35   101   7e-39      131
TRINITY_DN105_c0_g1_i1   sp|Q5R4V7|UB2D3_PONAB   92.54   67   5   0   1   201   35   101   7e-39      131
TRINITY_DN105_c0_g1_i1   sp|P61079|UB2D3_MOUSE   92.54   67   5   0   1   201   35   101   7e-39      131
TRINITY_DN105_c0_g1_i1   sp|Q4R5N4|UB2D3_MACFA   92.54   67   5   0   1   201   35   101   7e-39      131
TRINITY_DN105_c0_g1_i1   sp|P61077|UB2D3_HUMAN   92.54   67   5   0   1   201   35   101   7e-39      131
TRINITY_DN105_c0_g1_i1   sp|P62840|UB2D2_XENLA   92.54   67   5   0   1   201   35   101   7e-39      131
TRINITY_DN105_c0_g1_i1   sp|P62839|UB2D2_RAT   92.54   67   5   0
..........

........

.....

...
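With hundreds of hits per query (280 for the first contig above), a compact summary is easier to read than the raw table. The sketch below keeps only the first few hits per query and splits the Swiss-Prot subject ID (for example `sp|P46595|UBC4_SCHPO`) into accession and entry name. It assumes the same 12-column layout as above; `summarize_hits` and its default of 5 hits are my own hypothetical choices.

```shell
# summarize_hits [N]: read -outfmt 7 (or 6) text on stdin and print, for the
# first N hits of each query (default 5), the tab-separated columns:
#   query id, accession, entry name, % identity, e-value
# Hypothetical helper, not part of BLAST+.
summarize_hits() {
    awk -v max="${1:-5}" '
        /^#/ || NF < 12 { next }        # skip comments and truncated lines
        ++seen[$1] > max { next }       # keep only the first N hits per query
        {
            split($2, id, "|")          # id[1]="sp", id[2]=accession, id[3]=entry name
            printf "%s\t%s\t%s\t%s\t%s\n", $1, id[2], id[3], $3, $11
        }
    '
}

# Example call (assumes the blastx result file from above):
#   summarize_hits 3 < swiss-prot_.tab
```

The entry-name suffix after the underscore (`SCHPO`, `YEAST`, `HUMAN`) is the UniProt organism code, which gives a quick view of which species the hits come from.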
