Using the web tools:
CD-search:http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
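CD-search uses RPS-BLAST to align a protein or nucleotide sequence against the PSSMs (position-specific scoring matrices) in the CDD database and returns the resulting annotation. This tool can search only one sequence at a time.
The results look like this: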
#Batch CD-search tool NIH/NLM/NCBI
#cdsid QM3-qcdsearch-6A273CE59470C7A-27758A4251131776
#datatype hitsConcise Results
#status 0
#Start time 2016-06-03T14:59:15 Run time 0:00:00:12
#status success

Query               Hit type     PSSM-ID  From  To    E-Value      Bitscore  Accession   Short name                   Incomplete  Superfamily
Q#1 - >At1g01220_1  specific     249743   819   877   2.51156e-10  56.0224   pfam00288   GHMP_kinases_N               -           cl21671
Q#1 - >At1g01220_1  superfamily  272091   819   877   2.51156e-10  56.0224   cl21671     GHMP_kinases_N superfamily   -           -
Q#1 - >At1g01220_1  specific     254869   962   1043  2.94031e-07  47.8594   pfam08544   GHMP_kinases_C               -           cl08484
Q#1 - >At1g01220_1  superfamily  254869   962   1043  2.94031e-07  47.8594   cl08484     GHMP_kinases_C superfamily   -           -
Q#1 - >At1g01220_1  multi-dom    254543   131   531   0            563.303   pfam07959   Fucokinase                   -           -
Q#1 - >At1g01220_1  multi-dom    225325   718   1048  5.17247e-41  152.917   COG2605     COG2605                      -           -
Q#2 - >At1g01220_2  superfamily  252914   1     82    8.30867e-23  87.0152   cl04867     LRAT superfamily             N           -
Q#3 - >At1g01350_1  specific     238093   277   318   2.90528e-09  50.9072   cd00162     RING                         -           cl17238
Q#3 - >At1g01350_1  superfamily  276201   277   318   2.90528e-09  50.9072   cl17238     RING superfamily             -           -
Q#3 - >At1g01350_1  specific     250023   201   227   4.96868e-07  44.0832   pfam00642   zf-CCCH                      -           cl11592
Q#3 - >At1g01350_1  superfamily  264437   201   227   4.96868e-07  44.0832   cl11592     zf-CCCH superfamily          -           -
Q#4 - >At1g01350_2  specific     199896   2     147   8.00992e-62  188.701   cd10910     limkain_b1_N_like            -           cl10034
Q#4 - >At1g01350_2  superfamily  275915   2     147   8.00992e-62  188.701   cl10034     LabA_like/DUF88 superfamily  -           -
------------------------------------------------------------------------
Batch CD-Search
http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi
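It can align many protein sequences in a single run (up to 100,000 per request).
It is available both as a web tool and through a scripting interface.
Once the search has finished, the annotation results can be downloaded.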
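For the scripting interface, here is a rough sketch of the submit-then-poll flow in Python, not a verified client. The endpoint is the URL given above (written with https). Everything else is an assumption on my part and should be checked against NCBI's Batch CD-Search help page before use: the form field names (`db`, `evalue`, `dmode`, `tdata`, `queries`, `cdsid`) and the meaning of status code 3 ("still running"). The function names are made up for this example. The only parts grounded in these notes are the `#cdsid` and `#status` header lines, which appear in the downloaded results.

```python
import re
import time
import urllib.parse
import urllib.request

# Same endpoint as the URL above, written with https (assumption).
BWRPSB_URL = "https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi"


def build_submit_fields(fasta_text, evalue=0.01):
    """Form fields for a new search. All field names here are assumptions."""
    return [
        ("db", "cdd"),            # database to search (assumed name/value)
        ("evalue", str(evalue)),  # E-value cutoff (assumed name)
        ("dmode", "rep"),         # assumed to select the concise hit set
        ("tdata", "hits"),        # assumed to request the domain-hit table
        ("queries", fasta_text),  # the query proteins
    ]


def parse_header(text):
    """Return (cdsid, numeric status) from the '#'-prefixed header lines."""
    cdsid = re.search(r"^#cdsid\s+(\S+)", text, re.M)
    status = re.search(r"^#status\s+(\d+)", text, re.M)
    return (
        cdsid.group(1) if cdsid else None,
        int(status.group(1)) if status else None,
    )


def post(fields):
    """POST the form fields and return the response body as text."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(BWRPSB_URL, data=data, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_batch(fasta_text, poll_seconds=5, max_polls=120):
    """Submit a search, then poll with the returned cdsid until it is done."""
    cdsid, _ = parse_header(post(build_submit_fields(fasta_text)))
    if cdsid is None:
        raise RuntimeError("no #cdsid line in the response")
    for _ in range(max_polls):
        time.sleep(poll_seconds)
        text = post([("tdata", "hits"), ("cdsid", cdsid)])
        _, status = parse_header(text)
        if status == 0:   # "#status 0" accompanies "#status success" in the results
            return text
        if status != 3:   # assumption: 3 means the job is still running
            raise RuntimeError(f"search {cdsid} failed with status {status}")
    raise TimeoutError(f"search {cdsid} did not finish in time")
```

Calling `run_batch(open("proteins.fasta").read())` (the file name is hypothetical) would return the same kind of text as the downloaded results, which can then be saved to disk.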
Sequences:
>At1g01220_1
MSKQRKKADLATVLRKSWYHLRLSVRHPTRVPTWDAIVLTAASPEQAELYDWQLRRAKRMGRIASSTVTL
AVPDPDGKRIGSGAATLNAIYALARHYEKLGFDLGPEMEVANGACKWVRFISAKHVLMLHAGGDSKRVPW
ANPMGKVFLPLPYLAADDPDGPVPLLFDHILAIASCARQAFQDQGGLFIMTGDVLPCFDAFKMTLPEDAA
SIVTVPITLDIASNHGVIVTSKSESLAESYTVSLVNDLLQKPTVEDLVKKDAILHDGRTLLDTGIISARG
RAWSDLVALGCSCQPMILELIGSKKEMSLYEDLVAAWVPSRHDWLRTRPLGELLVNSLGRQKMYSYCTYD
LQFLHFGTSSEVLDHLSGDASGIVGRRHLCSIPATTVSDIAASSVILSSEIAPGVSIGEDSLIYDSTVSG
AVQIGSQSIVVGIHIPSEDLGTPESFRFMLPDRHCLWEVPLVGHKGRVIVYCGLHDNPKNSIHKDGTFCG
KPLEKVLFDLGIEESDLWSSYVAQDRCLWNAKLFPILTYSEMLKLASWLMGLDDSRNKEKIKLWRSSQRV
SLEELHGSINFPEMCNGSSNHQADLAGGIAKACMNYGMLGRNLSQLCHEILQKESLGLEICKNFLDQCPK
FQEQNSKILPKSRAYQVEVDLLRACGDEAKAIELEHKVWGAVAEETASAVRYGFREHLLESSGKSHSENH
ISHPDRVFQPRRTKVELPVRVDFVGGWSDTPPWSLERAGYVLNMAITLEGSLPIGTIIETTNQMGISIQD
DAGNELHIEDPISIKTPFEVNDPFRLVKSALLVTGIVQENFVDSTGLAIKTWANVPRGSGLGTSSILAAA
VVKGLLQISNGDESNENIARLVLVLEQLMGTGGGWQDQIGGLYPGIKFTSSFPGIPMRLQVVPLLASPQL
ISELEQRLLVVFTGQVRLAHQVLHKVVTRYLQRDNLLISSIKRLTELAKSGREALMNCEVDEVGDIMSEA
WRLHQELDPYCSNEFVDKLFEFSQPYSSGFKLVGAGGGGFSLILAKDAEKAKELRQRLEEHAEFDVKVYN
WSIFQRPPSCEVTVLPLPGIKVKRPRKISTLVAFGFGDNAVKRLCNGQPDS
>At1g01220_2
GVVLSCLDCFLKNGSLYCFEYGVSPSVFLTKVRGGTCTTAQSDTTDSVIHRAMYLLQNGFGNYDIFKNNC
EDFALYCKTGLLIMDKLGVGRSGQASSIVGAPLAALLSSPFKLLIPSPIGVATVTAGMYCMSRYATDIGV
RSDVIKVSVEDLALNLDVKTIEQGEEEEEDEEEDSDTDYVR
>At1g01350_1
MSDSGEPKPSQQEEPLPQPAAQETQSQQVCTFFKKPTKSKNIRKRTIDADEEDGDSKSESSILQNLKKVA
KPDSKLYFSSGPSKSSTTTSGAPERSVFHYDSSKEIQVQNDSGATATLETETDFNQDARAIRERVLKKAD
EALKGNKKKASDEKLYKGIHGYTDHKAGFRREQTISSEKAGGSHGPLRASAHIRVSARFDYQPDICKDYK
ETGYCGYGDSCKFLHDRGDYKPGWQIEKEWEEAEKVRKRNKAMGVEDEDDEADKDSDEDENALPFACFIC
REPFVDPVVTKCKHYFCEHCALKHHTKNKKCFVCNQPTMGIFNAAHEIKKRMAEERSKAEQGLRRLWPVG
EVVPKSK
>At1g01350_2
AVTRVWWDINRCPVPADVDVRRVGPCIKRALEKLGYSGPLTITAVGILTDVPHDFLRQVHSSGIALHHVP
TVSETALSGIGWAVVKWTWYNQPPANLMLISYEHIYLTTLDMLGRIGYNTVRSILPDDPQQAASSASPST
GSFLWESLLASLPAADDMDSGAPQEDKCGEMGEPALLCEQCRFTVQGFENFSTHLKSEEHAHESQYYRNE
DDETDGDDYVRDSEDDEEE
>At1g01660_1
MAELMAMGNDVVHVAVKSDVRESRSTLLWALRNLGAKKVCILHVYQPKTASPAARKLEELEAIMYETLHD
YFDFCQQEGVNEDDIYISCIEMNDVKQGILELIHESKIKKLVMGAASDHHYSEKMFDLKSRKAKYVYQHA
PSSCEVMFMCDGHLIYTKEANLEDCMGETESEAGQSKPKLYSSASPKCSAELVSAIVAYIDTRRDRDMLE
PNASEDQSESDRNDQLYRQLKQALMEVEESKREAYEECVRRFKAENTAVEAIRSAREYEAMYNEEAKLRK
EGKEALAKQRKMVEKTKQERDDALIIILNGRKLYNEELRRRVEAEEMLGKEKEEHERTKKEIEEVRAIVQ
DGTLYNEQLRHRKEMEESMKRQEEELEKTKKEKEEACMISKNLMQLYEDEVRQRKEAEELVKRRREELEK
VKKEKEEACSVGQNFMRLYEEEARRRKGTEEELSKVAAEKDAASSVCSEILLLLQSYTRRHGTPSGFSDE
DSVTRQPP
>At1g01660_2
SYFICPISQEVMREPRVAADGFTYEAESLREWLDNGHETSPMTNLKLAHNNLVPNHALRSAIQEWLQRNS
>At1g02230_1
MMNPVGFRFRPNDEEIVDHYLRPKNLDSDTSHVDEVISTVDICSFEPWDLPSKSMIKSRDGVWYFFSVKE
MKYNRGDQQRRRTNSGFWKKTGKTMTVMRKRGNREKIGEKRVLVFKNRDGSKTDWVMHEYHATSLFPNQM
MTYTVCKVEFKGEETEISSSSTGSEIEQIHSLIPLVNSSGGSEVGESQTFSVLSLETLKYLTLVFLPFFR
GESQIDDATTPIEEEWKTWLNNDGDEQRNIMFMQDHRSDYTPLKSLTGVFSDDSSDDNDSDLISPKTNSI
GTSSTCASFASSNHQIDQTQHSPDSTVQLVSLTQEVSQGPGQVTVIREHKLGEESVKKKRASFVYRMIHR
LVKKIHQC
Results:
#Batch CD-search tool NIH/NLM/NCBI
#cdsid QM3-qcdsearch-1C0005B737A3865C-39D60DE8A7221C8D
#datatype hitsConcise Results
#status 0
#Start time 2016-06-04T01:48:41 Run time 0:00:00:14
#status success

Query               Hit type     PSSM-ID  From  To    E-Value      Bitscore  Accession   Short name                   Incomplete  Superfamily
Q#1 - >At1g01220_1  specific     249743   819   877   2.51156e-10  56.0224   pfam00288   GHMP_kinases_N               -           cl21671
Q#1 - >At1g01220_1  superfamily  272091   819   877   2.51156e-10  56.0224   cl21671     GHMP_kinases_N superfamily   -           -
Q#1 - >At1g01220_1  specific     254869   962   1043  2.94031e-07  47.8594   pfam08544   GHMP_kinases_C               -           cl08484
Q#1 - >At1g01220_1  superfamily  254869   962   1043  2.94031e-07  47.8594   cl08484     GHMP_kinases_C superfamily   -           -
Q#1 - >At1g01220_1  multi-dom    254543   131   531   0            563.303   pfam07959   Fucokinase                   -           -
Q#1 - >At1g01220_1  multi-dom    225325   718   1048  5.17247e-41  152.917   COG2605     COG2605                      -           -
Q#2 - >At1g01220_2  superfamily  252914   1     82    8.30867e-23  87.0152   cl04867     LRAT superfamily             N           -
Q#3 - >At1g01350_1  specific     238093   277   318   2.90528e-09  50.9072   cd00162     RING                         -           cl17238
Q#3 - >At1g01350_1  superfamily  276201   277   318   2.90528e-09  50.9072   cl17238     RING superfamily             -           -
Q#3 - >At1g01350_1  specific     250023   201   227   4.96868e-07  44.0832   pfam00642   zf-CCCH                      -           cl11592
Q#3 - >At1g01350_1  superfamily  264437   201   227   4.96868e-07  44.0832   cl11592     zf-CCCH superfamily          -           -
Q#4 - >At1g01350_2  specific     199896   2     147   8.00992e-62  188.701   cd10910     limkain_b1_N_like            -           cl10034
Q#4 - >At1g01350_2  superfamily  275915   2     147   8.00992e-62  188.701   cl10034     LabA_like/DUF88 superfamily  -           -
Q#5 - >At1g01660_1  specific     238947   11    152   1.00963e-46  157.823   cd01989     STK_N                        -           cl00292
Q#5 - >At1g01660_1  superfamily  275515   11    152   1.00963e-46  157.823   cl00292     AANH_like superfamily        -           -
Q#6 - >At1g01660_2  specific     128780   2     65    5.29778e-30  98.8465   smart00504  Ubox                         -           cl17238
Q#6 - >At1g01660_2  superfamily  276201   2     65    5.29778e-30  98.8465   cl17238     RING superfamily             -           -
Q#7 - >At1g02230_1  specific     251251   4     133   2.83212e-63  196.656   pfam02365   NAM                          -           cl03558
Q#7 - >At1g02230_1  superfamily  251251   4     133   2.83212e-63  196.656   cl03558     NAM superfamily              -           -
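To work with results like these in a script, a small parser is enough. The sketch below assumes the downloaded file is tab-delimited with the eleven columns shown in the header row; `parse_hits` and `specific_domains` are names made up for this example, and the sample rows are copied from the At1g01350_1 hits above.

```python
from collections import defaultdict

# Column names as they appear in the header row of the result file.
COLUMNS = ["Query", "Hit type", "PSSM-ID", "From", "To", "E-Value",
           "Bitscore", "Accession", "Short name", "Incomplete", "Superfamily"]


def parse_hits(text):
    """Turn tab-delimited hit rows into dicts, skipping '#' lines and the header."""
    hits = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        if line.startswith("#") or fields[0] == "Query" or len(fields) != len(COLUMNS):
            continue
        rec = dict(zip(COLUMNS, fields))
        # "Q#3 - >At1g01350_1" -> "At1g01350_1"
        rec["Query"] = rec["Query"].split(">", 1)[-1]
        rec["From"], rec["To"] = int(rec["From"]), int(rec["To"])
        rec["E-Value"] = float(rec["E-Value"])
        rec["Bitscore"] = float(rec["Bitscore"])
        hits.append(rec)
    return hits


def specific_domains(hits):
    """Map each query to its 'specific' hits as (from, to, short name), sorted by position."""
    by_query = defaultdict(list)
    for h in hits:
        if h["Hit type"] == "specific":
            by_query[h["Query"]].append((h["From"], h["To"], h["Short name"]))
    return {query: sorted(doms) for query, doms in by_query.items()}


# A few rows from the results above, rebuilt with tab separators.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["#status", "0"],
    COLUMNS,
    ["Q#3 - >At1g01350_1", "specific", "238093", "277", "318", "2.90528e-09",
     "50.9072", "cd00162", "RING", "-", "cl17238"],
    ["Q#3 - >At1g01350_1", "superfamily", "276201", "277", "318", "2.90528e-09",
     "50.9072", "cl17238", "RING superfamily", "-", "-"],
    ["Q#3 - >At1g01350_1", "specific", "250023", "201", "227", "4.96868e-07",
     "44.0832", "pfam00642", "zf-CCCH", "-", "cl11592"],
    ["Q#3 - >At1g01350_1", "superfamily", "264437", "201", "227", "4.96868e-07",
     "44.0832", "cl11592", "zf-CCCH superfamily", "-", "-"],
])

for query, domains in specific_domains(parse_hits(SAMPLE)).items():
    print(query, domains)
```

Keeping only the "specific" rows avoids counting the same region twice, since each specific hit here is paired with a superfamily row covering the same From/To range.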
------------------------------------------------------------------------
domain: field, scope