SWISS-PROT Format Description


SWISS-PROT is a curated protein sequence database that strives to provide a high level of annotation (such as the description of a protein's function, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.

Record format

Each sequence entry is composed of lines. Different types of lines, each with their own format, are used to record the various data which make up the entry.

Each line begins with a two-character line code, which indicates the type of data contained in the line. The current line types and line codes, in the order in which they appear in an entry, are shown below:

    ID     - Identification.
    AC     - Accession number(s).
    DT     - Date.
    DE     - Description.
    GN     - Gene name(s).
    OS     - Organism species.
    OG     - Organelle.
    OC     - Organism classification.
    RN     - Reference number.
    RP     - Reference position.
    RC     - Reference comments.
    RX     - Reference cross-references.
    RA     - Reference authors.
    RL     - Reference location.
    CC     - Comments or notes.
    DR     - Database cross-references.
    KW     - Keywords.
    FT     - Feature table data.
    SQ     - Sequence header.
           - (blanks) sequence data.
    //     - Termination line.
    

The program ignores all other line types and uses only the following: 'ID', 'DE', 'OS', 'SQ' and '//'.

  • The program uses the 'ENTRY_NAME', which is the first field of the 'ID' line, as the first line of the title.
  • The data of the 'DE' and 'OS' lines are collected by the program and used as the remaining lines of the title.
  • The 'SQ' line marks the beginning of the sequence. The program collects all following lines until the termination line ('//') is found or the end of the input is reached.
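
The rules above can be sketched in Python as follows. This is only an illustration of the described behaviour, not the program's actual code; the function name `parse_entry` and its return shape are assumptions made for this example.

```python
def parse_entry(lines):
    """Return (title_lines, sequence) for one SWISS-PROT entry.

    Hypothetical helper: mirrors the rules listed above, nothing more.
    """
    entry_name = None
    details = []        # text of the DE and OS lines, in order of appearance
    residues = []       # sequence chunks collected after the SQ line
    in_sequence = False

    for raw in lines:
        line = raw.rstrip("\n")
        code = line[:2]             # two-character line code
        if code == "//":            # termination line ends the entry
            break
        if in_sequence:
            # Sequence lines start with blanks; drop all whitespace.
            residues.append("".join(line.split()))
        elif code == "ID":
            fields = line[2:].split()
            if fields:
                entry_name = fields[0]   # ENTRY_NAME is the first field
        elif code in ("DE", "OS"):
            details.append(line[2:].strip())
        elif code == "SQ":
            in_sequence = True      # everything after this is sequence data
        # All other line types are ignored.

    title = ([entry_name] if entry_name else []) + details
    return title, "".join(residues)
```

If the input ends without a '//' line, the loop simply stops at the last line, which matches the "or the end of the input is reached" case.
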
Useful links:

More information about SWISS-PROT

THE SWISS-PROT PROTEIN SEQUENCE DATA BANK - USER MANUAL



Example of 'ID' lines:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
ID   CYC_BOVIN      STANDARD;      PRT;   104 AA.
ID   GIA2_GIALA     PRELIMINARY;   PRT;   296 AA.
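
As the two examples show, an 'ID' line carries the entry name, the data class, the molecule type and the sequence length. A minimal sketch for splitting such a line into its fields might look like this; the pattern is derived only from the examples above, and the function and key names are made up for illustration.

```python
import re

# Pattern inferred from the two example lines above (an assumption,
# not an official grammar): name, data class, molecule type, length.
ID_PATTERN = re.compile(r"^ID\s+(\S+)\s+(\w+);\s+(\w+);\s+(\d+)\s+AA\.$")

def parse_id_line(line):
    """Split an 'ID' line into its four fields (hypothetical helper)."""
    match = ID_PATTERN.match(line.rstrip())
    if match is None:
        raise ValueError("not a recognised ID line: %r" % line)
    name, data_class, molecule_type, length = match.groups()
    return {
        "entry_name": name,
        "data_class": data_class,
        "molecule_type": molecule_type,
        "length": int(length),   # number of amino acids
    }
```

Only the `entry_name` field is needed for the title; the other fields are returned here for completeness.
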
    

Example of 'DE' lines:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
DE   NADH DEHYDROGENASE (EC 1.6.99.3).

DE   LYSOPINE DEHYDROGENASE (EC 1.5.1.16) (OCTOPINE SYNTHASE)
DE   (LYSOPINE SYNTHASE) (FRAGMENT).
    

Example of 'OS' lines:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
OS   ESCHERICHIA COLI.
OS   HOMO SAPIENS (HUMAN).
OS   ROUS SARCOMA VIRUS (STRAIN SCHMIDT-RUPPIN).
OS   NAJA NAJA (INDIAN COBRA), AND NAJA NIVEA (CAPE COBRA).
    
Example of a sequence specification:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
SQ   SEQUENCE   233 AA;  25644 MW;  666D7069 CRC32;
     MSTESMIRDV ELAEEALPKK TGGPQGSRRC LFLSLFSFLI VAGATTLFCL LHFGVIGPQR
     EEFPRDLSLI SPLAQAVRSS SRTPSDKPVA HVVANPQAEG QLQWLNRRAN ALLANGVELR
     DNQLVVPSEG LYLIYSQVLF KGQGCPSTHV LLTHTISRIA VSYQTKVNLL SAIKSPCQRE
     TPEGAEAKPW YEPIYLGGVF QLEKGDRLSA EINRPDYLDF AESGQVYFGI IAL
//
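
The 'SQ' header declares the sequence length in amino acids, so a reader can cross-check it against the residues that follow. The sketch below is one way to do that; the helper names are hypothetical, and the header pattern is inferred from the example above only.

```python
import re

def declared_length(sq_line):
    """Read the 'NNN AA' length from an SQ header line."""
    match = re.match(r"^SQ\s+SEQUENCE\s+(\d+)\s+AA;", sq_line)
    if match is None:
        raise ValueError("not a recognised SQ line: %r" % sq_line)
    return int(match.group(1))

def length_matches(sq_line, sequence_lines):
    """True if the residue count equals the length declared in the header."""
    # Remove the leading blanks and the spaces between 10-residue blocks.
    residues = "".join("".join(line.split()) for line in sequence_lines)
    return len(residues) == declared_length(sq_line)
```

Here `sequence_lines` is expected to hold only the lines between the 'SQ' header and the '//' termination line.
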
    