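The FASTA format, also known as the Pearson format, requires the header line of each sequence to begin with a greater-than sign ">", with the sequence itself starting on the next line. Keeping each line to no more than 60 characters is generally recommended, because it makes the file easier for programs to process. A file holding multiple nucleotide sequences simply lists records in this format one after another.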
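As a minimal sketch of the layout described above, the following Python function writes one record with a ">" header line and the sequence wrapped at 60 characters. The function name `format_fasta`, its parameters, and the sample sequence are hypothetical, chosen only for illustration.

```python
def format_fasta(name, sequence, description="", width=60):
    """Return one FASTA record as text, wrapping the sequence at `width` characters.

    `name`, `description`, and `width` are illustrative parameters, not part
    of any standard API.
    """
    # Header line: ">" immediately followed by the name, then an optional description.
    header = f">{name} {description}".rstrip()
    # Slice the sequence into fixed-width chunks, one chunk per line.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


# Hypothetical 80-character nucleotide sequence, written as two sequence lines.
print(format_fasta("seq1", "ACGT" * 20, description="hypothetical example"), end="")
```

Calling the function once per sequence and concatenating the results gives a multi-sequence file.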
- This format contains a single header line providing the sequence name, and optionally a description, followed by lines of sequence data.
- Each sequence in a FASTA-formatted file is preceded by a line starting with a ">" symbol.
- The first word on this line is the name of the sequence. The rest of the line is a description of the sequence.
| Term | Entry Name | Molecule Type | Gene Name | Sequence Length |
|------|------------|---------------|-----------|-----------------|
| e.g. | FOSB_MOUSE | Protein | fosB | 338 aa |
- The remaining lines contain the sequence itself, usually formatted to 60 characters per line.
- Depending on the application, blank lines in a FASTA file are either ignored or treated as terminating the sequence.
- Depending on the application, spaces and other non-sequence symbols (dashes, underscores, periods) in a sequence are either ignored or treated as gaps.
- FASTA files containing multiple sequences follow the same layout, with one sequence listed right after another. This format is accepted by many multiple sequence alignment programs.
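The rules above can be sketched as a small Python parser. This is one possible reading, not a reference implementation: the function name `parse_fasta` and the `keep_gaps` flag are hypothetical, and the sketch assumes that blank lines are ignored, that spaces inside a sequence are dropped, and that dashes, underscores, and periods are kept as gap symbols unless `keep_gaps` is false. Applications that treat a blank line as the end of a sequence would need different handling.

```python
def parse_fasta(text, keep_gaps=True):
    """Parse FASTA text into a list of (name, description, sequence) tuples.

    Assumptions made for this sketch: blank lines are skipped, whitespace
    inside sequence lines is removed, and "-", "_", "." are kept as gaps
    unless keep_gaps is False.
    """
    records = []
    name, description, chunks = None, "", []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, description, "".join(chunks)))
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""  # first word is the name
            description = parts[1].strip() if len(parts) > 1 else ""
            chunks = []
        elif name is not None:
            seq = "".join(line.split())  # drop spaces inside the sequence
            if not keep_gaps:
                seq = seq.translate(str.maketrans("", "", "-_."))
            chunks.append(seq)
    if name is not None:
        records.append((name, description, "".join(chunks)))
    return records


# Hypothetical two-record input, used only to exercise the parser.
example = ">seq1 first hypothetical sequence\nACGT-ACGT\nACGT\n\n>seq2\nTTGA\n"
for rec_name, rec_desc, rec_seq in parse_fasta(example):
    print(rec_name, repr(rec_desc), rec_seq)
```

Lines that appear before the first header are silently skipped here; a stricter parser might raise an error instead.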