NCBI Reference Sequence (RefSeq)

For basic information about RefSeq, see the following articles:

http://liucheng.name/381/

http://www.biosino.org/pages/ncbi-10.htm

Official version: http://www.ncbi.nlm.nih.gov/RefSeq/RSfaq.html

 

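What I care about more right now, though, is the RefSeq format description. My recent failure was a reminder that, when analyzing data, you must first understand what each data field means.

The accession prefixes are listed here for easy reference: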

Accession | Molecule | Method @ | Note | Summary
--- | --- | --- | --- | ---
AC_123456 | Genomic | Mixed | Alternate complete genomic molecule. This prefix is used for records that are provided to reflect an alternate assembly or annotation. Primarily used for viral, prokaryotic records. | Genomic sequence, mainly viral and prokaryotic.
AP_123456 | Protein | Mixed | Protein products; alternate protein record. This prefix is used for records that are provided to reflect an alternate assembly or annotation. The AP_ prefix was originally designated for bacterial proteins but this usage was changed. | Protein sequence; AP_ was originally used only for bacterial proteins.
NC_123456 | Genomic | Mixed | Complete genomic molecules including genomes, chromosomes, organelles, plasmids. | Complete genome sequences, including organelles and plasmids.
NG_123456 | Genomic | Mixed | Incomplete genomic region; supplied to support the NCBI genome annotation pipeline. Represents either non-transcribed pseudogenes, or larger regions representing a gene cluster that is difficult to annotate via automatic methods. | Incomplete genomic sequence.
NM_123456, NM_123456789 | mRNA | Mixed | Transcript products; mature messenger RNA (mRNA) transcripts. | Mature mRNA.
NP_123456, NP_123456789 | Protein | Mixed | Protein products; primarily full-length precursor products but may include some partial proteins and mature peptide products. | Full-length protein sequences, but may also include partial proteins or mature peptides.
NR_123456 | RNA | Mixed | Non-coding transcripts including structural RNAs, transcribed pseudogenes, and others. | Non-coding RNA, transcribed pseudogenes, and others.
NT_123456 | Genomic | Automated | Intermediate genomic assemblies of BAC and/or Whole Genome Shotgun sequence data. | Genomic sequence assembled from BAC or shotgun data.
NW_123456, NW_123456789 | Genomic | Automated | Intermediate genomic assemblies of BAC or Whole Genome Shotgun sequence data. | Genomic sequence assembled from BAC or shotgun data.
NZ_ABCD12345678 | Genomic | Automated | A collection of whole genome shotgun sequence data for a project. Accessions are not tracked between releases. The first four characters following the underscore (e.g. 'ABCD') identify a genome project. | 'ABCD' stands for the specific genome project.
XM_123456, XM_123456789 | mRNA | Automated | Transcript products; model mRNA provided by a genome annotation process; sequence corresponds to the genomic contig. | Model transcript sequence.
XP_123456, XP_123456789 | Protein | Automated | Protein products; model proteins provided by a genome annotation process; sequence corresponds to the genomic contig. | Model protein sequence.
XR_123456 | RNA | Automated | Transcript products; model non-coding transcripts provided by a genome annotation process; sequence corresponds to the genomic contig. | Model non-coding transcript sequence.
YP_123456, YP_123456789 | Protein | Mixed | Protein products; no corresponding transcript record provided. Primarily used for bacterial, viral, and mitochondrial records. | Protein sequence with no corresponding transcript record; used for bacteria, viruses, and mitochondria.
ZP_12345678 | Protein | Automated | Protein products; annotated on NZ_ accessions (often via computational methods). | Protein sequence derived from the corresponding NZ_ nucleotide record.
NS_123456 | Genomic | Automated | Genomic records that represent an assembly which does not reflect the structure of a real biological molecule. The assembly may represent an unordered assembly of unplaced scaffolds, or it may represent an assembly of DNA sequences generated from a biological sample that may not represent a single organism. | A more complicated case.
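
Since the point of the table is to know what a record is before analyzing it, the lookup can be sketched in Python. This is only a minimal sketch: the dictionary is copied from the Molecule and Method columns above, and the names `REFSEQ_PREFIXES` and `classify_accession` are made up for illustration, not part of any NCBI tool.

```python
# Prefix -> (molecule type, method), copied from the table above.
REFSEQ_PREFIXES = {
    "AC": ("Genomic", "Mixed"),
    "AP": ("Protein", "Mixed"),
    "NC": ("Genomic", "Mixed"),
    "NG": ("Genomic", "Mixed"),
    "NM": ("mRNA", "Mixed"),
    "NP": ("Protein", "Mixed"),
    "NR": ("RNA", "Mixed"),
    "NT": ("Genomic", "Automated"),
    "NW": ("Genomic", "Automated"),
    "NZ": ("Genomic", "Automated"),
    "XM": ("mRNA", "Automated"),
    "XP": ("Protein", "Automated"),
    "XR": ("RNA", "Automated"),
    "YP": ("Protein", "Mixed"),
    "ZP": ("Protein", "Automated"),
    "NS": ("Genomic", "Automated"),
}


def classify_accession(accession):
    """Return (molecule, method) for a RefSeq-style accession, or None.

    Hypothetical helper: it only inspects the text before the first
    underscore and does not validate the digits that follow.
    """
    prefix, sep, rest = accession.strip().partition("_")
    if not sep or not rest:
        return None  # no underscore, or nothing after it
    return REFSEQ_PREFIXES.get(prefix.upper())
```

For example, `classify_accession("NM_123456")` looks up the `NM` row, and an identifier with an unlisted prefix returns `None` rather than a guess.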

@ Method:
Mixed: indicates the process flow includes both automated processing and expert review for some of the records; curation analysis may be provided either by NCBI staff or collaborators. (Some records have been manually reviewed by experts.)
Automated: indicates records that are not individually reviewed; updates are released in bulk for a genome. (Annotated automatically.)
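
When a result list mixes curated and model records, it may help to separate them by Method before drawing conclusions. The sketch below assumes the prefix sets implied by the Method column; `split_by_method` is a hypothetical helper name. Note that "Mixed" only means expert review happened for some records, so it does not guarantee any particular record was reviewed.

```python
# Prefixes whose Method column reads "Mixed" in the table.
MIXED_PREFIXES = {"AC", "AP", "NC", "NG", "NM", "NP", "NR", "YP"}
# Prefixes whose Method column reads "Automated" in the table.
AUTOMATED_PREFIXES = {"NT", "NW", "NZ", "XM", "XP", "XR", "ZP", "NS"}


def split_by_method(accessions):
    """Group accessions into Mixed / Automated / Unknown by prefix."""
    groups = {"Mixed": [], "Automated": [], "Unknown": []}
    for acc in accessions:
        # The prefix is everything before the first underscore.
        prefix = acc.strip().split("_", 1)[0].upper()
        if prefix in MIXED_PREFIXES:
            groups["Mixed"].append(acc)
        elif prefix in AUTOMATED_PREFIXES:
            groups["Automated"].append(acc)
        else:
            groups["Unknown"].append(acc)
    return groups


if __name__ == "__main__":
    print(split_by_method(["NM_123456", "XM_123456", "YP_123456", "foo"]))
```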

For more: http://www.ncbi.nlm.nih.gov/RefSeq/key.html#accession
