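Background
The KEGG.db package has not been updated since 2011. When clusterProfiler runs a KEGG analysis, it can fetch the required species information through the API that KEGG provides, but the KEGG server is sometimes unreachable, which raises errors and interrupts batch analyses. In addition, the Docker containers we currently run in the cloud have no network access (this is not caused by the cloud platform itself). I therefore need to prepare the data myself so that the analysis produces more complete and accurate results.
Inspecting the basic contents of KEGG.db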
> library("KEGG.db")
> ls("package:KEGG.db") # list the objects exported by the package
[1] "KEGG" "KEGG_dbconn" "KEGG_dbfile" "KEGG_dbInfo" "KEGG_dbschema"
[6] "KEGGENZYMEID2GO" "KEGGEXTID2PATHID" "KEGGGO2ENZYMEID" "KEGGMAPCOUNTS" "KEGGPATHID2EXTID"
[11] "KEGGPATHID2NAME" "KEGGPATHNAME2ID"
> columns(KEGG.db)
Error in columns(KEGG.db) : object 'KEGG.db' not found
> # columns(org.Hs.eg.db) # show columns in .db
> # columns(KEGG.db) # Error in columns(KEGG.db) : object 'KEGG.db' not found
> # available.dbschemas()
> frame = toTable(KEGGPATHID2EXTID)
> head(frame)
  pathway_id gene_or_orf_id
1   hsa00232             10
2   hsa00983             10
3   hsa01100             10
4   hsa00230            100
5   hsa01100            100
6   hsa05340            100
> frame = toTable(KEGGPATHID2NAME)
> head(frame)
   path_id                                path_name
1 hsa00010             Glycolysis / Gluconeogenesis
2 hsa00020                Citrate cycle (TCA cycle)
3 hsa00030                Pentose phosphate pathway
4 hsa00040 Pentose and glucuronate interconversions
5 hsa00051          Fructose and mannose metabolism
6 hsa00052                     Galactose metabolism
> class(KEGGPATHID2EXTID)
[1] "AnnDbBimap"
attr(,"package")
[1] "AnnotationDbi"
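The objects listed by ls("package:KEGG.db") are AnnObj instances (Bimap objects). Bimaps are an older AnnotationDbi interface whose use is no longer encouraged. The select interface (columns(), keytypes(), select() and so on; columns() replaced the older cols()) is recommended instead. Note that columns(KEGG.db) fails in the session above because the package exports only Bimaps and no object named KEGG.db.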
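As a rough illustration of the select interface mentioned above, the sketch below queries org.Hs.eg.db, the package referenced in the commented-out line of the session. It assumes the Bioconductor packages AnnotationDbi and org.Hs.eg.db are installed; the two Entrez IDs are simply the ones that appear in the KEGGPATHID2EXTID table.

```r
# Assumption: AnnotationDbi and org.Hs.eg.db are installed from Bioconductor.
library(AnnotationDbi)
library(org.Hs.eg.db)

columns(org.Hs.eg.db)    # fields that can be retrieved
keytypes(org.Hs.eg.db)   # fields that can be used as lookup keys

# Look up two Entrez gene IDs taken from the table shown earlier.
res <- select(org.Hs.eg.db,
              keys    = c("10", "100"),
              columns = c("SYMBOL", "GENENAME"),
              keytype = "ENTREZID")
print(res)
```

Unlike toTable() on a Bimap, select() returns an ordinary data.frame whose columns are chosen explicitly in the call.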
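Code for building a simplified KEGG.db R package
0. The final directory structure is as follows:
KEGG (directory, created manually)
├── DESCRIPTION (file, created manually)
├── inst (directory)
│   └── extdata (directory)
│       └── KEGG.sqlite (file, generated by code)
├── LICENSE (file, created manually)
├── NAMESPACE (file, created manually)
└── R (directory, created manually)
    └── zzz.R (file, created manually)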
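The skeleton can also be created from R instead of by hand. The following is a minimal sketch using only base R; the location pkg_dir is a hypothetical choice (a temporary directory) and should be changed to wherever the package source is meant to live. KEGG.sqlite is not created here because it is generated separately.

```r
# Hypothetical location for the package source; change as needed.
pkg_dir <- file.path(tempdir(), "KEGG")

# Create the directories from the tree (recursive = TRUE also creates pkg_dir).
dir.create(file.path(pkg_dir, "inst", "extdata"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "R"), showWarnings = FALSE)

# Create empty placeholders for the files that are written by hand.
manual_files <- file.path(
  pkg_dir,
  c("DESCRIPTION", "LICENSE", "NAMESPACE", file.path("R", "zzz.R"))
)
file.create(manual_files)

# Show what now exists under pkg_dir.
list.files(pkg_dir, recursive = TRUE, include.dirs = TRUE)
```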
1. The DESCRIPTION file
Written with reference to the 2011 release of KEGG.db:
Package: KEGG.db
Title: KEGG.db for KEGG enrichment analysis.
Description: KEGG.db for KEGG enrichment analysis.
Version: 1.0
Author: xxx
Maintainer: xxx
Depends: R (>= 2.7.0), methods, AnnotationDbi (>= 1.44.0)
Imports: methods, AnnotationDbi
Suggests: DBI
License: BSD
License_restricts_use: yes
biocViews: AnnotationData, FunctionalAnnotation
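Before building, it can be worth checking that the DESCRIPTION file parses. The sketch below writes the fields listed above to a temporary file and reads them back with base R's read.dcf(). The temporary path and the short list of fields in `required` are assumptions made for illustration, not an official checklist.

```r
# Write the DESCRIPTION fields to a temporary file.
# (In the real package this would be KEGG/DESCRIPTION; the temp path is illustrative.)
desc_path <- tempfile("DESCRIPTION_")
writeLines(c(
  "Package: KEGG.db",
  "Title: KEGG.db for KEGG enrichment analysis.",
  "Description: KEGG.db for KEGG enrichment analysis.",
  "Version: 1.0",
  "Author: xxx",
  "Maintainer: xxx",
  "Depends: R (>= 2.7.0), methods, AnnotationDbi (>= 1.44.0)",
  "Imports: methods, AnnotationDbi",
  "Suggests: DBI",
  "License: BSD",
  "License_restricts_use: yes",
  "biocViews: AnnotationData, FunctionalAnnotation"
), desc_path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_path)

# Illustrative set of fields to check for; adjust to your own needs.
required <- c("Package", "Version", "Title", "Description", "License")
absent <- setdiff(required, colnames(desc))
if (length(absent) > 0) {
  stop("Missing DESCRIPTION fields: ", paste(absent, collapse = ", "))
}

desc[1, c("Package", "Version", "Depends")]
```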
2. The LICENSE file
Written with reference to the 2011 release of KEGG.db:
Free for academic use. Non-academic users are requested to obtain a license agreement with KEGG.
3. The NAMESPACE file
Written with reference to the 2011 release of KEGG.db:
import(methods)
import(AnnotationDbi)
### Only put what is statically exported here. All the AnnObj instances
### created at load time are dynamically exported (refer to R/zzz.R for
### the details).
export(
KEGG,
KEGG_dbconn,
KEGG_dbfile,
KEGG_dbschema,