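A very comprehensive ID conversion tool for human genes
The GeoTcgaData package includes an ID conversion function, id_conversion_vector(), which converts between many kinds of human gene IDs. The package was first released in September 2019, and it has been downloaded from CRAN 1,633 times so far: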
> dlstats::cran_stats("GeoTcgaData")
start end downloads package
1 2019-09-01 2019-09-30 412 GeoTcgaData
2 2019-10-01 2019-10-31 168 GeoTcgaData
3 2019-11-01 2019-11-30 204 GeoTcgaData
4 2019-12-01 2019-12-31 206 GeoTcgaData
5 2020-01-01 2020-01-31 441 GeoTcgaData
6 2020-02-01 2020-02-24 202 GeoTcgaData
> sum(dlstats::cran_stats("GeoTcgaData")$downloads)
[1] 1633
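On February 20, 2020, I updated the package on GitHub and CRAN, and it now supports more ID types. Its conversion rate for gene IDs is also quite good.
Tip: at home I found that installing R packages from GitHub on my computer was very slow, but with the computer connected to my phone's hotspot the same installation went fairly smoothly.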
To install GeoTcgaData:
Install the development version from GitHub:
if(!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_github("huerqiang/GeoTcgaData")
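If devtools is not installed on your computer, you can also use remotes to install packages from GitHub. The remotes package is much smaller than devtools and easier to install.
if(!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")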
remotes::install_github("huerqiang/GeoTcgaData")
Or install the stable version from CRAN:
install.packages("GeoTcgaData")
The ID types it supports are:
id_ava()
[1] "hgnc_id" "symbol" "name" "locus_group" "locus_type" "status"
[7] "location" "location_sortable" "prev_name" "gene_family" "gene_family_id" "date_approved_reserved"
[13] "date_symbol_changed" "date_name_changed" "date_modified" "entrez_id" "ensembl_gene_id" "vega_id"
[19] "ucsc_id" "ena" "refseq_accession" "ccds_id" "uniprot_ids" "pubmed_id"
[25] "mgd_id" "rgd_id" "cosmic" "omim_id" "mirbase" "homeodb"
[31] "snornabase" "bioparadigms_slc" "orphanet" "pseudogene.org" "horde_id" "merops"
[37] "imgt" "iuphar" "kznf_gene_catalog" "mamit-trnadb" "cd" "lncrnadb"
[43] "enzyme_id" "intermediate_filament_db" "rna_central_ids" "lncipedia" "gtrnadb" "NCBI Gene ID"
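Because the conversion function takes ID type names as plain strings, it can help to check a name against the supported list before converting. The following is only a minimal sketch in base R: the `supported` vector is a hand-copied subset of the id_ava() output above, and `check_id_type` is a hypothetical helper, not part of GeoTcgaData.

```r
# Subset of the ID types printed by id_ava() above, copied by hand so this
# sketch runs without the package installed.
supported <- c("hgnc_id", "symbol", "name", "entrez_id",
               "ensembl_gene_id", "uniprot_ids", "refseq_accession")

# Hypothetical helper (not part of GeoTcgaData): stop early on an
# unsupported or misspelled ID type name.
check_id_type <- function(type, choices = supported) {
  if (!is.character(type) || length(type) != 1L || !(type %in% choices)) {
    stop("Unsupported ID type: ", type, call. = FALSE)
  }
  invisible(type)
}

check_id_type("symbol")                      # passes silently
ok <- tryCatch(check_id_type("ensembl_id"),  # misspelled type name
               error = function(e) conditionMessage(e))
print(ok)
```

With the package loaded, passing `choices = id_ava()` would check against the full list instead of the hand-copied subset.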
Usage is as follows:
genes <- c("A2ML1", "A2ML1-AS1", "A4GALT", "OR52L1", "AAAS")
# Convert gene symbols to Ensembl IDs
ense <- id_conversion_vector("symbol", "ensembl_gene_id", genes)
ense
[1] "ENSG00000166535" "ENSG00000256661" "ENSG00000128274" "ENSG00000183313" "ENSG00000094914"
# Convert Ensembl IDs to UniProt IDs
uni <- id_conversion_vector("ensembl_gene_id", "uniprot_ids", ense)
uni
[1] "A8K2U0" "" "Q9NPC4" "Q8NGH7" "Q9NRG9"
# Convert Ensembl IDs to Entrez IDs
ent <- id_conversion_vector("ensembl_gene_id", "entrez_id", ense)
ent
[1] " 144568" "100874108" " 53947" " 338751" " 8086"
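Two details in the output above are worth handling before further analysis: one gene has an empty string instead of a UniProt ID, and the Entrez IDs carry padding whitespace. Here is a minimal base R sketch of one way to tidy this up, using values copied from the output so it runs without the package. Treating an empty string as "no ID found" is an assumption on my part, and the variable names `uni_clean`, `ent_clean`, and `conversion_rate` are just for illustration.

```r
# Values copied from the output above, so no package is needed to run this.
genes <- c("A2ML1", "A2ML1-AS1", "A4GALT", "OR52L1", "AAAS")
uni <- c("A8K2U0", "", "Q9NPC4", "Q8NGH7", "Q9NRG9")
ent <- c(" 144568", "100874108", " 53947", " 338751", " 8086")

# Assumption: an empty string means no ID was found for that gene.
uni_clean <- ifelse(nzchar(uni), uni, NA_character_)

# Strip the padding whitespace from the Entrez IDs.
ent_clean <- trimws(ent)

# Fraction of input genes that received a UniProt ID.
conversion_rate <- mean(!is.na(uni_clean))

# Collect everything in one table for inspection.
result <- data.frame(symbol = genes, uniprot = uni_clean, entrez = ent_clean,
                     stringsAsFactors = FALSE)
print(result)
print(conversion_rate)
```

Computing the rate this way gives a quick check of how many inputs were actually mapped for a given pair of ID types.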
In addition, we can look up other fields, such as a gene's full name.
genes <- c("A2ML1", "A2ML1-AS1", "A4GALT", "OR52L1", "AAAS")
id_conversion_vector("symbol", "name", genes)
[1] "alpha-2-macroglobulin like 1" "A2ML1 antisense RNA 1"
[3] "alpha 1,4-galactosyltransferase (P blood group)" "olfactory receptor family 52 subfamily L member 1 (gene/pseudogene)"
[5] "aladin WD repeat nucleoporin"
The full tutorial for the GeoTcgaData package is available at:
https://github.com/huerqiang/GeoTcgaData
or
https://cran.r-project.org/web/packages/GeoTcgaData/readme/README.html