Using Collection Mapping and Association Mapping
Converting between gene ID types is one of the most important steps in genomic and proteomic data analysis. Several tools are available, each with its own drawbacks. When running enrichment analysis on mass spectrometry datasets, I have always struggled to prepare the input files that each R package requires: the data need some tweaking and cleanup before the tools will accept them. The struggle is greater with UniProt IDs, because very few applications accept them as input. Although UniProt provides the Retrieve/ID mapping function, its output does not preserve the number of input rows: any protein or gene ID that cannot be mapped is simply omitted from the output file. This makes it difficult to combine the mapped results with the original dataset.
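To make the row-count problem concrete, here is a minimal sketch in plain Python (used only for illustration; the same idea is a left join in R). The input list, the `mapped` dictionary, the made-up accession `X00000`, and the helper name `align_mapping` are all assumptions for this example and are not part of any UniProt API. It shows one way to re-align a mapping result with the original input so that unmapped IDs are kept as placeholders instead of disappearing.

```python
# Illustrative input: three real-looking UniProt accessions and one
# made-up accession ("X00000") that a mapping service could not resolve.
input_ids = ["P04637", "P38398", "X00000", "P00533"]

# What a mapping tool might hand back: the unmappable ID is simply absent,
# so the result has fewer rows than the input.
mapped = {"P04637": "TP53", "P38398": "BRCA1", "P00533": "EGFR"}


def align_mapping(ids, mapping, missing=None):
    """Return one (id, mapped_value) pair per input ID, in input order.

    IDs without a mapping get `missing` instead of being dropped, so the
    result always has exactly as many rows as the input.
    """
    return [(i, mapping.get(i, missing)) for i in ids]


aligned = align_mapping(input_ids, mapped)

# Print a tab-separated table; unmapped IDs are shown as "NA".
for uniprot_id, symbol in aligned:
    print(uniprot_id, symbol if symbol is not None else "NA", sep="\t")
```

Because the aligned result has the same length and order as the input, it can be attached to the original dataset as a new column without any rows shifting.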
Numerous tools are available for this kind of ID mapping. Here I lay out a few R packages that I have used and that worked smoothly for me.