GOSemSim: GO-terms Semantic Similarity Measures
Installation
Installing GOSemSim is easy; follow the guide on the Bioconductor page:
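## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
## biocLite("BiocUpgrade") ## you may need this
biocLite("GOSemSim")
## On Bioconductor 3.8 and later, biocLite has been replaced by BiocManager:
## install.packages("BiocManager"); BiocManager::install("GOSemSim")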
GO ID
Find Gene Ontology (GO) and tick the options below.
Then download the Gene Ontology data for all yeast proteins.
Extract the Gene Ontology IDs
# -*- coding: utf-8 -*-
"""
Created on Fri Oct 28 19:04:38 2016
@author: sun
"""
import pandas as pd
import re
yeast=pd.read_csv('yeast.csv')
#Gene ontology (biological process)
#Gene ontology (molecular function)
#Gene ontology (cellular component)
bp=yeast['Gene ontology (biological process)']
bp=bp.fillna(value='')
for i in range(len(bp)):
    temp=re.findall(r"GO:\d{7}",bp[i])
    bp[i]=';'.join(temp)
mf=yeast['Gene ontology (molecular function)']
mf=mf.fillna(value='')
for i in range(len(mf)):
    temp=re.findall(r"GO:\d{7}",mf[i])
    mf[i]=';'.join(temp)
cc=yeast['Gene ontology (cellular component)']
cc=cc.fillna(value='')
for i in range(len(cc)):
    temp=re.findall(r"GO:\d{7}",cc[i])
    cc[i]=';'.join(temp)
yeast['Gene ontology (biological process)']=bp
yeast['Gene ontology (molecular function)']=mf
yeast['Gene ontology (cellular component)']=cc
yeast.to_csv('go.csv',index=False,columns =['Entry',
'Gene ontology (cellular component)',
'Gene ontology (molecular function)',
'Gene ontology (biological process)'])
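The three loops above do the same thing to each column, so they can be folded into one helper. The following is only a sketch using pandas string methods, not a replacement for the script above. The name `extract_go_ids` and the sample annotation strings are made up for illustration, and the sample assumes the cells look like `term name [GO:0000000]`.

```python
import pandas as pd

GO_ID_PATTERN = r"GO:\d{7}"  # same pattern as in the script above


def extract_go_ids(column: pd.Series) -> pd.Series:
    """Return a Series whose cells hold only the GO IDs, joined by ';'."""
    # Missing annotations become empty strings so every cell is searchable
    cleaned = column.fillna('')
    # str.findall gives a list of matches per cell; str.join glues each list
    return cleaned.str.findall(GO_ID_PATTERN).str.join(';')


# Hypothetical sample cells, including one missing annotation
sample = pd.Series([
    "nucleus [GO:0005634]; cytoplasm [GO:0005737]",
    None,
])
result = extract_go_ids(sample)
print(result.tolist())
```

With a helper like this, each of the three columns could be processed with a single call such as `yeast[name] = extract_go_ids(yeast[name])`.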
Get the Gene Ontology annotations
Get the Gene Ontology annotations for gold_yeast
yeast_gold_protein_pair.csv
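import pandas as pd
# Protein pairs; the columns idA and idB hold the two protein IDs of each pair
yeast=pd.read_csv('yeast_gold_protein_pair.csv')
# GO annotations written to go.csv earlier, indexed by the first column (Entry)
go=pd.read_csv('go.csv',index_col=0)
# Look up the annotations of each protein, keeping the order of the pairs
# (recent pandas versions raise a KeyError here if an ID is missing from go.csv)
protein_a=go.loc[yeast.idA,:]
protein_b=go.loc[yeast.idB,:]
protein_a.to_csv('GOProteinA.csv')
protein_b.to_csv('GOProteinB.csv')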
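If some protein IDs in the pair file have no row in go.csv, a label lookup with `.loc` fails on recent pandas versions. One possible workaround, sketched here under that assumption, is `reindex`, which returns a row of NaN for each unknown ID. It does require the IDs in the annotation table to be unique. The helper name `lookup_annotations`, the IDs `P1` to `P3`, and the tiny tables are invented for the example.

```python
import pandas as pd


def lookup_annotations(go: pd.DataFrame, ids: pd.Series) -> pd.DataFrame:
    """Return one row of GO annotations per ID, in the order given.

    IDs that are absent from `go` produce rows of NaN instead of an error.
    """
    return go.reindex(index=ids.tolist())


# Hypothetical annotation table, indexed by protein ID
go = pd.DataFrame(
    {"Gene ontology (biological process)": ["GO:0006412", "GO:0006950"]},
    index=pd.Index(["P1", "P2"], name="Entry"),
)
# Hypothetical protein pairs; P3 has no annotation row
pairs = pd.DataFrame({"idA": ["P1", "P2", "P1"], "idB": ["P2", "P3", "P1"]})

protein_a = lookup_annotations(go, pairs["idA"])
protein_b = lookup_annotations(go, pairs["idB"])
print(protein_b)
```

Because both results keep the row order of the pair table, row i of `protein_a` and row i of `protein_b` still describe the same pair.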
GOProteinA.csv
GOProteinB.csv