GOSemSim

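This article explains how to use GOSemSim in R to compute the semantic similarity of Gene Ontology (GO) terms. It starts with installing Bioconductor and RStudio, then works through examples of handling GO IDs, installing the supported organisms, and computing similarities with the goSim and mgoSim functions. In the error-checking section, the author notes that updates to the GO data may cause results to differ from those in the original paper.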

GOSemSim: GO-terms Semantic Similarity Measures

 

Installation

Installing GOSemSim is easy; follow the guide on the Bioconductor page:

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
## biocLite("BiocUpgrade") ## you may need this
biocLite("GOSemSim")
 

GO ID

Find the Gene Ontology (GO) section and tick the options below:

 

Then download the Gene Ontology data for all yeast proteins.

Extract the Gene Ontology IDs

# -*- coding: utf-8 -*-
"""
Created on Fri Oct 28 19:04:38 2016
@author: sun
"""
import re

import pandas as pd

yeast = pd.read_csv('yeast.csv')

# The three Gene Ontology annotation columns in yeast.csv
go_columns = [
    'Gene ontology (cellular component)',
    'Gene ontology (molecular function)',
    'Gene ontology (biological process)',
]

# In every cell, keep only the GO IDs ("GO:" plus seven digits), joined by ';'.
# Empty cells (NaN) become empty strings.
for col in go_columns:
    yeast[col] = yeast[col].fillna('').apply(
        lambda text: ';'.join(re.findall(r'GO:\d{7}', text)))

# Write the protein entry and its cleaned GO ID columns
yeast.to_csv('go.csv', index=False, columns=['Entry'] + go_columns)
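
To make the extraction step concrete, here is a minimal sketch of what the regular expression does to a single cell. The sample annotation string is an assumption about how the exported column is formatted (term name followed by the ID in square brackets, entries separated by semicolons), and `extract_go_ids` is a hypothetical helper name, not part of the original script.

```python
import re


def extract_go_ids(annotation):
    """Return the GO IDs found in one annotation cell, joined by ';'."""
    return ';'.join(re.findall(r'GO:\d{7}', annotation))


# Assumed cell format; a real export may look different.
sample = 'nucleus [GO:0005634]; cytoplasm [GO:0005737]'

print(extract_go_ids(sample))    # GO:0005634;GO:0005737
print(repr(extract_go_ids('')))  # '' (an empty cell stays empty)
```

Because the pattern only matches `GO:` followed by exactly seven digits, the term names and brackets are dropped whatever the surrounding punctuation.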
 

Get the Gene Ontology

 

Get the Gene Ontology annotations for gold_yeast

yeast_gold_protein_pair.csv
 
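import pandas as pd

# Protein pairs: the idA and idB columns hold the two protein IDs of each pair
yeast = pd.read_csv('yeast_gold_protein_pair.csv')
# GO IDs per protein, indexed by the first column of go.csv (Entry)
go = pd.read_csv('go.csv', index_col=0)

# Look up the GO annotations for each side of every pair
protein_a = go.loc[yeast.idA, :]
protein_b = go.loc[yeast.idB, :]

protein_a.to_csv('GOProteinA.csv')
protein_b.to_csv('GOProteinB.csv')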
GOProteinA.csv
GOProteinB.csv
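
The lookup above can be illustrated on a small in-memory example. This is only a sketch: the protein IDs `P1` to `P3` and the GO IDs attached to them are made-up placeholders standing in for the contents of go.csv and yeast_gold_protein_pair.csv.

```python
import pandas as pd

# Hypothetical stand-in for go.csv, indexed by protein entry
go = pd.DataFrame(
    {'Gene ontology (biological process)': ['GO:0006412', 'GO:0006281', '']},
    index=pd.Index(['P1', 'P2', 'P3'], name='Entry'),
)

# Hypothetical stand-in for yeast_gold_protein_pair.csv
pairs = pd.DataFrame({'idA': ['P1', 'P3'], 'idB': ['P2', 'P1']})

# Label-based lookup: one output row per pair, in the same order as the pairs
protein_a = go.loc[pairs['idA']]
protein_b = go.loc[pairs['idB']]

print(protein_a.index.tolist())  # ['P1', 'P3']
print(protein_b.index.tolist())  # ['P2', 'P1']
```

Row i of the two results describes the two proteins of pair i, which is why the outputs can be written to two separate files and still be matched up by position. In recent pandas versions `.loc` raises a `KeyError` when an ID is missing from the index; if that can happen in your data, `go.reindex(pairs['idA'])` is one alternative that yields empty rows instead.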