Java WordNet Similarity (JWS)

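JWS (Java WordNet Similarity) is an open-source project for computing semantic similarity, built on Java and WordNet and developed by David Hope and colleagues at the University of Sussex. It implements many classic semantic similarity algorithms and is a tool well worth studying.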

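JWS is a Java implementation of WordNet::Similarity, a Perl package for WordNet-based similarity measures, which is good news for anyone who wants to compare word similarity with WordNet from Java. The setup steps, in brief: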

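1. Download WordNet (Windows, version 2.1): http://wordnet.princeton.edu/wordnet/download/;

2. Download WordNet-InfoContent (version 2.1): http://wn-similarity.sourceforge.net/ or http://www.d.umn.edu/~tpederse/Data/;

3. Download JWS (current version: beta.11.01): http://www.cogs.susx.ac.uk/users/drh21/;

4. Install WordNet;

5. Unzip WordNet-InfoContent-2.1 and copy the folder into the WordNet directory D:/Program Files/WordNet/2.1;

6. Copy the two jar files shipped with JWS, edu.mit.jwi_2.1.4.jar and edu.sussex.nlp.jws.beta.11.jar, into Java's lib directory and set the environment variables so that both jars are on the classpath;

7. Run TestExamples, the example program bundled with JWS, in Eclipse.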

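Note: because the WordNet downloaded above is version 2.1, a few places in the program need to be changed:

String dir = "C:/Program Files/WordNet";    // path to the WordNet installation; change it to match where you actually installed it

JWS ws = new JWS(dir, "3.0");                   // change "3.0" to "2.1"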
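Before running the example, it can help to confirm that the folders are where the constructor will look for them. The following is a minimal sketch, assuming the layout from step 5 (a `dict` folder and a `WordNet-InfoContent-<version>` folder under `dir/<version>`) and the default IC file name `ic-semcor.dat` mentioned in the sample program's comments. `WordNetLayoutCheck` and `missingPaths` are hypothetical names for illustration and are not part of JWS.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of JWS): lists the expected WordNet paths that are missing.
final class WordNetLayoutCheck {

    // Returns the expected paths that do not exist; an empty list means the layout looks right.
    static List<Path> missingPaths(String dir, String version) {
        Path versionDir = Paths.get(dir, version);
        // Assumed layout: <dir>/<version>/dict and <dir>/<version>/WordNet-InfoContent-<version>/ic-semcor.dat
        List<Path> expected = Arrays.asList(
                versionDir.resolve("dict"),
                versionDir.resolve("WordNet-InfoContent-" + version).resolve("ic-semcor.dat"));
        List<Path> missing = new ArrayList<>();
        for (Path p : expected) {
            if (!Files.exists(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults mirror the values used in the note above; pass your own as arguments.
        String dir = args.length > 0 ? args[0] : "C:/Program Files/WordNet";
        String version = args.length > 1 ? args[1] : "2.1";
        List<Path> missing = missingPaths(dir, version);
        if (missing.isEmpty()) {
            System.out.println("Layout looks complete for WordNet " + version);
        } else {
            for (Path p : missing) {
                System.out.println("Missing: " + p);
            }
        }
    }
}
```

If anything is reported missing, adjusting `dir` or the version string is usually quicker than debugging an exception raised later inside the library.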

Sample program:


import java.util.TreeMap;
import java.text.*;
import edu.sussex.nlp.jws.*;

// 'TestExamples': how to use Java WordNet::Similarity
// David Hope, 2008
public class TestExamples
{
    public static void main(String[] args)
    {
        // 1. SET UP:
        // Let's make it easy for the user. So, rather than set pointers in 'Environment Variables' etc. let's allow the user to define exactly where they have put WordNet(s)
        String dir = "E:/Commonly Application/WordNet/";
        // That is, you may have version 3.0 sitting in the above directory e.g. C:/Program Files/WordNet/3.0/dict
        // The corresponding IC files folder should be in this same directory e.g. C:/Program Files/WordNet/3.0/WordNet-InfoContent-3.0

        // Option 1 (Perl default): specify the version of WordNet you want to use (assuming that you have a copy of it) and use the default IC file [ic-semcor.dat]
        JWS ws = new JWS(dir, "2.1");
        // Option 2 : specify the version of WordNet you want to use and the particular IC file that you wish to apply
        // JWS ws = new JWS(dir, "3.0", "ic-bnc-resnik-add1.dat");

        // 2. EXAMPLES OF USE:

        // 2.1 [JIANG & CONRATH MEASURE]
        JiangAndConrath jcn = ws.getJiangAndConrath();
        // System.out.println("Jiang & Conrath\n");
        // all senses
        TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", "n"); // all senses
        // TreeMap<String, Double> scores1 = jcn.jcn("apple", 1, "banana", "n"); // fixed;all
        // TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", 2, "n"); // all;fixed
        for (String s : scores1.keySet())
            System.out.println(s + "\t" + scores1.get(s));
        // specific senses
        // System.out.println("\nspecific pair\t=\t" + jcn.jcn("apple", 1, "banana", 1, "n") + "\n");
        // max.
        // System.out.println("\nhighest score\t=\t" + jcn.max("java", "best", "n") + "\n\n\n");

        // 2.2 [LIN MEASURE]
        Lin lin = ws.getLin();
        // System.out.println("Lin\n");
        // all senses
        TreeMap<String, Double> scores2 = lin.lin("like", "love", "n"); // all senses
        // TreeMap<String, Double> scores2 = lin.lin("kid", "child", "n"); // fixed;all
        // TreeMap<String, Double> scores2 = lin.lin("apple", "banana", 2, "n"); // all;fixed
        // for (String s : scores2.keySet())
        //     System.out.println(s + "\t" + scores2.get(s));
        // specific senses
        System.out.println("\nspecific pair\t=\t" + lin.lin("like", 1, "love", 1, "n") + "\n");
        // max.
        System.out.println("\nhighest score\t=\t" + lin.max("From", "date", "n") + "\n\n\n");

        // ... and so on for any other measure
    }
} // eof
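Both measures used above rely on information content (IC), which is why the WordNet-InfoContent files are needed. As a rough illustration of what the two scores represent, here is a minimal sketch of the standard textbook formulas, computed from IC values that are made up for the example. `IcMeasures` is a hypothetical class and not the JWS implementation; in particular, returning infinity for a zero distance is a simplification of this sketch, and real packages special-case it differently.

```java
// Illustrative only: Lin and Jiang & Conrath scores from information-content (IC) values.
final class IcMeasures {

    // Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2)), where lcs is the
    // most informative concept that subsumes both c1 and c2.
    static double lin(double ic1, double ic2, double icLcs) {
        double denominator = ic1 + ic2;
        return denominator == 0.0 ? 0.0 : 2.0 * icLcs / denominator;
    }

    // Jiang & Conrath: distance = IC(c1) + IC(c2) - 2 * IC(lcs); similarity = 1 / distance.
    // Assumption of this sketch: a zero distance is reported as positive infinity.
    static double jcn(double ic1, double ic2, double icLcs) {
        double distance = ic1 + ic2 - 2.0 * icLcs;
        return distance == 0.0 ? Double.POSITIVE_INFINITY : 1.0 / distance;
    }

    public static void main(String[] args) {
        // Made-up IC values, not taken from any IC file.
        double ic1 = 8.0;
        double ic2 = 6.0;
        double icLcs = 4.0;
        System.out.println("lin = " + lin(ic1, ic2, icLcs));
        System.out.println("jcn = " + jcn(ic1, ic2, icLcs));
    }
}
```

Lin scores fall between 0 and 1, while Jiang & Conrath scores are unbounded, so values from the two measures should not be compared with each other directly.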


A simple program that computes semantic similarity on top of JWS, for example:


import edu.sussex.nlp.jws.JWS;
import edu.sussex.nlp.jws.Lin;

public class Similar {

    private String str1;
    private String str2;
    private String dir = "E:/Commonly Application/WordNet/";
    private JWS ws = new JWS(dir, "2.1");

    public Similar(String str1, String str2) {
        this.str1 = str1;
        this.str2 = str2;
    }

    // Averages the best Lin score over every word pair drawn from the two strings.
    public double getSimilarity() {
        String[] strs1 = splitString(str1);
        String[] strs2 = splitString(str2);
        double sum = 0.0;
        for (String s1 : strs1) {
            for (String s2 : strs2) {
                double sc = maxScoreOfLin(s1, s2);
                sum += sc;
                System.out.println("Current pair: " + s1 + " VS " + s2 + ", similarity: " + sc);
            }
        }
        return sum / (strs1.length * strs2.length);
    }

    // Splits a phrase into words on single spaces.
    private String[] splitString(String str) {
        return str.split(" ");
    }

    // Highest Lin score with both words treated as nouns; falls back to verbs when that score is 0.
    private double maxScoreOfLin(String str1, String str2) {
        Lin lin = ws.getLin();
        double sc = lin.max(str1, str2, "n");
        if (sc == 0) {
            sc = lin.max(str1, str2, "v");
        }
        return sc;
    }

    public static void main(String[] args) {
        String s1 = "departure";
        String s2 = "leaving from";
        Similar sm = new Similar(s1, s2);
        System.out.println(sm.getSimilarity());
    }
}
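The averaging step in the program above does not depend on WordNet itself, so it can be tried out separately. The sketch below keeps the same idea (the mean of the pairwise word scores) but takes the scoring function as a parameter. `AverageSimilarity` and `average` are hypothetical names, and the exact-match scorer in `main` is only a stand-in for a real measure.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical variant of the averaging logic with a pluggable word-pair scorer.
final class AverageSimilarity {

    // Mean of scorer(w1, w2) over every word pair drawn from the two phrases.
    static double average(String phrase1, String phrase2,
                          ToDoubleBiFunction<String, String> scorer) {
        // Split on runs of whitespace so repeated spaces do not produce empty words.
        String[] words1 = phrase1.trim().split("\\s+");
        String[] words2 = phrase2.trim().split("\\s+");
        double sum = 0.0;
        for (String w1 : words1) {
            for (String w2 : words2) {
                sum += scorer.applyAsDouble(w1, w2);
            }
        }
        return sum / (words1.length * words2.length);
    }

    public static void main(String[] args) {
        // Stand-in scorer: 1.0 for identical words, 0.0 otherwise.
        ToDoubleBiFunction<String, String> exactMatch =
                (a, b) -> a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        System.out.println(average("leaving from", "leaving", exactMatch));
    }
}
```

With JWS on the classpath, the scorer could instead wrap the `lin.max` call used in the program above, for instance `(a, b) -> lin.max(a, b, "n")`. Because every pair counts equally, function words such as "from" pull the average down, so filtering them out first may give more useful scores.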


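I came across JWS while trying to do semantic analysis with Protégé and WordNet, but I did not have much time to study it in depth, which is a real pity. If you have looked into it further, please share a link to your blog post so that everyone can learn from it!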
