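JWS (Java WordNet Similarity) is an open-source project for computing semantic similarity, built on Java and WordNet and developed by David Hope and others at the University of Sussex. It implements many of the classic semantic similarity algorithms and is a tool well worth studying.
JWS is a Java implementation of WordNet::Similarity (a Perl package for WordNet-based similarity comparison), which is good news for anyone who wants to compare word similarity with WordNet from Java. The steps for using it, in brief: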
1. Download WordNet (Windows, version 2.1): http://wordnet.princeton.edu/wordnet/download/
2. Download WordNet-InfoContent (version 2.1): http://wn-similarity.sourceforge.net/ or http://www.d.umn.edu/~tpederse/Data/
3. Download JWS (current version: beta.11.01): http://www.cogs.susx.ac.uk/users/drh21/
4. Install WordNet.
5. Unpack WordNet-InfoContent-2.1 and copy the folder into the WordNet directory D:/Program Files/WordNet/2.1.
6. Copy the two jar files from JWS, edu.mit.jwi_2.1.4.jar and edu.sussex.nlp.jws.beta.11.jar, into Java's lib directory and set the environment variables.
7. Run the example program that ships with JWS, TestExamples, in Eclipse.
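Note: because the WordNet downloaded here is version 2.1, a few places in the program need to be changed:
String dir = "C:/Program Files/WordNet"; // the WordNet installation path; change it to match where you actually installed it
JWS ws = new JWS(dir, "3.0"); // change "3.0" to "2.1"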
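Since the constructor takes a base directory plus a version string, checking the folder layout before running the examples can save some confusion. The following is a minimal sketch using only the Java standard library. The class `WordNetLayoutCheck` and its methods are hypothetical and not part of JWS. The expected layout (`<dir>/<version>/dict` and `<dir>/<version>/WordNet-InfoContent-<version>/ic-semcor.dat`) is taken from the comments in the TestExamples program and from step 5, not from the JWS source itself, so treat it as an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of JWS): checks the folder layout that the
// comments in TestExamples describe for a given WordNet version.
class WordNetLayoutCheck {

    // Assumed location of the dictionary files: <dir>/<version>/dict
    static Path dictDir(String dir, String version) {
        return Paths.get(dir, version, "dict");
    }

    // Assumed location of an IC file:
    // <dir>/<version>/WordNet-InfoContent-<version>/<icFileName>
    static Path icFile(String dir, String version, String icFileName) {
        return Paths.get(dir, version, "WordNet-InfoContent-" + version, icFileName);
    }

    // True if both the dict folder and the default IC file are present.
    static boolean looksInstalled(String dir, String version) {
        return Files.isDirectory(dictDir(dir, version))
                && Files.isRegularFile(icFile(dir, version, "ic-semcor.dat"));
    }

    public static void main(String[] args) {
        // Defaults mirror the paths used in this post; pass your own as arguments.
        String dir = args.length > 0 ? args[0] : "D:/Program Files/WordNet";
        String version = args.length > 1 ? args[1] : "2.1";
        System.out.println("dict folder : " + dictDir(dir, version));
        System.out.println("IC file     : " + icFile(dir, version, "ic-semcor.dat"));
        System.out.println("layout OK   : " + looksInstalled(dir, version));
    }
}
```

If the last line reports `false`, the version string passed to `new JWS(dir, ...)` probably does not match the folder names on disk.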
Example program:
import java.util.TreeMap;
import java.text.*;
import edu.sussex.nlp.jws.*;

// 'TestExamples': how to use Java WordNet::Similarity
// David Hope, 2008
public class TestExamples
{
    public static void main(String[] args)
    {
        // 1. SET UP:
        // Let's make it easy for the user. So, rather than set pointers in 'Environment Variables' etc. let's allow the user to define exactly where they have put WordNet(s)
        String dir = "E:/Commonly Application/WordNet/";
        // That is, you may have version 3.0 sitting in the above directory e.g. C:/Program Files/WordNet/3.0/dict
        // The corresponding IC files folder should be in this same directory e.g. C:/Program Files/WordNet/3.0/WordNet-InfoContent-3.0

        // Option 1 (Perl default): specify the version of WordNet you want to use (assuming that you have a copy of it) and use the default IC file [ic-semcor.dat]
        JWS ws = new JWS(dir, "2.1");
        // Option 2 : specify the version of WordNet you want to use and the particular IC file that you wish to apply
        // JWS ws = new JWS(dir, "3.0", "ic-bnc-resnik-add1.dat");

        // 2. EXAMPLES OF USE:

        // 2.1 [JIANG & CONRATH MEASURE]
        JiangAndConrath jcn = ws.getJiangAndConrath();
        // System.out.println("Jiang & Conrath\n");
        // all senses
        TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", "n"); // all senses
        // TreeMap<String, Double> scores1 = jcn.jcn("apple", 1, "banana", "n"); // fixed;all
        // TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", 2, "n"); // all;fixed
        for (String s : scores1.keySet()) {
            System.out.println(s + "\t" + scores1.get(s));
        }
        // specific senses
        // System.out.println("\nspecific pair\t=\t" + jcn.jcn("apple", 1, "banana", 1, "n") + "\n");
        // max.
        // System.out.println("\nhighest score\t=\t" + jcn.max("java", "best", "n") + "\n\n\n");

        // 2.2 [LIN MEASURE]
        Lin lin = ws.getLin();
        // System.out.println("Lin\n");
        // all senses
        TreeMap<String, Double> scores2 = lin.lin("like", "love", "n"); // all senses
        // TreeMap<String, Double> scores2 = lin.lin("kid", "child", "n"); // fixed;all
        // TreeMap<String, Double> scores2 = lin.lin("apple", "banana", 2, "n"); // all;fixed
        // for (String s : scores2.keySet())
        //     System.out.println(s + "\t" + scores2.get(s));
        // specific senses
        System.out.println("\nspecific pair\t=\t" + lin.lin("like", 1, "love", 1, "n") + "\n");
        // max.
        System.out.println("\nhighest score\t=\t" + lin.max("From", "date", "n") + "\n\n\n");

        // ... and so on for any other measure
    }
} // eof
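For readers wondering what the two measures used above compute: in their usual published form, both combine the information content (IC) of the two synsets with the IC of their least common subsumer (LCS). The sketch below illustrates those textbook formulas with made-up IC values. It is not JWS code, the class and method names are hypothetical, and JWS's own handling of edge cases (such as a zero distance) may differ.

```java
// Illustration only: textbook forms of the Lin and Jiang & Conrath measures,
// computed from information-content (IC) values supplied by the caller.
class IcMeasuresSketch {

    // Lin: 2 * IC(lcs) / (IC(c1) + IC(c2)); returns 0 when the denominator is 0.
    static double lin(double ic1, double ic2, double icLcs) {
        double denom = ic1 + ic2;
        return denom == 0.0 ? 0.0 : (2.0 * icLcs) / denom;
    }

    // Jiang & Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs).
    static double jcnDistance(double ic1, double ic2, double icLcs) {
        return ic1 + ic2 - 2.0 * icLcs;
    }

    // Similarity as the inverse of the distance. Mapping a zero distance to
    // +infinity is a choice made for this sketch, not necessarily what JWS does.
    static double jcn(double ic1, double ic2, double icLcs) {
        double d = jcnDistance(ic1, ic2, icLcs);
        return d == 0.0 ? Double.POSITIVE_INFINITY : 1.0 / d;
    }

    public static void main(String[] args) {
        // Made-up IC values, not taken from any IC file.
        double ic1 = 8.0, ic2 = 12.0, icLcs = 6.0;
        System.out.println("lin = " + lin(ic1, ic2, icLcs));
        System.out.println("jcn = " + jcn(ic1, ic2, icLcs));
    }
}
```

The IC values themselves come from corpus counts, which is why the IC files from step 2 are needed.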
A simple semantic similarity program built on JWS, for example:
import edu.sussex.nlp.jws.JWS;
import edu.sussex.nlp.jws.Lin;

public class Similar {

    private String str1;
    private String str2;
    private String dir = "E:/Commonly Application/WordNet/";
    private JWS ws = new JWS(dir, "2.1");

    public Similar(String str1, String str2) {
        this.str1 = str1;
        this.str2 = str2;
    }

    // Average of the best Lin score over every word pair taken from the two strings.
    public double getSimilarity() {
        String[] strs1 = splitString(str1);
        String[] strs2 = splitString(str2);
        double sum = 0.0;
        for (String s1 : strs1) {
            for (String s2 : strs2) {
                double sc = maxScoreOfLin(s1, s2);
                sum += sc;
                System.out.println("Current pair: " + s1 + " VS " + s2 + ", similarity: " + sc);
            }
        }
        double similarity = sum / (strs1.length * strs2.length);
        return similarity;
    }

    // Splits a phrase into words on single spaces.
    private String[] splitString(String str) {
        String[] ret = str.split(" ");
        return ret;
    }

    // Best Lin score with both words treated as nouns; falls back to verbs if that score is 0.
    private double maxScoreOfLin(String str1, String str2) {
        Lin lin = ws.getLin();
        double sc = lin.max(str1, str2, "n");
        if (sc == 0) {
            sc = lin.max(str1, str2, "v");
        }
        return sc;
    }

    public static void main(String args[]) {
        String s1 = "departure";
        String s2 = "leaving from";
        Similar sm = new Similar(s1, s2);
        System.out.println(sm.getSimilarity());
    }
}
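The averaging step in `getSimilarity()` can be tried out without a WordNet installation by swapping the Lin lookup for any word-pair scoring function. The following is a minimal sketch under that assumption. `PairwiseAverage`, `average`, and the toy scorer are hypothetical names introduced here. The scorer returns 1.0 for identical words and 0.0 otherwise, so it only demonstrates the control flow, not real semantic similarity.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical stand-alone version of the averaging loop in Similar.
class PairwiseAverage {

    // Mean score over every (word1, word2) pair from the two phrases.
    // Splitting on runs of whitespace avoids empty tokens from double spaces.
    static double average(String phrase1, String phrase2,
                          ToDoubleBiFunction<String, String> scorer) {
        String[] words1 = phrase1.trim().split("\\s+");
        String[] words2 = phrase2.trim().split("\\s+");
        double sum = 0.0;
        for (String w1 : words1) {
            for (String w2 : words2) {
                sum += scorer.applyAsDouble(w1, w2);
            }
        }
        return sum / (words1.length * words2.length);
    }

    public static void main(String[] args) {
        // Toy scorer standing in for lin.max(...): 1.0 for equal words, else 0.0.
        ToDoubleBiFunction<String, String> toy =
                (a, b) -> a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        System.out.println(average("leaving from", "leaving home", toy));
    }
}
```

Because every pair contributes equally, a word that scores 0 against everything (which is likely for a function word such as "from" when only nouns and verbs are compared) pulls the average down. Whether that is desirable depends on the application.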
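I came across JWS while trying to do semantic analysis with Protege + WordNet, but never had the time to study it in depth, which is a real pity. If you have looked into it further, please post a blog URL so that everyone can refer to it!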