WordNet Research 6: JWI (Java Wordnet Interface), the Java API for WordNet

JWI (the MIT Java Wordnet Interface) is a Java library for interfacing with Wordnet. JWI supports access to Wordnet versions 1.6 through 3.0, among other related Wordnet extensions. Wordnet is a freely and publicly available semantic dictionary of English, developed at Princeton University.

JWI is written for Java 1.5.0 and has the package namespace edu.mit.jwi. The distribution does not include the Wordnet dictionary files; these can be downloaded from the Wordnet download site. This version of the software is distributed under a license that makes it free to use for all purposes, as long as proper copyright acknowledgement is made.

The javadoc API is posted online for your convenience. So is the version changelog. If you find JWI useful, have found a bug, or would like to request a new feature, please contact me.

Description                              Link
Binary Files Only                        edu.mit.jwi_2.2.2.jar (143 kb)
User's Manual                            edu.mit.jwi_2.2.2_manual.pdf (276 kb)
Source Only                              edu.mit.jwi_2.2.2_src.zip (143 kb)
Javadocs                                 edu.mit.jwi_2.2.2_javadoc.zip (617 kb) | online
Development Kit (binaries and source)    edu.mit.jwi_2.2.2_jdk.jar (273 kb)
All-in-One (jdk, javadocs, manual)       edu.mit.jwi_2.2.2_all.zip (1,090 kb)

JWI is a project led by Mark Alan Finlayson at the MIT Computer Science and Artificial Intelligence Laboratory. It is a Java API for accessing WordNet.

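I. A usage example with WordNet and JWI:

1. Install WordNet first

The installation itself is not covered here. Set the environment variable WNHOME to point to the root directory of the WordNet installation. For example:

WNHOME = "E:\Commonly Application\WordNet\2.1"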
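Both test programs in this post read WNHOME with System.getenv and append "dict" to it, so a missing variable only shows up later as a confusing failure when the dictionary is opened. A minimal sketch of an early check, using only the standard library; the class name WnHomeCheck and the method dictDir are made up for this illustration and are not part of JWI:

```java
import java.io.File;

public class WnHomeCheck {

    /**
     * Builds the path of the WordNet "dict" directory from the value of WNHOME.
     * Throws if the value is missing, so the problem is reported before any
     * dictionary is opened. (Hypothetical helper, not a JWI method.)
     */
    public static File dictDir(String wnhome) {
        if (wnhome == null || wnhome.trim().isEmpty()) {
            throw new IllegalStateException("WNHOME is not set");
        }
        return new File(wnhome.trim(), "dict");
    }

    public static void main(String[] args) {
        try {
            File dict = dictDir(System.getenv("WNHOME"));
            System.out.println("dict directory: " + dict.getPath());
            System.out.println("is a directory: " + dict.isDirectory());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If "is a directory" prints false, WNHOME most likely points somewhere other than the WordNet installation root.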

2. Download JWI

3. Test

Write a test class:

import java.io.File;
import java.io.IOException;
import java.net.URL;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.IWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;

public class test {

    public static void main(String[] args) throws IOException {
        testDictionary();
    }

    public static void testDictionary() throws IOException {
        // construct the URL to the Wordnet dictionary directory
        String wnhome = System.getenv("WNHOME"); // read the WNHOME environment variable
        String path = wnhome + File.separator + "dict";
        URL url = new URL("file", null, path); // URL pointing to the WordNet dict directory

        // construct the dictionary object and open it
        IDictionary dict = new Dictionary(url);
        dict.open(); // open the dictionary

        // look up first sense of the word "dog"
        IIndexWord idxWord = dict.getIndexWord("dog", POS.NOUN); // index word for (dog, noun)
        IWordID wordID = idxWord.getWordIDs().get(0); // ID of the first sense of "dog"
        IWord word = dict.getWord(wordID); // the word behind that ID
        System.out.println("Id = " + wordID);
        System.out.println("Lemma = " + word.getLemma());
        System.out.println("Gloss = " + word.getSynset().getGloss());
    }
}

Running it produces the following output:

Id = WID-02064081-N-??-dog
Lemma = dog
Gloss = a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds; "the dog barked all night"

This is a first, basic example of using JWI.
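The Id line in the output above packs several fields into one string. The sketch below splits it with plain string handling. The field meanings in the comments (prefix, synset offset, part-of-speech tag, word number within the synset, lemma) are my reading of that single output line, not something the post states, and the class WordIdParts is made up for illustration; it is not a JWI class.

```java
public class WordIdParts {

    /**
     * Splits a printed word ID such as "WID-02064081-N-??-dog" into five fields.
     * The limit of 5 keeps a lemma that itself contains a hyphen in one piece.
     */
    public static String[] parse(String id) {
        String[] parts = id.split("-", 5);
        if (parts.length != 5 || !parts[0].equals("WID")) {
            throw new IllegalArgumentException("not a word ID: " + id);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] p = parse("WID-02064081-N-??-dog");
        System.out.println("synset offset (assumed) = " + p[1]);
        System.out.println("part of speech (assumed) = " + p[2]);
        // "??" presumably means the word number was not known when the ID was built
        System.out.println("word number (assumed) = " + p[3]);
        System.out.println("lemma = " + p[4]);
    }
}
```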

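II. Speeding up traversal by loading WordNet into memory with JWI:

A new feature of the JWI 2.2.x series is that WordNet can be loaded into memory, which greatly improves the performance of traversing WordNet. The feature is implemented by the edu.mit.jwi.RAMDictionary class, which lets you choose whether WordNet is loaded into memory.

Write a function trek() that traverses WordNet, open WordNet with RAMDictionary, and compare the traversal before and after loading into memory: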

import java.io.File;
import java.io.IOException;
import java.util.Iterator;

import edu.mit.jwi.IDictionary;
import edu.mit.jwi.IRAMDictionary;
import edu.mit.jwi.RAMDictionary;
import edu.mit.jwi.data.ILoadPolicy;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;

public class RAMDictionaryTest {

    public static void main(String[] args) throws IOException, InterruptedException {
        String wnhome = System.getenv("WNHOME"); // read the WNHOME environment variable
        String path = wnhome + File.separator + "dict";
        File wnDir = new File(path);
        testRAMDictionary(wnDir);
    }

    public static void testRAMDictionary(File wnDir) throws IOException, InterruptedException {
        // NO_LOAD: open the dictionary without loading it into memory
        IRAMDictionary dict = new RAMDictionary(wnDir, ILoadPolicy.NO_LOAD);
        dict.open();

        // traverse WordNet before loading it into memory
        System.out.print("Before loading:\n");
        trek(dict);

        // now load into memory
        System.out.print("\nLoading Wordnet into memory...");
        long t = System.currentTimeMillis();
        dict.load(true);
        System.out.printf("done (load time: %1d msec)\n", System.currentTimeMillis() - t);

        // traverse again after loading
        System.out.print("\nAfter loading:\n");
        trek(dict);
    }

    /*
     * Treks across the whole of WordNet and reports how long it took.
     */
    public static void trek(IDictionary dict) {
        int tickNext = 0;
        int tickSize = 20000;
        int seen = 0;
        System.out.print("Trekking across Wordnet");
        long t = System.currentTimeMillis();
        for (POS pos : POS.values()) { // every part of speech
            // every index word of this part of speech
            for (Iterator<IIndexWord> i = dict.getIndexWordIterator(pos); i.hasNext();) {
                for (IWordID wid : i.next().getWordIDs()) { // every sense of the word
                    // count the words in the synset of this sense
                    seen += dict.getWord(wid).getSynset().getWords().size();
                    if (seen > tickNext) {
                        System.out.print(".");
                        tickNext = seen + tickSize;
                    }
                }
            }
        }
        System.out.printf("done (%1d msec)\n", System.currentTimeMillis() - t);
        System.out.println("In my trek I saw " + seen + " words");
    }
}

Running it produces:

Before loading:
Trekking across Wordnet...........................done (3765 msec)
In my trek I saw 523260 words

Loading Wordnet into memory...done (load time: 10625 msec)

After loading:
Trekking across Wordnet...........................done (328 msec)
In my trek I saw 523260 words

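As the results show, traversing WordNet without loading it into memory took 3765 ms, while traversing it after loading took 328 ms, roughly 11 times faster. The 10625 ms in the output is the time spent loading WordNet into memory. In terms of traversal time alone, the in-memory dictionary is clearly faster.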
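The one-time load cost only pays off when WordNet is traversed more than a few times. Here is a small sketch of that arithmetic, using the three timings reported above. The class and method names are made up for illustration, and the numbers come from this single run, so treat the result as a rough guide only.

```java
public class LoadBreakEven {

    /**
     * Returns the smallest number of traversals n for which loading first is
     * strictly cheaper overall: loadMs + n * ramMs < n * diskMs.
     */
    public static long breakEvenTraversals(long loadMs, long diskMs, long ramMs) {
        long savedPerTraversal = diskMs - ramMs;
        if (savedPerTraversal <= 0) {
            throw new IllegalArgumentException("in-memory traversal is not faster");
        }
        // n must exceed loadMs / savedPerTraversal; integer division floors,
        // so adding 1 gives the first n that is strictly greater.
        return loadMs / savedPerTraversal + 1;
    }

    public static void main(String[] args) {
        long loadMs = 10625; // time to load WordNet into memory
        long diskMs = 3765;  // one traversal without loading
        long ramMs = 328;    // one traversal after loading

        System.out.println("speedup per traversal: " + (double) diskMs / ramMs);
        System.out.println("break-even traversals: "
                + breakEvenTraversals(loadMs, diskMs, ramMs));
    }
}
```

With these timings each traversal saves 3437 ms, so loading starts to win from the fourth traversal onward: three traversals cost 11295 ms without loading and 11609 ms with it, while four cost 15060 ms and 11937 ms respectively.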

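III. Getting synonyms with JWI, and why a NullPointerException is thrown

To get the synonyms in a word's synset, consider the following example, which looks up the synonyms of "go".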

import java.io.File;
import java.io.IOException;
import java.net.URL;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.ISynset;
import edu.mit.jwi.item.IWord;

import
