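JWNL is built around the Synset. MIT JWI differs: its basic unit for querying WordNet is the Word. In MIT JWI, a Word object (which implements the edu.mit.jwi.item.IWord interface) combines a Synset with one of its surface forms (its word form).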
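MIT JWI accesses WordNet in this order. First, a stemmer (edu.mit.jwi.morph.IStemmer; I mostly use WordnetStemmer) turns the input token into a set of word forms. Each word form is then looked up in WordNet to get its IndexWord. Finally, the IndexWord yields a set of Words. Each Word represents one sense (synset) of that word form.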
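The three-step chain (token → word forms → IndexWord → Words) can be illustrated with a small in-memory model. This is a hypothetical sketch, not JWI code. The class `LookupChainSketch`, its toy stemmer and the made-up synset IDs `S1`/`S2` are all assumptions. The real calls are `WordnetStemmer.findStems`, `IDictionary.getIndexWord`, `IIndexWord.getWordIDs` and `IDictionary.getWord`, which appear in the full example further on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the JWI lookup chain; none of these names are JWI API.
class LookupChainSketch {

    // One sense of one word form: the analogue of JWI's IWord (a synset plus a surface form).
    static final class Word {
        final String synsetId;
        final String lemma;

        Word(String synsetId, String lemma) {
            this.synsetId = synsetId;
            this.lemma = lemma;
        }

        @Override
        public String toString() {
            return lemma + "@" + synsetId;
        }
    }

    // Index: word form -> its senses in order (the role IIndexWord plays). Synset IDs are made up.
    static final Map<String, List<Word>> INDEX = new HashMap<>();
    static {
        INDEX.put("goodness", Arrays.asList(new Word("S1", "goodness"), new Word("S2", "goodness")));
    }

    // Step 1, toy stemmer: the token itself plus naive "-es"/"-s" strippings,
    // kept only if the index knows them. A real IStemmer applies WordNet's morphology rules.
    static List<String> findStems(String token) {
        List<String> candidates = new ArrayList<>();
        candidates.add(token);
        if (token.endsWith("es")) candidates.add(token.substring(0, token.length() - 2));
        if (token.endsWith("s")) candidates.add(token.substring(0, token.length() - 1));
        List<String> stems = new ArrayList<>();
        for (String c : candidates) {
            if (INDEX.containsKey(c) && !stems.contains(c)) stems.add(c);
        }
        return stems;
    }

    // Steps 2 and 3: each word form -> its index entry -> one Word per sense.
    static List<Word> lookup(String token) {
        List<Word> words = new ArrayList<>();
        for (String wordForm : findStems(token)) {
            words.addAll(INDEX.get(wordForm));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(lookup("goodnesses")); // prints [goodness@S1, goodness@S2]
    }
}
```

The point of the model is that the result is a list of Words, one per sense, and not a list of synsets. Anything attached to a Word is therefore reachable from there.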
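In WordNet, antonymy is a lexical relation between words. It is not a semantic relation defined between synsets. Antonyms must therefore be looked up through the Word; looking them up through the Word's synset returns nothing (an empty result).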
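A toy model shows why the synset-level query comes back empty. Semantic pointers (for example hypernymy) hang off the synset, while lexical pointers (for example antonymy) hang off the word. This is a hypothetical sketch. `LexicalVsSemanticSketch`, its relation-name strings and the IDs `S0`/`S1`/`S2` are made up. In JWI the two queries correspond to `IWord.getRelatedWords(Pointer.ANTONYM)` and `ISynset.getRelatedSynsets(Pointer.ANTONYM)`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of where WordNet stores pointers; NOT the JWI API.
class LexicalVsSemanticSketch {

    // A synset holds semantic pointers: relation -> target synset IDs.
    static final class Synset {
        final String id;
        final Map<String, List<String>> semanticPointers = new HashMap<>();
        Synset(String id) { this.id = id; }
    }

    // A word (synset + surface form) holds lexical pointers: relation -> target "synsetId:lemma" keys.
    static final class Word {
        final Synset synset;
        final String lemma;
        final Map<String, List<String>> lexicalPointers = new HashMap<>();
        Word(Synset synset, String lemma) { this.synset = synset; this.lemma = lemma; }
    }

    // Analogue of IWord.getRelatedWords(...): consults lexical pointers.
    static List<String> relatedWords(Word w, String relation) {
        return w.lexicalPointers.getOrDefault(relation, Collections.emptyList());
    }

    // Analogue of ISynset.getRelatedSynsets(...): consults semantic pointers only.
    static List<String> relatedSynsets(Synset s, String relation) {
        return s.semanticPointers.getOrDefault(relation, Collections.emptyList());
    }

    // Illustrative data with made-up IDs: the synset has a hypernym, the word has an antonym.
    static Word sampleWord() {
        Synset s1 = new Synset("S1");
        s1.semanticPointers.put("hypernym", Arrays.asList("S0"));
        Word goodness = new Word(s1, "goodness");
        goodness.lexicalPointers.put("antonym", Arrays.asList("S2:badness"));
        return goodness;
    }

    public static void main(String[] args) {
        Word w = sampleWord();
        System.out.println(relatedWords(w, "antonym"));          // prints [S2:badness]
        System.out.println(relatedSynsets(w.synset, "antonym")); // prints []
    }
}
```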
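The lookup code itself is simple. In this example we want every antonym of the input token, organized as a sense → antonyms table: for each sense, the antonyms that belong to it.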
First, define a WordNetLemma class that holds a token together with its POS (part of speech).
```java
import java.util.Objects;

import edu.mit.jwi.item.POS;

// A (token, POS) pair; __pos == null means a bare word form without a POS.
public class WordNetLemma implements Comparable<WordNetLemma> {
    public String __token = "";
    public POS __pos = null;

    public WordNetLemma(String word) {
        __token = word;
        __pos = null;
    }

    public WordNetLemma(String word, POS pos) {
        this(word);
        __pos = pos;
    }

    public boolean isPureWordForm() {
        return __pos == null;
    }

    @Override
    public String toString() {
        if (isPureWordForm())
            return __token;
        else
            return __token + "." + __pos.toString();
    }

    // Orders lemmas by their printed form (e.g. "familiar.adjective"); TreeMap keys rely on this.
    @Override
    public int compareTo(WordNetLemma o) {
        return this.toString().compareTo(o.toString());
    }

    // Null-safe: __pos may be null for a bare word form.
    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof WordNetLemma))
            return false;
        WordNetLemma other = (WordNetLemma) o;
        return Objects.equals(__token, other.__token) && Objects.equals(__pos, other.__pos);
    }

    // Consistent with equals(); also tolerates a null __pos.
    @Override
    public int hashCode() {
        return Objects.hash(__token, __pos);
    }
}
```
Next, define the functions that collect all antonyms, together with a main method that calls them.
```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.IWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;
import edu.mit.jwi.item.Pointer;
import edu.mit.jwi.morph.WordnetStemmer;

// The enclosing class name is hypothetical; the original snippet omits it.
public class AntonymLookup {

    // For each input lemma, collect its antonyms grouped by sense.
    static Map<WordNetLemma, List<LinkedHashSet<WordNetLemma>>> getAntonymousSynsetLemmas(
            LinkedHashSet<WordNetLemma> lemmas, IDictionary _wordnet) {
        Map<WordNetLemma, List<LinkedHashSet<WordNetLemma>>> antonyms =
                new TreeMap<WordNetLemma, List<LinkedHashSet<WordNetLemma>>>();
        for (WordNetLemma lemma : lemmas) {
            antonyms.put(lemma, getAntonymousSynsetLemmas(lemma, _wordnet));
        }
        return antonyms;
    }

    // Returns one set of antonyms per sense (IWord) of the lemma, in WordNet's sense order.
    static List<LinkedHashSet<WordNetLemma>> getAntonymousSynsetLemmas(
            WordNetLemma lemma, IDictionary _wordnet) {
        List<LinkedHashSet<WordNetLemma>> antonymyLemmas = new LinkedList<LinkedHashSet<WordNetLemma>>();
        IIndexWord indexWord = _wordnet.getIndexWord(lemma.__token, lemma.__pos);
        if (indexWord == null) // word form not in WordNet for this POS
            return antonymyLemmas;
        for (IWordID wordID : indexWord.getWordIDs()) {
            // Each IWord is this word form in one sense (synset)
            IWord word = _wordnet.getWord(wordID);
            LinkedHashSet<WordNetLemma> antonymyLemmasOfOneSynset = new LinkedHashSet<WordNetLemma>();
            // Antonymy is a lexical pointer, so query it on the IWord, not on its synset
            for (IWordID antonymousWordID : word.getRelatedWords(Pointer.ANTONYM)) {
                IWord antonymousWord = _wordnet.getWord(antonymousWordID);
                antonymyLemmasOfOneSynset.add(
                        new WordNetLemma(antonymousWord.getLemma(), antonymousWord.getPOS()));
            }
            antonymyLemmas.add(antonymyLemmasOfOneSynset); // empty set if this sense has no antonym
        }
        return antonymyLemmas;
    }

    public static void main(String[] args) {
        String token = "familiar";
        POS pos = POS.ADJECTIVE;
        // replace it with your own WordNet path
        final String WNRoot = "C:/development/WordNet/2.1/dict";
        try {
            URL dicturl = new URL("file", null, WNRoot);
            IDictionary dict = new Dictionary(dicturl);
            dict.open();
            WordnetStemmer stemmer = new WordnetStemmer(dict);
            // token -> candidate word forms for the given POS
            List<String> wordforms = stemmer.findStems(token, pos);
            LinkedHashSet<WordNetLemma> lemmas = new LinkedHashSet<WordNetLemma>();
            for (String wordform : wordforms) {
                lemmas.add(new WordNetLemma(wordform, pos));
            }
            Map<WordNetLemma, List<LinkedHashSet<WordNetLemma>>> antonyms =
                    getAntonymousSynsetLemmas(lemmas, dict);
            System.out.println(antonyms);
            dict.close();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
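Running the code above prints the following. As an adjective, familiar has four senses. The antonyms of the first two senses are unfamiliar and strange, and the last two senses have no antonyms. The accompanying figure also confirms that this result is correct.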
{familiar.adjective=[[unfamiliar.adjective], [strange.adjective], [], []]}
If token is replaced with goodness and pos with POS.NOUN, the result is:
{goodness.noun=[[badness.noun], [evilness.noun]]}
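The per-sense lists can be collapsed when you only need the distinct antonyms of a lemma and not the sense they belong to. Below is a minimal sketch of that step. It assumes plain strings stand in for WordNetLemma so the block runs without JWI. The class `FlattenAntonymsSketch` and its `set` helper are hypothetical. The sample data copies the familiar.adjective output shown earlier.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collapses "lemma -> one antonym set per sense" into "lemma -> distinct antonyms".
class FlattenAntonymsSketch {

    static <K, V> Map<K, LinkedHashSet<V>> flatten(Map<K, List<LinkedHashSet<V>>> bySense) {
        Map<K, LinkedHashSet<V>> flat = new LinkedHashMap<>();
        for (Map.Entry<K, List<LinkedHashSet<V>>> e : bySense.entrySet()) {
            LinkedHashSet<V> all = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
            for (LinkedHashSet<V> senseAntonyms : e.getValue()) {
                all.addAll(senseAntonyms);
            }
            flat.put(e.getKey(), all);
        }
        return flat;
    }

    // Builds an insertion-ordered set from the given items.
    static LinkedHashSet<String> set(String... items) {
        return new LinkedHashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        Map<String, List<LinkedHashSet<String>>> bySense = new LinkedHashMap<>();
        bySense.put("familiar.adjective", Arrays.asList(
                set("unfamiliar.adjective"), set("strange.adjective"), set(), set()));
        System.out.println(flatten(bySense));
        // prints {familiar.adjective=[unfamiliar.adjective, strange.adjective]}
    }
}
```

Because the inner sets are LinkedHashSets, the result keeps WordNet's sense order while dropping an antonym that shows up under more than one sense.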