Querying WordNet Antonyms with MIT JWI (Java WordNet Interface)

Latest recommended article published on 2021-04-02 11:03:45

相门码农

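This article describes how to use the MIT JWI library to query antonym relations in WordNet. It defines a WordNetLemma class to handle combinations of a token and its part of speech, and provides functions that find all antonyms. The example code shows how to obtain the antonyms of an input word.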

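Unlike JWNL, which is centered on the Synset, the basic concept for querying WordNet in MIT JWI is the Word. In MIT JWI, a Word object (implementing the edu.mit.jwi.item.IWord interface) is the combination of a Synset and one of its surface forms (word forms).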

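MIT JWI accesses WordNet in the following order. First, a stemmer (edu.mit.jwi.morph.IStemmer; I mostly use WordnetStemmer) turns the input token into a set of word forms. Then, for each word form, WordNet is queried for the corresponding IndexWord, and a set of Words is obtained from that IndexWord. Each Word means that the word form has one particular sense (synset).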
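The lookup order described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the two maps are hypothetical stand-ins for the stemmer and the WordNet index, not JWI objects, and the sense ids such as `familiar#1` are made up. Only the fact that familiar has four adjective senses is taken from the example later in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupOrderSketch {
    // Hypothetical stand-in for the stemmer step: token -> word forms.
    static final Map<String, List<String>> STEMS = new HashMap<String, List<String>>();
    // Hypothetical stand-in for the index step: word form -> one id per sense.
    static final Map<String, List<String>> SENSES = new HashMap<String, List<String>>();

    static {
        STEMS.put("familiar", Arrays.asList("familiar"));
        // Four senses as reported in the article; the ids themselves are invented.
        SENSES.put("familiar",
                Arrays.asList("familiar#1", "familiar#2", "familiar#3", "familiar#4"));
    }

    /** token -> word forms -> index entry -> one Word per sense. */
    static List<String> lookupWords(String token) {
        List<String> words = new ArrayList<String>();
        List<String> wordForms = STEMS.get(token);
        if (wordForms == null) {
            return words; // the stemmer produced no word form
        }
        for (String wordForm : wordForms) {
            List<String> senseIds = SENSES.get(wordForm); // plays the role of the IndexWord
            if (senseIds == null) {
                continue; // word form is not in the lexicon
            }
            words.addAll(senseIds); // one Word per sense
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(lookupWords("familiar"));
    }
}
```

In real JWI code the same three steps correspond to `findStems`, `getIndexWord` followed by `getWordIDs`, and `getWord`, as used in the full example in this article.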

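Because antonymy in WordNet is a lexical relationship defined on words, not a semantic relationship defined on synsets, antonyms must be looked up through a Word. Looking them up through the corresponding synset returns nothing (an empty result).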
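The difference between the two kinds of relations can be illustrated with a small toy model. This is a sketch, not JWI code: the maps and the keys such as `sense1/familiar` are hypothetical, and the antonym pairs are the ones reported for familiar in the output later in this article. Lexical pointers are keyed by a (sense, lemma) pair, while the synset-level table holds no antonym entries at all.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LexicalVsSemanticSketch {
    // Lexical pointers hang off a Word, i.e. a (sense, lemma) pair.
    static final Map<String, List<String>> LEXICAL_ANTONYMS =
            new HashMap<String, List<String>>();
    // Semantic pointers hang off the synset; antonymy is never stored here.
    static final Map<String, List<String>> SYNSET_ANTONYMS =
            new HashMap<String, List<String>>();

    static {
        // Hypothetical keys; the antonyms come from the article's sample output.
        LEXICAL_ANTONYMS.put("sense1/familiar", Arrays.asList("unfamiliar"));
        LEXICAL_ANTONYMS.put("sense2/familiar", Arrays.asList("strange"));
    }

    /** Lookup through the Word: finds the antonym if that sense has one. */
    static List<String> antonymsOfWord(String senseId, String lemma) {
        List<String> found = LEXICAL_ANTONYMS.get(senseId + "/" + lemma);
        return found == null ? Collections.<String>emptyList() : found;
    }

    /** Lookup through the synset alone: always empty for antonymy. */
    static List<String> antonymsOfSynset(String senseId) {
        List<String> found = SYNSET_ANTONYMS.get(senseId);
        return found == null ? Collections.<String>emptyList() : found;
    }

    public static void main(String[] args) {
        System.out.println(antonymsOfWord("sense1", "familiar"));
        System.out.println(antonymsOfSynset("sense1"));
    }
}
```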

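Below is a simple lookup example. Given an input token, we want to find all of its antonyms and organize them as a sense-to-antonyms table, that is, find the antonyms under each sense.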

First, define a WordNetLemma class that holds the combination of a token and its pos (part of speech).

import java.util.Objects;

import edu.mit.jwi.item.POS;

/** A token paired with its part of speech; for a pure word form, __pos == null. */
public class WordNetLemma implements Comparable<WordNetLemma> {
   public String __token = "";
   public POS __pos = null;

   public WordNetLemma(String word) {
      __token = word;
      __pos = null;
   }

   public WordNetLemma(String word, POS pos) {
      this(word);
      __pos = pos;
   }

   public boolean isPureWordForm() {
      return __pos == null;
   }

   @Override
   public String toString() {
      if (isPureWordForm())
         return __token;
      else
         return __token + "." + __pos.toString();
   }

   @Override
   public int compareTo(WordNetLemma o) {
      // order by the printed form, e.g. "familiar.adjective"
      return this.toString().compareTo(o.toString());
   }

   @Override
   public boolean equals(Object o) {
      if (this == o)
         return true;
      if (!(o instanceof WordNetLemma))
         return false;
      WordNetLemma other = (WordNetLemma) o;
      // Objects.equals is null-safe, so pure word forms (__pos == null) compare correctly
      return __token.equals(other.__token) && Objects.equals(__pos, other.__pos);
   }

   @Override
   public int hashCode() {
      int result = 17;
      result = 37 * result + __token.hashCode();
      // __pos may be null for a pure word form
      result = 37 * result + (__pos == null ? 0 : __pos.hashCode());
      return result;
   }
}

Next, define the functions that retrieve all antonyms.

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import edu.mit.jwi.IDictionary;
import edu.mit.jwi.item.IIndexWord;
import edu.mit.jwi.item.IWord;
import edu.mit.jwi.item.IWordID;
import edu.mit.jwi.item.POS;
import edu.mit.jwi.morph.WordnetStemmer;

public class TestAntonymyRelationJWI {

   /** Maps each input lemma to its antonyms, grouped by sense. */
   static Map<WordNetLemma, List<LinkedHashSet<WordNetLemma>>> getAntonymousSynsetLemmas(
         LinkedHashSet<WordNetLemma> lemmas, IDictionary _wordnet) {
      Map<WordNetLemma, List<LinkedHashSet<WordNetLemma>>> antonyms =
            new TreeMap<WordNetLemma, List<LinkedHashSet<WordNetLemma>>>();
      for (WordNetLemma lemma : lemmas) {
         List<LinkedHashSet<WordNetLemma>> partial = getAntonymousSynsetLemmas(lemma, _wordnet);
         antonyms.put(lemma, partial);
      }
      return antonyms;
   }

   /** Returns one set of antonyms per sense of the lemma; the set is empty if a sense has none. */
   static List<LinkedHashSet<WordNetLemma>> getAntonymousSynsetLemmas(
         WordNetLemma lemma, IDictionary _wordnet) {
      List<LinkedHashSet<WordNetLemma>> antonymyLemmas = new LinkedList<LinkedHashSet<WordNetLemma>>();
      IIndexWord indexWord = _wordnet.getIndexWord(lemma.__token, lemma.__pos);
      if (indexWord == null)
         return antonymyLemmas;
      List<IWordID> wordIDs = indexWord.getWordIDs();
      for (IWordID wordID : wordIDs) {
         IWord word = _wordnet.getWord(wordID); // get the @word corresponding to @wordID
         LinkedHashSet<WordNetLemma> antonymyLemmasOfOneSynset = new LinkedHashSet<WordNetLemma>();
         // antonymy is a lexical pointer, so it is queried on the Word, not on the synset
         List<IWordID> antonymousWordIDs = word.getRelatedWords(edu.mit.jwi.item.Pointer.ANTONYM);
         for (IWordID antonymousWordID : antonymousWordIDs) {
            // one corresponding @antonymousWord
            IWord antonymousWord = _wordnet.getWord(antonymousWordID);
            String antonymousLemmaToken = antonymousWord.getLemma();
            POS antonymousLemmaPOS = antonymousWord.getPOS();
            WordNetLemma antonymousLemma = new WordNetLemma(antonymousLemmaToken, antonymousLemmaPOS);
            antonymyLemmasOfOneSynset.add(antonymousLemma);
         }
         antonymyLemmas.add(antonymyLemmasOfOneSynset);
      }
      return antonymyLemmas;
   }

   public static void main(String[] args) {
      String token = "familiar";
      POS pos = POS.ADJECTIVE;
      // replace it with your own WordNet path
      final String WNRoot = "C:/development/WordNet/2.1/dict";
      try {
         URL dicturl = new URL("file", null, WNRoot);
         IDictionary dict = new edu.mit.jwi.Dictionary(dicturl);
         dict.open();
         WordnetStemmer stemmer = new WordnetStemmer(dict);
         List<String> wordforms = stemmer.findStems(token, pos);
         LinkedHashSet<WordNetLemma> lemmas = new LinkedHashSet<WordNetLemma>();
         for (String wordform : wordforms) {
            lemmas.add(new WordNetLemma(wordform, pos));
         }
         Map<WordNetLemma, List<LinkedHashSet<WordNetLemma>>> antonyms =
               getAntonymousSynsetLemmas(lemmas, dict);
         System.out.println(antonyms);
      } catch (MalformedURLException e) {
         e.printStackTrace();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }
}

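Running the code above produces the output below. As an adjective, familiar has four senses: the first two have the antonyms unfamiliar and strange respectively, and the last two have no antonyms. The figure below also confirms that this result is correct.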

{familiar.adjective=[[unfamiliar.adjective], [strange.adjective], [], []]}

If token is replaced with goodness and pos with POS.NOUN, the result is as follows:

{goodness.noun=[[badness.noun], [evilness.noun]]}
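The results above are grouped by sense, with an empty set for each sense that has no antonym. When only a flat list of antonyms is needed, the nested structure can be collapsed. The following is a minimal sketch that uses plain strings in place of WordNetLemma so that it runs without JWI; the class and method names are hypothetical and not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FlattenAntonymsSketch {
    /** Collapses per-sense antonym sets into one set, keeping first-seen order. */
    static Set<String> flatten(List<? extends Set<String>> perSense) {
        Set<String> all = new LinkedHashSet<String>();
        for (Set<String> senseAntonyms : perSense) {
            all.addAll(senseAntonyms); // empty sets contribute nothing
        }
        return all;
    }

    public static void main(String[] args) {
        // Same shape as the result for "familiar": two senses with one antonym, two without.
        List<Set<String>> familiar = new ArrayList<Set<String>>();
        familiar.add(new LinkedHashSet<String>(Arrays.asList("unfamiliar.adjective")));
        familiar.add(new LinkedHashSet<String>(Arrays.asList("strange.adjective")));
        familiar.add(new LinkedHashSet<String>());
        familiar.add(new LinkedHashSet<String>());
        System.out.println(flatten(familiar));
    }
}
```

A LinkedHashSet is used so that duplicates across senses are dropped while the sense order of the original result is preserved.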