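WordNet is an internationally influential lexical knowledge base for English.
Compared with general-purpose knowledge representation methods, WordNet is better suited to helping natural language processing practitioners at the semantic level.
Its main characteristics can be summarized as follows: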
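1. In WordNet, the synset is the most basic unit. As the name suggests, a synset is a synonym set: each synset corresponds to one distinct meaning and may contain a single entry or a group of entries. Conversely, a single entry may belong to several different synsets.
For example, the entry for car stores the following five synsets: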
1. (598) car, auto, automobile, machine, motorcar -- (a motor vehicle with four wheels; usually propelled by an internal combustion engine; "he needs a car to get to work")
2. (24) car, railcar, railway car, railroad car -- (a wheeled vehicle adapted to the rails of railroad; "three cars had jumped the rails")
3. (1) cable car, car -- (a conveyance for passengers or freight on a cable railway; "they took a cable car to the top of the mountain")
4. car, gondola -- (the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant)
5. car, elevator car -- (where passengers ride up and down; "the car was on the top floor")
As the listing shows, each synset contains a group of entries.
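The many-to-many mapping between entries and synsets can be sketched with plain dictionaries. This is only an illustration built from the five senses listed above: the ids such as `car#1` and the helper `build_index` are hypothetical names, not WordNet's own identifiers or API.

```python
from collections import defaultdict

# Toy data copied from the listing above; the ids ("car#1", ...) are
# hypothetical labels chosen for this sketch, not real WordNet identifiers.
SYNSETS = {
    "car#1": ["car", "auto", "automobile", "machine", "motorcar"],
    "car#2": ["car", "railcar", "railway car", "railroad car"],
    "car#3": ["cable car", "car"],
    "car#4": ["car", "gondola"],
    "car#5": ["car", "elevator car"],
}


def build_index(synsets):
    """Invert synset -> entries into entry -> list of synset ids."""
    index = defaultdict(list)
    for synset_id, entries in synsets.items():
        for entry in entries:
            index[entry].append(synset_id)
    return dict(index)


INDEX = build_index(SYNSETS)

print(len(INDEX["car"]))    # 5: one entry, five synsets
print(INDEX["gondola"])     # ['car#4']
print(SYNSETS["car#1"])     # one synset, five entries
```

With NLTK installed and its WordNet corpus downloaded, `nltk.corpus.wordnet.synsets("car")` should return the corresponding real synsets.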
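2. Besides marking synonymy between words, WordNet also records antonymy and hypernym/hyponym relations between them.
Antonymy is easy to understand. The hypernym/hyponym relation describes which parent class a word belongs to and which subclasses it contains.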
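The parent/child reading of the hypernym/hyponym relation can be sketched as follows. The hierarchy is a hand-made toy, not data taken from WordNet, and it assumes each word has at most one parent, whereas a real WordNet synset can have several hypernyms. The names `HYPERNYM`, `hypernym_chain`, and `hyponyms` are hypothetical.

```python
# Hypothetical child -> parent (hypernym) links; real WordNet chains are
# longer and are defined over synsets rather than bare words.
HYPERNYM = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "entity",
}


def hypernym_chain(word):
    """Return the parents of `word`, nearest first, up to the root."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def hyponyms(word):
    """Return the direct children (subclasses) of `word`, sorted by name."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == word)


print(hypernym_chain("car"))   # ['motor vehicle', 'vehicle', 'entity']
print(hyponyms("vehicle"))     # ['bicycle', 'motor vehicle']
```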
3. Thanks to the hierarchical structure formed by its hypernym/hyponym relations, WordNet can be used to compute the distance between two words.
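One simple way to turn such a hierarchy into a distance is to count the edges on the shortest path between two words through a shared ancestor. The sketch below does this on a hand-made toy hierarchy, assuming a single parent per word. The similarity score 1 / (1 + distance) is one common convention, shown here as an assumption rather than as the measure WordNet itself prescribes. All names are hypothetical.

```python
# Hypothetical child -> parent (hypernym) links, not taken from WordNet.
HYPERNYM = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "entity",
}


def ancestors_with_depth(word):
    """Map `word` and each of its ancestors to the number of edges from `word`."""
    depths = {word: 0}
    steps = 0
    while word in HYPERNYM:
        word = HYPERNYM[word]
        steps += 1
        depths[word] = steps
    return depths


def path_distance(a, b):
    """Edges on the shortest path from a to b via a common ancestor, or None."""
    up_a = ancestors_with_depth(a)
    up_b = ancestors_with_depth(b)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    return min(up_a[node] + up_b[node] for node in common)


def path_similarity(a, b):
    """Map a distance to a score in (0, 1]; identical words score 1.0."""
    distance = path_distance(a, b)
    return None if distance is None else 1.0 / (1 + distance)


print(path_distance("car", "truck"))     # 2 (car -> motor vehicle -> truck)
print(path_distance("car", "bicycle"))   # 3 (meet at "vehicle")
print(path_similarity("car", "car"))     # 1.0
```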
For details, see:
Natural Language Processing with Python (Python自然语言处理)
Statistical Natural Language Processing (统计自然语言处理)