Natural Language Processing ☞ WordNet

WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus.

About WordNet (https://wordnet.princeton.edu/)

WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.

WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings. However, there are some important distinctions. First, WordNet interlinks not just word forms (strings of letters) but specific senses of words. As a result, words that are found in close proximity to one another in the network are semantically disambiguated. Second, WordNet labels the semantic relations among words, whereas the groupings of words in a thesaurus do not follow any explicit pattern other than meaning similarity.

Structure

The main relation among words in WordNet is synonymy, as between the words shut and close or car and automobile. Synonyms, words that denote the same concept and are interchangeable in many contexts, are grouped into unordered sets (synsets). Each of WordNet’s 117,000 synsets is linked to other synsets by means of a small number of “conceptual relations.” Additionally, a synset contains a brief definition (“gloss”) and, in most cases, one or more short sentences illustrating the use of the synset members. Word forms with several distinct meanings are represented in as many distinct synsets. Thus, each form-meaning pair in WordNet is unique.
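The structure described above can be sketched with a small in-memory model. This is only a toy illustration: the `Synset` class, the identifiers such as `car.n.1`, and the glosses are hand-written assumptions, not WordNet's actual storage format or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Synset:
    """Toy stand-in for a synset (hypothetical, not WordNet's real format)."""
    synset_id: str        # hypothetical identifier
    lemmas: frozenset     # unordered set of synonymous word forms
    gloss: str            # brief definition
    examples: tuple = ()  # optional usage sentences

# Hand-written toy entries; the glosses are paraphrases, not WordNet's text.
SYNSETS = [
    Synset("car.n.1", frozenset({"car", "automobile"}),
           "a motor vehicle with four wheels",
           ("he needs a car to get to work",)),
    Synset("car.n.2", frozenset({"car", "railcar"}),
           "a wheeled vehicle that runs on rails"),
    Synset("close.v.1", frozenset({"close", "shut"}),
           "move so that an opening is blocked"),
]

def senses(word_form):
    """Return every synset that contains the given word form."""
    return [s for s in SYNSETS if word_form in s.lemmas]

def form_meaning_pairs():
    """List (word form, synset id) pairs; each pair occurs exactly once."""
    return [(lemma, s.synset_id) for s in SYNSETS for lemma in sorted(s.lemmas)]
```

In this sketch the word form "car" appears in two synsets, one per meaning, while each (form, synset) pair stays unique.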

Relations

The most frequently encoded relation among synsets is the super-subordinate relation (also called hyperonymy, hyponymy or ISA relation). It links more general synsets like {furniture, piece_of_furniture} to increasingly specific ones like {bed} and {bunkbed}. Thus, WordNet states that the category furniture includes bed, which in turn includes bunkbed; conversely, concepts like bed and bunkbed make up the category furniture. All noun hierarchies ultimately go up to the root node {entity}. The hyponymy relation is transitive: if an armchair is a kind of chair, and if a chair is a kind of furniture, then an armchair is a kind of furniture. WordNet distinguishes between Types (common nouns) and Instances (specific persons, countries and geographic entities). Thus, armchair is a type of chair, while Barack Obama is an instance of a president. Instances are always leaf (terminal) nodes in their hierarchies.
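The transitivity of the super-subordinate relation and the leaf status of instances can be illustrated with a toy hierarchy. The dictionaries below are hand-written assumptions for this example (intermediate levels between furniture and entity are omitted), not data read from WordNet.

```python
# Toy hypernym ("is a kind of") links: child type -> parent type.
TYPE_PARENT = {
    "bunkbed": "bed",
    "bed": "furniture",
    "armchair": "chair",
    "chair": "furniture",
    "furniture": "entity",   # intermediate levels omitted in this sketch
    "president": "entity",   # intermediate levels omitted in this sketch
}

# Instance links are stored separately from type links: instance -> type.
INSTANCE_OF = {"Barack Obama": "president"}

def hypernym_path(synset):
    """Walk upward from a type until no further parent exists (the root)."""
    path = [synset]
    while path[-1] in TYPE_PARENT:
        path.append(TYPE_PARENT[path[-1]])
    return path

def is_a(specific, general):
    """Transitive check: does `general` lie somewhere above `specific`?"""
    return general in hypernym_path(specific)[1:]

def is_leaf(node):
    """A node is a leaf if nothing names it as a parent or as a type."""
    return node not in TYPE_PARENT.values() and node not in INSTANCE_OF.values()
```

Here `is_a("armchair", "furniture")` holds even though no direct armchair-to-furniture link is stored, which is the transitivity the text describes.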

Meronymy, the part-whole relation, holds between synsets like {chair} and {back, backrest}, {seat} and {leg}. Parts are inherited from their superordinates: if a chair has legs, then an armchair has legs as well. Parts are not inherited “upward” as they may be characteristic only of specific kinds of things rather than the class as a whole: chairs and kinds of chairs have legs, but not all kinds of furniture have legs.
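The one-directional inheritance of parts can be sketched as follows. The two dictionaries are toy assumptions made up for this example; the function collects parts declared on a synset and on everything above it, and never looks downward.

```python
# Toy links, hand-written for illustration only.
CHAIR_PARENT = {"armchair": "chair", "chair": "furniture"}
DECLARED_PARTS = {"chair": {"back", "seat", "leg"}}  # parts attached to a synset

def all_parts(synset):
    """Collect parts declared on the synset and on each of its superordinates."""
    parts = set()
    node = synset
    while node is not None:
        parts |= DECLARED_PARTS.get(node, set())
        node = CHAIR_PARENT.get(node)  # None once the top of the toy chain is reached
    return parts
```

An armchair picks up the parts of chair, while furniture, sitting above chair, gets none of them.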

Verb synsets are arranged into hierarchies as well; verbs towards the bottom of the trees (troponyms) express increasingly specific manners characterizing an event, as in {communicate}-{talk}-{whisper}. The specific manner expressed depends on the semantic field; volume (as in the example above) is just one dimension along which verbs can be elaborated. Others are speed (move-jog-run) or intensity of emotion (like-love-idolize). Verbs describing events that necessarily and unidirectionally entail one another are linked: {buy}-{pay}, {succeed}-{try}, {show}-{see}, etc.
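A minimal sketch of the two verb relations mentioned above, using only the examples from the text. The names `ENTAILS` and `TROPONYM_CHAINS` are hypothetical, and the data is hand-written rather than taken from WordNet.

```python
# Toy entailment links: verb -> verbs it necessarily entails (one direction only).
ENTAILS = {"buy": {"pay"}, "succeed": {"try"}, "show": {"see"}}

# Toy troponym chains, ordered from most general to most specific.
TROPONYM_CHAINS = [
    ["communicate", "talk", "whisper"],  # elaborated along volume
    ["move", "jog", "run"],              # elaborated along speed
    ["like", "love", "idolize"],         # elaborated along intensity of emotion
]

def entails(verb, other):
    """True if `verb` is recorded as entailing `other`."""
    return other in ENTAILS.get(verb, set())

def more_specific(verb, other):
    """True if `verb` sits below `other` in one of the toy chains."""
    for chain in TROPONYM_CHAINS:
        if verb in chain and other in chain:
            return chain.index(verb) > chain.index(other)
    return False
```

Because the links are stored in one direction, `entails("buy", "pay")` holds while `entails("pay", "buy")` does not.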

Adjectives are organized in terms of antonymy. Pairs of “direct” antonyms like wet-dry and young-old reflect the strong semantic contrast of their members. Each of these polar adjectives in turn is linked to a number of “semantically similar” ones: dry is linked to parched, arid, desiccated and bone-dry, and wet to soggy, waterlogged, etc. Semantically similar adjectives are “indirect antonyms” of the central member of the opposite pole. Relational adjectives ("pertainyms") point to the nouns they are derived from (criminal-crime).
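The direct and indirect antonym structure can be illustrated with the adjectives named in the text. The dictionaries and function names below are assumptions for this sketch, not part of any WordNet interface.

```python
# Direct antonym pairs between polar (central) adjectives.
DIRECT_ANTONYM = {"wet": "dry", "dry": "wet", "young": "old", "old": "young"}

# Semantically similar adjectives clustered around each central adjective.
SIMILAR_TO = {
    "dry": {"parched", "arid", "desiccated", "bone-dry"},
    "wet": {"soggy", "waterlogged"},
}

def central_adjective(adj):
    """Return the polar adjective that `adj` is, or is clustered around."""
    if adj in DIRECT_ANTONYM:
        return adj
    for head, similar in SIMILAR_TO.items():
        if adj in similar:
            return head
    return None

def indirect_antonym(adj):
    """For a similar adjective, return the central member of the opposite pole."""
    head = central_adjective(adj)
    if head is None or head == adj:
        return None  # unknown word, or a central adjective with a direct antonym
    return DIRECT_ANTONYM[head]
```

In this model arid has no antonym of its own; it reaches wet only through its central adjective dry.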


There are only a few adverbs in WordNet (hardly, mostly, really, etc.), as the majority of English adverbs are straightforwardly derived from adjectives via morphological affixation (surprisingly, strangely, etc.).

Cross-POS relations

The majority of WordNet’s relations connect words from the same part of speech (POS). Thus, WordNet really consists of four sub-nets, one each for nouns, verbs, adjectives and adverbs, with few cross-POS pointers. Cross-POS relations include the “morphosemantic” links that hold among semantically similar words sharing a stem with the same meaning: observe (verb), observant (adjective), observation, observatory (nouns). In many of the noun-verb pairs the semantic role of the noun with respect to the verb has been specified: {sleeper, sleeping_car} is the LOCATION for {sleep} and {painter} is the AGENT of {paint}, while {painting, picture} is its RESULT.
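The role-labelled noun-verb links from the examples above could be represented as simple triples. This is a sketch under the assumption that a flat list is enough for illustration; `MORPHOSEMANTIC` and `nouns_for` are hypothetical names.

```python
# (noun synset, verb synset, semantic role of the noun with respect to the verb)
MORPHOSEMANTIC = [
    (("sleeper", "sleeping_car"), ("sleep",), "LOCATION"),
    (("painter",), ("paint",), "AGENT"),
    (("painting", "picture"), ("paint",), "RESULT"),
]

def nouns_for(verb, role):
    """Return the noun synsets linked to `verb` with the given role."""
    return [noun for noun, verb_synset, r in MORPHOSEMANTIC
            if verb in verb_synset and r == role]
```

The same verb synset {paint} appears in two triples, once per role, so a lookup has to specify both the verb and the role.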
