[NLP-3] An Overview of Hypernym Discovery

Foreword

  • Hypernym-hyponym (is-a) relation
    • Definition: A noun X is a hyponym of a noun Y if X is a subtype or instance of Y. Thus “Shakespeare” is a hyponym of “author” (and conversely “author” is a hypernym of “Shakespeare”), “dog” is a hyponym of “canine”, “desk” is a hyponym of “furniture”, and so on.[1]
  • Taxonomy
    • Definition: A taxonomy (a semantic, lexical, or syntactic taxonomy; a kind of semantic hierarchy) is the backbone of many knowledge-rich applications such as question answering[2], query understanding[3] and personalized recommendation[4].
    • Challenges: Manually constructed taxonomies (expert-compiled thesauri such as WordNet) (1) are limited in lexical coverage, both in scope and in domain, and (2) are slow and labor-intensive to build
    • Solutions: Automatically constructed taxonomies, via (1) automatically learning taxonomic relations and (2) automatically constructing semantic hierarchies
    • Automatically learning taxonomic relations
    • Pattern-based methods cover only a few linguistic contexts
    • Word embeddings (distributed word representations) have been empirically shown to model some semantic relations between words as offsets of word vectors[5][6]
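As a rough illustration of the offset idea, here is a minimal sketch using pretrained English GloVe vectors via gensim (the model name below is from gensim-data; the papers surveyed here instead train Skip-gram embeddings on Chinese corpora):

```python
# Minimal sketch: probing semantic relations via vector offsets.
# Assumes the gensim-data model "glove-wiki-gigaword-50" (downloaded on first use).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

# Classic analogy via offsets: v(king) - v(man) + v(woman) ~ v(queen)
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# If "is-a" were a single translation t, then v(hyponym) + t ~ v(hypernym)
# would hold for every pair; Fig. 1 below shows one offset is not enough,
# which motivates the piecewise (per-cluster) projection models that follow.
```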

Outline

  • Automatically learning taxonomic relations
    • Chinese “is-a” extraction from user generated categories[7]
    • Open-domain hypernym discovery[8]
    • Hypernym discovery based on syntactic pattern learning[9]
    • Supervised distributional hypernym discovery via domain adaptation[10]
    • Unsupervised hypernymy detection based on distributional inclusion vector embedding (DIVE)[11]
  • Automatically constructing semantic hierarchies
    • SemEval-2016 Task 14: Semantic Taxonomy Enrichment[12]
    • TALN at SemEval-2016 Task 14: Semantic Taxonomy Enrichment Via Sense-Based Embeddings[13]
    • Automatic taxonomy induction based on reinforcement learning[14]
    • Task-Guided Taxonomy Construction by hierarchical tree expansion (HiExpan)[15]

Chinese Hypernym-Hyponym Extraction from User Generated Categories

Introduction

  • Motivations: (1) “is-a” relations obtained from manual thesauri (e.g., WordNet) are limited; (2) such resources are language dependent (e.g., English vs. Chinese); (3) word embeddings (distributed word representations) have proved effective in modeling some semantic relations; (4) the representation of is-a relations is complicated
  • Do what: Harvesting Chinese is-a relations from user generated categories
  • How to do: By studying the relations between the word embeddings of hyponyms and their respective hypernyms, is-a relations can be identified by learning semantic prediction models
  • Challenges: Vector offsets of is-a relations differ considerably across data sources and domains, as shown in Fig. 1. This implies that models learned from one knowledge source are not necessarily effective for extracting is-a relations from another source.
  • Contributions: A weakly supervised framework to extract is-a relations automatically: (1) build piecewise linear projection models trained on samples from an existing Chinese taxonomy[16]; (2) propose a bi-criteria optimization method to avoid “semantic drift”, so that projection models are trained without any labeling effort.
    Fig. 1: Vector offsets of is-a relations differ across data sources and domains.

Methods

  • Problem Statement:

A taxonomy is a directed acyclic graph $G = (E, R)$, where nodes $E$ represent entities/classes and edges $R$ denote is-a relations.
To extract is-a relations from user generated categories, we obtain the collection of entities $E^*$ from the knowledge source (such as Baidu Baike). The set of user generated categories for each $e \in E^*$ is denoted as $Cat(e)$. We then need to design a learning algorithm $F$, based on $R$, that predicts whether an is-a relation holds between $e$ and $c$, where $e \in E^*$ and $c \in Cat(e)$.
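A toy rendering of this setup (the entities and categories below are hypothetical examples; only the shapes of $Cat$ and $F$ matter):

```python
# Toy setup: user generated categories Cat(e) for entities e in E*,
# e.g. harvested from Baidu Baike pages (values here are made up).
from typing import Callable, Dict, List

Cat: Dict[str, List[str]] = {
    "帝企鹅":   ["企鹅", "动物"],   # Emperor Penguin -> penguin, animal (both is-a)
    "莎士比亚": ["作家", "英国"],   # Shakespeare -> author (is-a), Britain (NOT is-a)
}

# The learning goal: F(e, c) -> True iff (e, is-a, c), trained from the
# taxonomy edges R. Note that user generated categories mix true hypernyms
# with mere topic tags, which is exactly what F must separate.
F: Callable[[str, str], bool]
```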

  • Motivation of Method

Three observations

  • For a fixed $x$, if $y_1$ and $y_2$ are hypernyms of $x$ at different levels, then $v(x) - v(y_1) \neq v(x) - v(y_2)$.
  • If $(x_1, \text{instanceOf}, y_1)$ and $(x_2, \text{subClassOf}, y_2)$ hold, the offsets $v(x_1) - v(y_1)$ and $v(x_2) - v(y_2)$ also differ: instance-of and subclass-of relations are represented differently in the embedding space.
  • For is-a pairs $(x_1, y_1)$ and $(x_2, y_2)$ drawn from two different domains, the vector offsets differ as well.
  • These differences in is-a representations across knowledge sources suggest that a simple model trained on the taxonomy is not effective for is-a extraction from encyclopedias.

  • The method has two steps:

    • In the initial stage, we train piecewise linear projection models on the taxonomy, aiming to learn prior representations of is-a relations in the embedding space.
    • Next, we iteratively extract new is-a relations from user generated categories using the models from the previous round, and adjust the models accordingly.
  • Initial Model Training

    • Train a Skip-gram model over a Chinese text corpus to obtain word embeddings
    • Randomly sample is-a relations from $R^*$
    • Combine Mikolov et al.[5] (vector offsets) and Fu et al.[17] (projection matrices) to map words to their hypernyms. For a pair $(x_i, y_i)$, we assume a projection matrix $M$ and an offset vector $b$ map $x_i$ to $y_i$ in the form $M \cdot v(x_i) + b = v(y_i)$.
    • Partition the training data into $K$ clusters by K-means on the vector offsets, assuming each cluster shares one projection matrix and one offset vector. The per-cluster objective is minimized via SGD (a runnable sketch follows this list).
  • Iterative Learning Process

    • Randomly sample instances from the unlabeled dataset, assign them to clusters, and compute the projection difference for each pair.
    • Select reliable relations with a pattern-based relation selection method.
    • Update the cluster centroids and re-assign cluster membership.
    • Update the model parameters.
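A minimal sketch of the initial-stage training (a toy numpy/scikit-learn version; the embeddings, $K$, learning rate, and epoch count are placeholder choices, not the paper's configuration):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_piecewise_projections(X, Y, K=3, lr=0.05, epochs=500):
    """X[i], Y[i]: embeddings of hyponym x_i and hypernym y_i sampled from R*."""
    d = X.shape[1]
    # 1) Cluster the training pairs by their vector offsets v(y) - v(x).
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(Y - X)

    models = []
    for k in range(K):
        Xk, Yk = X[labels == k], Y[labels == k]
        M, b = np.eye(d), np.zeros(d)
        # 2) Per-cluster objective, minimized here by full-batch gradient descent:
        #      min_{M_k, b_k}  (1/|C_k|) * sum_i || M_k v(x_i) + b_k - v(y_i) ||^2
        for _ in range(epochs):
            err = Xk @ M.T + b - Yk              # (n_k, d) residuals
            M -= lr * 2 * err.T @ Xk / len(Xk)   # gradient w.r.t. M
            b -= lr * 2 * err.mean(axis=0)       # gradient w.r.t. b
        models.append((M, b))
    return models, labels

# Toy usage with random 50-d vectors standing in for real word embeddings:
X, Y = np.random.rand(200, 50), np.random.rand(200, 50)
models, labels = train_piecewise_projections(X, Y)
```

At prediction time, a candidate pair $(e, c)$ would be assigned to its nearest offset cluster $k$ and accepted when $\|M_k v(e) + b_k - v(c)\|$ falls below a threshold; the iterative stage then re-clusters and re-fits on the newly accepted pairs.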

Results

  • Dataset: Baidu Baike corpus
  • The F-measure improves by 2% because we consider both vector offsets and matrix projection in is-a representation learning, which is more precise.

Conclusions

  • In the initial stage, word embedding based piecewise linear projection models are trained on a Chinese taxonomy to map entities to hypernyms.
  • Next, an iterative learning process combined with a pattern based relation selection algorithm is introduced to update models without human supervision.
  • Future work includes automatic pattern detection.

Exploiting Multiple Sources for Open-domain Hypernym Discovery

Introduction

  • Motivations: (1) Lexical patterns perform badly; (2) extraction from encyclopedias has limited coverage
  • Do what: Given an entity name, we try to discover its hypernyms by leveraging knowledge from multiple sources.
  • How to do: First, we extract candidate hypernyms from multiple sources. Then, we apply a statistical ranking model to select the correct hypernyms.
  • Contributions: This paper proposes a simple yet effective distant supervision framework for Chinese open-domain hypernym discovery. (1) A set of novel features is proposed for the ranking model. (2) We also present a heuristic strategy to build a large-scale noisy training data set for the model without human annotation.

Methods

  • Candidate Hypernym Collection from Multiple Sources

    • Collect potential hypernyms from four sources, i.e., search engine results, two encyclopedias, and the morphology of the entity name.
    • Count co-occurrence frequencies and select the top-N nouns as the main candidates.
    • Furthermore, add the user-generated category tags (from Baidu Baike and Hudong Baike) to the candidates.
    • In addition, the head word of an entity name is sometimes also its hypernym (the head word of “Emperor Penguin” indicates that it is a kind of penguin).
    • Combining all of these hypernym candidates, the final coverage rate reaches 93.24%.
  • Hypernym Ranking Model

    • Propose several effective features for the model
      • Hypernym Prior, In Titles, Synonyms, Radicals
    • Present a heuristic strategy to collect training data

    We select positive training instances following two principles:
    (1) Principle 1: among the four sources used for candidate collection, the more sources from which a hypernym candidate is extracted, the more likely it is a correct one.
    (2) Principle 2: the higher the prior probability of the candidate being a hypernym, the more likely it is a correct one.

    • Compare three hypernym ranking models on this data set: Support Vector Machine (SVM) with a linear kernel, SVM with a radial basis function (RBF) kernel, and Logistic Regression (LR). A minimal end-to-end sketch follows this list.
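A minimal sketch of the pipeline (candidate counting plus the three rankers; the features, labels, and cut-offs below are synthetic placeholders, not the paper's values, and the real features are the Hypernym Prior, In Titles, Synonyms and Radicals signals above):

```python
import numpy as np
from collections import Counter
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Candidate collection sketch: top-N nouns co-occurring with the entity
# across the sources (tokens here are hypothetical placeholders).
cooccurring_nouns = ["动物", "企鹅", "鸟类", "动物", "企鹅", "动物"]
candidates = [w for w, _ in Counter(cooccurring_nouns).most_common(3)]

# Synthetic feature vectors for (entity, candidate) pairs.
rng = np.random.default_rng(0)
n_sources = rng.integers(1, 5, 1000)       # how many sources proposed the candidate
prior = rng.random(1000)                   # hypernym prior of the candidate word
X = np.column_stack([n_sources, prior, rng.random((1000, 2))])

# Noisy positive labels in the spirit of Principles 1 and 2
# (the cut-offs 3 and 0.7 are hypothetical):
y = ((n_sources >= 3) | (prior >= 0.7)).astype(int)

models = {
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (RBF)":    SVC(kernel="rbf"),
    "LR":           LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```

Candidates for an entity would then be ranked by the classifier's decision score, and the top-ranked ones returned as hypernyms.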

Results


Conclusions

  • This paper proposes a novel method for finding hypernyms of Chinese open-domain entities from multiple sources.
  • We collect candidate hypernyms with wide coverage from search results, encyclopedia category tags and the head word of the entity.
  • Then, we propose a set of features to build statistical models to rank the candidate hypernyms on the training data collected automatically.
  • For future work, we would like to explore knowledge from more sources to enhance our model, such as semantic thesauri and infoboxes in encyclopedias.

Learning syntactic patterns for automatic hypernym discovery


  1. Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2004. Learning syntactic patterns for automatic hypernym discovery. In NIPS. ↩︎

  2. Shuo Yang, Lei Zou, Zhongyuan Wang, Jun Yan, and Ji-Rong Wen. 2017. Efficiently Answering Technical Questions - A Knowledge Graph Approach. In AAAI ↩︎

  3. Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. 2017. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. TKDE (2017) ↩︎

  4. Yuchen Zhang, Amr Ahmed, Vanja Josifovski, and Alexander J. Smola. 2014. Taxonomy discovery for personalized recommendation. In WSDM ↩︎

  5. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781. ↩︎ ↩︎

  6. Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013b. Linguistic regularities in continuous space word representations. In Human Language Technologies: Conference of the North American Chapter of the Association for Computational Linguistics, pages 746–751. ↩︎

  7. Chinese Hypernym-Hyponym Extraction from User Generated Categories ↩︎

  8. Exploiting Multiple Sources for Open-domain Hypernym Discovery ↩︎

  9. Learning syntactic patterns for automatic hypernym discovery ↩︎

  10. Supervised Distributional Hypernym Discovery via Domain Adaptation ↩︎

  11. Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection ↩︎

  12. SemEval-2016 Task 14: Semantic Taxonomy Enrichment ↩︎

  13. Semantic Taxonomy Enrichment Via Sense-Based Embeddings ↩︎

  14. End-to-End Reinforcement Learning for Automatic Taxonomy Induction ↩︎

  15. HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion ↩︎

  16. Jinyang Li, Chengyu Wang, Xiaofeng He, Rong Zhang, and Ming Gao. 2015. User generated content oriented Chinese taxonomy construction. In Web Technologies and Applications - 17th Asia-Pacific Web Conference, pages 623–634. ↩︎

  17. Ruiji Fu, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. 2014. Learning semantic hierarchies via word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1199–1209. ↩︎
