【Literature Reading】Term Set Expansion (also called seed term expansion)

Hello! Today I am going to read some literature on NLP, data governance, and platform digital enablement. To keep a record, I’ll post my reading notes on my CSDN blog. You are welcome to get in touch and discuss them with me!


Term Set Expansion based on Multi-Context Term Embeddings: an End-to-end Workflow. Mamou et al. 2018


0 Overview

Overall, this paper is short and pithy. It has 4 sections and mainly proposes an algorithm that helps expand a set of terms with similar functional meaning.

1 Introduction

To quickly understand what problem this paper addresses, we should first learn what Term Set Expansion (TSE) is. The figure below shows two forms of similarity among terms.

  1. TSE based on topical similarity
    Given a word, find other words that share a topic with it. For example, we input the word “python” and want to find words expressing the same theme in our corpus, which is represented by word vectors trained with linear bag-of-words contexts. As a result, we find “bytecode”, “high-level programming language”… These words or phrases are descriptions of “python”.
  2. TSE based on functional similarity
    Again, we input the word “python”. This time, term set expansion generates terms like Java, C++, C#… Note the difference: Java plays a similar functional role to python, but it is not a description of or supplement to python.
    [Figure]

Now that we know what term set expansion means, we can consider a more complicated situation, shown in the figure below. We no longer input a single word; instead we input a set of terms and look for its expanded set. We add two new words to form two independent seed sets. In the first set, “yellow” and “orange” both describe colors, so the words in the expanded set must also be color terms. The same holds for the second seed set.
[Figure]
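
To make the seed-set idea concrete, here is a minimal sketch of how a seed set can steer the expansion of an ambiguous word such as “orange”. Everything in it is an assumption for illustration: the 2-D vectors are invented toy values (not trained embeddings), the fruit seed set is hypothetical, and `centroid`, `cosine` and `expand` are helper names I made up; the paper’s actual pipeline is more involved.

```python
import math

# Toy 2-D "embeddings", invented for illustration only:
# axis 0 is roughly "colour-ness", axis 1 is roughly "fruit-ness".
TOY_VECTORS = {
    "yellow": (0.9, 0.1),
    "red": (1.0, 0.0),
    "blue": (0.95, 0.05),
    "orange": (0.6, 0.6),   # ambiguous: both a colour and a fruit
    "apple": (0.1, 0.9),
    "banana": (0.2, 1.0),
    "pear": (0.0, 1.0),
}


def centroid(terms, vectors):
    """Component-wise mean of the vectors of the given terms."""
    dims = len(vectors[terms[0]])
    return tuple(sum(vectors[t][d] for t in terms) / len(terms) for d in range(dims))


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand(seed, vectors, top_k=2):
    """Return the top_k non-seed terms closest to the seed-set centroid."""
    center = centroid(seed, vectors)
    candidates = [t for t in vectors if t not in seed]
    ranked = sorted(candidates, key=lambda t: cosine(vectors[t], center), reverse=True)
    return ranked[:top_k]


colour_expansion = expand(["yellow", "orange"], TOY_VECTORS)
fruit_expansion = expand(["apple", "orange"], TOY_VECTORS)
print(colour_expansion)
print(fruit_expansion)
```

With these toy values, the colour seed set pulls in the other colour terms and the fruit seed set pulls in the other fruit terms, even though “orange” appears in both seeds.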

2 Term Set Expansion Algorithm Overview

In this section, the authors describe the term set expansion algorithm in detail. I understand it as a loop; see the flow chart below.

  • Manually collect the initial seed set used for the first iteration.
  • Train your word embedding model.
  • Set a threshold and find the words that are highly similar to the centroid of the seed set. At this step, the candidates are likely to include words that do not belong in the expanded set.
  • Train a binary classification model to filter out the incorrect words we don’t need.
  • Iterate!
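
The steps above can be sketched as a small loop. This is only my own rough sketch of the loop as I described it, not the authors’ implementation: the vectors are invented toy values standing in for a trained embedding model, the function name `expand_term_set` and the threshold of 0.8 are arbitrary choices, and the `accept` callback is a hypothetical stand-in for the trained binary classifier.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vecs):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def expand_term_set(seed, vectors, accept, threshold=0.8, max_iters=5):
    """Grow the seed set: threshold on centroid similarity, filter, repeat."""
    expanded = set(seed)
    rejected = set()
    for _ in range(max_iters):
        center = centroid([vectors[t] for t in expanded])
        # Step 3: candidates are unseen terms close enough to the centroid.
        candidates = [
            t for t in vectors
            if t not in expanded and t not in rejected
            and cosine(vectors[t], center) >= threshold
        ]
        # Step 4: the filter (a classifier in practice) screens the candidates.
        accepted = {t for t in candidates if accept(t)}
        rejected.update(set(candidates) - accepted)
        if not accepted:
            break  # nothing new was added, so another iteration cannot help
        expanded |= accepted
    return expanded


# Toy vectors, invented for illustration (not real embeddings).
toy_vectors = {
    "python": (1.0, 0.1),
    "java": (0.95, 0.15),
    "c++": (0.9, 0.2),
    "bytecode": (0.8, 0.6),  # topically related, passes the threshold
    "snake": (0.0, 1.0),
}

# Hypothetical stand-in for the binary classifier: a fixed whitelist.
is_language = lambda term: term in {"java", "c++"}

result = expand_term_set({"python"}, toy_vectors, accept=is_language)
print(sorted(result))
```

In this toy run, “bytecode” passes the similarity threshold but is screened out by the filter, which is exactly the kind of word the classification step is there to remove.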

[Figure: flow chart of the term set expansion loop]
See the following slide for more details.
[Figure]


I’ll share an example of my own practice later.
