An Unsupervised Neural Attention Model for Aspect Extraction: Reading Notes

Paper source
http://www.comp.nus.edu.sg/~leews/publications/acl17.pdf
Reference reading notes
https://www.jianshu.com/p/241cb238e21f
Equation numbering follows the paper.


Abstract

  1. Aspect extraction is one of the key tasks of aspect-based sentiment analysis.
  2. Existing work: topic models have been fairly successful, but the individual aspects they extract are often not coherent.
  3. This paper proposes a novel neural approach that aims at discovering coherent aspects by exploiting the distribution of word co-occurrences through the use of neural word embeddings.
    1. Wikipedia definition: Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
  4. Topic models assume words are generated independently, whereas word embeddings encourage words that appear in similar contexts to be located close to each other in the embedding space.
  5. An attention mechanism is used during training to de-emphasize irrelevant words.

Introduction

  1. Aspect extraction: extracting the entity aspects on which opinions are expressed.
  2. Aspect extraction has two subtasks: (1) extracting all aspect terms, e.g. beef; (2) clustering aspect terms into categories, e.g. beef and pork both belong to the food category.
  3. Previous work on aspect extraction:
    1. Rule-based: usually does not group the extracted aspect terms into categories. (In computer science, rule-based systems are used as a way to store and manipulate knowledge to interpret information in a useful way. They are often used in artificial intelligence applications and research.)
    2. Supervised: requires data annotation and is limited by the domain it was trained on.
    3. Unsupervised: avoids reliance on labeled data.
  4. In recent years, LDA and its variants have become the dominant unsupervised approach. The mixture of aspects discovered by LDA can describe a corpus fairly well, but the individual aspects it extracts are of poor quality: aspects often contain unrelated words or concepts with low relatedness.
  5. Possible reasons for the poor quality:
    1. Conventional LDA does not directly encode word co-occurrence statistics, which are the primary source of information for preserving topic coherence; it only captures such patterns implicitly.
    2. It models word generation at the document level, assuming that each word is generated independently.
  6. Moreover, LDA estimates a topic distribution for every document; when documents are short, estimating these distributions becomes even harder.
  7. This paper overcomes the weaknesses of LDA with neural word embeddings, which already map words that usually co-occur within the same context to nearby points in the embedding space; the word embeddings are then filtered with an attention mechanism, and the filtered words are used to construct aspect embeddings.
  8. The training of aspect embeddings is analogous to autoencoders: the paper uses dimensionality reduction to extract the common factors among embedded sentences and reconstructs each sentence as a linear combination of aspect embeddings. The attention mechanism de-emphasizes words that do not belong to any aspect, so the model can focus on aspect words.
  9. The model is named ABAE (Attention-based Aspect Extraction).
  10. Advantages of ABAE: it explicitly encodes word co-occurrence statistics into word embeddings; it uses dimensionality reduction to extract the most important aspects in the review corpus; and the attention mechanism removes irrelevant words to improve the coherence of the aspects.
  11. ABAE is intuitive, structurally simple, and scales easily to large datasets.

Related Work

  1. Hu & Liu extract different product features by finding frequent nouns and noun phrases, and also extract opinion terms by finding the synonyms and antonyms of opinion seed words through WordNet. Subsequent rule-based methods rely on frequent itemset mining and dependency information. These methods depend on rules defined in advance and only work well when the aspect terms are restricted to a small set of nouns.
  2. Supervised methods: cast the task as a sequence labeling problem, using HMMs or CRFs with sets of manually engineered features; more recent work learns features automatically for CRF-based aspect extraction. Supervised learning requires large amounts of labeled data, and rule-based models usually do not categorize the extracted aspect terms finely enough.
  3. Unsupervised methods: topic models, whose outputs are word distributions or rankings for each aspect; aspects are obtained without separate extraction and categorization steps. A recent work uses an RBM to jointly extract aspects and sentiment, modeling aspects and sentiments as separate hidden nodes of the RBM; however, this model relies on prior knowledge such as part-of-speech (POS) tagging and sentiment lexicons. A biterm topic model (BTM) models the generation of co-occurring word pairs; this paper compares BTM and ABAE.
  4. Attention models: attention has been used in machine translation, sentence summarization, sentiment classification, and question answering. Rather than using all available information, an attention model focuses on the most pertinent information for the task.
  5. Experimental results demonstrate the effectiveness of the attention mechanism in the unsupervised setting.

Model Description

  1. Ultimate goal: learn a set of aspect embeddings, where each aspect can be interpreted by looking at its nearest words (representative words) in the embedding space.
  2. Each word w is mapped to a feature vector (word embedding) e_w ∈ R^d.
  3. Word embedding matrix E ∈ R^(V×d), where V is the vocabulary size.
  4. Aspect embedding matrix T ∈ R^(K×d), where K is the predefined number of aspects.
  5. K is much smaller than V; aspects and words share the same embedding space.
  6. The aspect embeddings are used to approximate the aspect words filtered by the attention mechanism.
  7. ABAE input: a list of indexes for the words in a review sentence. Two steps (dimensionality reduction + reconstruction, i.e. minimal distortion with maximal information preserved), as sketched right after this list:
    1. Attention mechanism: filter out non-aspect words and construct a sentence embedding z_s from the weighted word embeddings.
    2. Reconstruct the sentence embedding as a linear combination of the aspect embeddings in T, obtaining r_s.
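
A minimal sketch of the parameters introduced above, in numpy. The vocabulary size V here is a made-up placeholder; d = 200 and K = 14 are the values reported later in Section 4.3, and the random initialization is purely illustrative.

```python
import numpy as np

# Illustrative sizes: V is hypothetical; d = 200 and K = 14 follow Section 4.3.
V, d, K = 10000, 200, 14

rng = np.random.default_rng(0)
E = rng.normal(size=(V, d))  # word embedding matrix, one row per vocabulary word
T = rng.normal(size=(K, d))  # aspect embedding matrix, one row per aspect
M = rng.normal(size=(d, d))  # attention transformation matrix (Section 3.1)
W = rng.normal(size=(K, d))  # maps the sentence embedding z_s to aspect weights (Section 3.2)
b = np.zeros(K)              # bias vector for the aspect weights
```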

3.1 Sentence Embedding with Attention Mechanism

  1. z_s: weighted summation of the word embeddings
    Eq. (1): z_s = Σ_{i=1}^{n} a_i · e_{w_i}
    where a_i > 0 is the weight of each word w_i in the sentence; it can be interpreted as the probability that w_i is the right word to focus on in order to capture the main topic of the sentence, and it is computed by the attention mechanism as follows:
    Eq. (2): a_i = exp(d_i) / Σ_{j=1}^{n} exp(d_j)
    Eq. (3): d_i = e_{w_i}^T · M · y_s
    Eq. (4): y_s = (1/n) Σ_{i=1}^{n} e_{w_i}

  2. y_s: the average of the word embeddings in the sentence.

  3. M ∈ R^(d×d): maps between y_s and e_w; M is learned as part of the training process.

  4. The attention mechanism is a two-step process (see the sketch below):

    1. Given a sentence, obtain its representation y_s by averaging its word representations.
    2. The weight of each word is then obtained as follows: transform the word with M to capture its relevance to the K aspects, then take the inner product of the filtered word with y_s to capture its relevance to the sentence.
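
A minimal sketch of the attention step (Eqs. 1-4), assuming word_embs is an (n, d) array holding the embeddings e_{w_i} of the n words of one sentence; the function and variable names are illustrative and not taken from the paper's released code.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sentence_embedding(word_embs, M):
    """Attention-weighted sentence embedding z_s (Eqs. 1-4)."""
    y_s = word_embs.mean(axis=0)  # Eq. (4): average of the word embeddings
    d_i = word_embs @ M @ y_s     # Eq. (3): relevance score of each word w_i
    a = softmax(d_i)              # Eq. (2): attention weights a_i (positive, sum to 1)
    z_s = a @ word_embs           # Eq. (1): weighted sum of the word embeddings
    return z_s, a
```

With the matrices from the earlier sketch, sentence_embedding(E[sentence_ids], M) would return z_s together with the per-word attention weights for a sentence given as a list of word indexes sentence_ids (a hypothetical variable).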

3.2 Sentence Reconstruction with Aspect Embeddings

  1. Having obtained the sentence embedding, we now describe how it is reconstructed.
  2. Reconstruction: a linear combination of the aspect embeddings from T
    Eq. (5): r_s = T^T · p_t

  3. r_s: the reconstructed vector.

  4. T : aspect embedding matrix

  5. p_t: the weight vector over the aspect embeddings; it can be interpreted as the probability that the input sentence belongs to each aspect.

  6. Obtaining p_t: reduce z_s from d dimensions to K dimensions and apply a softmax non-linearity to produce normalized, non-negative weights (see the sketch below):
    Eq. (6): p_t = softmax(W · z_s + b)
    W is a weight matrix parameter and b a bias vector; both are learned during training.
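
Continuing the sketch, the reconstruction step (Eqs. 5-6) under the same illustrative setup:

```python
import numpy as np

def reconstruct(z_s, T, W, b):
    """Reconstruct the sentence embedding from aspect embeddings (Eqs. 5-6)."""
    logits = W @ z_s + b                 # reduce z_s from d to K dimensions
    p_t = np.exp(logits - logits.max())
    p_t = p_t / p_t.sum()                # Eq. (6): softmax, normalized non-negative weights
    r_s = T.T @ p_t                      # Eq. (5): linear combination of the rows of T
    return r_s, p_t
```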

3.3 Training Objective

  1. ABAE is trained to minimize the reconstruction error

  2. A contrastive max-margin objective function is adopted.

  3. For each input sentence, m samples are randomly drawn from the training set as negative samples.

  4. For each negative sample, compute n_i: the average of its word embeddings.

  5. Goal: make the reconstructed embedding r_s similar to the target sentence embedding z_s while keeping it distinct from the negative samples.

  6. The unregularized objective J is a hinge loss that maximizes the inner product between r_s and z_s while minimizing the inner products between r_s and the negative samples (see the sketch after this list):
    Eq. (7): J(θ) = Σ_{s∈D} Σ_{i=1}^{m} max(0, 1 - r_s·z_s + r_s·n_i)

  7. D: training data set

  8. θ: the model parameters E, T, M, W, b.
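
A sketch of the inner part of Eq. (7) for a single sentence, assuming neg_embs is an (m, d) array of the averaged word embeddings n_i of the m negative samples; summing this quantity over all sentences in D gives J(θ).

```python
import numpy as np

def hinge_loss(r_s, z_s, neg_embs):
    """Contrastive max-margin loss for one sentence (inner sum of Eq. 7)."""
    pos = r_s @ z_s                                # similarity to the true sentence embedding
    neg = neg_embs @ r_s                           # similarity to each negative sample n_i
    return np.maximum(0.0, 1.0 - pos + neg).sum()  # hinge: reward pos, penalize neg
```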

3.4 Regularization Term (a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting)

  1. We want to learn the most representative vector representations of the aspects, but T may become redundant during training; to ensure the diversity of the resulting aspects, a regularization term is added:
    Eq. (8): U(θ) = ||T_n · T_n^T - I||

  2. I : identity matrix

  3. T_n: T with each row normalized to unit length.

  4. Any non-diagonal element t_ij (i ≠ j) of T_n · T_n^T corresponds to the dot product between two different aspect embeddings.

  5. U reaches its minimum when the dot product between any two different aspect embeddings is zero.

  6. The regularization term therefore encourages orthogonality among the rows of T and penalizes redundancy. The final objective function L is (see the sketch after this list):
    Eq. (9): L(θ) = J(θ) + λ·U(θ)

  7. λ: a hyperparameter that controls the weight of the regularization term.
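
A sketch of the regularization term U (Eq. 8) and the final objective L (Eq. 9). The Frobenius norm is used here as an assumption, since the notes above do not name the norm, and lam = 1 follows the setting reported in Section 4.3.

```python
import numpy as np

def orthogonality_penalty(T):
    """Regularization term U (Eq. 8): penalize redundancy among aspect embeddings."""
    T_n = T / np.linalg.norm(T, axis=1, keepdims=True)  # normalize each row of T to unit length
    K = T.shape[0]
    return np.linalg.norm(T_n @ T_n.T - np.eye(K))      # deviation of T_n T_n^T from the identity

def final_objective(J, T, lam=1.0):
    """Final objective L (Eq. 9): hinge loss plus weighted orthogonality penalty."""
    return J + lam * orthogonality_penalty(T)
```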


Experiment Setup

4.1 Datasets
4.2 Baseline Methods

  1. LocLDA: each sentence is treated as a separate document.
  2. k-means: uses the k-means centroids of the word embeddings directly.
  3. SAS: competitive in discovering meaningful aspects.
  4. BTM: a biterm topic model designed for short texts; its advantage is that it alleviates the data sparsity problem of short documents.

4.3 Experimental Settings

  1. Preprocessing: remove punctuation, stop words, and words that appear fewer than 10 times.
  2. ABAE initialization: E is initialized with word2vec trained with negative sampling (embedding size 200, window size 10, 5 negative samples); T is initialized with the centroids of a k-means clustering of the word embeddings; the other parameters are initialized randomly.
  3. During ABAE training: E is kept fixed while the other parameters are optimized with Adam (Kingma and Ba, 2014); the number of negative samples is 20; the weight of the orthogonality penalty is 1 (tuned via grid search).
  4. Number of aspects: 14 for the restaurant domain and 14 for the beer domain.
  5. The representative words of an aspect are found by taking the nearest words to its aspect embedding under cosine similarity in the embedding space (see the sketch below).
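
A sketch of two of the steps above: initializing T with k-means centroids (item 2) and retrieving an aspect's representative words by cosine similarity (item 5). The scikit-learn KMeans call and the vocab list (an index-to-word mapping aligned with the rows of E) are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def init_aspect_matrix(E, K=14, seed=0):
    """Initialize T with the centroids of a k-means clustering of the word embeddings."""
    return KMeans(n_clusters=K, random_state=seed).fit(E).cluster_centers_

def representative_words(T, E, vocab, top_n=10):
    """For each aspect, return the top_n nearest words by cosine similarity."""
    T_n = T / np.linalg.norm(T, axis=1, keepdims=True)
    E_n = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = T_n @ E_n.T                                   # (K, V) cosine similarity matrix
    return [[vocab[i] for i in np.argsort(-row)[:top_n]] for row in sims]
```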

Evaluation and Results

  1. Evaluation criteria:

    1. Whether the model can discover meaningful and semantically coherent aspects.
    2. Whether it improves aspect identification performance on real-world review datasets.