An Unsupervised Neural Attention Model for Aspect Extraction: Reading Notes

Paper source
http://www.comp.nus.edu.sg/~leews/publications/acl17.pdf
Reference reading notes
https://www.jianshu.com/p/241cb238e21f
Equation numbering follows the paper.


Abstract

  1. Aspect extraction is one of the key tasks of aspect-based sentiment analysis.
  2. Existing work: topic models have been fairly successful, but the individual aspects they extract are often not coherent.
  3. This paper proposes a novel neural approach that aims at discovering coherent aspects by exploiting the distribution of word co-occurrences through the use of neural word embeddings.
    1. Wikipedia definition: Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
  4. Topic models assume words are generated independently, whereas word embeddings encourage words that appear in similar contexts to be located close to each other in the embedding space.
  5. An attention mechanism is used during training to de-emphasize irrelevant words.

Introduction

  1. Aspect extraction: extracting the entity aspects on which opinions are expressed.
  2. Aspect extraction has two subtasks: (1) extracting all aspect terms, e.g. beef; (2) clustering aspect terms into categories, e.g. beef and pork both belong to the food category.
  3. Previous work on aspect extraction:
    1. Rule-based: usually does not group the extracted aspect terms into categories. (In computer science, rule-based systems are used as a way to store and manipulate knowledge to interpret information in a useful way. They are often used in artificial intelligence applications and research.)
    2. Supervised: requires data annotation and is limited by the domain it was trained on.
    3. Unsupervised: avoids reliance on labeled data.
  4. In recent years, LDA and its variants have become the dominant unsupervised approach. The mixture of aspects discovered by LDA can describe a corpus fairly well, but the individual aspects it extracts are of poor quality: aspects often contain unrelated words or concepts with low relatedness.
  5. Possible reasons for the poor quality:
    1. Conventional LDA does not directly encode word co-occurrence statistics, which are the primary source of information for preserving topic coherence; it only captures such patterns implicitly.
    2. It models word generation at the document level, assuming that each word is generated independently.
  6. Moreover, LDA estimates a topic distribution for every document; when documents are short, estimating these distributions becomes even harder.
  7. This paper overcomes the weaknesses of LDA with neural word embeddings, which already map words that usually co-occur within the same context to nearby points in the embedding space; the word embeddings are then filtered with an attention mechanism, and the filtered words are used to construct aspect embeddings.
  8. The training of aspect embeddings is analogous to autoencoders: the paper uses dimensionality reduction to extract the common factors among embedded sentences and reconstructs each sentence as a linear combination of aspect embeddings. The attention mechanism de-emphasizes words that do not belong to any aspect, so the model can focus on aspect words.
  9. The model is named ABAE (Attention-based Aspect Extraction).
  10. Advantages of ABAE: it explicitly encodes word co-occurrence statistics into word embeddings; it uses dimensionality reduction to extract the most important aspects in the review corpus; and the attention mechanism removes irrelevant words to improve the coherence of the aspects.
  11. ABAE is intuitive, structurally simple, and scales easily to large datasets.

Related Work

  1. Hu & Liu extract different product features by finding frequent nouns and noun phrases, and also extract opinion terms by finding the synonyms and antonyms of opinion seed words through WordNet. Subsequent rule-based methods rely on frequent itemset mining and dependency information. These methods depend on rules defined in advance and only work well when the aspect terms are restricted to a small set of nouns.
  2. Supervised methods: cast the task as a sequence labeling problem, using HMMs or CRFs with sets of manually engineered features; more recent work learns features automatically for CRF-based aspect extraction. Supervised learning requires large amounts of labeled data, and rule-based models usually do not categorize the extracted aspect terms finely enough.
  3. Unsupervised methods: topic models, whose outputs are word distributions or rankings for each aspect; aspects are obtained without separate extraction and categorization steps. A recent work uses an RBM to jointly extract aspects and sentiment, modeling aspects and sentiments as separate hidden nodes of the RBM; however, this model relies on prior knowledge such as part-of-speech (POS) tagging and sentiment lexicons. A biterm topic model (BTM) models the generation of co-occurring word pairs; this paper compares BTM and ABAE.
  4. Attention models: attention has been used in machine translation, sentence summarization, sentiment classification, and question answering. Rather than using all available information, an attention model focuses on the most pertinent information for the task.
  5. Experimental results demonstrate the effectiveness of the attention mechanism in the unsupervised setting.

Model Description

  1. Ultimate goal: learn a set of aspect embeddings, where each aspect can be interpreted by looking at its nearest words (representative words) in the embedding space.
  2. Each word w is mapped to a feature vector (word embedding) e_w ∈ R^d.
  3. Word embedding matrix E ∈ R^(V×d), where V is the vocabulary size.
  4. Aspect embedding matrix T ∈ R^(K×d), where K is the predefined number of aspects.
  5. K is much smaller than V; aspects and words share the same embedding space.
  6. The aspect embeddings are used to approximate the aspect words filtered by the attention mechanism.
  7. ABAE input: a list of indexes for the words in a review sentence. Two steps (dimensionality reduction + reconstruction, i.e. minimal distortion with maximal information preserved), as sketched right after this list:
    1. Attention mechanism: filter out non-aspect words and construct a sentence embedding z_s from the weighted word embeddings.
    2. Reconstruct the sentence embedding as a linear combination of the aspect embeddings in T, obtaining r_s.
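
A minimal sketch of the parameters introduced above, in numpy. The vocabulary size V here is a made-up placeholder; d = 200 and K = 14 are the values reported later in Section 4.3, and the random initialization is purely illustrative.

```python
import numpy as np

# Illustrative sizes: V is hypothetical; d = 200 and K = 14 follow Section 4.3.
V, d, K = 10000, 200, 14

rng = np.random.default_rng(0)
E = rng.normal(size=(V, d))  # word embedding matrix, one row per vocabulary word
T = rng.normal(size=(K, d))  # aspect embedding matrix, one row per aspect
M = rng.normal(size=(d, d))  # attention transformation matrix (Section 3.1)
W = rng.normal(size=(K, d))  # maps the sentence embedding z_s to aspect weights (Section 3.2)
b = np.zeros(K)              # bias vector for the aspect weights
```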

3.1 Sentence Embedding with Attention Mechanism

  1. z_s: weighted summation of the word embeddings
    Eq. (1): z_s = Σ_{i=1}^{n} a_i · e_{w_i}
    where a_i > 0 is the weight of each word w_i in the sentence; it can be interpreted as the probability that w_i is the right word to focus on in order to capture the main topic of the sentence, and it is computed by the attention mechanism as follows:
    Eq. (2): a_i = exp(d_i) / Σ_{j=1}^{n} exp(d_j)
    Eq. (3): d_i = e_{w_i}^T · M · y_s
    Eq. (4): y_s = (1/n) Σ_{i=1}^{n} e_{w_i}

  2. y_s: the average of the word embeddings in the sentence.

  3. M ∈ R^(d×d): maps between y_s and e_w; M is learned as part of the training process.

  4. The attention mechanism is a two-step process (see the sketch below):

    1. Given a sentence, obtain its representation y_s by averaging its word representations.
    2. The weight of each word is then obtained as follows: transform the word with M to capture its relevance to the K aspects, then take the inner product of the filtered word with y_s to capture its relevance to the sentence.
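
A minimal sketch of the attention step (Eqs. 1-4), assuming word_embs is an (n, d) array holding the embeddings e_{w_i} of the n words of one sentence; the function and variable names are illustrative and not taken from the paper's released code.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sentence_embedding(word_embs, M):
    """Attention-weighted sentence embedding z_s (Eqs. 1-4)."""
    y_s = word_embs.mean(axis=0)  # Eq. (4): average of the word embeddings
    d_i = word_embs @ M @ y_s     # Eq. (3): relevance score of each word w_i
    a = softmax(d_i)              # Eq. (2): attention weights a_i (positive, sum to 1)
    z_s = a @ word_embs           # Eq. (1): weighted sum of the word embeddings
    return z_s, a
```

With the matrices from the earlier sketch, sentence_embedding(E[sentence_ids], M) would return z_s together with the per-word attention weights for a sentence given as a list of word indexes sentence_ids (a hypothetical variable).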

3.2 Sentence Reconstruction with Aspect Embeddings

  1. Having obtained the sentence embedding, we now describe how it is reconstructed.
  2. Reconstruction: a linear combination of the aspect embeddings from T
    Eq. (5): r_s = T^T · p_t

  3. r_s: the reconstructed vector.

  4. T : aspect embedding matrix

  5. p_t: the weight vector over the aspect embeddings; it can be interpreted as the probability that the input sentence belongs to each aspect.

  6. Obtaining p_t: reduce z_s from d dimensions to K dimensions and apply a softmax non-linearity to produce normalized, non-negative weights (see the sketch below):
    Eq. (6): p_t = softmax(W · z_s + b)
    W is a weight matrix parameter and b a bias vector; both are learned during training.
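
Continuing the sketch, the reconstruction step (Eqs. 5-6) under the same illustrative setup:

```python
import numpy as np

def reconstruct(z_s, T, W, b):
    """Reconstruct the sentence embedding from aspect embeddings (Eqs. 5-6)."""
    logits = W @ z_s + b                 # reduce z_s from d to K dimensions
    p_t = np.exp(logits - logits.max())
    p_t = p_t / p_t.sum()                # Eq. (6): softmax, normalized non-negative weights
    r_s = T.T @ p_t                      # Eq. (5): linear combination of the rows of T
    return r_s, p_t
```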

3.3 Training Objective

  1. ABAE is trained to minimize the reconstruction error

  2. A contrastive max-margin objective function is adopted.

  3. For each input sentence, m samples are randomly drawn from the training set as negative samples.

  4. For each negative sample, compute n_i: the average of its word embeddings.

  5. Goal: make the reconstructed embedding r_s similar to the target sentence embedding z_s while keeping it distinct from the negative samples.

  6. The unregularized objective J is a hinge loss that maximizes the inner product between r_s and z_s while minimizing the inner products between r_s and the negative samples (see the sketch after this list):
    Eq. (7): J(θ) = Σ_{s∈D} Σ_{i=1}^{m} max(0, 1 - r_s·z_s + r_s·n_i)

  7. D: training data set

  8. θ: the model parameters E, T, M, W, b.
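
A sketch of the inner part of Eq. (7) for a single sentence, assuming neg_embs is an (m, d) array of the averaged word embeddings n_i of the m negative samples; summing this quantity over all sentences in D gives J(θ).

```python
import numpy as np

def hinge_loss(r_s, z_s, neg_embs):
    """Contrastive max-margin loss for one sentence (inner sum of Eq. 7)."""
    pos = r_s @ z_s                                # similarity to the true sentence embedding
    neg = neg_embs @ r_s                           # similarity to each negative sample n_i
    return np.maximum(0.0, 1.0 - pos + neg).sum()  # hinge: reward pos, penalize neg
```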

3.4 Regularization Term (a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting)

  1. We want to learn the most representative vector representations of the aspects, but T may become redundant during training; to ensure the diversity of the resulting aspects, a regularization term is added:
    Eq. (8): U(θ) = ||T_n · T_n^T - I||

  2. I : identity matrix

  3. T_n: T with each row normalized to unit length.

  4. Any non-diagonal element t_ij (i ≠ j) of T_n · T_n^T corresponds to the dot product between two different aspect embeddings.

  5. U reaches its minimum when the dot product between any two different aspect embeddings is zero.

  6. The regularization term therefore encourages orthogonality among the rows of T and penalizes redundancy. The final objective function L is (see the sketch after this list):
    Eq. (9): L(θ) = J(θ) + λ·U(θ)

  7. λ: a hyperparameter that controls the weight of the regularization term.
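
A sketch of the regularization term U (Eq. 8) and the final objective L (Eq. 9). The Frobenius norm is used here as an assumption, since the notes above do not name the norm, and lam = 1 follows the setting reported in Section 4.3.

```python
import numpy as np

def orthogonality_penalty(T):
    """Regularization term U (Eq. 8): penalize redundancy among aspect embeddings."""
    T_n = T / np.linalg.norm(T, axis=1, keepdims=True)  # normalize each row of T to unit length
    K = T.shape[0]
    return np.linalg.norm(T_n @ T_n.T - np.eye(K))      # deviation of T_n T_n^T from the identity

def final_objective(J, T, lam=1.0):
    """Final objective L (Eq. 9): hinge loss plus weighted orthogonality penalty."""
    return J + lam * orthogonality_penalty(T)
```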


Experiment Setup

4.1 Datasets
4.2 Baseline Methods

  1. LocLDA: each sentence is treated as a separate document.
  2. k-means: uses the k-means centroids of the word embeddings directly.
  3. SAS: competitive in discovering meaningful aspects.
  4. BTM: a biterm topic model designed for short texts; its advantage is that it alleviates the data sparsity problem of short documents.

4.3 Experimental Settings

  1. Preprocessing: remove punctuation, stop words, and words that appear fewer than 10 times.
  2. ABAE initialization: E is initialized with word2vec trained with negative sampling (embedding size 200, window size 10, 5 negative samples); T is initialized with the centroids of a k-means clustering of the word embeddings; the other parameters are initialized randomly.
  3. During ABAE training: E is kept fixed while the other parameters are optimized with Adam (Kingma and Ba, 2014); the number of negative samples is 20; the weight of the orthogonality penalty is 1 (tuned via grid search).
  4. Number of aspects: 14 for the restaurant domain and 14 for the beer domain.
  5. The representative words of an aspect are found by taking the nearest words to its aspect embedding under cosine similarity in the embedding space (see the sketch below).
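
A sketch of two of the steps above: initializing T with k-means centroids (item 2) and retrieving an aspect's representative words by cosine similarity (item 5). The scikit-learn KMeans call and the vocab list (an index-to-word mapping aligned with the rows of E) are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def init_aspect_matrix(E, K=14, seed=0):
    """Initialize T with the centroids of a k-means clustering of the word embeddings."""
    return KMeans(n_clusters=K, random_state=seed).fit(E).cluster_centers_

def representative_words(T, E, vocab, top_n=10):
    """For each aspect, return the top_n nearest words by cosine similarity."""
    T_n = T / np.linalg.norm(T, axis=1, keepdims=True)
    E_n = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = T_n @ E_n.T                                   # (K, V) cosine similarity matrix
    return [[vocab[i] for i in np.argsort(-row)[:top_n]] for row in sims]
```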

Evaluation and Results

  1. Evaluation criteria:

    1. Whether the model can discover meaningful and semantically coherent aspects.
    2. Whether it improves aspect identification performance on real-world review datasets.