Reading notes on the CIKM 2014 paper "Latent Aspect Mining via Exploring Sparsity and Intrinsic Information"

Latest recommended article published on 2024-06-05 11:24:15

JinSurvivor

Category: Reading Notes. Tags: Topic Model, Sparse Coding, Aspect Mining

Article link: https://blog.csdn.net/flyingfish93/article/details/82935637

These are informal reading notes.

The goal of this work:

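1. Discover the aspects that are not yet known for the current review, then predict the user's ratings on those aspects;

2. Mine the key terms of each aspect (a topic-modeling process).

Aspects: the unit one level below a domain; one domain contains multiple aspects.

Aspect sparsity problem: a review mentions only some of the aspects, not all of them. The remedy: use the $\ell_{1}$ regularizer from Lasso to enforce sparsity of the aspect proportions.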
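To see why an $\ell_1$ penalty produces sparse proportions, here is a minimal sketch. Minimizing $\frac{1}{2}\|x-v\|_2^2+\lambda\|x\|_1$ has the closed-form soft-thresholding solution, which sets small entries exactly to zero. The function name `soft_threshold` and the numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(v, lam):
    """Closed-form minimizer of 0.5*||x - v||^2 + lam*||x||_1 (entry-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Hypothetical unnormalized aspect proportions for one review (illustrative values).
raw = np.array([0.9, 0.05, 0.4, 0.02, 0.1])

# Entries with magnitude below lam become exactly zero, so only a few aspects stay active.
sparse = soft_threshold(raw, lam=0.15)
active_aspects = np.flatnonzero(sparse)
```

With these toy values only the first and third aspects survive the shrinkage, which mirrors the observation that a review covers only a few aspects.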

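Takeaways:

1. The paper almost certainly builds on the 2011 work STC (Sparse Topical Coding);

2. The authors say that, rather than applying Maximum A Posteriori (MAP) estimation directly as in STC, they propose a new algorithm: block coordinate gradient descent.

3. Two new notions are proposed: user intrinsic aspect interest and item intrinsic aspect quality. My guess is that these are two intermediate-layer distributions (possibly multinomial distributions).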

Model Overview and Description:

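1. Distinguishing some notions:

(1) user intrinsic aspect interest versus the aspect weight proposed by the LRR model:

The former does not depend on the item; the latter does. For example, a food lover reviewing any hotel tends to comment on that hotel's food, regardless of the item (the hotel).

(2) item intrinsic aspect quality:

For a specific item such as a hotel, this is the intrinsic quality assessment of each aspect. For example, the quality of five-star hotels is clearly higher than that of other hotels (could this simply rely on the star rating of the review itself??)

2. Characteristics of SACM (Sparse Aspect Coding Model)

(1) It analyzes the causes of aspect sparsity, which can be addressed with the two notions above;

(2) Aspect ratings are modeled with a Gaussian distribution whose mean is related to the item intrinsic aspect quality and whose variance is related to the user intrinsic aspect interest. For example, a user who is interested in an aspect will comment on it and give an aspect rating across all kinds of reviews; these ratings range from high to low, so the variance is large.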

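3. Model background: Sparse Topical Coding (STC)

(1) document code: $\theta_{d}\in\mathbb{R}_{+}^{K}$, a K-dimensional vector giving the association strength of each document with each topic. Unlike in traditional probabilistic models, it is not normalized: in general $\sum_{k}\theta_{dk}\neq 1$.

(2) word code: $s_{dn}$, a K-dimensional vector whose k-th component $s_{dnk}$ gives the association strength of the n-th word in document d with topic k. Likewise, in general $\sum_{k} s_{dnk}\neq 1$.

Notice: a word may be assigned to multiple topics, which differs from traditional topic models.

(3) A K×N matrix $\beta \in \mathbb{R}_{+}^{K\times N}$: the dictionary.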

(4) Joint probability distribution: $p(\theta_{d},s_{d},\{w_{dn}\}_{n\in I_{d}}\vert \beta)=p(\theta_{d})\prod_{n\in I_{d}} p(s_{dn}\vert \theta_{d})\,p(w_{dn}\vert s_{dn},\beta)$

(5) Inference: MAP (Maximum A Posteriori), $\hat{\Omega}_{MAP}=\underset{\Omega}{\arg\max}\; p(\Omega \vert \{w_{dn}\}_{d\in D,\, n\in I_{d}})$

(6)

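4. Model description: Sparse Aspect Coding Model

(1) Generative Process: based on the user intrinsic aspect interest $t_u$ and the item intrinsic aspect quality $q_h$, select a subset of the existing aspects to describe the current review and decide the proportion of text devoted to each aspect; then choose some opinionated words to compose the review.

(2) Aspect Rating (an independent rating for each aspect)

(3) Compute the user's overall rating from the aspect weights.

(4) document code $\theta_d$: $\theta_d=t_{u_d}\circ q_{h_d}$, computed with the Hadamard (element-wise) product.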

(5) word code $s_{dn}$: sampled from $p(s_{dn}\vert \theta_d)$. Unlike in traditional probabilistic models, $s_{dn}$ is drawn from a super-Gaussian distribution: $p(s_{dn}\vert \theta_d) \propto \exp(-\gamma\left \| s_{dn}-\theta_d \right \|^2_2-\rho \left \| s_{dn} \right \|_1)$

(6) Word counts in each document are sampled from a Poisson distribution: $p(w_{dn}\vert s_{dn},\beta)=\mathrm{Poiss}(w_{dn};s_{dn}^T\beta_{\cdot n})$

(7) Aspect weight $\eta_{dk}$: the weight that the user places on aspect k in review d (for the overall rating): $\eta_{dk} = \frac{\exp(\theta_{dk})}{\sum_j \exp(\theta_{dj})}$

(8) Aspect rating: $Y^A_{dk}\sim N(q_{h_d k},\alpha^2 t^2_{u_d k})$. The variance $\alpha^2 t^2_{u_d k}$ is scaled by the parameter $\alpha$ and reflects the user's aspect interest $t_{u_d k}$ when rating.

(9) Overall rating: $Y_{d}\sim N(\eta_d^T Y^A_d,c^2)$, where $c^2$ is the variance of the Gaussian and is a preset fixed value.
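Steps (4) to (9) above can be traced end to end with a small sketch. It makes several assumptions that are not in the paper: all numeric values are made up, the word code is set to the nonnegative mode of its prior, $\max(\theta_d-\rho/(2\gamma),0)$, instead of being sampled, and the same code is reused for every vocabulary term.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 4, 6  # toy sizes: K aspects, N vocabulary terms

# Hypothetical inputs (illustrative values, not from the paper).
t_u = np.array([0.8, 0.0, 0.5, 0.0])      # user intrinsic aspect interest t_{u_d}
q_h = np.array([4.0, 3.0, 2.0, 5.0])      # item intrinsic aspect quality q_{h_d}
beta = rng.dirichlet(np.ones(N), size=K)  # K x N dictionary, each row on the simplex
alpha, c = 0.5, 0.1                       # rating scale alpha, overall-rating std c
gamma, rho = 1.0, 0.2                     # word-code prior parameters

# (4) document code: Hadamard product of interest and quality.
theta = t_u * q_h

# (5) word code: nonnegative mode of exp(-gamma*||s-theta||^2 - rho*||s||_1),
#     used here in place of sampling (an assumption of this sketch).
s = np.maximum(theta - rho / (2.0 * gamma), 0.0)

# (6) word counts: w_n ~ Poisson(s^T beta[:, n]), one rate per vocabulary term.
w = rng.poisson(s @ beta)

# (7) aspect weights: softmax over theta (shifted by the max for numerical stability).
eta = np.exp(theta - theta.max())
eta /= eta.sum()

# (8) aspect ratings: mean q_h, standard deviation alpha * t_u.
y_aspect = rng.normal(loc=q_h, scale=alpha * t_u)

# (9) overall rating: Gaussian around the weighted aspect ratings.
y_overall = rng.normal(loc=eta @ y_aspect, scale=c)
```

In this toy setup, aspects with zero interest get a zero document code and a zero rating variance, so their sampled aspect ratings equal the item quality exactly.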

(10) MAP optimization technique: Block Coordinate Gradient Descent

Objective function of the MAP estimate: $\min f(\mathbf{Y},\mathbf{S},\mathbf{T},\mathbf{Q},\beta,\alpha)+\lambda \left \| \mathbf{T} \right \|_1+\rho \left \| \mathbf{S} \right \|_1$, subject to (s.t.) $\mathbf{T}\geq 0, \mathbf{Q}\geq 0, \mathbf{S}\geq 0, \alpha \geq 0, \beta_k \in S^{N-1},\forall k$
A common optimization method is BCD (Block Coordinate Descent), which STC uses.
SACM proposes a new Block Coordinate Gradient Descent (BCGD): each iteration first selects a block $\mathbf{B} \in \{\mathbf{Y},\mathbf{S},\mathbf{T},\mathbf{Q},\beta,\alpha\}$, then updates the variables along the descent direction $\mathbf{d}(\mathbf{x};\mathbf{B})$: $\mathbf{x}^{new}=\mathbf{x}+\alpha_{\mathbf{B}}\mathbf{d}(\mathbf{x};\mathbf{B})$
Descent direction $\mathbf{d}(\mathbf{x};\mathbf{B})$: $\mathbf{d}(\mathbf{x};\mathbf{B})=\underset{\mathbf{d}}{\arg\min}\; \nabla f(\mathbf{x})^T\mathbf{d}+\frac{1}{2}\left \| \mathbf{d} \right \|^2_2 +r(\mathbf{x}+\mathbf{d})$ (see the original paper for the detailed derivation)
The aspect dictionary block $\beta$: a linear algorithm
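For a block whose regularizer is $r(\mathbf{z})=\lambda\|\mathbf{z}\|_1$ with the constraint $\mathbf{z}\geq 0$ (as for $\mathbf{T}$ or $\mathbf{S}$), the descent-direction subproblem above has the closed form $\mathbf{d}=\max(\mathbf{x}-\nabla f(\mathbf{x})-\lambda,0)-\mathbf{x}$. The sketch below applies it to a toy smooth term $f(\mathbf{x})=\frac{1}{2}\|\mathbf{x}-\mathbf{v}\|_2^2$. The toy $f$, the fixed step size, and the function names are assumptions made for illustration; the paper's actual per-block updates and step-size rule are not reproduced here.

```python
import numpy as np

def descent_direction(x, grad, lam):
    """Solve argmin_d grad^T d + 0.5*||d||^2 + lam*||x + d||_1 subject to x + d >= 0."""
    z = np.maximum(x - grad - lam, 0.0)  # closed-form minimizer in terms of z = x + d
    return z - x

def bcgd_block_step(x, grad_fn, lam, step=0.5):
    """One update x_new = x + step * d(x) for a single block (fixed step, an assumption)."""
    return x + step * descent_direction(x, grad_fn(x), lam)

# Toy smooth term f(x) = 0.5*||x - v||^2 with gradient x - v (hypothetical data).
v = np.array([1.0, 0.05, 0.3, -0.2])
grad_fn = lambda x: x - v

x = np.zeros_like(v)
for _ in range(50):
    x = bcgd_block_step(x, grad_fn, lam=0.1)
```

Because the step size lies in (0, 1] and each update is a convex combination of two nonnegative points, the iterate stays feasible. On this toy problem it converges to $\max(\mathbf{v}-\lambda,0)$, which is sparse.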