Use document modeling to enhance PMF_1: CTR Model.

Article link: https://blog.csdn.net/JAVA_N4A/article/details/53166470

Drawbacks of PMF

Matrix factorization uses only rating information from other users, so it cannot generalize to completely unrated items. (It cannot recommend new products that have not yet received a rating from any user.)
Prediction accuracy often drops significantly when the ratings are very sparse.
The learned latent space is not easy to interpret. (The CTR model makes it interpretable.)

Use LDA to Enhance PMF

LDA

Documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. LDA assumes the following generative process for a corpus $D$ consisting of $M$ documents, each of length $N_i$:
1. Choose $\theta_i \sim \mathrm{Dir}(\alpha)$, where $i \in \{1,\dots,M\}$ and $\mathrm{Dir}(\alpha)$ is the Dirichlet distribution with parameter $\alpha$.
2. Choose $\varphi_k \sim \mathrm{Dir}(\beta)$, where $k \in \{1,\dots,K\}$.
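3. For each of the word positions $i,j$, where $j \in \{1,\dots,N_i\}$ and $i \in \{1,\dots,M\}$:
(a) Choose a topic $z_{i,j} \sim \mathrm{Multinomial}(\theta_i)$.
(b) Choose a word $w_{i,j} \sim \mathrm{Multinomial}(\varphi_{z_{i,j}})$.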
(Note that the Multinomial distribution here refers to the Multinomial with only one trial. It is formally equivalent to the categorical distribution.)
The plate notation is as follows:
[Figure: plate notation of LDA]
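
The generative process above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names (`sample_dirichlet`, `sample_categorical`, `generate_corpus`), the symmetric hyperparameter values, and the toy corpus sizes are assumptions, not part of the original article. Each per-position draw uses a categorical distribution, i.e. a multinomial with one trial.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dir(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index from a categorical distribution (multinomial with one trial)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(doc_lengths, num_topics, vocab_size, alpha=0.5, beta=0.5, seed=0):
    """Sample topic proportions, topic-word distributions, and documents (hypothetical helper)."""
    rng = random.Random(seed)
    # One word distribution phi_k per topic, drawn from Dir(beta).
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    theta, docs = [], []
    for n_i in doc_lengths:
        # Topic proportions theta_i for document i, drawn from Dir(alpha).
        theta_i = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(n_i):
            # For each word position: pick a topic, then a word from that topic.
            z = sample_categorical(theta_i, rng)
            words.append(sample_categorical(phi[z], rng))
        theta.append(theta_i)
        docs.append(words)
    return theta, phi, docs

# Toy corpus: 3 documents, 3 topics, vocabulary of 20 word ids.
theta, phi, docs = generate_corpus(doc_lengths=[8, 5, 12], num_topics=3, vocab_size=20)
for i, words in enumerate(docs):
    print(i, [round(t, 2) for t in theta[i]], words)
```

Words are represented as integer ids here; a real pipeline would map them to a vocabulary.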

Categorical Distribution

The $K$-dimensional categorical distribution is the most general distribution over a $K$-way event; any other discrete distribution over a size-$K$ sample space is a special case. The parameters specifying the probabilities of each possible outcome are constrained only by the fact that each must be in the range 0 to 1, and all must sum to 1.

pmf
$p(x=i)=p_{i}$
$p(x)=p_{1}^{[x=1]}\cdots p_{k}^{[x=k]}$
$p(x)=[x=1]\cdot p_{1}+\dots+[x=k]\cdot p_{k}$

Here $[x=i]$ is the Iverson bracket, which equals 1 if $x=i$ and 0 otherwise.
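
As a quick check, the three ways of writing the categorical pmf can be compared numerically. The snippet below is a small sketch; the helper names and the example probabilities are made up for illustration.

```python
def iverson(condition):
    """Iverson bracket: 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0

def pmf_lookup(x, p):
    """p(x = i) = p_i, with outcomes numbered from 1."""
    return p[x - 1]

def pmf_product(x, p):
    """p(x) = p_1^[x=1] * ... * p_k^[x=k]."""
    result = 1.0
    for i, p_i in enumerate(p, start=1):
        result *= p_i ** iverson(x == i)
    return result

def pmf_sum(x, p):
    """p(x) = [x=1] * p_1 + ... + [x=k] * p_k."""
    return sum(iverson(x == i) * p_i for i, p_i in enumerate(p, start=1))

p = [0.2, 0.5, 0.3]  # example probabilities, must sum to 1
for x in (1, 2, 3):
    print(x, pmf_lookup(x, p), pmf_product(x, p), pmf_sum(x, p))
```

All three columns agree for every outcome, since exactly one Iverson bracket is nonzero for a given $x$.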

Multinomial Distribution

pmf
$\frac{n!}{x_1!\cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}$

When n is 1 and k is 2, the multinomial distribution is the Bernoulli distribution.
When k is 2 and the number of trials n is greater than 1, it is the binomial distribution.
When n is 1, it is the categorical distribution.
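
These special cases can be verified directly from the pmf. The following is a minimal sketch; `multinomial_pmf` is a hypothetical helper written for this check, and the probabilities are arbitrary example values.

```python
from math import comb, factorial, isclose, prod

def multinomial_pmf(counts, probs):
    """n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k, with n = sum of counts."""
    coefficient = factorial(sum(counts))
    for x in counts:
        coefficient //= factorial(x)  # stays an integer at every step
    return coefficient * prod(p ** x for p, x in zip(probs, counts))

p = 0.3

# n = 1, k = 2: Bernoulli, P(success) = p.
bernoulli = multinomial_pmf([1, 0], [p, 1 - p])

# k = 2, n = 5: binomial, P(2 successes in 5 trials).
binomial = multinomial_pmf([2, 3], [p, 1 - p])
binomial_direct = comb(5, 2) * p ** 2 * (1 - p) ** 3

# n = 1, k = 3: categorical, P(outcome 2) = p_2.
probs = [0.2, 0.5, 0.3]
categorical = multinomial_pmf([0, 1, 0], probs)

print(isclose(bernoulli, p), isclose(binomial, binomial_direct), isclose(categorical, probs[1]))
```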

Combine LDA into PMF: CTR

For each item j,

(a) Draw topic proportions $\theta_j \sim \mathrm{Dirichlet}(\alpha)$.
(b) Draw item latent offset $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$ and set the item latent vector as $v_j = \epsilon_j + \theta_j$.
(c) For each word $w_{jn}$,
i. Draw topic assignment $z_{jn} \sim \mathrm{Mult}(\theta_j)$.
ii. Draw word $w_{jn} \sim \mathrm{Mult}(\beta_{z_{jn}})$.
For each user-item pair $(i, j)$, draw the rating

$r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}^{-1})$
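
To make the CTR generative process concrete, here is a rough simulation sketch in plain Python. Several pieces are assumptions that this section does not state: the user vectors $u_i$ are drawn from a zero-mean Gaussian with a precision `lambda_u`, the precision $c_{ij}$ is replaced by a single constant `c`, and the topic-word distributions `beta` are passed in as fixed toy values. All function and parameter names are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dir(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_ctr(num_users, doc_lengths, beta, alpha=1.0,
                 lambda_u=1.0, lambda_v=10.0, c=4.0, seed=0):
    """Simulate items, documents, and ratings from the CTR generative process (sketch)."""
    rng = random.Random(seed)
    K = len(beta)  # number of topics = dimension of the latent space
    # Assumption: user vectors u_i ~ N(0, lambda_u^{-1} I_K).
    users = [[rng.gauss(0.0, lambda_u ** -0.5) for _ in range(K)]
             for _ in range(num_users)]
    thetas, items, docs = [], [], []
    for n_j in doc_lengths:
        # (a) topic proportions theta_j ~ Dirichlet(alpha)
        theta_j = sample_dirichlet([alpha] * K, rng)
        # (b) offset eps_j ~ N(0, lambda_v^{-1} I_K), then v_j = eps_j + theta_j
        eps_j = [rng.gauss(0.0, lambda_v ** -0.5) for _ in range(K)]
        v_j = [e + t for e, t in zip(eps_j, theta_j)]
        # (c) for each word: topic z_jn ~ Mult(theta_j), word w_jn ~ Mult(beta_z)
        words = []
        for _ in range(n_j):
            z = rng.choices(range(K), weights=theta_j, k=1)[0]
            words.append(rng.choices(range(len(beta[z])), weights=beta[z], k=1)[0])
        thetas.append(theta_j)
        items.append(v_j)
        docs.append(words)
    # Ratings r_ij ~ N(u_i^T v_j, c^{-1}); a constant c stands in for c_ij here.
    ratings = [[rng.gauss(sum(u * v for u, v in zip(u_i, v_j)), c ** -0.5)
                for v_j in items] for u_i in users]
    return users, thetas, items, docs, ratings

# Toy topic-word distributions: 3 topics over a 4-word vocabulary.
beta = [[0.7, 0.1, 0.1, 0.1],
        [0.1, 0.7, 0.1, 0.1],
        [0.1, 0.1, 0.4, 0.4]]
users, thetas, items, docs, ratings = generate_ctr(num_users=2, doc_lengths=[6, 4], beta=beta)
```

With a large `lambda_v` the offsets are small, so each item vector stays close to its topic proportions; a smaller `lambda_v` lets the item vectors drift further from them.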

The key property of CTR lies in how the item latent vector $v_j$ is generated. Note that $v_j = \epsilon_j + \theta_j$, where $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$, is equivalent to $v_j \sim \mathcal{N}(\theta_j, \lambda_v^{-1} I_K)$: we assume the item latent vector $v_j$ is close to the topic proportions $\theta_j$, but it can diverge from them if it has to. Note that the expectation of $r_{ij}$ is a linear function of $\theta_j$,