Deconfounded Recommendation for Alleviating Bias Amplification
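In practice, the causal graph over the user's historical information D, the user U, the user's representation over different item types M, the item I, and the prediction Y contains two backdoor paths:
U<-D->M->Y
M<-U->Y
What we need to correct is the embedding of U, so the mediator M (the second backdoor path) does not need to be considered. For the first backdoor path, we could block either D->U or D->M. However, M is computed from U and D, so its value is hard to estimate and blocking D->M is inconvenient. The simplest way to cut this backdoor path is therefore to block D->U.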
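The path argument above can be checked mechanically. The following is a minimal sketch that lists the paths from U to Y whose first edge points into U; the edge list is an assumption inferred from the two backdoor paths listed above plus an `I -> Y` edge, and the helper `backdoor_paths` is a hypothetical name. It only enumerates paths and does not test d-separation.

```python
# Edges assumed from the two backdoor paths listed above, plus I -> Y.
EDGES = [("D", "U"), ("D", "M"), ("U", "M"), ("U", "Y"), ("M", "Y"), ("I", "Y")]


def backdoor_paths(edges, treatment, outcome):
    """List simple paths from treatment to outcome whose first edge points INTO treatment."""
    edge_set = set(edges)
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def arrow(a, b):
        # Render the edge between consecutive path nodes with its true direction.
        return "->" if (a, b) in edge_set else "<-"

    found = []

    def walk(path):
        node = path[-1]
        if node == outcome:
            text = path[0]
            for a, b in zip(path, path[1:]):
                text += arrow(a, b) + b
            found.append(text)
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in path:
                walk(path + [nxt])

    # A backdoor path must start by stepping to a parent of the treatment.
    for parent in sorted(p for p, c in edges if c == treatment):
        walk([treatment, parent])
    return found


paths_before = backdoor_paths(EDGES, "U", "Y")
# Cutting D -> U (the effect of do(U=u)) leaves U without parents.
paths_after = backdoor_paths([e for e in EDGES if e != ("D", "U")], "U", "Y")
print(paths_before, paths_after)
```

Under these assumed edges, removing `D -> U` leaves no path into U, which matches the choice to block D->U rather than D->M.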
Symbol | Meaning |
---|---|
$u=[u_1,...,u_K],\ u_k \in \mathbb{R}^H$ | User representation |
$x=[x_{u,1},...,x_{u,K}]$ | User features |
$d_u=[p_u(g_1),...,p_u(g_N)]$ | The user's historical propensity toward each item group |
$m=M(d,u)\in \mathbb{R}^H$ | Group-level user representation based on historical interactions |
$\mathcal{H}_u$ | Set of items the user has interacted with |
$q^i=[q_{g_1}^i,...,q_{g_N}^i]\in \mathbb{R}^N$ | Probability that item $i$ belongs to each group |
$v=[v_1,...,v_N],\ v_n\in \mathbb{R}^H$ | Group representations |
$$
\begin{aligned}
P(Y \mid U=\mathbf{u}, I=\mathbf{i})
&= \frac{\sum_{\mathbf{d} \in \mathcal{D}} \sum_{\mathbf{m} \in \mathcal{M}} P(\mathbf{d})P(\mathbf{u} \mid \mathbf{d})P(\mathbf{m} \mid \mathbf{d},\mathbf{u})P(\mathbf{i})P(Y \mid \mathbf{u},\mathbf{i},\mathbf{m})}{P(\mathbf{u})P(\mathbf{i})} \\
&= \sum_{\mathbf{d} \in \mathcal{D}} \sum_{\mathbf{m} \in \mathcal{M}} P(\mathbf{d} \mid \mathbf{u})P(\mathbf{m} \mid \mathbf{d},\mathbf{u})P(Y \mid \mathbf{u},\mathbf{i},\mathbf{m}) \\
&= \sum_{\mathbf{d} \in \mathcal{D}} P(\mathbf{d} \mid \mathbf{u})P(Y \mid \mathbf{u},\mathbf{i},M(\mathbf{d},\mathbf{u})) \\
&= P(\mathbf{d}_u \mid \mathbf{u})P(Y \mid \mathbf{u},\mathbf{i},M(\mathbf{d}_u,\mathbf{u})) \\[6pt]
P(Y \mid do(U=\mathbf{u}), I=\mathbf{i})
&= \sum_{\mathbf{d} \in \mathcal{D}} P(\mathbf{d} \mid do(U=\mathbf{u}))P(Y \mid do(U=\mathbf{u}),\mathbf{i},M(\mathbf{d},do(U=\mathbf{u}))) \\
&= \sum_{\mathbf{d} \in \mathcal{D}} P(\mathbf{d})P(Y \mid do(U=\mathbf{u}),\mathbf{i},M(\mathbf{d},do(U=\mathbf{u}))) \\
&= \sum_{\mathbf{d} \in \mathcal{D}} P(\mathbf{d})P(Y \mid \mathbf{u},\mathbf{i},M(\mathbf{d},\mathbf{u}))
\end{aligned}
$$
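Because the sample space of D is infinite, the expression obtained from the backdoor adjustment is simplified by considering only the values of d that appear in the interaction data.
$p_u(g_n)=\displaystyle \sum_{i \in \mathcal{I}}p(g_n \mid i)\,p(i \mid u)=\frac{\sum_{i \in \mathcal{H}_u}q_{g_n}^i}{|\mathcal{H}_u|}$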
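The formula for $p_u(g_n)$ is just the average of the item-group vectors $q^i$ over the user's history $\mathcal{H}_u$. A minimal NumPy sketch, where the toy matrix `q`, the item indices, and the helper name `user_group_distribution` are all illustrative assumptions:

```python
import numpy as np

# Hypothetical toy data: 4 items, N = 3 groups; row i is q^i (group membership probabilities).
q = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])


def user_group_distribution(history, q):
    """d_u = mean of q^i over the items i in H_u."""
    history = np.asarray(history, dtype=int)
    if history.size == 0:
        raise ValueError("H_u must contain at least one item")
    return q[history].mean(axis=0)


d_u = user_group_distribution([0, 1, 2], q)   # the user interacted with items 0, 1, 2
print(d_u)
```

Since every row of `q` sums to 1, the resulting `d_u` is itself a probability distribution over the groups.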
$$
\begin{aligned}
P(Y \mid do(U=\mathbf{u}), I=\mathbf{i})
&= \sum_{\mathbf{d} \in \mathcal{D}} P(\mathbf{d})P(Y \mid \mathbf{u},\mathbf{i},M(\mathbf{d},\mathbf{u})) \\
&\approx \sum_{\mathbf{d} \in \mathcal{D}} P(\mathbf{d})f(\mathbf{u},\mathbf{i},M(\mathbf{d},\mathbf{u})) \\
&\approx f\Big(\mathbf{u},\mathbf{i},M\Big(\sum_{\mathbf{d} \in \mathcal{D}}P(\mathbf{d})\mathbf{d},\mathbf{u}\Big)\Big) \\
&= f(\mathbf{u},\mathbf{i},M(\bar{\mathbf{d}},\mathbf{u}))
\end{aligned}
$$

Moving the sum over $\mathbf{d}$ inside $f$ is exact only when $f(\mathbf{u},\mathbf{i},M(\cdot,\mathbf{u}))$ is linear in $\mathbf{d}$; otherwise this step is an approximation.
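To make the last two steps concrete, the sketch below compares the weighted sum $\sum_{\mathbf{d}} P(\mathbf{d}) f(\cdot)$ with the single evaluation at $\bar{\mathbf{d}}$. Everything here is a toy assumption: `M`, `f_linear`, and `f_sigmoid` are hypothetical stand-ins rather than the model's real components, and $P(\mathbf{d})$ is assumed uniform over a handful of observed user distributions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H = 3, 4                                   # groups, embedding size (toy values)
D_obs = rng.dirichlet(np.ones(N), size=5)     # hypothetical observed d_u, one row per user
P_d = np.full(len(D_obs), 1.0 / len(D_obs))   # assumption: each observed d is equally likely
V = rng.normal(size=(N, H))                   # hypothetical group embeddings v
u = rng.normal(size=H)                        # hypothetical user embedding


def M(d, u):
    # Toy stand-in for M(d, u), chosen to be linear in d.
    return (d @ V) * u


def f_linear(m):
    return m.sum()


def f_sigmoid(m):
    return 1.0 / (1.0 + np.exp(-m.sum()))


d_bar = P_d @ D_obs                           # \bar{d} = sum_d P(d) d

for f in (f_linear, f_sigmoid):
    exact = sum(p * f(M(d, u)) for p, d in zip(P_d, D_obs))   # sum_d P(d) f(., M(d, u))
    approx = f(M(d_bar, u))                                   # f(., M(\bar{d}, u))
    print(f.__name__, exact, approx)
```

With the linear scorer the two numbers coincide; with the sigmoid they generally differ slightly, which is the price of replacing one model evaluation per $\mathbf{d}$ with a single evaluation at $\bar{\mathbf{d}}$.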
$M(\bar{\mathbf{d}},\mathbf{u})$ can be computed with a factorization machine (FM):
$$
\begin{aligned}
M(\bar{\mathbf{d}},\mathbf{u})
&= \sum_{a=1}^{N}\sum_{b=1}^{K} p(g_a)\,\mathbf{v}_a \odot x_{u,b}\,\mathbf{u}_b \\
&= \sum_{a=1}^{N+K}\sum_{b=1}^{N+K} w_a\mathbf{c}_a \odot w_b\mathbf{c}_b
\end{aligned}
$$
where

$$
\begin{aligned}
\mathbf{w} &= [\bar{\mathbf{d}},\mathbf{x}_u] \\
\mathbf{c} &= [\mathbf{v},\mathbf{u}]
\end{aligned}
$$
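The first form of $M(\bar{\mathbf{d}},\mathbf{u})$, the double sum over groups and user features, separates into a product of two weighted sums, which is what makes it cheap to evaluate. A minimal NumPy sketch with made-up values for `d_bar`, `x_u`, and randomly initialised embeddings `V` and `U` (all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, H = 3, 2, 4                       # groups, user features, embedding size (toy values)
d_bar = np.array([0.5, 0.3, 0.2])       # \bar{d}: p(g_a) for each group
x_u = np.array([1.0, 0.5])              # user feature values x_{u,b}
V = rng.normal(size=(N, H))             # group embeddings v_a
U = rng.normal(size=(K, H))             # user feature embeddings u_b

# Direct double sum over (a, b): O(N * K * H).
m_naive = np.zeros(H)
for a in range(N):
    for b in range(K):
        m_naive += (d_bar[a] * V[a]) * (x_u[b] * U[b])

# Separated form: (sum_a p(g_a) v_a) ⊙ (sum_b x_{u,b} u_b), O((N + K) * H).
m_fast = (d_bar @ V) * (x_u @ U)

print(np.allclose(m_naive, m_fast))
```

The element-wise product distributes over both sums, so the two computations agree up to floating-point error.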
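Each user's interactions are split into two groups by timestamp, and the KL divergence between the two groups' distributions quantifies how much the user's interests have changed. The predictions of a conventional recommender model and of the model with the confounder removed are then fused: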
$$
\begin{aligned}
\eta_u &= \mathrm{KL}(d_u^1 \,\|\, d_u^2)+\mathrm{KL}(d_u^2 \,\|\, d_u^1) \\
&= \sum_{n=1}^{N} p_u^1(g_n)\log\frac{p_u^1(g_n)}{p_u^2(g_n)}+\sum_{n=1}^{N} p_u^2(g_n)\log\frac{p_u^2(g_n)}{p_u^1(g_n)} \\
Y_{u,i} &= (1-\hat{\eta}_u)\,Y_{u,i}^{RS}+\hat{\eta}_u\,Y_{u,i}^{DECRS}
\end{aligned}
$$
where the weight is min-max scaled (as with a MinMaxScaler), with $\eta_{min}$ and $\eta_{max}$ the smallest and largest values of $\eta_u$, and $\alpha$ a hyperparameter:

$$
\hat{\eta}_u=\left(\frac{\eta_u-\eta_{min}}{\eta_{max}-\eta_{min}}\right)^{\alpha}
$$
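The fusion step can be sketched end to end as follows. The group distributions, the two score vectors, and the helper names `symmetric_kl` and `fuse` are hypothetical; the small `eps` clipping is an added assumption to avoid taking the log of zero and is not part of the formulas above.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-12):
    """KL(p || q) + KL(q || p) for two discrete distributions."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def fuse(y_rs, y_decrs, eta, alpha=1.0):
    """Blend the two scores with the min-max scaled weight raised to the power alpha."""
    eta = np.asarray(eta, dtype=float)
    span = eta.max() - eta.min()
    eta_hat = np.zeros_like(eta) if span == 0 else ((eta - eta.min()) / span) ** alpha
    return (1.0 - eta_hat) * y_rs + eta_hat * y_decrs


# Hypothetical group distributions for three users: earlier vs later part of the history.
d1 = np.array([[0.7, 0.2, 0.1], [0.4, 0.4, 0.2], [0.1, 0.1, 0.8]])
d2 = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.8, 0.1, 0.1]])
eta = np.array([symmetric_kl(a, b) for a, b in zip(d1, d2)])

y_rs = np.array([0.9, 0.6, 0.2])       # hypothetical scores from the base recommender
y_decrs = np.array([0.5, 0.7, 0.6])    # hypothetical scores from the deconfounded model
y = fuse(y_rs, y_decrs, eta, alpha=0.5)
print(eta, y)
```

In this toy data the first user's interests do not change, so the fused score equals the base recommender's score, while the third user has the largest drift and receives the deconfounded score.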