Basic Form
The FM model adds a second-order (pairwise interaction) term on top of a linear model:

$$
y = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} x_i x_j
$$
$w_0$ is the constant (bias) term: 1 parameter to train;
$\sum_{i=1}^{n} w_i x_i$ is the familiar first-order linear part: $n$ parameters $w_i$ to train;
$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} x_i x_j$ combines every pair of the $n$ features, so there are $\frac{n(n-1)}{2}$ parameters $w_{ij}$ to train.
In total the model has $1 + n + \frac{n(n-1)}{2}$ parameters.
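As a quick numerical illustration of this basic form, here is a minimal sketch in Python/NumPy. All sizes and parameter values are made up for the example; only the upper triangle of the pairwise weight matrix is used, matching the $i < j$ double sum.

```python
import numpy as np

# Hypothetical setup for illustration: n = 4 features, random parameters.
n = 4
rng = np.random.default_rng(0)
w0 = rng.normal()              # 1 bias parameter
w = rng.normal(size=n)         # n linear weights
W = rng.normal(size=(n, n))    # pairwise weights; only i < j entries are used

def fm_naive(x):
    """y = w0 + sum_i w_i x_i + sum_{i<j} w_ij x_i x_j."""
    linear = w0 + w @ x
    pairwise = sum(W[i, j] * x[i] * x[j]
                   for i in range(n - 1) for j in range(i + 1, n))
    return linear + pairwise

x = np.array([1.0, 0.0, 2.0, 0.0])   # a sparse input vector
print(fm_naive(x))

# Parameter count: 1 + n + n*(n-1)/2
print(1 + n + n * (n - 1) // 2)      # 11 for n = 4
```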
Optimization and Improvement
The problem: the parameters $w_{ij}$ are hard to learn. The gradient $\frac{\partial L}{\partial w_{ij}}$ is proportional to $x_i x_j$, and because the features are sparse, $x_i$ and $x_j$ are rarely nonzero at the same time, so $w_{ij}$ almost never gets updated.
The fix: express $w_{ij}$ as the inner product of two latent vectors, that is:

$$
w_{ij} = \langle v_i, v_j \rangle
$$
Expanding and simplifying the quadratic term:
$$
\begin{aligned}
\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} x_i x_j
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j \\
&= \frac{1}{2} \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \langle v_i, v_j \rangle x_i x_j - \sum_{i=1}^{n} \langle v_i, v_i \rangle x_i x_i \right) \\
&= \frac{1}{2} \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{f=1}^{k} v_{i,f} v_{j,f} x_i x_j - \sum_{i=1}^{n} \sum_{f=1}^{k} v_{i,f} v_{i,f} x_i x_i \right) \\
&= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right) \cdot \left( \sum_{j=1}^{n} v_{j,f} x_j \right) - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right] \\
&= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]
\end{aligned}
$$
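This identity can be checked numerically. The sketch below (with arbitrary sizes $n=5$, $k=3$ and random values) compares the $O(kn^2)$ pairwise sum on the left against the $O(kn)$ simplified form on the right:

```python
import numpy as np

# Hypothetical sizes for the check: n = 5 features, k = 3 latent factors.
rng = np.random.default_rng(1)
n, k = 5, 3
V = rng.normal(size=(n, k))   # rows are the latent vectors v_i
x = rng.normal(size=n)

# Left side: pairwise sum of <v_i, v_j> x_i x_j over i < j  (O(k n^2)).
lhs = sum(V[i] @ V[j] * x[i] * x[j]
          for i in range(n - 1) for j in range(i + 1, n))

# Right side: 1/2 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]  (O(k n)).
s = V.T @ x                                           # shape (k,)
rhs = 0.5 * (np.sum(s ** 2) - np.sum((V ** 2).T @ (x ** 2)))

assert np.isclose(lhs, rhs)   # the two forms agree
```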
So the final hypothesis function of the model is:
$$
y = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]
$$
This leaves $1 + n + kn$ parameters to train.
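Putting the pieces together, here is a minimal sketch of the hypothesis function. The sizes and parameter values are hypothetical, chosen only for illustration:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM hypothesis: w0 + w.x + 1/2 * sum_f [(V.T x)_f^2 - ((V^2).T x^2)_f]."""
    s = V.T @ x                                          # sum_i v_{i,f} x_i, per factor f
    interaction = 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))
    return w0 + w @ x + interaction

# Hypothetical sizes: n = 6 features, k = 4 latent factors.
n, k = 6, 4
rng = np.random.default_rng(2)
w0, w, V = rng.normal(), rng.normal(size=n), rng.normal(size=(n, k))

x = np.zeros(n)
x[1], x[4] = 1.0, 0.5          # a sparse input
y_hat = fm_predict(x, w0, w, V)
print(y_hat)

# Trainable parameters: 1 + n + k*n
print(1 + n + k * n)           # 31 for n = 6, k = 4
```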
Loss Function
Assuming the labels take values in $\{-1, 1\}$, the loss function is defined as:
$$
L = - \sum_{i=1}^{m} \ln \sigma \left( y^{(i)} \cdot \hat{y}^{(i)} \right)
$$
where $\sigma(y) = \frac{1}{1 + e^{-y}}$.
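A minimal sketch of this loss for labels in $\{-1, 1\}$; the label and prediction values below are made up for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fm_logit_loss(y_true, y_pred):
    """L = -sum_i ln sigma(y_i * y_hat_i), with labels y_i in {-1, +1}."""
    return -np.sum(np.log(sigmoid(y_true * y_pred)))

y_true = np.array([1.0, -1.0, 1.0])
y_pred = np.array([2.0, -1.5, 0.3])   # raw FM outputs (pre-sigmoid scores)
print(fm_logit_loss(y_true, y_pred))
```

When the sign of the prediction matches the label, $y^{(i)} \hat{y}^{(i)}$ is large and positive, $\sigma(\cdot)$ approaches 1, and the per-example loss approaches 0; mismatched signs drive the loss up.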
Computing the Gradient
By the chain rule:

$$
\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \theta}
$$
First consider $\frac{\partial L}{\partial \hat{y}}$: this is analogous to differentiating a function of the form $g(x) = \ln \left( \frac{1}{1 + e^{-ax}} \right)$.
$$
\begin{aligned}
\frac{\partial L}{\partial \theta}
&= \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial \theta} \\
&= - \left( 1 - \frac{1}{1 + e^{-y^{(i)} \hat{y}^{(i)}}} \right) \cdot y^{(i)} \cdot \frac{\partial \hat{y}}{\partial \theta}
\end{aligned}
$$
Next consider $\frac{\partial \hat{y}}{\partial \theta}$:
$$
\frac{\partial \hat{y}}{\partial \theta} =
\left\{
\begin{aligned}
& 1, & & \text{if } \theta = w_0 \\
& x_i, & & \text{if } \theta = w_i \\
& x_i \sum_{j=1}^{n} v_{j,f} x_j - v_{i,f} x_i^2, & & \text{if } \theta = v_{i,f}
\end{aligned}
\right.
$$
The derivative of the quadratic term is the least obvious case:
$$
\begin{aligned}
& \frac{\partial}{\partial v_{i,f}} \left( \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right] \right) \\
&= \frac{1}{2} \left[ 2 \cdot \left( \sum_{j=1}^{n} v_{j,f} x_j \right) \cdot x_i - 2 \cdot x_i^2 \cdot v_{i,f} \right] \\
&= \left( \sum_{j=1}^{n} v_{j,f} x_j \right) \cdot x_i - v_{i,f} x_i^2
\end{aligned}
$$
In the first equality, only the $f$-th term of the outer sum over factors contains $v_{i,f}$, so all other factor dimensions drop out. In the second, the summation index is renamed from $i$ to $j$: the sum runs over all features and is independent of the particular index $i$ being differentiated, so a distinct symbol avoids confusion.
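The closed-form gradient for $v_{i,f}$ can be verified against a central-difference approximation. A sketch with arbitrary sizes and random values:

```python
import numpy as np

# Hypothetical sizes: n = 4 features, k = 2 latent factors.
rng = np.random.default_rng(3)
n, k = 4, 2
w0, w, V = rng.normal(), rng.normal(size=n), rng.normal(size=(n, k))
x = rng.normal(size=n)

def predict(w0, w, V, x):
    """FM hypothesis in its simplified O(kn) form."""
    s = V.T @ x
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))

# Analytic gradient of y w.r.t. v_{i,f}:  x_i * sum_j v_{j,f} x_j - v_{i,f} x_i^2
i, f = 1, 0
analytic = x[i] * (V[:, f] @ x) - V[i, f] * x[i] ** 2

# Numerical gradient by central differences.
eps = 1e-5
Vp, Vm = V.copy(), V.copy()
Vp[i, f] += eps
Vm[i, f] -= eps
numeric = (predict(w0, w, Vp, x) - predict(w0, w, Vm, x)) / (2 * eps)

assert np.isclose(analytic, numeric, atol=1e-6)
```

Note that $\sum_{j} v_{j,f} x_j$ does not depend on $i$, so it can be precomputed once per factor $f$ and reused for every feature's update, which is what keeps FM training at $O(kn)$ per example.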