1. A Simple Derivation of FM
FM (Factorization Machine) is a supervised learning algorithm that can be used for both classification and regression, and is most commonly applied to CTR prediction. Its key contribution is a way of modeling the pairwise interactions among the $n$ features:
$$\hat y = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} W_{i,j}\, x_i x_j$$
To reduce the number of parameters and prevent overfitting, the symmetric interaction matrix $W \in \mathbb{R}^{n \times n}$ is factorized into low rank, i.e. $W = V V^T$ with $V \in \mathbb{R}^{n \times k}$, so the model becomes:
$$\hat y = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle V_i, V_j \rangle\, x_i x_j$$
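As a minimal illustration of this factorized model, the sketch below computes $\hat y$ for one sample with the pairwise term evaluated by the naive $O(kn^2)$ double loop; the function and variable names (`fm_predict_naive`, `w0`, `w`, `V`) are ours, not from the original post.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """FM prediction for a single sample.

    x : feature vector, shape (n,)
    w0: global bias (scalar)
    w : linear weights, shape (n,)
    V : factor matrix, shape (n, k); row V[i] is the latent vector of feature i
    The pairwise term is the naive O(k * n^2) double loop over i < j.
    """
    n = x.shape[0]
    y = w0 + w @ x
    for i in range(n - 1):
        for j in range(i + 1, n):
            y += (V[i] @ V[j]) * x[i] * x[j]   # <V_i, V_j> * x_i * x_j
    return y
```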
Using the identity
$$(a+b+c)^2 = a^2 + b^2 + c^2 + 2(ab+ac+bc) \quad\Longrightarrow\quad ab+bc+ac = \frac{1}{2}\left\{(a+b+c)^2 - (a^2+b^2+c^2)\right\},$$
the second-order term can be simplified further:
$$\begin{aligned}
\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle V_i, V_j \rangle x_i x_j
&= \frac{1}{2}\left\{\sum_{i=1}^{n}\sum_{j=1}^{n} \langle V_i, V_j \rangle x_i x_j - \sum_{i=1}^{n} \langle V_i, V_i \rangle x_i x_i\right\}\\
&= \frac{1}{2}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{h=1}^{k} v_{i,h} v_{j,h} x_i x_j - \sum_{i=1}^{n}\sum_{h=1}^{k} v_{i,h}^2 x_i^2\right)\\
&= \frac{1}{2}\sum_{h=1}^{k}\left(\sum_{i=1}^{n}\sum_{j=1}^{n} v_{i,h} v_{j,h} x_i x_j - \sum_{i=1}^{n} v_{i,h}^2 x_i^2\right)\\
&= \frac{1}{2}\sum_{h=1}^{k}\left(\left(\sum_{i=1}^{n} v_{i,h} x_i\right)\left(\sum_{j=1}^{n} v_{j,h} x_j\right) - \sum_{i=1}^{n}\left(v_{i,h} x_i\right)^2\right)\\
&= \frac{1}{2}\sum_{h=1}^{k}\left(\left(\sum_{i=1}^{n} v_{i,h} x_i\right)^2 - \sum_{i=1}^{n}\left(v_{i,h} x_i\right)^2\right)
\end{aligned}$$
Thus the pairwise term reduces to a "square of the sum minus sum of squares" form, which can be evaluated in $O(kn)$ instead of $O(kn^2)$.
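As a quick numerical sanity check of this simplification (a sketch with arbitrary random data; the names `naive` and `simplified` are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.normal(size=n)          # one sample
V = rng.normal(size=(n, k))     # latent factors

# Naive double loop: sum_{i<j} <V_i, V_j> x_i x_j
naive = sum((V[i] @ V[j]) * x[i] * x[j]
            for i in range(n - 1) for j in range(i + 1, n))

# Simplified form: 1/2 * sum_h [ (sum_i v_{i,h} x_i)^2 - sum_i (v_{i,h} x_i)^2 ]
s = V.T @ x                                               # s[h] = sum_i v_{i,h} x_i
simplified = 0.5 * (np.sum(s ** 2) - np.sum((V * x[:, None]) ** 2))

assert np.allclose(naive, simplified)
```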
2. Matrix Form
Here $\vec{x} \in \mathbb{R}^{1 \times n}$ denotes a single sample (a row vector), and $X \in \mathbb{R}^{b \times n}$ denotes a mini-batch of $b$ samples, each row of which is one vector $\vec{x}$.
The single-sample form is therefore:
- $\sum_{h=1}^{k}\left(\sum_{i=1}^{n} v_{i,h} x_i\right)^2 = \vec{x} V V^T \vec{x}^T$
- $\sum_{h=1}^{k}\sum_{i=1}^{n}\left(v_{i,h} x_i\right)^2 = \left[\left(\vec{x} \odot \vec{x}\right)\left(V \odot V\right)\right].\text{sum(axis=1)}$
That is,
$$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle V_i, V_j \rangle x_i x_j = \frac{1}{2}\left(\vec{x} V V^T \vec{x}^T - \left[\left(\vec{x} \odot \vec{x}\right)\left(V \odot V\right)\right].\text{sum(axis=1)}\right)$$
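A direct NumPy transcription of this single-sample form (a sketch; here `x` is kept as a 1-D array of length $n$ rather than a $1 \times n$ row vector, and the function name `fm_pairwise_single` is illustrative):

```python
import numpy as np

def fm_pairwise_single(x, V):
    """Pairwise FM term for one sample.

    x: shape (n,)   V: shape (n, k)
    Computes 1/2 * ( x V V^T x^T - [(x ⊙ x)(V ⊙ V)].sum() ).
    """
    xV = x @ V                                    # shape (k,)
    square_of_sum = xV @ xV                       # x V V^T x^T
    sum_of_square = ((x * x) @ (V * V)).sum()     # [(x⊙x)(V⊙V)] summed over k
    return 0.5 * (square_of_sum - sum_of_square)
```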
Batch form: stacking the $b$ samples, the bracketed expression below is a length-$b$ vector whose $f$-th entry is the pairwise term of the $f$-th row of $X$:
$$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle V_i, V_j \rangle X_{f,i} X_{f,j} = \left[\frac{1}{2}\Big[(XV) \odot (XV) - \left(X \odot X\right)\left(V \odot V\right)\Big].\text{sum(axis=1)}\right]_f, \qquad f = 1, \ldots, b$$
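In NumPy the batch form is just two matrix products and an element-wise square; a minimal sketch under the same assumptions (`X` of shape `(b, n)`, illustrative function name):

```python
import numpy as np

def fm_pairwise_batch(X, V):
    """Per-sample pairwise FM terms for a mini-batch.

    X: shape (b, n)   V: shape (n, k)
    Returns a length-b vector: 1/2 * [ (XV)⊙(XV) - (X⊙X)(V⊙V) ].sum(axis=1)
    """
    XV = X @ V                                            # shape (b, k)
    return 0.5 * ((XV * XV - (X * X) @ (V * V)).sum(axis=1))
```

Each entry of the returned vector should match `fm_pairwise_single` applied to the corresponding row of `X`; adding the bias and linear terms to it gives the full FM prediction for the batch.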