[Paper Notes] FM: Factorization Machines

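This post walks through the FM (Factorization Machines) algorithm in detail. It starts from the derivation of the second-order model, factorizes the real symmetric interaction-weight matrix to reduce the computational cost, gives the gradients needed for backpropagation, and shows how the model extends to higher orders. The focus is on how the linear time complexity is achieved.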

These notes record the derivation of the Factorization Machine (FM) algorithm and my understanding of it.

Paper link

https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf

Derivation of the second-order FM

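In a prediction task, FM models the interactions between different features. Taking second-order (pairwise) interactions as an example:

$$\hat{y}(x)=w_0+\sum_{i=1}^{n}w_i x_i+\sum_{i=1}^{n}\sum_{j=i+1}^{n}w_{ij}x_ix_j \tag{1}$$

Here $w_0$, $w_i$, and the interaction weights $w_{ij}$ (the entries of a matrix $W \in \mathbb{R}^{n \times n}$) are the parameters the model has to learn. In real applications the feature vector $x$ is usually very high-dimensional and sparse, for example after one-hot encoding, so most products $x_ix_j$ are zero and each pair $(i, j)$ is observed in very few samples. Learning every $w_{ij}$ independently therefore overfits easily.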
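Note, however, that $W$ can be taken to be a real symmetric matrix, since $x_ix_j = x_jx_i$. Real symmetric matrices have the following property:

Every real symmetric matrix $A$ can be decomposed as $A=Q\Lambda Q^T$, where $\Lambda$ is a diagonal matrix and $Q$ is an orthogonal matrix.

If, in addition, all entries of $\Lambda$ are non-negative (that is, $W$ is positive semi-definite), $W$ can be written as $W=VV^T$, for example with $V=Q\Lambda^{1/2}$. The paper states this as: for any positive definite $W$ there exists a $V$ with $W=VV^T$, provided $k$ is sufficiently large. FM deliberately uses a small $k$, with $V \in \mathbb{R}^{n \times k}$, which restricts $W$ to a low-rank form and lets the model generalize under sparsity. Equation (1) then becomes:

$$\hat{y}(x)=w_0+\sum_{i=1}^{n}w_i x_i+\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_ix_j \tag{2}$$

Here $v_i$ and $v_j$ are rows of $V$, each a vector of length $k$, and their inner product is

$$\langle v_i, v_j \rangle = \sum_{f=1}^{k}v_{i,f} \cdot v_{j,f}$$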
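To make the factorization concrete, the following minimal sketch builds $W=VV^T$ from a small factor matrix and compares the number of parameters. The matrix `V` and the helper names `inner` and `interaction_matrix` are made up for illustration and are not from the paper.

```python
def inner(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def interaction_matrix(V):
    """Return W = V V^T as nested lists, so that W[i][j] = <v_i, v_j>."""
    return [[inner(vi, vj) for vj in V] for vi in V]


# Hypothetical factor matrix: n = 6 features, k = 2 latent factors.
V = [
    [1.0, 0.5],
    [0.0, 2.0],
    [-1.0, 1.0],
    [0.5, 0.5],
    [2.0, 0.0],
    [1.0, 1.0],
]
W = interaction_matrix(V)

n, k = len(V), len(V[0])
free_pairwise_weights = n * (n - 1) // 2  # one independent w_ij per pair i < j
factorized_weights = n * k                # entries of V

# W is symmetric by construction, because <v_i, v_j> = <v_j, v_i>.
is_symmetric = all(W[i][j] == W[j][i] for i in range(n) for j in range(n))
```

The gap between the two parameter counts widens quickly: the pairwise count grows quadratically in $n$, while the factorized count grows linearly for a fixed $k$.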
Therefore:

$$\hat{y}(x)=w_0+\sum_{i=1}^{n}w_i x_i+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\sum_{f=1}^{k}v_{i,f} \cdot v_{j,f}\,x_ix_j \tag{3}$$

Evaluating this expression directly costs $O(kn^2)$, but rearranging the computation brings the cost down to $O(kn)$.
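A direct, term-by-term evaluation of equation (3) can be sketched as follows. The function name `fm_predict_naive` and the example values are assumptions chosen for illustration; the triple loop makes the $O(kn^2)$ cost visible.

```python
def fm_predict_naive(x, w0, w, V):
    """Evaluate equation (3) term by term: O(k * n^2) operations."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):      # pairs with i < j only
            for f in range(k):
                y += V[i][f] * V[j][f] * x[i] * x[j]
    return y


# Hypothetical example with n = 3 features and k = 2 factors.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]

y_hat = fm_predict_naive(x, w0, w, V)
# Only the pair (0, 2) has a non-zero product x_i * x_j, contributing
# <v_0, v_2> * x_0 * x_2 = 3 * 2 = 6, so y_hat is 1.2 + 6 = 7.2
# up to floating-point rounding.
```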

Define

$$M=\sum_{i=1}^{n}\sum_{j=i+1}^{n}\sum_{f=1}^{k}v_{i,f}v_{j,f}x_ix_j$$

$$N=\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{f=1}^{k}v_{i,f}v_{j,f}x_ix_j$$

Splitting the inner sum over $j$ into $j<i$, $j=i$, and $j>i$, and noting that the summand is symmetric in $i$ and $j$ (so the total over $j<i$ equals the total over $j>i$), we get:

$$
\begin{aligned}
N &= \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{f=1}^{k}v_{i,f}v_{j,f}x_ix_j \\
&= \sum_{i=1}^{n}\sum_{f=1}^{k}\left(\sum_{j=1}^{i-1} v_{i,f}v_{j,f}x_ix_j + \sum_{j=i}^{i} v_{i,f}v_{j,f}x_ix_j + \sum_{j=i+1}^{n} v_{i,f}v_{j,f}x_ix_j \right) \\
&= \sum_{i=1}^{n}\sum_{f=1}^{k}\left(2\sum_{j=i+1}^{n}v_{i,f}v_{j,f}x_ix_j+ v_{i,f}v_{i,f}x_ix_i \right) \\
&= 2 \sum_{i=1}^{n}\sum_{f=1}^{k}\sum_{j=i+1}^{n}v_{i,f}v_{j,f}x_ix_j + \sum_{i=1}^{n}\sum_{f=1}^{k} v_{i,f}v_{i,f}x_ix_i \\
&= 2M+ \sum_{i=1}^{n}\sum_{f=1}^{k} v_{i,f}v_{i,f}x_ix_i
\end{aligned} \tag{4}
$$
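As a quick sanity check of the relation between $N$ and $M$, take the smallest non-trivial case $n=2$, $k=1$, and write $a_i = v_{i,1}x_i$ (a shorthand introduced only for this example):

$$
\begin{aligned}
N &= a_1a_1 + a_1a_2 + a_2a_1 + a_2a_2 = a_1^2 + 2a_1a_2 + a_2^2, \\
M &= a_1a_2, \\
N &= 2M + \left(a_1^2 + a_2^2\right).
\end{aligned}
$$

The two cross terms $a_1a_2$ and $a_2a_1$ are exactly the $j>i$ and $j<i$ parts of the sum, and the squares are the $j=i$ part.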
Therefore, keeping the sum over $f$ outermost (the factor dimensions do not mix, so the square must be taken separately for each $f$):

$$
\begin{aligned}
M &= \frac{1}{2}\left(N- \sum_{i=1}^{n}\sum_{f=1}^{k} v_{i,f}v_{i,f}x_ix_i\right) \\
&= \frac{1}{2} \sum_{f=1}^{k}\sum_{i=1}^{n}\sum_{j=1}^{n}v_{i,f}v_{j,f}x_ix_j - \frac{1}{2} \sum_{f=1}^{k}\sum_{i=1}^{n} v_{i,f}v_{i,f}x_ix_i \\
&= \frac{1}{2} \sum_{f=1}^{k}\left(\sum_{i=1}^{n}v_{i,f}x_i\right)\left(\sum_{j=1}^{n}v_{j,f}x_j\right) - \frac{1}{2} \sum_{f=1}^{k}\sum_{i=1}^{n} v_{i,f}^{2}x_i^2 \\
&= \frac{1}{2} \sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n}v_{i,f}x_i\right)^2-\sum_{i=1}^{n}\left(v_{i,f}x_i\right)^2\right]
\end{aligned} \tag{5}
$$

So equation (3) can be rewritten as:

$$\hat{y}(x)=w_0+\sum_{i=1}^{n}w_i x_i+\frac{1}{2} \sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n}v_{i,f}x_i\right)^2- \sum_{i=1}^{n}\left(v_{i,f}x_i\right)^2\right] \tag{6}$$

Evaluating this expression costs $O(kn)$. Since $k \ll n$ and $k$ is a fixed constant, the complexity is linear in $n$.
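The linear-time form can be checked numerically against the pairwise definition. The sketch below is illustrative only: the function names and the randomly generated test data are assumptions, not part of the paper.

```python
import random
from itertools import combinations


def fm_pairwise_linear(x, V):
    """Pairwise term as 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]: O(k * n)."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = 0.0     # running value of sum_i v_{i,f} x_i
        s_sq = 0.0  # running value of sum_i (v_{i,f} x_i)^2
        for xi, vi in zip(x, V):
            t = vi[f] * xi
            s += t
            s_sq += t * t
        total += 0.5 * (s * s - s_sq)
    return total


def fm_pairwise_naive(x, V):
    """Reference value: sum over pairs i < j of <v_i, v_j> x_i x_j, O(k * n^2)."""
    return sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )


def fm_predict(x, w0, w, V):
    """Full second-order FM prediction using the linear-time pairwise term."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x)) + fm_pairwise_linear(x, V)


# Hypothetical random test data: n = 8 features, k = 3 factors.
rng = random.Random(0)
n, k = 8, 3
x = [rng.uniform(-1, 1) for _ in range(n)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]

gap = abs(fm_pairwise_linear(x, V) - fm_pairwise_naive(x, V))
```

In this sketch `gap` should be at the level of floating-point rounding. For sparse inputs the inner loop could additionally skip features with $x_i = 0$, so the cost depends only on the number of non-zero features.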

Backpropagation for the second-order FM

In equation (6), the parameters to be learned are the weights $w_0$, $w_i$, and $v_{i,f}$. The partial derivatives of $\hat y$ with respect to each of them are:

Derivative with respect to $w_0$:

$$\frac{\partial \hat y}{\partial w_0} = 1$$

Derivative with respect to $w_i$:

$$\frac{\partial \hat y}{\partial w_i} = x_i$$

Derivative with respect to $v_{i,f}$:

$$\frac{\partial \hat y}{\partial v_{i,f}} = x_i\sum_{j=1}^{n}v_{j,f}x_j - v_{i,f}x_i^2$$

The sum $\sum_{j=1}^{n}v_{j,f}x_j$ does not depend on $i$ and involves only the factor index $f$ being differentiated, so it can be computed once per $f$ and reused for every $i$.
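One way to gain confidence in the gradient with respect to $v_{i,f}$ is to compare it with a central finite difference. The sketch below assumes the second-order model with the pairwise term $\sum_{i<j}\langle v_i, v_j\rangle x_ix_j$; all function names and the random test data are hypothetical.

```python
import random


def fm_predict(x, w0, w, V):
    """Second-order FM prediction with the linear-time pairwise term."""
    k = len(V[0])
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for f in range(k):
        s = sum(vi[f] * xi for vi, xi in zip(V, x))
        s_sq = sum((vi[f] * xi) ** 2 for vi, xi in zip(V, x))
        y += 0.5 * (s * s - s_sq)
    return y


def fm_grad_V(x, V):
    """Analytic gradient: G[i][f] = x_i * sum_j v_{j,f} x_j - v_{i,f} * x_i^2."""
    k = len(V[0])
    # The per-factor sums are shared by all i, so compute them once.
    s = [sum(vj[f] * xj for vj, xj in zip(V, x)) for f in range(k)]
    return [[xi * s[f] - vi[f] * xi * xi for f in range(k)] for xi, vi in zip(x, V)]


def numeric_grad_V(x, w0, w, V, i, f, eps=1e-6):
    """Central finite-difference estimate of d y_hat / d v_{i,f}."""
    V_plus = [row[:] for row in V]
    V_minus = [row[:] for row in V]
    V_plus[i][f] += eps
    V_minus[i][f] -= eps
    return (fm_predict(x, w0, w, V_plus) - fm_predict(x, w0, w, V_minus)) / (2 * eps)


# Hypothetical random test data: n = 5 features, k = 2 factors.
rng = random.Random(1)
n, k = 5, 2
x = [rng.uniform(-1, 1) for _ in range(n)]
w0 = rng.uniform(-1, 1)
w = [rng.uniform(-1, 1) for _ in range(n)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]

G = fm_grad_V(x, V)
max_error = max(
    abs(G[i][f] - numeric_grad_V(x, w0, w, V, i, f))
    for i in range(n)
    for f in range(k)
)
```

Because $\hat y$ is linear in each individual $v_{i,f}$ (the squared terms cancel), the central difference agrees with the analytic value up to rounding error.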

Higher-order FM

Let $d$ be the highest order of interaction between features. The model then becomes:

$$\hat y(x)=w_0+\sum_{i=1}^{n}w_ix_i + \sum_{l=2}^{d}\sum_{i_1=1}^n\cdots\sum_{i_l=i_{l-1}+1}^n\left(\prod_{j=1}^{l}x_{i_j}\right)\left(\sum_{f=1}^{k_l}\prod_{j=1}^l v_{i_j,f}^{(l)}\right)$$

Here each order $l$ has its own factor matrix $V^{(l)} \in \mathbb{R}^{n \times k_l}$. Evaluating this directly costs $O(k_d n^d)$, but according to the paper the same kind of rearrangement used above brings it down to linear complexity.
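The higher-order formula can be read off almost literally as nested loops over increasing index tuples. The following is a naive sketch for illustration only: it does not implement the linear-time rearrangement, and the function name `fm_predict_order_d` and the dictionary `Vs` (mapping each order $l$ to its factor matrix $V^{(l)}$) are assumptions of this example.

```python
from itertools import combinations
from math import prod


def fm_predict_order_d(x, w0, w, Vs):
    """Naive d-th order FM.

    Vs maps each interaction order l (2 <= l <= d) to a factor matrix
    V^(l) given as n rows of length k_l.
    """
    n = len(x)
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for l, V in Vs.items():
        k_l = len(V[0])
        # combinations yields index tuples with i_1 < i_2 < ... < i_l
        for idx in combinations(range(n), l):
            x_prod = prod(x[i] for i in idx)
            if x_prod == 0.0:
                continue  # sparse input: this interaction contributes nothing
            factor = sum(prod(V[i][f] for i in idx) for f in range(k_l))
            y += x_prod * factor
    return y


# Hypothetical example: n = 3 features, orders 2 and 3, one factor per order.
x = [1.0, 2.0, 3.0]
w0 = 0.0
w = [0.0, 0.0, 0.0]
Vs = {
    2: [[1.0], [1.0], [1.0]],
    3: [[1.0], [2.0], [3.0]],
}

y_hat = fm_predict_order_d(x, w0, w, Vs)
# Order 2: 1*2 + 1*3 + 2*3 = 11. Order 3: (1*2*3) * (1*2*3) = 36. Total: 47.0
```

With `Vs` containing only the key `2`, this reduces to the second-order model of equation (3).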
