1. Factorization Machine (FM) (2010)
The FM model equation is:
$$y = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
Let $X=(x_1,x_2,\dots,x_n)^{'}$, $X\in R^{n\times 1}$, be the input feature vector. The weight vector for the interaction terms of feature $i$ (the $v_i$ in the model equation) is constructed as $V_i=(v_{i1},v_{i2},\dots,v_{ik})^{'}$, $V_i\in R^{k\times 1}$.
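The goal then becomes computing the sum of the entries of the matrix $T$ that lie strictly above the diagonal (its upper triangle, excluding the diagonal itself), where: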
$$T=\left[ \begin{array}{ccc} V_1^{'}V_1x_1x_1 & \cdots & V_1^{'}V_nx_1x_n\\ \vdots & V_i^{'}V_jx_ix_j & \vdots\\ V_n^{'}V_1x_nx_1 & \cdots & V_n^{'}V_nx_nx_n \end{array} \right]$$
Stacking the vectors $V_i^{'}$ as rows gives the parameter matrix $V$:
$$V= \left[ \begin{array}{c} V_{1}^{'} \\ V_{2}^{'} \\ \vdots \\ V_{n}^{'} \end{array} \right]_{n\times k} = \left[ \begin{array}{cccc} v_{11} & v_{12} & \cdots & v_{1k}\\ v_{21} & v_{22} & \cdots & v_{2k}\\ \vdots & \vdots & v_{ij} & \vdots\\ v_{n1} & v_{n2} & \cdots & v_{nk} \end{array} \right]_{n \times k}$$
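To make the matrix $T$ concrete, here is a minimal sketch in plain Python that builds it for a tiny example and sums the entries above the diagonal. The toy values of `V` and `x` (n = 3, k = 2) and the helper names `dot`, `build_T`, and `upper_triangle_sum` are illustrative assumptions, not part of the original derivation.

```python
def dot(a, b):
    """Inner product <a, b> of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def build_T(V, x):
    """Return the n x n matrix with T[i][j] = <V_i, V_j> * x_i * x_j."""
    n = len(x)
    return [[dot(V[i], V[j]) * x[i] * x[j] for j in range(n)] for i in range(n)]


def upper_triangle_sum(T):
    """Sum the entries strictly above the diagonal (j > i)."""
    n = len(T)
    return sum(T[i][j] for i in range(n) for j in range(i + 1, n))


# Hypothetical toy values: n = 3 features, k = 2 latent factors.
V = [[1.0, 2.0],    # V_1
     [0.0, 1.0],    # V_2
     [3.0, -1.0]]   # V_3
x = [1.0, 2.0, 0.5]

T = build_T(V, x)
pairwise = upper_triangle_sum(T)
print(pairwise)  # 4.0 + 0.5 - 1.0 = 3.5
```

Because $T$ is symmetric, the entries below the diagonal repeat those above it, which is the property the derivation relies on.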
Derivation (here $k$ indexes the latent factors and $K$ is their number, i.e. the dimension of each $V_i$). Since $T$ is symmetric, the full double sum counts every off-diagonal pair twice and the diagonal once, so the upper-triangle sum is half of the full sum minus half of the diagonal:
$$\begin{aligned}
\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_ix_j
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\langle v_i,v_j\rangle x_ix_j - \frac{1}{2}\sum_{i=1}^{n}\langle v_i,v_i\rangle x_ix_i \\
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{K} v_{ik}v_{jk}x_ix_j - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{K} v_{ik}v_{ik}x_ix_i \\
&= \frac{1}{2}\sum_{k=1}^{K}\left[\sum_{i=1}^{n}(v_{ik}x_i)\sum_{j=1}^{n}(v_{jk}x_j)\right] - \frac{1}{2}\sum_{k=1}^{K}\sum_{i=1}^{n}x_i^2v_{ik}^2 \\
&= \frac{1}{2}\sum_{k=1}^{K}\left(\sum_{i=1}^{n}v_{ik}x_i\right)^2 - \frac{1}{2}\sum_{k=1}^{K}\left(\sum_{i=1}^{n}v_{ik}^2x_i^2\right) \\
&= \frac{1}{2}\sum_{k=1}^{K}\left[\left(\sum_{i=1}^{n}v_{ik}x_i\right)^2-\sum_{i=1}^{n}v_{ik}^2x_i^2\right]
\end{aligned}$$
The final form needs only single sums over $i$ for each latent factor, so the interaction term costs $O(Kn)$ instead of $O(Kn^2)$.
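The identity can be checked numerically. The sketch below, in plain Python, compares the explicit double sum with the final closed form on random data; the function names `pairwise_naive` and `pairwise_fast`, the sizes `n = 6`, `K = 3`, and the random seed are assumptions chosen only for illustration.

```python
import random


def pairwise_naive(V, x):
    """O(n^2 * K): sum over i < j of <V_i, V_j> * x_i * x_j."""
    n, K = len(x), len(V[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(V[i][k] * V[j][k] for k in range(K))
            total += inner * x[i] * x[j]
    return total


def pairwise_fast(V, x):
    """O(n * K): 0.5 * sum_k [(sum_i v_ik x_i)^2 - sum_i v_ik^2 x_i^2]."""
    n, K = len(x), len(V[0])
    total = 0.0
    for k in range(K):
        s = sum(V[i][k] * x[i] for i in range(n))            # sum_i v_ik x_i
        s_sq = sum((V[i][k] * x[i]) ** 2 for i in range(n))  # sum_i v_ik^2 x_i^2
        total += s * s - s_sq
    return 0.5 * total


random.seed(0)  # fixed seed so the check is reproducible
n, K = 6, 3
V = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]

naive = pairwise_naive(V, x)
fast = pairwise_fast(V, x)
print(naive, fast)  # the two values should agree up to floating-point rounding
```

The naive version mirrors the left-hand side of the derivation and the fast version mirrors its last line, so agreement between them is a quick sanity check before trusting a vectorized implementation.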
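The formulas can be summarized as follows.
The feature dimension is $n$ (covering the one-hot encoded features as well as the dense features), and the latent dimension of the interaction parameter matrix is $k$.
Denote the model input by $input \in R^{n \times 1}$.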
$$\begin{aligned}
fm_1 &= (v^{'} \cdot input)^2, \qquad v \in R^{n \times k}, \quad fm_1 \in R^{k \times 1} \\
fm_2 &= (v^{'})^2 \cdot input^2, \qquad fm_2 \in R^{k \times 1} \\
out &= w_0 + W^{'} \cdot input + \frac{1}{2} \cdot \mathbf{1}(fm_1-fm_2), \qquad W \in R^{n \times 1}
\end{aligned}$$
All squares here are taken element-wise, and $w_0$ is the bias term of the model equation.
where:
$$\mathbf{1} = (1,1,1,\dots,1) \in R^{1 \times k}$$
The parameters to be learned are $W$ and $v$, together with the bias $w_0$ of the model equation.
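As a minimal single-sample sketch of the summarized formulas, the plain Python below evaluates $fm_1$, $fm_2$, and $out$ step by step. The function name `fm_forward`, the explicit bias argument `w0`, and all toy parameter values are assumptions made for illustration; a real implementation would use batched tensor operations instead of nested lists.

```python
def fm_forward(w0, W, v, inp):
    """Evaluate out = w0 + W' input + 0.5 * 1 (fm1 - fm2) for one sample.

    W: list of n weights; v: n x k nested list; inp: list of n feature values.
    """
    n, k = len(inp), len(v[0])
    # fm1 = (v' . input)^2, squared element-wise: one entry per latent factor
    fm1 = [sum(v[i][f] * inp[i] for i in range(n)) ** 2 for f in range(k)]
    # fm2 = (v')^2 . input^2, with both squares taken element-wise
    fm2 = [sum(v[i][f] ** 2 * inp[i] ** 2 for i in range(n)) for f in range(k)]
    linear = w0 + sum(W[i] * inp[i] for i in range(n))
    # Multiplying by the all-ones row vector simply sums the k entries
    return linear + 0.5 * sum(a - b for a, b in zip(fm1, fm2))


# Hypothetical toy parameters: n = 3 features, k = 2 latent factors.
w0 = 0.5
W = [1.0, -1.0, 2.0]
v = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
inp = [1.0, 2.0, 0.5]

out = fm_forward(w0, W, v, inp)
print(out)  # linear part 0.5 + interaction part 3.5 = 4.0
```

For these values $fm_1 = (6.25, 12.25)$ and $fm_2 = (3.25, 8.25)$, so the interaction part is $\frac{1}{2}(3 + 4) = 3.5$.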
The code is as follows:
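import torch
import torch.nn as nn


class FM_model(nn.Module):
    def __init__(self, n, k):
        super(FM_model, self).__init__()
        self.n = n  # number of features, e.g. len(items) + len(users)
        self.k = k  # latent dimension of the interaction vectors
        # Linear part w_0 + W' x; the bias of nn.Linear plays the role of w_0
        self.linear = nn.Linear(self.n, 1, bias=True)
        # Interaction parameters, stored transposed with shape (k, n)
        self.v = nn.Parameter(torch.randn(self.k, self.n))

    def fm_layer(self, x):
        # x has shape (batch, n)
        linear_part = self.linear(x)  # out_size = (batch, 1)
        # Matrix product (batch, n) x (n, k)
        inter_part1 = torch.mm(x, self.v.t())  # out_size = (batch, k)
        # Matrix product of the element-wise squares: (batch, n)^2 x (n, k)^2
        inter_part2 = torch.mm(torch.pow(x, 2), torch.pow(self.v, 2).t())  # out_size = (batch, k)
        # Sum over the latent dimension only (dim=1). A bare torch.sum would
        # also sum over the batch and add the same scalar to every sample.
        output = linear_part + 0.5 * torch.sum(
            torch.pow(inter_part1, 2) - inter_part2, dim=1, keepdim=True
        )
        return output  # out_size = (batch, 1)

    def forward(self, x):
        output = self.fm_layer(x)
        return output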