1 Abstract
- Deep models like deep neural networks, on the other hand, cannot be directly applied to the high-dimensional input because of the huge feature space.
- Product-based Neural Networks (PNN) use an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between inter-field categories, and further fully connected layers to explore high-order feature interactions.
2 Introduction
- In order to improve the multi-field categorical data interaction, [1] presented an embedding methodology based on pre-training of a factorization machine. However, the quality of embedding initialization is largely limited by the factorization machine.
- Previous work has shown that local dependencies between features from different fields can be effectively explored by feature vector “product” operations instead of “add” operations.
3 Related Work
- A Convolutional Click Prediction Model (CCPM) was proposed in [2] to predict ad clicks with convolutional neural networks (CNN). However, in CCPM the convolutions are only performed on neighboring fields in a certain alignment, which fails to model the full interactions among non-neighbor features.
4 Deep Learning for CTR Estimation
- The task is to build a prediction model to estimate the probability of a user clicking a specific ad in a given context.
4.1 Product-based Neural Network
- From a top-down perspective, the output of PNN is a real number $\hat{y} \in (0, 1)$ as the predicted CTR:
$$\hat{y}=\sigma(W_3 l_2 + b_3) \tag{1}$$
where $\sigma(x)=1/(1+e^{-x})$ is the sigmoid function.
- The output $l_2$ of the second hidden layer is constructed as
$$l_2 = \mathrm{relu}(W_2 l_1 + b_2) \tag{2}$$
- The first hidden layer is fully connected with the product layer. Its inputs consist of linear signals $l_z$ and quadratic signals $l_p$:
$$l_1 = \mathrm{relu}(l_z + l_p + b_1) \tag{3}$$
- Let us define the operation of tensor inner product:
$$\boldsymbol{A} \odot \boldsymbol{B} \triangleq \sum_{i, j} \boldsymbol{A}_{i, j} \boldsymbol{B}_{i, j} \tag{4}$$
That is, $\boldsymbol{A}$ and $\boldsymbol{B}$ are multiplied element-wise and the resulting products are summed into a scalar.
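As a quick illustration (a minimal numpy sketch, not from the paper), $\odot$ is just an element-wise product followed by a sum over all entries:

```python
import numpy as np

# A ⊙ B: multiply element-wise, then sum every entry -> a single scalar.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.5, 1.0], [1.5, 2.0]])

print(np.sum(A * B))  # 0.5 + 2.0 + 4.5 + 8.0 = 15.0
```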
- $l_z$ and $l_p$ are computed from $z$ and $p$, respectively:
$$\begin{array}{ll} \boldsymbol{l}_{z}=\left(l_{z}^{1}, l_{z}^{2}, \ldots, l_{z}^{n}, \ldots, l_{z}^{D_{1}}\right), & l_{z}^{n}=\boldsymbol{W}_{z}^{n} \odot \boldsymbol{z} \\ \boldsymbol{l}_{p}=\left(l_{p}^{1}, l_{p}^{2}, \ldots, l_{p}^{n}, \ldots, l_{p}^{D_{1}}\right), & l_{p}^{n}=\boldsymbol{W}_{p}^{n} \odot \boldsymbol{p} \end{array} \tag{5}$$
- The linear signal $z$ and the quadratic signal $p$ are defined as
$$\begin{array}{l} \boldsymbol{z}=\left(\boldsymbol{z}_{1}, \boldsymbol{z}_{2}, \ldots, \boldsymbol{z}_{N}\right) \triangleq\left(\boldsymbol{f}_{1}, \boldsymbol{f}_{2}, \ldots, \boldsymbol{f}_{N}\right) \\ \boldsymbol{p}=\left\{\boldsymbol{p}_{i, j}\right\}, \quad i=1 \ldots N,\ j=1 \ldots N \end{array} \tag{6}$$
where $\boldsymbol{f}_i$ is the embedding vector of field $i$, and $\boldsymbol{p}_{i,j}=g(\boldsymbol{f}_i, \boldsymbol{f}_j)$ can be any pairwise operation on two feature vectors.
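To see how Eqs. (1)-(6) fit together, here is a minimal numpy sketch of one PNN forward pass, using the inner product as $g$. All sizes and weights (`N`, `M`, `D1`, `D2`, `W_z`, `W_p`, `W2`, `W3`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# ---- illustrative sizes (assumptions) ----
N, M = 5, 4          # N fields, embedding size M
D1, D2 = 8, 6        # hidden layer widths

rng = np.random.default_rng(0)
f = rng.normal(size=(N, M))          # field embeddings f_1..f_N

# Eq (6): linear signal z and quadratic signal p (inner-product form)
z = f                                # z = (f_1, ..., f_N), shape (N, M)
p = f @ f.T                          # p_{i,j} = <f_i, f_j>, shape (N, N)

# Eq (5): l_z^n = W_z^n ⊙ z,  l_p^n = W_p^n ⊙ p
W_z = rng.normal(size=(D1, N, M))
W_p = rng.normal(size=(D1, N, N))
l_z = np.einsum('nij,ij->n', W_z, z)
l_p = np.einsum('nij,ij->n', W_p, p)

# Eqs (3), (2), (1): two fully connected layers and the sigmoid output
b1 = rng.normal(size=D1)
W2, b2 = rng.normal(size=(D2, D1)), rng.normal(size=D2)
W3, b3 = rng.normal(size=(1, D2)), rng.normal(size=1)

l1 = relu(l_z + l_p + b1)
l2 = relu(W2 @ l1 + b2)
y_hat = sigmoid(W3 @ l2 + b3)        # predicted CTR in (0, 1)
print(y_hat)
```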
4.2 Inner Product-based Neural Network (IPNN)
- In IPNN, we first define the pairwise feature interaction as the vector inner product: $g(\boldsymbol{f}_i, \boldsymbol{f}_j) = \langle\boldsymbol{f}_{i}, \boldsymbol{f}_{j}\rangle$.
- Such pairwise connection expands the capacity of the neural network, but also enormously increases the complexity. Inspired by FM, we come up with the idea of matrix factorization to reduce complexity. Assuming $\boldsymbol{W}_p^n = \boldsymbol{\theta}^n (\boldsymbol{\theta}^n)^T$, a first-order decomposition under a strong assumption, we obtain
$$\boldsymbol{W}_{p}^{n} \odot \boldsymbol{p}=\sum_{i=1}^{N} \sum_{j=1}^{N} \theta_{i}^{n} \theta_{j}^{n}\left\langle\boldsymbol{f}_{i}, \boldsymbol{f}_{j}\right\rangle=\left\langle\sum_{i=1}^{N} \boldsymbol{\delta}_{i}^{n}, \sum_{i=1}^{N} \boldsymbol{\delta}_{i}^{n}\right\rangle \tag{7}$$
where $\boldsymbol{\delta}_{i}^{n}=\theta_{i}^{n}\boldsymbol{f}_{i}$. The general matrix factorization, by contrast, would be
$$\boldsymbol{W}_{p}^{n} \odot \boldsymbol{p}=\sum_{i=1}^{N} \sum_{j=1}^{N}\left\langle\boldsymbol{\theta}_{n}^{i}, \boldsymbol{\theta}_{n}^{j}\right\rangle\left\langle\boldsymbol{f}_{i}, \boldsymbol{f}_{j}\right\rangle \tag{8}$$
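A small numpy check (with assumed shapes, not the paper's code) confirms the first-order factorization in Eq. (7): with $\boldsymbol{W}_p^n = \boldsymbol{\theta}^n (\boldsymbol{\theta}^n)^T$, the $O(N^2)$ double sum of pairwise inner products collapses into a single inner product of $\sum_i \boldsymbol{\delta}_i^n$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 5, 4                          # illustrative sizes
f = rng.normal(size=(N, M))          # field embeddings
theta = rng.normal(size=N)           # theta^n for one output node n

# Left side of Eq. (7): O(N^2) pairwise inner products
lhs = sum(theta[i] * theta[j] * (f[i] @ f[j])
          for i in range(N) for j in range(N))

# Right side of Eq. (7): O(N) after defining delta_i = theta_i * f_i
delta_sum = (theta[:, None] * f).sum(axis=0)
rhs = delta_sum @ delta_sum

print(np.isclose(lhs, rhs))          # True: both sides agree
```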
4.3 Outer Product-based Neural Network
- Vector inner product takes a pair of vectors as input and outputs a scalar. Different from that, vector outer product takes a pair of vectors and produces a matrix.
- The only difference between IPNN and OPNN is the quadratic term $\boldsymbol{p}$. In OPNN, we define the feature interaction as $g(\boldsymbol{f}_i,\boldsymbol{f}_j)=\boldsymbol{f}_i\boldsymbol{f}_j^T$. To reduce complexity, we come up with the idea of superposition:
$$\boldsymbol{p}=\sum_{i=1}^{N} \sum_{j=1}^{N} \boldsymbol{f}_{i} \boldsymbol{f}_{j}^{T}=\boldsymbol{f}_{\Sigma}\left(\boldsymbol{f}_{\Sigma}\right)^{T}, \quad \boldsymbol{f}_{\Sigma}=\sum_{i=1}^{N} \boldsymbol{f}_{i} \tag{9}$$
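Likewise, a minimal numpy check (illustrative shapes assumed) of the superposition trick in Eq. (9): the sum of all $N^2$ outer products equals the single outer product of the summed embedding $\boldsymbol{f}_\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 5, 4                                      # illustrative sizes
f = rng.normal(size=(N, M))                      # field embeddings

# Naive p: sum of all N^2 outer products f_i f_j^T, cost O(N^2 M^2)
p_naive = sum(np.outer(f[i], f[j]) for i in range(N) for j in range(N))

# Superposition (Eq. 9): outer product of the summed embedding, cost O(M^2)
f_sigma = f.sum(axis=0)
p_fast = np.outer(f_sigma, f_sigma)

print(np.allclose(p_naive, p_fast))              # True
```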
5 Experiments
- PNN*: This model has a product layer, which is a concatenation of the inner product and the outer product.
6 Conclusion
- We designed two types of PNN: IPNN based on inner product and OPNN based on outer product. We also discussed solutions to reduce complexity, making PNN efficient and scalable.