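Part 1: Theory
0. FiBiNET Model Architecture
FiBiNET, short for "FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction", is a deep-learning-based ad recommendation / click-through-rate (CTR) prediction algorithm proposed by Sina Weibo. It can be regarded as a wide & deep model with some innovative improvements to its wide part, or simply as a variant of FNN. Its main innovations are:
- A SENET layer is added to the traditional embedding stage to turn the embeddings into a new kind of embedding that carries feature-importance information;
- Instead of the traditional inner product or Hadamard product, it uses a new bilinear interaction that combines the two to capture the relationships between features.
The overall architecture of the model is shown in Figure (1):
As Figure (1) shows, compared with the deep-learning CTR models we are familiar with, FiBiNET mainly adds a SENET Layer and an upgraded Bilinear-Interaction Layer. A data-flow diagram of the model is reproduced here.
The SENET Layer and the Bilinear-Interaction Layer are described below.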
1. SENET Layer
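SENET stands for Squeeze-and-Excitation Network. It is widely used in CV and can extract dependencies between features to some extent. SENET has three steps, Squeeze, Excitation and Re-Weight, which are executed in order to turn the original embedding vectors $E = [e_1, \dots, e_f]$ into the weighted embeddings $V = [v_1, \dots, v_f]$, as shown in Figure (2).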
(1.1)Squeeze
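This step computes a summary statistic of each field's embedding vector. The paper uses mean pooling (max pooling is also possible, but the paper reports that mean pooling works better) to compress the embeddings $E = [e_1, \dots, e_f]$ into $Z = [z_1, \dots, z_f]$, where $z_i$ is a scalar that represents the global information of the $i$-th feature. It is computed as:
$z_i = F_{sq}(e_i) = \frac{1}{k} \sum_{t=1}^{k} e_i^{(t)}$
An example (the sample embedding matrix and its pooled result were shown as a figure that is not reproduced here):
in that example each feature vector has dimension 3, i.e. $k = 3$.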
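The squeeze step can be sketched in NumPy as follows. The embedding values are made up for illustration (they are not the article's original example); the sketch assumes $f = 3$ fields of dimension $k = 3$.

```python
import numpy as np

# Hypothetical embeddings: f = 3 fields, each of dimension k = 3.
E = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [0.0, 3.0, 6.0]])

# Squeeze: mean-pool every field embedding e_i into a single scalar z_i.
Z = E.mean(axis=-1)
print(Z)  # [2. 5. 3.]
```

Each row of `E` collapses to one number, so `Z` has exactly one entry per field.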
(1.2)Excitation
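This step learns an importance weight for each field from the squeezed statistics $Z$. The paper uses a two-layer neural network: the first layer reduces the dimension and the second layer restores it.
As a formula:
$A = F_{ex}(Z) = \sigma_2(W_2\, \sigma_1(W_1 Z))$
where $A \in R^{f}$ is a vector of the same form as $Z$; $\sigma_1$ and $\sigma_2$ are activation functions; $W_1 \in R^{f \times \frac{f}{r}}$ and $W_2 \in R^{\frac{f}{r} \times f}$ are the learned parameters, with $r$ the reduction ratio.
Continuing the example (the sample matrices were shown as a figure that is not reproduced here):
in that example the activation functions are ignored for now and the parameters are set to fixed values.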
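A minimal NumPy sketch of the excitation step, assuming ReLU for both activations and hypothetical weights with $f = 3$ and $r = 3$ (so the hidden layer has a single unit). The function name `excitation` is illustrative only.

```python
import numpy as np

def excitation(Z, W1, W2):
    """Two-layer excitation: A = relu(relu(Z @ W1) @ W2)."""
    hidden = np.maximum(Z @ W1, 0.0)     # dimension reduction: f -> f / r
    return np.maximum(hidden @ W2, 0.0)  # dimension restoration: f / r -> f

# Hypothetical values: f = 3 fields, reduction ratio r = 3.
Z = np.array([2.0, 5.0, 3.0])
W1 = np.array([[0.5], [0.25], [0.5]])   # shape (f, f / r)
W2 = np.array([[1.0, 0.5, 2.0]])        # shape (f / r, f)

A = excitation(Z, W1, W2)               # one weight per field: 3.75, 1.875, 7.5
print(A)
```

The output has the same length as `Z`, which is what allows it to be used as a per-field weight in the next step.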
(1.3)Re-Weight
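The final step multiplies $A$ and $E$ element-wise over their $f$ elements, in a way similar to the Hadamard product, to obtain the final SENET output $V$. If $A$ is viewed as a weight vector, this step can also be called weighting or rescaling. The new embedding $V$ is computed as:
$V = F_{ReWeight}(A, E) = [a_1 \cdot e_1, \dots, a_f \cdot e_f] = [v_1, \dots, v_f]$, where $a_i \in R$, $e_i \in R^{k}$, $v_i \in R^{k}$.
Continuing the example (the figure with the sample values is not reproduced here).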
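The re-weight step amounts to a broadcast multiplication. The sketch below uses hypothetical values for $E$ and $A$ and is only meant to show the shapes involved.

```python
import numpy as np

# Hypothetical embeddings, shape (f, k) = (3, 3).
E = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [0.0, 3.0, 6.0]])
# Hypothetical field weights produced by the excitation step, shape (f,).
A = np.array([2.0, 0.5, 1.0])

# Re-weight: scale every field embedding e_i by its scalar weight a_i.
V = A[:, None] * E
print(V)
```

`V` keeps the shape of `E`; only the scale of each row changes.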
2. Bilinear-Interaction Layer
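Traditional feature-crossing methods mostly use the inner product (FM, FFM, etc.) or the Hadamard product (AFM, NFM, etc.). Both struggle to model feature interactions effectively on sparse data. The paper proposes combining the inner product and the Hadamard product and introducing an extra parameter matrix $W$ to learn the feature interactions.
Inner product: $p_{ij} = v_i \cdot v_j$
Hadamard product: $p_{ij} = v_i \odot v_j$
That is: $[a_1, a_2, a_3] \cdot [b_1, b_2, b_3] = a_1 b_1 + a_2 b_2 + a_3 b_3$, and $[a_1, a_2, a_3] \odot [b_1, b_2, b_3] = [a_1 b_1, a_2 b_2, a_3 b_3]$.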
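The difference between the two products is easy to see on a small pair of vectors (values chosen arbitrarily for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

inner = float(a @ b)   # a scalar: 1*4 + 2*5 + 3*6 = 32.0
hadamard = a * b       # a vector of the same length: 4, 10, 18
print(inner, hadamard)
```

The inner product collapses the pair into one number, while the Hadamard product keeps one value per embedding dimension.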
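The interaction vector $p_{ij}$ can be computed in one of the following three ways:
(2.1) Field-All Type: $p_{ij} = v_i \cdot W \odot v_j$. All field pairs share a single parameter matrix $W \in R^{k \times k}$; the number of extra parameters is $k \times k$;
(2.2) Field-Each Type: $p_{ij} = v_i \cdot W_i \odot v_j$. Each field keeps its own parameter matrix $W_i$; the number of extra parameters is $f \times k \times k$;
(2.3) Field-Interaction Type: $p_{ij} = v_i \cdot W_{ij} \odot v_j$. Each interacting pair $(i, j)$ has its own parameter matrix $W_{ij}$; the number of extra parameters is $\frac{f(f-1)}{2} \times k \times k$.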
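A small sketch of a single bilinear interaction and of the parameter counts listed above. The helper names `bilinear_pair` and `extra_params` are illustrative and the numeric values are hypothetical.

```python
import numpy as np

def bilinear_pair(v_i, W, v_j):
    """p_ij = (v_i . W) Hadamard v_j: matrix product first, then element-wise product."""
    return (v_i @ W) * v_j

def extra_params(f, k):
    """Extra bilinear parameters for f fields with embedding size k."""
    return {
        "all": k * k,
        "each": f * k * k,
        "interaction": f * (f - 1) // 2 * k * k,
    }

v_i = np.array([1.0, 2.0])
v_j = np.array([3.0, 4.0])
W = np.array([[1.0, 0.0],
              [1.0, 1.0]])

p_ij = bilinear_pair(v_i, W, v_j)   # v_i @ W = [3, 2], so p_ij = [9, 8]
counts = extra_params(f=3, k=4)     # {'all': 16, 'each': 48, 'interaction': 48}
print(p_ij, counts)
```

With only three fields the "each" and "interaction" variants happen to need the same number of parameters; for larger $f$ the "interaction" variant grows quadratically in the number of fields.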
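The Bilinear-Interaction Layer produces:
(1) The original embeddings $E$ are transformed by the bilinear function into a vector $p = [p_1, \dots, p_n]$ that captures the relationships between features, where $n = \frac{f(f-1)}{2}$ and each $p_i$ keeps the embedding dimension, i.e. it has the same form as $e_i$: $p_i \in R^{k}$.
(2) The SENET embeddings $V$ are transformed by the bilinear function into a vector $q = [q_1, \dots, q_n]$ that captures the relationships between features, where $n = \frac{f(f-1)}{2}$ and each $q_i$ keeps the embedding dimension, i.e. it has the same form as $v_i$: $q_i \in R^{k}$.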
3. Combination Layer
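The combination layer simply concatenates the outputs $p$ and $q$ of (1) and (2) into $c$: $c = F_{concat}(p, q) = [p_1, \dots, p_n, q_1, \dots, q_n] = [c_1, \dots, c_{2n}]$.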
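As a shape check, the concatenation can be sketched with placeholder arrays. The sizes ($f = 3$, $k = 4$) and the constant fill values are assumptions chosen only to make the two halves easy to tell apart.

```python
import numpy as np

f, k = 3, 4                  # hypothetical: 3 fields, embedding size 4
n = f * (f - 1) // 2         # number of field pairs
p = np.ones((n, k))          # stand-in for the interactions of the original embeddings
q = np.full((n, k), 2.0)     # stand-in for the interactions of the SENET embeddings

# Combination layer: flatten both sets of interaction vectors and join them end to end.
c = np.concatenate([p.ravel(), q.ravel()])
print(c.shape)  # (24,)
```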
4. Deep Network
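Finally, $c$ is fed into a multi-layer fully connected network, commonly called a DNN, to obtain the final output.
Each layer computes $a^{(l)} = \sigma(W^{(l)} a^{(l-1)} + b^{(l)})$, where $a^{(0)} = c$ and $\sigma$ is the activation function.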
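A minimal sketch of such a feed-forward pass, assuming ReLU activations and hypothetical layer weights; `dnn_forward` is an illustrative name, not part of any library.

```python
import numpy as np

def dnn_forward(c, layers):
    """Apply a stack of fully connected ReLU layers to the combined vector c."""
    a = c
    for W, b in layers:
        a = np.maximum(a @ W + b, 0.0)
    return a

# Hypothetical two-layer network on a 2-dimensional input.
layers = [
    (np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0.5, 0.5])),
    (np.array([[1.0], [1.0]]), np.array([0.0])),
]
c = np.array([1.0, -2.0])

out = dnn_forward(c, layers)   # layer 1 gives [1.5, 0.0], layer 2 gives [1.5]
print(out)
```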
Part 2: Code Practice
1. SENET function
import numpy as np
import tensorflow as tf  # written against the TensorFlow 1.x graph/session API
def senet(inputs):
    # Squeeze: mean-pool each field embedding into one scalar
    Z = tf.reduce_mean(inputs, axis=-1, )
    # Excitation with fixed example weights (activation functions omitted)
    w1 = np.array([[1., 1.], [2., 2.], [1., 1.]])
    dot1 = tf.tensordot(Z, w1, axes=(-1, 0))
    w2 = np.array([[1., 1., 2], [2., 2., 6]])
    dot2 = tf.tensordot(dot1, w2, axes=(-1, 0))
    return dot2
inputs = np.array([[[1., 1., 1., 1.], [2., 2., 2., 2.], [1., 1., 1., 2.]],
                   [[1., 1., 1., 1.], [2., 2., 2., 2.], [1., 2., 1., 2.]]])
print("inputs_shape: {} \n".format(inputs.shape))
senet_result = senet(inputs)
sess = tf.InteractiveSession()
senet_result_sess = sess.run(senet_result)
print("senet_result_sess: \n", senet_result_sess)
"""
inputs_shape: (2, 3, 4)
senet_result_sess:
[[18.75 18.75 50. ]
[19.5 19.5 52. ]]
"""
2. BilinearInteraction function
import itertools
import numpy as np
import tensorflow as tf  # written against the TensorFlow 1.x graph/session API
def fibinet(inputs, bilinear_type):
    print("bilinear_type =", bilinear_type)
    if bilinear_type == "all":  # one weight matrix W shared by all field pairs
        W = np.array([[1., 1., 1., 1.], [2., 2., 1., 1.], [1., 1., 2., 2.], [3., 3., 1., 1.]])
        print("W:\n", W)
        p = [tf.multiply(tf.tensordot(v_i, W, axes=(-1, 0)), v_j) for v_i, v_j in itertools.combinations(inputs, 2)]
    elif bilinear_type == "each":  # the example has 3 field vectors, each with its own weight matrix W
        W_list = np.array([[[1., 1., 1., 1.], [2., 2., 1., 1.], [1., 1., 2., 2.], [3., 3., 1., 1.]],
                           [[1., 1., 1., 1.], [2., 2., 2., 2.], [1., 1., 2., 2.], [3., 3., 1., 1.]],
                           [[1., 1., 1., 1.], [2., 2., 1., 1.], [2., 2., 2., 2.], [3., 3., 1., 1.]]])
        print("W_list:\n", W_list)
        p = [tf.multiply(tf.tensordot(inputs[i], W_list[i], axes=(-1, 0)), inputs[j]) for i, j in itertools.combinations(range(len(inputs)), 2)]
    elif bilinear_type == "interaction":  # the example has 3 field vectors, so pairwise interaction yields 3 combinations
        W_list = np.array([[[1., 1., 1., 1.], [2., 2., 1., 1.], [1., 1., 2., 2.], [3., 3., 1., 1.]],
                           [[1., 1., 1., 1.], [2., 2., 1., 1.], [1., 1., 2., 2.], [3., 3., 1., 1.]],
                           [[1., 1., 1., 1.], [2., 2., 1., 1.], [1., 1., 2., 2.], [3., 3., 1., 1.]]])
        print("W_list:\n", W_list)
        p = [tf.multiply(tf.tensordot(v[0], w, axes=(-1, 0)), v[1]) for v, w in zip(itertools.combinations(inputs, 2), W_list)]
    else:
        raise ValueError("unknown bilinear_type: {}".format(bilinear_type))
    return p
(1) all mode
inputs = np.array([[1., 1., 1., 1.], [2., 2., 2., 2.], [1., 1., 1., 2.]])  # pairwise interaction yields 3 combinations
print("input: ", inputs)
print()
p = fibinet(inputs, bilinear_type="all")
print()
sess = tf.InteractiveSession()
p_sess = sess.run(p)
print(p_sess)
"""
input: [[1. 1. 1. 1.]
[2. 2. 2. 2.]
[1. 1. 1. 2.]]
bilinear_type = all
W:
[[1. 1. 1. 1.]
[2. 2. 1. 1.]
[1. 1. 2. 2.]
[3. 3. 1. 1.]]
[array([14., 14., 10., 10.]), array([ 7., 7., 5., 10.]), array([14., 14., 10., 20.])]
"""
(2) each mode
inputs = np.array([[1., 1., 1., 1.], [2., 2., 2., 2.], [1., 1., 1., 2.]])  # pairwise interaction yields 3 combinations
print("input: ", inputs)
print()
p = fibinet(inputs, bilinear_type="each")
print()
sess = tf.InteractiveSession()
p_sess = sess.run(p)
print(p_sess)
"""
input: [[1. 1. 1. 1.]
[2. 2. 2. 2.]
[1. 1. 1. 2.]]
bilinear_type = each
W_list:
[[[1. 1. 1. 1.]
[2. 2. 1. 1.]
[1. 1. 2. 2.]
[3. 3. 1. 1.]]
[[1. 1. 1. 1.]
[2. 2. 2. 2.]
[1. 1. 2. 2.]
[3. 3. 1. 1.]]
[[1. 1. 1. 1.]
[2. 2. 1. 1.]
[2. 2. 2. 2.]
[3. 3. 1. 1.]]]
[array([14., 14., 10., 10.]), array([ 7., 7., 5., 10.]), array([14., 14., 12., 24.])]
"""
(3) interaction mode
inputs = np.array([[1., 1., 1., 1.], [2., 2., 2., 2.], [1., 1., 1., 2.]])  # pairwise interaction yields 3 combinations
print("input: ", inputs)
print()
p = fibinet(inputs, bilinear_type="interaction")
print()
sess = tf.InteractiveSession()
p_sess = sess.run(p)
print(p_sess)
"""
input: [[1. 1. 1. 1.]
[2. 2. 2. 2.]
[1. 1. 1. 2.]]
bilinear_type = interaction
W_list:
[[[1. 1. 1. 1.]
[2. 2. 1. 1.]
[1. 1. 2. 2.]
[3. 3. 1. 1.]]
[[1. 1. 1. 1.]
[2. 2. 1. 1.]
[1. 1. 2. 2.]
[3. 3. 1. 1.]]
[[1. 1. 1. 1.]
[2. 2. 1. 1.]
[1. 1. 2. 2.]
[3. 3. 1. 1.]]]
[array([14., 14., 10., 10.]), array([ 7., 7., 5., 10.]), array([14., 14., 10., 20.])]
"""
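Part 3: Case Study
1. Prepare the data
The goal of the titanic dataset is to predict, from passenger information, whether a passenger survived after the Titanic struck an iceberg and sank. Structured data is usually preprocessed with a Pandas DataFrame.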
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
df_data = pd.read_csv('data/train.csv')
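The titanic dataset can be downloaded from: https://www.kaggle.com/c/titanic/data
Field description: the categorical columns Pclass, Sex, Cabin and Embarked and the numeric columns Age, SibSp, Parch and Fare serve as features; Survived is the label.
# Re-encode the categorical variables as integer indices
# Fill missing values of the numeric variables with 0
sparse_feature_list = ["Pclass", "Sex", "Cabin", "Embarked"]
dense_feature_list = ["Age", "SibSp", "Parch", "Fare"]
sparse_feature_reindex_dict = {}
for i in sparse_feature_list:
    cur_sparse_feature_list = df_data[i].unique()
    # Map every distinct value to an integer index starting at 1
    sparse_feature_reindex_dict[i] = dict(zip(cur_sparse_feature_list,
                                              range(1, len(cur_sparse_feature_list) + 1)))
    df_data[i] = df_data[i].map(sparse_feature_reindex_dict[i])
for j in dense_feature_list:
    df_data[j] = df_data[j].fillna(0)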
# Split the dataset
data = df_data[sparse_feature_list + dense_feature_list]
label = df_data["Survived"].values
xtrain, xtest, ytrain, ytest = train_test_split(data, label, test_size=0.2, random_state=2020)
xtrain_data = {"Pclass": np.array(xtrain["Pclass"]), \
"Sex": np.array(xtrain["Sex"]), \
"Cabin": np.array(xtrain["Cabin"]), \
"Embarked": np.array(xtrain["Embarked"]), \
"Age": np.array(xtrain["Age"]), \
"SibSp": np.array(xtrain["SibSp"]), \
"Parch": np.array(xtrain["Parch"]), \
"Fare": np.array(xtrain["Fare"])}
xtest_data = {"Pclass": np.array(xtest["Pclass"]), \
"Sex": np.array(xtest["Sex"]), \
"Cabin": np.array(xtest["Cabin"]), \
"Embarked": np.array(xtest["Embarked"]), \
"Age": np.array(xtest["Age"]), \
"SibSp": np.array(xtest["SibSp"]), \
"Parch": np.array(xtest["Parch"]), \
"Fare": np.array(xtest["Fare"])}
2. Build the model
(2.1) Load the Python modules
import itertools  # used by BilinearInteraction below
import tensorflow as tf
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.layers import Input, Embedding, \
    Dot, Flatten, Concatenate, Dense
from tensorflow.keras.models import Model
from tensorflow.python.keras.layers import Layer
from tensorflow.python.keras.initializers import Zeros, glorot_normal
from tensorflow.python.keras.optimizers import Adam
from tensorflow.python.keras.regularizers import l2
from deepctr.layers.core import PredictionLayer, DNN
from deepctr.layers.utils import Linear
# plot_model is taken from tf.keras so that it matches the tf.keras Model built below
from tensorflow.keras.utils import plot_model
(2.2) Define the input layer and Embedding layer for categorical variables
def input_embedding_layer(
        shape=1,
        name=None,
        vocabulary_size=1,
        embedding_dim=1):
    input_layer = Input(shape=[shape, ], name=name)
    embedding_layer = Embedding(vocabulary_size, embedding_dim)(input_layer)
    return input_layer, embedding_layer
(2.3) Define the SENETLayer and BilinearInteraction layers
class SENETLayer(Layer):
    """SENETLayer used in FiBiNET.
      Input shape
        - A list of 3D tensor with shape: ``(batch_size,1,embedding_size)``.
      Output shape
        - A list of 3D tensor with shape: ``(batch_size,1,embedding_size)``.
      Arguments
        - **reduction_ratio** : Positive integer, dimensionality of the
         attention network output space.
        - **seed** : A Python integer to use as random seed.
      References
        - [FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction](https://arxiv.org/pdf/1905.09433.pdf)
    """

    def __init__(self, reduction_ratio=3, seed=1024, **kwargs):
        self.reduction_ratio = reduction_ratio
        self.seed = seed
        super(SENETLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        if not isinstance(input_shape, list) or len(input_shape) < 2:
            raise ValueError('A `SENETLayer` layer should be called '
                             'on a list of at least 2 inputs')
        self.filed_size = len(input_shape)
        self.embedding_size = input_shape[0][-1]
        reduction_size = max(1, self.filed_size // self.reduction_ratio)
        self.W_1 = self.add_weight(shape=(
            self.filed_size, reduction_size), initializer=glorot_normal(seed=self.seed), name="W_1")
        self.W_2 = self.add_weight(shape=(
            reduction_size, self.filed_size), initializer=glorot_normal(seed=self.seed), name="W_2")
        self.tensordot = tf.keras.layers.Lambda(
            lambda x: tf.tensordot(x[0], x[1], axes=(-1, 0)))
        # Be sure to call this somewhere!
        super(SENETLayer, self).build(input_shape)

    def call(self, inputs, training=None, **kwargs):
        if K.ndim(inputs[0]) != 3:
            raise ValueError(
                "Unexpected inputs dimensions %d, expect to be 3 dimensions" % (K.ndim(inputs[0])))
        inputs = Concatenate(axis=1)(inputs)   # (batch_size, field_size, embedding_size)
        Z = tf.reduce_mean(inputs, axis=-1, )  # Squeeze
        A_1 = tf.nn.relu(self.tensordot([Z, self.W_1]))    # Excitation, reduction layer
        A_2 = tf.nn.relu(self.tensordot([A_1, self.W_2]))  # Excitation, restoration layer
        V = tf.multiply(inputs, tf.expand_dims(A_2, axis=2))  # Re-Weight
        return tf.split(V, self.filed_size, axis=1)

    def compute_output_shape(self, input_shape):
        return input_shape

    def compute_mask(self, inputs, mask=None):
        return [None] * self.filed_size

    def get_config(self, ):
        config = {'reduction_ratio': self.reduction_ratio, 'seed': self.seed}
        base_config = super(SENETLayer, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
class BilinearInteraction(Layer):
    """BilinearInteraction Layer used in FiBiNET.
      Input shape
        - A list of 3D tensor with shape: ``(batch_size,1,embedding_size)``.
      Output shape
        - 3D tensor with shape: ``(batch_size,1,filed_size*(filed_size-1)/2*embedding_size)``.
      Arguments
        - **bilinear_type** : String, types of bilinear functions used in this layer.
        - **seed** : A Python integer to use as random seed.
      References
        - [FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction](https://arxiv.org/pdf/1905.09433.pdf)
    """

    def __init__(self, bilinear_type="interaction", seed=1024, **kwargs):
        self.bilinear_type = bilinear_type
        self.seed = seed
        super(BilinearInteraction, self).__init__(**kwargs)

    def build(self, input_shape):
        if not isinstance(input_shape, list) or len(input_shape) < 2:
            raise ValueError('A `BilinearInteraction` layer should be called '
                             'on a list of at least 2 inputs')
        embedding_size = int(input_shape[0][-1])
        if self.bilinear_type == "all":
            self.W = self.add_weight(shape=(embedding_size, embedding_size), initializer=glorot_normal(
                seed=self.seed), name="bilinear_weight")
        elif self.bilinear_type == "each":
            # The last field never appears as the left operand, so one matrix fewer is needed
            self.W_list = [self.add_weight(shape=(embedding_size, embedding_size), initializer=glorot_normal(
                seed=self.seed), name="bilinear_weight" + str(i)) for i in range(len(input_shape) - 1)]
        elif self.bilinear_type == "interaction":
            self.W_list = [self.add_weight(shape=(embedding_size, embedding_size), initializer=glorot_normal(
                seed=self.seed), name="bilinear_weight" + str(i) + '_' + str(j)) for i, j in
                           itertools.combinations(range(len(input_shape)), 2)]
        else:
            raise NotImplementedError
        super(BilinearInteraction, self).build(
            input_shape)  # Be sure to call this somewhere!

    def call(self, inputs, **kwargs):
        if K.ndim(inputs[0]) != 3:
            raise ValueError(
                "Unexpected inputs dimensions %d, expect to be 3 dimensions" % (K.ndim(inputs[0])))
        if self.bilinear_type == "all":
            p = [tf.multiply(tf.tensordot(v_i, self.W, axes=(-1, 0)), v_j)
                 for v_i, v_j in itertools.combinations(inputs, 2)]
        elif self.bilinear_type == "each":
            p = [tf.multiply(tf.tensordot(inputs[i], self.W_list[i], axes=(-1, 0)), inputs[j])
                 for i, j in itertools.combinations(range(len(inputs)), 2)]
        elif self.bilinear_type == "interaction":
            p = [tf.multiply(tf.tensordot(v[0], w, axes=(-1, 0)), v[1])
                 for v, w in zip(itertools.combinations(inputs, 2), self.W_list)]
        else:
            raise NotImplementedError
        return Concatenate(axis=-1)(p)

    def compute_output_shape(self, input_shape):
        filed_size = len(input_shape)
        embedding_size = input_shape[0][-1]
        return (None, 1, filed_size * (filed_size - 1) // 2 * embedding_size)

    def get_config(self, ):
        config = {'bilinear_type': self.bilinear_type, 'seed': self.seed}
        base_config = super(BilinearInteraction, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
(2.4) Define the FiBiNET model structure
def fibinet(sparse_feature_list,
            sparse_feature_reindex_dict,
            dense_feature_list,
            bilinear_type='interaction',
            reduction_ratio=3,
            dnn_hidden_units=(128, 128),
            l2_reg_linear=1e-5,
            l2_reg_embedding=1e-5,
            l2_reg_dnn=0,
            init_std=0.0001,
            seed=1024,
            dnn_dropout=0.3,
            dnn_activation='relu',
            task='binary'):
    sparse_input_layer_list = []
    sparse_embedding_layer_list = []
    dense_input_layer_list = []
    # 1. Input & Embedding sparse features
    for i in sparse_feature_list:
        shape = 1
        name = i
        vocabulary_size = len(sparse_feature_reindex_dict[i]) + 1
        embedding_dim = 64
        cur_sparse_feature_input_layer, cur_sparse_feature_embedding_layer = \
            input_embedding_layer(
                shape=shape,
                name=name,
                vocabulary_size=vocabulary_size,
                embedding_dim=embedding_dim)
        sparse_input_layer_list.append(cur_sparse_feature_input_layer)
        sparse_embedding_layer_list.append(cur_sparse_feature_embedding_layer)
    # 2. Input dense features
    for j in dense_feature_list:
        dense_input_layer_list.append(Input(shape=(1, ), name=j))
    # === linear part ===
    sparse_linear_input = Concatenate(axis=-1)(sparse_embedding_layer_list)
    dense_linear_input = Concatenate(axis=-1)(dense_input_layer_list)
    linear_logit = Linear()([sparse_linear_input, dense_linear_input])
    # === fibinet part ===
    senet_embedding_list = SENETLayer(reduction_ratio, seed)(sparse_embedding_layer_list)
    senet_bilinear_out = BilinearInteraction(bilinear_type=bilinear_type, seed=seed)(senet_embedding_list)
    bilinear_out = BilinearInteraction(bilinear_type=bilinear_type, seed=seed)(sparse_embedding_layer_list)
    dnn_input = Concatenate(axis=-1)(
        [Flatten()(Concatenate(axis=-1)([senet_bilinear_out, bilinear_out])),
         dense_linear_input]
    )
    dnn_output = DNN(dnn_hidden_units, dnn_activation, l2_reg_dnn, dnn_dropout, False, seed)(dnn_input)
    dnn_logit = tf.keras.layers.Dense(1, use_bias=False, activation=None)(dnn_output)
    # === output ===
    out = PredictionLayer(task)(tf.keras.layers.add([linear_logit, dnn_logit]))
    fibinet_model = Model(inputs=sparse_input_layer_list + dense_input_layer_list, outputs=out)
    return fibinet_model
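As a sanity check on the shapes in the model function, the width of the DNN input can be worked out by hand. The helper below is a hypothetical illustration that only mirrors the concatenations in the code (two bilinear outputs plus the raw dense features); it is not part of the model.

```python
def dnn_input_width(num_sparse, num_dense, embedding_dim):
    """Width of the flattened DNN input built from the two bilinear outputs and the dense features."""
    pairs = num_sparse * (num_sparse - 1) // 2   # number of field pairs
    bilinear_width = pairs * embedding_dim       # width of one BilinearInteraction output
    return 2 * bilinear_width + num_dense        # SENET branch + original branch + dense inputs

# Titanic setup from this article: 4 sparse fields, 4 dense fields, embedding_dim = 64.
width = dnn_input_width(num_sparse=4, num_dense=4, embedding_dim=64)
print(width)  # 772
```

With 4 sparse fields there are 6 pairs, so each bilinear branch contributes 6 * 64 = 384 values.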
(2.5) Instantiate the FiBiNET model
fibinet_model = fibinet(sparse_feature_list, \
sparse_feature_reindex_dict, \
dense_feature_list)
(2.6) Print the FiBiNET model summary
print(fibinet_model.summary())
(2.7) Export the FiBiNET model structure diagram
plot_model(fibinet_model, to_file='fibinet_model.png')
(2.8) Compile and train the FiBiNET model
fibinet_model.compile(loss='binary_crossentropy', \
optimizer=Adam(lr=1e-3), \
metrics=['accuracy'])
history = fibinet_model.fit(xtrain_data, ytrain, epochs=5, batch_size=32, validation_data=(xtest_data, ytest))
(2.9) Plot the loss curves
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
Final note:
For practical tuning experience with this model, see the blog post: FiBiNET: paper reading + practical tuning experience
References:
[1] Huang, Tongwen, Zhiqi Zhang, and Junlin Zhang. "FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction." arXiv preprint arXiv:1905.09433 (2019).
[2] Cheng, Heng-Tze, et al. "Wide & deep learning for recommender systems." Proceedings of the 1st workshop on deep learning for recommender systems. ACM, 2016.
[3] Zhang, Weinan, Tianming Du, and Jun Wang. "Deep learning over multi-field categorical data." European conference on information retrieval. Springer, Cham, 2016.
[4] Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.