The FM model evolved from LR and POLY2. Compared with POLY2, its key difference is that the single weight coefficient w_{h(j1, j2)} of a feature cross is replaced by the inner product of two latent vectors, (w_{j1} · w_{j2}). Specifically, FM learns a latent vector for every feature; when crossing two features, the inner product of their latent vectors is used as the weight of the cross feature. For more detail, see "FM算法解析" (reference 2 below).
The second-order FM model can be written as:
y_FM(x) = w_0 + Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j
where:
- w_0 is the global bias and the w_i are the first-order (linear) weights;
- v_i ∈ R^k is the latent vector of feature i, and ⟨v_i, v_j⟩ = Σ_{f=1}^{k} v_{i,f} v_{j,f} weights the cross term x_i x_j.
Essentially, FM's latent vectors play the same role as the latent vectors that matrix factorization uses to represent users and items. FM extends the matrix-factorization idea from just a user embedding and an item embedding to every feature.
The latent vectors also let FM handle sparse data much better. For example, in a product-recommendation scenario with two features, channel and brand, suppose one training sample has the feature combination (ESPN, Adidas). In POLY2, the weight of this cross feature can only be learned when ESPN and Adidas co-occur in the same training sample. In FM, ESPN's latent vector is also updated by samples such as (ESPN, Nike), and Adidas's latent vector by samples such as (NBC, Adidas), which greatly reduces the model's dependence on dense data. Moreover, for a combination never seen in training, such as (NBC, Nike), the model has already learned latent vectors for NBC and Nike separately, so it can still compute a weight for that combination; generalization improves substantially.
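As a toy sketch of this generalization (the latent vectors below are made-up numbers for illustration, not learned weights):

```python
import numpy as np

# Hypothetical learned latent vectors (dimension k = 4) for each feature value.
latent = {
    "ESPN":   np.array([0.2, 0.1, 0.5, 0.3]),
    "NBC":    np.array([0.4, 0.2, 0.1, 0.6]),
    "Nike":   np.array([0.3, 0.7, 0.2, 0.1]),
    "Adidas": np.array([0.5, 0.1, 0.4, 0.2]),
}

def cross_weight(f1, f2):
    """FM weight of a feature pair = inner product of the two latent vectors."""
    return float(np.dot(latent[f1], latent[f2]))

# (NBC, Nike) never co-occurred in training, but FM can still score the pair,
# because NBC's and Nike's vectors were each updated by other samples.
print(cross_weight("NBC", "Nike"))
```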
Code plan:
1. First compute the cross term the naive way, i.e. sum the inner product over every pair of latent vectors: Σ_{i=1}^{n} Σ_{j=i+1}^{n} ⟨v_i, v_j⟩.
# 1 sample, 3 feature latent vectors, embedding dimension 4
a = [
[[0., 1., 6., 11.],
[1., 2., 3., 4.],
[4., 5., 6., 1.]]
]
from itertools import combinations
import tensorflow as tf
from tensorflow.python.keras.layers import Dot
def raw_fm_cross_layer(embedding_list):
    # sum the inner products over all pairs of latent vectors
    dot_list = []
    for i, j in combinations(embedding_list, 2):
        i = tf.convert_to_tensor([i])
        j = tf.convert_to_tensor([j])
        cur_dot_value = Dot(axes=1)([i, j])
        dot_list.append(cur_dot_value)
    return sum(dot_list)
# Compute and inspect the result
embedding_list = a[0]
raw_fm_result = raw_fm_cross_layer(embedding_list)
sess = tf.InteractiveSession()
raw_fm_result = sess.run(raw_fm_result)
print(raw_fm_result)
# [[152.]]
2. Compute the cross term with the latent-vector reformulation, i.e. 0.5 * Σ_{f=1}^{k} [ (Σ_{i=1}^{n} v_{i,f})² − Σ_{i=1}^{n} v_{i,f}² ].
from tensorflow.python.keras import backend as K
import tensorflow as tf
def fm_cross_layer(embedding_list):
    square_of_sum = tf.square(tf.reduce_sum(
        embedding_list, axis=1, keepdims=True))
    sum_of_square = tf.reduce_sum(
        embedding_list * embedding_list, axis=1, keepdims=True)
    cross_term = square_of_sum - sum_of_square
    cross_term = 0.5 * tf.reduce_sum(cross_term, axis=2, keepdims=False)
    return cross_term
# Compute and inspect the result
embedding_list = tf.convert_to_tensor(a)
fm_result = fm_cross_layer(embedding_list)
sess = tf.InteractiveSession()
fm_result = sess.run(fm_result)
print(fm_result)
# [[152.]]
As we can see, method 1 and method 2 give the same result.
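The agreement is no accident: method 2 is the standard FM reformulation of method 1, which also drops the cost from O(kn²) to O(kn). A short derivation, per embedding dimension f:

```latex
\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle
  = \frac{1}{2}\sum_{f=1}^{k}\left[\Big(\sum_{i=1}^{n} v_{i,f}\Big)^{2}
    - \sum_{i=1}^{n} v_{i,f}^{2}\right]
```

which follows from the scalar identity (Σ_i a_i)² = Σ_i a_i² + 2 Σ_{i<j} a_i a_j applied to each dimension f; this is exactly what fm_cross_layer computes.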
Next, let's apply the FM algorithm to a real example.
I. Prepare the data
The goal of the Titanic dataset is to predict, from passenger information, whether a passenger survived after the Titanic struck an iceberg and sank.
Structured data is usually preprocessed with a pandas DataFrame.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
df_data = pd.read_csv('data/train.csv')
Titanic dataset download: https://www.kaggle.com/c/titanic/data
Field descriptions:
- Survived: 0 = died, 1 = survived [label y]
- Pclass: ticket class, one of 1, 2, 3 [categorical]
- Name: passenger name [dropped]
- Sex: passenger sex [categorical]
- Age: passenger age (has missing values) [numeric]
- SibSp: number of siblings/spouses aboard (integer) [numeric]
- Parch: number of parents/children aboard (integer) [numeric]
- Ticket: ticket number (string) [dropped]
- Fare: ticket fare (float, roughly 0-500) [numeric]
- Cabin: cabin number (has missing values) [categorical]
- Embarked: port of embarkation: S, C, Q (has missing values) [categorical]
# Re-encode the categorical variables
# Fill missing values of the numeric variables with 0
sparse_feature_list = ["Pclass", "Sex", "Cabin", "Embarked"]
dense_feature_list = ["Age", "SibSp", "Parch", "Fare"]
sparse_feature_reindex_dict = {}
for i in sparse_feature_list:
    cur_sparse_feature_list = df_data[i].unique()
    sparse_feature_reindex_dict[i] = dict(
        zip(cur_sparse_feature_list, range(1, len(cur_sparse_feature_list) + 1)))
    df_data[i] = df_data[i].map(sparse_feature_reindex_dict[i])
for j in dense_feature_list:
    df_data[j] = df_data[j].fillna(0)
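To make the re-encoding scheme concrete, here is a toy run on a made-up column (values are illustrative only, not the real Titanic data):

```python
import pandas as pd

# A toy column standing in for one sparse feature, e.g. Embarked.
col = pd.Series(["S", "C", "S", "Q", "C"])

# Same scheme as above: map each distinct value to 1..n in order of first
# appearance; index 0 stays free, matching vocabulary_size = n + 1 later.
values = col.unique()
reindex = dict(zip(values, range(1, len(values) + 1)))
encoded = col.map(reindex)
print(reindex)           # -> {'S': 1, 'C': 2, 'Q': 3}
print(encoded.tolist())  # -> [1, 2, 1, 3, 2]
```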
# Split the dataset
data = df_data[sparse_feature_list + dense_feature_list]
label = df_data["Survived"].values
xtrain, xtest, ytrain, ytest = train_test_split(data, label, test_size=0.2, random_state=2020)
xtrain_data = {"Pclass": np.array(xtrain["Pclass"]), \
"Sex": np.array(xtrain["Sex"]), \
"Cabin": np.array(xtrain["Cabin"]), \
"Embarked": np.array(xtrain["Embarked"]), \
"Age": np.array(xtrain["Age"]), \
"SibSp": np.array(xtrain["SibSp"]), \
"Parch": np.array(xtrain["Parch"]), \
"Fare": np.array(xtrain["Fare"])}
xtest_data = {"Pclass": np.array(xtest["Pclass"]), \
"Sex": np.array(xtest["Sex"]), \
"Cabin": np.array(xtest["Cabin"]), \
"Embarked": np.array(xtest["Embarked"]), \
"Age": np.array(xtest["Age"]), \
"SibSp": np.array(xtest["SibSp"]), \
"Parch": np.array(xtest["Parch"]), \
"Fare": np.array(xtest["Fare"])}
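The two dictionaries above can be built more compactly with a dict comprehension; a sketch using a hypothetical tiny frame standing in for xtrain (in the real code the column list is sparse_feature_list + dense_feature_list):

```python
import numpy as np
import pandas as pd

# Hypothetical tiny frame standing in for xtrain.
feature_list = ["Pclass", "Sex", "Age"]
xtrain = pd.DataFrame({"Pclass": [1, 3], "Sex": [0, 1], "Age": [22.0, 38.0]})

# One comprehension replaces the hand-written dict of np.array(...) entries.
xtrain_data = {name: np.array(xtrain[name]) for name in feature_list}
print(sorted(xtrain_data))  # -> ['Age', 'Pclass', 'Sex']
```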
II. Build the model
(1) Load the Python modules
import tensorflow as tf
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.layers import Input, Embedding, \
Dot, Flatten, Concatenate, Dense
from tensorflow.keras.models import Model
from tensorflow.python.keras.layers import Layer
from tensorflow.python.keras.initializers import Zeros
from tensorflow.python.keras.optimizers import Adam
from tensorflow.keras.utils import plot_model
(2) Define the input and embedding layers for the categorical variables
def input_embedding_layer(shape=1,
                          name=None,
                          vocabulary_size=1,
                          embedding_dim=1):
    input_layer = Input(shape=[shape, ], name=name)
    embedding_layer = Embedding(vocabulary_size, embedding_dim)(input_layer)
    return input_layer, embedding_layer
(3) Define the linear layer, the FM second-order cross layer, and the prediction layer
class Linear(Layer):
    def __init__(self, l2_reg=0.0, mode=2, use_bias=True, **kwargs):
        self.l2_reg = l2_reg
        if mode not in [0, 1, 2]:
            raise ValueError("mode must be 0, 1 or 2")
        self.mode = mode
        self.use_bias = use_bias
        super(Linear, self).__init__(**kwargs)

    def build(self, input_shape):
        if self.use_bias:
            self.bias = self.add_weight(name='linear_bias',
                                        shape=(1,),
                                        initializer=tf.keras.initializers.Zeros(),
                                        trainable=True)
        if self.mode == 1:
            self.kernel = self.add_weight(
                'linear_kernel',
                shape=[int(input_shape[-1]), 1],
                initializer=tf.keras.initializers.glorot_normal(),
                regularizer=tf.keras.regularizers.l2(self.l2_reg),
                trainable=True)
        elif self.mode == 2:
            self.kernel = self.add_weight(
                'linear_kernel',
                shape=[int(input_shape[1][-1]), 1],
                initializer=tf.keras.initializers.glorot_normal(),
                regularizer=tf.keras.regularizers.l2(self.l2_reg),
                trainable=True)
        super(Linear, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, inputs, **kwargs):
        if self.mode == 0:
            sparse_input = inputs
            linear_logit = tf.reduce_sum(sparse_input, axis=-1, keep_dims=True)
        elif self.mode == 1:
            dense_input = inputs
            fc = tf.tensordot(dense_input, self.kernel, axes=(-1, 0))
            linear_logit = fc
        else:
            sparse_input, dense_input = inputs
            fc = tf.tensordot(dense_input, self.kernel, axes=(-1, 0))
            linear_logit = tf.reduce_sum(sparse_input, axis=-1, keep_dims=False) + fc
        if self.use_bias:
            linear_logit += self.bias
        return linear_logit

    def compute_output_shape(self, input_shape):
        return (None, 1)

    def compute_mask(self, inputs, mask):
        return None

    def get_config(self, ):
        config = {'mode': self.mode, 'l2_reg': self.l2_reg, 'use_bias': self.use_bias}
        base_config = super(Linear, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
class FM(Layer):
    """Factorization Machine models pairwise (order-2) feature interactions
    without linear term and bias.
      Input shape
        - 3D tensor with shape: ``(batch_size, field_size, embedding_size)``.
      Output shape
        - 2D tensor with shape: ``(batch_size, 1)``.
      References
        - [Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf)
    """
    def __init__(self, **kwargs):
        super(FM, self).__init__(**kwargs)

    def build(self, input_shape):
        if len(input_shape) != 3:
            raise ValueError("Unexpected inputs dimensions %d, "
                             "expect to be 3 dimensions" % (len(input_shape)))
        super(FM, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, inputs, **kwargs):
        if K.ndim(inputs) != 3:
            raise ValueError(
                "Unexpected inputs dimensions %d, expect to be 3 dimensions"
                % (K.ndim(inputs)))
        concated_embeds_value = inputs
        square_of_sum = tf.square(tf.reduce_sum(
            concated_embeds_value, axis=1, keep_dims=True))
        sum_of_square = tf.reduce_sum(
            concated_embeds_value * concated_embeds_value, axis=1, keep_dims=True)
        cross_term = square_of_sum - sum_of_square
        cross_term = 0.5 * tf.reduce_sum(cross_term, axis=2, keep_dims=False)
        return cross_term

    def compute_output_shape(self, input_shape):
        return (None, 1)
class PredictionLayer(Layer):
    """
      Arguments
        - **task**: str, ``"binary"`` for binary logloss or ``"regression"`` for regression loss
        - **use_bias**: bool. Whether to add a bias term or not.
    """
    def __init__(self, task='binary', use_bias=True, **kwargs):
        if task not in ["binary", "multiclass", "regression"]:
            raise ValueError("task must be binary, multiclass or regression")
        self.task = task
        self.use_bias = use_bias
        super(PredictionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        if self.use_bias:
            self.global_bias = self.add_weight(
                shape=(1,), initializer=Zeros(), name="global_bias")
        super(PredictionLayer, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, inputs, **kwargs):
        x = inputs
        if self.use_bias:
            x = tf.nn.bias_add(x, self.global_bias, data_format='NHWC')
        if self.task == "binary":
            x = tf.sigmoid(x)
        output = tf.reshape(x, (-1, 1))
        return output

    def compute_output_shape(self, input_shape):
        return (None, 1)

    def get_config(self, ):
        config = {'task': self.task, 'use_bias': self.use_bias}
        base_config = super(PredictionLayer, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
(4) Define the FM model structure
def fm(sparse_feature_list,
       sparse_feature_reindex_dict,
       dense_feature_list,
       task='binary'):
    sparse_input_layer_list = []
    sparse_embedding_layer_list = []
    dense_input_layer_list = []

    # 1. Input & Embedding layers for sparse features
    for i in sparse_feature_list:
        cur_input_layer, cur_embedding_layer = input_embedding_layer(
            shape=1,
            name=i,
            vocabulary_size=len(sparse_feature_reindex_dict[i]) + 1,
            embedding_dim=64)
        sparse_input_layer_list.append(cur_input_layer)
        sparse_embedding_layer_list.append(cur_embedding_layer)

    # 2. Input layers for dense features
    for j in dense_feature_list:
        dense_input_layer_list.append(Input(shape=(1, ), name=j))

    # === linear part ===
    sparse_linear_input = Concatenate(axis=-1)(sparse_embedding_layer_list)
    dense_linear_input = Concatenate(axis=-1)(dense_input_layer_list)
    linear_logit = Linear()([sparse_linear_input, dense_linear_input])

    # === fm second-order cross part ===
    sparse_embedding_concat = Concatenate(axis=1)(sparse_embedding_layer_list)
    fm_cross_logit = FM()(sparse_embedding_concat)

    # === prediction ===
    out = PredictionLayer(task)(tf.keras.layers.add([linear_logit, fm_cross_logit]))
    fm_model = Model(inputs=sparse_input_layer_list + dense_input_layer_list, outputs=out)
    return fm_model
(5) Instantiate the FM model
fm_model = fm(sparse_feature_list, \
sparse_feature_reindex_dict, \
dense_feature_list)
(6) Print the model summary
print(fm_model.summary())
(7) Plot the model structure
plot_model(fm_model, to_file='fm_model.png')
(8) Compile and train the model
fm_model.compile(loss='binary_crossentropy', \
optimizer=Adam(lr=1e-3), \
metrics=['accuracy'])
history = fm_model.fit(xtrain_data, ytrain, epochs=5, batch_size=32, validation_data=(xtest_data, ytest))
(9) Plot the loss curves
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
References:
1. Deep Learning Recommender System (深度学习推荐系统), by Wang Zhe. JD purchase link: https://u.jd.com/j7l2xP
2. FM算法解析: https://zhuanlan.zhihu.com/p/37963267