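(1) Discrete (categorical) features become extremely sparse once encoded, which makes them ill-suited to being fed directly into a neural network for training. How can sparse feature vectors be made dense?
(2) How can features be crossed and combined automatically?
(3) How can the output layer achieve the optimization objective set by the problem?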
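Series guide
- Deep learning recommender systems: a brief introduction to Deep Crossing, with code
- Deep learning recommender systems: Wide & Deep, introduction and code
- Deep learning recommender systems: DeepFM, introduction and code
- NFM (2017): structure and principles in brief (with code)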
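Introduction
The Deep Crossing model covered in this post was proposed by Microsoft Research in the paper "Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features". It was used for sponsored-search ad recommendation in the Bing search engine, mainly to solve automatic feature combination at large scale and so reduce or avoid the cost of crossing features by hand. Deep Crossing is arguably the most typical and foundational deep learning CTR model.
Deep Crossing rounded out the practical workflow for applying deep learning to recommendation. It offers a complete solution that covers feature engineering, densifying sparse vectors, and fitting the optimization objective with a multi-layer neural network, and it laid a solid foundation for later research.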
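Model overview
Deep Crossing is a classic DNN-structured model built on ResNet. Its structure is shown in Figure 3-6.
As the figure shows, the network consists of four layers: the Embedding layer, the Stacking layer, the Multiple Residual Units layer, and the Scoring layer.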
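Embedding layer:
The Embedding layer converts sparse categorical features into dense Embedding vectors. As Figure 3-6 shows, each feature (for example Feature#1, which here denotes a sparse feature vector after one-hot encoding) is converted by the Embedding layer into its corresponding Embedding vector (for example Embedding#1).
In general, the dimension of an Embedding vector should be far smaller than that of the original sparse feature vector; a few dozen to a few hundred dimensions is usually enough. One additional note: Feature#2 in Figure 3-6 actually represents a numeric feature. As the figure shows, numeric features do not pass through the Embedding layer and go directly into the Stacking layer.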
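To make the densification step concrete, here is a minimal NumPy sketch (not the post's Keras code) showing that multiplying a one-hot vector by an embedding table is the same as looking up one row of that table. The vocabulary size, embedding width, and category id are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
vocab_size, embed_dim = 6, 4

# Embedding table: one row of weights per category id.
embedding_table = rng.normal(size=(vocab_size, embed_dim))

# A sparse one-hot vector for category id 3.
category_id = 3
one_hot = np.zeros(vocab_size)
one_hot[category_id] = 1.0

# Multiplying the one-hot vector by the table ...
dense_via_matmul = one_hot @ embedding_table
# ... selects the same row that a direct index lookup returns.
dense_via_lookup = embedding_table[category_id]

print(dense_via_matmul.shape)                           # (4,)
print(np.allclose(dense_via_matmul, dense_via_lookup))  # True
```

This is why an embedding layer can take integer ids as input instead of materializing the one-hot vectors.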
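Stacking layer:
The Stacking layer does something simple: it concatenates the different Embedding features and the numeric features into a single new feature vector that contains all features. It is also commonly called the concatenate layer.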
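The concatenation can be sketched in NumPy as follows. The sizes are assumptions: 13 numeric and 26 categorical features (the standard Criteo layout) with 4-dimensional embeddings, which gives the 117-wide vector reported in the model summary at the end of this post. The random arrays stand in for real feature values.

```python
import numpy as np

batch_size = 2
# Assumed sizes: 13 numeric features, 26 categorical features, embedding width 4.
n_dense, n_sparse, embed_dim = 13, 26, 4

rng = np.random.default_rng(0)
# Numeric features go straight into the stacking step.
numeric_block = rng.random((batch_size, n_dense))
# One (batch, embed_dim) array per categorical feature, as produced by the embedding step.
embedding_blocks = [rng.normal(size=(batch_size, embed_dim)) for _ in range(n_sparse)]

# Stacking: concatenate everything along the feature axis.
stacked = np.concatenate([numeric_block] + embedding_blocks, axis=1)
print(stacked.shape)  # (2, 117)
```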
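Multiple Residual Units layer:
The main structure of this layer is a multi-layer perceptron. Instead of a standard neural network built from perceptron units, Deep Crossing uses a multi-layer residual network (Multi-Layer Residual Network) as the concrete implementation of the MLP. The best-known residual network is the 152-layer network that Microsoft researcher Kaiming He proposed in the ImageNet competition [3]. Its use in a recommendation model was also the first time residual networks were successfully extended beyond image recognition. The multi-layer residual network crosses and combines all dimensions of the feature vector thoroughly, so the model can capture more non-linear and combinatorial feature information, which makes the deep learning model considerably more expressive than traditional machine learning models.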
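A single residual unit can be sketched in NumPy as two dense layers whose output is added back to the input before a final ReLU. This is an illustrative variant under stated assumptions: it applies no activation to the second layer before the addition, whereas the Keras ResidualBlock later in this post also applies ReLU there. The widths (117 inputs, 64 hidden units) mirror the script; the weights are random placeholders.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(x, w1, b1, w2, b2):
    """Two dense layers plus a shortcut connection from input to output."""
    hidden = relu(x @ w1 + b1)       # first layer projects to the hidden width
    transformed = hidden @ w2 + b2   # second layer projects back to the input width
    return relu(transformed + x)     # add the input back, then apply ReLU

input_dim, hidden_units = 117, 64
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(input_dim, hidden_units))
b1 = np.zeros(hidden_units)
w2 = rng.normal(scale=0.1, size=(hidden_units, input_dim))
b2 = np.zeros(input_dim)

x = rng.normal(size=(2, input_dim))
out = residual_unit(x, w1, b1, w2, b2)

# Parameter count: 117*64 + 64 + 64*117 + 117
n_params = w1.size + b1.size + w2.size + b2.size
print(out.shape, n_params)  # (2, 117) 15157
```

The count of 15,157 parameters agrees with the per-block figure in the model summary, since both variants use the same two weight matrices and biases. Because the second layer must project back to the input width, the shortcut addition is always shape-compatible.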
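Scoring layer:
The Scoring layer is the output layer; it exists to fit the optimization objective. For binary classification problems such as CTR prediction, the Scoring layer usually uses a logistic regression model, while for multi-class problems such as image classification it usually uses a softmax model.
That is the structure of the Deep Crossing model. Training it with gradient backpropagation yields the final Deep Crossing-based CTR prediction model.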
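The two output choices mentioned above can be sketched in NumPy: a sigmoid with binary cross-entropy for click prediction, and a softmax for the multi-class case. The scores and labels are made-up values for illustration; this is not the post's Keras code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    # Clip to avoid log(0); average the log loss over the batch.
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

def softmax(z):
    # Subtract the row maximum for numerical stability.
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Binary case (CTR): one raw score per sample, squashed to a click probability.
scores = np.array([-2.0, 0.0, 3.0])   # made-up pre-activation outputs
labels = np.array([0.0, 1.0, 1.0])    # made-up click labels
click_prob = sigmoid(scores)
loss = binary_cross_entropy(labels, click_prob)

# Multi-class case: one score per class, normalized to a distribution.
class_probs = softmax(np.array([[1.0, 2.0, 3.0]]))

print(click_prob[1])      # 0.5, since sigmoid(0) = 0.5
print(class_probs.sum())  # sums to 1 (up to floating-point rounding)
```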
Code implementation:
# coding=utf-8
# Author:Jo Choi
# Date:2021-03-16
# Email:cai_oo@sina.com.cn
# Blog: *
'''
Dataset: criteo_sample
Total model parameters (as reported by model.summary()): 54,805
------------------------------
Results:
----------------------------
binary_crossentropy: 0.4814 - auc: 0.7158 - val_loss: 0.6446 - val_binary_crossentropy: 0.6446 - val_auc: 0.6695
----------------------------
'''
import itertools
import pandas as pd
import numpy as np
from tqdm import tqdm
from collections import namedtuple
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler,LabelEncoder
from utils import SparseFeat, DenseFeat, VarLenSparseFeat
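def data_process(data_df, dense_features, sparse_features):
    """
    Preprocess the data: fill missing values, transform numeric features, encode categorical features.
    Note: data_df is modified in place.
    :param data_df: the data as a pandas DataFrame
    :param dense_features: list of numeric feature names
    :param sparse_features: list of categorical feature names
    """
    # Fill missing numeric values with 0.0
    data_df[dense_features] = data_df[dense_features].fillna(0.0)
    for f in dense_features:
        # log(x + 1) compresses the long tail; values <= -1 are mapped to -1
        data_df[f] = data_df[f].apply(lambda x: np.log(x + 1) if x > -1 else -1)
    # Fill missing categorical values with the string "-1"
    data_df[sparse_features] = data_df[sparse_features].fillna("-1")
    for f in sparse_features:
        # Label-encode each category as an integer id
        lbe = LabelEncoder()
        data_df[f] = lbe.fit_transform(data_df[f])
    # Return a copy containing only the feature columns
    return data_df[dense_features + sparse_features].copy()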
def build_input_layers(feature_columns):
    """
    Build the input layers.
    :param feature_columns: feature markers for every feature in the dataset
    """
    # Build the Input layers and return them as two dicts, one for dense and one for sparse features
    dense_input_dict, sparse_input_dict = {}, {}
    for fc in feature_columns:
        if isinstance(fc, SparseFeat):
            sparse_input_dict[fc.name] = Input(shape = (1,), name = fc.name)
        elif isinstance(fc, DenseFeat):
            dense_input_dict[fc.name] = Input(shape = (fc.dimension, ), name = fc.name)
    return dense_input_dict, sparse_input_dict
def build_embedding_layers(feature_columns, input_layers_dict, is_linear):
    # Dict that maps each sparse feature name to its Embedding layer
    embedding_layers_dict = dict()
    # Select the sparse features
    sparse_feature_columns = list(filter(lambda x: isinstance(x, SparseFeat), feature_columns)) if feature_columns else []
    # Embeddings for a linear part have dimension 1; otherwise use the configured embedding dimension
    if is_linear:
        for fc in sparse_feature_columns:
            embedding_layers_dict[fc.name] = Embedding(fc.vocabulary_size + 1, 1, name = '1d_emb_' + fc.name)
    else:
        for fc in sparse_feature_columns:
            embedding_layers_dict[fc.name] = Embedding(fc.vocabulary_size + 1, fc.embedding_dim, name = 'kd_emb_' + fc.name)
    return embedding_layers_dict
# Collect the embeddings of all sparse features into a list
def concat_embedding_list(feature_columns, input_layer_dict, embedding_layer_dict, flatten = False):
    # Select the sparse features
    sparse_feature_columns = list(filter(lambda x: isinstance(x, SparseFeat), feature_columns))
    embedding_list = []
    for fc in sparse_feature_columns:
        # Input layer for this feature, shape B x 1
        _input = input_layer_dict[fc.name]
        # Embedding layer for this feature
        _embed = embedding_layer_dict[fc.name]
        # Apply the embedding layer to the input, shape B x 1 x dim
        embed = _embed(_input)
        # Flatten to B x dim when the embeddings feed directly into Dense layers; otherwise leave as is
        if flatten:
            embed = Flatten()(embed)
        embedding_list.append(embed)
    return embedding_list
# Definition of the DNN residual block
class ResidualBlock(Layer):
    '''
    :units number of neurons in the block's hidden layer
    '''
    def __init__(self, units):
        super(ResidualBlock, self).__init__()
        self.units = units
    def build(self, input_shape):
        out_dim = input_shape[-1]
        self.dnn1 = Dense(self.units, activation = 'relu')
        # The output width must equal the input width for the residual connection to work
        self.dnn2 = Dense(out_dim, activation = 'relu')
    def call(self, inputs):
        x = inputs
        x = self.dnn1(x)
        x = self.dnn2(x)
        # Residual connection: add the input back, then apply ReLU
        x = Activation('relu')(x + inputs)
        return x
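def get_dnn_logits(dnn_inputs, block_nums = 3):
    '''
    :block_nums number of DNN residual blocks
    '''
    dnn_out = dnn_inputs
    for i in range(block_nums):
        dnn_out = ResidualBlock(64)(dnn_out)
    # Map the DNN output to a click probability. Despite the function name,
    # the sigmoid means this returns probabilities rather than raw logits.
    dnn_logits = Dense(1, activation = 'sigmoid')(dnn_out)
    return dnn_logits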
def DeepCrossing(dnn_feature_columns):
    # Build the Input() layer for every feature; they are returned as dicts to simplify model construction
    dense_input_dict, sparse_input_dict = build_input_layers(dnn_feature_columns)
    # The model's inputs cannot be a dict here, so convert the dicts to a list
    # Note: the actual input data is matched to the Input() layers by the keys of the dict
    # passed at fit time, which must equal the names of the Input layers
    input_layers = list(dense_input_dict.values()) + list(sparse_input_dict.values())
    # Build the k-dimensional embedding layers, returned as a dict for later use
    embedding_layer_dict = build_embedding_layers(dnn_feature_columns, sparse_input_dict, is_linear = False)
    # Concatenate all dense features
    dense_dnn_list = list(dense_input_dict.values())
    # B x n (n is the number of numeric features)
    dense_dnn_inputs = Concatenate(axis = 1)(dense_dnn_list)
    # Flatten is required before concatenating with the dense features; without it each embedding output is B x 1 x dim
    sparse_dnn_list = concat_embedding_list(dnn_feature_columns, sparse_input_dict, embedding_layer_dict, flatten = True)
    # B x (m * dim) (m is the number of categorical features, dim is the embedding dimension)
    sparse_dnn_inputs = Concatenate(axis = 1)(sparse_dnn_list)
    # Concatenate the dense features with the sparse embeddings
    dnn_inputs = Concatenate(axis = 1)([dense_dnn_inputs, sparse_dnn_inputs])  # B x (n + m*dim)
    # Feed into the DNN; the number of residual blocks must be chosen up front
    output_layer = get_dnn_logits(dnn_inputs, block_nums = 3)
    model = Model(input_layers, output_layer)
    return model
if __name__ == "__main__":
    # Read the data
    data = pd.read_csv('./data/criteo_sample.txt')
    # Split the columns into dense and sparse features
    columns = data.columns.values
    dense_features = [feat for feat in columns if 'I' in feat]
    sparse_features = [feat for feat in columns if 'C' in feat]
    # Simple preprocessing
    train_data = data_process(data, dense_features, sparse_features)
    train_data['label'] = data['label']
    # Describe each feature with a feature-column marker
    dnn_feature_columns = [SparseFeat(feat, vocabulary_size = data[feat].nunique(), embedding_dim = 4)
                           for feat in sparse_features] + [DenseFeat(feat, 1,)
                           for feat in dense_features]
    # Build the DeepCrossing model
    model = DeepCrossing(dnn_feature_columns)
    model.summary()
    model.compile(optimizer = "adam",
                  loss = "binary_crossentropy",
                  metrics = ["binary_crossentropy", tf.keras.metrics.AUC(name = 'auc')])
    # Feed the inputs as a dict keyed by feature name
    train_model_input = {name: train_data[name] for name in dense_features + sparse_features}
    # Train the model; fit() returns the training history
    history = model.fit(train_model_input, train_data['label'].values,
                        batch_size = 64, epochs = 5, validation_split = 0.2, )
Model: "functional_3"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
C1 (InputLayer) [(None, 1)] 0
__________________________________________________________________________________________________
****
__________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, 117) 0 concatenate_3[0][0]
concatenate_4[0][0]
__________________________________________________________________________________________________
residual_block_3 (ResidualBlock (None, 117) 15157 concatenate_5[0][0]
__________________________________________________________________________________________________
residual_block_4 (ResidualBlock (None, 117) 15157 residual_block_3[0][0]
__________________________________________________________________________________________________
residual_block_5 (ResidualBlock (None, 117) 15157 residual_block_4[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 1) 118 residual_block_5[0][0]
==================================================================================================
Total params: 54,805
Trainable params: 54,805
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/5
3/3 [==============================] - ETA: 0s - loss: 0.6640 - binary_crossentropy: 0.6640 - auc: 0.495 - 3s 1s/step - loss: 0.6020 - binary_crossentropy: 0.6020 - auc: 0.5398 - val_loss: 0.7823 - val_binary_crossentropy: 0.7823 - val_auc: 0.6083
Epoch 2/5
3/3 [==============================] - ETA: 0s - loss: 0.6958 - binary_crossentropy: 0.6958 - auc: 0.519 - 0s 57ms/step - loss: 0.5757 - binary_crossentropy: 0.5757 - auc: 0.5316 - val_loss: 0.6354 - val_binary_crossentropy: 0.6354 - val_auc: 0.5712
Epoch 3/5
3/3 [==============================] - ETA: 0s - loss: 0.6011 - binary_crossentropy: 0.6011 - auc: 0.591 - 0s 63ms/step - loss: 0.5254 - binary_crossentropy: 0.5254 - auc: 0.6249 - val_loss: 0.6661 - val_binary_crossentropy: 0.6661 - val_auc: 0.6268
Epoch 4/5
3/3 [==============================] - ETA: 0s - loss: 0.6564 - binary_crossentropy: 0.6564 - auc: 0.603 - 0s 95ms/step - loss: 0.4947 - binary_crossentropy: 0.4947 - auc: 0.6799 - val_loss: 0.6742 - val_binary_crossentropy: 0.6742 - val_auc: 0.6695
Epoch 5/5
3/3 [==============================] - ETA: 0s - loss: 0.4772 - binary_crossentropy: 0.4772 - auc: 0.688 - 0s 61ms/step - loss: 0.4814 - binary_crossentropy: 0.4814 - auc: 0.7158 - val_loss: 0.6446 - val_binary_crossentropy: 0.6446 - val_auc: 0.6695
4/4 [==============================] - ETA: 0s - loss: 0.3814 - binary_crossentropy: 0.3814 - auc: 0.786 - 0s 3ms/step - loss: 0.4994 - binary_crossentropy: 0.4994 - auc: 0.7175
test AUC: 0.499385
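The script above imports SparseFeat, DenseFeat and VarLenSparseFeat from a local utils module that this post does not show. The following is a minimal sketch of what such a module might contain, assuming the markers are plain namedtuples. The field names for SparseFeat and DenseFeat are inferred from how the script uses them (fc.name, fc.vocabulary_size, fc.embedding_dim, fc.dimension); the VarLenSparseFeat fields, including maxlen, are purely hypothetical because the script never uses that class.

```python
from collections import namedtuple

# Marker for a categorical feature: its name, number of distinct ids, and embedding width.
SparseFeat = namedtuple("SparseFeat", ["name", "vocabulary_size", "embedding_dim"])

# Marker for a numeric feature: its name and how many values it carries per sample.
DenseFeat = namedtuple("DenseFeat", ["name", "dimension"])

# Hypothetical marker for a variable-length categorical feature (unused by the script).
VarLenSparseFeat = namedtuple(
    "VarLenSparseFeat", ["name", "vocabulary_size", "embedding_dim", "maxlen"]
)

# Usage mirroring the calls in the script; the vocabulary size here is a made-up value.
sparse_fc = SparseFeat("C1", vocabulary_size=27, embedding_dim=4)
dense_fc = DenseFeat("I1", 1)
print(sparse_fc.name, sparse_fc.embedding_dim, dense_fc.dimension)  # C1 4 1
```

Because the script distinguishes feature types with isinstance checks, any definition that keeps SparseFeat and DenseFeat as separate classes with these attributes would work equally well.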