Preface
The DeepFM algorithm is an improvement on Wide & Deep that replaces the wide part with an FM. The main improvements are:
- Strengthens the model's ability to learn low-order (shallow) feature interactions
- The FM component and the deep component share the same embeddings
- Feature interactions are learned automatically, end to end, with no manual feature engineering
- Model evaluation introduces a new metric, Gini Normalization
1. Model Structure
In the FM component, the first-order term and the second-order interaction term are concatenated and then passed through a sigmoid to produce the output, so the implementation needs to handle the linear (first-order) part and the FM cross-feature part separately.
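For reference, the FM component computes the sum of a linear (first-order) term and all pairwise (second-order) interactions between the latent vectors, as in the paper:

$$
y_{FM} = \langle w, x \rangle + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle \, x_{j_1} x_{j_2}
$$

where $d$ is the total number of features and $V_j$ is the latent (embedding) vector of feature $j$.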
The Deep module learns high-order feature interactions: in the figure above, the Dense Embeddings are fed into the hidden layers through fully connected layers. The Dense Embeddings are there to avoid the parameter explosion a DNN would face on raw sparse inputs, which is a common technique in recommendation models.
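The outputs of the two components are combined into the final prediction, again following the paper:

$$
\hat{y} = \mathrm{sigmoid}\left( y_{FM} + y_{DNN} \right)
$$

where $y_{DNN}$ is the output of the fully connected layers applied to the concatenated field embeddings.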
2. Input Data Processing
When feeding data into DeepFM, the one-hot representation would be far too sparse and could blow up memory, so a special storage format is used: each sample is stored as feature_index and feature_value arrays, which also makes the subsequent embedding_lookup of the latent vectors straightforward. A categorical feature with n possible values occupies n indices, assigned incrementally, and the active index gets a feature_value of 1. A continuous feature occupies a single index, and its feature_value is the raw value itself.
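A minimal sketch of this encoding, with hypothetical field names, vocabulary sizes, and index assignments chosen purely for illustration:

# Hypothetical feature dictionary (indices assigned cumulatively across fields):
#   gender: female -> 0, male -> 1                       (categorical, 2 values)
#   city:   beijing -> 2, shanghai -> 3, shenzhen -> 4   (categorical, 3 values)
#   age:    -> 5                                          (continuous, 1 index)
#
# Sample: gender=male, city=shanghai, age=27
feat_index = [1, 3, 5]         # one index per field (field_size F = 3)
feat_value = [1.0, 1.0, 27.0]  # 1 for categorical fields, raw value for the continuous field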
What do the yellow and blue dots in the figure represent?
Each field covers a different set of feature values. For a single input sample, exactly one entry in each field is active: yellow represents 1 and blue represents 0.
- Although the input length differs from field to field, after embedding every field's vector has the same length k
- The latent vectors V_ik learned in FM now serve as the weights of the embedding layer (see the sketch after this list)
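A minimal NumPy sketch of the lookup, again with hypothetical sizes, showing that every field ends up as one k-dimensional vector and that the embedding matrix plays the role of the FM latent vectors:

import numpy as np

M, K = 6, 4                    # feature dictionary size and embedding size (hypothetical)
V = np.random.randn(M, K)      # embedding weights == FM latent vectors, one row per feature
feat_index = [1, 3, 5]         # the sample from the example above (F = 3 fields)
feat_value = np.array([1.0, 1.0, 27.0])

# regardless of a field's cardinality, each field yields one K-dimensional vector,
# scaled by its feature value (1 for one-hot fields, raw value for continuous ones)
field_embeddings = V[feat_index] * feat_value[:, None]   # shape (F, K)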
3. Code Implementation
#encoding:utf-8
"""
Tensorflow implementation of DeepFM [1]
Reference:
[1] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction,
Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He.
"""
import numpy as np
import tensorflow as tf
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics import roc_auc_score
from time import time
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm
#from yellowfin import YFOptimizer
class DeepFM(BaseEstimator, TransformerMixin):
def __init__(self, feature_size, field_size,
embedding_size=8, dropout_fm=[1.0, 1.0],
deep_layers=[32, 32], dropout_deep=[0.5, 0.5, 0.5],
deep_layers_activation=tf.nn.relu,
epoch=10, batch_size=256,
learning_rate=0.001, optimizer_type="adam",
batch_norm=0, batch_norm_decay=0.995,
verbose=False, random_seed=2016,
use_fm=True, use_deep=True,
loss_type="logloss", eval_metric=roc_auc_score,
l2_reg=0.0, greater_is_better=True):
assert (use_fm or use_deep)
assert loss_type in ["logloss", "mse"], \
"loss_type can be either 'logloss' for classification task or 'mse' for regression task"
self.feature_size = feature_size # denote as M, size of the feature dictionary
self.field_size = field_size # denote as F, size of the feature fields
self.embedding_size = embedding_size # denote as K, size of the feature embedding
self.dropout_fm = dropout_fm
# hidden layers of the DNN (number and sizes)
self.deep_layers = deep_layers
self.dropout_deep = dropout_deep
# activation function of the DNN
self.deep_layers_activation = deep_layers_activation
self.use_fm = use_fm
self.use_deep = use_deep
self.l2_reg = l2_reg
self.epoch = epoch
self.batch_size = batch_size
self.learning_rate = learning_rate
self.optimizer_type = optimizer_type # optimizer type, e.g. adam
self.batch_norm = batch_norm # whether to apply batch normalization
self.batch_norm_decay = batch_norm_decay
self.verbose = verbose # whether to print information during training
self.random_seed = random_seed
self.loss_type = loss_type
self.eval_metric = eval_metric
self.greater_is_better = greater_is_better
self.train_result, self.valid_result = [], []
self._init_graph()
def _init_graph(self):
self.graph = tf.Graph()
with self.graph.as_default(): # use this graph as the default graph
tf.set_random_seed(self.random_seed)
self.feat_index = tf.placeholder(tf.int32, shape=[None, None],
name="feat_index") # None * F
self.feat_value = tf.placeholder(tf.float32, shape=[None, None],
name="feat_value") # None * F
self.label = tf.placeholder(tf.float32, shape=[None, 1], name="label") # None * 1
self.dropout_keep_fm = tf.placeholder(tf.float32, shape=[None], name="dropout_keep_fm")
self.dropout_keep_deep = tf.placeholder(tf.float32, shape=[None], name="dropout_keep_deep")
self.train_phase = tf.placeholder(tf.bool, name="train_phase")
self.weights = self._initialize_weights()
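# _initialize_weights() (defined elsewhere in the class, not shown here) is expected to
# build, among others, "feature_embeddings" (M x K) and "feature_bias" (M x 1),
# which are looked up below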
# model
# look up the embedding of each feature index in the weight matrix, then multiply it by the corresponding feat_value
self.embeddings = tf.nn.embedding_lookup(self.weights["feature_embeddings"],
self.feat_index) # None * F * K
feat_value = tf.reshape(self.feat_value, shape=[-1, self.field_size, 1])
self.embeddings = tf.multiply(self.embeddings, feat_value)
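# multiplying by feat_value leaves a categorical field's embedding unchanged (value 1)
# and scales a continuous field's embedding by its raw value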
# ---------- first order term ----------
# first-order term: look up the weight (bias) of each feature index, multiply by feat_value, then sum along axis=2
self.y_first_order = tf.nn.embedding_lookup(self.weights["feature_bias"], self.feat_index) # None * F * 1
self.y_first_order = tf.reduce_sum(tf.multiply(self.y_first_order, feat_value), 2) # None * F
self.y_first_order = tf.nn.dropout(self.y_first_order, self.dropout_keep_fm[0]) # None * F