Introduction to Wide&Deep

MusicDancing

Last modified: 2022-01-23 22:07:16


First published: 2021-03-17 16:50:05

Article link: https://blog.csdn.net/MusicDancing/article/details/114936783


References:

Wide&Deep Model: Principles and Implementation - Zhihu

Deep Models (8): Wide And Deep - Jony0917's column, CSDN blog

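1. LR & FM

LR (logistic regression) is a model with strong memorization: it mainly memorizes the historical click-through rate of each feature. It generalizes poorly and needs a lot of manual feature engineering to improve generalization. In addition, the features in such a linear model are independent of one another, so the model cannot learn cross-feature information that never appeared in the training set.

For this reason, the second stage switched to FM (Factorization Machines). FM learns an embedding for each feature, so with very little feature engineering it can learn feature combinations that were never seen in the training set.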

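The FM model formula (second-order form):

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$

where $w_0$ is the global bias, $w_i$ is the weight of feature $i$, and $v_i$ is the embedding vector of feature $i$.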

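In theory FM can model higher-order feature combinations, but in practice computational complexity and the number of parameters limit it to second-order features. A natural way to handle higher-order combinations is a multi-layer neural network.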
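As a rough illustration of why the second-order term is affordable, here is a minimal NumPy sketch of the second-order FM score. It assumes the standard formulation (global bias, linear weights, and pairwise inner products of per-feature embedding vectors). The function names `fm_score_naive` and `fm_score_fast` and the toy sizes are illustrative and not from the original post. The second function uses the well-known rewrite of the pairwise sum that costs O(kn) instead of O(kn^2).

```python
import numpy as np

def fm_score_naive(x, w0, w, V):
    """Second-order FM score with an explicit double loop over feature pairs."""
    n = len(x)
    score = w0 + float(np.dot(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            score += float(np.dot(V[i], V[j])) * x[i] * x[j]
    return score

def fm_score_fast(x, w0, w, V):
    """Same score via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]."""
    xv = V.T @ x                    # shape (k,): sum_i v_if * x_i
    x2v2 = (V ** 2).T @ (x ** 2)    # shape (k,): sum_i v_if^2 * x_i^2
    return w0 + float(np.dot(w, x)) + 0.5 * float(np.sum(xv ** 2 - x2v2))

rng = np.random.default_rng(0)
n_features, k = 6, 3                # toy sizes, chosen only for illustration
x = rng.random(n_features)
w0, w, V = 0.1, rng.normal(size=n_features), rng.normal(size=(n_features, k))
naive, fast = fm_score_naive(x, w0, w, V), fm_score_fast(x, w0, w, V)
```

Both functions return the same value up to floating-point error. Only the second scales to the very sparse, high-dimensional inputs typical of CTR data.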

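Embedding-based models such as FM and deep neural networks learn a low-dimensional dense embedding vector for each feature, which lets them generalize to feature combinations that never appeared in the historical data. However, when the historical data is sparse and high-rank, it is hard to learn effective embeddings, and the model is likely to over-generalize.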

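2. Recommender systems

A recommender system can be viewed as a search ranking system: the input query is a set of user and contextual information, and the output is a ranked list of items. Given a query, the recommendation task is to retrieve the relevant items from a database and then rank them by a specific objective, such as clicks or purchases.

One challenge for recommender systems is to achieve both memorization and generalization. Memorization can be loosely defined as learning and exploiting the co-occurrence of items or features in the historical data. Generalization is based on the transitivity of correlation and explores combinations that never or rarely appeared in the historical data. Memorization-based recommendations are usually more topical and directly related to the items the user has already interacted with, whereas generalization-based recommendations tend to improve the diversity of the recommended items.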

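3. The Wide&Deep model

In 2016 Google proposed the Wide&Deep model, which opened the way for the large-scale application of deep learning to CTR prediction. Wide&Deep is a hybrid model made up of a single-layer Wide part and a multi-layer Deep part, i.e. a linear model plus a DNN.

Wide part: gives the model strong "memorization";

Deep part: gives the model "generalization".

This structure lets the model combine the strengths of logistic regression and deep neural networks: it can quickly process and memorize a large number of historical behavior features while retaining strong expressive power. It quickly became a mainstream model in industry and has spawned many hybrid models built on the Wide&Deep structure, and its influence continues today.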

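3.1 The Wide&Deep learning framework

Memorization and generalization are achieved in one model by jointly training a linear model component and a neural network component.

The Wide&Deep model connects the Wide part, which has a single input layer, with the Deep part, which consists of an embedding layer and several hidden layers, and feeds both into the final output layer.

1. wide: raw input features (sparse features) + manually crossed features (cross features);

it is good at handling large numbers of sparse id-type features;

2. deep: dense features (real-valued features + embeddings of the categorical features);

it uses the expressive power of neural networks to perform deep feature crossing and uncover the data patterns hidden behind the features.

Finally, the output layer combines the Wide part and the Deep part with a logistic regression model to form a unified model.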

The architecture diagram shows which features W&D feeds into the Deep part and which into the Wide part.

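The input to the Deep part is the full feature vector, including user age, number of installed apps, device type, installed apps, impression apps and other features. Categorical features such as installed apps and impression apps pass through an embedding layer and are then concatenated into a 1200-dimensional vector, which goes through three fully connected ReLU layers and finally into the LogLoss output layer.

Note: in the architecture diagram the Wide part uses only crossed features. In practice you can also add the raw categorical features or bucketized continuous features.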

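3.2 The Wide part

The cross-product transformation adds low-order nonlinearity on the memorization side and gives the model strong "memorization". Memorization can be understood as the model's ability to directly learn and exploit the "co-occurrence frequency" of items or features in the historical data.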

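The Wide part is a generalized linear model of the form $y=w^{T}x+b$, where y is the prediction, x is the feature vector, w is the vector of model parameters and b is the bias. The features include raw features and transformed features. The most important transformation is the cross-product transformation, defined as

$$\phi_k(x) = \prod_{i=1}^{d} x_i^{c_{ki}}, \quad c_{ki} \in \{0, 1\}$$

where $c_{ki}$ is a Boolean variable that is 1 if the i-th feature is part of the k-th transformation and 0 otherwise. For binary features, a crossed feature is 1 only when all of its constituent features are 1 (for example, it is 1 when "gender=female" and "language=English" both hold, and 0 in every other case). This captures the interactions between binary features and adds nonlinearity to the generalized linear model.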
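The cross-product transformation can be sketched in a few lines of NumPy. The feature layout (`gender=female`, `gender=male`, `language=en`, `language=zh`) and the function name `cross_product` are hypothetical and chosen only to mirror the gender/language example in the text.

```python
import numpy as np

def cross_product(x, c_k):
    """phi_k(x) = prod_i x_i ** c_ki for binary features x and a 0/1 mask c_k."""
    x = np.asarray(x)
    c_k = np.asarray(c_k)
    # x_i ** 0 == 1, so features outside the transformation do not affect the product.
    return int(np.prod(np.power(x, c_k)))

# Hypothetical one-hot layout: [gender=female, gender=male, language=en, language=zh]
c_k = [1, 0, 1, 0]   # the transformation AND(gender=female, language=en)
female_en = cross_product([1, 0, 1, 0], c_k)
female_zh = cross_product([1, 0, 0, 1], c_k)
male_en = cross_product([0, 1, 1, 0], c_k)
```

Only the sample where both selected features are 1 yields a crossed feature of 1. The other two yield 0.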

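In general, simple models such as collaborative filtering and logistic regression have strong memorization. Because their structure is simple, the raw data can often influence the recommendation directly, producing rule-like recommendations of the form "if the user clicked A, recommend B". The model in effect memorizes the distribution of the historical data and recommends from that memory. In a simple model such as logistic regression, when such a "strong feature" is found, its weight is pushed to a very large value during training, which amounts to memorizing the feature directly.

3.3 The Deep part

The main role of the Deep part is to give the model "generalization": with little feature engineering, the DNN can use the low-dimensional dense vectors learned from sparse features to generalize to unseen feature combinations. Generalization can be understood as the model's ability to propagate feature correlations and to discover how sparse, or even never-seen, rare features correlate with the final label.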

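The Deep part is a feed-forward neural network. Sparse categorical features first pass through an embedding layer that converts them into low-dimensional dense vectors, whose dimension ranges from about 10 to 100. The embedding vectors are randomly initialized. After concatenation they are fed into the hidden layers, each of which computes:

$$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$$

where $l$ is the layer index, $f$ is the activation function (usually ReLU), and $a^{(l)}$, $b^{(l)}$ and $W^{(l)}$ are the activations, bias and weights of layer $l$.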

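By automatically combining features many times, a deep neural network can mine latent patterns in the data. Even with very sparse input feature vectors it yields fairly stable, smooth recommendation probabilities. This is the "generalization" that simple models lack.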
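The per-layer operation of the Deep part (an affine map followed by an activation) can be sketched as follows. This is a minimal NumPy forward pass that assumes ReLU activations. The layer sizes and the name `deep_forward` are illustrative, and the random input vector only stands in for the concatenated embeddings.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def deep_forward(a0, weights, biases):
    """Apply a_next = relu(W a + b) once per hidden layer and return the last activations."""
    a = a0
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
layer_sizes = [12, 8, 4]   # input width, then two hidden layers (illustrative sizes)
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
a0 = rng.normal(size=layer_sizes[0])   # stands in for the concatenated embeddings
a_final = deep_forward(a0, weights, biases)
```

The vector `a_final` plays the role of the last hidden layer's activations, which the output layer then combines with the Wide part.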

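Illustration of an embedding column:

Take sparse_column_with_keys(column_name='gender', keys=['female', 'male']) as an example. Suppose female maps to id=0 and male to id=1, and each id corresponds to one 6-dimensional float vector in the embedding feature. In the actual training data, when the gender feature takes the value 'female', the DNN input layer receives the vector for id=0 (looked up with tf.nn.embedding_lookup_sparse). embedding_column has a trainable parameter that specifies whether the embedding of the feature is updated according to the training error.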
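Conceptually, the embedding lookup is just row indexing into a table. The sketch below reproduces the gender example with plain NumPy rather than the TensorFlow feature-column machinery. The names `vocab`, `embedding_table` and `lookup` are illustrative.

```python
import numpy as np

vocab = {"female": 0, "male": 1}   # key -> id, as in the gender example
embedding_dim = 6
rng = np.random.default_rng(0)
# One randomly initialised 6-dimensional row per id.
embedding_table = rng.normal(scale=0.1, size=(len(vocab), embedding_dim))

def lookup(values):
    """Map raw string values to their embedding rows."""
    ids = np.array([vocab[v] for v in values])
    return embedding_table[ids]

batch = lookup(["female", "male", "female"])   # shape (3, 6)
```

When the embedding is trainable, back-propagation updates only the rows that were looked up in the batch. When it is not, the table stays fixed at its initial (or pre-loaded) values.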

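3.4 Joint training

The wide part and the deep part are combined by using the weighted sum of their output log odds as the prediction, which is fed into one common logistic loss function for joint training. Note that joint training differs from ensemble learning.

In an ensemble, the individual models are trained separately without knowing about each other, and their predictions are combined only at inference time. Joint training, in contrast, trains the parts at the same time and optimizes all parameters together. This also affects model size: in an ensemble, because training is independent, each individual model usually has to be larger (e.g. more features and crossed features) for the ensemble to reach reasonable accuracy. With joint training, the wide part only needs to complement the weaknesses of the deep part with a small number of crossed features.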

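For the binary classification (logistic regression) task, the model's prediction is:

$$P(Y=1 \mid x) = \sigma\left(w_{wide}^{T}[x, \phi(x)] + w_{deep}^{T} a^{(l_f)} + b\right)$$

where $\sigma$ is the sigmoid function, $\phi(x)$ denotes the cross-product transformations of the raw features x, b is the bias, $w_{wide}$ is the weight vector of the wide part, and $w_{deep}$ is the weight vector applied to the final activations $a^{(l_f)}$ of the deep part.

Joint training of the wide and deep parts is done by back-propagating the gradients from the output to both parts simultaneously using mini-batch stochastic optimization. In the experiments, FTRL with L1 regularization was used as the optimizer for the wide part and AdaGrad for the deep part.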
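To make the "one shared loss, two sets of weights" idea concrete, here is a small NumPy sketch of the joint prediction and of a single update step. It is a simplification under stated assumptions: it uses plain SGD for both parts (not FTRL and AdaGrad), it updates only the output-layer weights and treats the deep activations as fixed inputs, and all names and numbers are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x_wide, a_deep, w_wide, w_deep, b):
    """P(Y=1|x) = sigmoid(w_wide . [x, phi(x)] + w_deep . a_final + b)."""
    return sigmoid(np.dot(w_wide, x_wide) + np.dot(w_deep, a_deep) + b)

def joint_sgd_step(x_wide, a_deep, y, w_wide, w_deep, b, lr=0.1):
    """One plain-SGD step on the log loss; the same error signal updates both parts."""
    err = predict(x_wide, a_deep, w_wide, w_deep, b) - y   # d(log loss) / d(logit)
    return w_wide - lr * err * x_wide, w_deep - lr * err * a_deep, b - lr * err

x_wide = np.array([1.0, 0.0, 1.0])   # raw + crossed sparse features (illustrative)
a_deep = np.array([0.5, 0.2])        # last hidden-layer activations (illustrative)
w_wide, w_deep, b = np.zeros(3), np.zeros(2), 0.0
p_before = predict(x_wide, a_deep, w_wide, w_deep, b)
w_wide, w_deep, b = joint_sgd_step(x_wide, a_deep, 1.0, w_wide, w_deep, b)
p_after = predict(x_wide, a_deep, w_wide, w_deep, b)
```

Because both parts share one logit, a single error term drives both updates. In an ensemble, by contrast, each model would have its own loss and its own error term.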

Below is an implementation of the wide and deep architecture using TensorFlow's high-level tf.estimator API. It is short and clear.

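import tensorflow as tf  # TensorFlow 1.x API (tf.estimator, tf.train)

# wide_columns and deep_columns are assumed to be lists of tf.feature_column
# columns defined elsewhere; dnn_feature_columns supplies the inputs of the
# deep part.
estimator = tf.estimator.DNNLinearCombinedClassifier(
    n_classes=2,
    model_dir='drive/My Drive/model/click_1',
    # Wide part: linear model over sparse and crossed columns.
    linear_feature_columns=wide_columns,
    linear_optimizer=tf.train.ProximalAdagradOptimizer(
        learning_rate=0.1,
        l1_regularization_strength=0.001),
    # Deep part: DNN over dense and embedding columns.
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[4, 16, 32])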

4. Model pipeline

Reference: Studying Google's Wide and Deep model - 狮子的窝, CSDN blog

1. Data Generation

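Label: the criterion is app acquisition: 1 if the user downloaded the app, 0 otherwise.

Vocabularies: categorical features are mapped to integer ids. Continuous real-valued features are first normalized to [0, 1] with the cumulative distribution function (CDF) and then discretized into buckets.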
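The CDF-based normalization can be sketched as follows. The sketch assumes the quantile scheme described in the Wide&Deep paper, where a value falling in the i-th of n_q quantiles is mapped to (i - 1) / (n_q - 1). The function names and the toy data are illustrative.

```python
import numpy as np

def fit_quantile_boundaries(train_values, n_q):
    """Interior boundaries that split the training distribution into n_q quantiles."""
    probs = np.arange(1, n_q) / n_q
    return np.quantile(train_values, probs)

def quantile_normalize(values, boundaries):
    """Map each value to (i - 1) / (n_q - 1), where i is its 1-based quantile index."""
    n_q = len(boundaries) + 1
    idx = np.searchsorted(boundaries, values, side="left")   # 0-based quantile index
    return idx / (n_q - 1)

train = np.arange(1, 101, dtype=float)   # toy training column: 1..100
boundaries = fit_quantile_boundaries(train, n_q=5)
normalized = quantile_normalize(np.array([1.0, 50.0, 100.0, 1000.0]), boundaries)
```

The boundaries are computed once from the training data and reused at serving time, so a value outside the training range (such as 1000 here) simply lands in the last bucket.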

2. Model Training

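Training uses more than 500 billion examples. The input layer takes continuous features and categorical features. A warm-start scheme builds on the already trained model: the embeddings and the linear model weights are read from the previous model to initialize the new one. When new training data arrives, training continues from the existing model, which reduces computation and training time.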
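The warm-start idea can be sketched with plain dictionaries of NumPy arrays. This is only an illustration of the initialization step, not the production system. The parameter names (`embeddings`, `linear_weights`, `hidden_weights`) and the function `warm_start` are hypothetical.

```python
import numpy as np

def warm_start(previous_params, fresh_params, reuse=("embeddings", "linear_weights")):
    """Return fresh_params with the named entries copied from the previous model."""
    params = dict(fresh_params)
    for name in reuse:
        if name in previous_params:
            params[name] = previous_params[name].copy()
    return params

rng = np.random.default_rng(0)
previous = {"embeddings": rng.normal(size=(4, 3)),
            "linear_weights": rng.normal(size=5),
            "hidden_weights": rng.normal(size=(3, 2))}
fresh = {"embeddings": np.zeros((4, 3)),
         "linear_weights": np.zeros(5),
         "hidden_weights": np.zeros((3, 2))}
new_model = warm_start(previous, fresh)
```

Only the embeddings and linear weights are carried over in this sketch. Everything else keeps its fresh initialization and is learned from the new data.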

3. Model Serving

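Once the model has been trained and tuned, it is loaded into the recommendation engine. For each query request, the ranking system receives the candidate list from the retrieval system together with the features of each app, computes a score for each app with the Wide&Deep model, and sorts the apps from highest to lowest score. The paper also mentions using smaller batches run in parallel to improve the performance of the recommendation engine.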
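The scoring-and-sorting step at serving time can be sketched as below. The `score_fn` here is a hypothetical stand-in for the trained Wide&Deep forward pass, and the app ids and feature vectors are made up for illustration.

```python
import numpy as np

def rank_candidates(candidates, score_fn):
    """Score every candidate and return (app_id, score) pairs, highest score first."""
    scored = [(app_id, float(score_fn(features))) for app_id, features in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical stand-in for the trained model: a logistic score over two features.
weights = np.array([0.8, -0.3])

def score_fn(features):
    return 1.0 / (1.0 + np.exp(-np.dot(weights, features)))

candidates = [("app_a", np.array([0.1, 0.9])),
              ("app_b", np.array([1.0, 0.0])),
              ("app_c", np.array([0.5, 0.5]))]
ranking = rank_candidates(candidates, score_fn)
```

In a real engine the candidate list would be split into smaller batches and scored in parallel, as the text notes. The sorting step stays the same.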

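# -*- coding: utf-8 -*-
# Full example: Wide&Deep on the Adult census data set.
# Targets TensorFlow 1.x, because it relies on tf.contrib.
from __future__ import print_function  # must come before all other imports

import tempfile
import warnings

import numpy as np
import pandas as pd
import tensorflow as tf

try:
    import urllib.request as urllib  # Python 3: urlretrieve lives in urllib.request
except ImportError:
    import urllib  # Python 2

warnings.filterwarnings("ignore")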
 
# Categorical base columns.
gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["Female", "Male"])
race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=["Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)
 
# Continuous base columns.
age = tf.contrib.layers.real_valued_column("age")
age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education_num = tf.contrib.layers.real_valued_column("education_num")
capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
 
wide_columns = [
  gender, native_country, education, occupation, workclass, relationship, age_buckets,
  tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)),
  tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)),
  tf.contrib.layers.crossed_column([age_buckets, education, occupation], hash_bucket_size=int(1e6))]
 
deep_columns = [
  tf.contrib.layers.embedding_column(workclass, dimension=8),
  tf.contrib.layers.embedding_column(education, dimension=8),
  tf.contrib.layers.embedding_column(gender, dimension=8),
  tf.contrib.layers.embedding_column(relationship, dimension=8),
  tf.contrib.layers.embedding_column(native_country, dimension=8),
  tf.contrib.layers.embedding_column(occupation, dimension=8),
  age, education_num, capital_gain, capital_loss, hours_per_week]
 
model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir=model_dir,
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])
 
# Define the column names for the data sets.
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
  "marital_status", "occupation", "relationship", "race", "gender",
  "capital_gain", "capital_loss", "hours_per_week", "native_country", "income_bracket"]
LABEL_COLUMN = 'label'
CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation",
                       "relationship", "race", "gender", "native_country"]
CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss",
                      "hours_per_week"]
 
# Download the training and test data to temporary files.
# Alternatively, you can download them yourself and change train_file and
# test_file to your own paths.
train_file = tempfile.NamedTemporaryFile()
test_file = tempfile.NamedTemporaryFile()
urllib.urlretrieve("http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
urllib.urlretrieve("http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)
 
# Read the training and test data sets into Pandas dataframe.
df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
 
def input_fn(df):
  # Creates a dictionary mapping from each continuous feature column name (k) to
  # the values of that column stored in a constant Tensor.
  continuous_cols = {k: tf.constant(df[k].values)
                     for k in CONTINUOUS_COLUMNS}
  # Creates a dictionary mapping from each categorical feature column name (k)
  # to the values of that column stored in a tf.SparseTensor.
  categorical_cols = {k: tf.SparseTensor(
      indices=[[i, 0] for i in range(df[k].size)],
      values=df[k].values,
      dense_shape=[df[k].size, 1])
                      for k in CATEGORICAL_COLUMNS}
  # Merges the two dictionaries into one.
  feature_cols = dict(continuous_cols, **categorical_cols)
  # Converts the label column into a constant Tensor.
  label = tf.constant(df[LABEL_COLUMN].values)
  # Returns the feature columns and the label.
  return feature_cols, label
 
def train_input_fn():
  return input_fn(df_train)
 
def eval_input_fn():
  return input_fn(df_test)
 
print('df_train shape:',np.array(df_train).shape)
print('df_test shape:',np.array(df_test).shape)
 
m.fit(input_fn=train_input_fn, steps=200)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
