Wide & Deep Learning for Recommender Systems
Abstract
Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations are effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning—jointly trained wide linear models and deep neural networks—to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.
Note: two terms are central here and recur throughout the paper: memorization and generalization. Memorization refers to learning how known feature transformations and feature combinations affect the outcome; generalization refers to learning how unseen feature transformations and combinations affect it. Taking the Google Play prediction task used in the paper as an example: using a linear model to learn how the user's age and occupation, or an app's download count and category, affect whether the user will install the app is memorization; using a deep model to learn the effect of unseen feature combinations (a somewhat contrived example: user age × occupation / app downloads + category) on whether the user will install the app is generalization.
1. Introduction
A recommender system can be viewed as a search ranking system, where the input query is a set of user and contextual information, and the output is a ranked list of items. Given a query, the recommendation task is to find the relevant items in a database and then rank the items based on certain objectives, such as clicks or purchases.
One challenge in recommender systems, similar to the general search ranking problem, is to achieve both memorization and generalization. Memorization can be loosely defined as learning the frequent co-occurrence of items or features and exploiting the correlation available in the historical data. Generalization, on the other hand, is based on transitivity of correlation and explores new feature combinations that have never or rarely occurred in the past. Recommendations based on memorization are usually more topical and directly relevant to the items on which users have already performed actions. Compared with memorization, generalization tends to improve the diversity of the recommended items. In this paper, we focus on the apps recommendation problem for the Google Play store, but the approach should apply to generic recommender systems.
For massive-scale online recommendation and ranking systems in an industrial setting, generalized linear models such as logistic regression are widely used because they are simple, scalable and interpretable. The models are often trained on binarized sparse features with one-hot encoding. E.g., the binary feature “user_installed_app=netflix” has value 1 if the user installed Netflix. Memorization can be achieved effectively using cross-product transformations over sparse features, such as AND(user_installed_app=netflix, impression_app=pandora), whose value is 1 if the user installed Netflix and then is later shown Pandora. This explains how the co-occurrence of a feature pair correlates with the target label. Generalization can be added by using features that are less granular, such as AND(user_installed_category=video, impression_category=music), but manual feature engineering is often required. One limitation of cross-product transformations is that they do not generalize to query-item feature pairs that have not appeared in the training data.
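As a concrete illustration of the cross-product features above, here is a minimal Python sketch. The feature names mirror the paper's examples, but the `cross_product` helper and the dictionary encoding are illustrative assumptions, not the production implementation.

```python
# Illustrative sketch (not the paper's code): cross-product transformations
# over one-hot binary features. A cross fires only if every component fires.

def cross_product(features, keys):
    """AND of several binary features: 1 only if every named feature is 1."""
    return int(all(features.get(k, 0) == 1 for k in keys))

features = {
    "user_installed_app=netflix": 1,
    "impression_app=pandora": 1,
    "user_installed_category=video": 1,
    "impression_category=music": 1,
}

# Fine-grained cross for memorization:
and_netflix_pandora = cross_product(
    features, ["user_installed_app=netflix", "impression_app=pandora"])

# Coarser cross for generalization:
and_video_music = cross_product(
    features, ["user_installed_category=video", "impression_category=music"])
```

Note the limitation discussed above: a cross such as AND(user_installed_app=netflix, impression_app=spotify) that never appeared in training simply has no learned weight, so it contributes nothing for unseen pairs.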
Embedding-based models, such as factorization machines [5] or deep neural networks, can generalize to previously unseen query-item feature pairs by learning a low-dimensional dense embedding vector for each query and item feature, with less burden of feature engineering. However, it is difficult to learn effective low-dimensional representations for queries and items when the underlying query-item matrix is sparse and high-rank, such as users with specific preferences or niche items with a narrow appeal. In such cases, there should be no interactions between most query-item pairs, but dense embeddings will lead to nonzero predictions for all query-item pairs, and thus can over-generalize and make less relevant recommendations. On the other hand, linear models with cross-product feature transformations can memorize these “exception rules” with much fewer parameters.
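The over-generalization point can be seen in a toy sketch: with dense embeddings, even a query-item pair that never interacted receives a nonzero score, whereas a cross-product feature for an unseen pair is simply absent. The random vectors below are stand-ins for learned embeddings, not trained values.

```python
import numpy as np

# Toy illustration: dense embeddings assign some score to *every*
# query-item pair, including pairs that never co-occurred in training.
rng = np.random.default_rng(0)
embed_query = rng.normal(size=8)  # stand-in for a learned query embedding
embed_item = rng.normal(size=8)   # stand-in for a learned item embedding

# Dot-product score: almost surely nonzero, even for an unseen pair.
score = float(embed_query @ embed_item)
```

This is exactly the failure mode the paper addresses: the wide part can pin down such "exception rules" with a handful of cross-product weights, while the deep part keeps the generalization benefit.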
In this paper, we present the Wide & Deep learning framework to achieve both memorization and generalization in one model, by jointly training a linear model component and a neural network component as shown in Figure 1.
The main contributions of the paper include:
• The Wide & Deep learning framework for jointly training feed-forward neural networks with embeddings and linear models with feature transformations for generic recommender systems with sparse inputs.
• The implementation and evaluation of the Wide & Deep recommender system productionized on Google Play, a mobile app store with over one billion active users and over one million apps.
• We have open-sourced our implementation along with a high-level API in TensorFlow.
While the idea is simple, we show that the Wide & Deep framework significantly improves the app acquisition rate on the mobile app store, while satisfying the training and serving speed requirements.
2. Recommender System Overview
An overview of the app recommender system is shown in Figure 2. A query, which can include various user and contextual features, is generated when a user visits the app store. The recommender system returns a list of apps (also referred to as impressions) on which users can perform certain actions such as clicks or purchases. These user actions, along with the queries and impressions, are recorded in the logs as the training data for the learner.
Since there are over a million apps in the database, it is intractable to exhaustively score every app for every query within the serving latency requirements (often O(10) milliseconds). Therefore, the first step upon receiving a query is retrieval. The retrieval system returns a short list of items that best match the query using various signals, usually a combination of machine-learned models and human-defined rules. After reducing the candidate pool, the ranking system ranks all items by their scores. The scores are usually P(y|x), the probability of a user action label y given the features x, including user features (e.g., country, language, demographics), contextual features (e.g., device, hour of the day, day of the week), and impression features (e.g., app age, historical statistics of an app). In this paper, we focus on the ranking model using the Wide & Deep learning framework.
Note: reading “impressions” literally can be confusing; here it refers to the list of apps shown to the user, and when the paper speaks of impression features it means features of the shown app.
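The retrieve-then-rank flow described in this section can be sketched as follows. The market filter and the CTR-based score are hypothetical stand-ins for the production signals and for P(y|x); all names and data are made up for illustration.

```python
# Hypothetical sketch of the two-stage pipeline: retrieval shrinks the
# million-app pool to a short candidate list, then a ranking model scores
# each candidate and sorts by score.

def retrieve(query, candidate_pool, k=100):
    # Stand-in for ML models + human-defined rules: a simple market filter.
    return [app for app in candidate_pool if query["country"] in app["markets"]][:k]

def score(query, app):
    # Stand-in for P(y|x): the ranking model over user, contextual,
    # and impression features.
    return app["historical_ctr"]

def recommend(query, candidate_pool, n=2):
    candidates = retrieve(query, candidate_pool)
    return sorted(candidates, key=lambda a: score(query, a), reverse=True)[:n]

pool = [
    {"name": "app_a", "markets": {"US"}, "historical_ctr": 0.10},
    {"name": "app_b", "markets": {"US", "DE"}, "historical_ctr": 0.30},
    {"name": "app_c", "markets": {"DE"}, "historical_ctr": 0.50},
]
ranked = recommend({"country": "US"}, pool)
```

The split matters for latency: only the cheap retrieval signals touch the full pool; the (comparatively expensive) ranking model only sees the short list, which is how the O(10) ms serving budget is met.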
3. Wide & Deep Learning
3.1 The Wide Component
The wide component is a generalized linear model of the form $y = w^T x + b$, as illustrated in Figure 1 (left). $y$ is the prediction, $x = [x_1, x_2, ..., x_d]$ is a vector of $d$ features, $w = [w_1, w_2, ..., w_d]$ are the model parameters and $b$ is the bias. The feature set includes raw input features and transformed features. One of the most important transformations is the cross-product transformation, which is defined as:
$$\phi_k(x) = \prod_{i=1}^{d} x_i^{c_{ki}}, \quad c_{ki} \in \{0, 1\}$$

where $c_{ki}$ is a boolean variable that is 1 if the $i$-th feature is part of the $k$-th transformation $\phi_k$, and 0 otherwise.
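A minimal sketch of the wide component under the assumption of binary raw features; the weights, bias, and cross mask `c` are made-up illustrative values, not learned parameters.

```python
# Sketch of the wide component: y = w.x + b, with one cross-product
# transformation phi_k(x) = prod_i x_i^{c_ki} appended to the raw features.

def phi(x, c):
    """Cross-product transformation over binary features."""
    out = 1.0
    for xi, cki in zip(x, c):
        if cki:          # c_ki == 1: feature i participates in this cross
            out *= xi
    return out

x = [1.0, 0.0, 1.0]        # raw binary features
c = [1, 0, 1]              # the k-th cross uses features 0 and 2
x_aug = x + [phi(x, c)]    # raw + transformed features

w = [0.5, -0.2, 0.1, 1.3]  # model parameters (illustrative)
b = -0.4                   # bias (illustrative)
y = sum(wi * xi for wi, xi in zip(w, x_aug)) + b
```

Because the inputs are binary, $\phi_k(x)$ is simply an AND over the selected features: it is 1 only when every feature in the cross is 1, which is how the wide part memorizes specific feature co-occurrences.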