[Reading Notes] Practical Lessons from Predicting Clicks on Ads at Facebook

Authors:
Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, Joaquin Quiñonero Candela
Facebook
1601 Willow Road, Menlo Park, CA, United States
{panjunfeng, oujin, joaquinq, sbowers}@fb.com
Published: August 24-27, 2014

I'm preparing for a competition, so I'm reviewing a classic feature-engineering method: treat the paths through a GBDT's trees as features, then fit a logistic regression on top. Facebook's engineering is quite strong; the A/B testing that Kuaishou uses today apparently also came from Facebook. Both A/B testing and GBDT+LR have a light-touch elegance: the principles aren't complicated, but the ideas are very clever. Reading this paper deepened my understanding of both GBDT and LR.

Takeaways

Train a GBDT model on the existing features, then use the trees it learns to construct new features, and finally train a model on these new features (optionally concatenated with the original ones). Each constructed feature vector is binary (0/1), with one element per leaf node across all trees in the GBDT. When a sample traverses a tree and lands in one of its leaves, the element for that leaf is set to 1 and the elements for that tree's other leaves are 0. The length of the new feature vector therefore equals the total number of leaves across all trees in the GBDT.
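As a concrete illustration (the numbers here are hypothetical, not from the paper): suppose the GBDT has two trees with 3 and 2 leaves, and a sample lands in leaf 1 (0-indexed) of the first tree and leaf 0 of the second. A minimal sketch of the encoding:

```python
import numpy as np

# Hypothetical GBDT with two trees: 3 leaves and 2 leaves.
leaves_per_tree = [3, 2]
# Leaf the sample lands in for each tree (0-indexed).
landed_leaf = [1, 0]

# Build the binary indicator vector: one slot per leaf, across all trees.
vec = np.zeros(sum(leaves_per_tree), dtype=int)
offset = 0
for n_leaves, leaf in zip(leaves_per_tree, landed_leaf):
    vec[offset + leaf] = 1
    offset += n_leaves

print(vec)  # [0 1 0 1 0] -- length = 3 + 2 = 5
```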

Another common feature-selection method: LASSO, which drops useless features (the L1 penalty shrinks the weights of uninformative features to exactly zero).
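A minimal sketch of LASSO-based selection with scikit-learn (my own illustration, not from the paper; the toy dataset and alpha value are placeholders):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Toy data: 100 features, only 10 of which are informative.
X, y = make_regression(n_samples=500, n_features=100, n_informative=10, random_state=0)

# L1 regularization drives most coefficients to exactly zero;
# SelectFromModel keeps only the features with nonzero weight.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
X_selected = selector.transform(X)
print(X_selected.shape)  # roughly (500, ~10): the useless features are dropped
```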

Main points of the paper:
1. GBDT+LR outperforms either model on its own.
2. Accuracy decays as the training data grows stale, so learning needs to be online. The paper explores the best learning-rate scheme for the linear model; since the tree model is too expensive to retrain frequently, only the linear model (LR) is kept online.
3. Accuracy improves with the number of GBDT trees but at a diminishing rate; with little data the later trees can even overfit, and around 500 trees the gains essentially saturate.
4. Feature importance: the top 10 features contribute about half of the total importance, while the last 300 features contribute less than 1%.
5. Historical features matter more than contextual features, but contextual features are important for cold start.
6. The data volume is so large that subsampling is needed. Uniform subsampling from 100% down to 10% costs only a little accuracy; sampling less than that hurts noticeably more. With negative downsampling, the final predicted CTR must be recalibrated (see the sketch after this list).
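The paper gives the recalibration formula for negative downsampling: if w is the rate at which negatives were kept and p is the model's prediction on the downsampled data, the calibrated CTR is q = p / (p + (1 - p) / w). A minimal sketch:

```python
def recalibrate_ctr(p: float, w: float) -> float:
    """Undo the bias from negative downsampling (formula from Section 6 of the paper).

    p: predicted CTR from the model trained on downsampled data
    w: negative downsampling rate (fraction of negatives kept)
    """
    return p / (p + (1.0 - p) / w)

# Example: a prediction of 0.5 from a model trained with only 10% of
# negatives kept maps back to roughly a 9.1% calibrated CTR.
print(recalibrate_ctr(0.5, 0.1))  # ~0.0909
```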

Observations:
- GBDT's drawback is that it is not easy to train online; its strength is robustness.
- LR's drawback is that it is sensitive to outliers; its strength is that it can be trained online.

Implementation:
During training, you can use half of the data to train the GBDT, and the other half to train the LR on the features produced by the trained GBDT (so the LR is not fit on leaf assignments the trees have already memorized).
P.S. sklearn's GBDT has an apply method that returns the leaf index each sample lands in, which makes this quite convenient to implement; see the sketch below.
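A minimal end-to-end sketch of this pipeline with scikit-learn (my own illustration; the toy dataset and hyperparameters are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=10000, n_features=20, random_state=0)

# Half the data trains the GBDT, the other half trains the LR.
X_gbdt, X_lr, y_gbdt, y_lr = train_test_split(X, y, test_size=0.5, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
gbdt.fit(X_gbdt, y_gbdt)

# apply() returns, for each sample, the leaf index it reaches in every tree;
# for binary classification the shape is (n_samples, n_trees, 1).
leaves = gbdt.apply(X_lr)[:, :, 0]

# One-hot encode the leaf indices into the binary feature vector.
enc = OneHotEncoder(handle_unknown="ignore")
lr_features = enc.fit_transform(leaves)

lr = LogisticRegression(max_iter=1000)
lr.fit(lr_features, y_lr)

# At prediction time, push new samples through the same trees + encoder.
proba = lr.predict_proba(enc.transform(gbdt.apply(X_lr)[:, :, 0]))[:, 1]
```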

ABSTRACT

In this paper we introduce a model which combines decision trees with logistic regression, outperforming either of these methods on its own by over 3%, an improvement with significant impact to the overall system performance. (DT+LR)

We then explore how a number of fundamental parameters impact the final prediction performance of our system. Not surprisingly, the most important thing is to have the right features; picking the optimal handling for data freshness, learning rate schema and data sampling improve the model slightly, though much less than adding a high-value feature, or picking the right model to begin with. (Features matter more than model tuning.)

1. INTRODUCTION

We begin with an overview of our experimental setup in Section 2.

In Section 3 we evaluate different probabilistic linear classifiers and diverse online learning algorithms. In the context of linear classification we go on to evaluate the impact of feature transforms and data freshness. Inspired by the practical lessons learned, particularly around data freshness and online learning, we present a model architecture that incorporates an online learning layer, whilst producing fairly compact models. (Incorporates an online learning layer.)

Section 4 describes a key component required for the online learning layer, the online joiner, an experimental piece of infrastructure that can generate a live stream of real-time training data.

Lastly we present ways to trade accuracy for memory and compute time and to cope with massive amounts of training data. (With massive data, the online learning layer trades accuracy for memory and compute time.)

In Section 5 we describe practical ways to keep memory and latency contained for massive scale applications.

In Section 6 we delve into the tradeoff between training data volume and accuracy. (The tradeoff between data volume and accuracy.)

2. EXPERIMENTAL SETUP

Evaluation metrics: Since we are most concerned with the impact of the factors on the machine learning model, we use the accuracy of prediction instead of metrics directly related to profit and revenue. In this work, we use Normalized Entropy (NE) and calibration as our major evaluation metrics.

$$NE = \frac{-\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1+y_i}{2}\log(p_i)+\frac{1-y_i}{2}\log(1-p_i)\right)}{-\left(p\log(p)+(1-p)\log(1-p)\right)}$$

where $N$ is the number of training examples, $y_i \in \{-1,+1\}$ is the label, $p_i$ is the estimated probability of a click, and $p$ is the average empirical CTR of the training data. NE is the predictive log loss normalized by the entropy of the background CTR; lower is better.
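A minimal sketch of computing NE (my own illustration of the formula above):

```python
import numpy as np

def normalized_entropy(y, p_hat):
    """Normalized Entropy as defined above.

    y:     labels in {-1, +1}
    p_hat: predicted click probabilities
    """
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    # Average log loss of the model's predictions.
    ll = -np.mean(((1 + y) / 2) * np.log(p_hat) + ((1 - y) / 2) * np.log(1 - p_hat))
    # Entropy of the background CTR (a model that always predicts the average).
    p = np.mean((1 + y) / 2)
    background = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return ll / background

# Example: predictions close to the labels give NE well below 1.
print(normalized_entropy([1, -1, -1, 1], [0.9, 0.2, 0.1, 0.8]))  # ~0.24
```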
