ElasticNet4j: A Performant Elastic Net Logistic Regression Solver in Java

Co-authored by Chinmay Nerurkar & Abraham Greenstein

But does it scale? It’s probably the most destructive question in Machine Learning history. It does not matter how accurate a model is if it cannot be deployed.

Scale is particularly important for our digital advertising products at Xandr. We conduct well over a hundred billion auctions for online ads daily on our real-time platform. The Xandr Invest bidding engines allow our buy-side clients to participate in these auctions, which typically requires real-time computation of bids using an assortment of automated optimization algorithms. These algorithms utilize machine learning, control theory, optimization theory, and Bayesian statistics to ensure that ad campaigns spend their budgets on the specified schedule while appropriately valuing impressions and optimally shading bids. They also use models trained on billions of ad impressions, so model training needs to be as efficient as possible. Since we maintain some of the lowest margins in the industry, simply throwing more machines at the problem to achieve scale is not an acceptable solution. As a result, the first thing anyone at Xandr asks when discussing a new algorithm is, “Does it scale?” Because if it does not, we cannot use it.

One of the primary engagement metrics used by digital advertisers is user clicks, so accurate real-time prediction of the click-through-rate (CTR) is often needed to determine when and how much to bid. One component of Xandr’s click prediction algorithm is a logistic regression model with an L1 penalty. It models the relationship between the probability of a click and several categorical predictors related to the online ad auction. There are many open source packages that train logistic regression models, but we couldn’t find one that scaled and satisfied our technical requirements. So, we wrote our own.
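For reference, the model we fit minimizes the penalized negative log-likelihood of the logistic model (written here in standard textbook notation rather than anything specific to our implementation):

$$\min_{\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log p_i + (1 - y_i)\log(1 - p_i)\Big] + \lambda \lVert \beta \rVert_1, \qquad p_i = \frac{1}{1 + e^{-x_i^{\top}\beta}}$$

where $y_i$ indicates a click, $x_i$ is the sparse feature vector for impression $i$, and the L1 term $\lambda \lVert \beta \rVert_1$ drives coefficients to exactly zero.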

Product Requirements and Challenges

At Xandr, the combination of our industry, scale, and business model created many unique technical requirements that our logistic regression solver needed to satisfy. Some of these issues are detailed below.

Scalability

We respect each advertiser’s data privacy. To enforce this, we create one model per Ad Campaign, each trained using its own historical data. We also perform cross validation to search for the best hyperparameter on a dynamically generated grid of regularization parameters. This entails iteratively training and evaluating hundreds of models for each ad campaign to obtain the best one to be used for bidding. In addition, concept drift and ever-changing market dynamics make frequent model updates preferable (currently each ad campaign gets a new model approximately every three hours). We also need to scale the training process up to accommodate tens of thousands of ad campaigns. As a result, when choosing a training algorithm, computational efficiency is a primary consideration.
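For illustration, a dynamically generated grid of this kind is typically a log-spaced sequence of penalties running from a large λ (where all coefficients are zero) down to a small fraction of it, as in Friedman, Hastie and Tibshirani. A minimal Java sketch, with made-up constants rather than our production settings:

```java
import java.util.stream.IntStream;

/** Sketch: a log-spaced regularization grid. The constants below are
 *  illustrative, not our production values. */
public final class LambdaGrid {
    /** Returns {@code count} penalties decreasing log-linearly from
     *  lambdaMax down to lambdaMax * minRatio. */
    static double[] logSpaced(double lambdaMax, double minRatio, int count) {
        double logMax = Math.log(lambdaMax);
        double step = Math.log(minRatio) / (count - 1);
        return IntStream.range(0, count)
                .mapToDouble(i -> Math.exp(logMax + step * i))
                .toArray();
    }

    public static void main(String[] args) {
        // e.g. 100 penalties from 1.0 down to 1.0 * 1e-4
        for (double lambda : logSpaced(1.0, 1e-4, 100)) {
            System.out.println(lambda);
        }
    }
}
```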

Sparse Data

The majority of our model-training predictors are high-dimensional categorical variables that have undergone common feature engineering techniques such as one-hot encoding and hashing. As a result, we end up with extremely large but very sparse design matrices. Our training algorithm needs to take advantage of this sparsity and handle matrix operations efficiently.
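To make the sparsity concrete: after one-hot encoding, each training row has only a handful of active columns out of a very large total, so it is enough to store the indices of the non-zero features. A minimal sketch of such a row (the names are illustrative, not ElasticNet4j's actual data structures):

```java
/** Sketch of a sparse training row: only the indices of active one-hot
 *  features are stored. Illustrative, not ElasticNet4j's actual types. */
final class SparseRow {
    final int[] featureIndices; // design-matrix columns that are 1 for this row
    final int label;            // 1 = click, 0 = no click

    SparseRow(int[] featureIndices, int label) {
        this.featureIndices = featureIndices;
        this.label = label;
    }

    /** Dot product with a dense weight vector touches only the active columns. */
    double dot(double[] weights) {
        double sum = 0.0;
        for (int j : featureIndices) {
            sum += weights[j]; // feature value is implicitly 1 for one-hot features
        }
        return sum;
    }
}
```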

Sparse Models

Model scoring happens in real time and each ad campaign uses its own model, so our bidder needs to store and score a large number of models efficiently. As a result, we need sparse models, but ones that do not compromise prediction accuracy. Sparse models are also easier to debug and interpret. To accommodate this, we use L1 regularization. However, the L1 penalty is non-differentiable, so many existing regression packages do not support it cleanly.
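The reason L1 pairs so well with coordinate descent is that each coordinate update has a closed form built on the soft-thresholding operator S(z, γ) = sign(z) · max(|z| − γ, 0), which sets small coefficients exactly to zero. A minimal sketch:

```java
/** Sketch: the soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)
 *  at the heart of coordinate descent with an L1 penalty. */
final class SoftThreshold {
    /** Any coordinate whose (partial-residual) statistic |z| falls at or below
     *  gamma is set exactly to zero, which is what produces sparse models. */
    static double apply(double z, double gamma) {
        if (z > gamma) return z - gamma;
        if (z < -gamma) return z + gamma;
        return 0.0;
    }
}
```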

Robustness

Our training data has highly skewed feature distributions and extremely imbalanced target classes. During the research phase, we observed divergence issues in model training. As a result, we needed a training algorithm where most convergence issues could be detected and corrected.
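One convenient property of coordinate descent here is that convergence is easy to monitor: each full sweep over the coordinates yields a maximum weight change, which should shrink toward zero. A sketch of the kind of guard this enables (illustrative thresholds and a stand-in update function, not ElasticNet4j's exact stopping rule):

```java
import java.util.function.IntToDoubleFunction;

/** Sketch of a convergence guard around coordinate-descent sweeps. The
 *  thresholds and the update function are illustrative stand-ins. */
final class ConvergenceGuard {
    static final double TOLERANCE = 1e-6;             // declare convergence below this
    static final double DIVERGENCE_THRESHOLD = 1e6;   // fail fast above this

    /** Runs sweeps of per-coordinate updates until the largest per-sweep
     *  weight change is small, failing fast if the weights blow up. */
    static void fit(double[] beta, IntToDoubleFunction coordinateUpdate, int maxSweeps) {
        for (int sweep = 0; sweep < maxSweeps; sweep++) {
            double maxDelta = 0.0;
            for (int j = 0; j < beta.length; j++) {
                double updated = coordinateUpdate.applyAsDouble(j); // new value for beta[j]
                maxDelta = Math.max(maxDelta, Math.abs(updated - beta[j]));
                beta[j] = updated;
            }
            if (Double.isNaN(maxDelta) || maxDelta > DIVERGENCE_THRESHOLD) {
                throw new IllegalStateException("Training diverged at sweep " + sweep);
            }
            if (maxDelta < TOLERANCE) {
                return; // converged
            }
        }
    }
}
```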

Compatibility and Maintainability

Our model training pipeline must integrate into complex existing engineering systems that are written in Java. Therefore, the solution should be in the same language environment, have few dependencies, and not require a bunch of ‘glue code’ to integrate into our stack. As a result, many existing general purpose packages did not meet our needs.

Customization

Each ad campaign has different business strategies and can therefore have very different distributions of data. Freedom to modify our training algorithm is a big advantage. For example, we dynamically generate different regularization penalty factors for different features. This gives us significant ad campaign performance gains. The majority of the off-the-shelf packages lack that option.
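As a sketch of how per-feature penalty factors can slot into the coordinate update (the semantics mirror glmnet-style penalty factors; the names here are ours, not the library's API):

```java
/** Sketch: per-feature penalty factors. Feature j is shrunk by
 *  lambda * penaltyFactor rather than a single global lambda.
 *  Illustrative names, not ElasticNet4j's API. */
final class PerFeaturePenalty {
    static double update(double z, double lambda, double penaltyFactor) {
        double gamma = lambda * penaltyFactor; // a factor of 0 leaves the feature unpenalized
        return Math.signum(z) * Math.max(Math.abs(z) - gamma, 0.0);
    }
}
```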

Introducing ElasticNet4j

When we started building an engineering solution to generate logistic regression models for predicting CTR, we experimented with different open source machine learning packages. We could not find one that satisfied our business and computational performance requirements. We also tried popular distributed frameworks that come with their own machine learning suites, but these tools could not meet our performance requirements at scale.

After surveying different training algorithms and experimenting with them, we found that the Path-wise Coordinate Descent algorithm by Friedman, Hastie and Tibshirani was a good fit for our specific problem. The coordinate descent aspect makes it well suited to L1 regularization, lets it scale to wide datasets, and makes convergence issues easy to detect. Moreover, it is a second-order method, but it does not require expensive and potentially unstable Hessian inversions or first-order approximations of the inverse Hessian (which can lead to slow convergence when the objective function has low curvature). The algorithm can handle large datasets, takes advantage of feature sparsity, and calculates model weights along the full regularization path. When solving for the full regularization path, the algorithm uses the current parameter estimates as a warm start for the next parameter, which provides significant speed gains compared to other variants of Gradient Descent or quasi-Newton methods.
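Concretely, solving along the path means fitting at the largest penalty first and reusing each solution as the starting point for the next, smaller one. A minimal sketch of that loop, with the coordinate-descent solver itself abstracted behind a function (an illustration of the warm-start idea, not the library's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Sketch of the path-wise warm-start loop: the solution at each lambda
 *  seeds the solve at the next, smaller lambda. */
final class RegularizationPath {
    /**
     * @param lambdas   penalties in decreasing order
     * @param solver    (initialWeights, lambda) -> fitted weights
     * @param dimension number of model coefficients
     */
    static List<double[]> fitPath(double[] lambdas,
                                  BiFunction<double[], Double, double[]> solver,
                                  int dimension) {
        List<double[]> path = new ArrayList<>();
        double[] warmStart = new double[dimension]; // all-zero start at the largest lambda
        for (double lambda : lambdas) {
            warmStart = solver.apply(warmStart, lambda); // warm start from the previous fit
            path.add(warmStart.clone());
        }
        return path;
    }
}
```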

Based on these considerations we built our in-house machine learning library ElasticNet4j, which fits our business needs while achieving the desired computational performance. The library implements the GLMNET logistic regression algorithm in Java to operate on sparsely featured data. When training click prediction models, each ad campaign's dataset fits into the memory of a single machine for data locality, and the algorithm operates on the dataset on a single core for cache coherence, so models train quickly. The implementation includes optimizations that minimize the number of calculations and caches results for better performance.

Our LR library ElasticNet4j was built to be simple, efficient, and performant. Our engineering solution uses multiple instances of the library simultaneously in a multi-threaded environment on a single bare-metal machine or, more recently, in Docker containers on Kubernetes, to generate CTR models for thousands of ad campaigns every hour. The library makes it easy to incorporate new optimizations into the current training algorithm or to create new trainers that plug into the library seamlessly.
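At the deployment level, the pattern looks roughly like the sketch below: one single-threaded training job per ad campaign, many jobs running in parallel on one machine. The trainCampaign entry point is a hypothetical stand-in, not the library's actual API:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: many per-campaign training jobs in parallel, each single-threaded.
 *  trainCampaign is a hypothetical stand-in for the library's entry point. */
final class TrainingRunner {
    public static void main(String[] args) {
        List<String> campaignIds = List.of("campaign-1", "campaign-2", "campaign-3");
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (String id : campaignIds) {
            // Each job keeps its campaign's dataset in memory on one core.
            pool.submit(() -> trainCampaign(id));
        }
        pool.shutdown();
    }

    static void trainCampaign(String campaignId) {
        // Load the campaign's data, solve the regularization path,
        // cross-validate, and persist the best model (details elided).
    }
}
```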

GitHub: https://github.com/appnexus/ElasticNet4j

Joint work by Tian Yu, Yana Volkovich, Noah Stebbins, Lei Hu, Abraham Greenstein, Moussa Taifi, Ph.D. & Chinmay Nerurkar

Originally published at: https://medium.com/xandr-tech/elasticnet4j-a-performant-elastic-net-logistic-regression-solver-in-java-782831f1879e
