Authors: Patrick Hall and Michael Proksch, Ph.D.
TL;DR
- This post gives a technical overview of transitioning from trusted generalized linear models (GLMs) to newer gradient boosting machines (GBMs) while actually considering known risks, compliance requirements, and business impact.
It comes with code.
Check out Michael Proksch’s Part 1 post and the H2O.ai post.
Introduction
In Part 1, we proposed better revenue and managing regulatory requirements with machine learning (ML). We made the first part of the argument by showing how gradient boosting machines (GBMs), a type of ML, can match exactly, then exceed, both the technical merits and the business value of popular generalized linear models (GLMs) using a straightforward insurance example.
Part 2 of this blog uses a more realistic and detailed credit card default scenario to show how monotonicity constraints, Shapley values and other post-hoc explanations, and discrimination testing can enable practitioners to create direct comparisons between GLM and GBM models. Such comparisons can then enable practitioners to build from GLM to more complex GBM models in a step-by-step manner, while retaining model transparency and the ability to test for discrimination. In our credit use case, we show that a GBM can lead to better accuracy, more revenue, and that the GBM is also likely to fulfill model documentation, adverse action notice, and discrimination testing requirements.
Some bankers have recently voiced skepticism about artificial intelligence (AI) in lending — and rightly so. (See: https://www.americanbanker.com/opinion/ai-models-could-struggle-to-handle-the-market-downturn or https://www.americanbanker.com/opinion/dont-let-ai-trigger-a-fair-lending-violation.) To be clear, we're not advocating for brain-dead AI hype. We hope to present a judicious, testable, and step-by-step method to transition from GLM to GBM in Part 2 of this post. Perhaps obviously, we feel that GBMs can model the real world and credit risk better than GLMs. We also think ML can be done while preserving extremely high levels of transparency and while keeping algorithmic discrimination at bay.
Part 1 of this article has already shown that GBMs might not only be more accurate predictors, but, when combined with Shapley values, they can also provide more accurate explanations and attributions of causal impact to predictors. To build off Part 1, we now want to showcase the potential transparency and business benefits presented by a transition from GLM to GBM in lending. For full transparency (and hopefully reproducibility), we use the UCI credit card data, which is freely available from the UCI machine learning dataset repository, and open source h2o-3 code. The credit card dataset contains 30,000 credit card customers' demographic characteristics along with payment and billing information. The dependent variable to predict is payment delinquency. Our use case will compare a GLM to several different GBMs from a technical and business perspective.
Machine Learning Can Drive Business Impact
In order to show the business value of GBMs compared to GLMs, we trained several different algorithms (GLM, Monotonic GBM (MGBM), GBM, and a Hybrid GBM). Here’s a short summary of those models:
GLM: Elastic net penalized logistic GLM.
MGBM: 46 trees, max. depth 3, all variables monotonically constrained.
GBM: 44 trees, max. depth 5, no monotonic constraints.
Hybrid: 50 trees, max. depth 14, learned monotonic constraints for some variables.
As you’ll see below, there’s a large difference in projected revenue across these models in our tests. Why? The models judge risk differently. A GLM is not able to reflect non-linear relationships and interactions between variables without manual adjustments. GBMs do inherently model non-linear relationships and interactions between variables. In the past, GLMs were considered highly transparent, and as such, preferred for business purposes. GBMs were considered to be black-boxes in the past. But using monotonicity constraints can give GBMs much of the same inherent interpretability as GLMs, and without sacrificing accuracy. Table 1 provides an overview of the different test models and their capabilities.
All models were selected by grid search and evaluated using validation AUC. The outcome can be seen in Table 2. The GLM model has the lowest AUC score with 0.73, while the best GBMs reach an AUC of 0.79. Great … but what does that mean for a business?
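For readers who want to check the arithmetic behind those scores, validation AUC can be computed without any ML library: it is the probability that a randomly chosen positive example outranks a randomly chosen negative one. A minimal sketch, using tiny made-up labels and scores rather than the actual UCI validation set:

```python
def auc(labels, scores):
    """AUC as the probability that a random positive (delinquent)
    example is ranked above a random negative one; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Tiny made-up validation labels and model scores:
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(round(auc(labels, scores), 3))  # 0.667
```

This pairwise-ranking formulation is exact but quadratic; production tools such as h2o-3 compute the same quantity from sorted scores.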
To assess the business impact of each model, we make a few basic assumptions, summarized in Table 3. A credit card customer whom we accurately classify as delinquent is cost-neutral. An accurately classified customer who is not delinquent brings NT$20,000 of lifetime value, incorrectly classifying a customer as delinquent costs NT$20,000 in lifetime value, and incorrectly extending credit to a delinquent customer leads to write-offs of NT$100,000.
Based on those assumptions, the GLM model shows the lowest revenue, NT$7.9M. The model with the highest impact is the Hybrid GBM, with a business value of NT$19.22M, almost 2.5 times that of the GLM model!
How is this possible? The Hybrid GBM model is much better at avoiding those costly false negative predictions. This simple example shows that the transition from GLM to GBM models can significantly increase business value and reduce risk. Now, let’s have a look at how to compare GLMs to GBMs from a technical perspective, so you can decide how and when to transition from GLMs to GBMs.
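The revenue calculation itself is simple once a model's confusion matrix is in hand. A minimal sketch using the per-customer values from Table 3; the confusion-matrix counts below are hypothetical, not the actual outcomes of any of our models:

```python
# Per-customer lifetime-value assumptions from Table 3 (NT$):
VALUE = {"tp": 0,         # delinquent customer, correctly declined
         "tn": 20_000,    # good customer, correctly approved
         "fp": -20_000,   # good customer, incorrectly declined
         "fn": -100_000}  # delinquent customer, incorrectly approved

def revenue(tp, tn, fp, fn):
    """Projected revenue given a model's confusion-matrix counts."""
    return (tp * VALUE["tp"] + tn * VALUE["tn"]
            + fp * VALUE["fp"] + fn * VALUE["fn"])

# Hypothetical counts for illustration only:
print(revenue(tp=800, tn=5000, fp=400, fn=300))  # 62000000
```

Because each false negative costs five times a false positive, small accuracy gains on delinquent customers translate into large revenue swings.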
Better Model, Same Transparency
It is true that GLMs have exceptional interpretability. They are so interpretable because of their additive monotonic form and their low degree of variable interactions. Believe it or not, GBMs can also have extremely high interpretability. By the judicious application of monotonicity and interaction constraints, GBM users can now bring domain knowledge to bear on a modeling task, increase interpretability, avoid overfitting to development data, and make informed decisions about when to use what kind of model. (E.g.: https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html, https://xgboost.readthedocs.io/en/latest/tutorials/feature_interaction_constraint.html.)
Monotonicity Constraints
Users can often specify in a GBM the same monotonicity of variables found in GLMs. In Figure 1 (right), the GLM-modeled behavior of the feature PAY_0, a customer's most recent repayment status, is monotonically increasing: as PAY_0 values become larger, probability of default also becomes larger. In Figure 1 (right), the MGBM models the same behavior using a positive monotonic constraint — just like the GLM! As PAY_0 increases under the GBM, probability of default also increases.
What’s different is the functional form of the GLM versus the functional form of the GBM. The GLM is restricted to a logistic curve in this case, whereas the GBM can take on an arbitrarily complex stair-step form that also obeys the user-supplied monotonicity constraint. Additionally, Figure 1 shows how the GBM and Hybrid models behave for PAY_0 without monotonicity constraints. Which looks more realistic to you? The histogram and mean behavior of PAY_0 in Figure 1 (left) can help you determine which model fits the data best. The goal is to match the red mean target line as weighted by the histogram. The good news is, no matter which functional form you prefer, you can now use it in an interpretable way that reflects business domain knowledge.
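One practical benefit of monotonicity constraints is that they are easy to verify after training: sweep a grid of feature values and confirm the predictions never decrease. A sketch with a made-up stair-step response standing in for the trained MGBM:

```python
def is_monotone_increasing(model, grid):
    """Check that predictions never decrease as the feature grows."""
    preds = [model(x) for x in grid]
    return all(a <= b for a, b in zip(preds, preds[1:]))

# Made-up stair-step response, standing in for the constrained MGBM's
# behavior on PAY_0 (months of repayment delay).
def stairstep(pay_0):
    if pay_0 <= 0:
        return 0.2
    if pay_0 <= 2:
        return 0.5
    return 0.7

print(is_monotone_increasing(stairstep, range(-2, 9)))  # True
```

The same check run against the unconstrained GBM or Hybrid response would reveal exactly where, and by how much, they depart from monotonicity.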
Post-hoc Explanation
How are we able to see the way that the GBM modeled PAY_0? Through what’s known as partial dependence and individual conditional expectation (ICE) plots [1], [2]. In Figure 1 (right), partial dependence (red) displays the estimated average behavior of the model. The other ICE curves in the plot show how certain individuals behave under the model. By combining the two approaches, we get a chart that displays the overall and individual behavior of our model. Partial dependence and ICE are just one example of post-hoc explanation techniques, or processes we can run on a model after it’s trained to get a better understanding of how it works.
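Both techniques are simple enough to sketch from scratch. Below, an ICE curve replays one customer through the model while a single feature varies, and partial dependence averages those curves; the scoring function, rows, and feature names are toy stand-ins, not our fitted GBM:

```python
def ice_curve(model, row, feature, grid):
    """ICE: one customer's predictions as a single feature is varied."""
    return [model({**row, feature: v}) for v in grid]

def partial_dependence(model, rows, feature, grid):
    """Partial dependence: the ICE curves averaged over all customers."""
    curves = [ice_curve(model, r, feature, grid) for r in rows]
    return [sum(vals) / len(curves) for vals in zip(*curves)]

# Toy scoring function: risk rises with PAY_0, shifted by credit limit.
def model(row):
    return 0.1 * row["PAY_0"] + (0.2 if row["LIMIT_BAL"] < 50_000 else 0.0)

rows = [{"PAY_0": 0, "LIMIT_BAL": 20_000},
        {"PAY_0": 3, "LIMIT_BAL": 200_000}]
print(partial_dependence(model, rows, "PAY_0", [0, 1, 2]))
```

Plotting the individual ICE curves alongside the averaged partial dependence is what produces charts like Figure 1.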
Shapley values, a Nobel-laureate technique from game theory, provide additional and crucial insights for the GBM models [3]. When applied to tree-based models like GBM, Shapley values are a highly accurate measurement of how variables contribute to a model’s predictions, both overall and for any individual customer. This is incredibly helpful for interpretability as it enables:
- Analysis of overall business drivers of model predictions (Figure 2).
- Descriptions of predictions for individual customers, and potentially generation of adverse action notices (Figure 3).
- Comparison of GLM coefficients and contributions to GBM Shapley values (Figures 2 and 3).
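For a handful of features, Shapley values can even be computed exactly by averaging each feature's marginal contribution over all feature orderings; the toy model and coefficients below are illustrative only (for real GBMs, efficient tree-specific algorithms such as TreeSHAP do this at scale):

```python
from itertools import permutations

def shapley_values(model, row, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features are switched from the baseline
    to the customer's actual values (feasible only for a few features)."""
    feats = list(row)
    orderings = list(permutations(feats))
    phi = {f: 0.0 for f in feats}
    for order in orderings:
        current = dict(baseline)
        prev = model(current)
        for f in order:
            current[f] = row[f]
            now = model(current)
            phi[f] += now - prev
            prev = now
    return {f: total / len(orderings) for f, total in phi.items()}

# Toy model with an interaction term (hypothetical coefficients):
def model(x):
    return 0.5 * x["PAY_0"] + 0.2 * x["BILL_AMT1"] + 0.1 * x["PAY_0"] * x["BILL_AMT1"]

row = {"PAY_0": 2, "BILL_AMT1": 1}
baseline = {"PAY_0": 0, "BILL_AMT1": 0}
print(shapley_values(model, row, baseline))
```

Note the local-accuracy property: the contributions sum exactly to the difference between the customer's prediction and the baseline prediction, which is what makes Shapley values suitable for per-customer attribution.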
Just like Figure 1 shows how the treatment of PAY_0 changes from a GLM to different GBMs, a direct comparison between the overall (Figure 2) and per-customer (Figure 3) variable contributions for GLM and GBM is now possible. In Figure 2, we can see how each model treats variables from an overall perspective, and compare simple models (e.g., Pearson correlation, GLM) to the more complex models. In Figure 3, we can see how each model arrived at its prediction for three individual customers. All this enables a direct comparison of GLM and GBM treatment of variables, so you can both adequately document GBMs and make decisions about the transition to GBM with confidence! Moreover, the per-customer information displayed in Figure 3 could also provide the raw data needed for adverse action notices, a serious regulatory consideration in credit lending.
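A minimal sketch of turning per-customer Shapley contributions into ranked reason codes, the raw material for an adverse action notice; the feature names and contribution values here are hypothetical:

```python
def reason_codes(contributions, n=3):
    """Return the features that pushed a declined applicant's predicted
    risk up the most, i.e. the largest positive Shapley contributions."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, value in ranked[:n] if value > 0]

# Hypothetical per-customer Shapley contributions to default risk:
contribs = {"PAY_0": 0.21, "LIMIT_BAL": -0.05,
            "BILL_AMT1": 0.08, "PAY_AMT1": 0.02}
print(reason_codes(contribs))  # ['PAY_0', 'BILL_AMT1', 'PAY_AMT1']
```

In practice each code would be mapped to consumer-facing language, and the choice of baseline for the Shapley computation deserves careful thought.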
Another crucial aspect of model diagnostics that must be conducted under several federal and local regulations in the US is discrimination testing. Of course, if certain types of discrimination are found, then they must be fixed too. The transparency of GLMs is often helpful in this context. However, the constraints and post-hoc explanation steps outlined above make finding and fixing discrimination in GBM models much easier than it used to be. Moreover, the concept of the multiplicity of good models in ML — where a single dataset can generate many accurate candidate models — presents a number of options for fixing discrimination that were often not available for GLMs. (See: https://www.youtube.com/watch?v=rToFuhI6Nlw.)
Discrimination Testing
In our credit card example, the GBM is tested for discrimination using measures with long-standing legal and regulatory precedent: adverse impact ratio (AIR), marginal effect (ME), and standardized mean difference (SMD). Those results are available in Table 4. Luckily, between men and women, there is little evidence of discrimination for any of our models. However, if discrimination was found, GBMs may actually present more options for remediation than GLMs.
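All three metrics are straightforward to compute from model outcomes. The sketch below uses one common formulation of each (exact definitions vary slightly across references), with hypothetical accept/decline flags and scores rather than the actual test results in Table 4:

```python
from statistics import mean, stdev

def air(outcomes_protected, outcomes_reference):
    """Adverse impact ratio: protected-group acceptance rate divided by
    reference-group acceptance rate (values below ~0.8 draw scrutiny)."""
    return mean(outcomes_protected) / mean(outcomes_reference)

def marginal_effect(outcomes_protected, outcomes_reference):
    """Difference in acceptance rates, in percentage points."""
    return 100.0 * (mean(outcomes_reference) - mean(outcomes_protected))

def smd(scores_protected, scores_reference):
    """Standardized mean difference of model scores."""
    pooled = stdev(scores_protected + scores_reference)
    return (mean(scores_protected) - mean(scores_reference)) / pooled

# Hypothetical accept/decline flags (1 = credit extended):
women = [1, 1, 0, 1, 1]
men = [1, 1, 1, 0, 1]
print(round(air(women, men), 2), round(marginal_effect(women, men), 1))
```

AIR and ME operate on final accept/decline decisions, while SMD operates on the underlying scores, so the three together catch discrimination at different stages of the pipeline.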
Basically, GBMs just have more knobs to turn than GLMs, leaving more wiggle room to find an accurate and non-discriminatory model. In addition to variable and hyperparameter selection, researchers also have put forward potentially compliant adversarial approaches for training non-discriminatory ML models [4], and GBMs now offer users monotonicity and interaction constraints that can help fix discriminatory model outcomes. Likewise, the post-hoc explanation techniques described above can also be used to understand drivers of algorithmic discrimination and to validate their removal from GBM models.
Machine Learning and Compliance
Many government agencies have telegraphed likely future ML regulation or, outside of the US, have started to implement such regulations. It's important to note that US government watchdogs are not saying ML is forbidden. Generally speaking, they are saying: make sure your ML is documented, explainable, managed, monitored, and minimally discriminatory. Arguably, the steps outlined in Part 2 provide a blueprint for explainability and discrimination testing with GBM, which should in turn help with aspects of model documentation. Moreover, most large financial institutions already have model governance and monitoring processes in place for their traditional predictive models. These could potentially be adapted to ML models.
Of course, it’s really not the place of non-regulators to opine on what is, and what is not, compliant with regulations. So, have a look for yourself to see what some US government agencies are thinking:
Innovation spotlight: Providing adverse action notices when using AI/ML models
Momentum for Machine Learning
Outside of government, some financial services organizations are already claiming to use machine learning in regulated dealings and researchers are publishing on GBM and Shapley values for credit lending applications. For instance, in 2018 Equifax announced their Neurodecision system, “a patent-pending machine learning technology for regulatory-compliant, advanced neural network modeling in credit scoring.” Since 2018, Wells Fargo has also introduced several machine learning techniques for model validation, including LIME-SUP [5], explainable neural networks [6], and a number of additional model debugging methods.
In 2019, Bracke et al. at The Bank of England published an explainable AI use case for credit risk featuring GBM and Shapley values [7]. Later the same year, Bussman et al. published a similar piece, introducing a GBM and Shapley value example in the journal Credit Risk Management [8]. In March of 2020, Gill et al. published a mortgage lending workflow based on monotonically constrained GBM, explainable neural networks, and Shapley values, that gave careful consideration to US adverse action notice and anti-discrimination requirements [9].
Conclusion
It now appears possible to take cautious steps away from trusted GLMs to more sophisticated GBMs. The use of constraints, post-hoc explanation, and discrimination testing enables you to compare GBMs to GLMs. These techniques may very well enable compliance with adverse action notice, discrimination, and documentation requirements too. And with a little luck, GBMs could lead to better financial outcomes for consumers, insurers, and lenders. As all this momentum and hype mounts for machine learning in regulated financial services, we hope that Parts 1 and 2 of this post will be helpful for those looking to responsibly transition from GLM to GBM.
Translated from: https://towardsdatascience.com/from-glm-to-gbm-part-2-7045e3fd52a2