Boosting Insight Rates with Optimization Machine Learning

Latest recommended article published on 2023-07-07 19:33:29

weixin_26720753

Original article: https://medium.com/capital-one-tech/boosting-insight-rates-with-optimization-machine-learning-c6da434f42c

In the machine learning era, it’s tempting to apply complex algorithms to every user experience to maximize returns. ML is a powerful tool with wide applicability in optimization sciences. You’re waiting for the ‘but’ and I won’t disappoint. But there is a framework that can help optimize both your outcomes and the effort applied to extract them, focusing your machine learning efforts on the most valuable problems.

As an experimentation product leader, I’ve run hundreds of experiments and machine learning-optimized campaigns over the last 15 years and am relentlessly driven to find ways of maximizing return on every optimization dollar. With the explosion of e-commerce businesses over the last decade, the arms race to the most sophisticated machine learning system to achieve advantage is a logical one, but have you considered the time and expertise to build and manage it? How do you decide where and when to deploy machine learning?

Enter the Levels of Intelligence Framework. It’s a lightweight process I use in my work to help test and learn more efficiently by matching the right method to each optimization project. In this post I’ll be covering experiment prioritization, an inventory of your testing methods and how to select the right mix of tools to achieve your business outcomes.

In 10 minutes you’ll have the tools to:

Democratize experiment prioritization, ensuring a balance of diverse ideas and valuable business opportunities
Use a simple, extensible equation to efficiently tackle complex experimentation designs
Apply a framework to match your hypotheses with the right experimental method for the job

A Simple Example to Set Up the Problem

Imagine for a moment that you are building a landing page for your email recipients during your big fourth-quarter new product campaign. You know the window of time is small (30 days), the audience is finite, and your ability to react quickly to market signals is critical. Would you build a custom model to understand behavior, apply past segment analysis to split tests, or write rules to govern delivery of this experience?

Figure: Anatomy of a landing page (screenshot of an example landing page with a stock art header photo, blank squares as image placeholders, and black text)

To Build Your Framework, First Clearly Understand the Scope

If this landing page is well-trafficked, converts users effectively, and contains rich user data, we may use a different approach than one that is more uncertain, less valuable, and potentially has a longer timeframe to reach its potential. Ask yourself the following questions to begin building your own levels of intelligence framework:

What do I already know about the experience I’m optimizing?

Do I have historical patterns with rich segment-based information?
Is this similar to past experiences we’ve delivered?
Do I have learnings from past A/B tests?

How much time do I have to develop this experience?

Am I timeboxed by an offer date?
Are market conditions changing rapidly?
Is this a response to viral marketing or a crisis?

How valuable is a win in this space?

Is it a significant portion of this quarter’s revenue?
Do I have other experiences of equal or greater value?
What are the consequences of poor performance?

What capabilities do I have at my disposal for optimization?

Are we able to segment audiences and run an A/B test?
Can I deliver rules-based personalization based on past insights?
Do I have a content recommender or bandit-optimization system?

For the optimizers in the audience, you’re probably thinking this looks a lot like the PIE framework, and you’d be right. PIE stands for Potential, Importance, and Effort, and it forms the base layer for intelligent optimizations. With PIE, we are able to determine which experiments are worth running and rank-order them through the scoring system. By adding in our available intelligence capabilities, we can match the right level of effort to each experience.

Next, Implement Scoring for Potential and Importance

Every organization uses different methods for scoring potential and importance, but the key is that, whether you use 1–10, 1–5, smiley face emojis or High, Medium, Low, you make your voting democratic. Great optimization teams consist of engineers, product managers, designers and, potentially, data analysts/scientists. Make data available to all parties and have each of them vote independently. Tally those scores and find your weighted averages.
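
The tallying step can be sketched in a few lines of Python. This is a minimal illustration rather than the scoring system used in the article: the test ideas, the voter roles, the 1–10 scale, the category weights, and the convention that a higher effort score means less effort are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical independent votes: idea -> voter -> (potential, importance, effort).
# Assumed scale: 1-10, where a higher effort score means LESS effort required.
votes = {
    "hero": {"engineer": (8, 9, 4), "product": (9, 9, 5), "design": (7, 9, 6)},
    "benefits": {"engineer": (4, 3, 8), "product": (5, 4, 7), "design": (3, 2, 9)},
    "call_to_action": {"engineer": (7, 8, 7), "product": (8, 8, 6), "design": (6, 8, 8)},
}

# Assumed category weights; adjust to whatever your team agrees on.
WEIGHTS = (0.4, 0.4, 0.2)  # potential, importance, effort


def pie_score(idea_votes, weights=WEIGHTS):
    """Average each category across voters, then take the weighted sum."""
    category_means = [mean(vote[i] for vote in idea_votes.values()) for i in range(3)]
    return round(sum(w * m for w, m in zip(weights, category_means)), 2)


# Rank ideas from highest to lowest weighted score.
ranked = sorted(((idea, pie_score(v)) for idea, v in votes.items()),
                key=lambda pair: pair[1], reverse=True)
for idea, score in ranked:
    print(f"{idea}: {score}")
```

Averaging per category before weighting keeps each voter's influence equal, which is one simple way to keep the vote democratic.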

Figure: scoring table with blue gridlines and a blue header row with white text. The scoring in this example is for illustration purposes only.

In this example, you can see that each test idea has two core rating outcomes. Along the X axis you can see how each team rates the hypothesis across PIE, and along the Y axis you can see the rating for each category.

This works especially well when you have many ideas and locations you’d like to test, but what about that time-boxed landing page of ours from the example? In that case we know we’re going to optimize it, but are not sure how to use our available time window effectively.

Let’s go back to our landing page anatomy and apply our initial scoring model against those areas.

Now that we have broken down the experience into separate priorities, we can look at the levels of intelligence through the lens of effort vs. reward. If you are already using a sophisticated AutoML capability that takes in hundreds of features and context, congratulations, you may be able to solve most problems with your current pipeline. For everyone else, we will need to review capabilities against cost and potential gains.

Below is a generic inventory of testing methods/intelligence services a team might use to determine the right application of effort and risk against an experience.

Figure: Levels of Intelligence framework (slide showing 3 levels of intelligence in black and blue text)

As you can see in the chart above, we are not making discrete claims of value for each level of intelligence. Oftentimes a rules-based optimization based on past behavior could extract 95% of the value of a sophisticated deep learning algorithm at only a fraction of the cost. It can be tempting to overpower your objectives with technology, but experience testing multiple methods will help you learn which is appropriate for the situation and size of outcome.

A General Example of the Framework in Action

In the general example to follow, we’ll use generic data, acquisition rates and importance scores to understand how to weigh our choices around where and when to use each level of intelligence.

In this hypothetical example, we’ve determined that our landing page URL will be sent to 2 million of our finest customers. We’ve already segmented our total customer set into the top two deciles of likelihood to respond and that gives us a round number of 2 million emails. Let’s assume we do not A/B test or further optimize the email itself in this example, although you better believe we would in the real world.

Let’s say our historical estimates assume a 0.5–1% CTR from the email to the landing page. That gives us 10,000–20,000 visitors likely to land on that page. Additionally, let’s say the product we’re selling is worth approximately $100 and has a gross margin of 40%. That means $40 in incremental gross profit for every conversion. If we believe we’re amazing and can convert 10% based on similar campaigns, that gives us a net range of $40,000–80,000 for our potential outcome. Depending on our methods, we may be able to push that percentage a few points up or down.
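
The arithmetic above fits in one small function, which makes it easy to rerun with your own numbers. This is a sketch only: the function and parameter names are made up for illustration, and the inputs are the figures from this example.

```python
def opportunity_range(emails, ctr_low, ctr_high, price, gross_margin, conversion_rate):
    """Return the (low, high) gross-profit range for a campaign."""
    margin_per_sale = price * gross_margin  # $100 * 40% = $40
    low = emails * ctr_low * conversion_rate * margin_per_sale
    high = emails * ctr_high * conversion_rate * margin_per_sale
    return low, high


# Figures from the example: 2 million emails, 0.5-1% CTR,
# a $100 product at 40% gross margin, and a 10% conversion rate.
low, high = opportunity_range(
    emails=2_000_000, ctr_low=0.005, ctr_high=0.01,
    price=100, gross_margin=0.40, conversion_rate=0.10,
)
print(f"${low:,.0f} to ${high:,.0f}")
```

Changing `conversion_rate` by a point or two shows how sensitive the range is to the lift you expect from optimization.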

This is where our levels of intelligence come in. We’ve decided to run the campaign and we know a general range of its opportunity size. We have three zones on the page we can optimize. How do we determine methods for each zone, if at all?

The Hero (50%)

Figure: example hero image on a landing page, with blue callout circles directing to black text

Our lead-in space helps set the mood and drives customers down the page with a draw that could be emotional, conscious or aspirational in tone. It’s often one of the most valuable methods for driving conversion. Since our net upside here is around $40k, how much effort and complexity should we apply? Knowing we have a relatively small audience and time window, does that change your decision?

If I think that the hero is worth 50% of the total optimization value, I would look for the balance of intelligence that gives the best return. A multi-armed bandit from intelligence level 2 could be viable here if you have a long enough time window and your bandit can start updating traffic allocations within a week. You could use 3–5 highly diverse creatives to drive to a single winning creative for your pre-compiled segment. This is somewhat expensive and complex, but reasonable if your systems allow.
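
To make the bandit idea concrete, here is a minimal sketch of one common multi-armed bandit strategy, Beta-Bernoulli Thompson sampling. The article does not say which bandit algorithm is used, so the choice of Thompson sampling, the creative names, and the simulated conversion rates are all assumptions for illustration.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of creatives."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # [alpha, beta] per arm, starting from a uniform Beta(1, 1) prior.
        self.params = {arm: [1, 1] for arm in arms}

    def choose(self):
        # Draw a plausible conversion rate for each arm and show the best draw.
        draws = {arm: self.rng.betavariate(a, b) for arm, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, arm, converted):
        # A conversion increments alpha; a non-conversion increments beta.
        self.params[arm][0 if converted else 1] += 1


def simulate(true_rates, visitors, seed=0):
    """Send simulated visitors through the bandit and count impressions per arm."""
    bandit = ThompsonBandit(true_rates, seed=seed)
    world = random.Random(seed + 1)
    pulls = {arm: 0 for arm in true_rates}
    for _ in range(visitors):
        arm = bandit.choose()
        pulls[arm] += 1
        bandit.update(arm, world.random() < true_rates[arm])
    return pulls


# Hypothetical true conversion rates for three hero creatives.
TRUE_RATES = {"creative_a": 0.06, "creative_b": 0.10, "creative_c": 0.16}
pulls = simulate(TRUE_RATES, visitors=10_000)
print(pulls)
```

In the simulation, traffic shifts toward the stronger creative as evidence accumulates, which is the behavior that makes a bandit attractive when the time window is short. A production system would also need to handle delayed conversions and batch updates, which this sketch ignores.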

If you have automated systems for recommendations in place and deep enough knowledge about your prospects (you targeted them directly with an email), try for level 3 intelligence and track your performance lift above other methods. Only through experience will you develop the right patterns for your unique user experiences and customers.

The Benefits (15%)

Figure: example image and paragraph placeholders on the example landing page, with blue callout circles directing to black text

The benefits section exists to confirm user intent to purchase, answer questions and highlight use cases that are desirable to your customer. Historically, you see this section having a 15% impact on net sales. We’re already running a MAB on our hero so what might we do here to avoid unneeded complexity, but still get value?

One approach might be to do nothing. The complexity of a multivariate test analysis is high and the relative value of the zone is low. If your other 2 zones are well crafted, applying no intelligence is often the right answer. Detecting a measurable effect here is likely going to be difficult with the given audience and level of impact. We could run a simple A/B test, but our time might be best spent on zones 1 and 3, where 85% of the impact lives.
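
A rough sample-size check shows why a measurable effect is hard to detect in this zone. The sketch below uses the standard normal-approximation formula for comparing two proportions; the 5% significance level, 80% power, and the assumed lift from 10% to 10.5% conversion are illustrative assumptions, not figures from the article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_arm(p_base, p_variant, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_base + p_variant) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p_base * (1 - p_base) + p_variant * (1 - p_variant))
    ) ** 2
    return ceil(numerator / (p_base - p_variant) ** 2)


# Assumed small lift for a low-impact zone: 10% -> 10.5% conversion.
small_lift = visitors_per_arm(0.10, 0.105)
# Assumed larger lift for comparison: 10% -> 12% conversion.
large_lift = visitors_per_arm(0.10, 0.12)
print(small_lift, large_lift)
```

Under these assumptions the small lift needs far more visitors per arm than the 10,000–20,000 total visitors expected for the whole page, which supports leaving this zone alone.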

The Call to Action (35%)

Figure: CTA banner on the example landing page, with blue callout circles directing to black text

Now we’re at the heart of things. This is where the visitor decides to buy or abandon (ignoring the shopping cart element). There is often a form, button, etc. that leads the user through the signup or acquisition process, and the colors, shapes, design and visuals can either confirm the intent to purchase or lead to abandonment.

We often spend much time here, tailoring button colors, shapes and CTA language. If you determine that your CTA is valuable (35% impact in our example), then you may be looking at something in the level 2 intelligence range. A combination of known cohorts, rules or more sophisticated testing methods like multi-armed bandits is available. In our example the dollar value is relatively low and our time is short, so I’d likely opt for rules-based optimization that uses past behavior to build cohorts and deliver experiences in a more targeted A/B test.

In any case, you’re matching your opportunity size, cost to execute and available technologies against each experience to bring balance to the testing force.
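
One way to keep opportunity size in view for each zone is to split the campaign range by the zone's share of impact. The sketch below uses the 50/15/35 split and the $40,000–80,000 range from this example; the function and variable names are made up for illustration.

```python
# Share of total optimization value per zone, in percent (from the example above).
ZONE_SHARE_PCT = {"hero": 50, "benefits": 15, "call_to_action": 35}


def value_by_zone(low, high, shares=ZONE_SHARE_PCT):
    """Split a (low, high) dollar range across zones by their percentage share."""
    return {zone: (low * pct // 100, high * pct // 100) for zone, pct in shares.items()}


zones = value_by_zone(40_000, 80_000)
for zone, (zone_low, zone_high) in zones.items():
    print(f"{zone}: ${zone_low:,} to ${zone_high:,}")
```

Comparing each zone's range with the cost of the method you have in mind makes the trade-off explicit before any build work starts.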

Conclusion

Building your design of experiments, like most things in life, is all about preparation. This process is designed to help reduce waste so that you can run more and better tests in the future, maximizing your insight rate.

Evaluate the overall priority of your hypothesis/testing area with a method like PIE, and ensure all parties can participate fairly in the rating system
Review the business performance of each area of your experience to determine where effort is best spent for ROI
Apply the levels of intelligence framework against those identified opportunities to minimize the cost of testing, while balancing the value extracted from the experiment or optimization

With this process in your toolkit, continue to review your testing velocity, win rates, cost per experiment and overall ROI of your testing program. There’s nothing better than having data to support your decisions, not just about which variation is winning, but about how you arrived at the testing methods and hypotheses themselves.

**Background vector created by GarryKillian — www.freepik.com

Originally published at https://www.capitalone.com.

DISCLOSURE STATEMENT: © 2020 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.