Hesitant Customers and Where to Find Them: Reduce Shopping Cart Abandonment with Hesitant...

Original article: https://blog.rosetta.ai/hesitant-customers-and-where-to-find-them-reduce-shopping-cart-abandonment-with-hesitant-279e41f7ef7e

Shopping cart abandonment has long been an issue for E-Commerce. According to a survey, 75% of online shoppers tend to add things to their shopping cart and never check out. To address this challenge, one solution is to identify hesitant customers and offer them deals or coupons that encourage them to check out their cart. In this article, we focus on the data and modeling perspectives of this challenge. We discuss how it can be formulated as a set of Machine Learning problems, and we talk about the data collection and modeling components.

Task Definition

What is a hesitant customer? Within the scope of our problem, hesitant customers are those who have a hard time deciding which items to buy, or whether to buy at all. Given the right incentive, they are able to finalize their buying decision and are willing to check out their carts. To narrow the problem down for easier modeling, we define hesitant customers as:

Customers who click a pop-up deal AND eventually complete a transaction using that deal.
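
As a minimal sketch of how this definition could be turned into a training label, the snippet below marks a session as positive only when the deal that was clicked is also the deal used at checkout. The `Session` class and its field names are assumptions made for illustration; they are not taken from the original system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """One browsing session; all field names are illustrative assumptions."""
    clicked_deal_id: Optional[str]   # deal the customer clicked in a pop-up, if any
    checkout_deal_id: Optional[str]  # deal applied at checkout, if any


def is_hesitant(session: Session) -> bool:
    """Positive label: a pop-up deal was clicked AND the same deal was used at checkout."""
    return (
        session.clicked_deal_id is not None
        and session.clicked_deal_id == session.checkout_deal_id
    )
```

Under this sketch, a customer who clicks a deal but never checks out, or who checks out without the deal, is labeled negative.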

The bigger problem of reducing the shopping cart abandonment rate can be boiled down to two tasks:

Hesitant Behavior Identification
Deals Matching

Hesitant Behavior Identification

The first task is to recognize hesitant behavior given a sequence of interactions between an end user and the browser. These interactions include actions that typical recommender systems rely on, such as clicking on items, viewing items, and adding items to the cart. More advanced actions that are crucial for our purpose, such as switching between tabs, should also be taken into account. The task is formalized as binary classification, where each sequence of interactions is associated with a binary label indicating whether the sequence is considered hesitant.
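
To make the formulation concrete, here is one possible way to represent a training example: the interaction sequence is encoded as integer ids and paired with a binary label. The action names, the vocabulary, and the padding scheme are assumptions chosen for illustration, not the encoding used in production.

```python
from typing import Dict, List, Tuple

# Hypothetical action vocabulary; id 0 is reserved for padding.
ACTION_TO_ID: Dict[str, int] = {
    "<pad>": 0,
    "click_item": 1,
    "view_item": 2,
    "add_to_cart": 3,
    "switch_tab": 4,
}


def encode_sequence(actions: List[str], max_len: int) -> List[int]:
    """Map action names to ids, keep the most recent max_len actions, left-pad with 0."""
    ids = [ACTION_TO_ID[a] for a in actions][-max_len:]
    return [0] * (max_len - len(ids)) + ids


# One training example = (encoded sequence, binary label).
example: Tuple[List[int], int] = (
    encode_sequence(["view_item", "switch_tab", "view_item", "add_to_cart"], max_len=6),
    1,  # 1 = hesitant, 0 = not hesitant
)
```

Keeping only the most recent actions is a design choice of this sketch; it reflects the intuition that recent behavior is the most informative, but other truncation strategies are equally plausible.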

Deals Matching

Each identified hesitant customer is matched with the deal that is most relevant to their interests. A deal consists of images and descriptive information. Intuitively, this task can be viewed as a ranking problem, where the most relevant deals should be given the highest scores. Yet, in some situations, pair-wise ranking losses, which aim to ensure that positive samples are scored higher than negative samples, can be hard to train. A proxy loss function, such as binary cross-entropy, can help ease the training process.
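
The difference between the two kinds of loss can be sketched as follows. The article does not say which pair-wise loss was tried, so the logistic (BPR-style) pair-wise loss below is an assumption; both functions take raw model scores (logits) and are written with a numerically stable softplus.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_logistic_loss(pos_score: float, neg_scores: List[float]) -> float:
    """Pair-wise ranking loss: mean of -log(sigmoid(pos - neg)) over all negatives."""
    return sum(softplus(-(pos_score - s)) for s in neg_scores) / len(neg_scores)


def binary_cross_entropy(scores: List[float], labels: List[int]) -> float:
    """Point-wise proxy: each (customer, deal) pair is an independent binary example."""
    return sum(softplus(s) - y * s for s, y in zip(scores, labels)) / len(scores)
```

The pair-wise loss needs a positive and a set of sampled negatives for every update, whereas the point-wise loss treats every scored pair independently, which is one reason it tends to be simpler to train.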

Evaluation Metrics

Hesitant Behavior Identification

Because the collected data is highly imbalanced (negative examples far outnumber positive ones), the most commonly used metric, accuracy, is not suitable for this task. The following metrics are more reasonable choices:

Area Under the ROC Curve (AUC): AUC depends only on how the predictions are ranked, not on the actual predicted scores. An AUC of 0.5 indicates performance no better than random guessing.
F-1 Score: F-1 takes both recall and precision into account, which also makes it a good candidate for imbalanced data.
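
A small sketch shows why accuracy is misleading here. The data is synthetic (95 negatives and 5 positives, an assumed ratio), and the functions are plain-Python reference implementations rather than the evaluation code used for the real system.

```python
from typing import List


def accuracy(labels: List[int], preds: List[int]) -> float:
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


def f1_score(labels: List[int], preds: List[int]) -> float:
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic imbalanced data: a model that always predicts "not hesitant".
labels = [0] * 95 + [1] * 5
always_negative = [0] * 100
constant_scores = [0.0] * 100
```

On this data the trivial model reaches an accuracy of 0.95, yet its F-1 score is 0.0 and its AUC is 0.5, which is what the two metrics above are meant to expose.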

Deals Matching

Since this task is closely related to information retrieval, rank-based metrics are suitable. Below we list a few of them:

Hit@k: For each instance, Hit@k checks whether the target item appears among the top k highest-scored items; the number of hits is then divided by the total number of instances.
Mean Reciprocal Rank (MRR): MRR can be viewed as an extension of Hit@k in which the quality of the hit is considered. Intuitively, a hit at a higher rank should be given more credit than a hit at a lower rank. MRR assigns each hit a score of 1/h, where h is the rank of the hit item, and averages these scores over all instances.
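
Both metrics can be sketched in a few lines. The snippet assumes one target deal per instance and uses the standard reciprocal rank 1/h for MRR; the deal ids and rankings are toy values.

```python
from typing import List, Optional


def rank_of_target(ranked_items: List[str], target: str) -> Optional[int]:
    """1-based rank of the target in a ranked list, or None if it is absent."""
    return ranked_items.index(target) + 1 if target in ranked_items else None


def hit_at_k(rankings: List[List[str]], targets: List[str], k: int) -> float:
    """Fraction of instances whose target appears in the top k."""
    hits = 0
    for ranked, target in zip(rankings, targets):
        rank = rank_of_target(ranked, target)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(targets)


def mean_reciprocal_rank(rankings: List[List[str]], targets: List[str]) -> float:
    """Average of 1/rank over all instances; a missing target contributes 0."""
    total = 0.0
    for ranked, target in zip(rankings, targets):
        rank = rank_of_target(ranked, target)
        if rank is not None:
            total += 1.0 / rank
    return total / len(targets)


# Toy example: the target deal "a" is ranked 1st, 2nd, and 3rd respectively.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
```

Here Hit@2 counts the first two instances equally, while MRR rewards the first instance (rank 1) twice as much as the second (rank 2).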

Business Side Consideration

From the business perspective, we tend to look at conversion rate and revenue in A/B tests when comparing new and old algorithms. It's always a good idea to collaborate closely with the business department and make sure that the new technology helps the company work towards its long-term goals.
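
One common way to compare conversion rates between the old and the new algorithm is a two-proportion z-test. The article does not state which statistical test was used, so the sketch below is an assumption, and the function name is illustrative.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rate B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value
```

For example, `two_proportion_z_test(200, 10000, 260, 10000)` compares a hypothetical control group A with a hypothetical treatment group B; the counts are made up for illustration. Revenue per visitor would need a different test, since it is not a proportion.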

Data Collection

In the beginning, we did not have annotated data for either task. Adopting reinforcement learning might be a solution, but we expected its effect (successful identification of hesitant customers) to be much worse than that of expert-defined rules. Thus, we collaborated with E-Commerce experts to devise a set of rules for determining when to provide deals. With these rules, our front-end system pops up deals on the web pages of our customers if any of the rules is triggered. This lets us collect data about whether a customer clicks a deal and later checks out their cart with that deal. After we had collected enough data (~10K positive click-deal interactions), we started switching to an ML-based identification model, while continuing to collect the same data.
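
The rule-based bootstrap stage could look roughly like the sketch below. The actual expert rules are not published in the article, so the two rules, their thresholds, and the `SessionState` fields are entirely hypothetical; only the "pop up a deal if any rule fires" logic comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SessionState:
    """Snapshot of a live session; field names are illustrative assumptions."""
    seconds_on_page: float
    tab_switches: int
    items_in_cart: int


Rule = Callable[[SessionState], bool]

# Hypothetical rules with made-up thresholds (NOT the experts' real rules).
RULES: List[Rule] = [
    lambda s: s.items_in_cart > 0 and s.seconds_on_page > 120,
    lambda s: s.tab_switches >= 3,
]


def should_pop_up_deal(state: SessionState, rules: List[Rule] = RULES) -> bool:
    """Pop up a deal if ANY rule is triggered."""
    return any(rule(state) for rule in rules)
```

Keeping the rules as a plain list of predicates makes it straightforward to later replace `should_pop_up_deal` with a call to a learned model while logging the same outcome data.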

Intuitively, we want to model the continuous behavior of end users. Yet, tracking interactions at too fine a granularity produces noisy data that can harm model performance. We therefore discretized the behavior and carefully selected a salient set of additional data to collect, including:

The amount of time each customer spent on each web page.
The timestamps at which customers switched between tabs.
The interactions between customers and the web page (click, add to cart, checkout, etc.).
The items that each customer viewed in each session.
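
As an illustration of how such discrete signals might be derived from a raw event log, the sketch below computes the time spent per page within one session. The event tuple layout, the event type names, and the URLs are assumptions; the real tracking schema is not described in the article.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (timestamp in seconds, event type, page URL); this layout is an assumption.
Event = Tuple[float, str, str]


def time_per_page(events: List[Event]) -> Dict[str, float]:
    """Sum the gaps between consecutive events, crediting each gap to the earlier event's page."""
    ordered = sorted(events)
    totals: Dict[str, float] = defaultdict(float)
    for (t0, _, page), (t1, _, _) in zip(ordered, ordered[1:]):
        totals[page] += t1 - t0
    return dict(totals)


events: List[Event] = [
    (0.0, "page_view", "/item/1"),
    (30.0, "add_to_cart", "/item/1"),
    (45.0, "page_view", "/cart"),
    (60.0, "checkout", "/cart"),
]
```

For the sample log above, `time_per_page(events)` credits 45 seconds to `/item/1` and 15 seconds to `/cart`. The last event of a session has no successor, so this sketch assigns it no dwell time.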

Modeling

Hesitant Behavior Identification

As discussed earlier, at the beginning of the data collection process we rely on expert knowledge to identify customers who might be interested in clicking pop-up deals. In the next stage, we model the task as a temporal prediction problem that takes into account user behavior data from recent sessions. Sequential models, such as RNNs and attention-based models, are natural strong baselines for this task.
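
To show what a sequential baseline computes, here is a minimal, untrained Elman-style RNN forward pass in plain Python. It assumes that actions have already been mapped to integer ids (a hypothetical encoding); the class name, sizes, and random initialization are illustrative, and a real model would be trained with a deep learning framework rather than written by hand.

```python
import math
import random
from typing import List


class TinyRNNClassifier:
    """Untrained Elman RNN: embeds action ids, updates a hidden state, outputs P(hesitant)."""

    def __init__(self, n_actions: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def matrix(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        self.embed = matrix(n_actions, hidden)  # one input vector per action id
        self.w_hh = matrix(hidden, hidden)      # hidden-to-hidden weights
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def predict_proba(self, action_ids: List[int]) -> float:
        h = [0.0] * self.hidden
        for action in action_ids:
            x = self.embed[action]
            # h_t = tanh(x_t + W_hh @ h_{t-1})
            h = [
                math.tanh(x[i] + sum(self.w_hh[i][j] * h[j] for j in range(self.hidden)))
                for i in range(self.hidden)
            ]
        logit = sum(w * v for w, v in zip(self.w_out, h))
        return 1.0 / (1.0 + math.exp(-logit))


model = TinyRNNClassifier(n_actions=5, hidden=4)
probability = model.predict_proba([2, 4, 2, 3])  # e.g. view, switch tab, view, add to cart
```

Because the weights are random, the output is not meaningful as a prediction; the point is only that the final hidden state summarizes the whole sequence before the binary decision is made.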

Deals Matching

Because there is little training data in the early stage for finding the best deal to pop up, we first develop a content-based model that relies largely on content data (images, deal information, etc.) to retrieve the most relevant deals for each customer. Later on, we gradually refine the model into a hybrid one that balances content data with collected user behavioral data.
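
The move from a content-based to a hybrid model can be sketched as a weighted blend of two scores. The cosine similarity over feature vectors, the `alpha` weight, and the `behavior_scores` input are assumptions made for illustration; the article does not describe how the two signals are actually combined.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_deals(
    customer_vec: List[float],
    deal_vecs: Dict[str, List[float]],
    behavior_scores: Dict[str, float],
    alpha: float = 1.0,
) -> List[str]:
    """alpha=1.0 is purely content-based; lower alpha shifts weight to behavior-based scores."""

    def score(deal_id: str) -> float:
        content = cosine(customer_vec, deal_vecs[deal_id])
        return alpha * content + (1 - alpha) * behavior_scores.get(deal_id, 0.0)

    return sorted(deal_vecs, key=score, reverse=True)


# Toy vectors: the customer's content profile matches "shoes", but behavior data favors "hats".
customer = [1.0, 0.0]
deals = {"shoes": [1.0, 0.0], "hats": [0.0, 1.0]}
behavior = {"hats": 1.0}
```

With `alpha=1.0` the toy customer is shown `shoes` first; with `alpha=0.2` the behavior signal dominates and `hats` moves to the top. Starting at `alpha=1.0` and lowering it as behavioral data accumulates is one way to realize the gradual refinement described above.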

Conclusion

This blog post has highlighted the importance of hesitant customer detection and how we boil it down to two sub-tasks. We talked about how our models are evaluated, how the data is collected, and the modeling components. In the next blog post of this series, I plan to release the evaluation results and some analysis. Till next time :)

For more information about Rosetta.ai, please visit our website or our FB fan page.