

Relationship management is one of the determining factors in the health of a business. One of the most important aspects of that relationship is the ability to identify when a customer is likely to cancel a service, which is why it is necessary to take initiatives that maximize customer retention.

Projects that identify customers prone to churn have therefore become a frequent concern for organizations, as the cost of retention is usually lower than the cost of acquisition.


Although it has gained the attention of many companies, there is no magic formula for solving the churn problem. In addition, the solution can involve numerous complexities, such as identifying the reason for churn so that different retention strategies can be applied.

Challenges

Is the cost of acquiring new customers greater than the cost of retention?

It is essential to track the financial and strategic costs of acquiring and retaining customers, since for some companies the cost of acquisition may be 5x higher than the cost of retention.


What type of churn will be addressed?

It is important to note that churn for a product or service can occur in several ways, such as:


  1. Voluntary: when the customer chooses to cancel the service because of dissatisfaction or a preference for a competitor.

  2. Silent: when a customer stops using the service for a long period without generating any cost, as with a credit card that has no monthly fee.

  3. Involuntary: when the customer does not intend to cancel the service but, through negligence, ends up with a plan that is not renewed or is canceled because of irregular use, missed payments, or similar reasons.

How much do your experts know about the problem?

Knowing how skilled your team is matters when deciding whether the project can be executed internally or needs outside help. Tailored solutions and well-prepared professionals help overcome the challenges of the problem and produce rich, applicable results.

Do you have a database that allows you to extract information about the business and its customers?

A solid database makes the project far more feasible and leads to robust, reliable results. It is a fundamental step toward knowing your customers and, consequently, understanding how to map out and develop your solution, which brings us to the next question:

How well do you know your clients?

It is also necessary to diagnose how your actions affect customers. For that, you need to gather the information that defines their individual profile and behavior. This analysis is the key to identifying whether or not they are prone to churn.

Ways to solve

When it comes to solving the problem, the team of experts has a few more challenges to overcome. The first is combining technical knowledge with business understanding, since exploratory analysis and feature engineering must take the organizational model into account to be successful.

Once the features are consolidated and the business insights incorporated, it is time to start modeling. At this stage, you may encounter imbalanced data: when you split the base into people who churned and people who remained faithful to the service, you may find a far higher proportion of loyal customers.

The biggest problem with imbalanced data is that, if it is not addressed, machine learning algorithms tend to perform well only on the majority class. This produces many false negatives, because the model leans toward classifying customers who are likely to leave as loyal.

Techniques to deal with imbalanced data

At this point, it is necessary to apply techniques that address the imbalanced dataset and improve how customer behavior is filtered. Some of the most common are oversampling, undersampling, SMOTE, and ADASYN. None of them is a general-purpose fix, which is why each problem must be treated according to its specifics.

Undersampling and oversampling are the more elementary techniques: the first reduces the class with greater representation, and the second expands the class with less representation.
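
As a minimal sketch of these two ideas, the snippet below resamples a toy customer list using only the Python standard library. The function names and the toy data are hypothetical, and the sketch assumes the majority class is the larger one and that both classes should end up the same size, which is only one possible target ratio.

```python
import random

def random_undersample(majority, minority, seed=0):
    """Shrink the majority class to the size of the minority class."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)

def random_oversample(majority, minority, seed=0):
    """Grow the minority class to the size of the majority class by
    drawing existing minority rows again, with replacement."""
    rng = random.Random(seed)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra

# Hypothetical toy data: 8 loyal customers and 2 churners.
loyal = [{"id": i, "churn": 0} for i in range(8)]
churned = [{"id": 100 + i, "churn": 1} for i in range(2)]

under_loyal, under_churned = random_undersample(loyal, churned)
over_loyal, over_churned = random_oversample(loyal, churned)

print(len(under_loyal), len(under_churned))  # both classes have 2 rows
print(len(over_loyal), len(over_churned))    # both classes have 8 rows
```

Undersampling discards information about loyal customers, while oversampling repeats the same few churners, so either one should be applied to the training split only.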


SMOTE and ADASYN are more complex and create synthetic samples from the data. The two strategies are similar, but ADASYN uses a density distribution to create the synthetic elements.
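
To make the idea of synthetic samples concrete, here is a simplified SMOTE-style sketch in plain Python. It is not the reference implementation: the function name, the choice of k, and the two toy features are assumptions made for illustration. Each new point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours are searched inside the minority class only.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical churners described by (monthly_usage_hours, support_tickets).
churners = [(1.0, 4.0), (2.0, 5.0), (1.5, 3.0), (3.0, 6.0)]
new_points = smote_like(churners, n_new=6)
```

ADASYN follows the same interpolation step but generates more points around the minority samples that are surrounded by majority neighbours, which this sketch does not attempt.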

Understand the performance of your churn solution

The churn model must be built around the expected responses, with attention to performance and to how the output should be presented. When measuring model performance, it is important to choose the right evaluation metric. Accuracy, for example, can give a false impression of a stunning model when the result comes only from correctly classifying the majority class, in which there is no churn.
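
The accuracy trap can be seen with a few lines of Python. The numbers are hypothetical: a base of 95 loyal customers and 5 churners, and a naive "model" that always predicts the majority class.

```python
# Hypothetical base: 95 loyal customers (0) and 5 churners (1).
y_true = [0] * 95 + [1] * 5
# A naive "model" that always predicts the majority class.
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
accuracy = sum(t == p for t, p in pairs) / len(pairs)
churners_found = sum(t == 1 and p == 1 for t, p in pairs)
false_negatives = sum(t == 1 and p == 0 for t, p in pairs)

print(f"accuracy={accuracy:.2f}, found={churners_found}, missed={false_negatives}")
# prints: accuracy=0.95, found=0, missed=5
```

An accuracy of 95% looks excellent, yet every single churner was missed.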

[Image: credit Walber, via Wikipedia]

This evaluation can focus on how much the solution improves your current retention strategy. If we assume that retention actions are currently applied to randomly chosen clients, we can measure how much the sample indicated by the model improves the selection of clients prone to churn.
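
One way to express this comparison, offered here as an assumption rather than as the author's method, is the ratio usually called lift: the churn rate inside the model's selection divided by the churn rate expected from a random selection. The function name and the numbers below are hypothetical.

```python
def lift_over_random(flagged_outcomes, base_churn_rate):
    """Churn rate among flagged customers divided by the churn rate
    expected when customers are contacted at random."""
    hit_rate = sum(flagged_outcomes) / len(flagged_outcomes)
    return hit_rate / base_churn_rate

# Hypothetical numbers: 10% of all customers churn, and among the 20
# customers flagged by the model, 8 actually churned.
flagged = [1] * 8 + [0] * 12
lift = lift_over_random(flagged, base_churn_rate=0.10)
print(f"lift over random selection: {lift:.1f}x")
```

A lift well above 1 means the model's selection concentrates churners much better than a random campaign of the same size.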

Traditional evaluation metrics, like precision and recall, can also be quite useful. Precision is the number of correct churn indications over the total number of churn indications, while recall is the percentage of churning clients correctly classified over the total number of churners. Another metric is the F1-score, which can be described as:

F1 = 2 * (precision * recall) / (precision + recall)
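
The three metrics can be computed directly from true and predicted labels. The sketch below uses plain Python and hypothetical labels, where 1 means churn and 0 means loyal; the function name is illustrative.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the churn class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # churners correctly flagged
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # loyal customers flagged
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # churners missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 churners among 10 customers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # 3 of 4 flags are right, 3 of 4 churners found
```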

Understanding the results

To choose the metric to be used, it is crucial to understand the operational cost of retaining a customer relative to the expected future revenue from that customer (lifetime value, or LTV).


Customers with a high LTV may justify a higher retention expense, while customers with a low LTV may not justify the investment needed to retain them.
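
The trade-off can be sketched as a simple expected-value check. This is an illustrative assumption, not a rule from the article: the function, the save_rate parameter (the chance that the retention action works), and all the numbers are hypothetical.

```python
def retention_is_worthwhile(ltv, churn_probability, save_rate, action_cost):
    """Return True when the expected revenue preserved by the retention
    action exceeds what the action costs."""
    expected_saved_revenue = churn_probability * save_rate * ltv
    return expected_saved_revenue > action_cost

# Two hypothetical customers with the same churn risk and offer cost.
high_ltv = retention_is_worthwhile(
    ltv=2000, churn_probability=0.6, save_rate=0.3, action_cost=50
)
low_ltv = retention_is_worthwhile(
    ltv=100, churn_probability=0.6, save_rate=0.3, action_cost=50
)
print(high_ltv, low_ltv)  # True False
```

With the same risk and the same offer, only the high-LTV customer justifies the expense under these assumptions.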


Once the parameters for retaining a customer are known, the retention operation can be scoped, including how much tolerance there is for wrongly classified customers. This tolerance is directly related to the penalty for generating false positives, that is, loyal customers classified as churners.

If the cost of the retention operation is low, you can choose to flag more customers and thus capture most of the real churners, although this will produce more false positives. Conversely, if the cost is high, it is essential to focus on the precision of the selected group in order to avoid unnecessary expenses.

In classification models, a client is classified as a churner, by default, when the predicted probability of leaving the service is above 50%. This threshold can be changed to suit the business. For example, if higher precision is required, we can label as churn only the customers with a probability above 70%.
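
The effect of moving the threshold can be checked on a small set of predicted probabilities. The scores and labels below are hypothetical, and the helper names are illustrative.

```python
def flag_churners(probabilities, threshold=0.5):
    """Label a customer as churn (1) when the score exceeds the threshold."""
    return [int(p > threshold) for p in probabilities]

def precision_and_recall(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    flagged = sum(y_pred)
    churners = sum(y_true)
    precision = tp / flagged if flagged else 0.0
    recall = tp / churners if churners else 0.0
    return precision, recall

# Hypothetical model scores and the real outcome for ten customers.
scores = [0.95, 0.85, 0.75, 0.65, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
y_true = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

p50, r50 = precision_and_recall(y_true, flag_churners(scores, 0.5))
p70, r70 = precision_and_recall(y_true, flag_churners(scores, 0.7))
print(f"threshold 0.5: precision={p50:.2f}, recall={r50:.2f}")
print(f"threshold 0.7: precision={p70:.2f}, recall={r70:.2f}")
```

In this toy case, raising the threshold from 0.5 to 0.7 removes the false positives but also loses one real churner, which is the trade-off described above.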

[Image: credit Sin-Yi Chou, via GitHub]

The model

The expected output can influence the strategy used to solve the problem. In addition to classification algorithms, which give binary responses, there are approaches that use survival and hybrid models. Survival analysis models do not classify customers as prone to churn or not. Instead, they produce a curve that can be used to track each client's probability of churning over time.
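
To show what such a curve looks like, the sketch below implements the Kaplan-Meier estimator, a standard survival estimator chosen here for illustration and not named in the article. It produces one curve for the whole base; a per-customer curve would need a model that uses customer features. The data are hypothetical.

```python
def kaplan_meier(durations, churned):
    """Return [(time, survival_probability)] at each time a churn occurred.

    durations: months each customer was observed.
    churned:   1 if the customer churned at that time, 0 if the customer
               was still active when observation ended (censored).
    """
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        at_risk = sum(d >= t for d in durations)
        events = sum(d == t and c == 1 for d, c in zip(durations, churned))
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical observation of eight customers.
durations = [2, 3, 3, 5, 8, 8, 12, 12]
churned = [1, 1, 0, 1, 1, 0, 0, 0]

curve = kaplan_meier(durations, churned)
for month, probability in curve:
    print(f"month {month}: {probability:.3f} still active")
```

Each step of the curve is the estimated share of customers still active after that month, so the curve only moves down when a churn is observed.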

To handle survival analysis problems that involve complex, non-linear risk functions, models have been developed that extend binary classification and transform its results into survival analysis. These are known as hybrid models, and some examples are RF-SRC, DeepSurv, and WTTE-RNN.

Conclusion

In summary, churn modeling is vital for companies that want to retain customers and reduce costs. Its success depends on several aspects, ranging from knowledge of the customer base to the complexity and robustness of the model. If you have any questions, feel free to contact me!

Translated from: https://towardsdatascience.com/unraveling-churn-and-its-challenges-a207276ff4a9


