行销(Marketing)预测生命周期价值 (CLV)
在营销中,对营销活动的预算始终是挑战。我们不想花太多钱而导致ROI下降。但是,我们也不想花费太多,也没有明显的影响或结果。在确定营销策略的预算时,至关重要的是要知道运行给定的营销活动会带来多少预期回报。了解单个客户的客户生命周期价值(CLV)可以帮助营销人员证明其营销预算的合理性,并定位潜在的高价值客户。通常,获取新客户要比保留现有客户更为昂贵,因此,为了建立具有正ROI的营销策略,了解生命周期价值和与获取新客户相关的成本至关重要。例如,如果客户的平均CLV为100元,而获得新客户仅需花费10元,那么在获得新客户时,我们的业务就会产生更多的收入。
有多种方法可以计算CLV。一种方法是找到客户的平均购买金额,购买频率和寿命,并进行简单的计算以获得CLV。例如,考虑一个假设的案例,客户的平均购买金额为100,并且他/她平均每月进行五次购买。则该客户每月的平均价值为500美元,这就是平均购买金额乘以平均购买频率。现在,我们需要知道该客户的生命周期。估算客户生命周期的一种方法是查看平均每月客户流失率,即客户离开并终止与公司的业务关系的客户所占的百分比。您可以用客户流失率除以客户的生命周期。假设在我们假设的情况下,客户流失率为5%,则估计客户的生命周期为20年 (1/5%)。给定该客户每月的平均价值为500元,寿命为20年,则该客户的CLV为120,000。最终的CLV金额是通过将500(每月的平均值)乘以12个月以及20年的寿命来计算的。
在本文中,我会建立一个回归模型来预测连续的变量CLV。我会使用一个在线零售的数据集,仍然是来自Kaggle,数据集是OnlineRetail.csv 。
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
for filename in filenames:
print(os.path.join(dirname, filename))
# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session
/kaggle/input/onlineretail/OnlineRetail.csv
Load Packages
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, median_absolute_error
%matplotlib inline
Load the data
df=pd.read_csv(r"../input/onlineretail/OnlineRetail.csv", encoding="cp1252")
df.head(3)
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
df.shape
(541909, 8)
Data Clean-Up
Negative Quantity
df.loc[df['Quantity'] <= 0].shape
(10624, 8)
df = df.loc[df['Quantity'] > 0]
Missing CustomerID
df = df[pd.notnull(df['CustomerID'])]
Excluding Incomplete Month
print('Date Range: %s ~ %s' % (df['InvoiceDate'].min(), df['InvoiceDate'].max()))
Date Range: 1/10/2011 10:32 ~ 9/9/2011 9:52
df = df.loc[df['InvoiceDate'] < '2011-12-01']
Total Sales
df['Sales'] = df['Quantity'] * df['UnitPrice']
Per Order Data
orders_df = df.groupby(