Case 1
Score each of the three RFM indicators on a 1-5 scale according to hand-picked rules, then compute the mean score for each indicator. Within each indicator, customers scoring above the mean are marked 1 and the rest 0.
Finally, the three binary layers combine to split the customers into 8 groups.
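The final 8-group split falls out of the cross product of the three 0/1 flags; a quick sanity check:

```python
from itertools import product

# Binarizing each of R, F, M to 0/1 yields 2**3 = 8 possible segments
segments = list(product([0, 1], repeat=3))
print(len(segments))  # 8
```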
1. Data format
Assume the data has already been cleaned, with the following format:
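The cleaned table itself is not reproduced here (it was presumably shown as an image). Judging from the code later in the post, it contains at least the following columns; the rows below are hypothetical examples:

```python
import pandas as pd

# Hypothetical example rows; the column names match those used by the code later in this post
customer_data = pd.DataFrame({
    '客户ID': ['C001', 'C002'],
    '最近一次购买天数': [12, 210],   # R: days since the last purchase
    '订单数': [8, 1],                # F: number of orders
    '总消费金额': [4350.0, 980.0],   # M: total spend (GBP)
})
print(customer_data.head())
```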
2. Distribution of the R, F, M indicators
Total spend
As the chart below shows, total spend is clearly right-skewed: about 70% of customers spent under £3,000, while roughly 10% spent more than £8,000. This matches everyday intuition.
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['font.size'] = 15
# The counts below were pre-computed from the dataset above, so we just use them directly.
lbs = ['1000英镑以内', '(1000, 3000]英镑', '(3000, 5000]英镑', '(5000, 8000]英镑', '(8000, 10000]英镑']
x= [2677, 2216, 837, 552, 685]
plt.figure(figsize=(6, 6))
plt.title('消费金额占比', fontsize = 18)
plt.pie(x, labels = lbs, autopct='%1.1f%%', colors = sns.color_palette('Blues'))
Purchase frequency
In this dataset, 1,493 customers (about 34.4%) made only a single purchase; the other 65.6% made repeat purchases during the period.
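The one-time-buyer share quoted above can be computed directly from the order-count column ('订单数', as used later in this post); a minimal sketch on toy data:

```python
import pandas as pd

# Toy stand-in for customer_data; the real post uses an order-count column named '订单数'
customer_data = pd.DataFrame({'订单数': [1, 1, 3, 5, 1, 2]})

one_time_share = (customer_data['订单数'] == 1).mean()
print(f'{one_time_share:.1%} of customers bought only once')  # 50.0% on this toy data
```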
Days since last purchase
This indicator is also clearly right-skewed.
count 4339.000000
mean 91.038258
std 100.010502
min -1.000000
25% 16.000000
50% 49.000000
75% 140.500000
max 372.000000
Name: 最近一次购买天数, dtype: float64
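The summary above looks like the output of pandas' describe(); note the minimum of -1 days, which hints at a data-quality issue worth checking before modeling. A minimal sketch of how such a summary is produced, on toy data:

```python
import pandas as pd

# Toy series standing in for customer_data['最近一次购买天数']
days = pd.Series([16, 49, 140, 372, -1], name='最近一次购买天数')
print(days.describe())  # count, mean, std, min, quartiles, max
```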
3. Building the model
3.1 Rule-based scoring of R, F, M
- Scoring rules
def days_score(x):
    # Recency: more recent purchases earn higher scores
    if x <= 30:
        score = 5
    elif (x > 30) and (x <= 90):
        score = 4
    elif (x > 90) and (x <= 180):
        score = 3
    elif (x > 180) and (x <= 365):
        score = 2
    else:
        score = 1
    return score

def frequency_score(x):
    # Frequency: more orders earn higher scores
    if x <= 10:
        score = 1
    elif (x > 10) and (x <= 30):
        score = 2
    elif (x > 30) and (x <= 50):
        score = 3
    elif (x > 50) and (x <= 80):
        score = 4
    else:
        score = 5
    return score

def revenue_score(x):
    # Monetary: higher spend earns higher scores
    if x <= 1000.0:
        score = 1
    elif (x > 1000) and (x <= 3000):
        score = 2
    elif (x > 3000) and (x <= 5000):
        score = 3
    elif (x > 5000) and (x <= 8000):
        score = 4
    else:
        score = 5
    return score
- Apply the scores along the three dimensions
customer_data['days_score'] = customer_data['最近一次购买天数'].apply(days_score)
customer_data['frequency_score'] = customer_data['订单数'].apply(frequency_score)
customer_data['revenue_score'] = customer_data['总消费金额'].apply(revenue_score)
customer_data.head()
3.2 Compute the mean score of each indicator
avg_r_score = customer_data['days_score'].mean()
avg_f_score = customer_data['frequency_score'].mean()
avg_m_score = customer_data['revenue_score'].mean()
print('平均最近一次消费时间间隔 R得分:', avg_r_score)
print('平均消费频次 F得分:', avg_f_score)
print('平均消费金额 M得分:', avg_m_score)
'''
平均最近一次消费时间间隔 R得分: 4.39525236229546
平均消费频次 F得分: 1.0778981332104172
平均消费金额 M得分: 1.60659138050242
'''
3.3 Layer by whether each score exceeds its mean
customer_data['R'] = customer_data['days_score'].apply(lambda x: 1 if x>round(avg_r_score, 1) else 0)
customer_data['F'] = customer_data['frequency_score'].apply(lambda x: 1 if x>round(avg_f_score, 1) else 0)
customer_data['M'] = customer_data['revenue_score'].apply(lambda x: 1 if x>round(avg_m_score, 1) else 0)
customer_data.head()
3.4 Generate labels from the layering result
customer_data.loc[((customer_data['R']==1) & (customer_data['F']==1) & (customer_data['M']==1)), 'customer_type'] = '重要价值客户'
customer_data.loc[((customer_data['R']==0) & (customer_data['F']==1) & (customer_data['M']==1)), 'customer_type'] = '重要保持客户'
customer_data.loc[((customer_data['R']==1) & (customer_data['F']==0) & (customer_data['M']==1)), 'customer_type'] = '重要发展客户'
customer_data.loc[((customer_data['R']==0) & (customer_data['F']==0) & (customer_data['M']==1)), 'customer_type'] = '重要挽留客户'
customer_data.loc[((customer_data['R']==1) & (customer_data['F']==1) & (customer_data['M']==0)), 'customer_type'] = '一般价值客户'
customer_data.loc[((customer_data['R']==0) & (customer_data['F']==1) & (customer_data['M']==0)), 'customer_type'] = '一般保持客户'
customer_data.loc[((customer_data['R']==1) & (customer_data['F']==0) & (customer_data['M']==0)), 'customer_type'] = '一般发展客户'
customer_data.loc[((customer_data['R']==0) & (customer_data['F']==0) & (customer_data['M']==0)), 'customer_type'] = '流失客户'
customer_data.head()
4. Inspect segment sizes
customer_data.customer_type.value_counts()
customer_data.customer_type.value_counts().plot(kind = 'pie')
# Zoom in on churned customers for further analysis
customer_data[customer_data['customer_type']=='流失客户']
Case 2
Split each of R, F, M into two equal-depth bins and extract the middle boundary value (threshold), then binarize the bin assignments to 0 and 1.
Finally, combine the three binary layers to split the customers into 8 groups.
The advantage is that no hand-picked scoring rules are needed, and right-skewed data causes no trouble. The trade-off of equal-depth binning is that each individual indicator is always split 50:50.
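The qcut call used below returns the bin edges when retbins=True; for 2 equal-depth bins the middle edge is simply the median, regardless of how skewed the data is. A quick illustration on toy data:

```python
import pandas as pd

# A strongly right-skewed toy series
s = pd.Series([1, 2, 2, 3, 4, 5, 8, 20, 50, 400])

# Equal-depth split into 2 bins: the middle bin edge is the median,
# so each bin holds half the data
threshold = pd.qcut(s, 2, retbins=True)[1][1]
print(threshold)  # 4.5, the median of s
```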
1. Base data format
2. Compute the R, F, M indicators
2.1 F reflects the customer's preference for discounted products: interest
- Note that in this example F is the proportion of discounted orders, not the overall purchase frequency
import pandas as pd

F = trad_flow.groupby(['cumid', 'type'])[['transID']].count()
display(F.head())
F_trans = pd.pivot_table(F, index='cumid', columns='type', values='transID')
# print(F_trans.head())
F_trans['Special_offer'] = F_trans['Special_offer'].fillna(0)
# print(F_trans.head())
F_trans["interest"] = F_trans['Special_offer'] / (F_trans['Special_offer'] + F_trans['Normal'])
F_trans.head()
- Indicator distribution
As the chart below shows, the share of discounted orders per customer is clearly right-skewed.
F_trans['interest'].plot(kind='hist',bins=20,figsize=(15,6))
plt.title('interest')
2.2 M reflects the customer's total spend: value
A customer's total spend is the sum of spend across all order types, so a simple sum is enough.
M = trad_flow.groupby(['cumid', 'type'])[['amount']].sum()
display(M.head())
M_trans = pd.pivot_table(M, index='cumid', columns='type', values='amount')
M_trans['Special_offer'] = M_trans['Special_offer'].fillna(0)
M_trans['returned_goods'] = M_trans['returned_goods'].fillna(0)
M_trans["value"] = M_trans['Normal'] + M_trans['Special_offer'] + M_trans['returned_goods']
M_trans.head()
- Data distribution
By contrast, total spend in this dataset shows no obvious right skew; the distribution is close to normal.
M_trans['value'].plot(kind='hist',bins=20,figsize=(15,6))
plt.title('value')
2.3 R reflects whether the customer has gone silent: time_new
- Converted to a Unix timestamp here so the qcut function can bin it later
# First clean up the time column in the dataset
# Define a function that converts the text into a timestamp
import time
def to_time(t):
    out_t = time.mktime(time.strptime(t, '%d%b%y:%H:%M:%S'))  # convert to a timestamp so qcut can bin it later
    return out_t
trad_flow["time_new"] = trad_flow['time'].apply(to_time)
R = trad_flow.groupby(['cumid'])[['time_new']].max()
R.head()
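The format string '%d%b%y:%H:%M:%S' matches SAS-style datetimes; the example value below is hypothetical:

```python
import time

# Parse a SAS-style datetime string into a struct_time
t = time.strptime('14JUN09:17:58:34', '%d%b%y:%H:%M:%S')
print(t.tm_year, t.tm_mon, t.tm_mday)  # 2009 6 14
```

Note that time.mktime interprets the struct in the local timezone, so the absolute timestamps shift with the machine's timezone; that is harmless here because qcut only needs their relative order.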
- Data distribution
We skip converting to days-since-today for now and look at the raw timestamps directly. As the chart below shows, last-purchase recency is also fairly right-skewed.
R["time_new"].plot(kind='hist',bins=20,figsize=(15,6))
plt.title('time_new')
3. Building the model
3.1 Equal-depth binning and binarization
- Split each of R, F, M into two equal-depth bins, then binarize
from sklearn import preprocessing
threshold = pd.qcut(F_trans['interest'], 2, retbins=True)[1][1]  # equal-depth split into two bins
print(f'\nthreshold: {threshold:.5f}')
binarizer = preprocessing.Binarizer(threshold=threshold)  # binarize
interest_q = pd.DataFrame(binarizer.transform(F_trans['interest'].values.reshape(-1, 1)))
interest_q.index = F_trans.index
interest_q.columns = ["interest"]
# print(interest_q[:5])
display(interest_q['interest'].value_counts())
threshold = pd.qcut(M_trans['value'], 2, retbins=True)[1][1]  # equal-depth split into two bins
print(f'\nthreshold: {threshold:.2f}')
binarizer = preprocessing.Binarizer(threshold=threshold)  # binarize
value_q = pd.DataFrame(binarizer.transform(M_trans['value'].values.reshape(-1, 1)))
value_q.index = M_trans.index
value_q.columns = ["value"]
# print(value_q[:5])
display(value_q['value'].value_counts())
threshold = pd.qcut(R["time_new"], 2, retbins=True)[1][1]  # equal-depth split into two bins
print(f'\nthreshold: {threshold:.0f}')
binarizer = preprocessing.Binarizer(threshold=threshold)  # binarize
time_new_q = pd.DataFrame(binarizer.transform(R["time_new"].values.reshape(-1, 1)))
time_new_q.index = R.index
time_new_q.columns = ["time"]
# print(time_new_q[:5])
display(time_new_q['time'].value_counts())
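The three near-identical blocks above can be folded into one helper. The sketch below replaces sklearn's Binarizer with a plain pandas comparison, which is equivalent here since Binarizer also maps values strictly greater than the threshold to 1:

```python
import pandas as pd

def equal_depth_binarize(s: pd.Series, out_name: str) -> pd.DataFrame:
    """Split a series at its median (equal-depth, 2 bins): 1 above the threshold, else 0."""
    threshold = pd.qcut(s, 2, retbins=True)[1][1]
    return (s > threshold).astype(int).rename(out_name).to_frame()

# Toy data standing in for F_trans['interest']; the same helper would
# apply to M_trans['value'] and R['time_new']
s = pd.Series([0.0, 0.1, 0.2, 0.6, 0.8, 0.9], name='interest')
interest_q = equal_depth_binarize(s, 'interest')
print(interest_q['interest'].tolist())  # [0, 0, 0, 1, 1, 1]
```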
3.2 Combined layering results
analysis = pd.concat([interest_q, value_q, time_new_q], axis=1)
analysis.head()
3.3 Generate labels
# Generate the RFM labels
label = {
(0, 0, 0): '无兴趣-低价值-沉默',
(1, 0, 0): '有兴趣-低价值-沉默',
(1, 0, 1): '有兴趣-低价值-活跃',
(0, 0, 1): '无兴趣-低价值-活跃',
(0, 1, 0): '无兴趣-高价值-沉默',
(1, 1, 0): '有兴趣-高价值-沉默',
(1, 1, 1): '有兴趣-高价值-活跃',
(0, 1, 1): '无兴趣-高价值-活跃'
}
analysis['label'] = analysis[['interest', 'value', 'time']].apply(lambda x: label[tuple(x)], axis=1)
analysis.head()
4. Inspect segment sizes
With this layering, the segment sizes come out fairly balanced (each indicator is split into two halves at its median). If you don't want segments this even, define your own layering rules instead (see Case 1).
output = analysis['label'].value_counts().rename('people_cnt').to_frame()
output['percentage'] = output['people_cnt'] / output['people_cnt'].sum()
output.sort_index(ascending=False)