postgresql 分析_使用Postgresql进行rfm分析

postgresql 分析

RFM, stands for recency, frequency and monetary, is one method of customer segmentation, a process of dividing customers into groups based on similar characteristics. This method can help companies identify their customers who are most likely to respond the marketing campaign. Thus, company can make marketing campaigns that are more suitable for each groups.

RFM代表新近度,频率和货币,是客户细分的一种方法,它是根据相似特征将客户分为几组的过程。 这种方法可以帮助公司确定最有可能响应市场营销活动的客户。 因此,公司可以进行更适合每个群体的营销活动。

Unlike traditional method that using demographic data, the idea of RFM analysis is to segment customers by transaction data. This makes RFM more practical than the traditional method, also RFM can analyse the entire population that available in the transaction data history. However, not all features in the data set will be used in RFM analysis, we only use these three features:

与使用人口统计数据的传统方法不同,RFM分析的想法是按交易数据细分客户。 这使RFM比传统方法更实用,RFM还可分析交易数据历史记录中可用的全部人口。 但是,并非数据集中的所有功能都将用于RFM分析,我们仅使用以下三个功能:

  • The last time customer purchased a product/service (Recency)

    客户上次购买产品/服务的时间(新近度)
  • The number of purchases made by a customer during a certain period of time (Frequency)

    客户在一定时间段内的购买次数(频率)
  • The amount of money spent by a customer during a certain period of time (Monetary)

    客户在一定时间段内花费的金额(货币)

In this article, I will do RFM analysis using PostgreSQL, the data set is an online retail data set from here. The table contains 8 attributes as follow:

在本文中,我将使用PostgreSQL进行RFM分析,该数据集是从此处开始的在线零售数据集。 该表包含8个属性,如下所示:

  1. InvoiceNo : Invoice number. Nominal, a 6-digit integral number that uniquely assigned to each transaction. Any number that starts with letter ‘c’ indicates a cancellation.

    InvoiceNo:发票编号。 标称,唯一分配给每个交易的6位整数。 以字母“ c”开头的任何数字均表示已取消。

  2. StockCode : Product code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.

    StockCode:产品代码。 标称,唯一地分配给每个不同产品的5位整数。

  3. Description : Product name. Nominal.

    描述:产品名称。 名义上

  4. Quantity : The quantities of each product (item) per transaction. Numeric.

    数量:每笔交易中每种产品(项目)的数量。 数字。

  5. InvoiceDate : Invoice Date and time. Numeric, the day and time when each transaction was generated.

    InvoiceDate:发票日期和时间。 数字,每笔交易生成的日期和时间。

  6. UnitPrice : Numeric, product price per unit in sterling.

    UnitPrice:数值,单位为英镑的产品价格。

  7. CustomerID : Nominal, a 5-digit integral number uniquely assigned to each customer.

    客户ID:标称,唯一分配给每个客户的5位整数。

  8. Country : Nominal, the name of the country where each customer resides.

    国家:名义上,每个客户居住的国家的名称。

The first thing to do is create the table and import the data to PostgreSQL. Since there are columns containing both numbers and letters, the encoding must be set to windows-1251.

首先要做的是创建表并将数据导入PostgreSQL。 由于存在同时包含数字和字母的列,因此必须将编码设置为Windows-1251。

And here is a glimpse of the database, since it won’t be so clear to put all the columns, I’ll just point out the columns that will need some treatments.

这里是数据库的一瞥,因为并不是很清楚地放置所有列,所以我只指出需要进行一些处理的列。

Image for post
Image for post

As mentioned, the data include cancelled transaction that can be seen from the table above, therefore any invoiceno containing ‘C’, NULLs in customerid and 0 in unitprice must be omitted.

如前所述,该数据包括可从上表中看到的已取消交易,因此,必须省略任何包含“ C”,customerid中的NULL和单价中的0的发票。

The data set is now ready for RFM analysis with 4 tier groups for each dimensions (R, F and M). 1 indicates longest time of transaction, least transaction and lowest spend while 4 indicates most recent, most transaction and highest spend.

现在,该数据集已准备好用于RFM分析,每个维度(R,F和M)具有4个层组。 1表示最长的交易时间,最少的交易和最低的支出,而4表示最近的交易,最多的交易和最高的支出。

Image for post

There are 4338 numbers of customers segmented to one of 64 groups (4x4x4). Now I’d like to create a tree map for visualization, but I’ll do it in Python. Before that, it’s crucial to define the groups into some levels because having many groups is not practical.

有4338个客户细分为64个组(4x4x4)之一。 现在,我想创建一个树形图以进行可视化,但是我将在Python中完成。 在此之前,至关重要的是将组定义为一定级别,因为有许多组是不切实际的。

import pandas as pdimport matplotlib.pyplot as pltimport squarifydef rfm_level(df):if ((df['rfm_recency'] >= 4) and (df['rfm_frequency'] >= 4) and (df['rfm_monetary'] >= 4)):return 'Best Customers'elif ((df['rfm_recency'] >= 3) and (df['rfm_frequency'] >= 3)and (df['rfm_monetary'] >= 3)):return 'Loyal' elif ((df['rfm_recency'] >= 3) and (df['rfm_frequency'] >= 1)and (df['rfm_monetary'] >= 2)): return 'Potential Loyalist'elif ((df['rfm_recency'] >= 3) and (df['rfm_frequency'] >= 1)and (df['rfm_monetary'] >= 1)):return 'Promising'elif ((df['rfm_recency'] >= 2) and (df['rfm_frequency'] >= 2)and (df['rfm_monetary'] >= 2)):return 'Customers Needing Attention' elif ((df['rfm_recency'] >= 1) and (df['rfm_frequency'] >= 2)and (df['rfm_monetary'] >= 2)):return 'At Risk'elif ((df['rfm_recency'] >= 1) and (df['rfm_frequency'] >= 1)and (df['rfm_monetary'] >= 2)):return 'Hibernating'
else:
return 'Lost'#Create a new variable rfm_level
data['rfm_level'] = data.apply(rfm_level, axis=1)
data
Image for post

The customers now divided into 8 segments, here are what each segments means:

现在,客户分为8个细分,这是每个细分的含义:

  1. Best Customers

    最佳顾客

    Customers in this segment bought product very recent, very often and spend the most among others.

    该细分市场中的客户购买商品的频率很高,而且购买频率最高。

  2. Loyal

    忠诚

    Loyal customers are those who spent good amount of money and they usually responsive to promotions.

    忠实的客户是那些花费大量金钱并且通常会响应促销的客户。

  3. Potential Loyalist

    潜在的忠实者

    Potential loyalists are recent customers that bought product more than once and spent good amount of money. We can offer membership and recommend other products to customers in this segment.

    潜在的忠实拥护者是最近的客户,他们购买了不止一次产品,并花费了大量金钱。 我们可以提供会员资格并向该细分市场的客户推荐其他产品。

  4. Promising

    有希望

    This segment is for new shoppers that haven’t spent much. We can create brand awareness and offer free trials for them.

    该细分受众群用于那些花费不多的新购物者。 我们可以建立品牌知名度,并为他们提供免费试用。

  5. Customers Needing Attention

    需要注意的顾客

    This customers bought much and spent good money but have not bought very recently. We can reactive them by make limited time offers and recommend products based on their past purchases.

    这些客户购买了很多东西,花了很多钱,但最近还没有买。 我们可以通过提供限时优惠来响应他们,并根据他们过去的购买来推荐产品。

  6. At Risk

    有一定风险

    This group of customers spent big money and purchased often a long time ago. We need to bring them back by sending emails to reconnect and offer renewals.

    这群客户花了很多钱,很久以前就购买了。 我们需要通过发送电子邮件以重新连接并提供续订的方式来带回他们。

  7. Hibernating

    冬眠

    This customers have been inactive for a long time, they have low number of orders and low spenders. We can offer other relevant products and special discounts.

    该客户长期处于非活动状态,他们的订单数量少且消费低。 我们可以提供其他相关产品和特别折扣。

  8. Lost

    丢失

    Customers with lowest performance of RFM, we can ignore them or reach them out to get back their interest.

    RFM表现最差的客户,我们可以忽略他们,也可以伸出他们来吸引他们的兴趣。

# Calculate total customers in each segment
rfm_agg = data.groupby('rfm_level').agg({'customerid':'count'})
print(rfm_agg)
Image for post

From the data tabulated above, there are 43% customers from online retail data set that place the top tier of RFM levels (“Best Customers”, “Loyal”, “Potential Loyalist”). The size of customers in each level can also be seen from the visualization below.

根据上面列出的数据,在线零售数据集中有43%的客户位于RFM级别的顶层(“最佳客户”,“忠诚”,“潜在忠实者”)。 您还可以从下面的图表中看到每个级别的客户规模。

#RFM visualization
fig = plt.gcf()
ax = fig.add_subplot()
fig.set_size_inches(13, 7)
squarify.plot(sizes=rfm_agg['customerid'],
label=['At Risk',
'Best Customers',
'Customers Needing Attention',
'Hibernating',
'Lost',
'Loyal',
'Potential Loyalist',
'Promising'], alpha=0.6)
plt.title("RFM Segments",fontsize=20)
plt.axis('off')
plt.show()
Image for post

That’s all I can share about RFM analysis, thank you for taking your time to read my article. If you have any comments, feel free to leave your feedback below or reach me on LinkedIn. Have a nice day!

关于RFM分析,这就是我能分享的所有内容,感谢您抽出宝贵的时间阅读我的文章。 如果您有任何意见,请随时在下面留下您的反馈或通过LinkedIn与我联系。 祝你今天愉快!

https://www.putler.com/rfm-analysis/

https://www.putler.com/rfm-analysis/

https://www.silota.com/docs/recipes/sql-recency-frequency-monetary-rfm-customer-analysis.html

https:// www。 silota.com/docs/recipes/sql-recency-frequency-monetary-rfm-customer-analysis.html

https://towardsdatascience.com/recency-frequency-monetary-model-with-python-and-how-sephora-uses-it-to-optimize-their-google-d6a0707c5f17

https://towardsdatascience.com/recency-frequency-monetary-model-with-python-and-how-sephora-uses-it-to-optimize-their-google-d6a0707c5f17

https://www.moengage.com/blog/rfm-analysis-using-predictive-segments/

https://www.moengage.com/blog/rfm-analysis-using-predictive-segments/

翻译自: https://medium.com/@rafiqairwandi/rfm-analysis-using-postgresql-2e5c1fe42d6

postgresql 分析

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值