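Hello everyone. Today I'm sharing a new article built on a Kaggle dataset: customer personality analysis and customer segmentation based on a supermarket spending dataset.
For more details, see the original dataset page: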
https://www.kaggle.com/code/sonalisingh1411/customer-personality-analysis-segmentation/data?select=marketing_campaign.csv
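Main contents
This article covers:
- Basic information about the data and its fields
- Missing value analysis and handling
- A closer look at the fields from four angles
- Handling the time field
- Bivariate analysis
- Correlation analysis
- Customer segmentation and visualization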
Missing values:
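As a rough illustration of the missing value check and handling, here is a minimal sketch with pandas. The small DataFrame is made-up data standing in for marketing_campaign.csv, and filling Income with the median is an assumption for the example, not necessarily what the original notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for marketing_campaign.csv (values are made up).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Income": [58138.0, np.nan, 71613.0, 26646.0, np.nan],
    "Recency": [58, 38, 26, 26, 94],
})

# Count missing values per column and keep only the columns that have any.
missing = df.isnull().sum()
missing = missing[missing > 0]
print(missing)

# Assumption: fill missing Income with the column median (one common choice).
income_median = df["Income"].median()
df["Income"] = df["Income"].fillna(income_median)
```

Dropping the affected rows with `df.dropna(subset=["Income"])` is another reasonable option when only a few rows are missing.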
Field meanings
The fields fall into four groups: people, products, promotion, and place.
PEOPLE
- ID: Customer’s unique identifier.
- Year_Birth: Customer’s birth year.
- Education: Customer’s education level.
- Marital_Status: Customer’s marital status.
- Income: Customer’s yearly household income.
- Kidhome: Number of children in customer’s household.
- Teenhome: Number of teenagers in customer’s household.
- Dt_Customer: Date of customer’s enrollment with the company.
- Recency: Number of days since customer’s last purchase.
- Complain: 1 if customer complained in the last 2 years, 0 otherwise.
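Dt_Customer is the time field in this group. The sketch below shows one way to parse it and derive a few numeric features. It assumes the dates are day-month-year strings; the sample rows and the new column names (Enroll_Year, Enroll_Month, Days_Enrolled, Age) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample rows (values are made up).
df = pd.DataFrame({
    "Year_Birth": [1957, 1954, 1965],
    "Dt_Customer": ["04-09-2012", "08-03-2014", "21-08-2013"],
})

# Assumption: dates are stored as day-month-year strings.
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"], format="%d-%m-%Y")

# Split the enrollment date into parts that are easier to group by.
df["Enroll_Year"] = df["Dt_Customer"].dt.year
df["Enroll_Month"] = df["Dt_Customer"].dt.month

# Assumption: use the latest enrollment date in the data as the reference point.
reference = df["Dt_Customer"].max()
df["Days_Enrolled"] = (reference - df["Dt_Customer"]).dt.days

# Approximate age relative to the reference year.
df["Age"] = reference.year - df["Year_Birth"]
```

Using a reference date taken from the data itself keeps the derived values stable no matter when the code is run.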
PRODUCTS
- MntWines: Amount spent on wine in last 2 years.
- MntFruits: Amount spent on fruits in last 2 years.
- MntMeatProducts: Amount spent on meat in last 2 years.
- MntFishProducts: Amount spent on fish in last 2 years.
- MntSweetProducts: Amount spent on sweets in last 2 years.
- MntGoldProds: Amount spent on gold in last 2 years.
PROMOTION
- NumDealsPurchases: Number of purchases made with a discount.
- AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise.
- AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise.
- AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise.
- AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise.
- AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise.
- Response: 1 if customer accepted the offer in the last campaign, 0 otherwise.
PLACE
- NumWebPurchases: Number of purchases made through the company’s web site.
- NumCatalogPurchases: Number of purchases made using a catalogue.
- NumStorePurchases: Number of purchases made directly in stores.
- NumWebVisitsMonth: Number of visits to company’s web site in the last month.
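With the four field groups in hand, a few aggregate features are often easier to work with than the raw columns. The sketch below is one possible way to build them. The name Expenses matches the axis label used in the clustering plot, while Children and TotalPurchases are hypothetical names; the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows (values are made up).
df = pd.DataFrame({
    "MntWines": [635, 11], "MntFruits": [88, 1], "MntMeatProducts": [546, 6],
    "MntFishProducts": [172, 2], "MntSweetProducts": [88, 1], "MntGoldProds": [88, 6],
    "Kidhome": [0, 1], "Teenhome": [0, 1],
    "NumWebPurchases": [8, 1], "NumCatalogPurchases": [10, 1], "NumStorePurchases": [4, 2],
})

# Total spending across all product categories (the Mnt* columns).
mnt_cols = [c for c in df.columns if c.startswith("Mnt")]
df["Expenses"] = df[mnt_cols].sum(axis=1)

# Total number of children and teenagers in the household.
df["Children"] = df["Kidhome"] + df["Teenhome"]

# Total purchases across the three channels.
purchase_cols = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]
df["TotalPurchases"] = df[purchase_cols].sum(axis=1)
```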
Bivariate analysis
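Bivariate analysis looks at two fields at a time. As a minimal sketch, and not necessarily the pairs examined in the original notebook, the example below compares Education against Income and against Response on made-up rows.

```python
import pandas as pd

# Hypothetical sample rows (values are made up).
df = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Graduation", "Master", "PhD", "Master"],
    "Response": [1, 0, 0, 0, 1, 0],
    "Income": [58000, 72000, 46000, 61000, 80000, 39000],
})

# Numeric vs. categorical: average income per education level.
income_by_edu = df.groupby("Education")["Income"].mean()
print(income_by_edu)

# Categorical vs. binary: per education level, the share of customers
# who did (1) or did not (0) accept the last campaign.
response_rate = pd.crosstab(df["Education"], df["Response"], normalize="index")
print(response_rate)
```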
Correlation
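A correlation matrix over the numeric fields can be computed with `DataFrame.corr`. The sketch below uses made-up values and a hypothetical Expenses column (total spending); Pearson correlation is an assumption for the example.

```python
import pandas as pd

# Hypothetical sample rows (values are made up).
df = pd.DataFrame({
    "Income": [20000, 40000, 60000, 80000, 100000],
    "Expenses": [50, 300, 700, 1200, 1900],
    "NumWebVisitsMonth": [9, 7, 5, 4, 2],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))

# Rank the other fields by the strength of their correlation with Income.
ranked = corr["Income"].drop("Income").abs().sort_values(ascending=False)
print(ranked)
```

The resulting matrix can be passed to a heatmap function for a visual overview.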
Clustering results
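The plotting code relies on three names: X, y, and kmeans. As a minimal sketch of how they might be produced, the example below fits scikit-learn's KMeans with five clusters on two features. The synthetic points, the choice of n_init and random_state, and the assumption that X holds Income in column 0 and Expenses in column 1 (inferred from the axis labels) are all illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical [Income, Expenses] points scattered around five made-up centers.
centers = np.array([
    [20000, 100], [40000, 400], [60000, 900], [80000, 1500], [100000, 2200],
], dtype=float)
X = np.vstack([c + rng.normal(scale=[2000, 50], size=(40, 2)) for c in centers])

# Fit KMeans with five clusters, matching the five groups drawn in the plot.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)

# y holds one cluster label (0 to 4) per row of X.
y = kmeans.fit_predict(X)
print(kmeans.cluster_centers_)
```

Because Income spans a much larger range than Expenses, unscaled distances are dominated by Income. Standardizing the features first is worth considering, at the cost of centroids that are no longer in the original units.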
import matplotlib.pyplot as plt

# X (2-column array: Income, Expenses, as the axis labels suggest), y (cluster
# labels) and kmeans (the fitted KMeans model) come from the clustering step.
plt.figure(figsize=(15, 8))

# One scatter call per cluster so each gets its own color and legend entry.
cluster_styles = [('mediumblue', 'one'), ('turquoise', 'two'), ('red', 'three'),
                  ('green', 'four'), ('yellow', 'five')]
for k, (color, label) in enumerate(cluster_styles):
    plt.scatter(X[y == k, 0], X[y == k, 1], s=25, c=color, label=label)

# Mark the cluster centers in black.
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=55, c='black', label='Centroids')

plt.title('Clusters of customers', fontsize=20)
plt.xlabel('Income', fontsize=15)
plt.ylabel('Expenses', fontsize=15)
plt.legend(fontsize=15)
plt.show()