Customer Personality Analysis with Clustering (Big Data)

程序员加菲

Tags: clustering, big data, data mining

Article link: https://blog.csdn.net/GETITNOW/article/details/121494304

Author: 桂Sir  Contact: 1052656099@qq.com

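Consumers differ in personal characteristics such as age, gender, social group, occupation, and ethnicity, as well as in lifestyle, interests, hobbies, and personality, so they often behave differently when choosing the same product. Customer personality analysis helps a business understand its customers better and makes it easier to adapt products to the specific needs, behaviors, and concerns of different customer types. For example, instead of spending money to market a new product to every customer in its database, a company can analyze which customer segment is most likely to buy the product and then market it only to that segment.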

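Our dataset contains customer records from a grocery company. We apply unsupervised clustering to group the customers and explore how important each group is to the business. The resulting segments can guide product changes tailored to the needs and behaviors of each customer type, helping the business address what each type cares about.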

Project link: https://www.kaggle.com/imakash3011/customer-personality-analysis
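The imports below suggest a standard workflow: standardize the features, reduce them with PCA, then cluster. The following is a minimal sketch of that workflow on synthetic data. The toy columns, the choice of 3 PCA components, and `n_clusters=2` are assumptions for illustration only, not the author's settings, which are not shown in this excerpt.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy data standing in for the cleaned customer table:
# the first 50 rows are low-income/low-spend, the last 50 high-income/high-spend
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "Income": np.concatenate([rng.normal(30000, 3000, 50), rng.normal(80000, 5000, 50)]),
    "Spent": np.concatenate([rng.normal(100, 20, 50), rng.normal(1500, 200, 50)]),
    "Age": rng.integers(25, 75, 100),
    "Children": np.concatenate([rng.integers(1, 4, 50), np.zeros(50, dtype=int)]),
})

# 1. Standardize so that no feature dominates just because of its units
scaled = StandardScaler().fit_transform(toy)

# 2. Project onto 3 principal components (assumed; convenient for 3-D plots)
reduced = PCA(n_components=3, random_state=42).fit_transform(scaled)

# 3. Agglomerative (Ward) clustering; 2 clusters assumed because the toy data has 2 groups
labels = AgglomerativeClustering(n_clusters=2).fit_predict(reduced)

toy["Cluster"] = labels
print(toy.groupby("Cluster")[["Income", "Spent"]].mean())
```

On the real data the number of clusters is usually chosen from an elbow plot rather than fixed in advance; the post imports yellowbrick's `KElbowVisualizer`, which draws such a plot.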

import numpy as np
import pandas as pd
import datetime
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import colors
from matplotlib.colors import ListedColormap
from mpl_toolkits.mplot3d import Axes3D  # used for 3-D scatter plots
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn import metrics
from yellowbrick.cluster import KElbowVisualizer  # third-party package: pip install yellowbrick
import warnings
import sys

# Silence warnings unless -W options were passed on the command line
if not sys.warnoptions:
    warnings.simplefilter("ignore")

# Fix the random seed for reproducibility
np.random.seed(42)

Dataset preview

First, preview the dataset. It has 29 columns in total.

data = pd.read_csv("D:/Project/BIGdata/Homework/marketing_campaign.csv", sep="\t")
data.head()

	ID	Year_Birth	Education	Marital_Status	Income	Kidhome	Teenhome	Dt_Customer	Recency	MntWines	...	NumWebVisitsMonth	Z_CostContact	Z_Revenue	Response
0	5524	1957	Graduation	Single	58138.0	0	0	04-09-2012	58	635	...	7	3	11	1
1	2174	1954	Graduation	Single	46344.0	1	1	08-03-2014	38	11	...	5	3	11	0
2	4141	1965	Graduation	Together	71613.0	0	0	21-08-2013	26	426	...	4	3	11	0
3	6182	1984	Graduation	Together	26646.0	1	0	10-02-2014	26	11	...	6	3	11	0
4	5324	1981	PhD	Married	58293.0	1	0	19-01-2014	94	173	...	5	3	11	0

5 rows × 29 columns

Inspecting the data type of each column reveals missing values: Income has only 2216 non-null entries out of 2240.

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2240 entries, 0 to 2239
Data columns (total 29 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   2240 non-null   int64  
 1   Year_Birth           2240 non-null   int64  
 2   Education            2240 non-null   object 
 3   Marital_Status       2240 non-null   object 
 4   Income               2216 non-null   float64
 5   Kidhome              2240 non-null   int64  
 6   Teenhome             2240 non-null   int64  
 7   Dt_Customer          2240 non-null   object 
 8   Recency              2240 non-null   int64  
 9   MntWines             2240 non-null   int64  
 10  MntFruits            2240 non-null   int64  
 11  MntMeatProducts      2240 non-null   int64  
 12  MntFishProducts      2240 non-null   int64  
 13  MntSweetProducts     2240 non-null   int64  
 14  MntGoldProds         2240 non-null   int64  
 15  NumDealsPurchases    2240 non-null   int64  
 16  NumWebPurchases      2240 non-null   int64  
 17  NumCatalogPurchases  2240 non-null   int64  
 18  NumStorePurchases    2240 non-null   int64  
 19  NumWebVisitsMonth    2240 non-null   int64  
 20  AcceptedCmp3         2240 non-null   int64  
 21  AcceptedCmp4         2240 non-null   int64  
 22  AcceptedCmp5         2240 non-null   int64  
 23  AcceptedCmp1         2240 non-null   int64  
 24  AcceptedCmp2         2240 non-null   int64  
 25  Complain             2240 non-null   int64  
 26  Z_CostContact        2240 non-null   int64  
 27  Z_Revenue            2240 non-null   int64  
 28  Response             2240 non-null   int64  
dtypes: float64(1), int64(25), object(3)
memory usage: 507.6+ KB
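The `info()` output shows Income with 2216 non-null values out of 2240, so 24 rows are missing it. Below is a minimal sketch, on a hypothetical three-row frame (made-up values, not taken from the dataset), of how to count missing values per column and drop only the rows that lack Income.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature customer table with one missing Income value
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Income": [50000.0, np.nan, 70000.0],
    "Kidhome": [0, 1, 0],
})

# Count missing values per column and keep only the columns that have any
missing = df.isnull().sum()
print(missing[missing > 0])

# Restricting dropna to Income makes the intent explicit; when Income is the
# only column with nulls, this removes the same rows as a plain dropna()
cleaned = df.dropna(subset=["Income"])
print(len(df), "->", len(cleaned))
```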

Data preprocessing

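# Drop the rows with a missing Income value (the only column containing nulls)
data = data.dropna()

# Use "Dt_Customer" to create a feature: the number of days each customer has been
# registered in the company's database, measured from the newest customer's date.
# The dates are stored as DD-MM-YYYY (e.g. 21-08-2013), so state the format explicitly
data["Dt_Customer"] = pd.to_datetime(data["Dt_Customer"], format="%d-%m-%Y")
d1 = data["Dt_Customer"].max()  # taking it to be the newest customer
# .dt.days yields whole days; pd.to_numeric on a timedelta column would yield nanoseconds
data["Customer_For"] = (d1 - data["Dt_Customer"]).dt.days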
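The dates in this file are written day-first, so a value like "04-09-2012" means 4 September 2012, and a month-first parser would misread it. A small standalone check, using three date strings from the preview above, of parsing with an explicit format and converting the enrollment gap to whole days:

```python
import pandas as pd

# Three Dt_Customer values from the data preview, in DD-MM-YYYY form
raw = pd.Series(["04-09-2012", "08-03-2014", "21-08-2013"])

# Explicit day-first format: "04-09-2012" becomes 2012-09-04
parsed = pd.to_datetime(raw, format="%d-%m-%Y")

# Days since enrollment, measured from the newest enrollment date (2014-03-08)
newest = parsed.max()
customer_for_days = (newest - parsed).dt.days
print(customer_for_days.tolist())  # [550, 0, 199]
```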

Next, derive further features to make the later steps easier.

# Engineer the remaining features
data["Age"] = 2021 - data["Year_Birth"]  # customer age in 2021

data["Spent"]
