Python Data Analysis and Mining: Airline Data Model

爱遛弯的布谷

Article link: https://blog.csdn.net/weixin_45609831/article/details/111384168

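Airline Data Model

Data Exploration

# Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei']  # use the SimHei font so Chinese text renders in plots
plt.rcParams['axes.unicode_minus']=False  # render the minus sign correctly with that font

plane=pd.read_excel('./航空数据.xls')  # read the data
plane.head()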

plane.info()
des=plane.describe()
des

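# Missing values
len(plane)-des.loc['count']  # number of missing values in each numeric column
plane[plane.isnull().any(axis=1)]  # rows that contain at least one missing value
plane.columns=plane.columns.map(lambda x:x.upper())  # convert all column names to upper case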
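
The subtraction `len(plane) - des.loc['count']` only covers the numeric columns that `describe()` reports. As a minimal sketch on a small made-up frame (the values are invented for illustration and are not taken from the airline file), `isnull().sum()` counts missing values in every column, and `isnull().any(axis=1)` selects each incomplete row exactly once:

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; column names mimic the airline data.
toy = pd.DataFrame({
    "SUM_YR_1": [100.0, np.nan, 0.0, 250.0],
    "SUM_YR_2": [80.0, 60.0, np.nan, np.nan],
    "GENDER": ["M", None, "F", "F"],
})

# Missing values per column, including the non-numeric column.
missing_per_column = toy.isnull().sum()

# Rows containing at least one missing value, each listed once.
incomplete_rows = toy[toy.isnull().any(axis=1)]

print(missing_per_column)
print(incomplete_rows)
```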

Data Preprocessing

Data Cleaning

# Drop records whose fare is missing in either year
airline_notnull = plane.loc[plane['SUM_YR_1'].notnull() & plane['SUM_YR_2'].notnull(),:]
print('Shape after dropping records with missing fares:',airline_notnull.shape)  # output: (62299, 44)

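# Keep records whose fare is non-zero in at least one year and whose
# average discount is non-zero and total flight distance is greater than 0.
# Records where total mileage and discount rate are both 0 belong to new customers.
index1 = airline_notnull['SUM_YR_1'] != 0
index2 = airline_notnull['SUM_YR_2'] != 0
index3 = (airline_notnull['SEG_KM_SUM']> 0) & (airline_notnull["AVG_DISCOUNT"] != 0)
index4 = airline_notnull['AGE'] > 100  # flag records with age over 100 for removal
plane = airline_notnull[(index1 | index2) & index3 & ~index4]
print('Shape after data cleaning:',plane.shape)  # output: (62043, 44)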
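
To make the combined mask easier to follow, here is a small sketch that applies the same three rules to five invented records. The values are hypothetical; only the column names come from the post. Note that a missing `AGE` compares as `False` against 100, so such a record is kept.

```python
import numpy as np
import pandas as pd

# Invented records; only the columns used by the cleaning rules are included.
toy = pd.DataFrame({
    "SUM_YR_1":     [100.0, 0.0,   0.0, 50.0,  30.0],
    "SUM_YR_2":     [0.0,   0.0, 200.0, 60.0,  10.0],
    "SEG_KM_SUM":   [5000, 3000,  8000,    0,  1200],
    "AVG_DISCOUNT": [0.8,   0.5,   0.9,  0.7,   0.6],
    "AGE":          [35.0, 40.0, np.nan, 28.0, 120.0],
})

fare_nonzero = (toy["SUM_YR_1"] != 0) | (toy["SUM_YR_2"] != 0)
flown = (toy["SEG_KM_SUM"] > 0) & (toy["AVG_DISCOUNT"] != 0)
too_old = toy["AGE"] > 100  # NaN > 100 evaluates to False, so unknown ages are kept

cleaned = toy[fare_nonzero & flown & ~too_old]
print(cleaned.index.tolist())
```

Row 1 is dropped for having zero fare in both years, row 3 for zero mileage, and row 4 for an age above 100.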

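Data Reduction

# Keep only the columns needed for the LRFMC features; copy() avoids SettingWithCopyWarning when columns are added later
plane_1=plane.loc[:,["FFP_DATE", "LOAD_TIME", "LAST_TO_END","FLIGHT_COUNT", "SEG_KM_SUM","AVG_DISCOUNT"]].copy()
plane_1.head()

plane_1.describe()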

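Feature Construction

# L: membership length in months (a month is taken as 30 days).
# pd.to_datetime leaves the columns unchanged if they are already datetimes.
res=pd.to_datetime(plane_1['LOAD_TIME'])-pd.to_datetime(plane_1['FFP_DATE'])
plane_1['L'] = res / pd.Timedelta(days=30)
plane_1.head()

plane_1['R']=plane['LAST_TO_END']    # R: time since the last flight
plane_1['F']=plane['FLIGHT_COUNT']   # F: number of flights
plane_1['M']=plane['SEG_KM_SUM']     # M: total flight distance
plane_1['C']=plane["AVG_DISCOUNT"]   # C: average discount rate
plane_2= plane_1[['L', 'R', 'F', 'M', 'C']]
plane_2.head()   # the combined LRFMC features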

Data Standardization

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans  # import the k-means algorithm

plane=StandardScaler().fit_transform(plane_2)
plane[:5,:]  # the five LRFMC features after standardization
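
Standardization matters here because the five features live on very different scales (kilometres versus a discount rate), and k-means relies on Euclidean distance. A minimal sketch with invented numbers shows that `StandardScaler` is equivalent to subtracting each column's mean and dividing by its population standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented values: two features on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

scaled = StandardScaler().fit_transform(X)

# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for ndarray.std().
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0), scaled.std(axis=0))
```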

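Modeling

k=5  # number of cluster centers

# The n_jobs argument was removed from KMeans in scikit-learn 1.0, so it is omitted here
kmeans_model=KMeans(n_clusters=k)  # build the model
fit_kmeans=kmeans_model.fit(plane)  # train the model

cen=kmeans_model.cluster_centers_   # cluster centers
cen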

y_pre=kmeans_model.labels_  # cluster label of each sample
y_pre  # array([3, 3, 3, ..., 1, 2, 2])

r1=pd.Series(kmeans_model.labels_).value_counts()  # count the samples in each cluster
r1  # final number of samples per cluster
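
The post fixes `k = 5` without showing how that value was chosen. One common heuristic, which is not necessarily what the author used, is the elbow method: fit k-means for several values of k and look for the point where the within-cluster sum of squares (`inertia_`) stops dropping sharply. The sketch below runs on synthetic data from `make_blobs` as a stand-in for the standardized LRFMC matrix; the range of k and the `random_state` values are arbitrary choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the standardized LRFMC matrix (not the airline data).
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)

inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting `inertias` against k and picking the bend in the curve gives a starting point; the final choice usually also depends on whether the resulting groups are interpretable.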

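Output the Clustering Results

cluster_center=pd.DataFrame(kmeans_model.cluster_centers_ ,columns=['ZL','ZR','ZF','ZM','ZC'])  # put the cluster centers in a DataFrame
# Row i of cluster_centers_ belongs to label i, so the default 0..k-1 index already is the cluster label.
# Re-indexing by the order in which labels first appear in labels_ would mislabel the rows.
cluster_center.index.name='cluster'
cluster_center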
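
The link between rows of `cluster_centers_` and label values can be checked directly. The sketch below uses two invented, well-separated customer groups (the feature names `F` and `M` and all numbers are hypothetical) to confirm that row i is the mean of the samples labelled i, and then uses `inverse_transform` to express the centers in the original units, which tends to be easier to interpret than z-scores.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two invented, well-separated groups of customers with two features.
raw = np.vstack([
    rng.normal(loc=[10.0, 1000.0], scale=[1.0, 50.0], size=(50, 2)),
    rng.normal(loc=[40.0, 9000.0], scale=[1.0, 50.0], size=(50, 2)),
])

scaler = StandardScaler().fit(raw)
X = scaler.transform(raw)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Row i of cluster_centers_ is the mean of the samples labelled i.
for label in range(2):
    members = X[model.labels_ == label]
    assert np.allclose(members.mean(axis=0), model.cluster_centers_[label], atol=1e-6)

# Express the centers in the original units for easier reading.
centers_raw = pd.DataFrame(scaler.inverse_transform(model.cluster_centers_),
                           columns=["F", "M"])
centers_raw.index.name = "cluster"
print(centers_raw)
```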

Customer Value Analysis

labels = ['ZL','ZR','ZF','ZM','ZC']
legen = ['Customer group ' + str(i + 1) for i in cluster_center.index]  # group names, intended as the radar chart legend
lstype = ['-','--',(0, (3, 5, 1, 5, 1, 5)),':','-.']
kinds = list(cluster_center.iloc[:, 0])

# A radar chart needs closed polygons, so append the ZL column again and convert to np.ndarray
cluster_center=pd.concat([cluster_center,cluster_center[['ZL']]],axis=1)
centers=np.array(cluster_center.iloc[:,0:])

# Split the circle into equal angles; the last angle (2*pi) coincides with the first, which closes the shape
n = len(labels)
angles = np.linspace(0,2*np.pi,n+1,endpoint =True)
angles  #array([0.        , 1.25663706, 2.51327412, 3.76991118, 5.02654825,6.28318531])

# Plot
ax = plt.subplot(111,polar = True)  # draw in polar coordinates
style=['ro--','bo--','yo--','go--','ko--']
c = ['class1','class2','class3','class4','class5']
# Draw one line per cluster
for i in range(len(kinds)):
    ax.plot(angles, centers[i],style[i],label = c[i])
# Add the feature labels; skip the duplicated closing angle so the number of ticks matches the number of labels
ax.set_thetagrids(angles[:-1] * 180/np.pi, labels)
plt.legend(loc='lower right', bbox_to_anchor=(1.5, 0.0))
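
The closing step is the part of a radar chart that is easiest to get wrong, so here is a small self-contained sketch of it. The helper name `radar_coordinates` and the example center values are hypothetical, and the `Agg` backend is chosen only so that the snippet runs without a display. Unlike the code above, it computes one angle per feature and then repeats the first point, which keeps the tick angles and the closed polygon separate.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def radar_coordinates(values):
    """Return (angles, closed_values) for one radar polygon."""
    values = np.asarray(values, dtype=float)
    # One angle per feature; the endpoint is excluded so no axis repeats.
    angles = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
    # Repeat the first point at the end so the outline closes.
    return np.append(angles, angles[0]), np.append(values, values[0])

feature_names = ["ZL", "ZR", "ZF", "ZM", "ZC"]
example_center = [1.2, -0.4, 0.1, 0.0, -0.9]  # invented values

angles, closed = radar_coordinates(example_center)
fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, closed, "o--", label="example cluster")
# Tick positions use the unique angles only, one per feature name.
ax.set_thetagrids(np.degrees(angles[:-1]), feature_names)
ax.legend(loc="lower right", bbox_to_anchor=(1.5, 0.0))
```

Calling `fig.savefig(...)` or, in an interactive session, `plt.show()` renders the figure.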
