Data Analysis and Processing (Part 3)

Most recently recommended article published on 2023-06-08 20:43:53

Leon_Kbl

Article link: https://blog.csdn.net/Leon_Kbl/article/details/97764657

Top ten user IDs by spending

Importing packages in Jupyter

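import pandas as pd  # pandas for tabular data operations
import numpy as np  # numpy for the np.sum and np.arange calls used below
import xlrd  # xlrd for reading a workbook that contains multiple sheets
from matplotlib import pyplot as plt  # pyplot for plotting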

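Fixing garbled Chinese characters in plots

# sans-serif is a generic font family of typefaces without serifs.
# Common sans-serif fonts include Trebuchet MS, Tahoma, Verdana, Arial and Helvetica; Chinese examples include SimHei (黑体) and YouYuan (幼圆).
import matplotlib as mpl
mpl.rcParams['font.sans-serif']=['SimHei']  # set the default font; SimHei is a Chinese Heiti typeface
mpl.rcParams['axes.unicode_minus']=False  # render the minus sign correctly with this font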

Reading the workbook and merging its sheets (when the sheet names are known)

table1=pd.read_excel('meal_order_detail.xlsx',sheet_name='meal_order_detail1')
table2=pd.read_excel('meal_order_detail.xlsx',sheet_name='meal_order_detail2')
table3=pd.read_excel('meal_order_detail.xlsx',sheet_name='meal_order_detail3')
foodData=pd.concat([table1,table2,table3],axis=0,sort=False)
foodData
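When the sheet names are not known in advance, `pd.read_excel(path, sheet_name=None)` returns a dict that maps each sheet name to its DataFrame. The sketch below shows one way to stack such a dict. The helper name `combine_sheets` and the small in-memory frames are made up for illustration and stand in for the real workbook; `ignore_index=True` is an extra choice not used in the original code.

```python
import pandas as pd

def combine_sheets(sheets):
    """Stack a {sheet_name: DataFrame} mapping into a single DataFrame."""
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating
    # each sheet's own row labels (an assumption; the original keeps them).
    return pd.concat(list(sheets.values()), axis=0, sort=False, ignore_index=True)

# Hypothetical stand-in for:
#   sheets = pd.read_excel('meal_order_detail.xlsx', sheet_name=None)
demo_sheets = {
    'sheet_a': pd.DataFrame({'order_id': [1, 1], 'counts': [1, 2], 'amounts': [10, 5]}),
    'sheet_b': pd.DataFrame({'order_id': [2], 'counts': [3], 'amounts': [4]}),
}

combined = combine_sheets(demo_sheets)
print(combined)
```

With the real file, the same helper could be given the dict returned by `read_excel`, so the code does not need to list every sheet by name.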

Deduplicating the IDs

foodData['total_price']=foodData['counts']*foodData['amounts']  # per-row total: counts * amounts
foodData2=foodData.drop_duplicates(['order_id'])  # keep only the first row for each order_id
print(foodData2)

Extracting the deduplicated IDs into a list

order_id1=foodData2['order_id'].values  # this is a NumPy array
print(order_id1)
order_id_list=order_id1.tolist()  # convert it to a list

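1. Split: get the prices that belong to each ID. 2. Combine: collect the per-ID totals into a new list.

price_list=[]
for i in order_id_list:  # iterate over the IDs
    moneys=foodData[foodData['order_id']==i]['total_price']  # select the rows for ID i, then take their prices
    print(moneys)

    user_sum_price=np.sum(moneys.values)  # take the values as an array and add them up with numpy's sum
    price_list.append(user_sum_price)  # append this total to the new list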
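The split-and-sum loop above can also be written with pandas' own `groupby`, which does the splitting and the per-ID summation in one call. This is a minimal sketch on made-up rows (the `demo` frame is illustrative, not the real order data), using the same column names as the article.

```python
import pandas as pd

# Small made-up sample with the same columns as the article's data.
demo = pd.DataFrame({
    'order_id': [101, 102, 101, 103, 102],
    'counts':   [1, 2, 3, 1, 1],
    'amounts':  [10, 5, 4, 7, 20],
})
demo['total_price'] = demo['counts'] * demo['amounts']

# Split by order_id, then sum total_price within each group.
# The result is a Series indexed by order_id.
order_totals = demo.groupby('order_id')['total_price'].sum()
print(order_totals)
```

The result already pairs each ID with its total, so the separate ID list and price list are not needed with this approach.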

3. Combine: build a DataFrame from the price list and the deduplicated ID list, then sort it

pd_user_price=pd.DataFrame(price_list,index=order_id_list)
pd_user_price.sort_values(0,ascending=False,inplace=True)  # sort column 0 in descending order
pd_user_price

Extracting the column named 0

hand_price=pd_user_price[0] 
print(hand_price)

Result:
1166 1314
743 1214
1317 1210
576 1162
408 1148
…
1135 79
1320 78
1256 77
874 70
1064 48
Name: 0, Length: 942, dtype: int64
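If only the top few entries are needed, `Series.nlargest` picks them without sorting the whole Series first. The sketch below uses a small made-up Series (`spend`) in place of `hand_price`, and also converts the IDs to strings, which is one possible way to get evenly spaced bar labels later.

```python
import pandas as pd

# Made-up totals indexed by ID, standing in for hand_price.
spend = pd.Series([120, 300, 75, 210], index=[11, 12, 13, 14], name='total_price')

# Take the two largest totals, already ordered from largest to smallest.
top2 = spend.nlargest(2)

# String labels are treated as categories by plotting functions,
# so the bars would be evenly spaced rather than placed at the numeric ID.
labels = [str(i) for i in top2.index]
heights = top2.tolist()
print(labels, heights)
```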
Plotting

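X2=hand_price.index[0:10].tolist()  # X: the top ten IDs
Y2=hand_price.values[0:10].tolist()  # Y: their total spending
plt.title('花钱最多的前十用户')  # title: "Top ten users by spending"
plt.xlabel('用户ID')  # X axis label: "User ID"
plt.ylabel('消费')  # Y axis label: "Spending"
plt.bar(np.arange(10),Y2)  # draw the bars at positions 0-9; passing X2 directly would place them at the numeric ID values and display poorly
plt.xticks(np.arange(10),X2,rotation=60)  # then label positions 0-9 with the IDs
plt.show()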

[Figure: bar chart of the top ten users by spending]
