Python - Playing with Data - Pandas Exercises

Latest recommended article published on 2024-07-28 15:01:02

人猿宇宙

Category: python-玩转数据-python基础 (Python: Playing with Data, Python basics). Tags: python, data mining, development language

Article link: https://blog.csdn.net/s_unbo/article/details/123179491

1. Getting to know your data

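Simulated test data

(Image of the sample data not preserved. The columns used below are 门店 (store), 订单 (orders), 单价 (unit price), 订单号 (order number), and 地区 (region).)

Import the pandas data-processing library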

import pandas as pd

Read the CSV file with pandas and assign the resulting DataFrame to chipo

chipo = pd.read_csv("工作簿1.csv",encoding='gbk')

Take the first 3 rows of the data and print them

print(chipo.head(3))

Check how many columns the dataset has and print the count

print(chipo.shape[1])

Check how many rows the dataset has and print the count

print(chipo.shape[0])

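import numpy as np

# The shape attribute works the same way on a NumPy array
x = np.array([[1, 2, 5], [2, 3, 5], [3, 4, 5], [2, 3, 6]])
# Print the number of rows and columns
print(x.shape)  # result: (4, 3)
# Print only the number of rows
print(x.shape[0])  # result: 4
# Print only the number of columns
print(x.shape[1])  # result: 3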

Print the names of all columns

print(chipo.columns)

Check what the dataset's index looks like

print(chipo.index)
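Since the CSV file itself is not included here, the inspection steps above can be tried on a small in-memory table. This is only a sketch: the DataFrame `sample` and its values are invented for illustration and merely reuse three of the column names.

```python
import pandas as pd

# Hypothetical stand-in for 工作簿1.csv; the values are invented for illustration.
sample = pd.DataFrame({
    '门店': ['A', 'B', 'A', 'C'],
    '订单': [10, 5, 7, 3],
    '单价': [2.5, 4.0, 3.5, 1.0],
})

n_rows, n_cols = sample.shape        # shape is a (rows, columns) tuple
column_names = list(sample.columns)  # column labels as a plain list
row_index = sample.index             # a default integer index starting at 0

print(n_rows, n_cols)
print(column_names)
print(row_index)
```

Unpacking `shape` into two names is often easier to read than indexing it with `[0]` and `[1]`.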

Group by store (门店), sum the orders (订单), and sort the totals in descending order

c = chipo[['门店', '订单']].groupby(['门店'], as_index=False).agg({'订单': 'sum'})
c.sort_values(['订单'], ascending=False, inplace=True)
print(c.head())

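as_index=True is the default and turns the group keys into the index of the result. as_index=False gives SQL-style output: the group key stays an ordinary column and the rows get a default integer index.
ascending=False sorts in descending order. The default, True, sorts in ascending order.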
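The difference between the two `as_index` settings is easiest to see side by side. The following is a minimal sketch on invented data; the DataFrame `sales` is hypothetical and not the article's CSV.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
sales = pd.DataFrame({
    '门店': ['A', 'B', 'A', 'C', 'B'],
    '订单': [10, 5, 7, 3, 9],
})

# as_index=True (default): the group key 门店 becomes the index of the result.
by_index = sales.groupby('门店')['订单'].sum()

# as_index=False: 门店 stays a regular column and the rows get a 0..n-1 index.
flat = sales.groupby('门店', as_index=False)['订单'].sum()

# Sort by the summed orders, largest first, returning a new DataFrame.
ranked = flat.sort_values('订单', ascending=False)

print(by_index)
print(ranked)
```

Here `sort_values` is used without `inplace=True`, so `flat` is left untouched and the sorted result is bound to a new name.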

2. Filtering and sorting data

Select only the order number (订单号) column and print it

print(chipo.订单号)

Check how many orders there are in total (this counts rows, so it assumes one row per order)

print(chipo.shape[0])

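How many columns does the dataset have in total? info() prints every column with its dtype and non-null count, so it needs no print()

chipo.info()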

Store a selection of the dataset's columns separately as discipline

discipline = chipo[['门店','订单','单价','订单号','地区']]
print(discipline)

Sort the data in discipline by unit price (单价), highest first

print(discipline.sort_values(['单价'],ascending=False))

Compute the mean unit price across all rows

print(discipline['单价'].mean())

Find the orders with a unit price greater than 3

print(discipline[discipline.单价>3])

Select the orders whose region (地区) starts with '成'

print(discipline[discipline.地区.str.startswith('成')])
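Both filters above rely on boolean masks: an expression on a column yields one True/False per row, and indexing with that mask keeps only the True rows. A small sketch on invented data follows (the DataFrame `orders` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical data, invented for illustration.
orders = pd.DataFrame({
    '单价': [2.5, 4.0, 3.5, 1.0],
    '地区': ['成都', '北京', '成华区', '上海'],
})

# A comparison on a column yields a boolean Series, one value per row.
price_mask = orders['单价'] > 3
# .str.startswith applies the string test element-wise.
region_mask = orders['地区'].str.startswith('成')

expensive = orders[price_mask]
cheng = orders[region_mask]
# Masks combine with & (and) and | (or).
both = orders[price_mask & region_mask]

print(expensive)
print(cheng)
print(both)
```

Bracket access such as `orders['单价']` is used here instead of attribute access because it also works for column names that are not valid Python identifiers or that clash with DataFrame methods.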

Extract the first three columns

print(discipline.iloc[:,0:3])

Show every column except the last three

print(discipline.iloc[:,0:-3])
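`iloc` selects by position, and a negative stop value counts from the end. A minimal sketch with a hypothetical five-column frame (the names `a` to `e` are placeholders, not the article's columns):

```python
import pandas as pd

# Hypothetical one-row frame with five columns, invented for illustration.
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=['a', 'b', 'c', 'd', 'e'])

first_three = df.iloc[:, 0:3]         # columns at positions 0, 1, 2
all_but_last_three = df.iloc[:, :-3]  # stops three columns before the end

print(first_three.columns.tolist())
print(all_but_last_three.columns.tolist())
```

With five columns, dropping the last three leaves only the first two, which is worth remembering when reading the output of the discipline example.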

Keep only the records whose region is exactly 成都 (Chengdu)

print(discipline.loc[discipline.地区.isin(['成都'])])
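`isin` tests membership in a list, so with a single value it behaves like `==`, and it extends naturally to several values. A sketch on invented data (the DataFrame `orders` is hypothetical):

```python
import pandas as pd

# Hypothetical data, invented for illustration.
orders = pd.DataFrame({
    '地区': ['成都', '北京', '成华区', '成都'],
    '订单': [10, 5, 7, 3],
})

# Exact match against a one-element list.
exact = orders.loc[orders['地区'].isin(['成都'])]
# The same rows, selected with a plain equality test.
same_as_eq = orders.loc[orders['地区'] == '成都']
# Membership in several values at once.
several = orders.loc[orders['地区'].isin(['成都', '北京'])]

print(exact)
print(several)
```

Unlike `str.startswith('成')`, the exact match does not pick up 成华区.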

3. Grouping data

Print the mean unit price for each store

print(chipo.groupby('门店').单价.mean())

Summary statistics with describe

print(chipo.groupby('门店').订单.describe())

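For a one-dimensional array (a Series), describe() returns the following values:
count: the number of non-null values
mean: the arithmetic mean
std: the standard deviation
min: the minimum value
25%, 50%, 75%: the values at three percentile positions, i.e. the quartiles in statistics; 50% is the median
max: the maximum value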
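The entries listed above can be checked on a tiny Series. This is a sketch with invented numbers, not the article's data:

```python
import pandas as pd

# Hypothetical values, invented for illustration.
values = pd.Series([1, 2, 3, 4, 5])

summary = values.describe()  # a Series indexed by statistic name
print(summary)

# The 50% entry is the median of the data.
print(summary['50%'], values.median())
```

Individual statistics can be read from the result by label, for example `summary['std']`.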

Print the median of each store's orders, which equals the 50% quartile shown above

print(chipo.groupby('门店').订单.median())

Print the mean, minimum, and maximum of each store's orders

print(chipo.groupby('门店').订单.agg(['mean','min','max']))
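Passing a list of function names to `agg` produces one column per statistic, and the per-group median matches the `50%` column of `describe()`. A minimal sketch on invented data (the DataFrame `sales` is hypothetical):

```python
import pandas as pd

# Hypothetical data, invented for illustration.
sales = pd.DataFrame({
    '门店': ['A', 'A', 'A', 'B', 'B'],
    '订单': [1, 2, 6, 4, 8],
})

grouped = sales.groupby('门店')['订单']

stats = grouped.agg(['mean', 'min', 'max'])  # one column per statistic
medians = grouped.median()                   # one value per store
described = grouped.describe()               # full summary per store

print(stats)
print(medians)
print(described['50%'])
```

Keeping the grouped object in a variable avoids repeating the `groupby` call for each statistic.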

4. Merging data

Data preparation

raw_data_1 = {
    'subject_id': ['1', '2', '3', '4', '5'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}

raw_data_2 = {
    'subject_id': ['4', '5', '6', '7', '8'],
    'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}

raw_data_3 = {
   
        'subject_id': [</