Python pandas basic operations: 50 practice exercises

远胥

Tags: python pandas matplotlib numpy

Article link: https://blog.csdn.net/weixin_44559710/article/details/114792594

Data source: https://www.kesci.com/mw/project/604db88774dfc60016e29d56/dataset

I. Exercise 1: Exploring the fast-food data

1. Data preview

Variable	Meaning
order_id	Order ID
quantity	Quantity
item_name	Item name
choice_description	Details
item_price	Unit price of the item

2. Code exercises

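# Import pandas
import pandas as pd
# Read the data
df = pd.read_csv('/Users/harper/Desktop/python/exercise_data/chipotle.tsv',sep='\t')
# Show the first 10 rows
df.head(10)
# Dimensions of the dataset, number of rows, number of columns
print(df.shape)  # dimensions: (rows, columns)
df.shape[0]  # number of rows
df.shape[1]  # number of columns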

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
(4622, 5)
4622
5
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

# Print the column names; compare the two forms below
print(df.columns)
print(df.columns.to_list())  # stored as a plain Python list

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

# Which item (item_name) was ordered the most?
# Approach: group by item_name, sum quantity, then sort in descending order
# reset_index turns the resulting Series back into a DataFrame
df.groupby('item_name').quantity.sum().reset_index().sort_values(by = 'quantity',ascending = False)

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                item_name  quantity
17                           Chicken Bowl       761
18                        Chicken Burrito       591
25                    Chips and Guacamole       506
39                          Steak Burrito       386
10                      Canned Soft Drink       351
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
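The group, sum, and sort steps above can be tried on a small table without the original file. This is a minimal sketch: the `orders` frame and its values are made up for illustration and are not taken from chipotle.tsv.

```python
import pandas as pd

# Hypothetical toy data standing in for chipotle.tsv (values are made up)
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Steak Burrito", "Chicken Bowl", "Chips"],
    "quantity": [2, 1, 1, 1],
})

# Group by item, sum the quantities, then sort from largest to smallest
totals = (
    orders.groupby("item_name")["quantity"]
    .sum()
    .sort_values(ascending=False)
)

# idxmax returns the index label (here the item name) of the largest total
top_item = totals.idxmax()
print(totals)
print(top_item)
```

If only the single best seller is needed, `idxmax` avoids sorting the whole result; the sorted version is still handy when you want to read off the top few rows.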

# Count the number of distinct item_name values
df['item_name'].nunique()

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
50
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

# Which choice_description was ordered the most?
# choice_description contains some missing values, so first keep only the non-null rows, then group, sum, and sort
df[df['choice_description'].notnull()].groupby('choice_description').quantity.sum().reset_index().sort_values(by = 'quantity',ascending = False)

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     choice_description  quantity
15                                          [Diet Coke]       159
14                                               [Coke]       143
583                                            [Sprite]        89
256   [Fresh Tomato Salsa, [Rice, Black Beans, Chees...        49
257   [Fresh Tomato Salsa, [Rice, Black Beans, Chees...        42
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
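As a side note on the non-null filter: `groupby` drops rows whose group key is missing by default (`dropna=True`), so the explicit `notnull()` step mainly documents the intent. A small sketch on made-up data (the `orders` frame is hypothetical) compares the two forms:

```python
import pandas as pd

# Hypothetical toy data with one missing choice_description
orders = pd.DataFrame({
    "choice_description": ["[Coke]", None, "[Sprite]", "[Coke]"],
    "quantity": [1, 5, 2, 3],
})

# Explicitly drop rows with a missing key before grouping
filtered = (
    orders[orders["choice_description"].notnull()]
    .groupby("choice_description")["quantity"]
    .sum()
)

# Rely on groupby's default behaviour of skipping missing keys
unfiltered = orders.groupby("choice_description")["quantity"].sum()

print(filtered.equals(unfiltered))
```

Keeping the explicit filter is still a reasonable choice, since it makes the handling of missing values visible to the reader.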

# How many items were ordered in total? This is the sum of the quantity column
print(df['quantity'].sum())

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
4972
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

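# Convert item_price to float
# Note that each item_price value starts with a $ sign
# Method 1: df['item_price'] is a Series, so the .str accessor is needed to reach the string methods;
# regex=False makes '$' a literal character instead of the regex end-of-string anchor
df['item_price'] = df['item_price'].str.replace('$', '', regex=False)
# Method 2: apply with axis=1 passes one row at a time, so row['item_price'] is a plain Python str
# and no .str accessor is needed (either method alone is enough)
def func(row):
    row['item_price'] = row['item_price'].replace('$', '')
    return row
df = df.apply(func, axis=1)
df['item_price'] = df['item_price'].astype('float')
print(df['item_price'])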

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
0        2.39
1        3.39
2        3.39
3        2.39
4       16.98
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
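The role of the `.str` accessor and of the literal `$` can be checked on a tiny Series. This is a sketch with made-up sample values; `prices`, `cleaned`, and `cleaned_alt` are illustrative names only.

```python
import pandas as pd

# Made-up sample values in the same "$<number>" shape as item_price
prices = pd.Series(["$2.39", "$16.98"])

# regex=False treats "$" as a literal character, not the regex end-of-string anchor
cleaned = prices.str.replace("$", "", regex=False).astype(float)

# Alternative: strip the leading "$" without going through replace at all
cleaned_alt = prices.str.lstrip("$").astype(float)

print(cleaned.tolist())
print(cleaned_alt.tolist())
```

Both forms are vectorized, so on a large table they are usually much faster than a row-by-row `apply`.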

# What is the average total price per order? That is, total revenue / number of distinct orders
round((df['quantity']*df['item_price']).sum()/df['order_id'].nunique(),2)

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
21.39
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
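The "total revenue divided by number of orders" calculation can be cross-checked by first totalling each order and then averaging. The sketch below uses a made-up three-row table and, like the exercise, treats `item_price` as a unit price.

```python
import pandas as pd

# Hypothetical toy orders: order 1 has two lines, order 2 has one
orders = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 3.0, 4.0],
})

# Revenue of each line, assuming item_price is a unit price
line_total = orders["quantity"] * orders["item_price"]

# Form used in the exercise: total revenue / number of distinct orders
avg = round(line_total.sum() / orders["order_id"].nunique(), 2)

# Cross-check: total per order first, then the mean over orders
per_order = line_total.groupby(orders["order_id"]).sum()
avg_alt = round(per_order.mean(), 2)

print(avg, avg_alt)
```

The two results agree because every order contributes exactly one total to the mean; the per-order version also gives you the distribution of order totals if you want to inspect it.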

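II. Exercise 2: Exploring the UEFA European Championship data

1. Data preview

2. Code exercises

# Import pandas
import pandas as pd
# Read the data
df = pd.read_csv('/Users/harper/Desktop/python/exercise_data/Euro2012_stats.csv')
# How many teams took part in Euro 2012? That is, count the distinct values of Team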
df['Team'].nunique()

<<<<<<<<<<<<<<<<<<<<<<
16
<<<<<<<<<<<<<<<<<<<<<<

# How many columns does the dataset have?
print(df.shape[1])

<<<<<<<<<<<<<<<<<<<<<<
35
<<<<<<<<<<<<<<<<<<<<<<

# Store Team, Yellow Cards and Red Cards in a separate DataFrame called discipline, then sort it by Red Cards first and Yellow Cards second
discipline = df[['Team','Yellow Cards','Red Cards']]
discipline.sort_values(['Red Cards','Yellow Cards'],ascending = False)

<<<<<<<<<<<<<<<<<<<<<<
                   Team  Yellow Cards  Red Cards
6                Greece             9          1
9                Poland             7          1
11  Republic of Ireland             6          1
7                 Italy            16          0
10             Portugal            12          0
13                Sp
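The two-key sort can also take one `ascending` flag per column. The following sketch uses a made-up `cards` table (the team names and counts are hypothetical) to show how ties on the first key are broken by the second.

```python
import pandas as pd

# Hypothetical toy data, not taken from Euro2012_stats.csv
cards = pd.DataFrame({
    "Team": ["A", "B", "C"],
    "Yellow Cards": [9, 16, 7],
    "Red Cards": [1, 0, 1],
})

# Both keys descending: ties on Red Cards are broken by more Yellow Cards first
both_desc = cards.sort_values(["Red Cards", "Yellow Cards"], ascending=[False, False])

# Mixed order: Red Cards descending, Yellow Cards ascending within each tie
mixed = cards.sort_values(["Red Cards", "Yellow Cards"], ascending=[False, True])

print(both_desc["Team"].tolist())
print(mixed["Team"].tolist())
```

Passing a single `ascending=False`, as in the exercise, applies the same direction to every key.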
