Pandas组队学习(上)——综合练习题答案

最新推荐文章于 2024-07-05 15:33:10 发布

KF_Guan

最新推荐文章于 2024-07-05 15:33:10 发布

阅读量1.1k

点赞数

分类专栏： Pandas学习笔记文章标签：数据挖掘

本文链接：https://blog.csdn.net/KF_Guan/article/details/105879341

版权

本文是Datawhale的Pandas组队学习的上部分总结，涵盖了2002年至2018年上海机动车拍卖、2007年至2019年俄罗斯机场货运航班运载量以及新冠肺炎在美国传播的数据分析，包括数据处理、统计和可视化等操作，旨在提升Pandas使用技能。

摘要由CSDN通过智能技术生成

第一阶段学习总结：Datawhale组织的Pandas组队学习暂告一段落。快乐pandas以自主学习、助教答疑、定期打卡等形式，课程资料编写比较详细，配合思考题和练习巩固，无奈于本人较菜，对所学的知识掌握还不够透彻，完成这次综合练习断断续续花了两天时间，后面一定要抽时间复盘之前所学的内容。
全面学习基础内容有助于建立体系，了解各大功能模块，在使用时才能有所了解，找准方向。因此后面会继续参加pandas(下)的学习，争取学完这期教程。

数据集下载：pandas教程（上）综合题数据集

往期内容：

Pandas学习总结——1. 基础操作(文件读写)、数据结构(Series、DataFrame)、常用基本函数、数据排序

Pandas学习总结——2. 一文详述 Pandas索引

Pandas学习总结——3. Pandas分组都有哪些操作？

Pandas学习总结——4. Pandas变形之透视表 & 变形方法

Pandas学习总结——5. 合并 (append与assign、combine与update、concat、merge与join)

import numpy as np
import pandas as pd
import re

一、2002 年-2018 年上海机动车拍照拍卖

vehicle = pd.read_csv('数据集/2002年-2018年上海机动车拍照拍卖.csv')
vehicle.head()

	Date	Total number of license issued	lowest price	avg price	Total number of applicants
0	2-Jan	1400	13600	14735	3718
1	2-Feb	1800	13100	14057	4590
2	2-Mar	2000	14300	14662	5190
3	2-Apr	2300	16000	16334	4806
4	2-May	2350	17800	18357	4665

（1）

问：哪一次拍卖的中标率首次小于 5%？

vehicle_1 = vehicle.assign(probability=vehicle['Total number of license issued'] / vehicle['Total number of applicants'])
vehicle_1[vehicle_1['probability']<0.05].head(1)

	Date	Total number of license issued	lowest price	avg price	Total number of applicants	probability
159	15-May	7482	79000	79099	156007	0.047959

（2）

问：按年统计拍卖最低价的下列统计量：最大值、均值、0.75 分位数，要求显示在同一张表上。

vehicle_2 = vehicle.assign(year=vehicle['Date'].apply(lambda x: x.split('-')[0]))
vehicle_2.head()
groupby_ = vehicle_2.groupby('year')

new_dataframe = pd.DataFrame()
new_dataframe['max'] = groupby_['lowest price'].max()
new_dataframe['mean'] = groupby_['lowest price'].mean()
new_dataframe['quantile'] = groupby_['lowest price'].quantile(0.75)
new_dataframe

	max	mean	quantile
year
10	44900	38008.333333	41825.0
11	53800	47958.333333	51000.0
12	68900	61108.333333	65325.0
13	90800	79125.000000	82550.0
14	74600	73816.666667	74000.0
15	85300	80575.000000	83450.0
16	88600	85733.333333	87475.0
17	93500	90616.666667	92350.0
18	89000	87825.000000	88150.0
2	30800	20316.666667	24300.0
3	38500	31983.333333	36300.0
4	44200	29408.333333	38400.0
5	37900	31908.333333	35600.0
6	39900	37058.333333	39525.0
7	53800	45691.666667	48950.0
8	37300	29945.454545	34150.0
9	36900	31333.333333	34150.0

（3）

问：将第一列时间列拆分成两个列，一列为年份（格式为 20××），另一列为月份（英语缩写），添加到列表作为第一第二列，并将原表第一列删除，其他列依次向后顺延。

Year = vehicle['Date'].apply(lambda x: str(20) + ("%02d" % int(x.split('-')[0])))
Month = vehicle['Date'].apply(lambda x: x.split('-')[1])

vehicle_3 = vehicle.drop(columns='Date')
vehicle_3.insert(0, 'Year', Year)
vehicle_3.insert(1, 'Month', Month)
vehicle_3.head()

	Year	Month	Total number of license issued	lowest price	avg price	Total number of applicants
0	2002	Jan	1400	13600	14735	3718
1	2002	Feb	1800	13100	14057	4590
2	2002	Mar	2000	14300	14662	5190
3	2002	Apr	2300	16000	16334	4806
4	2002	May	2350	17800	18357	4665

（4）

问：现在将表格行索引设为多级索引，外层为年份，内层为原表格第二至第五列的变量名，列索引为月份。

vehicle_4 = vehicle_3.set_index(['Year', 'Total number of license issued', 'lowest price', 'avg price', 'Total number of applicants'])
vehicle_4.head()

					Month
Year	Total number of license issued	lowest price	avg price	Total number of applicants
2002	1400	13600	14735	3718	Jan
	1800	13100	14057	4590	Feb
	2000	14300	14662	5190	Mar
	2300	16000	16334	4806	Apr
	2350	17800	18357	4665	May

（5）

问：一般而言某个月最低价与上月最低价的差额，会与该月均值与上月均值的差额具有相同的正负号，哪些拍卖时间不具有这个特点？

vehicle_5 = vehicle[['Date','lowest price','avg price']]

for index in range(1, len(vehicle_5)):
    if ((vehicle_5.iloc[index]['lowest price'] - vehicle_5.iloc[index-1]['lowest price'])\
    * (vehicle_5.iloc[index]['avg price'] - vehicle_5.iloc[index-1]['avg price'])) < 0:
        print(vehicle_5.iloc[index]['Date'])

3-Oct
3-Nov
4-Jun
5-Jan
5-Feb
5-Sep
6-May
6-Sep
7-Jan
7-Feb
7-Dec
12-Oct

（6）

问：将某一个月牌照发行量与其前两个月发行量均值的差额定义为发行增益，最初的两个月用 0 填充，求发行增益极值出现的时间。

vehicle_6 = vehicle[['Date','Total number of license issued']]
gain = [0,0]
string = 'Total number of license issued'
for index in

最低0.47元/天解锁文章

KF_Guan

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

专栏目录

Pandas组队学习(上)——综合练习题答案

题目快速跳转：

一、2002 年-2018 年上海机动车拍照拍卖

（1）

（2）

（3）

（4）

（5）

（6）