问题
一、2002 年-2018 年上海机动车拍照拍卖
import numpy as np
import pandas as pd
import re
vehicle = pd.read_csv('数据集/2002年-2018年上海机动车拍照拍卖.csv')
vehicle.head()
(1) 哪一次拍卖的中标率首次小于5%?
vehicle_1 = vehicle.assign(bidding_rate=vehicle['Total number of license issued'] / vehicle['Total number of applicants'])
vehicle_1[vehicle_1['bidding_rate']<0.05].head(1)
(2) 按年统计拍卖最低价的下列统计量:最大值、均值、0.75 分位数,要求显示在同一张表上。
vehicle_2 = vehicle.assign(year=vehicle['Date'].apply(lambda x: x.split('-')[0]))
vehicle_2.head()
groupby_ = vehicle_2.groupby('year')
new_dataframe = pd.DataFrame()
new_dataframe['max'] = groupby_['lowest price '].max()
new_dataframe['mean'] = groupby_['lowest price '].mean()
new_dataframe['quantile'] = groupby_['lowest price '].quantile(0.75)
new_dataframe
(3) 将第一列时间列拆分成两个列,一列为年份(格式为20××),另一列为月份(英语缩写),添加到列表作为第一第二列,并将原表第一列删除,其他列依次向后顺延。
Year = vehicle['Date'].apply(lambda x: str(20) + ("%02d" % int(x.split('-')[0])))
Month = vehicle['Date'].apply(lambda x: x.split('-')[1])
vehicle_3 = vehicle.drop(columns='Date')
vehicle_3.insert(0, 'Year', Year)
vehicle_3.insert(1, 'Month', Month)
vehicle_3.head()
(4) 现在将表格行索引设为多级索引,外层为年份,内层为原表格第二至第五列的变量名,列索引为月份。
vehicle_4 = vehicle_3.set_index(['Year', 'Total number of license issued', 'lowest price ', 'avg price', 'Total number of applicants'])
vehicle_4.head()
(5) 一般而言某个月最低价与上月最低价的差额,会与该月均值与上月均值的差额具有相同的正负号,哪些拍卖时间不具有这个特点?
vehicle_5 = vehicle[['Date','lowest price ','avg price']]
for index in range(1, len(vehicle_5)):
if ((vehicle_5.iloc[index]['lowest price '] - vehicle_5.iloc[index-1]['lowest price '])\
* (vehicle_5.iloc[index]['avg price'] - vehicle_5.iloc[index-1]['avg price'])) < 0:
print(vehicle_5.iloc[index]['Date'])
(6) 将某一个月牌照发行量与其前两个月发行量均值的差额定义为发行增益,最初的两个月用0 填充,求发行增益极值出现的时间。
vehicle_6 = vehicle[['Date','Total number of license issued']]
gain = [0,0]
string = 'Total number of license issued'
for index in range(2, len(vehicle_6)):
gain.append(vehicle.iloc[index][string] - np.mean([vehicle.iloc[index-1][string], vehicle.iloc[index-2][string]]))
vehicle_6['gain'] = gain
print('增益极大值出现时间:', vehicle_6[vehicle_6['gain'] == vehicle_6['gain'].max()]['Date'].values)
print('增益极小值出现时间:', vehicle_6[vehicle_6['gain'] == vehicle_6['gain'].min()]['Date'].values)
本文主要参考链接: https://blog.csdn.net/KF_Guan/article/details/105879341.