首先导入库及数据
import pandas as pd
import numpy as np
df = pd.read_csv('data/2002年-2018年上海机动车拍照拍卖.csv')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 203 entries, 0 to 202
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 203 non-null object
1 Total number of license issued 203 non-null int64
2 lowest price 203 non-null int64
3 avg price 203 non-null int64
4 Total number of applicants 203 non-null int64
dtypes: int64(4), object(1)
memory usage: 8.1+ KB
df.head()
Date | Total number of license issued | lowest price | avg price | Total number of applicants | |
---|---|---|---|---|---|
0 | 2-Jan | 1400 | 13600 | 14735 | 3718 |
1 | 2-Feb | 1800 | 13100 | 14057 | 4590 |
2 | 2-Mar | 2000 | 14300 | 14662 | 5190 |
3 | 2-Apr | 2300 | 16000 | 16334 | 4806 |
4 | 2-May | 2350 | 17800 | 18357 | 4665 |
(1) 哪一次拍卖的中标率首次小于 5%?
df['Total number of license issued']/df['Total number of applicants']
0 0.376547
1 0.392157
2 0.385356
3 0.478568
4 0.503751
...
198 0.053965
199 0.067209
200 0.058990
201 0.066342
202 0.077671
Length: 203, dtype: float64
df[df['Total number of license issued']/df['Total number of applicants'] < 0.05].head(1)
Date | Total number of license issued | lowest price | avg price | Total number of applicants | |
---|---|---|---|---|---|
159 | 15-May | 7482 | 79000 | 79099 | 156007 |
(2) 将第一列时间列拆分成两个列,一列为年份(格式为 20××),另一列为 月份(英语缩写)
years = []
months =[]
for i in list(df['Date'].values):
years.append(i.split('-')[0])
months.append(i.split('-')[1])
print(years)
['2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3', '3', '3', '3', '3', '3', '3', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '7', '7', '7', '7', '7', '7', '7', '7', '7', '7', '7', '7', '8', '8', '8', '8', '8', '8', '8', '8', '8', '8', '8', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '11', '11', '11', '11', '11', '11', '11', '11', '11', '11', '11', '11', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '14', '14', '14', '14', '14', '14', '14', '14', '14', '14', '14', '14', '15', '15', '15', '15', '15', '15', '15', '15', '15', '15', '15', '15', '16', '16', '16', '16', '16', '16', '16', '16', '16', '16', '16', '16', '17', '17', '17', '17', '17', '17', '17', '17', '17', '17', '17', '17', '18', '18', '18', '18', '18', '18', '18', '18', '18', '18', '18', '18']
df['Years'] = years
df['months'] = months
df['Years']= df['Years'].apply(lambda x: '200' + str(x) if len(str(x)) == 1 else '20'+str(x) )
df
Date | Total number of license issued | lowest price | avg price | Total number of applicants | Years | months | |
---|---|---|---|---|---|---|---|
0 | 2-Jan | 1400 | 13600 | 14735 | 3718 | 2002 | Jan |
1 | 2-Feb | 1800 | 13100 | 14057 | 4590 | 2002 | Feb |
2 | 2-Mar | 2000 | 14300 | 14662 | 5190 | 2002 | Mar |
3 | 2-Apr | 2300 | 16000 | 16334 | 4806 | 2002 | Apr |
4 | 2-May | 2350 | 17800 | 18357 | 4665 | 2002 | May |
... | ... | ... | ... | ... | ... | ... | ... |
198 | 18-Aug | 10402 | 88300 | 88365 | 192755 | 2018 | Aug |
199 | 18-Sep | 12712 | 87300 | 87410 | 189142 | 2018 | Sep |
200 | 18-Oct | 10728 | 88000 | 88070 | 181861 | 2018 | Oct |
201 | 18-Nov | 11766 | 87300 | 87374 | 177355 | 2018 | Nov |
202 | 18-Dec | 12850 | 87400 | 87508 | 165442 | 2018 | Dec |
203 rows × 7 columns
(3) 按年统计拍卖最低价的下列统计量:最大值、均值、0.75 分位数
df.groupby('Years').describe(percentiles=[.75])
Total number of license issued | lowest price | ... | avg price | Total number of applicants | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | mean | std | min | 50% | 75% | max | count | mean | std | ... | 50% | 75% | max | count | mean | std | min | 50% | 75% | max | |
Years | |||||||||||||||||||||
2002 | 12.0 | 2654.166667 | 674.017242 | 1400.0 | 2900.0 | 3200.00 | 3600.0 | 12.0 | 20316.666667 | 5825.466557 | ... | 20541.0 | 24790.00 | 31721.0 | 12.0 | 4373.750000 | 503.571991 | 3525.0 | 4546.0 | 4662.00 | 5190.0 |
2003 | 12.0 | 4422.333333 | 1237.668434 | 3000.0 | 4500.0 | 5156.50 | 6650.0 | 12.0 | 31983.333333 | 5926.033985 | ... | 35874.0 | 38107.75 | 39369.0 | 12.0 | 10927.083333 | 2253.274888 | 8532.0 | 10170.0 | 11954.25 | 15507.0 |
2004 | 12.0 | 5966.666667 | 787.878894 | 4800.0 | 6380.0 | 6600.00 | 6800.0 | 12.0 | 29408.333333 | 10235.007513 | ... | 30157.5 | 39650.25 | 45492.0 | 12.0 | 11048.500000 | 3482.109875 | 8114.0 | 9734.5 | 11591.50 | 19233.0 |
2005 | 12.0 | 5589.833333 | 937.048833 | 3800.0 | 5700.0 | 6081.50 | 6829.0 | 12.0 | 31908.333333 | 4766.828245 | ... | 35172.5 | 36900.50 | 38378.0 | 12.0 | 9240.750000 | 1940.377148 | 6208.0 | 8863.0 | 9997.75 | 13633.0 |
2006 | 12.0 | 5375.000000 | 950.717432 | 3800.0 | 5250.0 | 6275.00 | 6500.0 | 12.0 | 37058.333333 | 3607.935278 | ... | 38696.0 | 40089.25 | 41601.0 | 12.0 | 17334.416667 | 29305.134479 | 5907.0 | 8935.0 | 9917.00 | 110234.0 |
2007 | 12.0 | 6291.666667 | 1558.821427 | 3500.0 | 6000.0 | 7500.00 | 8500.0 | 12.0 | 45691.666667 | 4694.766590 | ... | 46739.0 | 49973.25 | 56042.0 | 12.0 | 9965.250000 | 2104.412167 | 5056.0 | 10439.5 | 10625.75 | 12943.0 |
2008 | 11.0 | 7681.818182 | 3176.418795 | 4500.0 | 6800.0 | 8600.00 | 16000.0 | 11.0 | 29945.454545 | 8329.869583 | ... | 33224.0 | 35497.00 | 37659.0 | 11.0 | 22616.636364 | 15685.256391 | 10170.0 | 16801.0 | 23774.50 | 63534.0 |
2009 | 12.0 | 7216.666667 | 1185.390870 | 5200.0 | 8000.0 | 8000.00 | 8500.0 | 12.0 | 31333.333333 | 3585.408138 | ... | 31442.5 | 34630.75 | 37593.0 | 12.0 | 18073.833333 | 2107.165258 | 14906.0 | 17543.5 | 18620.25 | 22006.0 |
2010 | 12.0 | 8600.000000 | 532.575219 | 7500.0 | 8750.0 | 9000.00 | 9200.0 | 12.0 | 38008.333333 | 8947.062325 | ... | 40274.5 | 42200.50 | 45291.0 | 12.0 | 15867.833333 | 2336.789205 | 11224.0 | 16288.0 | 17410.75 | 18975.0 |
2011 | 12.0 | 8625.000000 | 607.715544 | 7500.0 | 9000.0 | 9000.00 | 9500.0 | 12.0 | 47958.333333 | 4245.095209 | ... | 48277.5 | 51634.75 | 54008.0 | 12.0 | 23580.083333 | 3153.803518 | 19415.0 | 22400.0 | 25255.00 | 30675.0 |
2012 | 12.0 | 9025.000000 | 679.739255 | 8000.0 | 9500.0 | 9500.00 | 9500.0 | 12.0 | 61108.333333 | 5091.786315 | ... | 62092.5 | 66495.75 | 69346.0 | 12.0 | 22391.833333 | 2747.913120 | 18244.0 | 23048.5 | 24459.00 | 26526.0 |
2013 | 12.0 | 9166.666667 | 685.344417 | 8500.0 | 9000.0 | 9000.00 | 11000.0 | 12.0 | 79125.000000 | 5149.426270 | ... | 77144.0 | 83609.00 | 91898.0 | 12.0 | 27110.333333 | 6808.147503 | 20857.0 | 24120.0 | 30453.75 | 39625.0 |
2014 | 12.0 | 7603.916667 | 362.218532 | 7400.0 | 7400.0 | 7610.25 | 8300.0 | 12.0 | 73816.666667 | 392.737093 | ... | 73873.5 | 74084.50 | 74680.0 | 12.0 | 97630.166667 | 32370.029233 | 41946.0 | 101252.0 | 121717.25 | 136098.0 |
2015 | 12.0 | 7745.583333 | 403.454956 | 7406.0 | 7592.0 | 7819.75 | 8727.0 | 12.0 | 80575.000000 | 3873.951471 | ... | 81465.5 | 83521.25 | 85424.0 | 12.0 | 152743.333333 | 27089.892403 | 98203.0 | 166033.5 | 169618.00 | 179133.0 |
2016 | 12.0 | 11033.250000 | 1490.068767 | 8310.0 | 11549.0 | 11673.00 | 12889.0 | 12.0 | 85733.333333 | 2216.604009 | ... | 85825.0 | 87516.00 | 88665.0 | 12.0 | 232111.333333 | 28913.748423 | 187533.0 | 225326.5 | 252615.25 | 277889.0 |
2017 | 12.0 | 11115.416667 | 900.862663 | 10157.0 | 10780.0 | 12159.25 | 12413.0 | 12.0 | 90616.666667 | 2099.278375 | ... | 90812.0 | 92399.50 | 93540.0 | 12.0 | 249034.333333 | 14597.264995 | 226911.0 | 251141.5 | 257564.75 | 270197.0 |
2018 | 12.0 | 11241.333333 | 1011.900402 | 9855.0 | 10936.5 | 11982.75 | 12850.0 | 12.0 | 87825.000000 | 570.685393 | ... | 87918.0 | 88223.25 | 89018.0 | 12.0 | 198864.500000 | 18424.422554 | 165442.0 | 200482.0 | 211518.00 | 226316.0 |
17 rows × 28 columns
(4)现在将表格行索引设为多级索引,外层为年份,内层为原表格第二至第 五列的变量名,列索引为月份。
L1 = ['Years']
L2 = ['Total number of license issued', 'lowest price ', 'avg price', 'Total number of applicants']
mul_index = pd.MultiIndex.from_product([L1,L2],names=('Upper', 'Lower'))
pd.pivot_table(df,index=['Years'],
columns=['months'])
Total number of applicants | ... | lowest price | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
months | Apr | Aug | Dec | Feb | Jan | Jul | Jun | Mar | May | Nov | ... | Dec | Feb | Jan | Jul | Jun | Mar | May | Nov | Oct | Sep |
Years | |||||||||||||||||||||
2002 | 4806.0 | 4640.0 | 3525.0 | 4590.0 | 3718.0 | 3774.0 | 4502.0 | 5190.0 | 4665.0 | 4021.0 | ... | 27800.0 | 13100.0 | 13600.0 | 19800.0 | 19600.0 | 14300.0 | 17800.0 | 30800.0 | 26400.0 | 23600.0 |
2003 | 8794.0 | 9315.0 | 10491.0 | 12030.0 | 9442.0 | 11929.0 | 15507.0 | 11219.0 | 14634.0 | 9849.0 | ... | 37100.0 | 23800.0 | 18800.0 | 36900.0 | 36100.0 | 28800.0 | 35000.0 | 33100.0 | 32800.0 | 28800.0 |
2004 | 8150.0 | 15506.0 | 9005.0 | 10156.0 | 8663.0 | 14464.0 | 19233.0 | 9950.0 | 8114.0 | 9188.0 | ... | 29300.0 | 39600.0 | 38000.0 | 21800.0 | 17800.0 | 43000.0 | 10800.0 | 26000.0 | 28000.0 | 29300.0 |
2005 | 8113.0 | 7520.0 | 8351.0 | 8949.0 | 6208.0 | 8777.0 | 8409.0 | 9117.0 | 9673.0 | 13633.0 | ... | 35200.0 | 31700.0 | 28500.0 | 37900.0 | 37000.0 | 34300.0 | 35000.0 | 29800.0 | 25200.0 | 26500.0 |
2006 | 7888.0 | 9190.0 | 9477.0 | 12367.0 | 5907.0 | 8966.0 | 8478.0 | 8904.0 | 8301.0 | 110234.0 | ... | 39800.0 | 34200.0 | 26900.0 | 39600.0 | 39500.0 | 38500.0 | 37700.0 | 37800.0 | 36300.0 | 37000.0 |
2007 | 10523.0 | 12943.0 | 10356.0 | 5056.0 | 6587.0 | 10327.0 | 11478.0 | 10168.0 | 10273.0 | 10596.0 | ... | 50000.0 | 39100.0 | 38500.0 | 45200.0 | 47200.0 | 41100.0 | 44500.0 | 53800.0 | 50500.0 | 48600.0 |
2008 | 37072.0 | 13451.0 | 16801.0 | NaN | 20539.0 | 16783.0 | 21208.0 | 63534.0 | 26341.0 | 10170.0 | ... | 31000.0 | NaN | 8100.0 | 33800.0 | 33900.0 | 31300.0 | 34400.0 | 21800.0 | 32600.0 | 29300.0 |
2009 | 17654.0 | 18750.0 | 18577.0 | 16848.0 | 16544.0 | 17220.0 | 17433.0 | 18575.0 | 16471.0 | 21902.0 | ... | 36900.0 | 33000.0 | 28600.0 | 32100.0 | 30000.0 | 26600.0 | 28500.0 | 34900.0 | 33900.0 | 27200.0 |
2010 | 17313.0 | 16855.0 | 11224.0 | 18810.0 | 18975.0 | 13389.0 | 16252.0 | 17704.0 | 16324.0 | 13429.0 | ... | 10400.0 | 38300.0 | 37800.0 | 38400.0 | 39200.0 | 39600.0 | 41900.0 | 44900.0 | 43000.0 | 41800.0 |
2011 | 22326.0 | 21544.0 | 26531.0 | 25104.0 | 30675.0 | 21852.0 | 22474.0 | 25014.0 | 25708.0 | 20050.0 | ... | 51000.0 | 44200.0 | 38300.0 | 50900.0 | 48500.0 | 46200.0 | 47400.0 | 45700.0 | 53800.0 | 52200.0 |
2012 | 22706.0 | 21425.0 | 18244.0 | 23391.0 | 24354.0 | 26526.0 | 24774.0 | 24897.0 | 24230.0 | 19120.0 | ... | 68900.0 | 55400.0 | 52800.0 | 57700.0 | 55800.0 | 58300.0 | 64000.0 | 66400.0 | 65200.0 | 65700.0 |
2013 | 26174.0 | 22650.0 | 39625.0 | 24651.0 | 20857.0 | 21811.0 | 21482.0 | 23589.0 | 22224.0 | 38220.0 | ... | 76000.0 | 83300.0 | 75000.0 | 76300.0 | 77600.0 | 90800.0 | 80700.0 | 75500.0 | 82300.0 | 73400.0 |
2014 | 94241.0 | 121550.0 | 96972.0 | 45758.0 | 41946.0 | 136098.0 | 135677.0 | 61853.0 | 114121.0 | 95595.0 | ... | 73600.0 | 73200.0 | 73500.0 | 74600.0 | 73800.0 | 73800.0 | 74400.0 | 73500.0 | 74000.0 | 73800.0 |
2015 | 152298.0 | 166939.0 | 179133.0 | 103224.0 | 98203.0 | 166302.0 | 172205.0 | 132690.0 | 156007.0 | 169159.0 | ... | 84500.0 | 76500.0 | 74000.0 | 83100.0 | 80000.0 | 74600.0 | 79000.0 | 84600.0 | 85300.0 | 82100.0 |
2016 | 256897.0 | 251188.0 | 219882.0 | 196470.0 | 187533.0 | 240750.0 | 275438.0 | 221109.0 | 277889.0 | 215424.0 | ... | 88300.0 | 83200.0 | 82200.0 | 87200.0 | 84400.0 | 83100.0 | 85000.0 | 88600.0 | 88300.0 | 86500.0 |
2017 | 252273.0 | 256083.0 | 228148.0 | 251717.0 | 232101.0 | 269189.0 | 244349.0 | 262010.0 | 270197.0 | 226911.0 | ... | 92800.0 | 88200.0 | 87600.0 | 92200.0 | 89400.0 | 87800.0 | 90100.0 | 93100.0 | 93500.0 | 91300.0 |
2018 | 204980.0 | 192755.0 | 165442.0 | 220831.0 | 226316.0 | 202337.0 | 209672.0 | 217056.0 | 198627.0 | 177355.0 | ... | 87400.0 | 87600.0 | 87900.0 | 88300.0 | 87800.0 | 88100.0 | 89000.0 | 87300.0 | 88000.0 | 87300.0 |
17 rows × 48 columns