Three comprehensive pandas mini-exercises; the datasets they need are here.
1. Shanghai vehicle license-plate auctions, 2002–2018
Questions
(1) Which auction was the first whose winning rate dropped below 5%?
import pandas as pd
import numpy as np

ex1 = pd.read_csv('work/task6/2002年-2018年上海机动车拍照拍卖.csv')
ex1['%'] = ex1['Total number of license issued']/ex1['Total number of applicants']
print(ex1[ex1['%']<0.05].iloc[0])
Date 15-May
Total number of license issued 7482
lowest price 79000
avg price 79099
Total number of applicants 156007
% 0.0479594
Name: 159, dtype: object
(2) By year, compute the following statistics of the lowest auction price: maximum, mean, and 0.75 quantile, displayed together in a single table.
ex1 = pd.read_csv('work/task6/2002年-2018年上海机动车拍照拍卖.csv')
ex1['year']=ex1['Date'].apply(lambda x:'200'+x[:1] if len(x[:2].replace('-',''))==1 else '20'+x[:2])
ex1['mouth']=ex1['Date'].apply(lambda x:x[-3:])
del ex1['Date']
ls=ex1.columns.tolist()
ls.remove('mouth')
ls.remove('year')
ex1 = ex1[['year','mouth']+ls]
ex1_groupby = ex1.groupby('year')
ls=list(set(ex1['year']))
ls.sort()
df2=pd.DataFrame()
for i in ls:
    df=ex1_groupby.get_group(i)['lowest price '].agg(['sum','mean'])
    df['0.75']=ex1_groupby.get_group(i)['lowest price '].quantile(0.75)
    df=df.to_frame()
    df.rename(columns={'lowest price ':i}, inplace=True)
    df2=pd.concat([df2,df.T])
print(df2)
Every year is iterated over one by one, which is a bit cumbersome. (Note: the question asks for the maximum, but this loop aggregates `sum`.)
sum mean 0.75
2002 243800.0 20316.666667 24300.0
2003 383800.0 31983.333333 36300.0
2004 352900.0 29408.333333 38400.0
2005 382900.0 31908.333333 35600.0
2006 444700.0 37058.333333 39525.0
2007 548300.0 45691.666667 48950.0
2008 329400.0 29945.454545 34150.0
2009 376000.0 31333.333333 34150.0
2010 456100.0 38008.333333 41825.0
2011 575500.0 47958.333333 51000.0
2012 733300.0 61108.333333 65325.0
2013 949500.0 79125.000000 82550.0
2014 885800.0 73816.666667 74000.0
2015 966900.0 80575.000000 83450.0
2016 1028800.0 85733.333333 87475.0
2017 1087400.0 90616.666667 92350.0
2018 1053900.0 87825.000000 88150.0
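The per-year loop can be collapsed into a single groupby-agg call. A minimal sketch on made-up toy numbers (the real exercise would group the full `ex1` the same way; this version computes the maximum the question actually asks for):

```python
import pandas as pd

# Toy frame shaped like ex1 after extracting the year.
df = pd.DataFrame({
    'year': ['2002', '2002', '2003', '2003'],
    'lowest price ': [13600, 13100, 14300, 16000],  # note the trailing space in the column name
})
# All three statistics in one groupby-agg call instead of a loop.
stats = df.groupby('year')['lowest price '].agg(
    ['max', 'mean', lambda s: s.quantile(0.75)])
stats.columns = ['max', 'mean', '0.75']
print(stats)
```

The lambda gets an auto-generated column name, so the columns are renamed afterwards.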
(3) Split the first (date) column into two columns, one for the year (format 20××) and one for the month (English abbreviation), add them as the first and second columns, delete the original first column, and shift the remaining columns back accordingly.
ex1 = pd.read_csv('work/task6/2002年-2018年上海机动车拍照拍卖.csv')
ex1['year']=ex1['Date'].apply(lambda x:'200'+x[:1] if len(x[:2].replace('-',''))==1 else '20'+x[:2])
ex1['mouth']=ex1['Date'].apply(lambda x:x[-3:])
del ex1['Date']
ls=ex1.columns.tolist()
ls.remove('mouth')
ls.remove('year')
ex1 = ex1[['year','mouth']+ls]
print(ex1)
Actually, this year/month split was already used in question (2).
year mouth Total number of license issued lowest price avg price \
0 2002 Jan 1400 13600 14735
1 2002 Feb 1800 13100 14057
2 2002 Mar 2000 14300 14662
3 2002 Apr 2300 16000 16334
4 2002 May 2350 17800 18357
.. ... ... ... ... ...
198 2018 Aug 10402 88300 88365
199 2018 Sep 12712 87300 87410
200 2018 Oct 10728 88000 88070
201 2018 Nov 11766 87300 87374
202 2018 Dec 12850 87400 87508
Total number of applicants
0 3718
1 4590
2 5190
3 4806
4 4665
.. ...
198 192755
199 189142
200 181861
201 177355
202 165442
[203 rows x 6 columns]
(4) Now make the row index a multi-level index: the outer level is the year, the inner level is the variable names of columns two through five of the original table, and the column index is the month.
ls=ex1.columns.tolist()
ls.remove('mouth')
ls.remove('year')
df=ex1.set_index(['year']+ls)
#print(df)
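The `set_index` call above puts the value columns themselves into the index rather than the variable *names*. One way to get the requested shape (outer level year, inner level variable name, months as columns) is melt-then-pivot; a sketch on toy data with the same column names:

```python
import pandas as pd

# Toy frame shaped like ex1 after part (3): 'year'/'mouth' plus value columns.
ex1 = pd.DataFrame({
    'year': ['2002', '2002'],
    'mouth': ['Jan', 'Feb'],
    'lowest price ': [13600, 13100],
    'avg price': [14735, 14057],
})
# Melt the value columns into a single 'variable' level,
# then pivot the months out to the column axis.
long = ex1.melt(id_vars=['year', 'mouth'], var_name='variable')
wide = long.pivot_table(index=['year', 'variable'], columns='mouth', values='value')
print(wide)
```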
(5) Generally, the difference between a month's lowest price and the previous month's lowest price has the same sign as the difference between the corresponding mean prices. Which auction dates do not have this property?
ex1 = pd.read_csv('work/task6/2002年-2018年上海机动车拍照拍卖.csv')
ex1['product']=ex1['lowest price '].diff()* ex1['avg price'].diff()
print(ex1[ex1['product']<=0])
If the signs differ, the product is negative; the `<= 0` filter here also lets 0 through (see the `-0.0` row below).
Date Total number of license issued lowest price avg price \
21 3-Oct 4500 32800 34842
22 3-Nov 5042 33100 34284
29 4-Jun 6233 17800 21001
36 5-Jan 5500 28500 32520
37 5-Feb 3800 31700 32425
44 5-Sep 6700 26500 28927
52 6-May 4500 37700 38139
56 6-Sep 6500 37000 41601
60 7-Jan 6000 38500 40974
61 7-Feb 3500 39100 40473
71 7-Dec 7500 50000 56042
128 12-Oct 9500 65200 66708
198 18-Aug 10402 88300 88365
Total number of applicants product
21 9383 -15544000.0
22 9849 -167400.0
29 19233 -92575000.0
36 6208 -1790400.0
37 8949 -304000.0
44 10972 -10467000.0
52 8301 -37400.0
56 7064 -3311800.0
60 6587 -592800.0
61 5056 -300600.0
71 10356 -6555000.0
128 19921 -141500.0
198 192755 -0.0
(6) Define a month's issuance gain as the difference between that month's number of plates issued and the mean issuance of the previous two months; fill the first two months with 0, and find the times at which the extreme values of the gain occur.
ex1 = pd.read_csv('work/task6/2002年-2018年上海机动车拍照拍卖.csv')
ex1['gain']=(ex1['Total number of license issued'].diff()/2+ex1['Total number of license issued'].diff(periods=2)/2).fillna(0)
print(ex1.iloc[ex1['gain'].idxmax()])
`gain` is the issuance gain; `idxmax()` selects the maximum. The minimum is `ex1.iloc[ex1['gain'].idxmin()]`.
Date 8-Jan
Total number of license issued 16000
lowest price 8100
avg price 23370
Total number of applicants 20539
gain 8500
Name: 72, dtype: object
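The `diff()`-based formula works because x_t − (x_{t−1} + x_{t−2})/2 = diff(1)/2 + diff(2)/2. A rolling-mean version states the definition directly; sketch on toy numbers:

```python
import pandas as pd

s = pd.Series([1400, 1800, 2000, 2300])  # stand-in for the issuance column
# gain_t = x_t - mean(x_{t-1}, x_{t-2}); the first two months become NaN -> 0
gain = (s - s.shift(1).rolling(2).mean()).fillna(0)
# identical to the diff()-based formula used above
gain_diff = (s.diff() / 2 + s.diff(periods=2) / 2).fillna(0)
assert gain.equals(gain_diff)
print(gain.tolist())  # [0.0, 0.0, 400.0, 400.0]
```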
2. Cargo flight volumes at Russian airports, 2007–2019
Questions
(1) Compute the total cargo flight volume for each year.
ex2 = pd.read_csv('work/task6/2007年-2019年俄罗斯货运航班运载量.csv')
print(ex2.groupby('Year')['Whole year'].sum())
Year
2007 659438.23
2008 664682.46
2009 560809.77
2010 693033.98
2011 818691.71
2012 846388.03
2013 792337.08
2014 729457.12
2015 630208.97
2016 679370.15
2017 773662.28
2018 767095.28
2019 764606.27
Name: Whole year, dtype: float64
(2) Are the airports recorded each year all the same?
print(ex2.groupby('Year')['Airport name'].count())
The counts differ (2018 and 2019 have fewer records), so the airports cannot all be the same.
Year
2007 292
2008 292
2009 292
2010 292
2011 292
2012 292
2013 292
2014 292
2015 292
2016 292
2017 292
2018 248
2019 251
Name: Airport name, dtype: int64
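Equal counts alone would not prove the airport lists match (two years could record the same number of different airports); comparing the per-year sets is more direct. A sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({'Year': [2007, 2007, 2008, 2008],
                   'Airport name': ['A', 'B', 'A', 'C']})
# one set of airport names per year
groups = df.groupby('Year')['Airport name'].apply(set)
all_same = all(g == groups.iloc[0] for g in groups)
print(all_same)  # False: 2008 records 'C' instead of 'B'
```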
(3) For each year from 2010 to 2015, compute the proportion of airports whose whole-year cargo volume is recorded as 0.
for i in range(2010,2016):
    print(i, 'year')
    mask = (ex2['Year']==i) & (ex2['Whole year']==0)
    print(ex2[mask]['Airport name'].count()/292)
As the previous question shows, 2010–2015 each have 292 records, hence 292 as the denominator.
2010 year
0.7671232876712328
2011 year
0.7705479452054794
2012 year
0.7705479452054794
2013 year
0.7705479452054794
2014 year
0.7705479452054794
2015 year
0.7705479452054794
(4) If an airport has at least 5 years in which all monthly volume records are 0, delete all of that airport's records (for every year) from the table, and return the processed table.
ex2 = pd.read_csv('work/task6/2007年-2019年俄罗斯货运航班运载量.csv')
ex2_4=ex2.groupby('Airport name')['Whole year'].agg(['count','sum'])
ls=ex2_4[(ex2_4['count']>4) & (ex2_4['sum']==0)].index.tolist()
ls=list(set(i for i in ex2['Airport name'].values if i not in ls))
ex2_44=pd.DataFrame()
for j in ls:
    ex2_44=pd.concat([ex2_44,ex2[ex2['Airport name']==j]])
print(ex2_44.sort_index())
All monthly records being 0 is equivalent to the yearly total being 0 (the volumes are non-negative).
Airport name Year January February March April \
0 Abakan 2019 44.70 66.21 72.70 75.82
4 Anadyr (Carbon) 2019 81.63 143.01 260.90 304.36
5 Anapa (Vitjazevo) 2019 45.92 53.15 54.00 54.72
8 Arkhangelsk (Talagy) 2019 85.61 118.70 131.39 144.82
9 Astrakhan (Narimanovo) 2019 51.75 61.08 65.60 71.84
... ... ... ... ... ... ...
3701 Ekimçan 2007 0.00 0.00 0.00 0.00
3705 Yuzhno-(Khomutovo) 2007 710.80 970.00 1330.30 1352.30
3706 Yakutsk 2007 583.70 707.80 851.80 1018.00
3708 Yamburg 2007 3.55 0.16 3.37 5.32
3709 Yaroslavl (Tunoshna) 2007 847.00 1482.90 1325.40 1235.97
May June July August September October November \
0 100.34 78.38 63.88 73.06 66.74 75.44 110.50
4 122.00 106.87 84.99 130.00 102.00 118.00 94.00
5 52.00 67.45 172.31 72.57 70.00 63.00 69.00
8 137.95 140.18 128.56 135.68 124.75 139.60 210.27
9 71.38 63.95 164.86 79.46 85.21 87.23 79.06
... ... ... ... ... ... ... ...
3701 0.00 0.00 0.00 0.00 0.00 0.00 0.00
3705 1324.40 1613.00 1450.70 1815.60 1902.30 1903.20 1666.10
3706 950.80 900.00 1154.90 1137.84 1485.50 1382.50 1488.00
3708 4.31 6.30 6.88 3.60 4.13 4.93 4.17
3709 629.00 838.00 1211.30 915.00 1249.60 1650.50 1822.60
December Whole year Airport coordinates
0 89.80 917.57 (Decimal('91.399735'), Decimal('53.751351'))
4 199.00 1746.76 (Decimal('177.738273'), Decimal('64.713433'))
5 82.10 856.22 (Decimal('37.341511'), Decimal('45.003748'))
8 307.10 1804.61 (Decimal('40.714892'), Decimal('64.596138'))
9 99.16 980.58 (Decimal('47.999896'), Decimal('46.287344'))
... ... ... ...
3701 0.00 0.00 (Decimal('132.950411'), Decimal('53.074423'))
3705 1632.10 17670.80 (Decimal('142.723677'), Decimal('46.886967'))
3706 1916.60 13577.44 (Decimal('129.750225'), Decimal('62.086594'))
3708 8.87 55.59 (Decimal('75.097783'), Decimal('67.980026'))
3709 2055.60 15262.87 (Decimal('40.170054'), Decimal('57.56231'))
[1354 rows x 16 columns]
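The same removal can be written with `groupby(...).filter`, which keeps a group only when the predicate holds, replicating the `count > 4 and sum == 0` criterion above. Sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({
    'Airport name': ['A'] * 5 + ['B'] * 5,
    'Whole year':   [0] * 5 + [1, 0, 0, 0, 0],
})
# drop an airport with more than 4 yearly records whose totals sum to 0
kept = df.groupby('Airport name').filter(
    lambda g: not (len(g) > 4 and g['Whole year'].sum() == 0))
print(sorted(kept['Airport name'].unique()))  # ['B']
```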
(5) Divide all the airports into four regions (east, south, west, north) in some reasonable way, and give the region with the largest total cargo volume over 2017–2019.
ex2 = pd.read_csv('work/task6/2007年-2019年俄罗斯货运航班运载量.csv')
ex2_5=ex2[ex2['Year']>2016].copy()
#ex2_5[ex2_5['Airport coordinates']=='Not found']
ex2_5=ex2_5.drop(ex2_5[ex2_5['Airport coordinates']=='Not found'].index)
ex2_5['lon']=ex2_5['Airport coordinates'].apply(lambda x:x.split(', ')[0].replace("(Decimal('",'').replace("')",''))
ex2_5['lat']=ex2_5['Airport coordinates'].apply(lambda x:x.split(', ')[-1].replace("Decimal('",'').replace("'))",''))
lon=np.mean(ex2_5['lon'].apply(lambda x:float(x)))
lat=np.mean(ex2_5['lat'].apply(lambda x:float(x)))
ex2_5['area_lon']=ex2_5['lon'].apply(lambda x:'right' if float(x)>=lon else 'left')
ex2_5['area_lat']=ex2_5['lat'].apply(lambda x:'up' if float(x)>=lat else 'down')
ex2_5['area']=ex2_5['area_lon']+ex2_5['area_lat']
print(ex2_5[['Year','Airport name','area']])
print('Region with the largest 2017-2019 total cargo volume')
print(ex2_5.groupby('area')['Whole year'].sum().idxmax())
The data happen to include coordinates, so I split on the mean longitude/latitude into four blocks (upper/lower, left/right). Airports without coordinates are simply dropped.
Year Airport name area
0 2019 Abakan rightdown
1 2019 Aikhal rightup
2 2019 Loss rightdown
3 2019 Amderma leftup
4 2019 Anadyr (Carbon) rightup
.. ... ... ...
786 2017 Yakutsk rightup
787 2017 Yakutsk (Magan) leftdown
788 2017 Yamburg leftup
789 2017 Yaroslavl (Tunoshna) leftdown
790 2017 Yartsevo rightup
[763 rows x 3 columns]
Region with the largest 2017-2019 total cargo volume
leftdown
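The chained `str.replace` parsing of the coordinate strings can be done in one `str.extract` call with a regex; sketch assuming the same `(Decimal('lon'), Decimal('lat'))` format:

```python
import pandas as pd

s = pd.Series(["(Decimal('91.399735'), Decimal('53.751351'))"])
# named capture groups become the columns of the extracted frame
coords = s.str.extract(
    r"Decimal\('(?P<lon>-?[\d.]+)'\), Decimal\('(?P<lat>-?[\d.]+)'\)"
).astype(float)
print(coords)
```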
(6) In statistics, rank often represents ordering. Define the rank of an airport for a given month of a given year as the ranking of that month's cargo volume among all 12 months of that year for that airport (e.g. if airport ***'s volume for January 2019 ranks first among the 12 months of 2019, its rank is 1). Then, to judge a month's relative volume, sum the ranks of all airports for that month and call this sum the month's composite rank index. Compute the composite rank index for the 12 months of 2016.
ex2 = pd.read_csv('work/task6/2007年-2019年俄罗斯货运航班运载量.csv',index_col='Airport name')
ex2_6=ex2[ex2['Year']==2016].copy()
del ex2_6['Year'],ex2_6['Airport coordinates'],ex2_6['Whole year']
ex2_6=ex2_6.T
'''
dict1={}
for key in ex2_6.columns:
    dict1[key]=dict1.get(key,0)+1
print(ex2_6['Vorkuta'])
'''
ex2_66=pd.DataFrame()
for i in ex2_6.columns:
    if i in ex2_66.columns:
        continue
    else:
        ex2_66=pd.concat([ex2_66,ex2_6[i].rank(ascending=False,method='min')],axis=1)
print(ex2_66.apply(lambda x:x.sum(),axis=1))
First filter out the 12 months of 2016, then operate; per question (3), 2016 has 292 rows. In fact this dataset has some problems: in certain years the same airport appears more than once, which is rather annoying. For instance, 2016 actually contains only 288 distinct airports.
`rank(ascending=False, method='min')` ranks in descending order without averaging ties: equal values all receive the smallest rank of the tie.
January 931.0
February 825.0
March 704.0
April 630.0
May 700.0
June 695.0
July 722.0
August 623.0
September 590.0
October 556.0
November 502.0
December 422.0
dtype: float64
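If the duplicate airport names were dropped first, the whole loop would collapse to one vectorized `rank` call on the transposed frame; sketch with two toy airports:

```python
import pandas as pd

df = pd.DataFrame({'A': [30, 20, 10], 'B': [25, 15, 5]},
                  index=['Jan', 'Feb', 'Mar'])  # months x airports
# rank each airport's months (descending, ties share the minimum rank),
# then sum across airports to get the composite rank index per month
rank_index = df.rank(ascending=False, method='min').sum(axis=1)
print(rank_index.tolist())  # [2.0, 4.0, 6.0]
```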
3.
Questions
(1)
ex3 = pd.read_csv('work/task6/死亡数.csv')
#ex3.columns[:20]
#ex3.columns[-10:]
print(ex3['Population'].corr(ex3['2020/4/26']))
This table has quite a lot of columns (a pile of dates); printing `ex3.columns` first helps to get a look.
0.40384419734806903
(2)
ex3 = pd.read_csv('work/task6/确证数.csv')
print(ex3[ex3['2020/4/1']==0].groupby('Province_State')['Admin2'].count()/ex3.groupby('Province_State')['Admin2'].count())
The NaN here really means there are no zero-case counties, i.e. the ratio is 0: `ex3[(ex3['Province_State']=='Arizona') & (ex3['2020/4/1']==0)]` comes back empty.
Province_State
Alabama 0.119403
Alaska 0.793103
Arizona NaN
Arkansas 0.293333
California 0.137931
Colorado 0.218750
Connecticut NaN
Delaware NaN
District of Columbia NaN
Florida 0.164179
Georgia 0.125786
Hawaii 0.200000
Idaho 0.386364
Illinois 0.480392
Indiana 0.108696
Iowa 0.404040
Kansas 0.609524
Kentucky 0.441667
Louisiana 0.062500
Maine 0.250000
Maryland 0.041667
Massachusetts 0.142857
Michigan 0.192771
Minnesota 0.367816
Mississippi 0.060976
Missouri 0.391304
Montana 0.625000
Nebraska 0.752688
Nevada 0.470588
New Hampshire 0.100000
New Jersey NaN
New Mexico 0.424242
New York 0.080645
North Carolina 0.180000
North Dakota 0.547170
Ohio 0.181818
Oklahoma 0.376623
Oregon 0.277778
Pennsylvania 0.104478
Rhode Island NaN
South Carolina 0.065217
South Dakota 0.560606
Tennessee 0.115789
Texas 0.452756
Utah 0.482759
Vermont 0.142857
Virginia 0.270677
Washington 0.128205
West Virginia 0.472727
Wisconsin 0.319444
Wyoming 0.347826
Name: Admin2, dtype: float64
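The division of two groupby counts (and the resulting NaN) can be avoided entirely: the mean of a boolean column is exactly the fraction of True values. A sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({'Province_State': ['A', 'A', 'B'],
                   '2020/4/1': [0, 3, 5]})
# fraction of zero-case counties per state, no NaN possible
ratio = (df['2020/4/1'] == 0).groupby(df['Province_State']).mean()
print(ratio.tolist())  # [0.5, 0.0]
```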
(3)
print(ex3.set_index(['Province_State','Admin2'])[ex3.columns[-96:].tolist()].sort_values(by=ex3.columns[-96:].tolist(),ascending=False))
There is no need to print the whole thing; the first six rows are enough.
2020/1/22 2020/1/23 2020/1/24 2020/1/25 \
Province_State Admin2
Washington King 1 1 1 1
Illinois Cook 0 0 1 1
California Los Angeles 0 0 0 0
Orange 0 0 0 0
Arizona Maricopa 0 0 0 0
... ... ... ... ...
Wisconsin Forest 0 0 0 0
Langlade 0 0 0 0
Lincoln 0 0 0 0
Taylor 0 0 0 0
Wyoming Weston 0 0 0 0
2020/1/26 2020/1/27 2020/1/28 2020/1/29 \
Province_State Admin2
Washington King 1 1 1 1
Illinois Cook 1 1 1 1
California Los Angeles 1 1 1 1
Orange 1 1 1 1
Arizona Maricopa 1 1 1 1
... ... ... ... ...
Wisconsin Forest 0 0 0 0
Langlade 0 0 0 0
Lincoln 0 0 0 0
Taylor 0 0 0 0
Wyoming Weston 0 0 0 0
2020/1/30 2020/1/31 ... 2020/4/17 2020/4/18 \
Province_State Admin2 ...
Washington King 1 1 ... 4902 4902
Illinois Cook 1 2 ... 19391 20395
California Los Angeles 1 1 ... 11400 12021
Orange 1 1 ... 1501 1556
Arizona Maricopa 1 1 ... 2404 2491
... ... ... ... ... ...
Wisconsin Forest 0 0 ... 0 0
Langlade 0 0 ... 0 0
Lincoln 0 0 ... 0 0
Taylor 0 0 ... 0 0
Wyoming Weston 0 0 ... 0 0
2020/4/19 2020/4/20 2020/4/21 2020/4/22 \
Province_State Admin2
Washington King 5174 5174 5293 5379
Illinois Cook 21272 22101 23181 24546
California Los Angeles 12341 13823 15153 16447
Orange 1636 1676 1691 1753
Arizona Maricopa 2589 2636 2738 2846
... ... ... ... ...
Wisconsin Forest 0 0 0 0
Langlade 0 0 0 0
Lincoln 0 0 0 0
Taylor 0 0 0 0
Wyoming Weston 0 0 0 0
2020/4/23 2020/4/24 2020/4/25 2020/4/26
Province_State Admin2
Washington King 5532 5637 5739 5863
Illinois Cook 25811 27616 29058 30574
California Los Angeles 17537 18545 19133 19567
Orange 1827 1845 1969 2074
Arizona Maricopa 2970 3116 3234 3359
... ... ... ... ...
Wisconsin Forest 0 0 0 0
Langlade 0 0 0 0
Lincoln 0 0 0 0
Taylor 0 0 0 0
Wyoming Weston 0 0 0 0
[3142 rows x 96 columns]
(4)
ex3 = pd.read_csv('work/task6/死亡数.csv')
print(ex3.set_index(['Province_State']).groupby('Province_State')[ex3.columns[-96:].tolist()].sum().diff(axis=1))
date=ex3.set_index(['Province_State']).groupby('Province_State')[ex3.columns[-96:].tolist()].sum().diff(axis=1).max().idxmax()
print(date,ex3.iloc[ex3['{}'.format(date)].idxmax()]['Province_State'])
Here the maximum is taken over all states and all days jointly, not state by state or day by day.
2020/1/22 2020/1/23 2020/1/24 2020/1/25 2020/1/26 \
Province_State
Alabama NaN 0.0 0.0 0.0 0.0
Alaska NaN 0.0 0.0 0.0 0.0
Arizona NaN 0.0 0.0 0.0 0.0
Arkansas NaN 0.0 0.0 0.0 0.0
California NaN 0.0 0.0 0.0 0.0
Colorado NaN 0.0 0.0 0.0 0.0
Connecticut NaN 0.0 0.0 0.0 0.0
Delaware NaN 0.0 0.0 0.0 0.0
District of Columbia NaN 0.0 0.0 0.0 0.0
Florida NaN 0.0 0.0 0.0 0.0
Georgia NaN 0.0 0.0 0.0 0.0
Hawaii NaN 0.0 0.0 0.0 0.0
Idaho NaN 0.0 0.0 0.0 0.0
Illinois NaN 0.0 0.0 0.0 0.0
Indiana NaN 0.0 0.0 0.0 0.0
Iowa NaN 0.0 0.0 0.0 0.0
Kansas NaN 0.0 0.0 0.0 0.0
Kentucky NaN 0.0 0.0 0.0 0.0
Louisiana NaN 0.0 0.0 0.0 0.0
Maine NaN 0.0 0.0 0.0 0.0
Maryland NaN 0.0 0.0 0.0 0.0
Massachusetts NaN 0.0 0.0 0.0 0.0
Michigan NaN 0.0 0.0 0.0 0.0
Minnesota NaN 0.0 0.0 0.0 0.0
Mississippi NaN 0.0 0.0 0.0 0.0
Missouri NaN 0.0 0.0 0.0 0.0
Montana NaN 0.0 0.0 0.0 0.0
Nebraska NaN 0.0 0.0 0.0 0.0
Nevada NaN 0.0 0.0 0.0 0.0
New Hampshire NaN 0.0 0.0 0.0 0.0
New Jersey NaN 0.0 0.0 0.0 0.0
New Mexico NaN 0.0 0.0 0.0 0.0
New York NaN 0.0 0.0 0.0 0.0
North Carolina NaN 0.0 0.0 0.0 0.0
North Dakota NaN 0.0 0.0 0.0 0.0
Ohio NaN 0.0 0.0 0.0 0.0
Oklahoma NaN 0.0 0.0 0.0 0.0
Oregon NaN 0.0 0.0 0.0 0.0
Pennsylvania NaN 0.0 0.0 0.0 0.0
Rhode Island NaN 0.0 0.0 0.0 0.0
South Carolina NaN 0.0 0.0 0.0 0.0
South Dakota NaN 0.0 0.0 0.0 0.0
Tennessee NaN 0.0 0.0 0.0 0.0
Texas NaN 0.0 0.0 0.0 0.0
Utah NaN 0.0 0.0 0.0 0.0
Vermont NaN 0.0 0.0 0.0 0.0
Virginia NaN 0.0 0.0 0.0 0.0
Washington NaN 0.0 0.0 0.0 0.0
West Virginia NaN 0.0 0.0 0.0 0.0
Wisconsin NaN 0.0 0.0 0.0 0.0
Wyoming NaN 0.0 0.0 0.0 0.0
2020/1/27 2020/1/28 2020/1/29 2020/1/30 2020/1/31 \
Province_State
Alabama 0.0 0.0 0.0 0.0 0.0
Alaska 0.0 0.0 0.0 0.0 0.0
Arizona 0.0 0.0 0.0 0.0 0.0
Arkansas 0.0 0.0 0.0 0.0 0.0
California 0.0 0.0 0.0 0.0 0.0
Colorado 0.0 0.0 0.0 0.0 0.0
Connecticut 0.0 0.0 0.0 0.0 0.0
Delaware 0.0 0.0 0.0 0.0 0.0
District of Columbia 0.0 0.0 0.0 0.0 0.0
Florida 0.0 0.0 0.0 0.0 0.0
Georgia 0.0 0.0 0.0 0.0 0.0
Hawaii 0.0 0.0 0.0 0.0 0.0
Idaho 0.0 0.0 0.0 0.0 0.0
Illinois 0.0 0.0 0.0 0.0 0.0
Indiana 0.0 0.0 0.0 0.0 0.0
Iowa 0.0 0.0 0.0 0.0 0.0
Kansas 0.0 0.0 0.0 0.0 0.0
Kentucky 0.0 0.0 0.0 0.0 0.0
Louisiana 0.0 0.0 0.0 0.0 0.0
Maine 0.0 0.0 0.0 0.0 0.0
Maryland 0.0 0.0 0.0 0.0 0.0
Massachusetts 0.0 0.0 0.0 0.0 0.0
Michigan 0.0 0.0 0.0 0.0 0.0
Minnesota 0.0 0.0 0.0 0.0 0.0
Mississippi 0.0 0.0 0.0 0.0 0.0
Missouri 0.0 0.0 0.0 0.0 0.0
Montana 0.0 0.0 0.0 0.0 0.0
Nebraska 0.0 0.0 0.0 0.0 0.0
Nevada 0.0 0.0 0.0 0.0 0.0
New Hampshire 0.0 0.0 0.0 0.0 0.0
New Jersey 0.0 0.0 0.0 0.0 0.0
New Mexico 0.0 0.0 0.0 0.0 0.0
New York 0.0 0.0 0.0 0.0 0.0
North Carolina 0.0 0.0 0.0 0.0 0.0
North Dakota 0.0 0.0 0.0 0.0 0.0
Ohio 0.0 0.0 0.0 0.0 0.0
Oklahoma 0.0 0.0 0.0 0.0 0.0
Oregon 0.0 0.0 0.0 0.0 0.0
Pennsylvania 0.0 0.0 0.0 0.0 0.0
Rhode Island 0.0 0.0 0.0 0.0 0.0
South Carolina 0.0 0.0 0.0 0.0 0.0
South Dakota 0.0 0.0 0.0 0.0 0.0
Tennessee 0.0 0.0 0.0 0.0 0.0
Texas 0.0 0.0 0.0 0.0 0.0
Utah 0.0 0.0 0.0 0.0 0.0
Vermont 0.0 0.0 0.0 0.0 0.0
Virginia 0.0 0.0 0.0 0.0 0.0
Washington 0.0 0.0 0.0 0.0 0.0
West Virginia 0.0 0.0 0.0 0.0 0.0
Wisconsin 0.0 0.0 0.0 0.0 0.0
Wyoming 0.0 0.0 0.0 0.0 0.0
... 2020/4/17 2020/4/18 2020/4/19 2020/4/20 \
Province_State ...
Alabama ... 15.0 5.0 4.0 6.0
Alaska ... 0.0 0.0 0.0 0.0
Arizona ... 5.0 39.0 3.0 7.0
Arkansas ... 0.0 1.0 1.0 2.0
California ... 81.0 103.0 35.0 48.0
Colorado ... 17.0 17.0 31.0 0.0
Connecticut ... 65.0 49.0 40.0 204.0
Delaware ... 9.0 6.0 0.0 5.0
District of Columbia ... 5.0 5.0 5.0 9.0
Florida ... 57.0 23.0 26.0 48.0
Georgia ... 56.0 23.0 14.0 86.0
Hawaii ... 0.0 0.0 1.0 0.0
Idaho ... 0.0 2.0 1.0 1.0
Illinois ... 60.0 125.0 31.0 59.0
Indiana ... 46.0 23.0 17.0 15.0
Iowa ... 4.0 10.0 1.0 4.0
Kansas ... 2.0 3.0 8.0 9.0
Kentucky ... 7.0 2.0 2.0 14.0
Louisiana ... 57.0 54.0 29.0 32.0
Maine ... 2.0 3.0 2.0 1.0
Maryland ... 15.0 87.0 40.0 29.0
Massachusetts ... 135.0 156.0 301.0 0.0
Michigan ... 229.0 80.0 82.0 75.0
Minnesota ... 24.0 10.0 13.0 9.0
Mississippi ... 11.0 12.0 7.0 10.0
Missouri ... 14.0 13.0 2.0 1.0
Montana ... 1.0 2.0 0.0 0.0
Nebraska ... 0.0 0.0 3.0 0.0
Nevada ... 10.0 10.0 7.0 1.0
New Hampshire ... 0.0 0.0 0.0 0.0
New Jersey ... 322.0 231.0 294.0 134.0
New Mexico ... 8.0 9.0 2.0 3.0
New York ... 452.0 520.0 480.0 947.0
North Carolina ... 27.0 10.0 13.0 18.0
North Dakota ... 0.0 0.0 0.0 0.0
Ohio ... 27.0 33.0 20.0 38.0
Oklahoma ... 0.0 0.0 9.0 3.0
Oregon ... 6.0 2.0 2.0 1.0
Pennsylvania ... 80.0 121.0 234.0 72.0
Rhode Island ... 0.0 0.0 0.0 0.0
South Carolina ... 5.0 3.0 1.0 3.0
South Dakota ... 0.0 0.0 0.0 0.0
Tennessee ... 5.0 1.0 6.0 4.0
Texas ... 37.0 25.0 14.0 17.0
Utah ... 10.0 2.0 0.0 0.0
Vermont ... 2.0 2.0 0.0 1.0
Virginia ... 33.0 0.0 0.0 20.0
Washington ... 24.0 10.0 23.0 7.0
West Virginia ... 1.0 0.0 0.0 0.0
Wisconsin ... 9.0 6.0 8.0 10.0
Wyoming ... 0.0 0.0 0.0 0.0
2020/4/21 2020/4/22 2020/4/23 2020/4/24 2020/4/25 \
Province_State
Alabama 20.0 13.0 6.0 7.0 0.0
Alaska 0.0 0.0 0.0 0.0 0.0
Arizona 18.0 23.0 18.0 17.0 7.0
Arkansas 1.0 0.0 3.0 2.0 1.0
California 57.0 139.0 112.0 88.0 68.0
Colorado 63.0 23.0 44.0 121.0 10.0
Connecticut 92.0 121.0 93.0 128.0 98.0
Delaware 10.0 7.0 3.0 8.0 12.0
District of Columbia 7.0 15.0 12.0 14.0 12.0
Florida 45.0 26.0 94.0 59.0 9.0
Georgia 25.0 48.0 32.0 18.0 7.0
Hawaii 0.0 2.0 0.0 1.0 0.0
Idaho 3.0 3.0 3.0 0.0 2.0
Illinois 120.0 98.0 124.0 106.0 80.0
Indiana 58.0 31.0 40.0 35.0 45.0
Iowa 4.0 7.0 6.0 11.0 5.0
Kansas 7.0 3.0 1.0 5.0 2.0
Kentucky 17.0 14.0 6.0 9.0 5.0
Louisiana 76.0 69.0 68.0 61.0 43.0
Maine 1.0 3.0 5.0 3.0 0.0
Maryland 42.0 58.0 52.0 45.0 64.0
Massachusetts 267.0 0.0 0.0 0.0 764.0
Michigan 106.0 238.0 162.0 106.0 67.0
Minnesota 17.0 19.0 21.0 21.0 23.0
Mississippi 14.0 10.0 8.0 8.0 12.0
Missouri 24.0 14.0 13.0 19.0 9.0
Montana 2.0 2.0 0.0 0.0 0.0
Nebraska 0.0 0.0 20.0 0.0 0.0
Nevada 4.0 9.0 18.0 5.0 10.0
New Hampshire 0.0 0.0 0.0 0.0 0.0
New Jersey 24.0 630.0 276.0 257.0 231.0
New Mexico 0.0 13.0 7.0 6.0 9.0
New York 615.0 460.0 540.0 212.0 1054.0
North Carolina 27.0 20.0 16.0 13.0 11.0
North Dakota 4.0 1.0 1.0 0.0 1.0
Ohio 48.0 53.0 46.0 34.0 21.0
Oklahoma 21.0 6.0 9.0 9.0 6.0
Oregon 3.0 0.0 5.0 3.0 1.0
Pennsylvania 266.0 59.0 51.0 23.0 46.0
Rhode Island 0.0 0.0 -3.0 0.0 0.0
South Carolina 1.0 16.0 10.0 7.0 9.0
South Dakota 1.0 1.0 0.0 1.0 0.0
Tennessee 5.0 0.0 14.0 -2.0 8.0
Texas 26.0 23.0 48.0 20.0 14.0
Utah 3.0 0.0 3.0 4.0 1.0
Vermont 2.0 0.0 3.0 1.0 2.0
Virginia 0.0 165.0 24.0 38.0 26.0
Washington 21.0 22.0 25.0 11.0 15.0
West Virginia 0.0 0.0 24.0 1.0 0.0
Wisconsin 14.0 4.0 10.0 5.0 3.0
Wyoming 0.0 0.0 0.0 0.0 0.0
2020/4/26
Province_State
Alabama 10.0
Alaska 0.0
Arizona 2.0
Arkansas 3.0
California 36.0
Colorado 4.0
Connecticut 65.0
Delaware 8.0
District of Columbia 13.0
Florida 20.0
Georgia 11.0
Hawaii 1.0
Idaho 0.0
Illinois 59.0
Indiana 27.0
Iowa 6.0
Kansas 1.0
Kentucky 3.0
Louisiana 26.0
Maine 3.0
Maryland 14.0
Massachusetts 41.0
Michigan 158.0
Minnesota 28.0
Mississippi 6.0
Missouri 2.0
Montana 0.0
Nebraska 0.0
Nevada 0.0
New Hampshire 0.0
New Jersey 24.0
New Mexico 6.0
New York 260.0
North Carolina 19.0
North Dakota 1.0
Ohio 17.0
Oklahoma 1.0
Oregon 5.0
Pennsylvania 23.0
Rhode Island 0.0
South Carolina 8.0
South Dakota 1.0
Tennessee 3.0
Texas 13.0
Utah 0.0
Vermont 0.0
Virginia 12.0
Washington 12.0
West Virginia 2.0
Wisconsin 6.0
Wyoming 0.0
[51 rows x 96 columns]
Date and state with the largest single-day increase, by state
2020/4/7 New York
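The two-step `max().idxmax()` plus the separate county lookup can be replaced by stacking the state-by-date diff table and taking a single `idxmax`; sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({'2020/4/6': [1.0, 2.0], '2020/4/7': [5.0, 3.0]},
                  index=pd.Index(['New York', 'Texas'], name='Province_State'))
# stack to a Series indexed by (state, date); idxmax gives both at once
state, date = df.stack().idxmax()
print(date, state)  # 2020/4/7 New York
```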
(5)
ex3_ = pd.read_csv('work/task6/死亡数.csv')
ex3_confirm = pd.read_csv('work/task6/确证数.csv')
ex3_death=ex3_.groupby('Province_State').sum().iloc[:,6:].T
ex3_confirm=ex3_confirm.groupby('Province_State').sum().iloc[:,5:].T
for i in ex3_confirm.columns:
    df=pd.DataFrame()
    df['confirm']=ex3_confirm[i]
    df['death']=ex3_death[i]
    df[df['death']!=0].to_csv('data/{}.csv'.format(i))
    #print(df[df['death']!=0])
(6)
ex3_ = pd.read_csv('work/task6/死亡数.csv')
ex3_confirm = pd.read_csv('work/task6/确证数.csv')
ex3_death=ex3_.groupby('Province_State').sum().iloc[:,6:].T.diff()
ex3_confirm=ex3_confirm.groupby('Province_State').sum().iloc[:,5:].T.diff()
death=ex3_death.iloc[-26:-16,:].T
confirm=ex3_confirm.iloc[-26:-16,:].T
for i in confirm.columns:
    df=pd.DataFrame()
    df['add_confirm']=confirm[i]
    df['add_death']=death[i]
    df.to_csv('data/{}.csv'.format(i.replace('/','-')))
`/` cannot appear in a filename, so `-` is used instead.
Happy Labor Day, and happy studying!