Pandas (Part 1): Comprehensive Exercises

Wrap-up of the first stage: the Pandas study group organized by Datawhale has come to a pause. The "Joyful Pandas" course combines self-study, TA Q&A, and regular check-ins; the material is written in detail and reinforced with questions and exercises. I still have not fully digested everything, and these comprehensive exercises took me two days on and off, so I will need to set aside time to review what I have learned.
Studying the basics systematically helps build a mental map of the major functional modules, so that when it is time to use them I know where to look. I will therefore continue with the Pandas (Part 2) sessions and try to finish the whole tutorial.

import numpy as np
import pandas as pd
import re

I. Shanghai Motor Vehicle License Plate Auctions, 2002–2018

vehicle = pd.read_csv('数据集/2002年-2018年上海机动车拍照拍卖.csv')
vehicle.head()

    Date  Total number of license issued  lowest price  avg price  Total number of applicants
0  2-Jan                            1400         13600      14735                        3718
1  2-Feb                            1800         13100      14057                        4590
2  2-Mar                            2000         14300      14662                        5190
3  2-Apr                            2300         16000      16334                        4806
4  2-May                            2350         17800      18357                        4665
(1)

Q: In which auction did the winning rate first fall below 5%?

vehicle_1 = vehicle.assign(probability=vehicle['Total number of license issued'] / vehicle['Total number of applicants'])
vehicle_1[vehicle_1['probability']<0.05].head(1)

       Date  Total number of license issued  lowest price  avg price  Total number of applicants  probability
159  15-May                            7482         79000      79099                      156007     0.047959
(2)

Q: Compute the following statistics of the lowest auction price by year — maximum, mean, and 0.75 quantile — displayed together in a single table.

vehicle_2 = vehicle.assign(year=vehicle['Date'].apply(lambda x: x.split('-')[0]))
vehicle_2.head()
groupby_ = vehicle_2.groupby('year')

new_dataframe = pd.DataFrame()
new_dataframe['max'] = groupby_['lowest price'].max()
new_dataframe['mean'] = groupby_['lowest price'].mean()
new_dataframe['quantile'] = groupby_['lowest price'].quantile(0.75)
new_dataframe

        max          mean  quantile
year
10    44900  38008.333333   41825.0
11    53800  47958.333333   51000.0
12    68900  61108.333333   65325.0
13    90800  79125.000000   82550.0
14    74600  73816.666667   74000.0
15    85300  80575.000000   83450.0
16    88600  85733.333333   87475.0
17    93500  90616.666667   92350.0
18    89000  87825.000000   88150.0
2     30800  20316.666667   24300.0
3     38500  31983.333333   36300.0
4     44200  29408.333333   38400.0
5     37900  31908.333333   35600.0
6     39900  37058.333333   39525.0
7     53800  45691.666667   48950.0
8     37300  29945.454545   34150.0
9     36900  31333.333333   34150.0
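The three statistics can also be produced with a single `agg` call. Note that because the year was extracted as a string, the group labels above sort lexicographically ('10' before '2'); casting to int restores chronological order. A minimal sketch on a synthetic stand-in frame (not the original CSV):

```python
import pandas as pd

# Synthetic stand-in for the auction table: two years, three auctions each
df = pd.DataFrame({
    'Date': ['2-Jan', '2-Feb', '2-Mar', '3-Jan', '3-Feb', '3-Mar'],
    'lowest price': [13600, 13100, 14300, 13800, 14500, 15000],
})

# Extract the year as an integer so the groups sort chronologically
df['year'] = df['Date'].str.split('-').str[0].astype(int)

# One named-aggregation call yields all three statistics in one table
stats = df.groupby('year')['lowest price'].agg(
    max='max',
    mean='mean',
    quantile=lambda x: x.quantile(0.75),
)
print(stats)
```

Named aggregation keeps the output columns labeled without building the frame column by column.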
(3)

Q: Split the first Date column into two columns — a year (formatted 20××) and a month (English abbreviation) — add them as the first and second columns, drop the original first column, and shift the remaining columns right accordingly.

Year = vehicle['Date'].apply(lambda x: str(20) + ("%02d" % int(x.split('-')[0])))
Month = vehicle['Date'].apply(lambda x: x.split('-')[1])

vehicle_3 = vehicle.drop(columns='Date')
vehicle_3.insert(0, 'Year', Year)
vehicle_3.insert(1, 'Month', Month)
vehicle_3.head()

   Year Month  Total number of license issued  lowest price  avg price  Total number of applicants
0  2002   Jan                            1400         13600      14735                        3718
1  2002   Feb                            1800         13100      14057                        4590
2  2002   Mar                            2000         14300      14662                        5190
3  2002   Apr                            2300         16000      16334                        4806
4  2002   May                            2350         17800      18357                        4665
(4)

Q: Now make the row index a multi-level index, with the year as the outer level and the variable names from columns two through five of the original table as the inner level, and with the months as the column index.

vehicle_4 = vehicle_3.set_index(['Year', 'Total number of license issued', 'lowest price', 'avg price', 'Total number of applicants'])
vehicle_4.head()

                                                                       Month
Year  Total number of license issued  lowest price  avg price  Total number of applicants
2002  1400                            13600         14735      3718                         Jan
      1800                            13100         14057      4590                         Feb
      2000                            14300         14662      5190                         Mar
      2300                            16000         16334      4806                         Apr
      2350                            17800         18357      4665                         May
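`set_index` on all five columns reproduces the output above, but the layout the question actually describes — outer level the year, inner level the four variable names, columns the months — is a stack/unstack reshape. A minimal sketch on synthetic data, assuming a `vehicle_3`-like frame with Year, Month, and value columns:

```python
import pandas as pd

# Synthetic stand-in for vehicle_3: Year, Month, plus two of the value columns
df = pd.DataFrame({
    'Year': ['2002', '2002', '2003', '2003'],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'lowest price': [13600, 13100, 14300, 16000],
    'avg price': [14735, 14057, 14662, 16334],
})

# stack() moves the variable names into an inner index level,
# unstack('Month') moves the months out to the columns
reshaped = df.set_index(['Year', 'Month']).stack().unstack('Month')
print(reshaped)
```

The result has a (Year, variable) row index and one column per month, matching the requested layout.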
(5)

Q: Generally, the change in a month's lowest price relative to the previous month has the same sign as the change in that month's average price. Which auction dates do not follow this pattern?

vehicle_5 = vehicle[['Date', 'lowest price', 'avg price']]

for index in range(1, len(vehicle_5)):
    if ((vehicle_5.iloc[index]['lowest price'] - vehicle_5.iloc[index-1]['lowest price'])
            * (vehicle_5.iloc[index]['avg price'] - vehicle_5.iloc[index-1]['avg price'])) < 0:
        print(vehicle_5.iloc[index]['Date'])

3-Oct
3-Nov
4-Jun
5-Jan
5-Feb
5-Sep
6-May
6-Sep
7-Jan
7-Feb
7-Dec
12-Oct
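The row loop above can be replaced by a vectorized check with `diff()`: a sign mismatch between the two month-over-month differences is exactly a negative product. A sketch on a synthetic stand-in frame:

```python
import pandas as pd

# Synthetic stand-in: in Mar the lowest price falls while the average rises
df = pd.DataFrame({
    'Date': ['2-Jan', '2-Feb', '2-Mar'],
    'lowest price': [13600, 13100, 12900],
    'avg price': [14735, 14057, 14662],
})

# diff() against the previous auction; the first row is NaN and never flagged
mask = df['lowest price'].diff() * df['avg price'].diff() < 0
print(df.loc[mask, 'Date'].tolist())
```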

(6)

Q: Define a month's issuance gain as the difference between that month's license issuance and the mean issuance of the two preceding months, filling the first two months with 0. Find the dates at which the extreme values of the issuance gain occur.

vehicle_6 = vehicle[['Date', 'Total number of license issued']].copy()
gain = [0, 0]
string = 'Total number of license issued'
for index in range(2, len(vehicle_6)):
    gain.append(vehicle.iloc[index][string] - np.mean([vehicle.iloc[index-1][string], vehicle.iloc[index-2][string]]))
vehicle_6['gain'] = gain

print('Maximum gain occurred at:', vehicle_6[vehicle_6['gain'] == vehicle_6['gain'].max()]['Date'].values)
print('Minimum gain occurred at:', vehicle_6[vehicle_6['gain'] == vehicle_6['gain'].min()]['Date'].values)

Maximum gain occurred at: ['8-Jan']
Minimum gain occurred at: ['8-Apr']
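The gain loop can also be written with `rolling`: the mean of the two preceding months is a window of size 2 shifted down one row. A sketch on synthetic issuance numbers:

```python
import pandas as pd

# Synthetic stand-in for the issuance column
df = pd.DataFrame({
    'Date': ['2-Jan', '2-Feb', '2-Mar', '2-Apr'],
    'Total number of license issued': [1400, 1800, 2000, 2300],
})

issued = df['Total number of license issued']
# Gain = this month minus the mean of the two preceding months;
# the first two months have no window and are filled with 0
gain = (issued - issued.rolling(2).mean().shift(1)).fillna(0)
print(df['Date'][gain.idxmax()], df['Date'][gain.idxmin()])
```

`idxmax`/`idxmin` then return the row positions of the extremes directly, without the boolean-mask lookup.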


II. Cargo Traffic at Russian Airports, 2007–2019

Q2 = pd.read_csv('数据集/2007年-2019年俄罗斯货运航班运载量.csv')
Q2.head()

      Airport name     Year  January  February   March   April     May    June   July  August  September  October  November  December  Whole year  Airport coordinates
0           Abakan     2019    44.70     66.21    72.7   75.82  100.34   78.38  63.88   73.06      66.74    75.44     110.5      89.8      917.57  (Decimal('91.399735'), Decimal('53.751351'))
1           Aikhal     2019     0.00      0.00     0.0    0.00    0.00    0.00   0.00    0.00       0.00     0.00       0.0       0.0        0.00  (Decimal('111.543324'), Decimal('65.957161'))
2             Loss     2019     0.00      0.00     0.0    0.00    0.00    0.00   0.00    0.00       0.00     0.00       0.0       0.0        0.00  (Decimal('125.398355'), Decimal('58.602489'))
3          Amderma     2019     0.00      0.00     0.0    0.00    0.00    0.00   0.00    0.00       0.00     0.00       0.0       0.0        0.00  (Decimal('61.577429'), Decimal('69.759076'))
4  Anadyr (Carbon)     2019    81.63    143.01   260.9  304.36  122.00  106.87  84.99  130.00     102.00   118.00      94.0     199.0     1746.76  (Decimal('177.738273'), Decimal('64.713433'))
(1)

Q: Compute the total cargo volume of freight flights for each year.

groupby2 = Q2.groupby('Year')
print('Total cargo volume per year:')
groupby2['Whole year'].sum()

Total cargo volume per year:

Year
2007 659438.23
2008 664682.46
2009 560809.77
2010 693033.98
2011 818691.71
2012 846388.03
2013 792337.08
2014 729457.12
2015 630208.97
2016 679370.15
2017 773662.28
2018 767095.28
2019 764606.27
Name: Whole year, dtype: float64

(2)

Q: Are the airports recorded each year the same?

print('Number of airports recorded each year:')
display(groupby2['Airport name'].count())

Q2_2 = Q2.groupby('Airport name')['Year']
print('Number of years each airport appears:')
display(Q2_2.count())
print('%d airports were not recorded in every year.' % len(Q2_2.count()[(Q2_2.count()) < (2019-2007+1)]))

Number of airports recorded each year:

Year
2007 292
2008 292
2009 292
2010 292
2011 292
2012 292
2013 292
2014 292
2015 292
2016 292
2017 292
2018 248
2019 251
Name: Airport name, dtype: int64

Number of years each airport appears:

Airport name
Abakan 13
Achinsk 11
Aikhal 13
Amderma 13
Anadyr 12

Лешуконское 13
Мотыгино 13
Нюрба 13
Среднеколымск 13
Таксимо 13
Name: Year, Length: 297, dtype: int64

63 airports were not recorded in every year.
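Comparing row counts shows the totals differ, but comparing the actual sets of airport names confirms directly whether the same airports appear every year. A sketch on synthetic data:

```python
import pandas as pd

# Synthetic stand-in: 2018 drops one airport that 2017 had
df = pd.DataFrame({
    'Year': [2017, 2017, 2018],
    'Airport name': ['Abakan', 'Aikhal', 'Abakan'],
})

# One set of airport names per year
sets = df.groupby('Year')['Airport name'].apply(set)

# The airports are the same every year iff every yearly set equals the first
all_equal = all(s == sets.iloc[0] for s in sets)
print(all_equal)
```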

(3)

Q: For each year from 2010 to 2015, compute the proportion of airports whose whole-year cargo volume is recorded as 0.

Q2_3 = Q2.set_index(['Year'])
groupby_3 = Q2_3.sort_index().loc[2010:2015].groupby('Year')
display(groupby_3['Whole year'].agg(lambda x: print('Year:', x.index[0], '\t Ratio: %.2f' % (len(x[x == 0]) / len(x) * 100), '%')))

Year: 2010 	 Ratio: 76.71 %
Year: 2011 	 Ratio: 77.05 %
Year: 2012 	 Ratio: 77.05 %
Year: 2013 	 Ratio: 77.05 %
Year: 2014 	 Ratio: 77.05 %
Year: 2015 	 Ratio: 77.05 %

Year
2010 None
2011 None
2012 None
2013 None
2014 None
2015 None
Name: Whole year, dtype: object
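Printing inside `agg` works but leaves a column of `None` values, as shown above. Returning the ratio directly gives a proper Series; `(x == 0).mean()` is the fraction of zeros. A sketch on synthetic data:

```python
import pandas as pd

# Synthetic stand-in: per-year whole-year volumes, some recorded as zero
df = pd.DataFrame({
    'Year': [2010, 2010, 2010, 2011, 2011],
    'Whole year': [0.0, 917.57, 0.0, 0.0, 12.5],
})

# Fraction of zero-volume airports per year, as a percentage
ratio = df.groupby('Year')['Whole year'].agg(lambda x: (x == 0).mean() * 100)
print(ratio)
```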

(4)

Q: If an airport has at least 5 years in which every monthly volume record is 0, delete all of that airport's records from the table and return the processed table.

Q2_4 = Q2.set_index('Airport name')
groupby_4 = Q2_4.groupby('Airport name')
zero_airport = pd.DataFrame(groupby_4['Whole year'].apply(lambda x: len(x[x == 0]) >= 5))  # "at least 5" years
Q2_4.drop(zero_airport[zero_airport['Whole year'] == True].index).head()

                         Year  January  February   March   April     May    June    July  August  September  October  November  December  Whole year  Airport coordinates
Airport name
Abakan                   2019    44.70     66.21   72.70   75.82  100.34   78.38   63.88   73.06      66.74    75.44    110.50     89.80      917.57  (Decimal('91.399735'), Decimal('53.751351'))
Anadyr (Carbon)          2019    81.63    143.01  260.90  304.36  122.00  106.87   84.99  130.00     102.00   118.00     94.00    199.00     1746.76  (Decimal('177.738273'), Decimal('64.713433'))
Anapa (Vitjazevo)        2019    45.92     53.15   54.00   54.72   52.00   67.45  172.31   72.57      70.00    63.00     69.00     82.10      856.22  (Decimal('37.341511'), Decimal('45.003748'))
Arkhangelsk (Talagy)     2019    85.61    118.70  131.39  144.82  137.95  140.18  128.56  135.68     124.75   139.60    210.27    307.10     1804.61  (Decimal('40.714892'), Decimal('64.596138'))
Astrakhan (Narimanovo)   2019    51.75     61.08   65.60   71.84   71.38   63.95  164.86   79.46      85.21    87.23     79.06     99.16      980.58  (Decimal('47.999896'), Decimal('46.287344'))
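The drop can also be phrased with `groupby(...).filter`, which keeps only the groups that pass a predicate — here, airports with fewer than 5 zero years. A sketch on synthetic data:

```python
import pandas as pd

# Synthetic stand-in: 'Aikhal' has 5 zero years and should be dropped
df = pd.DataFrame({
    'Airport name': ['Abakan'] * 5 + ['Aikhal'] * 5,
    'Whole year': [917.57, 850.0, 0.0, 920.0, 900.0] + [0.0] * 5,
})

# filter keeps whole groups, so no separate index lookup/drop step is needed
kept = df.groupby('Airport name').filter(
    lambda g: (g['Whole year'] == 0).sum() < 5  # keep airports with < 5 zero years
)
print(kept['Airport name'].unique())
```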
(5)

Q: Divide all airports into four regions (east, south, west, north) in a reasonable way, and report the region with the largest total cargo volume in 2017–2019.

Q2 = pd.read_csv('数据集/2007年-2019年俄罗斯货运航班运载量.csv')
Q2_5 = Q2.set_index(['Year']).sort_index().loc[2017:2019].loc[:, ['Whole year', 'Airport coordinates']]

# Extract the coordinates
longitude = []
latitude = []
for i in range(len(Q2_5['Airport coordinates'])):
    string = re.findall(r"\d+\.?\d*", str(Q2_5['Airport coordinates'].iloc[i]))
    if string == []:
        longitude.append(np.nan)
        latitude.append(np.nan)
    else:
        longitude.append(float(string[0]))
        latitude.append(float(string[1]))

Q2_5['Longitude'] = longitude
Q2_5['Latitude'] = latitude
Q2_5 = Q2_5.dropna()  # drop the airports whose coordinates are "Not found"

cuts_long = pd.cut(Q2_5['Longitude'],bins=[0, Q2_5['Longitude'].median(), 180])
Q2_5['cuts_long'] = cuts_long
cuts_la = pd.cut(Q2_5['Latitude'],bins=[0, Q2_5['Latitude'].median(), 90])
Q2_5['cuts_la'] = cuts_la
display(Q2_5.groupby(['cuts_long', 'cuts_la']).count())
groupby_5 = Q2_5.groupby(['cuts_long', 'cuts_la'])
print('Region with the largest 2017-2019 total cargo volume - longitude range: %s, latitude range: %s'
      % (groupby_5['Whole year'].sum().idxmax()[0], groupby_5['Whole year'].sum().idxmax()[1]))

                                 Whole year  Airport coordinates  Longitude  Latitude
cuts_long       cuts_la
(0.0, 82.049]   (0.0, 58.498]           187                  187        187       187
                (58.498, 90.0]          196                  196        196       196
(82.049, 180.0] (0.0, 58.498]           195                  195        195       195
                (58.498, 90.0]          185                  185        185       185
Region with the largest 2017-2019 total cargo volume - longitude range: (0.0, 82.049], latitude range: (0.0, 58.498]
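The coordinate loop can be replaced with a single vectorized `str.extract`, assuming the coordinates are stored as strings of the form `(Decimal('91.399735'), Decimal('53.751351'))` as shown in the table above. A sketch:

```python
import pandas as pd

# Synthetic stand-in for the 'Airport coordinates' column
s = pd.Series([
    "(Decimal('91.399735'), Decimal('53.751351'))",
    "Not found",
])

# Two capture groups become two columns; non-matching rows become NaN,
# which replaces the explicit if/else + np.nan bookkeeping
coords = s.str.extract(r"Decimal\('([\d.]+)'\), Decimal\('([\d.]+)'\)").astype(float)
coords.columns = ['Longitude', 'Latitude']
print(coords)
```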

(6)

Q: In statistics, rank denotes position in an ordering. Define the rank of an airport for a given month of a given year as that month's position among all 12 monthly cargo volumes of that year for that airport (e.g., if an airport's January 2019 volume is the largest of the 12 months of 2019, its rank is 1). To judge the relative size of a month's volume, sum the ranks of all airports for that month and call this the month's composite rank index. Compute the composite rank index for the 12 months of 2016.

Q2_6 = Q2[Q2['Year'] == 2016]

month = Q2_6.columns[2:14]
Rank = pd.DataFrame(index=Q2_6.index, columns=month)
for index in Q2_6.index:
    aa = Q2_6.loc[index, month].sort_values(ascending=False)
    rank = 1
    for mon in aa.index:
        Rank.loc[index, mon] = rank
        rank = rank + 1

print(Rank.apply(lambda x: x.sum()))

January      3406
February     3076
March        2730
April        2432
May          2276
June         2047
July         1854
August       1527
September    1269
October      1009
November      728
December      422
dtype: int64
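pandas has `rank` built in: ranking each row across the 12 month columns and summing per column gives the composite index in two lines. Note that ties are averaged by default, whereas the loop above assigns tied months in sort order. A sketch with three "month" columns:

```python
import pandas as pd

# Synthetic stand-in: two airports, three month columns
df = pd.DataFrame({
    'Jan': [44.70, 10.0],
    'Feb': [66.21, 5.0],
    'Mar': [72.70, 20.0],
})

# Rank within each row, 1 = largest month, then sum the ranks per month
ranks = df.rank(axis=1, ascending=False)
print(ranks.sum())
```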




III. The Spread of COVID-19 in the United States

Q3_confirmed = pd.read_csv('数据集/美国确证数.csv')
Q3_death = pd.read_csv('数据集/美国死亡数.csv')
display(Q3_confirmed.head())
display(Q3_death.head())

        UID iso2 iso3  code3  FIPS   Admin2 Province_State Country_Region        Lat      Long_  ...  2020/4/17  2020/4/18  2020/4/19  2020/4/20  2020/4/21  2020/4/22  2020/4/23  2020/4/24  2020/4/25  2020/4/26
0  84001001   US  USA    840  1001  Autauga        Alabama             US  32.539527 -86.644082  ...         26         25         26         28         30         32         33         36         36         37
1  84001003   US  USA    840  1003  Baldwin        Alabama             US  30.727750 -87.722071  ...        103        109        112        117        123        132        143        147        147        161
2  84001005   US  USA    840  1005  Barbour        Alabama             US  31.868263 -85.387129  ...         15         18         20         22         28         29         30         32         32         33
3  84001007   US  USA    840  1007     Bibb        Alabama             US  32.996421 -87.125115  ...         24         26         28         32         32         34         33         34         34         38
4  84001009   US  USA    840  1009   Blount        Alabama             US  33.982109 -86.567906  ...         20         20         21         22         26         29         31         31         31         34

5 rows × 107 columns

        UID iso2 iso3  code3  FIPS   Admin2 Province_State Country_Region        Lat      Long_  ...  2020/4/17  2020/4/18  2020/4/19  2020/4/20  2020/4/21  2020/4/22  2020/4/23  2020/4/24  2020/4/25  2020/4/26
0  84001001   US  USA    840  1001  Autauga        Alabama             US  32.539527 -86.644082  ...          2          2          2          1          1          2          2          2          2          2
1  84001003   US  USA    840  1003  Baldwin        Alabama             US  30.727750 -87.722071  ...          2          2          2          3          3          3          3          3          3          3
2  84001005   US  USA    840  1005  Barbour        Alabama             US  31.868263 -85.387129  ...          0          0          0          0          0          0          0          0          0          0
3  84001007   US  USA    840  1007     Bibb        Alabama             US  32.996421 -87.125115  ...          0          0          0          0          0          0          0          0          0          0
4  84001009   US  USA    840  1009   Blount        Alabama             US  33.982109 -86.567906  ...          0          0          0          0          0          0          0          0          0          0

5 rows × 108 columns

(1)

Q: Use corr() to compute the correlation between county population (each row is a county) and the death count on the last recorded date in the table.

Q3_death[['Population', '2020/4/26']].corr()

            Population  2020/4/26
Population    1.000000   0.403844
2020/4/26     0.403844   1.000000
(2)

Q: As of April 1, compute for each state the proportion of counties with zero infections.

# The data are cumulative time series, so only the 2020/4/1 column is needed
Q3_2 = Q3_confirmed[['Admin2', 'Province_State', '2020/4/1']]
groupby_Q3_2 = Q3_2.groupby('Province_State')
# display(groupby_Q3_2.first())

groupby_Q3_2['2020/4/1'].apply(lambda x: len(x[x == 0]) / len(x))

Province_State
Alabama                 0.119403
Alaska                  0.793103
Arizona                 0.000000
Arkansas                0.293333
California              0.137931
Colorado                0.218750
Connecticut             0.000000
Delaware                0.000000
District of Columbia    0.000000
Florida                 0.164179
Georgia                 0.125786
Hawaii                  0.200000
Idaho                   0.386364
Illinois                0.480392
Indiana                 0.108696
Iowa                    0.404040
Kansas                  0.609524
Kentucky                0.441667
Louisiana               0.062500
Maine                   0.250000
Maryland                0.041667
Massachusetts           0.142857
Michigan                0.192771
Minnesota               0.367816
Mississippi             0.060976
Missouri                0.391304
Montana                 0.625000
Nebraska                0.752688
Nevada                  0.470588
New Hampshire           0.100000
New Jersey              0.000000
New Mexico              0.424242
New York                0.080645
North Carolina          0.180000
North Dakota            0.547170
Ohio                    0.181818
Oklahoma                0.376623
Oregon                  0.277778
Pennsylvania            0.104478
Rhode Island            0.000000
South Carolina          0.065217
South Dakota            0.560606
Tennessee               0.115789
Texas                   0.452756
Utah                    0.482759
Vermont                 0.142857
Virginia                0.270677
Washington              0.128205
West Virginia           0.472727
Wisconsin               0.319444
Wyoming                 0.347826
Name: 2020/4/1, dtype: float64

(3)

Q: Find the three counties where confirmed cases appeared earliest.

Q3_3 = Q3_confirmed.set_index('Admin2')
# Q3_3.head()
Date = Q3_3.columns[10:]

top3_admin = {}
for date in Date:
    aa = Q3_3.loc[:, date]
    first_index = aa[aa > 0].index
    for first in first_index:
        if first not in top3_admin:
            top3_admin[first] = date
    if len(top3_admin) > 3:
        break

for key, value in top3_admin.items():
    print('{value} : first cases appeared in {key}'.format(value=value, key=key))
# pd.DataFrame([top3_admin])

2020/1/22 : first cases appeared in King
2020/1/24 : first cases appeared in Cook
2020/1/26 : first cases appeared in Maricopa
2020/1/26 : first cases appeared in Los Angeles
2020/1/26 : first cases appeared in Orange
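The date loop can be vectorized: `gt(0).idxmax(axis=1)` returns, for each county, the first date column whose count is positive, and parsing the strings with `pd.to_datetime` makes them sort correctly (lexicographic sorting of strings like '2020/1/26' vs '2020/10/1' would not). A sketch on a synthetic cumulative-count frame:

```python
import pandas as pd

# Synthetic stand-in: cumulative counts per county over three days
df = pd.DataFrame(
    {'2020/1/22': [1, 0, 0], '2020/1/24': [1, 2, 0], '2020/1/26': [3, 4, 1]},
    index=['King', 'Cook', 'Maricopa'],
)

has_case = df.gt(0)
# idxmax on booleans gives the first True column; drop all-False rows first
first_date = has_case.idxmax(axis=1)[has_case.any(axis=1)]
first_seen = pd.to_datetime(first_date).sort_values()
print(first_seen.head(3))
```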

(4)

Q: Compute the daily increase in deaths by state, and report which state had the largest single-day increase in confirmed cases on which day (taken jointly over all states and all days, not separately per state).

Q3_4_confirmed = Q3_confirmed.set_index('Province_State').iloc[:, 10:].groupby('Province_State').sum()
Q3_4_death = Q3_death.set_index('Province_State').iloc[:, 11:].groupby('Province_State').sum()

Date = Q3_4_death.columns[0:]
death_gain = []
confirmed_gain = []
for date in range(len(Date) - 1):
    death_gain.append(Q3_4_death.iloc[:, date+1] - Q3_4_death.iloc[:, date])
    confirmed_gain.append(Q3_4_confirmed.iloc[:, date+1] - Q3_4_confirmed.iloc[:, date])
death_gain = pd.DataFrame(np.array(death_gain).T, index=Q3_4_death.index, columns=Q3_4_death.columns[1:])
confirmed_gain = pd.DataFrame(np.array(confirmed_gain).T, index=Q3_4_confirmed.index, columns=Q3_4_death.columns[1:])

print('Daily death increases by state:')
display(death_gain)

print('Day of largest confirmed-case increase for each state:')
confirmed_gain.idxmax(axis=1)

Daily death increases by state:
[death_gain: 51 rows × 95 columns — daily death increases for each state (Alabama through Wyoming), dates 2020/1/23 through 2020/4/26]

Day of largest confirmed-case increase for each state:

Province_State
Alabama                 2020/4/26
Alaska                  2020/3/28
Arizona                 2020/4/23
Arkansas                2020/4/23
California              2020/4/20
Colorado                2020/4/24
Connecticut             2020/4/22
Delaware                2020/4/26
District of Columbia     2020/4/8
Florida                  2020/4/2
Georgia                 2020/4/14
Hawaii                   2020/4/3
Idaho                    2020/4/2
Illinois                2020/4/24
Indiana                 2020/4/25
Iowa                    2020/4/25
Kansas                  2020/4/23
Kentucky                2020/4/12
Louisiana                2020/4/2
Maine                    2020/4/2
Maryland                 2020/4/8
Massachusetts           2020/4/24
Michigan                 2020/4/3
Minnesota               2020/4/17
Mississippi             2020/4/19
Missouri                2020/4/10
Montana                  2020/4/2
Nebraska                2020/4/23
Nevada                  2020/3/30
New Hampshire           2020/4/15
New Jersey              2020/4/22
New Mexico              2020/4/22
New York                2020/4/15
North Carolina          2020/4/25
North Dakota            2020/4/18
Ohio                    2020/4/19
Oklahoma                2020/4/19
Oregon                   2020/4/5
Pennsylvania            2020/4/23
Rhode Island            2020/4/15
South Carolina           2020/4/9
South Dakota            2020/4/15
Tennessee               2020/4/23
Texas                    2020/4/9
Utah                    2020/4/10
Vermont                 2020/4/11
Virginia                2020/4/25
Washington               2020/4/2
West Virginia           2020/4/19
Wisconsin               2020/4/25
Wyoming                 2020/4/21
dtype: object
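`idxmax(axis=1)` above answers the question per state; the single overall (state, day) pair the question asks for drops out of `stack().idxmax()`, assuming a `confirmed_gain`-shaped frame of states by dates. A sketch on synthetic data:

```python
import pandas as pd

# Synthetic stand-in for confirmed_gain: states x dates of daily increases
gain = pd.DataFrame(
    {'2020/4/14': [100, 11571], '2020/4/15': [250, 300]},
    index=['Alabama', 'New York'],
)

# Stack into a (state, date)-indexed Series, then take the single argmax
state, day = gain.stack().idxmax()
print(state, day)
```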

(5)

Q: For each state, compile a table of confirmed cases and deaths: the first column is the date, starting from the day the state first recorded deaths; the second and third columns are the confirmed and death counts. Save each state as a separate csv file named "StateName.csv".

Q3_5_confirmed = Q3_confirmed.set_index('Province_State').iloc[:, 10:].groupby('Province_State').sum().T
Q3_5_death = Q3_death.set_index('Province_State').iloc[:, 11:].groupby('Province_State').sum().T

states = Q3_5_confirmed.columns
for name in list(states):
    confirmed = Q3_5_confirmed.loc[:, name]
    death = Q3_5_death.loc[:, name]
    for date in Q3_5_confirmed.index:
        if death.loc[date] > 0:
            start_day = date
            break
    save_data = pd.DataFrame({'confirmed num': confirmed.loc[start_day:], 'death num': death.loc[start_day:]})
    save_data.to_csv(name + '.csv')

(6)

Q: For April 1 through April 10, compile a table of newly confirmed cases and new deaths: the first column is the state name, the second and third columns are the new confirmed and death counts. Save the ten days as ten separate csv files named "date.csv".

Q3_6_death_gain = death_gain.loc[:, '2020/4/1':'2020/4/10']
Q3_6_confirm_gain = confirmed_gain.loc[:, '2020/4/1':'2020/4/10']
# display(Q3_6_death_gain)
# display(Q3_6_confirm_gain)
Date = Q3_6_death_gain.columns[0:]

for date in list(Date):
    save_data = pd.DataFrame({'add_confirm_num': Q3_6_confirm_gain.loc[:, date], 'add_death_num': Q3_6_death_gain.loc[:, date]})
    save_data.to_csv(date.replace('/', '-') + '.csv')
