【Problem 1】Grouped aggregation: non-datetime keys
import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_csv('./books-Copy1.csv')
df1 = df[ pd.notnull(df['original_publication_year']) ]
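'''
Note: the three forms below give the same result. The first is recommended:
group first, then select the 'average_rating' column, and only then call mean(),
so that only the column you need is aggregated.
'''
# Form 1 (recommended): group, select the column, then aggregate
data1 = df1.groupby( by='original_publication_year' )['average_rating'].mean()
# Form 2: aggregate all numeric columns, then select one (numeric_only=True is needed on pandas 2.x, which no longer drops non-numeric columns silently)
data1 = df1.groupby( by='original_publication_year' ).mean(numeric_only=True)['average_rating']
# Form 3: group the Series by the key column of the same filtered frame (df1, not df)
data1 = df1['average_rating'].groupby( by=df1['original_publication_year'] ).mean()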
print(data1)
original_publication_year
-1750.0 3.630000
-762.0 4.030000
-750.0 4.005000
-720.0 3.730000
-560.0 4.050000
...
2013.0 4.012297
2014.0 3.985378
2015.0 3.954641
2016.0 4.027576
2017.0 4.100909
Name: average_rating, Length: 293, dtype: float64
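To make the equivalence of the three forms concrete, here is a minimal sketch on a small made-up table. The `books` frame and its values are hypothetical stand-ins for `books-Copy1.csv`, not data from that file. It also shows that `groupby` drops rows whose key is NaN by default, so the `notnull` filter is a safety measure rather than a strict requirement.

```python
import pandas as pd

# Hypothetical toy data standing in for books-Copy1.csv
books = pd.DataFrame({
    'title': ['a', 'b', 'c', 'd', 'e'],
    'original_publication_year': [2000.0, 2000.0, 2001.0, None, 2001.0],
    'average_rating': [4.0, 3.0, 5.0, 2.0, 4.0],
})

# Form 1: group, select the column, then aggregate
m1 = books.groupby('original_publication_year')['average_rating'].mean()

# Form 2: aggregate every numeric column, then select one
m2 = books.groupby('original_publication_year').mean(numeric_only=True)['average_rating']

# Form 3: group a single Series by another Series from the same frame
m3 = books['average_rating'].groupby(books['original_publication_year']).mean()

print(m1)
print(m1.equals(m2) and m1.equals(m3))
```

The row with a missing year does not appear in any of the results, and all three Series compare equal.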
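【Problem 2】Grouped aggregation: datetime keys
Steps:
(1) Convert the timestamp strings to datetime: df['timeStamp'] = pd.to_datetime(df['timeStamp'])
(2) Set that column as the index: df.set_index('timeStamp', inplace=True)
Note: resample() works on a DatetimeIndex by default, which is why the column is moved into the index here. Passing on='timeStamp' to resample() is an alternative that leaves the index alone.
(3) Group and aggregate the time series
Example: count by month, taking the 'title' column: count_by_month = df.resample('M').count()['title'] (pandas 2.2 and later spell the month-end alias 'ME')
Example: count by the 'cate' column, taking the 'title' column: grouped = df.groupby(by='cate').count()['title']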
import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_csv('./911-Copy1.csv')
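# Parse the timestamp strings and move them into the index so resample() can use them
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
df.set_index('timeStamp', inplace=True)
# Count calls per calendar month ('M' = month end; spelled 'ME' in pandas 2.2+)
count_by_month = df.resample('M').count()['title']
print(df.head())
print(count_by_month)
# Plot the monthly counts, using formatted dates as x tick labels
_x = count_by_month.index
_y = count_by_month.values
_x = [i.strftime('%Y%m%d') for i in _x]
plt.figure(figsize=(20,8), dpi=80)
plt.plot(range(len(_x)), _y, label='title')
plt.xticks(range(len(_x)), _x, rotation=45)
plt.legend(loc='best')
plt.show()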
lat lng \
timeStamp
2015-12-10 17:10:52 40.297876 -75.581294
2015-12-10 17:29:21 40.258061 -75.264680
2015-12-10 14:39:21 40.121182 -75.351975
2015-12-10 16:47:36 40.116153 -75.343513
2015-12-10 16:56:52 40.251492 -75.603350
desc \
timeStamp
2015-12-10 17:10:52 REINDEER CT & DEAD END; NEW HANOVER; Station ...
2015-12-10 17:29:21 BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...
2015-12-10 14:39:21 HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...
2015-12-10 16:47:36 AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...
2015-12-10 16:56:52 CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...
zip title twp \
timeStamp
2015-12-10 17:10:52 19525.0 EMS: BACK PAINS/INJURY NEW HANOVER
2015-12-10 17:29:21 19446.0 EMS: DIABETIC EMERGENCY HATFIELD TOWNSHIP
2015-12-10 14:39:21 19401.0 Fire: GAS-ODOR/LEAK NORRISTOWN
2015-12-10 16:47:36 19401.0 EMS: CARDIAC EMERGENCY NORRISTOWN
2015-12-10 16:56:52 NaN EMS: DIZZINESS LOWER POTTSGROVE
addr e
timeStamp
2015-12-10 17:10:52 REINDEER CT & DEAD END 1
2015-12-10 17:29:21 BRIAR PATH & WHITEMARSH LN 1
2015-12-10 14:39:21 HAWS AVE 1
2015-12-10 16:47:36 AIRY ST & SWEDE ST 1
2015-12-10 16:56:52 CHERRYWOOD CT & DEAD END 1
timeStamp
2015-12-31 7916
2016-01-31 13096
2016-02-29 11396
2016-03-31 11059
2016-04-30 11287
2016-05-31 11374
2016-06-30 11732
2016-07-31 12088
2016-08-31 11904
2016-09-30 11669
2016-10-31 12502
2016-11-30 12091
2016-12-31 12162
2017-01-31 11605
2017-02-28 10267
2017-03-31 11684
2017-04-30 11056
2017-05-31 11719
2017-06-30 12333
2017-07-31 11768
2017-08-31 11753
2017-09-30 11332
2017-10-31 12337
2017-11-30 11548
2017-12-31 12941
2018-01-31 13123
2018-02-28 11165
2018-03-31 14923
2018-04-30 11240
2018-05-31 12551
2018-06-30 12106
2018-07-31 12549
2018-08-31 12315
2018-09-30 12338
2018-10-31 12976
2018-11-30 14097
2018-12-31 12144
2019-01-31 12304
2019-02-28 11556
2019-03-31 12441
2019-04-30 11845
2019-05-31 12823
2019-06-30 12322
2019-07-31 13166
2019-08-31 12387
2019-09-30 11874
2019-10-31 13425
2019-11-30 12446
2019-12-31 12529
2020-01-31 12208
2020-02-29 11043
2020-03-31 9920
2020-04-30 8243
2020-05-31 7220
Freq: M, Name: title, dtype: int64
![Line chart of the monthly call counts](https://img-blog.csdnimg.cn/20210511193852224.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ0NjQ3NTU5,size_16,color_FFFFFF,t_70#pic_center)
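As a supplement to the steps in this section, the sketch below shows monthly counting with and without moving the timestamp into the index, plus a count per category. Everything here is an assumption for illustration: the `calls` frame is hypothetical toy data rather than `911-Copy1.csv`, and the `cate` column is derived by taking the text before `': '` in `title`, which matches the titles printed above but is not confirmed by the source. The `'MS'` (month start) alias is used instead of `'M'` only so that the sketch runs unchanged on older and newer pandas; the bins are the same calendar months, but they are labeled by their first day rather than their last.

```python
import pandas as pd

# Hypothetical toy data standing in for 911-Copy1.csv
calls = pd.DataFrame({
    'timeStamp': ['2015-12-10 17:10:52', '2015-12-31 23:59:59',
                  '2016-01-01 00:00:00', '2016-01-15 08:30:00',
                  '2016-01-20 12:00:00'],
    'title': ['EMS: BACK PAINS/INJURY', 'Fire: GAS-ODOR/LEAK',
              'EMS: CARDIAC EMERGENCY', 'Traffic: VEHICLE ACCIDENT',
              'EMS: DIZZINESS'],
})
calls['timeStamp'] = pd.to_datetime(calls['timeStamp'])

# Variant 1: resample on a column via on=, leaving the index untouched
by_on = calls.resample('MS', on='timeStamp')['title'].count()

# Variant 2: pd.Grouper with key= inside an ordinary groupby
by_grouper = calls.groupby(pd.Grouper(key='timeStamp', freq='MS'))['title'].count()

# Variant 3: the index-based route (set_index, then resample)
by_index = calls.set_index('timeStamp').resample('MS')['title'].count()

# Assumed derivation of a category column: the prefix before ': ' in the title
calls['cate'] = calls['title'].str.split(': ').str[0]
by_cate = calls.groupby('cate')['title'].count()

print(by_on)
print(by_cate)
```

Selecting `['title']` before `count()` avoids counting every column only to discard all but one.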
【Problem 3】About indexing
If 'one' is a column: use a['one']
If 'one' is an index (row) label: use a.loc['one']
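A small sketch of the difference, using a made-up frame `a` in which 'one' happens to be both a column name and a row label (the frame and its values are hypothetical, chosen only to show both lookups side by side):

```python
import pandas as pd

# Hypothetical frame: 'one' is a column name and also a row label
a = pd.DataFrame(
    {'one': [1, 2, 3], 'two': [4, 5, 6]},
    index=['x', 'y', 'one'],
)

col = a['one']      # column lookup: values of column 'one' for every row
row = a.loc['one']  # row lookup: values of every column for the row labeled 'one'

print(col.tolist())
print(row.tolist())
```

Square brackets with a single label look in the columns of a DataFrame, while `.loc` with a single label looks in the index.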