pandas grouping and time-series examples

Latest recommended article published 2021-09-01 20:29:40

johnnyhan321

Original article: https://blog.csdn.net/weixin_52143641/article/details/112116246

Grouping example 1:

import numpy as np
import pandas as pd
import time
from datetime import datetime
df2=pd.read_csv(r"E:\拜师\14100_HM数据科学库课件\14100_HM数据科学库课件\day05\code\starbucks_store_worldwide.csv")
df2.head() # Starbucks store data
df2.columns

Task: count the number of stores in each city in China.

df2.loc[df2['Country'] == 'CN', 'City'].value_counts() # boolean indexing with .loc[], then count the rows per city
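A minimal, self-contained sketch of the same count. It assumes a small hypothetical DataFrame in place of starbucks_store_worldwide.csv, with only the Country and City columns, and also shows that groupby(...).size() yields the same per-city numbers as value_counts().

```python
import pandas as pd

# Hypothetical toy data standing in for the Starbucks CSV (only two columns)
stores = pd.DataFrame({
    "Country": ["CN", "CN", "CN", "US", "CN"],
    "City": ["Beijing", "Shanghai", "Beijing", "Seattle", "Beijing"],
})

# Boolean mask inside .loc keeps Chinese rows; "City" selects the column to count
cn_counts = stores.loc[stores["Country"] == "CN", "City"].value_counts()
print(cn_counts)

# Equivalent with groupby: number of rows per city among the Chinese stores
cn_sizes = stores[stores["Country"] == "CN"].groupby("City").size()
print(cn_sizes)
```

value_counts() sorts by count in descending order, while groupby().size() sorts by the group key; the numbers are the same.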

Example 2: string extraction and two ways to group a time series

df=pd.read_csv(r"E:\拜师\14100_HM数据科学库课件\14100_HM数据科学库课件\datasourse\911\911.csv")
df.head()
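# Overview: a dataset of 911 emergency calls. Task: count each call type over time. The call type is stored in the title column before the colon, so the text before the colon must be extracted; timeStamp holds strings and must be converted to datetimes.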
df.title

- First, extract the call type that precedes the colon. A regular expression would also work; here the more general str.index method is used.
df['title_new'] = df.title.apply(lambda x: x[:x.index(':')])
df
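The step above slices each title up to its first colon. Below is a small sketch comparing it with two vectorized pandas alternatives, str.split and a regex via str.extract. The titles are hypothetical examples shaped like "Type: detail", not rows taken from 911.csv, and the extra column names title_split and title_re are illustrative only.

```python
import pandas as pd

# Hypothetical titles in the "Type: detail" shape described for the 911 data
calls = pd.DataFrame({"title": [
    "EMS: BACK PAINS/INJURY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT -",
]})

# Approach from the text: slice each string up to the first colon
calls["title_new"] = calls["title"].apply(lambda x: x[: x.index(":")])

# Vectorized alternative: split once on ":" and keep the left part
calls["title_split"] = calls["title"].str.split(":", n=1).str[0]

# Regex alternative: capture everything before the first colon
calls["title_re"] = calls["title"].str.extract(r"^([^:]*):", expand=False)

print(calls[["title_new", "title_split", "title_re"]])
```

One practical difference: x.index(':') raises ValueError on a title that has no colon, whereas str.split returns the whole string and str.extract returns NaN for that row.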

- Method 1: create new columns by parsing the time string into a datetime and extracting the year and the month.
df['time_month'] = df.timeStamp.apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S').month)
df['time_year'] = df.timeStamp.apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S').year)
df[['time_year', 'time_month']]
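The row-wise strptime approach above parses every timestamp string twice, once per new column. A sketch of a vectorized alternative, assuming hypothetical timestamps in the same '%Y-%m-%d %H:%M:%S' layout: parse the column once with pd.to_datetime and read the components through the .dt accessor.

```python
import pandas as pd

# Hypothetical timestamps in the "%Y-%m-%d %H:%M:%S" layout used in the text
calls = pd.DataFrame({"timeStamp": [
    "2015-12-10 17:40:00",
    "2016-01-05 08:15:30",
    "2016-03-21 23:59:59",
]})

# Parse the whole column at once; an explicit format avoids format guessing
ts = pd.to_datetime(calls["timeStamp"], format="%Y-%m-%d %H:%M:%S")

# The .dt accessor exposes datetime components as integer Series
calls["time_year"] = ts.dt.year
calls["time_month"] = ts.dt.month
print(calls[["time_year", "time_month"]])
```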
- Group by time_year and count the values of the new title_new column.
df2 = df.groupby('time_year')['title_new'].value_counts()
df2
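groupby(...)['title_new'].value_counts() returns a Series indexed by (time_year, title_new) pairs. A small sketch on hypothetical, already prepared columns shows that result, how unstack reshapes it into a year-by-type table, and that pd.crosstab builds the same table directly.

```python
import pandas as pd

# Hypothetical columns as produced by the extraction and date steps above
calls = pd.DataFrame({
    "time_year": [2015, 2016, 2016, 2016, 2015],
    "title_new": ["EMS", "EMS", "Fire", "EMS", "Traffic"],
})

# Series with a two-level index: (time_year, title_new) -> count
counts = calls.groupby("time_year")["title_new"].value_counts()
print(counts)

# unstack moves the inner level to columns; missing combinations become 0
table = counts.unstack(fill_value=0)
print(table)

# crosstab counts the same year/type combinations in one call
ct = pd.crosstab(calls["time_year"], calls["title_new"])
print(ct)
```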

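- Method 2: convert the column with pd.to_datetime, then use DataFrame.resample.
# pd.to_datetime converts the strings to datetimes
df['time'] = pd.to_datetime(df.timeStamp)
# set the time as the index; resample can then bucket the rows by period
df.set_index('time', inplace=True)
df.resample('Y').title_new.value_counts()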
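resample('Y') bins rows on the DatetimeIndex and labels each bin with a year-end timestamp; recent pandas releases (2.2 and later) deprecate the 'Y' alias in favour of 'YE'. A sketch of a version-independent alternative, assuming hypothetical call records already indexed by time: group directly on the year of the index, or on monthly periods from to_period('M').

```python
import pandas as pd

# Hypothetical call records indexed by a DatetimeIndex named "time"
calls = pd.DataFrame(
    {"title_new": ["EMS", "Fire", "EMS", "EMS", "Traffic"]},
    index=pd.DatetimeIndex(
        ["2015-12-10 17:40", "2015-12-11 09:00", "2016-01-05 08:15",
         "2016-03-21 23:59", "2016-03-22 10:00"],
        name="time",
    ),
)

# Per-year counts: group on the integer year of each timestamp
by_year = calls.groupby(calls.index.year)["title_new"].value_counts()
print(by_year)

# Per-month counts: group on monthly periods derived from the index
by_month = calls.groupby(calls.index.to_period("M"))["title_new"].value_counts()
print(by_month)
```

Unlike resample, grouping this way only produces keys for periods that actually contain rows, and the year keys are plain integers rather than year-end timestamps.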
