python按一小时进行分组,Python-如何将具有相同时间戳但不同日期的值按小时分组...

最新推荐文章于 2023-08-29 16:24:19 发布

唐臭臭

最新推荐文章于 2023-08-29 16:24:19 发布

阅读量385

点赞数

文章标签： python按一小时进行分组

I need to show the amount of calls that are made within every hour in an entire month. So far I could resample the CSV in the following way:

Amount

Date

2017-03-01 00:00:00 5

2017-03-01 01:00:00 1

2017-03-31 22:00:00 7

2017-03-31 23:00:00 2

The date is a datetimeIndex y resampled all values in intervals of one hour.

What I need is to be able to group by hour all rows, what i mean is to group all the calls that are made in every day of the month at 21:00,for example, sum the amount and show it in a single row.

For example:

Amount

Date

2017-03 00:00:00 600

2017-03 01:00:00 200

2017-03 02:00:00 30

2017-03 22:00:00 500

2017-03 23:00:00 150

解决方案

You can use DatetimeIndex.strftime with groupby and aggregating sum:

df1 = df.groupby(df.index.strftime('%Y-%m %H:%M:%S'))[['Amount']].sum()

#with borrowing sample from Allen

print (df1)

Amount

2017-03 00:00:00 1528

2017-03 01:00:00 1505

2017-03 02:00:00 1606

2017-03 03:00:00 1750

2017-03 04:00:00 1493

2017-03 05:00:00 1649

2017-03 06:00:00 1390

2017-03 07:00:00 1147

2017-03 08:00:00 1687

2017-03 09:00:00 1602

2017-03 10:00:00 1755

2017-03 11:00:00 1381

2017-03 12:00:00 1390

2017-03 13:00:00 1565

2017-03 14:00:00 1805

2017-03 15:00:00 1674

2017-03 16:00:00 1375

2017-03 17:00:00 1601

2017-03 18:00:00 1493

2017-03 19:00:00 1422

2017-03 20:00:00 1781

2017-03 21:00:00 1709

2017-03 22:00:00 1578

2017-03 23:00:00 1583

Another solution with DatetimeIndex.to_period and DatetimeIndex.hour. Then use groupby and sum and last create Index from MultiIndex with map:

a = df.index.to_period('M')

b = df.index.hour

df1 = df.groupby([a,b])[['Amount']].sum()

#http://stackoverflow.com/questions/17118071/python-add-leading-zeroes-using-str-format

df1.index = df1.index.map(lambda x: '{0[0]} {0[1]:0>2}:00:00'.format(x))

print (df1)

Amount

2017-03 00:00:00 1739

2017-03 01:00:00 1502

2017-03 02:00:00 1585

2017-03 03:00:00 1710

2017-03 04:00:00 1679

2017-03 05:00:00 1371

2017-03 06:00:00 1489

2017-03 07:00:00 1252

2017-03 08:00:00 1540

2017-03 09:00:00 1443

2017-03 10:00:00 1589

2017-03 11:00:00 1499

2017-03 12:00:00 1837

2017-03 13:00:00 1834

2017-03 14:00:00 1695

2017-03 15:00:00 1616

2017-03 16:00:00 1499

2017-03 17:00:00 1329

2017-03 18:00:00 1727

2017-03 19:00:00 1764

2017-03 20:00:00 1754

2017-03 21:00:00 1621

2017-03 22:00:00 1486

2017-03 23:00:00 1672

Timings:

In [394]: %timeit (jez(df))

1 loop, best of 3: 630 ms per loop

In [395]: %timeit (df.groupby(df.index.strftime('%Y-%m %H:%M:%S'))[['Amount']].sum())

1 loop, best of 3: 792 ms per loop

#Allen's solution

In [396]: %timeit (df.groupby([dt.datetime.strftime(e, '%Y-%m %H:%M:%S') for e in df.index])['Amount'].sum().to_frame())

1 loop, best of 3: 663 ms per loop

Code for timings:

np.random.seed(100)

#[68712 rows x 1 columns]

idx = pd.date_range('2010-03-01 00:00:00', '2017-12-31 23:00:00', freq='H')

df = pd.DataFrame(index=idx, columns=['Amount'], data=np.random.randint(1,100,len(idx)))

print (df.head())

def jez(df):

a = df.index.to_period('M')

b = df.index.hour

df1 = df.groupby([a,b])[['Amount']].sum()

df1.index = df1.index.map(lambda x: '{0[0]} {0[1]:0>2}:00:00'.format(x))

return (df1)

唐臭臭

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫