What to do when Python data is invalid: how do I sum time series data by day in Python? resample.sum() has no effect...

I am new to Python. How do I sum data based on date and plot the result?

I have a Series object with data like:

2017-11-03 07:30:00 NaN
2017-11-03 09:18:00 NaN
2017-11-03 10:00:00 NaN
2017-11-03 11:08:00 NaN
2017-11-03 14:39:00 NaN
2017-11-03 14:53:00 NaN
2017-11-03 15:00:00 NaN
2017-11-03 16:00:00 NaN
2017-11-03 17:03:00 NaN
2017-11-03 17:42:00 800.0
2017-11-04 07:27:00 600.0
2017-11-04 10:10:00 NaN
2017-11-04 11:48:00 NaN
2017-11-04 12:58:00 500.0
2017-11-04 13:40:00 NaN
2017-11-04 15:15:00 NaN
2017-11-04 16:21:00 NaN
2017-11-04 17:37:00 500.0
2017-11-04 21:37:00 NaN
2017-11-05 03:00:00 NaN
2017-11-05 06:30:00 NaN
2017-11-05 07:19:00 NaN
2017-11-05 08:31:00 200.0
2017-11-05 09:31:00 500.0
2017-11-05 12:03:00 NaN
2017-11-05 12:25:00 200.0
2017-11-05 13:11:00 500.0
2017-11-05 16:31:00 NaN
2017-11-05 19:00:00 500.0
2017-11-06 08:08:00 NaN

I have the following code:

# load packages
import pandas as pd
import matplotlib.pyplot as plt

# import painkiller data
df = pd.read_csv('/Users/user/Documents/health/PainOverTime.csv', delimiter=',')

# build a time-indexed Series of painkiller amounts and plot it
times = pd.to_datetime(df.loc[:, 'Time'])
ts = pd.Series(df.loc[:, 'acetaminophen'].values, index=times,
               name='Painkiller over Time')
ts.plot()

This gives me the following line(?) graph:

[Image: line plot of the raw series]

It's a start; now I want to sum the doses by date. However, the following code changes nothing: the resulting plot is the same. What is wrong?

ts.resample('D', closed='left', label='right').sum()
ts.plot()

I have also tried ts.resample('D').sum(), ts.resample('1d').sum(), ts.resample('1D').sum(), but there is no change in the plot.

Is .resample even the correct function? I understand resampling to be sampling from the data, e.g. randomly taking one point per day, whereas I want to sum each day's values.

Namely, I'm hoping for some result (based on the above data) like:

2017-11-03 800
2017-11-04 1600
2017-11-05 1900
2017-11-06 NaN

Solution

Another answer helped me see that I needed to assign the result to a new object (if that's the right terminology), because resample(...).sum() returns a new Series rather than modifying ts in place:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('/Users/user/Documents/health/PainOverTime.csv', delimiter=',')

# parse the timestamps
times = pd.to_datetime(df.loc[:, 'Time'])

# raw plot of data
ts = pd.Series(df.loc[:, 'acetaminophen'].values, index=times,
               name='Painkiller over Time')
fig1 = ts.plot()

# combine data by day
test2 = ts.resample('D').sum()
fig2 = test2.plot()

That produces the following plots:

[Images: plots of the raw data and of the daily sums]
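For context on why the earlier attempt changed nothing: resample(...).sum() returns a new Series and leaves ts untouched, so plotting ts afterwards just shows the raw data again. Below is a minimal sketch that uses a few rows retyped from the sample data above in place of the CSV; the names daily and daily_nan are made up for illustration. Passing min_count=1 is one way to get NaN rather than 0 for a day with no recorded doses, which matches the result hoped for in the question.

```python
import numpy as np
import pandas as pd

# A few rows retyped from the question's sample data (stand-in for the CSV).
times = pd.to_datetime([
    '2017-11-03 17:03:00', '2017-11-03 17:42:00',
    '2017-11-04 07:27:00', '2017-11-04 12:58:00', '2017-11-04 17:37:00',
    '2017-11-05 08:31:00', '2017-11-05 09:31:00', '2017-11-05 12:25:00',
    '2017-11-05 13:11:00', '2017-11-05 19:00:00',
    '2017-11-06 08:08:00',
])
values = [np.nan, 800.0, 600.0, 500.0, 500.0,
          200.0, 500.0, 200.0, 500.0, 500.0, np.nan]
ts = pd.Series(values, index=times, name='Painkiller over Time')

# resample() does not modify ts; the result has to be kept in a variable.
daily = ts.resample('D').sum()

# sum() skips NaN, so a day with only NaN values sums to 0.0 by default.
# min_count=1 requires at least one non-NaN value; otherwise the day is NaN.
daily_nan = ts.resample('D').sum(min_count=1)

print(daily_nan)
```

Here "resampling" means changing the frequency of the series by grouping rows into time bins (one per day) and then aggregating each bin; it is not random sampling.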

Is this method not better than the 'groupby' function?
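On the groupby comparison, here is a hedged sketch rather than a verdict. Grouping by calendar date gives the same daily sums for days that have rows, while resample('D') also emits a row for every day in the range, including days with no rows at all. The toy data below is invented for illustration and skips 2017-11-05 on purpose; by_resample and by_groupby are illustrative names.

```python
import pandas as pd

# Toy series (invented values) with no rows at all on 2017-11-05.
idx = pd.to_datetime(['2017-11-04 07:27:00', '2017-11-04 12:58:00',
                      '2017-11-06 09:00:00'])
ts = pd.Series([600.0, 500.0, 200.0], index=idx)

# resample: one row per calendar day in the range, gaps included.
by_resample = ts.resample('D').sum()

# groupby on the normalized (midnight) timestamps: only days that occur.
by_groupby = ts.groupby(ts.index.normalize()).sum()

print(by_resample)
print(by_groupby)
```

Which one fits better depends on whether the empty days should appear in the result; for a plot over a continuous time axis, the resample version keeps the gaps visible.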

Now how do I make a scatter or bar plot instead of this line plot...?
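For the bar plot, one option is the kind argument of Series.plot. This is a minimal sketch with the daily totals typed in by hand from the expected result above instead of read from the CSV; daily and ax are illustrative names, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Daily totals typed in by hand (stand-in for ts.resample('D').sum()).
daily = pd.Series(
    [800.0, 1600.0, 1900.0],
    index=pd.to_datetime(['2017-11-03', '2017-11-04', '2017-11-05']),
    name='Painkiller over Time',
)

# Show dates without the time part in the x-axis labels.
daily.index = daily.index.strftime('%Y-%m-%d')

# kind='bar' draws one bar per day instead of a connected line.
ax = daily.plot(kind='bar')
ax.set_ylabel('acetaminophen')
plt.tight_layout()
```

For a scatter-like look on the raw data, passing a marker-only style such as ts.plot(style='o') draws the points without the connecting line.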
