python时间序列进行线性插值,Python使用线性插值对不规则时间序列进行正则化

I have a time series in pandas that looks like this:

Values

1992-08-27 07:46:48 28.0

1992-08-27 08:00:48 28.2

1992-08-27 08:33:48 28.4

1992-08-27 08:43:48 28.8

1992-08-27 08:48:48 29.0

1992-08-27 08:51:48 29.2

1992-08-27 08:53:48 29.6

1992-08-27 08:56:48 29.8

1992-08-27 09:03:48 30.0

I would like to resample it to a regular time series with 15 min times steps where the values are linearly interpolated. Basically I would like to get:

Values

1992-08-27 08:00:00 28.2

1992-08-27 08:15:00 28.3

1992-08-27 08:30:00 28.4

1992-08-27 08:45:00 28.8

1992-08-27 09:00:00 29.9

However using the resample method (df.resample('15Min')) from Pandas I get:

Values

1992-08-27 08:00:00 28.20

1992-08-27 08:15:00 NaN

1992-08-27 08:30:00 28.60

1992-08-27 08:45:00 29.40

1992-08-27 09:00:00 30.00

I have tried the resample method with different 'how' and 'fill_method' parameters but never got exactly the results I wanted. Am I using the wrong method?

I figure this is a fairly simple query, but I have searched the web for a while and couldn't find an answer.

Thanks in advance for any help I can get.

解决方案

It takes a bit of work, but try this out. Basic idea is find the closest two timestamps to each resample point and interpolate. np.searchsorted is used to find dates closest to the resample point.

# empty frame with desired index

rs = pd.DataFrame(index=df.resample('15min').iloc[1:].index)

# array of indexes corresponding with closest timestamp after resample

idx_after = np.searchsorted(df.index.values, rs.index.values)

# values and timestamp before/after resample

rs['after'] = df.loc[df.index[idx_after], 'Values'].values

rs['before'] = df.loc[df.index[idx_after - 1], 'Values'].values

rs['after_time'] = df.index[idx_after]

rs['before_time'] = df.index[idx_after - 1]

#calculate new weighted value

rs['span'] = (rs['after_time'] - rs['before_time'])

rs['after_weight'] = (rs['after_time'] - rs.index) / rs['span']

# I got errors here unless I turn the index to a series

rs['before_weight'] = (pd.Series(data=rs.index, index=rs.index) - rs['before_time']) / rs['span']

rs['Values'] = rs.eval('before * before_weight + after * after_weight')

After all that, hopefully the right answer:

In [161]: rs['Values']

Out[161]:

1992-08-27 08:00:00 28.011429

1992-08-27 08:15:00 28.313939

1992-08-27 08:30:00 28.223030

1992-08-27 08:45:00 28.952000

1992-08-27 09:00:00 29.908571

Freq: 15T, Name: Values, dtype: float64

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值