不规则间隔时间序列转规则时间序列

pandas对时间序列中缺失值进行线性插值_Johngo学长 (johngo689.com)

一、问题背景

以二氧化碳浓度数据集为例,该时间序列在某些时间点上会有缺失值。

# 读取数据集
df = pd.read_csv('二氧化碳浓度.csv')
# 将字段Datetime数据类型转换为日期类型
df['datetime'] = pd.to_datetime(df['datetime'], format="%Y-%m-%d %H:%M:%S")
df

value datetime 0 370 2022-04-13 00:11:00 1 375 2022-04-13 00:42:00 2 383 2022-04-13 01:11:00 3 391 2022-04-13 01:41:00 4 397 2022-04-13 02:11:00 … … … 583 369 2022-04-23 22:12:00 584 376 2022-04-23 22:41:00 585 379 2022-04-23 23:11:00 586 379 2022-04-23 23:12:00 587 378 2022-04-23 23:41:00

588 rows × 2 columns

该数据集是没个半小时一条数据,若想要填充成10分钟一条数据,就需要进行线性插值。

二、pandas解决方案

下面我们需要生成一个辅助dataframe,命名为helper, 包含所有日期。

# pd.date_range时间序列
# 间隔十分钟
helper = pd.DataFrame({'datetime': pd.date_range(start=df['datetime'].min(), end=df['datetime'].max(),freq='600s')})
# helper = pd.DataFrame(helper).set_index('datetime')#将时间列变为索引

helper:

datetime 0 2022-04-13 00:11:00 1 2022-04-13 00:21:00 2 2022-04-13 00:31:00 3 2022-04-13 00:41:00 4 2022-04-13 00:51:00 … … 1577 2022-04-23 23:01:00 1578 2022-04-23 23:11:00 1579 2022-04-23 23:21:00 1580 2022-04-23 23:31:00 1581 2022-04-23 23:41:00

1582 rows × 1 columns

然后再将helper和d join起来

d = pd.merge(df, helper, on='datetime', how='outer').sort_values('datetime')

最后一步就是插值了,直接用interpolate方法,method选择linear,线性插值。

d['value'] = d['value'].interpolate(method='linear')

value datetime 0 370.0 2022-04-13 00:11:00 588 NaN 2022-04-13 00:21:00 589 NaN 2022-04-13 00:31:00 590 NaN 2022-04-13 00:41:00 1 375.0 2022-04-13 00:42:00 … … … 585 379.0 2022-04-23 23:11:00 586 379.0 2022-04-23 23:12:00 1658 NaN 2022-04-23 23:21:00 1659 NaN 2022-04-23 23:31:00 587 378.0 2022-04-23 23:41:00

1660 rows × 2 columns

插值选择方法不止有线性,还可以是

Original: pandas对时间序列中缺失值进行线性插值 - 王陸 - 博客园
Author: 王陸
Title: pandas对时间序列中缺失值进行线性插值

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/8417/

转载文章受原作者版权保护。转载请注明原作者出处!

======================================================================

pandas插值--非规则时间序列 

pandas插值--非规则时间序列 - Heywhale.com

问题描述:现有一观测资料,时间间隔非均匀,如何将其插值到均匀的时间序列?

这个问题可以参考:工作台 - Heywhale.com ,具体步骤如下

  • 生成目标时间的list,和当前index取交集
  • 将交集reindex为新的索引,填充缺测,调用插值函数
  • 再次reindex到目标时间list

当然,这里也可以考虑利用xarray的插值函数

原始数据

In [1]:

import pandas as pd 
import xarray as xr 
import numpy as np
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.14.1

In [2]:

df = pd.read_csv("/home/mw/input/pcp6426/interp2.csv", parse_dates=[0], index_col=0)
df 

Out[2]:

data
time
2019-12-10 17:17:37.1406248960.255970
2019-12-10 17:17:37.3398443520.364656
2019-12-10 17:17:37.5390620160.206504
2019-12-10 17:17:37.7382807040.491149
2019-12-10 17:17:37.9374999040.037144
......
2019-12-10 17:17:56.1406248960.400654
2019-12-10 17:17:56.3398443520.382707
2019-12-10 17:17:56.5390620160.094809
2019-12-10 17:17:56.7382807040.049449
2019-12-10 17:17:56.9374999040.793193

100 rows × 1 columns

转到xarray

In [3]:

ds = df.to_xarray() 
ds

Out[3]:

xarray.Dataset

  • Dimensions:
    • time: 100
  • Coordinates:
    • time

      (time)

      datetime64[ns]

      2019-12-10T17:17:37.140624896 .....

  • Data variables:
    • data

      (time)

      float64

      0.256 0.3647 ... 0.04945 0.7932

      array([0.25596959, 0.36465587, 0.20650436, 0.49114901, 0.03714372,
             0.28040043, 0.70309464, 0.68119741, 0.05901183, 0.67645174,
             0.28849891, 0.50132343, 0.62840763, 0.95099851, 0.02626051,
             0.33692432, 0.46553236, 0.50861759, 0.74550601, 0.55250929,
             0.52566986, 0.52261307, 0.37321184, 0.79995134, 0.23094678,
             0.87915976, 0.70379002, 0.1751345 , 0.08580627, 0.48962096,
             0.85569395, 0.20317885, 0.80113267, 0.64351059, 0.06970162,
             0.3934217 , 0.44517141, 0.89346621, 0.55945748, 0.72797598,
             0.02390159, 0.20475986, 0.51505429, 0.82730583, 0.77451266,
             0.65776446, 0.6155744 , 0.71989889, 0.01017897, 0.38116152,
             0.9602366 , 0.86313496, 0.76415434, 0.85792074, 0.84286672,
             0.56732516, 0.58180219, 0.23357688, 0.77524271, 0.56301411,
             0.48070764, 0.19323841, 0.37067283, 0.14380672, 0.8256961 ,
             0.90791718, 0.31419928, 0.42323499, 0.58264084, 0.73124201,
             0.80804131, 0.47969966, 0.2311884 , 0.45588249, 0.72280821,
             0.64822314, 0.84239576, 0.14129363, 0.59488152, 0.62924007,
             0.93597915, 0.96099593, 0.63139743, 0.86937852, 0.59717561,
             0.06693927, 0.93089973, 0.45931833, 0.51508115, 0.40343549,
             0.22152076, 0.84937486, 0.30642194, 0.8782651 , 0.09930092,
             0.40065394, 0.38270678, 0.09480903, 0.04944922, 0.79319296])
  • Attributes: (0)

插值

In [4]:

idx = pd.date_range('2019-12-10 17:17:37','2019-12-10 17:17:57',freq='0.2S')
ds = ds.interp(time=idx)

In [5]:

ds

Out[5]:

xarray.Dataset

  • Dimensions:
    • time: 101
  • Coordinates:
    • time

      (time)

      datetime64[ns]

      2019-12-10T17:17:37 ... 2019-12-...

  • Data variables:
    • data

      (time)

      float64

      nan 0.2884 0.3169 ... 0.2799 nan

      array([       nan, 0.2883623 , 0.31690053, 0.29357285, 0.35049546,
             0.11199205, 0.40889926, 0.69618861, 0.49088025, 0.25029839,
             0.55474113, 0.35610155, 0.54140369, 0.7270833 , 0.66450917,
             0.12372359, 0.37699039, 0.47854235, 0.58107818, 0.68571448,
             0.54408908, 0.52471756, 0.47749996, 0.503745  , 0.62367034,
             0.43430757, 0.82452572, 0.54415751, 0.14781035, 0.21091047,
             0.60446731, 0.65241178, 0.38373663, 0.75291846, 0.46574115,
             0.17126078, 0.40954361, 0.58053824, 0.79129799, 0.61166552,
             0.50709006, 0.08024552, 0.29845618, 0.61056731, 0.81095019,
             0.7378858 , 0.64462072, 0.64707616, 0.5028063 , 0.12511157,
             0.562832  , 0.92998591, 0.83324683, 0.79283606, 0.85325691,
             0.75642237, 0.57183528, 0.47665228, 0.39926427, 0.70949302,
             0.53719249, 0.39115053, 0.2468164 , 0.30127792, 0.35506012,
             0.85149093, 0.72295248, 0.34712363, 0.47199483, 0.62867837,
             0.75533589, 0.70575096, 0.40465934, 0.29991892, 0.53857767,
             0.69940899, 0.70871497, 0.63069137, 0.28003931, 0.60552601,
             0.72547187, 0.94377278, 0.86147056, 0.70419224, 0.78504843,
             0.43402571, 0.3244329 , 0.78850129, 0.47637533, 0.48049266,
             0.34746164, 0.40864556, 0.68542511, 0.48134011, 0.63693735,
             0.19202507, 0.39530499, 0.29577334, 0.08093415, 0.27986546,
                    nan])
  • Attributes: (0)

参考:Interpolating data

转回pandas

In [6]:

df = ds.to_dataframe()
df

Out[6]:

data
time
2019-12-10 17:17:37.000NaN
2019-12-10 17:17:37.2000.288362
2019-12-10 17:17:37.4000.316901
2019-12-10 17:17:37.6000.293573
2019-12-10 17:17:37.8000.350495
......
2019-12-10 17:17:56.2000.395305
2019-12-10 17:17:56.4000.295773
2019-12-10 17:17:56.6000.080934
2019-12-10 17:17:56.8000.279865
2019-12-10 17:17:57.000NaN

101 rows × 1 columns

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值