Python Data Analysis and Machine Learning 44: Generating Time Series in Python

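1. Generating time series in Python

Time series

  • Timestamp: a single point in time
  • Period: a fixed span of time, such as a day or a month
  • Interval: the elapsed time between two points

date_range

  • Accepts a start time with a number of periods, or a start time with an end time
  • H: hourly (spelled h on pandas 2.2 and later, where H is deprecated)
  • D: daily
  • M: monthly, anchored at month end (spelled ME on pandas 2.2 and later, where M is deprecated)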
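Because the spelling of frequency aliases differs between pandas versions, a minimal sketch using offset objects from pd.offsets may be a safer way to try out the hourly, daily, and month-end frequencies listed above. The variable names are illustrative only.

```python
import pandas as pd

# Offset objects avoid version-specific alias strings such as 'H'/'h' or 'M'/'ME'.
hourly = pd.date_range("2022-07-01", periods=4, freq=pd.offsets.Hour())
daily = pd.date_range("2022-07-01", periods=4, freq=pd.offsets.Day())
month_end = pd.date_range("2022-01-01", periods=3, freq=pd.offsets.MonthEnd())

print(hourly)      # four stamps one hour apart
print(daily)       # four stamps one day apart
print(month_end)   # the last day of each of the first three months of 2022
```

Note that the month-end range starts at 2022-01-31 rather than 2022-01-01, because the start date is rolled forward to the first date that lies on the requested frequency.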

2. Generating time series at different intervals

Code:

import pandas as pd
import numpy as np
import datetime as dt

# 10 timestamps starting at 2022-07-01, spaced 3 days apart
rng = pd.date_range('2022-07-01', periods = 10, freq = '3D')
print(rng)
print("#####################")

# Start time, end time, and frequency ('M' = month end; spelled 'ME' on pandas 2.2+)
data=pd.date_range('2022-01-01','2023-01-01',freq='M')
print(data)
print("#####################")

# 20 random values indexed by consecutive days starting at 2022-01-01 (the default frequency is daily)
time=pd.Series(np.random.randn(20),
           index=pd.date_range(dt.datetime(2022,1,1),periods=20))
print(time)
print("#####################")

# Periods with a non-standard step: 10 periods spaced 25 hours apart
p1 = pd.period_range('2022-01-01 10:10', freq = '25H', periods = 10)
print(p1)
print("######################################")

# Use a date range as the index of a Series
rng = pd.date_range('2022 Jul 1', periods = 10, freq = 'D')
print(pd.Series(range(len(rng)), index = rng))
print("######################################")

Output:

DatetimeIndex(['2022-07-01', '2022-07-04', '2022-07-07', '2022-07-10',
               '2022-07-13', '2022-07-16', '2022-07-19', '2022-07-22',
               '2022-07-25', '2022-07-28'],
              dtype='datetime64[ns]', freq='3D')
#####################
DatetimeIndex(['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
               '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
               '2022-09-30', '2022-10-31', '2022-11-30', '2022-12-31'],
              dtype='datetime64[ns]', freq='M')
#####################
2022-01-01   -0.957412
2022-01-02   -0.333720
2022-01-03    1.079960
2022-01-04    0.050675
2022-01-05    0.270313
2022-01-06   -0.222715
2022-01-07   -0.560258
2022-01-08    1.009430
2022-01-09   -0.678157
2022-01-10    0.213557
2022-01-11   -0.720791
2022-01-12    0.332096
2022-01-13   -0.986449
2022-01-14   -0.357303
2022-01-15   -0.559618
2022-01-16    0.480281
2022-01-17   -0.443998
2022-01-18    1.541631
2022-01-19   -0.094559
2022-01-20    1.875012
Freq: D, dtype: float64
#####################
PeriodIndex(['2022-01-01 10:00', '2022-01-02 11:00', '2022-01-03 12:00',
             '2022-01-04 13:00', '2022-01-05 14:00', '2022-01-06 15:00',
             '2022-01-07 16:00', '2022-01-08 17:00', '2022-01-09 18:00',
             '2022-01-10 19:00'],
            dtype='period[25H]', freq='25H')
######################################
2022-07-01    0
2022-07-02    1
2022-07-03    2
2022-07-04    3
2022-07-05    4
2022-07-06    5
2022-07-07    6
2022-07-08    7
2022-07-09    8
2022-07-10    9
Freq: D, dtype: int64
######################################
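The random values printed above change on every run. If reproducible output is wanted, one option is a seeded NumPy generator. The sketch below assumes a hypothetical helper named make_series, which is not part of the original code.

```python
import numpy as np
import pandas as pd


def make_series(seed: int, periods: int = 20) -> pd.Series:
    """Hypothetical helper: a daily series of normal random values from a seeded generator."""
    rng = np.random.default_rng(seed)
    index = pd.date_range("2022-01-01", periods=periods, freq="D")
    return pd.Series(rng.standard_normal(periods), index=index)


first = make_series(seed=42)
second = make_series(seed=42)

# The same seed yields the same values, so repeated runs print identical output.
print(first.equals(second))
```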

3. Truncating a time range

Code:

import pandas as pd
import numpy as np
import datetime as dt

# 20 random values indexed by consecutive days starting at 2022-01-01
time=pd.Series(np.random.randn(20),
           index=pd.date_range(dt.datetime(2022,1,1),periods=20))
print(time)
print("#####################")

# Keep only the rows on or after 2022-01-10
print(time.truncate(before='2022-1-10'))
print("#####################")

# Keep only the rows on or before 2022-01-10
print(time.truncate(after='2022-1-10'))
print("#####################")

# Slice a date range (both endpoints are included)
print(time['2022-01-15':'2022-01-20'])
print("#####################")

Output:

2022-01-01   -0.203552
2022-01-02   -1.035483
2022-01-03    0.252587
2022-01-04   -1.046993
2022-01-05    0.152435
2022-01-06   -0.534518
2022-01-07    0.770170
2022-01-08   -0.038129
2022-01-09    0.531485
2022-01-10    0.499937
2022-01-11    0.815295
2022-01-12    2.315740
2022-01-13   -0.443379
2022-01-14   -0.689247
2022-01-15    0.667250
2022-01-16   -2.067246
2022-01-17   -0.105151
2022-01-18   -0.420562
2022-01-19    1.012943
2022-01-20    0.509710
Freq: D, dtype: float64
#####################
2022-01-10    0.499937
2022-01-11    0.815295
2022-01-12    2.315740
2022-01-13   -0.443379
2022-01-14   -0.689247
2022-01-15    0.667250
2022-01-16   -2.067246
2022-01-17   -0.105151
2022-01-18   -0.420562
2022-01-19    1.012943
2022-01-20    0.509710
Freq: D, dtype: float64
#####################
2022-01-01   -0.203552
2022-01-02   -1.035483
2022-01-03    0.252587
2022-01-04   -1.046993
2022-01-05    0.152435
2022-01-06   -0.534518
2022-01-07    0.770170
2022-01-08   -0.038129
2022-01-09    0.531485
2022-01-10    0.499937
Freq: D, dtype: float64
#####################
2022-01-15    0.667250
2022-01-16   -2.067246
2022-01-17   -0.105151
2022-01-18   -0.420562
2022-01-19    1.012943
2022-01-20    0.509710
Freq: D, dtype: float64
#####################
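As the output shows, truncate keeps the boundary date itself, and so does label-based slicing. A minimal sketch, using integer values instead of random ones so the result is easy to check, suggests the two approaches select the same rows:

```python
import pandas as pd

# Values 0..19 on consecutive days, so 2022-01-15 carries the value 14.
s = pd.Series(range(20), index=pd.date_range("2022-01-01", periods=20, freq="D"))

# truncate with both bounds keeps the boundary dates, like a label slice.
by_truncate = s.truncate(before="2022-01-15", after="2022-01-20")
by_slice = s.loc["2022-01-15":"2022-01-20"]

print(by_truncate.equals(by_slice))
print(len(by_slice))
```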

4. Timestamps and time arithmetic

Code:

import pandas as pd
import numpy as np
import datetime as dt

# Timestamps: a single point in time
print(pd.Timestamp('2022-07-25'))
print(pd.Timestamp('2022-07-25 10'))
print(pd.Timestamp('2022-07-25 10:15'))
print("######################################")

# Periods: a span of time (here a whole month and a whole day)
print(pd.Period('2022-01'))
print(pd.Period('2022-01-01'))
print("######################################")

# Time arithmetic: shift a Period by a Timedelta
#help(pd.Timedelta)
print(pd.Period('2022-01-01 10:10') + pd.Timedelta('1 day'))
print(pd.Period('2022-01-01 10:10:10') + pd.Timedelta('1 s'))
print("######################################")

Output:

2022-07-25 00:00:00
2022-07-25 10:00:00
2022-07-25 10:15:00
######################################
2022-01
2022-01-01
######################################
2022-01-02 10:10
2022-01-01 10:10:11
######################################
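The same arithmetic applies to Timestamp objects, and a Period exposes the instants where it starts and ends. The following is a small sketch with illustrative variable names:

```python
import pandas as pd

start = pd.Timestamp("2022-07-25 10:15")
later = start + pd.Timedelta(days=1, hours=2)   # shift a point in time
gap = later - start                             # subtracting timestamps gives a Timedelta

month = pd.Period("2022-01")                    # the whole of January 2022
next_month = month + 1                          # integer arithmetic moves by whole periods

print(later)
print(gap)
print(month.start_time, month.end_time)
print(next_month)
```

Adding an integer to a Period steps by its own frequency, whereas adding a Timedelta only works when the Timedelta is a whole multiple of that frequency.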

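5. Resampling

Resampling

  • Converts time series data from one frequency to another
  • Downsampling: moving to a lower frequency (for example daily to monthly), which requires an aggregation such as sum or mean
  • Upsampling: moving to a higher frequency (for example every 3 days to daily), which leaves gaps that must be filled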
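To make the two directions concrete, here is a minimal sketch on six hand-picked daily values (chosen only for illustration), first downsampled to 3-day totals and then upsampled back to a daily grid:

```python
import pandas as pd

daily = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.date_range("2022-01-01", periods=6, freq="D"),
)

# Downsampling: six daily values collapse into two 3-day totals (1+2+3 and 4+5+6).
down = daily.resample("3D").sum()

# Upsampling: the 3-day series is placed back on a daily grid,
# leaving NaN on the days that have no value of their own.
up = down.resample("D").asfreq()

print(down)
print(up)
```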

Code:

import pandas as pd
import numpy as np
import datetime as dt

# Build a daily series of 90 random values
rng = pd.date_range('1/1/2022', periods=90, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
#print(ts.head())

# Monthly totals ('M' = month end; spelled 'ME' on pandas 2.2+)
print(ts.resample('M').sum())
print("######################################")
# Totals over 3-day bins
print(ts.resample('3D').sum())
print("######################################")
# Means over 3-day bins
day3Ts = ts.resample('3D').mean()
print(day3Ts)
print("######################################")
# Upsample the 3-day series back to daily; most rows come out as NaN
# Ways to fill the gaps:
# 1. ffill: carry the previous value forward
# 2. bfill: carry the next value backward
# 3. interpolate: linear interpolation between known values
print(day3Ts.resample('D').asfreq())
print("######################################")
# limit=1 fills at most one consecutive NaN, so one gap per bin remains
print(day3Ts.resample('D').ffill(limit=1))
print("######################################")
print(day3Ts.resample('D').bfill(limit=1))
print("######################################")
print(day3Ts.resample('D').interpolate('linear'))
print("######################################")

Output:

2022-01-31    0.904974
2022-02-28   -1.930083
2022-03-31    7.617911
Freq: M, dtype: float64
######################################
2022-01-01    0.104413
2022-01-04    2.255400
2022-01-07   -0.993552
2022-01-10    1.234344
2022-01-13   -0.621381
2022-01-16   -0.072830
2022-01-19   -0.215890
2022-01-22    0.050444
2022-01-25   -1.794619
2022-01-28    0.030952
2022-01-31   -1.022843
2022-02-03   -1.035522
2022-02-06   -1.124857
2022-02-09    1.915781
2022-02-12    0.263875
2022-02-15    0.927552
2022-02-18    0.760483
2022-02-21   -2.771669
2022-02-24    2.157336
2022-02-27    0.107964
2022-03-02   -0.852413
2022-03-05    1.252628
2022-03-08   -0.529793
2022-03-11    2.110139
2022-03-14    1.624062
2022-03-17   -0.241604
2022-03-20   -2.165326
2022-03-23    2.975993
2022-03-26    1.389412
2022-03-29    0.874324
dtype: float64
######################################
2022-01-01    0.034804
2022-01-04    0.751800
2022-01-07   -0.331184
2022-01-10    0.411448
2022-01-13   -0.207127
2022-01-16   -0.024277
2022-01-19   -0.071963
2022-01-22    0.016815
2022-01-25   -0.598206
2022-01-28    0.010317
2022-01-31   -0.340948
2022-02-03   -0.345174
2022-02-06   -0.374952
2022-02-09    0.638594
2022-02-12    0.087958
2022-02-15    0.309184
2022-02-18    0.253494
2022-02-21   -0.923890
2022-02-24    0.719112
2022-02-27    0.035988
2022-03-02   -0.284138
2022-03-05    0.417543
2022-03-08   -0.176598
2022-03-11    0.703380
2022-03-14    0.541354
2022-03-17   -0.080535
2022-03-20   -0.721775
2022-03-23    0.991998
2022-03-26    0.463137
2022-03-29    0.291441
dtype: float64
######################################
2022-01-01    0.034804
2022-01-02         NaN
2022-01-03         NaN
2022-01-04    0.751800
2022-01-05         NaN
2022-01-06         NaN
2022-01-07   -0.331184
2022-01-08         NaN
2022-01-09         NaN
2022-01-10    0.411448
2022-01-11         NaN
2022-01-12         NaN
2022-01-13   -0.207127
2022-01-14         NaN
2022-01-15         NaN
2022-01-16   -0.024277
2022-01-17         NaN
2022-01-18         NaN
2022-01-19   -0.071963
2022-01-20         NaN
2022-01-21         NaN
2022-01-22    0.016815
2022-01-23         NaN
2022-01-24         NaN
2022-01-25   -0.598206
2022-01-26         NaN
2022-01-27         NaN
2022-01-28    0.010317
2022-01-29         NaN
2022-01-30         NaN
                ...   
2022-02-28         NaN
2022-03-01         NaN
2022-03-02   -0.284138
2022-03-03         NaN
2022-03-04         NaN
2022-03-05    0.417543
2022-03-06         NaN
2022-03-07         NaN
2022-03-08   -0.176598
2022-03-09         NaN
2022-03-10         NaN
2022-03-11    0.703380
2022-03-12         NaN
2022-03-13         NaN
2022-03-14    0.541354
2022-03-15         NaN
2022-03-16         NaN
2022-03-17   -0.080535
2022-03-18         NaN
2022-03-19         NaN
2022-03-20   -0.721775
2022-03-21         NaN
2022-03-22         NaN
2022-03-23    0.991998
2022-03-24         NaN
2022-03-25         NaN
2022-03-26    0.463137
2022-03-27         NaN
2022-03-28         NaN
2022-03-29    0.291441
Freq: D, Length: 88, dtype: float64
######################################
2022-01-01    0.034804
2022-01-02    0.034804
2022-01-03         NaN
2022-01-04    0.751800
2022-01-05    0.751800
2022-01-06         NaN
2022-01-07   -0.331184
2022-01-08   -0.331184
2022-01-09         NaN
2022-01-10    0.411448
2022-01-11    0.411448
2022-01-12         NaN
2022-01-13   -0.207127
2022-01-14   -0.207127
2022-01-15         NaN
2022-01-16   -0.024277
2022-01-17   -0.024277
2022-01-18         NaN
2022-01-19   -0.071963
2022-01-20   -0.071963
2022-01-21         NaN
2022-01-22    0.016815
2022-01-23    0.016815
2022-01-24         NaN
2022-01-25   -0.598206
2022-01-26   -0.598206
2022-01-27         NaN
2022-01-28    0.010317
2022-01-29    0.010317
2022-01-30         NaN
                ...   
2022-02-28    0.035988
2022-03-01         NaN
2022-03-02   -0.284138
2022-03-03   -0.284138
2022-03-04         NaN
2022-03-05    0.417543
2022-03-06    0.417543
2022-03-07         NaN
2022-03-08   -0.176598
2022-03-09   -0.176598
2022-03-10         NaN
2022-03-11    0.703380
2022-03-12    0.703380
2022-03-13         NaN
2022-03-14    0.541354
2022-03-15    0.541354
2022-03-16         NaN
2022-03-17   -0.080535
2022-03-18   -0.080535
2022-03-19         NaN
2022-03-20   -0.721775
2022-03-21   -0.721775
2022-03-22         NaN
2022-03-23    0.991998
2022-03-24    0.991998
2022-03-25         NaN
2022-03-26    0.463137
2022-03-27    0.463137
2022-03-28         NaN
2022-03-29    0.291441
Freq: D, Length: 88, dtype: float64
######################################
2022-01-01    0.034804
2022-01-02         NaN
2022-01-03    0.751800
2022-01-04    0.751800
2022-01-05         NaN
2022-01-06   -0.331184
2022-01-07   -0.331184
2022-01-08         NaN
2022-01-09    0.411448
2022-01-10    0.411448
2022-01-11         NaN
2022-01-12   -0.207127
2022-01-13   -0.207127
2022-01-14         NaN
2022-01-15   -0.024277
2022-01-16   -0.024277
2022-01-17         NaN
2022-01-18   -0.071963
2022-01-19   -0.071963
2022-01-20         NaN
2022-01-21    0.016815
2022-01-22    0.016815
2022-01-23         NaN
2022-01-24   -0.598206
2022-01-25   -0.598206
2022-01-26         NaN
2022-01-27    0.010317
2022-01-28    0.010317
2022-01-29         NaN
2022-01-30   -0.340948
                ...   
2022-02-28         NaN
2022-03-01   -0.284138
2022-03-02   -0.284138
2022-03-03         NaN
2022-03-04    0.417543
2022-03-05    0.417543
2022-03-06         NaN
2022-03-07   -0.176598
2022-03-08   -0.176598
2022-03-09         NaN
2022-03-10    0.703380
2022-03-11    0.703380
2022-03-12         NaN
2022-03-13    0.541354
2022-03-14    0.541354
2022-03-15         NaN
2022-03-16   -0.080535
2022-03-17   -0.080535
2022-03-18         NaN
2022-03-19   -0.721775
2022-03-20   -0.721775
2022-03-21         NaN
2022-03-22    0.991998
2022-03-23    0.991998
2022-03-24         NaN
2022-03-25    0.463137
2022-03-26    0.463137
2022-03-27         NaN
2022-03-28    0.291441
2022-03-29    0.291441
Freq: D, Length: 88, dtype: float64
######################################
2022-01-01    0.034804
2022-01-02    0.273803
2022-01-03    0.512801
2022-01-04    0.751800
2022-01-05    0.390805
2022-01-06    0.029811
2022-01-07   -0.331184
2022-01-08   -0.083640
2022-01-09    0.163904
2022-01-10    0.411448
2022-01-11    0.205256
2022-01-12   -0.000935
2022-01-13   -0.207127
2022-01-14   -0.146177
2022-01-15   -0.085227
2022-01-16   -0.024277
2022-01-17   -0.040172
2022-01-18   -0.056068
2022-01-19   -0.071963
2022-01-20   -0.042371
2022-01-21   -0.012778
2022-01-22    0.016815
2022-01-23   -0.188192
2022-01-24   -0.393199
2022-01-25   -0.598206
2022-01-26   -0.395365
2022-01-27   -0.192524
2022-01-28    0.010317
2022-01-29   -0.106771
2022-01-30   -0.223859
                ...   
2022-02-28   -0.070721
2022-03-01   -0.177429
2022-03-02   -0.284138
2022-03-03   -0.050244
2022-03-04    0.183649
2022-03-05    0.417543
2022-03-06    0.219496
2022-03-07    0.021449
2022-03-08   -0.176598
2022-03-09    0.116728
2022-03-10    0.410054
2022-03-11    0.703380
2022-03-12    0.649371
2022-03-13    0.595363
2022-03-14    0.541354
2022-03-15    0.334058
2022-03-16    0.126762
2022-03-17   -0.080535
2022-03-18   -0.294281
2022-03-19   -0.508028
2022-03-20   -0.721775
2022-03-21   -0.150518
2022-03-22    0.420740
2022-03-23    0.991998
2022-03-24    0.815711
2022-03-25    0.639424
2022-03-26    0.463137
2022-03-27    0.405905
2022-03-28    0.348673
2022-03-29    0.291441
Freq: D, Length: 88, dtype: float64
######################################
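The three fill strategies are easier to compare on a tiny series. The sketch below uses two known values three days apart (values chosen only for illustration) and applies each strategy with the same limit of 1 used above:

```python
import pandas as pd

sparse = pd.Series(
    [0.0, 3.0],
    index=pd.to_datetime(["2022-01-01", "2022-01-04"]),
)

# Forward fill copies 0.0 onto Jan 2 only; Jan 3 stays NaN because limit=1.
forward = sparse.resample("D").ffill(limit=1)

# Backward fill copies 3.0 onto Jan 3 only; Jan 2 stays NaN because limit=1.
backward = sparse.resample("D").bfill(limit=1)

# Linear interpolation fills both gaps with evenly spaced values.
linear = sparse.resample("D").interpolate("linear")

print(forward)
print(backward)
print(linear)
```

Dropping the limit argument would fill every gap in the forward and backward cases.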

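6. Moving window functions

Code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Build a daily series of 600 random values
df = pd.Series(np.random.randn(600), index = pd.date_range('7/1/2022', freq = 'D', periods = 600))

# Create a rolling window over 10 observations
r = df.rolling(window = 10)
# Mean of the latest 10 values at each position (the first 9 results are NaN)
print(r.mean())


# Plot the raw series (red) and its 10-day rolling mean (blue)
plt.figure(figsize=(15, 5))

df.plot(style='r')
df.rolling(window=10).mean().plot(style='b')

plt.show()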

Output:
[Figure: the raw series in red with its 10-day rolling mean in blue; image not included]
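A rolling window of size n yields NaN until n observations are available. The following sketch, on five hand-picked values, shows that default behaviour next to the min_periods parameter, which relaxes it:

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2022-07-01", periods=5, freq="D"),
)

# Default: a result appears only once the window holds 3 values.
strict = s.rolling(window=3).mean()

# min_periods=1: average whatever is available, even a single value.
lenient = s.rolling(window=3, min_periods=1).mean()

print(strict)
print(lenient)
```

Which variant is appropriate depends on whether partial windows at the start of the series are meaningful for the analysis.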

Reference:

  1. https://study.163.com/course/introduction.htm?courseId=1003590004#/courseDetail?tab=1