Python Time Series Classification: 6. Time Series Data Analysis in Python

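Dates and times in Python

The datetime, time, and calendar modules

datetime stores dates and times with microsecond resolution

datetime.timedelta represents the difference between two datetime objects

Data types in the datetime module

In summary, the datetime module provides: date (year, month, day), time (hour, minute, second), datetime (year, month, day, hour, minute, second), and timedelta (the difference between two datetime values)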

from datetime import datetime

if __name__ == '__main__':
    now = datetime.now()
    print(now)
    print(type(now))
    # Individual fields of the current timestamp
    print(now.year, now.month, now.date(), now.day, now.hour, now.second)
    print("---------------------------------------------------------")
    # Subtracting two datetime objects yields a timedelta
    datetime_ = datetime(2017, 3, 4, 17) - datetime(2017, 2, 18, 15)
    print(datetime_)
    print(type(datetime_))
    print(datetime_.days, datetime_.seconds)

Output:

C:\Anaconda2\python.exe F:/python01/lect006/datetime_001.py

2018-01-06 00:15:35.027000

2018 1 2018-01-06 6 0 35

---------------------------------------------------------

14 days, 2:00:00

14 7200

Process finished with exit code 0
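The `seconds` attribute of a `timedelta` holds only the part of the difference that is left over below one day, which is why the run above prints `14 7200` for a gap of 14 days and 2 hours. The following is a minimal sketch of how the pieces relate, using the same two dates; the variable names are illustrative.

```python
from datetime import datetime, timedelta

delta = datetime(2017, 3, 4, 17) - datetime(2017, 2, 18, 15)

# days and seconds are stored separately; seconds is always less than one day
print(delta.days, delta.seconds)

# total_seconds() folds the days in as well
total = delta.total_seconds()
print(total)

# A timedelta can also be added back onto a datetime
later = datetime(2017, 2, 18, 15) + timedelta(days=14, hours=2)
print(later)
```

When the full length of an interval is needed, `total_seconds()` is usually the safer choice than `seconds`.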

Converting between strings and datetime

from datetime import datetime

import pandas as pd
from dateutil.parser import parse

if __name__ == '__main__':
    print("------------------datetime to string----------------------------------")
    d = datetime(2018, 1, 5)
    print(d)
    print(type(d))
    str_d = str(d)
    print(str_d)
    print(type(str_d))
    # strftime formats a datetime with an explicit pattern
    formatted = d.strftime("%d-%m-%Y")
    print(formatted)
    print(type(formatted))

    print("---------------string to datetime----------------------------------")
    # Named date_str rather than str so the built-in str type is not shadowed
    date_str = "2018-01-05"
    parsed = datetime.strptime(date_str, "%Y-%m-%d")
    print(parsed)
    print(type(parsed))

    print("------from dateutil.parser import parse: lenient about date formats------------")
    time_parse = parse(date_str)
    print(time_parse)
    print(type(time_parse))

    print('------------------------------------------------------------------------')
    series = pd.Series(['2018/1/1', '2018/1/2', '2018/1/3', '2018/1/4', '2018/1/5'], name="course_time")
    print(series)
    to_datetime = pd.to_datetime(series)
    print(to_datetime)

    print("------------NaT: not a time----------------------")
    # A missing value (None) is converted to NaT
    series2 = pd.Series(['2018/1/1', '2018/1/2', '2018/1/3', '2018/1/4', '2018/1/5'] + [None], name="course_time")
    print(series2)
    pd_to_datetime = pd.to_datetime(series2)
    print(pd_to_datetime)

Output:

C:\Anaconda2\python.exe F:/python01/lect006/datetime_to_str.py

------------------datetime to string----------------------------------

2018-01-05 00:00:00

2018-01-05 00:00:00

05-01-2018

---------------string to datetime----------------------------------

2018-01-05 00:00:00

------from dateutil.parser import parse: lenient about date formats------------

2018-01-05 00:00:00

------------------------------------------------------------------------

0 2018/1/1

1 2018/1/2

2 2018/1/3

3 2018/1/4

4 2018/1/5

Name: course_time, dtype: object

0 2018-01-01

1 2018-01-02

2 2018-01-03

3 2018-01-04

4 2018-01-05

Name: course_time, dtype: datetime64[ns]

0 2018/1/1

1 2018/1/2

2 2018/1/3

3 2018/1/4

4 2018/1/5

5 None

Name: course_time, dtype: object

0 2018-01-01

1 2018-01-02

2 2018-01-03

3 2018-01-04

4 2018-01-05

5 NaT

Name: course_time, dtype: datetime64[ns]

Process finished with exit code 0
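`pd.to_datetime` turned the `None` entry above into `NaT`. A string that cannot be parsed at all raises an error by default; passing `errors="coerce"` converts it to `NaT` instead. Here is a short sketch, assuming a made-up series with one invalid entry (the data is not from the article).

```python
import pandas as pd

raw = pd.Series(["2018/1/1", "2018/1/2", "not a date"], name="course_time")

# errors="coerce" replaces unparseable entries with NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y/%m/%d", errors="coerce")
print(parsed)

# isna() flags the entries that failed to parse
print(parsed.isna().tolist())
```

Giving an explicit `format` also removes any ambiguity between day-first and month-first strings.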

Time series handling in pandas

from datetime import datetime

import numpy as np
import pandas as pd

if __name__ == '__main__':
    date_list = [datetime(2018, 1, 1), datetime(2018, 1, 2), datetime(2018, 1, 3),
                 datetime(2018, 1, 4), datetime(2018, 1, 5), datetime(2018, 1, 6)]
    # A list of datetime objects used as the index becomes a DatetimeIndex
    series = pd.Series(np.random.randn(6), index=date_list)
    print(series)
    print(type(series.index))

    print("--------select by position------------------------")
    # iloc makes the positional lookup explicit
    print(series.iloc[0])
    print("--------select by index value------------------------")
    print(series[datetime(2018, 1, 1)])
    print("--------select by a parseable date string------------------------")
    print(series["2018-01-01"])
    print("--------select by year or month --------------------------------------------------------")
    print(series["2018-01"])
    print("--------select by slice --------------------------------------------------------")
    print(series["2018-01-01":"2018-01-03"])
    print()
    print("----------------------------------")

    dates = pd.date_range("2018-01-01", periods=5, freq='W-SAT')
    dates2 = pd.date_range("2018-01-01",  # start date
                           periods=5,     # number of periods
                           freq='W-MON')  # frequency: W = weekly; MON = Monday, SAT = Saturday
    print(dates)
    print(dates2)
    print(type(dates))
    print(pd.Series(np.random.randn(5), index=dates))

Output:

C:\Anaconda2\python.exe F:/python01/lect006/pandas_datetimeindex.py

2018-01-01 1.470973

2018-01-02 1.182988

2018-01-03 0.110619

2018-01-04 -0.485221

2018-01-05 0.240115

2018-01-06 -0.779746

dtype: float64

--------select by position------------------------

1.4709733498

--------select by index value------------------------

1.4709733498

--------select by a parseable date string------------------------

1.4709733498

--------select by year or month --------------------------------------------------------

2018-01-01 1.470973

2018-01-02 1.182988

2018-01-03 0.110619

2018-01-04 -0.485221

2018-01-05 0.240115

2018-01-06 -0.779746

dtype: float64

--------select by slice --------------------------------------------------------

2018-01-01 1.470973

2018-01-02 1.182988

2018-01-03 0.110619

dtype: float64

----------------------------------

DatetimeIndex(['2018-01-06', '2018-01-13', '2018-01-20', '2018-01-27',

'2018-02-03'],

dtype='datetime64[ns]', freq='W-SAT')

DatetimeIndex(['2018-01-01', '2018-01-08', '2018-01-15', '2018-01-22',

'2018-01-29'],

dtype='datetime64[ns]', freq='W-MON')

2018-01-06 0.981539

2018-01-13 -0.573164

2018-01-20 0.095981

2018-01-27 0.955236

2018-02-03 0.714157

Freq: W-SAT, dtype: float64

Process finished with exit code 0
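The string-based lookups above rely on the index being a `DatetimeIndex`. Below is a small sketch of the same idea with explicit `.loc` selection, on a made-up series that spans two months so that the month filter actually narrows the data.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2018-01-29", periods=6, freq="D")  # 29 Jan .. 3 Feb
values = pd.Series(np.arange(6), index=idx)

# A year-month string selects every row in that month
january = values.loc["2018-01"]
print(january)

# String slices include both endpoints
window = values.loc["2018-01-31":"2018-02-02"]
print(window)
```

Unlike positional slices, label slices on dates keep the end point, which matches the three rows returned by the slice in the run above.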

Filtering a time series:

from datetime import datetime

import numpy as np
import pandas as pd

if __name__ == '__main__':
    # Filtering: truncate() discards the rows that fall outside the given dates
    date_list = [datetime(2018, 1, 1), datetime(2018, 1, 2), datetime(2018, 1, 3),
                 datetime(2018, 1, 4), datetime(2018, 1, 5), datetime(2018, 1, 6)]
    series = pd.Series(np.random.randn(6), index=date_list)

    # Drop everything before 2018-01-03
    truncate = series.truncate(before="2018-1-3")
    print(truncate)
    # Drop everything after 2018-01-02
    series_truncate = series.truncate(after="2018-01-02")
    print(series_truncate)

    print("---------start and end dates given: daily frequency by default---------------------------------")
    date_range = pd.date_range("2018-01-01", "2018-01-20")
    print(date_range)

    print("---------only a start or an end date given: the number of periods is also required-----------------------------------------------")
    print(pd.date_range("2018-1-1", periods=10))
    print("end date only")
    print(pd.date_range(end="2018-1-1", periods=10))

    print("normalized timestamps")
    print(pd.date_range(start='2018/01/01 12:13:14', periods=10))
    # normalize=True resets the time of day to midnight
    print(pd.date_range(start='2018/01/01 12:13:14', periods=10, normalize=True))

Output:

C:\Anaconda2\python.exe F:/python01/lect006/pandas_truncate.py

2018-01-03 -2.455503

2018-01-04 0.157108

2018-01-05 -1.542617

2018-01-06 0.648572

dtype: float64

2018-01-01 0.235691

2018-01-02 1.572366

dtype: float64

---------start and end dates given: daily frequency by default---------------------------------

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',

'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',

'2018-01-09', '2018-01-10', '2018-01-11', '2018-01-12',

'2018-01-13', '2018-01-14', '2018-01-15', '2018-01-16',

'2018-01-17', '2018-01-18', '2018-01-19', '2018-01-20'],

dtype='datetime64[ns]', freq='D')

---------only a start or an end date given: the number of periods is also required-----------------------------------------------

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',

'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',

'2018-01-09', '2018-01-10'],

dtype='datetime64[ns]', freq='D')

end date only

DatetimeIndex(['2017-12-23', '2017-12-24', '2017-12-25', '2017-12-26',

'2017-12-27', '2017-12-28', '2017-12-29', '2017-12-30',

'2017-12-31', '2018-01-01'],

dtype='datetime64[ns]', freq='D')

normalized timestamps

DatetimeIndex(['2018-01-01 12:13:14', '2018-01-02 12:13:14',

'2018-01-03 12:13:14', '2018-01-04 12:13:14',

'2018-01-05 12:13:14', '2018-01-06 12:13:14',

'2018-01-07 12:13:14', '2018-01-08 12:13:14',

'2018-01-09 12:13:14', '2018-01-10 12:13:14'],

dtype='datetime64[ns]', freq='D')

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',

'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',

'2018-01-09', '2018-01-10'],

dtype='datetime64[ns]', freq='D')

Process finished with exit code 0
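`truncate` keeps the boundary dates themselves, so it behaves like an inclusive label slice. Here is a minimal sketch on a made-up six-day series showing that the two spellings select the same rows.

```python
import numpy as np
import pandas as pd

series = pd.Series(np.arange(6), index=pd.date_range("2018-01-01", periods=6))

# truncate drops rows strictly outside the range; the boundary dates stay
kept = series.truncate(before="2018-01-03", after="2018-01-05")

# The same rows through an inclusive label slice
sliced = series.loc["2018-01-03":"2018-01-05"]

print(kept)
print(kept.equals(sliced))
```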

Shifting data

import numpy as np
import pandas as pd

if __name__ == '__main__':
    print(pd.date_range("2018-01-01", "2018-01-10", freq="D"))
    print(pd.date_range("2018-01-01", "2018-01-10", freq="2D"))

    print("---------offsets combined by addition---------------------------")
    offset_ = pd.tseries.offsets.Week(2) + pd.tseries.offsets.Hour(12)
    print(offset_)
    print(pd.date_range("2018-01-01", "2018-03-01", freq=offset_))

    print("---------shifting data--------------------------")
    ts = pd.Series(np.random.randn(5), index=pd.date_range('20180101', periods=5, freq="W-SAT"))
    print(ts)
    # shift(1) moves the values one period later; the first slot becomes NaN
    print(ts.shift(1))
    # shift(-1) moves the values one period earlier; the last slot becomes NaN
    print(ts.shift(-1))

Output:

C:\Anaconda2\python.exe F:/python01/lect006/pandas_freq.py

DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',

'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',

'2018-01-09', '2018-01-10'],

dtype='datetime64[ns]', freq='D')

DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05', '2018-01-07',

'2018-01-09'],

dtype='datetime64[ns]', freq='2D')

---------offsets combined by addition---------------------------

14 days 12:00:00

DatetimeIndex(['2018-01-01 00:00:00', '2018-01-15 12:00:00',

'2018-01-30 00:00:00', '2018-02-13 12:00:00',

'2018-02-28 00:00:00'],

dtype='datetime64[ns]', freq='348H')

---------shifting data--------------------------

2018-01-06 0.612710

2018-01-13 0.897702

2018-01-20 1.262353

2018-01-27 -2.317208

2018-02-03 -1.161990

Freq: W-SAT, dtype: float64

2018-01-06 NaN

2018-01-13 0.612710

2018-01-20 0.897702

2018-01-27 1.262353

2018-02-03 -2.317208

Freq: W-SAT, dtype: float64

2018-01-06 0.897702

2018-01-13 1.262353

2018-01-20 -2.317208

2018-01-27 -1.161990

2018-02-03 NaN

Freq: W-SAT, dtype: float64

Process finished with exit code 0
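A common use of `shift` is comparing each observation with the one before it. The sketch below, on made-up weekly prices, computes the period-over-period change by hand and compares it with the built-in `pct_change`.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 99.0],
                   index=pd.date_range("2018-01-06", periods=4, freq="W-SAT"))

# shift(1) lines each value up with its predecessor; the first row has none
change = prices / prices.shift(1) - 1
print(change)

# pct_change() expresses the same relative change
print(prices.pct_change())
```

Only the values move in `shift`; the index stays fixed, which is why the first row becomes `NaN`.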

Resampling:

import numpy as np
import pandas as pd

if __name__ == '__main__':
    # Resampling
    date_range = pd.date_range("2018-01-01", periods=100, freq="D")
    s_obj = pd.Series(np.random.randint(1, 10, 100), index=date_range)

    # Downsample the daily data to month-end frequency
    resample_obj = s_obj.resample("M").sum()
    print(resample_obj)
    print(s_obj.resample("M").mean())

    print("--------ohlc: open, high, low, close-----------------")
    print(s_obj.resample("M").ohlc())

    print('--------upsampling--------------------------')
    frame = pd.DataFrame(np.random.randn(5, 3),
                         index=pd.date_range('2018-1-1', periods=5, freq="W-MON"),
                         columns=['a', 'b', 'c'])
    print(frame)

    print("upsampling----daily")
    print(frame.resample("D").asfreq())

    print("-----filling gaps-----earlier values fill the rows after them----")
    # Forward fill, at most 2 rows per gap
    print(frame.resample("D").ffill(limit=2))

    print("-------------later values fill the rows before them-----------------------------")
    # Backward fill, at most 2 rows per gap
    print(frame.resample("D").bfill(limit=2))

    print("------------------------------------------------------")
    # Forward fill with no limit; same result as fillna("ffill")
    print(frame.resample("D").ffill())

    print("---------------linear interpolation-------------------------------")
    print(frame.resample("D").interpolate("linear"))

Output:

C:\Anaconda2\python.exe F:/python01/lect006/pandas_chongcaiyang.py

2018-01-31 160

2018-02-28 143

2018-03-31 148

2018-04-30 47

Freq: M, dtype: int32

2018-01-31 5.161290

2018-02-28 5.107143

2018-03-31 4.774194

2018-04-30 4.700000

Freq: M, dtype: float64

--------ohlc: open, high, low, close-----------------

open high low close

2018-01-31 1 9 1 7

2018-02-28 2 9 1 8

2018-03-31 4 9 1 4

2018-04-30 3 9 1 1

--------upsampling--------------------------

a b c

2018-01-01 -1.787660 -1.081827 0.264846

2018-01-08 -1.053651 0.910496 -0.950530

2018-01-15 -0.081673 -0.424570 0.281261

2018-01-22 -0.385825 -0.364235 1.497702

2018-01-29 0.188421 0.770612 0.029760

upsampling----daily

a b c

2018-01-01 -1.787660 -1.081827 0.264846

2018-01-02 NaN NaN NaN

2018-01-03 NaN NaN NaN

2018-01-04 NaN NaN NaN

2018-01-05 NaN NaN NaN

2018-01-06 NaN NaN NaN

2018-01-07 NaN NaN NaN

2018-01-08 -1.053651 0.910496 -0.950530

2018-01-09 NaN NaN NaN

2018-01-10 NaN NaN NaN

2018-01-11 NaN NaN NaN

2018-01-12 NaN NaN NaN

2018-01-13 NaN NaN NaN

2018-01-14 NaN NaN NaN

2018-01-15 -0.081673 -0.424570 0.281261

2018-01-16 NaN NaN NaN

2018-01-17 NaN NaN NaN

2018-01-18 NaN NaN NaN

2018-01-19 NaN NaN NaN

2018-01-20 NaN NaN NaN

2018-01-21 NaN NaN NaN

2018-01-22 -0.385825 -0.364235 1.497702

2018-01-23 NaN NaN NaN

2018-01-24 NaN NaN NaN

2018-01-25 NaN NaN NaN

2018-01-26 NaN NaN NaN

2018-01-27 NaN NaN NaN

2018-01-28 NaN NaN NaN

2018-01-29 0.188421 0.770612 0.029760

-----filling gaps-----earlier values fill the rows after them----

a b c

2018-01-01 -1.787660 -1.081827 0.264846

2018-01-02 -1.787660 -1.081827 0.264846

2018-01-03 -1.787660 -1.081827 0.264846

2018-01-04 NaN NaN NaN

2018-01-05 NaN NaN NaN

2018-01-06 NaN NaN NaN

2018-01-07 NaN NaN NaN

2018-01-08 -1.053651 0.910496 -0.950530

2018-01-09 -1.053651 0.910496 -0.950530

2018-01-10 -1.053651 0.910496 -0.950530

2018-01-11 NaN NaN NaN

2018-01-12 NaN NaN NaN

2018-01-13 NaN NaN NaN

2018-01-14 NaN NaN NaN

2018-01-15 -0.081673 -0.424570 0.281261

2018-01-16 -0.081673 -0.424570 0.281261

2018-01-17 -0.081673 -0.424570 0.281261

2018-01-18 NaN NaN NaN

2018-01-19 NaN NaN NaN

2018-01-20 NaN NaN NaN

2018-01-21 NaN NaN NaN

2018-01-22 -0.385825 -0.364235 1.497702

2018-01-23 -0.385825 -0.364235 1.497702

2018-01-24 -0.385825 -0.364235 1.497702

2018-01-25 NaN NaN NaN

2018-01-26 NaN NaN NaN

2018-01-27 NaN NaN NaN

2018-01-28 NaN NaN NaN

2018-01-29 0.188421 0.770612 0.029760

-------------later values fill the rows before them-----------------------------

a b c

2018-01-01 -1.787660 -1.081827 0.264846

2018-01-02 NaN NaN NaN

2018-01-03 NaN NaN NaN

2018-01-04 NaN NaN NaN

2018-01-05 NaN NaN NaN

2018-01-06 -1.053651 0.910496 -0.950530

2018-01-07 -1.053651 0.910496 -0.950530

2018-01-08 -1.053651 0.910496 -0.950530

2018-01-09 NaN NaN NaN

2018-01-10 NaN NaN NaN

2018-01-11 NaN NaN NaN

2018-01-12 NaN NaN NaN

2018-01-13 -0.081673 -0.424570 0.281261

2018-01-14 -0.081673 -0.424570 0.281261

2018-01-15 -0.081673 -0.424570 0.281261

2018-01-16 NaN NaN NaN

2018-01-17 NaN NaN NaN

2018-01-18 NaN NaN NaN

2018-01-19 NaN NaN NaN

2018-01-20 -0.385825 -0.364235 1.497702

2018-01-21 -0.385825 -0.364235 1.497702

2018-01-22 -0.385825 -0.364235 1.497702

2018-01-23 NaN NaN NaN

2018-01-24 NaN NaN NaN

2018-01-25 NaN NaN NaN

2018-01-26 NaN NaN NaN

2018-01-27 0.188421 0.770612 0.029760

2018-01-28 0.188421 0.770612 0.029760

2018-01-29 0.188421 0.770612 0.029760

------------------------------------------------------

a b c

2018-01-01 -1.787660 -1.081827 0.264846

2018-01-02 -1.787660 -1.081827 0.264846

2018-01-03 -1.787660 -1.081827 0.264846

2018-01-04 -1.787660 -1.081827 0.264846

2018-01-05 -1.787660 -1.081827 0.264846

2018-01-06 -1.787660 -1.081827 0.264846

2018-01-07 -1.787660 -1.081827 0.264846

2018-01-08 -1.053651 0.910496 -0.950530

2018-01-09 -1.053651 0.910496 -0.950530

2018-01-10 -1.053651 0.910496 -0.950530

2018-01-11 -1.053651 0.910496 -0.950530

2018-01-12 -1.053651 0.910496 -0.950530

2018-01-13 -1.053651 0.910496 -0.950530

2018-01-14 -1.053651 0.910496 -0.950530

2018-01-15 -0.081673 -0.424570 0.281261

2018-01-16 -0.081673 -0.424570 0.281261

2018-01-17 -0.081673 -0.424570 0.281261

2018-01-18 -0.081673 -0.424570 0.281261

2018-01-19 -0.081673 -0.424570 0.281261

2018-01-20 -0.081673 -0.424570 0.281261

2018-01-21 -0.081673 -0.424570 0.281261

2018-01-22 -0.385825 -0.364235 1.497702

2018-01-23 -0.385825 -0.364235 1.497702

2018-01-24 -0.385825 -0.364235 1.497702

2018-01-25 -0.385825 -0.364235 1.497702

2018-01-26 -0.385825 -0.364235 1.497702

2018-01-27 -0.385825 -0.364235 1.497702

2018-01-28 -0.385825 -0.364235 1.497702

2018-01-29 0.188421 0.770612 0.029760

a b c

2018-01-01 -1.787660 -1.081827 0.264846

2018-01-02 -1.682801 -0.797210 0.091221

2018-01-03 -1.577943 -0.512592 -0.082404

2018-01-04 -1.473085 -0.227975 -0.256029

2018-01-05 -1.368226 0.056643 -0.429654

2018-01-06 -1.263368 0.341261 -0.603280

2018-01-07 -1.158509 0.625878 -0.776905

2018-01-08 -1.053651 0.910496 -0.950530

2018-01-09 -0.914797 0.719772 -0.774560

2018-01-10 -0.775943 0.529049 -0.598590

2018-01-11 -0.637089 0.338325 -0.422620

2018-01-12 -0.498235 0.147601 -0.246650

2018-01-13 -0.359381 -0.043123 -0.070680

2018-01-14 -0.220527 -0.233846 0.105291

2018-01-15 -0.081673 -0.424570 0.281261

2018-01-16 -0.125123 -0.415951 0.455038

2018-01-17 -0.168573 -0.407332 0.628815

2018-01-18 -0.212024 -0.398712 0.802593

2018-01-19 -0.255474 -0.390093 0.976370

2018-01-20 -0.298924 -0.381474 1.150147

2018-01-21 -0.342375 -0.372855 1.323925

2018-01-22 -0.385825 -0.364235 1.497702

2018-01-23 -0.303790 -0.202114 1.287996

2018-01-24 -0.221755 -0.039993 1.078290

2018-01-25 -0.139719 0.122128 0.868584

2018-01-26 -0.057684 0.284249 0.658878

2018-01-27 0.024351 0.446370 0.449172

2018-01-28 0.106386 0.608491 0.239466

2018-01-29 0.188421 0.770612 0.029760

Process finished with exit code 0
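To see what the aggregation and the fill methods do without random numbers getting in the way, here is a small sketch on made-up, deterministic data. Downsampling groups daily values into weekly bins; upsampling creates empty daily slots that `ffill` then fills.

```python
import numpy as np
import pandas as pd

# 14 daily values, 1..14, starting on Monday 2018-01-01
daily = pd.Series(np.arange(1, 15),
                  index=pd.date_range("2018-01-01", periods=14, freq="D"))

# Downsample: one sum per week, with weeks ending on Sunday
weekly = daily.resample("W-SUN").sum()
print(weekly)

# Upsample two weekly points to daily and fill at most 2 rows forward
sparse = pd.Series([10.0, 20.0],
                   index=pd.date_range("2018-01-01", periods=2, freq="W-MON"))
filled = sparse.resample("D").ffill(limit=2)
print(filled)
```

The `limit` argument caps how far a known value is carried, which reproduces the pattern of three filled rows followed by `NaN` seen in the run above.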

Window functions:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

if __name__ == '__main__':
    series_obj = pd.Series(np.random.randn(1000), index=pd.date_range("20170101", periods=1000))
    print(series_obj.head())

    # A rolling window covering 5 consecutive observations
    rolling = series_obj.rolling(window=5)
    print(rolling)

    print("-------the value on day 5 is the mean of the first five days----------------------")
    print(rolling.mean())
    print('--------------------------------------')

    plt.figure(figsize=(15, 5))
    series_obj.plot(style="r--")
    # Alternatives: a trailing 10-day mean, or a centered 10-day mean
    # series_obj.rolling(window=10).mean().plot(style="b")
    # series_obj.rolling(window=10, center=True).mean().plot(style="b")
    series_obj.rolling(window=10, center=True).sum().plot(style="b")
    plt.show()

Output:

C:\Anaconda2\python.exe F:/python01/lect006/pandas_windows.py

2017-01-01 1.155897

2017-01-02 -0.061706

2017-01-03 -1.171228

2017-01-04 0.248460

2017-01-05 0.786955

Freq: D, dtype: float64

Rolling [window=5,center=False,axis=0]

-------the value on day 5 is the mean of the first five days----------------------

2017-01-01 NaN

2017-01-02 NaN

2017-01-03 NaN

2017-01-04 NaN

2017-01-05 0.191675

2017-01-06 0.153913

2017-01-07 0.169883

2017-01-08 0.685552

2017-01-09 0.470192

2017-01-10 0.502044

2017-01-11 0.283706

2017-01-12 0.092507

2017-01-13 -0.155475

2017-01-14 -0.149128

2017-01-15 -0.488179

2017-01-16 -0.363505

2017-01-17 -0.091595

2017-01-18 -0.112211

2017-01-19 0.316891

2017-01-20 0.348507

2017-01-21 0.425806

2017-01-22 0.193486

2017-01-23 -0.282033

2017-01-24 -0.337791

2017-01-25 -0.136945

2017-01-26 -0.257740

2017-01-27 -0.219738

2017-01-28 0.069635

2017-01-29 0.032645

2017-01-30 0.188748

...

2019-08-29 -0.320065

2019-08-30 -0.365553

2019-08-31 -0.586240

2019-09-01 -0.789963

2019-09-02 -0.479596

2019-09-03 -0.249201

2019-09-04 -0.333432

2019-09-05 0.171652

2019-09-06 0.221092

2019-09-07 0.030030

2019-09-08 0.000564

2019-09-09 0.389039

2019-09-10 0.336969

2019-09-11 0.815062

2019-09-12 1.185745

2019-09-13 1.132346

2019-09-14 0.677973

2019-09-15 0.557701

2019-09-16 0.059425

2019-09-17 -0.020097

2019-09-18 0.354538

2019-09-19 0.595465

2019-09-20 0.488031

2019-09-21 0.309707

2019-09-22 0.124832

2019-09-23 -0.104373

2019-09-24 -0.338483

2019-09-25 -0.310262

2019-09-26 0.079869

2019-09-27 0.034300

Freq: D, Length: 1000, dtype: float64

--------------------------------------

Process finished with exit code 0
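The leading `NaN` values in the output above appear because a 5-day window is incomplete for the first four rows. Below is a minimal sketch on a made-up series that checks one rolling mean by hand and shows `min_periods` as a way to allow partial windows.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1.0, 11.0), index=pd.date_range("2017-01-01", periods=10))

rolled = s.rolling(window=5).mean()
# The fifth value is the plain mean of the first five observations
manual = s.iloc[:5].mean()
print(rolled.iloc[4], manual)

# min_periods=1 lets incomplete windows produce a value instead of NaN
partial = s.rolling(window=5, min_periods=1).mean()
print(partial.head())
```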

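Time series model: ARIMA

ARIMA is a statistical model. HMM (hidden Markov model), by contrast, is a machine learning algorithm that rose to prominence through speech recognition.

AR (autoregressive) model

* An autoregressive model describes the relationship between the current value and past values of the series.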
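A minimal sketch of this idea: simulate a first-order autoregressive series, AR(1), in which each value is a fixed fraction of the previous value plus noise, and then recover that fraction by least squares. The coefficient 0.7, the seed, and the series length are arbitrary choices for illustration, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = 0.7   # assumed AR(1) coefficient, chosen for illustration
n = 5000

# x[t] = phi * x[t-1] + noise[t]
x = np.zeros(n)
noise = rng.standard_normal(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

# Least-squares estimate: regress the current value on the previous one
prev, curr = x[:-1], x[1:]
phi_hat = np.dot(prev, curr) / np.dot(prev, prev)
print(round(phi_hat, 3))
```

With a few thousand points the estimate should land close to the coefficient used in the simulation.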

ARIMA requires stationary data; a non-stationary series must first be transformed into a stationary one.

Making the data stationary → differencing
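A short sketch of differencing, assuming a made-up series with a linear trend: the raw values keep growing, so their mean is not constant, while the first difference is flat. `Series.diff()` computes the difference between each value and the one before it.

```python
import numpy as np
import pandas as pd

# A trending (non-stationary) series: 3, 5, 7, ... on consecutive days
trend = pd.Series(3.0 + 2.0 * np.arange(8),
                  index=pd.date_range("2018-01-01", periods=8))

# First-order difference: x[t] - x[t-1]; the first row has no predecessor
diffed = trend.diff().dropna()
print(diffed)

# diff() gives the same result as subtracting the shifted series
same = trend.diff().equals(trend - trend.shift(1))
print(same)
```

If one round of differencing is not enough, the same operation can be applied again; the number of rounds is the `d` in ARIMA's `(p, d, q)` order.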

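Statsmodels is a Python library for statistical modeling and econometrics. It provides a range of statistical models, including linear regression and time series analysis. Within time series analysis, ARIMA is one of the most commonly used models.

ARIMA stands for autoregressive integrated moving average. It is a linear model used to describe the autocorrelation structure and the randomness of time series data. It has three parts: AR (autoregression), I (differencing), and MA (moving average). AR predicts the current value from the previous few values, MA models the current value from the previous few forecast errors, and differencing transforms the series so that it becomes stationary.

Time series analysis with the ARIMA model in Statsmodels can be broken into the following steps:

1. Import the libraries

```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
```

2. Read the data

```python
data = pd.read_csv("data.csv", index_col=0, parse_dates=True)
```

3. Plot the time series

```python
plt.plot(data)
plt.show()
```

4. Determine the model order

ACF and PACF plots help choose the order of the ARIMA model. The ACF plot shows the autocorrelation between the series and its lagged versions. The PACF plot shows the partial autocorrelation, that is, the correlation with each lag once the effect of the shorter lags has been removed. Together they suggest values for p and q, while d is the number of differences needed to make the series stationary.

```python
fig, ax = plt.subplots(2, 1)
sm.graphics.tsa.plot_acf(data, lags=30, ax=ax[0])
sm.graphics.tsa.plot_pacf(data, lags=30, ax=ax[1])
plt.show()
```

5. Fit the model

With the order chosen, fit the time series with ARIMA().

```python
p, d, q = 1, 1, 1  # placeholder orders; replace with the values chosen in step 4
model = sm.tsa.ARIMA(data, order=(p, d, q))
results = model.fit()
```

6. Diagnose the model

Use plot_diagnostics() to check whether the residuals are consistent with the white noise assumption.

```python
results.plot_diagnostics(figsize=(15, 12))
plt.show()
```

7. Forecast

Use forecast() to predict future values.

```python
forecast = results.forecast(steps=10)
```

These are the steps for ARIMA time series analysis with the Statsmodels package in Python.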
