Extracting Features from a Time Series

miaoshan000

Last modified 2022-03-18 18:23:26

Tags: python, data analysis

First published 2022-03-08 20:20:21

Article link: https://blog.csdn.net/miaoshan000/article/details/123362134

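This article describes a way to extract key turning points, both highs and lows, from a time series. It builds a set of local highs and a set of local lows, filters each set again to keep only the major highs and major lows, then merges the two results and removes duplicate dates. The Python code shows the procedure step by step and then wraps it in a reusable function. The approach suits financial market data and other time series where trend changes need to be identified.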

Time-series feature extraction

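How do you extract the key turning points from a time series? If that is what you are looking for, here is one approach.
Local highs: the set of points greater than both their left and right neighbours.
Local lows: the set of points less than both their left and right neighbours.
Major highs: filter the local highs again, keeping those greater than both neighbouring local highs.
Major lows: filter the local lows again, keeping those less than both neighbouring local lows.
Merge the major highs and major lows, then remove entries with duplicate dates.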
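
The two-pass filter in the steps above can be tried on a toy series first. This is only a minimal sketch: the helper names `local_maxima` and `local_minima` and the numbers in `s` are made up for illustration and are not part of the article's code.

```python
import pandas as pd


def local_maxima(s: pd.Series) -> pd.Series:
    """Keep the points that are strictly greater than both neighbours."""
    return s[(s > s.shift(1)) & (s > s.shift(-1))]


def local_minima(s: pd.Series) -> pd.Series:
    """Keep the points that are strictly less than both neighbours."""
    return s[(s < s.shift(1)) & (s < s.shift(-1))]


# Made-up values, used only to illustrate the idea
s = pd.Series([5, 7, 4, 8, 2, 9, 1, 6, 3, 7, 5], dtype=float)

highs = local_maxima(s)            # first pass: local highs
major_highs = local_maxima(highs)  # second pass: highs among the highs
lows = local_minima(s)             # first pass: local lows
major_lows = local_minima(lows)    # second pass: lows among the lows

print(major_highs)  # one point: index 5, value 9.0
print(major_lows)   # one point: index 6, value 1.0
```

The first pass keeps five highs (7, 8, 9, 6, 7) and the second pass keeps only 9. The first and last points of a series are never selected, because a comparison against the `NaN` produced by `shift` is always `False`.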

  Sample data
    ------
    trade_date	close	high	low
    2021-08-11	39.03	40.12	38.44
    2021-08-12	40.29	40.73	38.35
    2021-08-13	40.07	40.99	39.50
    2021-08-16	39.24	40.46	38.56
    2021-08-17	37.59	39.58	37.30
    2021-08-18	36.55	38.20	36.20
    2021-08-19	37.97	38.36	36.56
    2021-08-20	38.21	38.88	37.37
    2021-08-23	37.96	39.37	37.19
    2021-08-24	37.57	38.30	37.50
    2021-08-25	37.10	37.88	36.66
    2021-08-26	35.10	37.27	34.79
    2021-08-27	35.15	35.60	34.50
    2021-08-30	34.87	35.69	34.20
    2021-08-31	35.85	36.10	34.35
    2021-09-01	34.40	36.00	33.84
    2021-09-02	35.01	35.45	33.90

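A note on the data: the sample above is incomplete and covers only a short period, so treat it as an illustration of the layout. Use the date column as the index.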
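
One way to get the sample into that shape is sketched below. It assumes the table is available as text; the variable name `raw` and the use of `io.StringIO` are illustrative choices, and real data would more likely come from a file or a data API. Only the first five rows of the sample are used.

```python
import io

import pandas as pd

# First five rows of the sample table, whitespace-separated
raw = """trade_date close high low
2021-08-11 39.03 40.12 38.44
2021-08-12 40.29 40.73 38.35
2021-08-13 40.07 40.99 39.50
2021-08-16 39.24 40.46 38.56
2021-08-17 37.59 39.58 37.30
"""

# Parse the date column and use it as the index
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    parse_dates=["trade_date"],
    index_col="trade_date",
)

print(df.index.min(), df.index.max())
```

With a `DatetimeIndex` in place, `df.index.min()` and `df.index.max()` give the first and last trading dates, which the code uses as the start and end points.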

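import pandas as pd

# df is assumed to be a DataFrame indexed by trade_date with close, high and low columns

# Local highs: high is greater than the high on both neighbouring dates
df_high = df.high.loc[(df.high > df.high.shift(-1)) & (df.high > df.high.shift(1))]

# Local lows: low is less than the low on both neighbouring dates
df_low = df.low.loc[(df.low < df.low.shift(-1)) & (df.low < df.low.shift(1))]

# Major highs: local highs greater than both neighbouring local highs
df_h = df_high[(df_high > df_high.shift(-1)) & (df_high > df_high.shift(1))]

# Major lows: local lows less than both neighbouring local lows
df_l = df_low[(df_low < df_low.shift(-1)) & (df_low < df_low.shift(1))]

# Merge the major highs and major lows
terminal = pd.concat([df_h, df_l])

# Drop repeated dates (one date can be both a major high and a major low)
terminal = terminal[~terminal.index.duplicated()]

# Starting point: the smallest value in the first row
terminal[df.index.min()] = df.loc[df.index.min()].min()

# End point: the smallest value in the last row
terminal[df.index.max()] = df.loc[df.index.max()].min()

# Sort by date so the line is drawn in time order, then convert to a DataFrame
terminal = terminal.sort_index().to_frame('endpoint')

terminal.plot()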

Wrapping the steps in a function makes them easy to save and reuse.

import pandas as pd 


def endpoint(df:pd.DataFrame):
    """
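    Local highs: points greater than both neighbouring points.
    Local lows: points less than both neighbouring points.
    Major highs: filter the local highs again, keeping those greater than both neighbouring local highs.
    Major lows: filter the local lows again, keeping those less than both neighbouring local lows.

    Sample data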
    ------
    trade_date	close	high	low
    2021-08-11	39.03	40.12	38.44
    2021-08-12	40.29	40.73	38.35
    2021-08-13	40.07	40.99	39.50
    2021-08-16	39.24	40.46	38.56
    2021-08-17	37.59	39.58	37.30
    2021-08-18	36.55	38.20	36.20
    2021-08-19	37.97	38.36	36.56
    2021-08-20	38.21	38.88	37.37
    2021-08-23	37.96	39.37	37.19
    2021-08-24	37.57	38.30	37.50
    2021-08-25	37.10	37.88	36.66
    2021-08-26	35.10	37.27	34.79
    2021-08-27	35.15	35.60	34.50
    2021-08-30	34.87	35.69	34.20
    2021-08-31	35.85	36.10	34.35
    2021-09-01	34.40	36.00	33.84
    2021-09-02	35.01	35.45	33.90
    2021-09-03	35.72	36.41	34.05
    2021-09-06	36.50	37.36	35.72
    2021-09-07	38.90	39.35	36.12
    2021-09-08	37.18	39.28	37.01
    2021-09-09	38.42	38.74	37.21
    2021-09-10	37.70	38.80	37.20
    2021-09-13	36.97	37.87	36.34
    2021-11-29	66.58	67.01	61.21
    2021-11-30	67.05	68.06	65.67
    2021-12-01	66.25	67.13	64.50
  
    """


    # Local highs
    df_high = df.high.loc[(df.high > df.high.shift(-1)) & (df.high > df.high.shift(1))]
    # Local lows
    df_low = df.low.loc[(df.low < df.low.shift(-1)) & (df.low < df.low.shift(1))]
    # Major highs
    df_h = df_high[(df_high > df_high.shift(-1)) & (df_high > df_high.shift(1))]
    # Major lows
    df_l = df_low[(df_low < df_low.shift(-1)) & (df_low < df_low.shift(1))]
    # Merge the major highs and major lows
    terminal = pd.concat([df_h, df_l])
    # Drop repeated dates (one date can be both a major high and a major low)
    terminal = terminal[~terminal.index.duplicated()]
    # Add the starting point: the smallest value in the first row
    terminal[df.index.min()] = df.loc[df.index.min()].min()
    # Add the end point: the smallest value in the last row
    terminal[df.index.max()] = df.loc[df.index.max()].min()

    return terminal.to_frame('endpoint').sort_index()
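
As a usage sketch, the snippet below runs the function on the first 17 rows of the sample data. The function is restated in condensed form, with the duplicate-date step from the list at the top made explicit, so that the snippet runs on its own. The helper `_extrema` and the variable `raw` are illustrative names, not part of the original code.

```python
import io

import pandas as pd

raw = """trade_date close high low
2021-08-11 39.03 40.12 38.44
2021-08-12 40.29 40.73 38.35
2021-08-13 40.07 40.99 39.50
2021-08-16 39.24 40.46 38.56
2021-08-17 37.59 39.58 37.30
2021-08-18 36.55 38.20 36.20
2021-08-19 37.97 38.36 36.56
2021-08-20 38.21 38.88 37.37
2021-08-23 37.96 39.37 37.19
2021-08-24 37.57 38.30 37.50
2021-08-25 37.10 37.88 36.66
2021-08-26 35.10 37.27 34.79
2021-08-27 35.15 35.60 34.50
2021-08-30 34.87 35.69 34.20
2021-08-31 35.85 36.10 34.35
2021-09-01 34.40 36.00 33.84
2021-09-02 35.01 35.45 33.90
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+",
                 parse_dates=["trade_date"], index_col="trade_date")


def _extrema(s: pd.Series, highs: bool) -> pd.Series:
    """Hypothetical helper: strict local maxima (highs=True) or minima."""
    if highs:
        return s[(s > s.shift(-1)) & (s > s.shift(1))]
    return s[(s < s.shift(-1)) & (s < s.shift(1))]


def endpoint(df: pd.DataFrame) -> pd.DataFrame:
    df_h = _extrema(_extrema(df.high, True), True)    # major highs
    df_l = _extrema(_extrema(df.low, False), False)   # major lows
    terminal = pd.concat([df_h, df_l])
    terminal = terminal[~terminal.index.duplicated()]  # drop repeated dates
    terminal[df.index.min()] = df.loc[df.index.min()].min()  # start point
    terminal[df.index.max()] = df.loc[df.index.max()].min()  # end point
    return terminal.to_frame("endpoint").sort_index()


result = endpoint(df)
print(result)
```

On these 17 rows the result has three rows: the start point (2021-08-11, 38.44), one major low (2021-08-18, 36.20) and the end point (2021-09-02, 33.90). The three local highs in this window fall steadily (40.99, 39.37, 36.10), so none of them passes the second filter. A sample this short produces few turning points, and a longer history should give a more useful outline.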

Feedback and corrections are welcome. Bookmark this for later use, and remember to leave a like.