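Time-Series Feature Extraction
How do you extract the key turning points from a time series? If that is what you are looking for, here is one approach.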
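High points: every point greater than both its left and right neighbors.
Low points: every point less than both its left and right neighbors.
From the high points, extract the peaks: the high points greater than both neighboring high points.
From the low points, extract the troughs: the low points less than both neighboring low points.
Merge the peaks and the troughs, then drop entries with duplicate dates.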
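The neighbor comparison in the steps above can be sketched on a small made-up series. The helper local_extrema is hypothetical and the values are invented for illustration only; it relies on pandas shift to line each point up against its left and right neighbors.

```python
import pandas as pd


def local_extrema(s: pd.Series, kind: str = "high") -> pd.Series:
    """Keep points strictly above (kind="high") or strictly below
    (kind="low") both immediate neighbors. Hypothetical helper."""
    prev, nxt = s.shift(1), s.shift(-1)
    if kind == "high":
        mask = (s > prev) & (s > nxt)
    else:
        mask = (s < prev) & (s < nxt)
    # The first and last points compare against NaN, so they never qualify.
    return s[mask]


# Invented values, chosen only to show the two passes.
s = pd.Series([1, 3, 2, 5, 4, 6, 1, 4, 2, 3, 1], dtype=float)

highs = local_extrema(s, "high")       # pass 1: all local high points
peaks = local_extrema(highs, "high")   # pass 2: highs among the highs

lows = local_extrema(s, "low")         # pass 1: all local low points
troughs = local_extrema(lows, "low")   # pass 2: lows among the lows
```

The second pass compares each high (or low) only against the neighboring highs (or lows), so it discards small wiggles and keeps the larger swings. Because the comparisons are strict, flat runs of equal values are not picked up.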
Sample data
------
trade_date close high low
2021-08-11 39.03 40.12 38.44
2021-08-12 40.29 40.73 38.35
2021-08-13 40.07 40.99 39.50
2021-08-16 39.24 40.46 38.56
2021-08-17 37.59 39.58 37.30
2021-08-18 36.55 38.20 36.20
2021-08-19 37.97 38.36 36.56
2021-08-20 38.21 38.88 37.37
2021-08-23 37.96 39.37 37.19
2021-08-24 37.57 38.30 37.50
2021-08-25 37.10 37.88 36.66
2021-08-26 35.10 37.27 34.79
2021-08-27 35.15 35.60 34.50
2021-08-30 34.87 35.69 34.20
2021-08-31 35.85 36.10 34.35
2021-09-01 34.40 36.00 33.84
2021-09-02 35.01 35.45 33.90
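Note that the data above is incomplete and covers only a short period; it just shows the layout. Set the date column as the index before using it.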
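One possible way to get such a DataFrame is sketched here, assuming the sample is available as whitespace-separated text. The variable name sample and the use of io.StringIO are illustrative choices, not part of the original; with a real file you would pass its path to read_csv instead.

```python
import io

import pandas as pd

# The sample rows from above, pasted as whitespace-separated text.
sample = """trade_date close high low
2021-08-11 39.03 40.12 38.44
2021-08-12 40.29 40.73 38.35
2021-08-13 40.07 40.99 39.50
2021-08-16 39.24 40.46 38.56
2021-08-17 37.59 39.58 37.30
2021-08-18 36.55 38.20 36.20
2021-08-19 37.97 38.36 36.56
2021-08-20 38.21 38.88 37.37
2021-08-23 37.96 39.37 37.19
2021-08-24 37.57 38.30 37.50
2021-08-25 37.10 37.88 36.66
2021-08-26 35.10 37.27 34.79
2021-08-27 35.15 35.60 34.50
2021-08-30 34.87 35.69 34.20
2021-08-31 35.85 36.10 34.35
2021-09-01 34.40 36.00 33.84
2021-09-02 35.01 35.45 33.90
"""

# Parse trade_date as dates and use it as the index.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s+",
    parse_dates=["trade_date"],
    index_col="trade_date",
)
```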
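import pandas as pd
# df is assumed to be indexed by trade_date, with close, high and low columns
df_high = df.high.loc[(df.high > df.high.shift(-1)) & (df.high > df.high.shift(1))]  # extract high points
df_low = df.low.loc[(df.low < df.low.shift(-1)) & (df.low < df.low.shift(1))]  # extract low points
df_h = df_high[(df_high > df_high.shift(-1)) & (df_high > df_high.shift(1))]  # extract peaks
df_l = df_low[(df_low < df_low.shift(-1)) & (df_low < df_low.shift(1))]  # extract troughs
terminal = pd.concat([df_h, df_l])  # merge peaks and troughs
terminal = terminal[~terminal.index.duplicated()]  # drop duplicate dates, as described in the steps
terminal[df.index.min()] = df.loc[df.index.min()].min()  # start point: smallest value in the first row
terminal[df.index.max()] = df.loc[df.index.max()].min()  # end point: smallest value in the last row
terminal = terminal.to_frame('endpoint').sort_index()  # to a DataFrame, in date order so the line plots correctly
terminal.plot()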
Going one step further, wrap it in a function so it can be saved and reused
import pandas as pd
def endpoint(df: pd.DataFrame) -> pd.DataFrame:
    """
    High points: every point greater than both its left and right neighbors.
    Low points: every point less than both its left and right neighbors.
    From the high points, extract the peaks: high points greater than both neighboring high points.
    From the low points, extract the troughs: low points less than both neighboring low points.
    Sample data
    ------
    trade_date close high low
    2021-08-11 39.03 40.12 38.44
    2021-08-12 40.29 40.73 38.35
    2021-08-13 40.07 40.99 39.50
    2021-08-16 39.24 40.46 38.56
    2021-08-17 37.59 39.58 37.30
    2021-08-18 36.55 38.20 36.20
    2021-08-19 37.97 38.36 36.56
    2021-08-20 38.21 38.88 37.37
    2021-08-23 37.96 39.37 37.19
    2021-08-24 37.57 38.30 37.50
    2021-08-25 37.10 37.88 36.66
    2021-08-26 35.10 37.27 34.79
    2021-08-27 35.15 35.60 34.50
    2021-08-30 34.87 35.69 34.20
    2021-08-31 35.85 36.10 34.35
    2021-09-01 34.40 36.00 33.84
    2021-09-02 35.01 35.45 33.90
    2021-09-03 35.72 36.41 34.05
    2021-09-06 36.50 37.36 35.72
    2021-09-07 38.90 39.35 36.12
    2021-09-08 37.18 39.28 37.01
    2021-09-09 38.42 38.74 37.21
    2021-09-10 37.70 38.80 37.20
    2021-09-13 36.97 37.87 36.34
    2021-11-29 66.58 67.01 61.21
    2021-11-30 67.05 68.06 65.67
    2021-12-01 66.25 67.13 64.50
    """
    # Extract high points
    df_high = df.high.loc[(df.high > df.high.shift(-1)) & (df.high > df.high.shift(1))]
    # Extract low points
    df_low = df.low.loc[(df.low < df.low.shift(-1)) & (df.low < df.low.shift(1))]
    # Extract peaks
    df_h = df_high[(df_high > df_high.shift(-1)) & (df_high > df_high.shift(1))]
    # Extract troughs
    df_l = df_low[(df_low < df_low.shift(-1)) & (df_low < df_low.shift(1))]
    # Merge peaks and troughs
    terminal = pd.concat([df_h, df_l])
    # Drop duplicate dates (a date can be both a peak and a trough)
    terminal = terminal[~terminal.index.duplicated()]
    # Add the start point: smallest value in the first row
    terminal[df.index.min()] = df.loc[df.index.min()].min()
    # Add the end point: smallest value in the last row
    terminal[df.index.max()] = df.loc[df.index.max()].min()
    return terminal.to_frame('endpoint').sort_index()
Questions, discussion, and suggestions are welcome. Bookmark this for later, and remember to leave a like.