day05-裁剪-压缩-累乘-协方差-相关矩阵-符号数组-矢量化-数据平滑
2、概念二:
通过上述线性方程Y = kX + b,有:
kX1 + b = Y1'-Y1
kX2 + b = Y2'-Y2
...
kXn + b = Yn'-Yn
求得离差:
△1 = Y1' = (Y1-(kX1+b))
△2 = Y2' = (Y2-(kX2+b))
... ...
△n = Yn' = (Yn-(kXn+b))
得离差方之和,该和越小越好,设为loss:
loss = △1^2+△2^2+...+△n^2
由此推导出:
loss = f(k,b)
仍然适用:
np.linalg.lstsq(a,b)[0]
示例:
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md
def dmy2ymd(dmy):
dmy = str(dmy,encoding='utf-8')
date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
ymd = date.strftime('%Y-%m-%d')
return ymd
dates,opening_prices,hightest_prices,lowest_prices,\
closing_prices = np.loadtxt(
'../day04/aapl.csv',delimiter=',',
usecols=(1,3,4,5,6),unpack=True,
dtype='M8[D],f8,f8,f8,f8',
converters={1:dmy2ymd})
#用每天的最低价,最高价,收盘价获取每天的平均价
trend_points = (hightest_prices + lowest_prices + closing_prices) / 3
spreads = hightest_prices - lowest_prices #计算每天的波动幅度
resistance_points = trend_points + spreads #获得上方压力封顶线
support_points = trend_points - spreads #获得下方压力支撑线
days = dates.astype(int) #将日期转换为天数
#np.ones_like(days) #表示创建全是1 的数组,数量同days一样多的
a = np.column_stack((days,np.ones_like(days))) #column_stack一维数组做列组合
b = trend_points
x1 = np.linalg.lstsq(a,b)[0]
trend_line = days * x1[0] + x1[1] #x[0]是斜率 x[1]是截距 trend_line就是△
x2 = np.linalg.lstsq(a,resistance_points)[0]
resistance_line = days * x2[0] +x2[1]
x3 = np.linalg.lstsq(a,support_points)[0]
support_line = days * x3[0] +x3[1]
mp.figure('Trend',facecolor='lightgray')
mp.title('Trend',fontsize=20)
mp.xlabel('Date',fontsize=14)
mp.ylabel('Price',fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(md.WeekdayLocator(
byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(md.DateFormatter(
'%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
rise = closing_prices - opening_prices >=0.01
fall = opening_prices - closing_prices >=0.01
fc = np.zeros(dates.size,dtype='3f4')
ec = np.zeros(dates.size,dtype='3f4')
fc[rise],fc[fall] = (0.85,0,0),(0,0.85,0)
ec[rise],ec[fall] = (0.85,0,0),(0,0.85,0)
mp.bar(dates,hightest_prices - lowest_prices,0,
lowest_prices,color=fc,edgecolor = ec)
mp.bar(dates,closing_prices-opening_prices,0.8,
opening_prices,color=fc,edgecolor=ec)
mp.scatter(dates,trend_points,c='dodgerblue',
alpha=0.9,s=20,zorder=2)
mp.plot(dates,trend_line,c='dodgerblue',
linewidth=3,label='Trend')
mp.scatter(dates,resistance_points,c='orangered',
alpha=0.9,s=20,zorder=2)
mp.plot(dates,resistance_line,c='orangered',
linewidth=3,label='Resistance')
mp.scatter(dates,support_points,c='limegreen',
alpha=0.9,s=20,zorder=2)
mp.plot(dates,support_line,c='limegreen',
linewidth=3,label='Support')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
13、裁剪、压缩和累乘
1、ndarray.clip(min=下限,max=上限):返回一个被修剪过的数组,
原数组中所有比max大的元素都被设定为max,
所有比min小的元素都被设定为min。
2、ndarray.compress(条件):
压缩,返回一个根据给定条件筛选后的数组。
3、ndarray.prod():累乘,所有元素的乘积
4、ndarray.cumprod():每次累乘的结果放入一个数组
示例:
import numpy as np
a = np.array([10,20,5,46,30,43])
b = a.clip(min = 15,max=45)
print(a , b) #输出:[10 20 5 46 30 43] [15 20 15 45 30 43]
c = a.compress((15 <= a) & (a <=45))
print(c) #输出:[20 30 43]
d = a.prod()
print(d) #输出:59340000
e = a.cumprod()
print(e)#输出:[ 10 200 1000 46000 1380000 59340000]
阶乘示例:
print(np.arange(1,11).prod()) #1至10阶乘,3628800
14、协方差、相关矩阵和相关系数
1.概念
两组来自不同数据源随机样本:
a: [a1 a2 ... an]
b: [b1 b2 ... bn]
分别计算平均值:
ave(a) = (a1+a2+...+an)/n
ave(b) = (b1+b2+...+bn)/n
分别计算离差:
dev(a) = [a1 a2 ... an]-ave(a)
dev(b) = [b1 b2 ... bn]-ave(b)
分别计算方差:
var(a) = ave(dev(a) * dev(a))=cov(a,a)
var(b) = ave(dev(b) * dev(b))=cov(b,b)
分别计算标准差:
std(a) = sqrt(var(a))
std(b) = sqrt(var(b))
2、计算协方差:
cov(a,b) = ave(dev(a) * dev(b))
cov(b,a) = ave(dev(b) * dev(a))
3、协方差(相关)矩阵:
cov(a,a) cov(a,b)
-------------- --------------
std(a)std(a) std(a)std(b)
cov(b,a) cov(b,b)
-------------- ---------------
std(b)std(a) std(b)std(b)
主对角线为1,辅对角线为相关系数。
4、相关系数:
coco(a,b) = cov(a,b)/(std(a)std(b)), [-1, 1]
coco(b,a) = cov(b,a)/(std(b)std(a)), [-1, 1]
相关系数的正负号表示同向相关还是反向相关,
其绝对值的大小反映相关程度的强弱。
numpy.cov (a, b) -> 协方差矩阵的分子部分
cov(a,a) cov(a,b)
cov(b,a) cov(b,b)
numpy.corrcoef(a,b)->协方差矩阵,
从其辅对角线上获得相关系数
代码:corr.py
示例: 示例:
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md
def dmy2ymd(dmy):
dmy = str(dmy,encoding='utf-8')
date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
ymd = date.strftime('%Y-%m-%d')
return ymd
dates,bhp_closing_prices = np.loadtxt(
'./BHP.csv',delimiter=',',usecols=(1,6),unpack=True,
dtype='M8[D],f8',converters={1:dmy2ymd})
vale_closing_prices = np.loadtxt(
'./VALE.csv',delimiter=',',usecols=(6),unpack=True)
#计算收益率:(今天收盘价 - 昨天收盘价)/今天收盘价
bhp_returns = np.diff(bhp_closing_prices)/bhp_closing_prices[:-1]
vale_returns = np.diff(vale_closing_prices)/vale_closing_prices[:-1]
#covs = np.cov(bhp_returns,vale_returns)
corr = np.corrcoef(bhp_returns,vale_returns) #协方差矩阵
print(corr)
mp.figure('Corre of Returns',facecolor='lightgray')
mp.title('Corre of Returns',fontsize=20)
mp.xlabel('Date',fontsize=14)
mp.ylabel('Returns',fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(md.WeekdayLocator(
byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
mp.plot(dates[:-1],bhp_returns,c='orangered',label='BHP')
mp.plot(dates[:-1],vale_returns,c='dodgerblue',label='VALE')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
15.多项式拟合
1、概念
任何可微的函数都可以用一个N次多项式来拟合,
而比N次幂更高阶的部分作为无穷小量而被忽略不计。
F(x)=P0x^n + P1x^(n-1) + P2x^(n-2) +... + Pn = Y
注:Pnx^(n-n) = Pnx^0 = Pn
Y1' = F(x1) -->Y1
Y2' = F(x2) -->Y2
...
Yn' = F(xn) -->Yn
(Y1-Y1')^2 + (Y2-Y2')^2 + ... + (Yn-Yn')^2 = loss(P0...Pn)
抛出问题:
P0...Pn =? 时,loss为最小 ,即为多元函数,寻找最小值
2、多项式训练(拟合):
P = numpy.ployfit(X样本,Y样本,最高次幂n)
即:X样本 = [x1,x2,...,xn]
Y样本 = [Y1,Y2,...,Yn]
P = [P0,P1,...,Pn]
3、拟合(预测)函数曲线:
Y' = numpy.polyval(P,X)
Y' = [Y1',Y2',...,Yn']
4、导函数的系数:
Q = np.polyder(P)
例:
Y = 4x^3 + 3x^2 + 2x + 1
则:P = [4,3,2,1]
dy/dx = 12x^2 + 6x +2
所以:Q = [12,6,2] 即为Y的导函数的系数
v
poly_y
5、求解方程根
Y = 0 时,方程在x轴上的交点
即:P0x^n + P1x^(n-1) + P2x^(n-2) +... + Pn = 0 时,求解x
如:4x^3 + 3x^2 + 2x + 1 = 0 ,求x
np.roots(P),可以有虚根
6、两个多项式函数的差函数的系数
如:Y1 = 4x^3 + 3x^2 + 2x + 1 , P1 = [4,3,2,1]
Y2 = 5x^4 + x , P2 = [5,0,0,1,0]
Y1,Y2的差函数为:
Y1-Y2 = -5x^4+4x^3+3x^2+x+1
则差函数的系数为:[-5,4,3,1,1]
即:np.polysub(P1,P2) = [-5,4,3,1,1]
代码:poly.py
示例:
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md
def dmy2ymd(dmy):
dmy = str(dmy,encoding='utf-8')
date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
ymd = date.strftime('%Y-%m-%d')
return ymd
dates,bhp_closing_prices = np.loadtxt(
'./BHP.csv',delimiter=',',usecols=(1,6),unpack=True,
dtype='M8[D],f8',converters={1:dmy2ymd})
vale_closing_prices = np.loadtxt(
'./VALE.csv',delimiter=',',usecols=(6),unpack=True)
diff_closing_prices = bhp_closing_prices - vale_closing_prices #两支股票收盘价差价
days = dates.astype(int)
P = np.polyfit(days,diff_closing_prices,4) #多项式训练(拟合)
dY = np.polyval(P,days) #拟合(预测)函数值 应该同 diff_closing_prices接近
Q = np.polyder(P) #求导函数的系数
roots_x = np.roots(Q) #求解方程根 得天数x坐标值
roots_y = np.polyval(P,roots_x) #求得y坐标值
#print(roots)
mp.figure('多项式拟合',facecolor='lightgray')
mp.title('Polynomial Fitting',fontsize=20)
mp.xlabel('Date',fontsize=14)
mp.ylabel('Differencs Price',fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(md.WeekdayLocator(
byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
mp.plot(dates,dY,c='dodgerblue',label='Polynomial Fitting')
mp.scatter(dates,diff_closing_prices,c='orangered',label='Differencs Price',
alpha=0.5,s=60)
#将天数转为日期
roots_x = roots_x.astype(int).astype('M8[D]').\
astype(md.datetime.datetime)
#maker,点形状 ,^上三角形,v下三角形,<左三角形,>三角形
mp.scatter(roots_x,roots_y,marker='^',s = 80 ,
c = 'orangered',label='Peek',zorder=4 )
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
16.符号数组
1、numpy.sign()
函数返回参数数组中每个元素的符号,
分别用+1/-1/0表示。
a: [12 -21 18 0 32 -27 -66]
b = numpy.sign(a)
b: [1 -1 1 0 1 -1 -1]
代码:obv.py
示例:
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md
def dmy2ymd(dmy):
dmy = str(dmy,encoding='utf-8')
date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
ymd = date.strftime('%Y-%m-%d')
return ymd
dates,closing_prices,volumes = np.loadtxt(
'./BHP.csv',delimiter=',',usecols=(1,6,7),unpack=True,
dtype='M8[D],f8,f8',converters={1:dmy2ymd})
diff_closing_prices = np.diff(closing_prices) #收益有正负
sign_closing_prices = np.sign(diff_closing_prices) #取出收益正负的符号
obvs = volumes[1:] * sign_closing_prices #通过正负符号判断当天成绩量收益
mp.figure('金额成交量',facecolor='lightgray')
mp.title('On_Balance Volume',fontsize=20)
mp.xlabel('Date',fontsize=14)
mp.ylabel('OBV',fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(md.WeekdayLocator(
byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(axis= 'y' ,linestyle=':')
dates = dates.astype(md.datetime.datetime)
mp.bar(dates[1:],obvs,1.0,color='dodgerblue',
edgecolor='white',label='OBV')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
2、np.piecewise(源数组,条件序列,取值序列)
针对源数组中的每个元素,检测其是否符合
条件序列中的每个条件,符合,就在取值序列取对应值,
放在目标数组中返回
a: [70,80,60,59,55,62,45]
条件序列b:[a<60,a==60,a>60]
取值序列s:[-1,0,1]
d = np.piecewise(a,b,s)
d = [ 1 1 0 -1 -1 1 -1]
17.矢量化
将一个针对单个数值的处理函数变成针对数组的处理函数
矢量函数 = numpy.vectorize(标量函数)(矢量)
def fun (a, b, c):
return a + b + c
A = np.arange(1, 101)
B = np.arange(101,201)
C = np.arange(201,301)
D = np.zeros(A.size)
for i, (a, b, c) in enumerate (zip (A, B, C)):
D[i] = fun (a, b, c)
D = numpy.vectorize(fun)(A, B, C)
标量函数变成矢量函数。
代码:vec.py
示例:
import numpy as np
def foo(x,y):
return x+y , x-y,x*y
x , y = 1,4
print(foo(x,y))
#如果x,y是数组呢
X,Y = np.array([1,2,3]),np.array([4,5,6])
bar = np.vectorize(foo)
print(bar(X,Y)) # 等价于:np.vectorize(foo)(X,Y)
代码:sim.py
示例:
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md
def dmy2ymd(dmy):
dmy = str(dmy,encoding='utf-8')
date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
ymd = date.strftime('%Y-%m-%d')
return ymd
dates,opening_price,highest_prices,\
lowest_prices,closing_prices = np.loadtxt(
'./BHP.csv',delimiter=',',usecols=(1,3,4,5,6),unpack=True,
dtype='M8[D],f8,f8,f8,f8',converters={1:dmy2ymd})
#计算一天的收益
def profit(opening_price,highest_prices,\
lowest_prices,closing_prices):
buying_price = opening_price * 0.99
if lowest_prices <= buying_price <= highest_prices:
return (closing_prices - buying_price) * 100 / buying_price
return np.nan #无效值
profits = np.vectorize(profit)(opening_price,highest_prices,\
lowest_prices,closing_prices)
nan = np.isnan(profits)
dates,profits = dates[~nan],profits[~nan] #~ np中取反
#盈利的交易日
gain_dates,gain_profits = dates[profits > 0],profits[profits>0]
#亏损的交易日
loss_dates,loss_profits = dates[profits < 0],profits[profits<0]
mp.figure('交易模拟',facecolor='lightgray')
mp.title('Trading',fontsize=20)
mp.xlabel('Date',fontsize=14)
mp.ylabel('Profit',fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(md.WeekdayLocator(
byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
if dates.size > 0:
dates = dates.astype(md.datetime.datetime)
mp.plot(dates,profits,c='gray',label='Profit')
mp.axhline(y=profits.mean(),linestyle='--',color='gray')
if gain_dates.size > 0:
gain_dates = gain_dates.astype(md.datetime.datetime)
mp.plot(gain_dates,gain_profits,'o', #'o'表示只要点,不连续
c='orangered',label='Gain Profit')
mp.axhline(y=gain_profits.mean(),linestyle='--',color='orangered')
if loss_dates.size > 0:
loss_dates = loss_dates.astype(md.datetime.datetime)
mp.plot(loss_dates,loss_profits,'o', #'o'表示只要点,不连续
c='dodgerblue',label='Loss Profit')
mp.axhline(y=loss_profits.mean(),linestyle='--',color='dodgerblue')
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()
18.数据平滑
降噪 + 拟合 + 特征识别
降噪的主要方式:卷积 np.convolve(序列,卷积核,卷积类型)
拟合的主要方式:多项式np.polyval(P,X)
特征识别的主要方式:数学方法
例:求两个函数F(x) G(x)的交汇点
因为两个函数在交汇点的坐标相同,即有:
F(x1) = G(x1) ==> F(x1) - G(x1) = 0
即是解方程:F(x) - G(x) = 0
即:获得差函数:P = np.polysub(F(x),G(x))
求解方程根即可np.roots(P)
numpy.hanning()返回一个余弦函数序列,
可被作为卷积核数组,提高移动平均线的质量。
代码:smr.py
示例:
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md
def dmy2ymd(dmy):
dmy = str(dmy,encoding='utf-8')
date = dt.datetime.strptime(dmy, '%d-%m-%Y').date()
ymd = date.strftime('%Y-%m-%d')
return ymd
dates,bhp_closing_prices = np.loadtxt(
'./BHP.csv',delimiter=',',usecols=(1,6),unpack=True,
dtype='M8[D],f8',converters={1:dmy2ymd})
vale_closing_prices = np.loadtxt(
'./VALE.csv',delimiter=',',usecols=(6),unpack=True)
#计算收益率:(今天收盘价 - 昨天收盘价)/今天收盘价
bhp_returns = np.diff(bhp_closing_prices)/bhp_closing_prices[:-1]
vale_returns = np.diff(vale_closing_prices)/vale_closing_prices[:-1]
#去除噪声:卷积,合适的时间宽度,去噪效果就比较好
N = 8 #设置时间宽度为8天
weights = np.hanning(N) #卷积核权重, 这是其中一个窗函数
#print(weights) #打印数据类似余弦曲线
weights /= weights.sum()
bhp_smooth_returns = np.convolve(bhp_returns,weights,'valid')
vale_smooth_returns = np.convolve(vale_returns,weights,'valid')
#数据拟合:寻找平滑曲线的方程式
days = dates[N-1:-1].astype(int) #将日期转换为天数
degree = 3 #设定为3次方多项式
bhp_p = np.polyfit(days,bhp_smooth_returns,degree) #拟合
bhp_fitted_returns = np.polyval(bhp_p,days) #获得拟合函数曲线
vale_p = np.polyfit(days,vale_smooth_returns,degree) #拟合
vale_fitted_returns = np.polyval(vale_p,days) #获得拟合函数曲线
#特征识别:取得曲线的交汇点
sub_p = np.polysub(bhp_p,vale_p)#获得差函数
roots = np.roots(sub_p) #解差函数根
roots_x =roots.compress(
(days[0]<=roots)& (roots<=days[-1])) #过滤不在时间范围的根
roots_y = np.polyval(bhp_p,roots_x)
mp.figure('平滑收益率',facecolor='lightgray')
mp.title('smr',fontsize=20)
mp.xlabel('Date',fontsize=14)
mp.ylabel('Returns',fontsize=14)
ax = mp.gca()
ax.xaxis.set_major_locator(md.WeekdayLocator(
byweekday=md.MO))
ax.xaxis.set_minor_locator(md.DayLocator())
ax.xaxis.set_major_formatter(md.DateFormatter('%d %b %Y'))
mp.tick_params(labelsize=10)
mp.grid(linestyle=':')
dates = dates.astype(md.datetime.datetime)
#收益率曲线
mp.plot(dates[:-1],bhp_returns,c='orangered',
label='BHP',alpha = 0.2 )
mp.plot(dates[:-1],vale_returns,c='dodgerblue',
label='VALE',alpha = 0.2)
#平滑曲线
mp.plot(dates[N-1:-1],bhp_smooth_returns,c='orangered',
label='smooth_BHP',alpha = 0.8 )
mp.plot(dates[N-1:-1],vale_smooth_returns,c='dodgerblue',
label='smooth_VALE',alpha = 0.8)
#拟合函数曲线
mp.plot(dates[N-1:-1],bhp_fitted_returns,c='orangered',
label='fitted_BHP',alpha = 0.8 ,linewidth=3)
mp.plot(dates[N-1:-1],vale_fitted_returns,c='dodgerblue',
label='fitted_VALE',alpha = 0.8,linewidth=3)
#绘制交汇点
roots_x = roots_x.astype(int).astype('M8[D]').astype(
md.datetime.datetime)
mp.scatter(roots_x,roots_y,s=50,marker='X',
c='firebrick',lw=2,zorder=3)
mp.legend()
mp.gcf().autofmt_xdate()
mp.show()