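This post did not pass review the first time. When I reran the script a few days ago, I found that wy Finance (the 163 finance site) had already shut down the download link (lol). I am reposting it anyway to pad the count.
My original plan was to scrape historical trading data with urlread and regexp, then consolidate the data into an xls file. Then I found that wy Finance provides a link for downloading historical data (in .csv format); wy really is a site "with attitude". So I tried saving the files one by one with websave, which cuts a lot of work.
Copy the download link: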
http://quotes.money.163.com/service/chddata.html?code=0600300&start=20000630&end=20221101&fields=TCLOSE;HIGH;LOW;TOPEN;LCLOSE;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP
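In this link, code=0600300 combines a one-digit market prefix (0 for Shanghai, 1 for Shenzhen) with the six-digit stock code 600300. start is the start date and end is the end date, both in YYYYMMDD format, and fields lists the data columns to return.
From this it is easy to write the statement that downloads a single stock.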
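The post does not show that statement, so here is a minimal sketch of what it might look like, assuming the URL layout described above. The variable names (`code`, `mkt`, `fields`, `file`), the 10-second timeout, and saving into `tempdir` are illustrative choices, not the author's. Since the endpoint is reported above as shut down, the download is wrapped in try-catch and is expected to fail today.

```matlab
% Sketch only: build the download URL for one stock and try to save the csv.
code   = '600300';     % six-digit stock code
mkt    = 0;            % market prefix: 0 = Shanghai, 1 = Shenzhen
startd = '20000630';   % start date, YYYYMMDD
endd   = '20221101';   % end date, YYYYMMDD
fields = 'TCLOSE;HIGH;LOW;TOPEN;LCLOSE;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP';

url = sprintf(['http://quotes.money.163.com/service/chddata.html', ...
               '?code=%d%s&start=%s&end=%s&fields=%s'], ...
               mkt, code, startd, endd, fields);
file = sprintf('%s_%s_%s.csv', code, startd, endd);

try
    websave(fullfile(tempdir, file), url, weboptions('Timeout', 10));
    disp(['saved ', file])
catch err
    % The link may no longer exist, so a failure here is not surprising.
    disp(['download failed: ', err.message])
end
```

Keeping the dates as char avoids any num2str conversion when the URL is assembled.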
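To download the historical data of every A-share stock, the first idea is brute-force enumeration: split the Shanghai codes into 601xxx, 602xxx, 603xxx, ... and download them one by one. But that also requests codes that do not exist or have been delisted, so I decided to build a pool of A-share codes instead.
Find a site that lists all A-share codes, read it with urlread, and use a regular expression to find every group of six digits. Delete the last match, which is not a stock. Every stock code appears twice in the page, so the matches also need to be deduplicated.
Fetch the stock codes. The result can be saved to a .mat file and loaded directly when needed.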
%update stock data
disp('downloading stock data')
urlstock=['http://www.cgedt.com/stockcode/yilanbiao.asp'];
[sp,sta]=urlread(urlstock);
if sta==0
error('can not connect to http://www.cgedt.com/stockcode/yilanbiao.asp')
end
stocka=regexp(sp,'\d{6}','match');
stocka=stocka(1:end-1);%drop the last match, which is not a stock
allstock=stocka(1:2:end)';%every code appears twice, keep one copy of each
sizes=numel(allstock);%number of stocks in the pool
disp('download completed')
Now the stock code in the url can be taken directly from the code pool; the script passes it through num2str when reading it.
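As a small illustration of the code pool, the sketch below runs the same regular expression on a made-up HTML fragment (the real page layout may differ), deduplicates with `unique(...,'stable')` instead of index arithmetic, and round-trips the pool through a .mat file. The sample string and the file name `allstock_demo.mat` are hypothetical.

```matlab
% Hypothetical page fragment: each code appears twice, plus one trailing
% six-digit number that is not a stock code.
html = ['<td><a href="/stock/600000.html">600000</a></td>', ...
        '<td><a href="/stock/600004.html">600004</a></td>', ...
        '<p>ICP 123456</p>'];

codes = regexp(html, '\d{6}', 'match');   % every run of six digits
codes = codes(1:end-1);                   % drop the trailing non-stock match
allstock = unique(codes, 'stable')';      % remove repeats, keep page order

% Save the pool once, then load it on later runs instead of scraping again.
poolfile = fullfile(tempdir, 'allstock_demo.mat');
save(poolfile, 'allstock');
S = load(poolfile);
disp(S.allstock)
```

`unique` with the `'stable'` option does not rely on duplicates sitting next to each other, which makes it a little more tolerant of layout changes than taking every second match.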
Settings at startup
disp('settings')
stock=input(['input a stock, (empty for all,) stock='],'s');%read as text so leading zeros survive
if isempty(stock)==0
mkt=input(['input a market number, (SH=0,SZ=1) mkt=']);
end
startd=input(['input a start date, (YYYYMMDD) startd=']);
endd=input(['input an end date, (YYYYMMDD) endd=']);
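Problems appeared during downloading: long runs would stall or fail to connect to the server, and the site returned 500 and 504.
Without using IP proxies, the first idea is to add a user-agent pool and switch the user agent automatically after every fixed number of downloads, to make the websave timeout user-configurable, and to add a pause between downloads that gets longer when a download stalls. For 500 and 504, wrap the download in try-catch and retry.
%define fake UA pool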
UA0=['MATLAB 9.9.0.1467703 (R2020b)'];
UA1=['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763'];
UA2=['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'];
UA3=['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36'];
UA4=['Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:65.0) Gecko/20100101 Firefox/65.0'];
UA5=['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605.1.15'];
UA={UA0;UA1;UA2;UA3;UA4;UA5};
r=2;%start at UA1 to avoid MATLAB's default UA
n=1;
.........
%auto switch UA: change the user agent every 10 downloads
n=n+1;
if n==10
r=r+1;
if r==7
r=2;
end
n=0;
opt=weboptions('timeout',timeoutt,'UserAgent',UA{r});
end
%t holds the download time; if t reaches the threshold det, switch UA and pause longer, otherwise keep the short pause
if t>=det
tpause=5;
r=r+1;
if r==7
r=2;
end
opt=weboptions('timeout',timeoutt,'UserAgent',UA{r});
disp('download paused, auto switching Useragent...')
pause(tpause);
disp('resume download')
tpause=.1;
end
if t<det
tpause=.1;
end
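To make the retry idea concrete, here is a minimal sketch of a retry loop that rotates the user agent and doubles the wait after each failure. It is not the author's code: the failures are simulated so the loop can run offline, the entries of `UA` are placeholders rather than real user-agent strings, and `maxTry`, `waitSec` and `failuresLeft` are hypothetical names.

```matlab
UA = {'default-agent', 'agent-A', 'agent-B', 'agent-C'};  % placeholder strings
r = 2;              % start after the default entry, as in the post
maxTry = 4;         % give up after this many attempts
waitSec = 0.01;     % short for the demo; the post pauses for 5 s
failuresLeft = 2;   % simulate two HTTP 500/504 responses
ok = false;

for attempt = 1:maxTry
    opt = weboptions('Timeout', 10, 'UserAgent', UA{r});
    try
        if failuresLeft > 0
            failuresLeft = failuresLeft - 1;
            error('demo:http', 'simulated HTTP 504');
        end
        % A real run would call websave(file, url, opt) here.
        ok = true;
        break
    catch
        r = r + 1;                 % rotate to the next user agent
        if r > numel(UA)
            r = 2;                 % wrap around, skipping the default entry
        end
        pause(waitSec);
        waitSec = waitSec * 2;     % wait longer after every failure
    end
end
fprintf('ok=%d after %d attempt(s), using %s\n', ok, attempt, UA{r});
```

Rebuilding `opt` at the top of every attempt ensures the rotated user agent is actually the one used for the retry.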
Next, add a progress bar
f = waitbar(0,'1','Name','Downloading');
........
waitbar(num/2910,f,sprintf(['downloading ',file]))
Problem solved!
Next, add a max-min-normalize module that takes each column of the csv file separately and normalizes it.
%max-min-normalize
if jdmmn==1
    name1=stock0;
    %column labels, in the order of the fields in the download url
    str={'sp','max','min','kp','qsp','zd','zdf','hs','vol','cj','sz','ltsz'};
    for rr=1:12
        try
            datastock1=mats(:,rr);
            %replace a zero with the previous value (row 1 has no predecessor)
            for i=2:numel(datastock1)
                if datastock1(i)==0
                    datastock1(i)=datastock1(i-1);
                end
            end
            maxstock=max(datastock1);
            minstock=min(datastock1);
            %vectorized, so no stale entries survive from a longer earlier column
            mmn1=(datastock1-minstock)./(maxstock-minstock);
            eval(['mmn',str{rr},name1,'_',num2str(startd),'_',num2str(endd),'=','mmn1',';'])
        catch
            disp('data error')
        end
    end
end
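For reference, max-min normalization maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. The sketch below applies it to a made-up column; the sample numbers and the guard for a constant column are additions for illustration, not part of the original script.

```matlab
% Hypothetical column with one missing value recorded as 0.
x = [10; 0; 20; 15];

% Replace each zero with the previous value, starting at row 2
% because row 1 has no predecessor.
for k = 2:numel(x)
    if x(k) == 0
        x(k) = x(k-1);
    end
end

span = max(x) - min(x);
if span == 0
    xn = zeros(size(x));          % constant column: avoid dividing by zero
else
    xn = (x - min(x)) ./ span;    % smallest value -> 0, largest -> 1
end
disp(xn')
```

Here the filled column is [10; 10; 20; 15], so the normalized result is [0; 0; 1; 0.5].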
Finally, draw a candlestick chart with MATLAB's candle function (it ships with the Financial Toolbox)
%candle
if cq==1
name1=['m',stock];
color='y';
namestock=eval(name1);
if isempty(namestock)==0
figure('color','k')
whitebg('black')
kpstock=namestock(:,4);
open=kpstock;
highstock=namestock(:,2);
high=highstock;
lowstock=namestock(:,3);
low=lowstock;
spstock=namestock(:,1);
close=spstock;
dd=size(namestock);
dd=dd(1);
time=days(1:dd);
time=time';
tt=timetable(time,open,high,low,close);
candle(tt,color)
str=name1;
title([str])
else
disp('stock not exist')
end
end
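The snippet above uses days(1:dd) as the time axis, so the chart shows day counts rather than trading dates. As a possible variation, the sketch below builds the timetable with real dates. The three sample rows are invented, and the plot is only attempted when `candle` is on the path and a desktop session is available.

```matlab
% Hypothetical sample rows; real values would come from the downloaded csv.
Time  = datetime(2022, 10, [24; 25; 26]);
Open  = [10.0; 10.4; 10.2];
High  = [10.6; 10.8; 10.5];
Low   = [ 9.9; 10.1; 10.0];
Close = [10.4; 10.2; 10.3];

% The first datetime input becomes the row times of the timetable.
tt = timetable(Time, Open, High, Low, Close);

% candle needs the Financial Toolbox, so only plot when it is available.
if exist('candle', 'file') && usejava('desktop')
    candle(tt, 'y');
    title('sample');
end
```

Capitalized variable names also avoid shadowing the built-in open and close functions.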
Full code:
%StockSpiderV3
%This script is used to download stock data from 163 finance
%clear;
clc;
r=2;n=0;det=2.5;num=1;
disp('choose a data source ')
chos=input(['1=download data from the internet,empty=use local data(allstock.mat),input:']);
if isempty(chos)==1
disp('initializing...')
allstock=load('allstock.mat');
allstock=allstock.allstock;
end
if chos==1
%download stock data
disp('downloading stock data')
urlstock=['http://www.cgedt.com/stockcode/yilanbiao.asp'];
[sp,sta]=urlread(urlstock);
if sta==0
error('can not connect to http://www.cgedt.com/stockcode/yilanbiao.asp')
end
stocka=regexp(sp,'\d{6}','match');
stocka=stocka(1:end-1);%drop the last match, which is not a stock
allstock=stocka(1:2:end)';%every code appears twice, keep one copy of each
sizes=numel(allstock);%number of stocks in the pool
disp('download completed')
end
%define fake UA pool
UA0=['MATLAB 9.9.0.1467703 (R2020b)'];
UA1=['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763'];
UA2=['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'];
UA3=['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36'];
UA4=['Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:65.0) Gecko/20100101 Firefox/65.0'];
UA5=['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.3 Safari/605.1.15'];
UA={UA0;UA1;UA2;UA3;UA4;UA5};
%download settings
disp('settings')
stock=input(['input a stock, (empty for all,) (SH:000001,SZ=399001) stock=',],'s');
if isempty(stock)==0
size1=input(['input number of stocks input:']);
end
startd=input(['input a start date, (YYYYMMDD) startd=']);
endd=input(['input an end date, (YYYYMMDD) endd=']);
timeoutt=input(['empty=inf,timeout=']);
if isempty(timeoutt)==1
timeoutt=Inf;
end
tpause=.1;
opt=weboptions('timeout',timeoutt,'UserAgent',UA{r});
det=2;
wr=input(['to let matlab read csv files, input 1, input:']);
cq=0;jdmmn=0;%defaults, so the later checks work when the csv files are not read
if wr==1
cq=input(['to let matlab plot candlestick chart, input 1, input:']);
jdmmn=input(['to let matlab do Max-Min-Normalize for all data, input 1, input:']);
end
disp('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>')
disp('download settings')
if isempty(stock)==1
disp(['download all'])
end
disp(['from ',num2str(startd)])
disp(['to ',num2str(endd)])
if isempty(timeoutt)==0
disp(['downloading timeout is ',num2str(timeoutt)])
end
disp(['pause for ',num2str(tpause)])
if wr==1
disp('import to matlab')
end
if cq==1
disp('draw candlestick chart')
end
disp('>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>')
%download
disp('start downloading')
%download the stock you typed in
if isempty(stock)==0
stock0=stock;
stockmode1=(regexp(stock0,'\d{6}','match'));
for j=1:size1
tic;
stock0=stockmode1{j};
inum=stock0;
inum=inum(1);
jdmkt=str2double(inum);
switch jdmkt
case 6
mkt=0;
case 0
mkt=1;
case 3
mkt=1;
otherwise
disp('stock not found ')
end
if stockmode1{j}==['000001']
mkt=0;
end
file=[stock0,'_',num2str(startd),'_',num2str(endd),'.csv'];
url=['http://quotes.money.163.com/service/chddata.html?code=',num2str(mkt),stock0,'&start=',num2str(startd),'&end=',num2str(endd),...
'&fields=TCLOSE;HIGH;LOW;TOPEN;LCLOSE;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP' ];
websave(file,url,opt);
disp(['downloading ',file])
%open file in matlab as a matrix
if wr==1
disp(['matlab is opening ',file] )
mats=xlsread(file);
mats=mats(end:-1:1,:);
for i=1:1
eval(['m',num2str(stock0),'=','mats',';']);
end
end
%max-min-normalize
if jdmmn==1
    name1=stock0;
    %column labels, in the order of the fields in the download url
    str={'sp','max','min','kp','qsp','zd','zdf','hs','vol','cj','sz','ltsz'};
    for rr=1:12
        try
            datastock1=mats(:,rr);
            %replace a zero with the previous value (row 1 has no predecessor)
            for i=2:numel(datastock1)
                if datastock1(i)==0
                    datastock1(i)=datastock1(i-1);
                end
            end
            maxstock=max(datastock1);
            minstock=min(datastock1);
            %vectorized, so no stale entries survive from a longer earlier column
            mmn1=(datastock1-minstock)./(maxstock-minstock);
            eval(['mmn',str{rr},name1,'_',num2str(startd),'_',num2str(endd),'=','mmn1',';'])
        catch
            disp('data error')
        end
    end
end
%candle (no longer nested inside the normalize branch)
if cq==1
    name1=['m',stock0];
    color='y';
    namestock=eval(name1);
    if isempty(namestock)==0
        figure('color','k')
        whitebg('black')
        open=namestock(:,4);
        high=namestock(:,2);
        low=namestock(:,3);
        close=namestock(:,1);
        dd=size(namestock,1);
        time=days(1:dd)';
        tt=timetable(time,open,high,low,close);
        candle(tt,color)
        title(name1)
    else
        disp('stock not exist')
    end
end
toc;
end
end
%download all stock from http://www.cgedt.com/stockcode/yilanbiao.asp
if isempty(stock)==1
f = waitbar(0,'1','Name','Downloading');
%i want to add a waitbar so for-end is better
sizes=numel(allstock);%also needed when the code pool was loaded from allstock.mat
for num=1:sizes
tic;
stock=num2str(allstock{num});
%auto choose market
inum=(regexp(stock,'\d{1}','match'));
%inum=inum{j};
inum=inum{1};
jdmkt=str2double(inum);
switch jdmkt
case 6
mkt=0;
case 0
mkt=1;
case 3
mkt=1;
otherwise
mkt=[];
disp('stock not found ')
end
if stock==['000001']
mkt=0;
end
file=[num2str(stock),'_',num2str(startd),'_',num2str(endd),'.csv'];
waitbar(num/sizes,f,sprintf(['downloading ',file]))
url=['http://quotes.money.163.com/service/chddata.html?code=',num2str(mkt),stock,'&start=',num2str(startd),'&end=',num2str(endd),...
'&fields=TCLOSE;HIGH;LOW;TOPEN;LCLOSE;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP' ];
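%in case of 500 or 504, try-catch keeps the script running
try
websave(file,url,opt);
disp(['downloading ',file])
catch
disp('download failed, reconnecting')
pause(5);
r=r+1;
if r==7
r=2;
end
opt=weboptions('timeout',timeoutt,'UserAgent',UA{r});%apply the new UA before retrying
try
websave(file,url,opt);
catch
disp(['download failed again, skipping ',file])
continue
end
end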
%auto switch UA
n=n+1;
if n==10
r=r+1;
if r==7
r=2;
end
n=0;
opt=weboptions('timeout',timeoutt,'UserAgent',UA{r});
end
%auto pause and switch UA
t=toc;
%open file in matlab as a matrix
if wr==1
disp(['matlab is opening ',file] )
mats=xlsread(file);
mats=mats(end:-1:1,:);
% for i=1:sizes
eval(['m',stock,'=','mats',';']);
%end
end
%max-min-normalize
if jdmmn==1
    name1=stock;
    %column labels, in the order of the fields in the download url
    str={'sp','max','min','kp','qsp','zd','zdf','hs','vol','cj','sz','ltsz'};
    for rr=1:12
        try
            datastock1=mats(:,rr);
            %replace a zero with the previous value (row 1 has no predecessor)
            for i=2:numel(datastock1)
                if datastock1(i)==0
                    datastock1(i)=datastock1(i-1);
                end
            end
            maxstock=max(datastock1);
            minstock=min(datastock1);
            %vectorized, so no stale entries survive from a longer earlier stock
            mmn1=(datastock1-minstock)./(maxstock-minstock);
            eval(['mmn',str{rr},name1,'_',num2str(startd),'_',num2str(endd),'=','mmn1',';'])
        catch
            disp('data error')
        end
    end
end
%candle (no longer nested inside the normalize branch)
if cq==1
    name1=['m',stock];
    color='y';
    namestock=eval(name1);
    if isempty(namestock)==0
        figure('color','k')
        whitebg('black')
        open=namestock(:,4);
        high=namestock(:,2);
        low=namestock(:,3);
        close=namestock(:,1);
        dd=size(namestock,1);
        time=days(1:dd)';
        tt=timetable(time,open,high,low,close);
        candle(tt,color)
        title(name1)
    else
        disp('stock not exist')
    end
end
%auto pause and switch UA when the download took det seconds or longer
if t>=det
    tpause=5;
    r=r+1;
    if r==7
        r=2;
    end
    opt=weboptions('timeout',timeoutt,'UserAgent',UA{r});
    disp('download paused, auto switching Useragent...')
    pause(tpause);
    disp('resume download')
end
%back to the short pause between downloads
tpause=.1;
pause(tpause);
end
delete(f)%remove the waitbar; close(f) would hit the variable named close
disp('download complete')
end
figure
whitebg('white')
set(gcf,'color','w')
clear allstock chos det file i inum j jdmkt mats mkt n num opt r size1 ...
sizes sp sta stock stock0 stocka stockmode1 timeoutt tpause UA ...
UA0 UA1 UA2 UA3 UA4 UA5 url urlstock wr tt time ...
name1 namestock open spstock str t color cq dd close ans high ...
highstock low lowstock kpstock f spstock datastock datastock1...
jdmmn jdtrans mmn0 mmn1 maxstock minstock rr sizestock sizestock1...
zerostock