Scraping Web Page Data with MATLAB

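This example uses MATLAB's regular-expression function regexp to pattern-match the source of a web page and extract data from it:

Code: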

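% Download the stock list page; the page is encoded in GBK.
% Note: newer MATLAB releases recommend webread over urlread.
url='http://quote.eastmoney.com/stock_list.html';
[str, status]=urlread(url,'Charset','GBK');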

% Shanghai stocks
suf='ss';
scmp='<li><a target="_blank" href="http://quote.eastmoney.com/sh\d+\.html">(.{1,10})\((\d+)\)</a></li>';
% Shenzhen stocks (uncomment these two lines and comment out the two above)
%suf='sz';
%scmp='<li><a target="_blank" href="http://quote.eastmoney.com/sz\d+\.html">(.{1,10})\((\d+)\)</a></li>';

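% With two outputs, urlread returns status = 1 on success and 0 on failure
if status
    % 'tokens' returns one cell per match, holding {name, code}
    sdata=regexp(str,scmp,'tokens');
else
    error('download error');
end

% Build an N-by-2 cell array: column 1 = code with suffix, column 2 = name
ls=length(sdata);
s=cell(ls,2);
for i=1:ls
    s{i,1}=[sdata{i}{2},'.',suf];
    s{i,2}=sdata{i}{1};
end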

% Save to the current folder; fullfile picks the right path separator
filename=fullfile(pwd,['stocklist_',suf,'.mat']);
save(filename, 's');
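
To see what the 'tokens' option returns without touching the network, here is a minimal sketch that runs the same pattern on a small hand-written HTML string. The sample markup, the placeholder names (AlphaBank, BetaPort, GammaSZ) and the demo file name are assumptions made up for illustration; only the pattern and the cell-array layout come from the code above.

```matlab
% Hypothetical sample imitating the list markup the pattern expects.
% The names are made-up placeholders, not real page content.
html = ['<li><a target="_blank" href="http://quote.eastmoney.com/sh600000.html">AlphaBank(600000)</a></li>', ...
        '<li><a target="_blank" href="http://quote.eastmoney.com/sh600004.html">BetaPort(600004)</a></li>', ...
        '<li><a target="_blank" href="http://quote.eastmoney.com/sz000001.html">GammaSZ(000001)</a></li>'];

% Same Shanghai pattern as above: token 1 = name, token 2 = numeric code.
suf  = 'ss';
scmp = '<li><a target="_blank" href="http://quote.eastmoney.com/sh\d+\.html">(.{1,10})\((\d+)\)</a></li>';

% tokens is a 1-by-N cell; each element is a 1-by-2 cell {name, code}.
% The "sz" entry does not match the "sh" pattern and is skipped.
tokens = regexp(html, scmp, 'tokens');

% Rearrange into an N-by-2 cell array: {code.suffix, name}.
n = numel(tokens);
s = cell(n, 2);
for i = 1:n
    s{i,1} = [tokens{i}{2}, '.', suf];
    s{i,2} = tokens{i}{1};
end

% Round trip through a MAT-file in the temp folder (demo file name is arbitrary).
filename = fullfile(tempdir, 'stocklist_demo.mat');
save(filename, 's');
loaded = load(filename);

% Look up a name by its suffixed code.
idx  = strcmp(loaded.s(:,1), '600004.ss');
name = loaded.s{idx, 2};
disp(name)
```

The same lookup should work on the real stocklist_ss.mat file once the download step has succeeded.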


Output:

'166105.ss' '信达增利'
'201000.ss' 'R003'
'201001.ss' 'R007'
'201002.ss' 'R014'
'201003.ss' 'R028'
'201004.ss' 'R091'
'201005.ss' 'R182'
'201008.ss' 'R001'
'201009.ss' 'R002'
'201010.ss' 'R004'
'202001.ss' 'RC001'
'202003.ss' 'RC003'
'202007.ss' 'RC007'
'203007.ss' '0501R007'
'203008.ss' '0501R028'
'203009.ss' '0501R091'
'203016.ss' '0504R007'
'203017.ss' '0504R028'
'203018.ss' '0504R091'
'203019.ss' '0505R007'
'203020.ss' '0505R028'
'203021.ss' '0505R091'
'203031.ss' '0509R007'
'203032.ss' '0509R028'
'203033.ss' '0509R091'
'203040.ss' '0512R007'
'203041.ss' '0512R028'
'203042.ss' '0512R091'
'203043.ss' '0513R007'
'203044.ss' '0513R028'
'203045.ss' '0513R091'
'203049.ss' '0601R007'
'203050.ss' '0601R028'
'203051.ss' '0601R091'
'203052.ss' '0603R007'
'203053.ss' '0603R028'
'203054.ss' '0603R091'
'204001.ss' 'GC001'
'204002.ss' 'GC002'
'204003.ss' 'GC003'
'204004.ss' 'GC004'
'204007.ss' 'GC007'
'204014.ss' 'GC014'
'204028.ss' 'GC028'
'204091.ss' 'GC091'
'204182.ss' 'GC182'
'500001.ss' '基金金泰'
'500002.ss' '基金泰和'
'500003.ss' '基金安信'
'500005.ss' '基金汉盛'
'500006.ss' '基金裕阳'
'500007.ss' '基金景阳'
'500008.ss' '基金兴华'
'500009.ss' '基金安顺'
'500011.ss' '基金金鑫'
'500015.ss' '基金汉兴'
'500018.ss' '基金兴和'
'500029.ss' '基金科讯'
'500038.ss' '基金通乾'
'500056.ss' '基金科瑞'
'500058.ss' '基金银丰'
'502000.ss' '500等权'
'502001.ss' '500等权A'
'502002.ss' '500等权B'
'502006.ss' '国企改革'
'502007.ss' '国企改A'
'502008.ss' '国企改B'
'502013.ss' '一带一路'
'502014.ss' '一带一A'
'502015.ss' '一带一B'
'502020.ss' '国金50'
'502021.ss' '国金50A'
'502022.ss' '国金50B'
'502036.ss' '互联金融'
'502037.ss' '网金A'
'502038.ss' '网金B'
'502048.ss' '50分级'
'502049.ss' '上证50A'
'502050.ss' '上证50B'
'505888.ss' '嘉实元和'
'510010.ss' '治理ETF'
'510020.ss' '超大ETF'
'510030.ss' '价值ETF'
'510050.ss' '50ETF'
'510060.ss' '央企ETF'
'510061.ss' '央企申赎'
'510070.ss' '民企ETF'
'510090.ss' '责任ETF'
'510110.ss' '周期ETF'
'510120.ss' '非周ETF'
'510130.ss' '中盘ETF'
'510150.ss' '招商上证消费80'
'510160.ss' '小康ETF'
'510170.ss' '商品ETF'
'510180.ss' '180ETF'
'510190.ss' '龙头ETF'
'510210.ss' '综指ETF'
'510220.ss' '中小ETF'
'510230.ss' '金融ETF'
'510260.ss' '新兴ETF'
'510270.ss' '国企ETF'
'510280.ss' '成长ETF'
'510290.ss' '380ETF'
'510300.ss' '300ETF'
'510310.ss' 'HS300ETF'
'510330.ss' '华夏300'
'510410.ss' '资源ETF'
'510420.ss' '180EWETF'
'510430.ss' '50等权'
'510440.ss' '500沪市ETF'
'510450.ss' '180高ETF'
'510500.ss' '500ETF'
'510510.ss' '广发500'
'510520.ss' '诺安500'
'510560.ss' '国寿500'
'510610.ss' '能源行业'
'510620.ss' '材料行业'
'510630.ss' '消费行业'
'510650.ss' '金融行业'
'510660.ss' '医药行业'
'510680.ss' '万家380'
'510700.ss' '百强ETF'
'510710.ss' '上50ETF'
'510880.ss' '红利ETF'
'510900.ss' 'H股ETF'
'511010.ss' '国债ETF'
'511210.ss' '企债ETF'
'511220.ss' '城投ETF'
'511800.ss' '易货币'
'511860.ss' '博时货币'
'511880.ss' '银华日利'
'511990.ss' '华宝添益'
'512010.ss' '医药ETF'
'512070.ss' '非银ETF'
'512210.ss' '景顺食品'
'512220.ss' '景顺TMT'
'512230.ss' '景顺医药'
'512300.ss' '500医药'
'512310.ss' '500工业'
'512340.ss' '500原料'
'512500.ss' '中证500'
'512510.ss' 'ETF500'
'512600.ss' '主要消费'
'512610.ss' '医药卫生'
'512640.ss' '金融地产'
'512990.ss' 'MSCIA股'
'513030.ss' '德国30'
'513100.ss' '纳指ETF'
'513500.ss' '标普500'
'513600.ss' '恒指ETF'
'513660.ss' '恒生通'
'518800.ss' '国泰黄金'
'518880.ss' '黄金ETF'
'580012.ss' '云化CWB1'
'580013.ss' '武钢CWB1'
'580014.ss' '深高CWB1'
'580016.ss' '上汽CWB1'
'580017.ss' '赣粤CWB1'
'580019.ss' '石化CWB1'
'580020.ss' '上港CWB1'
'580021.ss' '青啤CWB1'
'580022.ss' '国电CWB1'
'580023.ss' '康美CWB1'
'580024.ss' '宝钢CWB1'
'580025.ss' '葛洲CWB1'
'580026.ss' '江铜CWB1'
'580027.ss' '长虹CWB1'
'600000.ss' '浦发银行'
'600001.ss' '邯郸钢铁'
'600002.ss' '齐鲁石化'
'600003.ss' 'ST东北高'
'600004.ss' '白云机场'
'600005.ss' '武钢股份'
'600006.ss' '东风汽车'
'600007.ss' '中国国贸'
'600008.ss' '首创股份'
'600009.ss' '上海机场'
'600010.ss' '包钢股份'
'600011.ss' '华能国际'
'600012.ss' '皖通高速'
'600015.ss' '华夏银行'
'600016.ss' '民生银行'
'600017.ss' '日照港'
'600018.ss' '上港集团'
'600019.ss' '宝钢股份'
'600020.ss' '中原高速'
'600021.ss' '上海电力'
'600022.ss' '山东钢铁'
'600023.ss' '浙能电力'
'600026.ss' '中海发展'
'600027.ss' '华电国际'
'600028.ss' '中国石化'
'600029.ss' '南方航空'
'600030.ss' '中信证券'
'600031.ss' '三一重工'
'600033.ss' '福建高速'
'600035.ss' '楚天高速'
'600036.ss' '招商银行'
'600037.ss' '歌华有线'
'600038.ss' '中直股份'
'600039.ss' '四川路桥'
'600048.ss' '保利地产'
'600050.ss' '中国联通'
'600051.ss' '宁波联合'
'600052.ss' '浙江广厦'
'600053.ss' '中江地产'
'600054.ss' '黄山旅游'
'600055.ss' 'XD华润万'
'600056.ss' '中国医药'
'600057.ss' '象屿股份'
'600058.ss' '五矿发展'
'600059.ss' '古越龙山'
'600060.ss' '海信电器'
'600061.ss' '国投安信'
'600062.ss' '华润双鹤'
'600063.ss' '皖维高新'
'600064.ss' '南京高科'
'600065.ss' '*ST联谊'
'600066.ss' '宇通客车'
'600067.ss' '冠城大通'
'600068.ss' '葛洲坝'
'600069.ss' '*ST银鸽'
'600070.ss' '浙江富润'
'600071.ss' '*ST光学'
'600072.ss' '钢构工程'
'600073.ss' '上海梅林'
'600074.ss' '保千里'
'600075.ss' '新疆天业'
'600076.ss' '青鸟华光'
'600077.ss' '宋都股份'
'600078.ss' '澄星股份'
'600079.ss' '人福医药'
'600080.ss' '金花股份'
'600081.ss' 'XD东风科'
'600082.ss' '海泰发展'
'600083.ss' '博信股份'
'600084.ss' '中葡股份'
'600085.ss' '同仁堂'
'600086.ss' '东方金钰'
'600087.ss' '退市长油'
'600088.ss' '中视传媒'
'600089.ss' 'XD特变电'
'600090.ss' '啤酒花'
'600091.ss' '*ST明科'
'600092.ss' 'S*ST精密'
'600093.ss' '禾嘉股份'
'600094.ss' '大名城'
'600095.ss' '哈高科'
'600096.ss' '云天化'
'600097.ss' '开创国际'
'600098.ss' '广州发展'
'600099.ss' '林海股份'
'600100.ss' '同方股份'
'600101.ss' '明星电力'
'600102.ss' '莱钢股份'
'600103.
