Getting href values in MATLAB: a small MATLAB program for scraping web page data

Latest recommended article published on 2024-05-13 08:38:41

weixin_39953356

Article tag: get href in MATLAB

function main
% Search a forum board for threads whose titles contain a keyword and
% list each matching thread title with its URL in a window.

keyword = '方程'; % default search keyword ("equation")
url0 = 'https://www.ilovematlab.cn/forum-6-1.html'; % URL of the MATLAB basics board

% ----------- Build the window -------------
hfig = figure;
haxes = axes('Units', 'pixels', 'Visible', 'off');
pos = get(haxes, 'Position');
hcon = uicontrol('Style', 'edit', 'Position', pos+[50, 0, 0, 0], 'Max', inf); % multi-line results box
hcon_text = uicontrol('Position', [pos(1)-50, pos(4)+30, 50, 20], 'Style', 'text', 'String', 'keyword:');
hcon_edit_keyword = uicontrol('Position', [pos(1)-60, pos(4)+10, 90, 20], 'Style', 'edit', 'String', keyword);
hcon_start = uicontrol('Position', [pos(1)-50, pos(4)-70, 70, 20], 'String', 'start', 'Callback', @start_fcn);
hcon_stop = uicontrol('Position', [pos(1)-50, pos(4)-120, 70, 20], 'String', 'stop', 'Callback', @stop_fcn);
flag = []; % loop flag shared with the nested callbacks below

    % GUI_contains returns the titles and URLs of the threads on the
    % current page whose surrounding HTML contains the keyword.
    function [data_url, data_name] = GUI_contains(datain, keyword)
        data_url = {};
        data_name = {};
        % Match the keyword with up to 100 characters of context on each side.
        % The keyword is escaped so regex metacharacters in it are taken literally.
        pattern = sprintf('.{1,100}%s.{1,100}', regexptranslate('escape', keyword));
        dataout = regexp(datain, pattern, 'match');
        for ii = 1:numel(dataout)
            % Capture the link target and the link text of the first complete anchor.
            tok = regexp(dataout{ii}, 'a href="(.*?)".*?>(.*?)</a>', 'tokens', 'once');
            if ~isempty(tok)
                data_url{end+1} = ['https://www.ilovematlab.cn/' tok{1}]; % same host as url0
                data_name{end+1} = tok{2};
            end
        end
    end

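    % next_page_url returns the URL of the page after the current one,
    % or '' when the page has no "下一页" (next page) link.
    function data_out = next_page_url(datain)
        data_out = '';
        temp = regexp(datain, '.{1,100}下一页', 'match', 'once');
        url = regexp(temp, 'a href="(.*?)"', 'tokens'); % every href in the window before the link text
        if ~isempty(url)
            data_out = ['https://www.ilovematlab.cn/' url{end}{1}]; % the last href is the closest to the link text
        end
    end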

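    function start_fcn(~, ~)
        set(hcon, 'String', '');
        keyword = get(hcon_edit_keyword, 'String');
        if isempty(keyword)
            msgbox('please enter the keyword...')
            return % nothing to search for
        end
        result_lines = {};
        flag = true;
        while flag
            data = webread(url0); % read the page at the current URL
            [data_url, data_name] = GUI_contains(data, keyword);
            for ii = 1:numel(data_url)
                % Append the thread title, its URL, and a blank separator line.
                result_lines = [result_lines; data_name(ii); data_url(ii); {''}];
                set(hcon, 'String', result_lines); % show the matches found so far
            end
            drawnow % refresh the window and let the stop button's callback run
            next_url = next_page_url(data); % URL of the next page
            if isempty(next_url)
                flag = false; % last page reached
            else
                url0 = next_url;
            end
        end
    end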

    function stop_fcn(~, ~)
        flag = false; % the while loop in start_fcn exits after the current page
    end

end
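
The href extraction can be tried offline, without opening the window or contacting the forum. The following is only a minimal sketch: the HTML fragment, the thread file names, the class attributes, and the variable base_url are made up for illustration and are not real output from the site. It also takes a slightly different route from the program above: it captures every anchor first and then filters by link text, instead of matching a window of characters around the keyword. The contains function is assumed to be available (R2016b or later).

```matlab
% Hypothetical HTML fragment standing in for one forum listing page.
html = ['<th><a href="thread-101-1-1.html" class="xst">求解方程组的问题</a></th>' ...
        '<th><a href="thread-102-1-1.html" class="xst">绘图问题</a></th>' ...
        '<a href="forum-6-2.html" class="nxt">下一页</a>'];
base_url = 'https://www.ilovematlab.cn/'; % assumed base for relative links
keyword  = '方程';

% Capture every anchor's target (token 1) and link text (token 2).
tok = regexp(html, '<a href="(.*?)"[^>]*>(.*?)</a>', 'tokens');
links = vertcat(tok{:}); % N-by-2 cell array: {href, text}

% Keep only the anchors whose text contains the keyword.
hit = contains(links(:, 2), keyword);
data_name = links(hit, 2);
data_url  = strcat(base_url, links(hit, 1));

% The "next page" link is the anchor whose text is exactly 下一页.
is_next  = strcmp(links(:, 2), '下一页');
next_url = [base_url links{find(is_next, 1), 1}];

disp(data_name{1});
disp(data_url{1});
disp(next_url);
```

Filtering on the captured link text avoids cutting an anchor in half, which can happen when a fixed 100-character window ends before the closing tag. The trade-off is that it only finds the keyword in the link text itself, not in the HTML around it.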
