Web Scraping with MATLAB 02: Scraping the Kugou TOP500 Chart Data

weixin_30549657

Published 2018-05-17 16:53:00

Tags: matlab, web scraping

Original link: http://www.cnblogs.com/mathpro/p/9051962.html

1. Functions used

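The scraper relies mainly on two MATLAB functions: webread, which downloads the page source, and regexp, which extracts the text that matches a pattern.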
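As a rough illustration of how these two functions fit together, here is a minimal sketch. The webread call is left commented out because it needs network access, and the stand-in string `html` is a made-up example, not real page content.

```matlab
% webread returns the body of an HTML page as a character vector.
% The call is commented out here because it needs network access:
% html = webread('http://www.kugou.com/yy/rank/home/1-8888.html?from=rank');

% Stand-in text (hypothetical, not real page content) for an offline demo.
html = '<b>alpha</b><b>beta</b>';

% With the 'tokens' option, regexp returns one cell per match,
% and each of those cells holds the captured groups of that match.
tok = regexp(html, '<b>(.*?)</b>', 'tokens');

first  = tok{1}{1};   % text captured by the first match
second = tok{2}{1};   % text captured by the second match
fprintf('%s, %s (%d matches)\n', first, second, numel(tok));
```

The nested indexing `tok{j}{1}` is the part that is easy to get wrong: the outer cell selects the match, the inner cell selects the capture group.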

2. Scraping approach

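Inspecting the page source shows that the singer and the song title are stored together in one tag: the title attribute of each <li class=" " title=... data-index=...> list item, which is the tag the pattern in the implementation targets.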

A regular-expression match with regexp is therefore enough to pull them out.
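To see what such a match captures, the sketch below applies the pattern used in the implementation to a small fragment. The fragment is hypothetical: it only imitates the shape of the list items described above, and the singers and songs are placeholders.

```matlab
% Hypothetical fragment shaped like the list items the pattern targets;
% the singers and songs are placeholders, not real chart data.
sample = ['<li class=" " title="Singer A - Song One" data-index="0">' ...
          '<li class=" " title="Singer B - Song Two" data-index="1">'];

tpn = 'li class=" " title=(.*?)data-index';   % pattern from the post
tok = regexp(sample, tpn, 'tokens');

% The lazy group (.*?) stops at the first following data-index, so each
% match stays inside a single <li>. Because the group starts right after
% title=, the captured text still carries the quotes and a trailing space.
for j = 1:numel(tok)
    fprintf('[%s]\n', tok{j}{1});
end
```

The brackets in the printout make the leftover quotes and trailing space visible, which is worth knowing before the captured text is stored or exported.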

3. Implementation

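clc; close all; clear

% Result table: one header row plus up to 500 chart entries.
top = cell(501,2);
top{1,1} = 'Rank';
top{1,2} = 'Singer - Song';
k = 1;  % rank of the next entry to store

% The chart is spread over 23 pages.
for i = 1:23
    url = strcat('http://www.kugou.com/yy/rank/home/',num2str(i),'-8888.html?from=rank');
    webdata = webread(url);                  % page source as text
    % Capture whatever sits between title= and data-index in each <li> tag.
    tpn = 'li class=" " title=(.*?)data-index';
    sdata = regexp(webdata,tpn,'tokens');    % one 1x1 token cell per match
    sn = length(sdata);
    for j = 1:sn
        top{k+1,1} = sprintf('No. %d',k);
        top(k+1,2) = sdata{j};               % copy the token cell into column 2
        k = k+1;
    end
end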
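The entries stored in column 2 are the raw captures, so they may still include the surrounding quotes and a trailing space. A possible cleanup step is sketched below. The variable `raw` holds placeholder values standing in for `top(2:k,2)`, and splitting at the first ' - ' is an assumption based on the "singer - song" column label, not something the page is guaranteed to follow.

```matlab
% Example entries in the shape the loop stores them (placeholder values).
raw = {'"Singer A - Song One" '; '"Singer B - Song Two" '};

n = numel(raw);
singer = cell(n,1);
song   = cell(n,1);
for r = 1:n
    entry = strtrim(raw{r});           % drop the trailing space
    entry = strrep(entry, '"', '');    % drop the surrounding quotes
    idx = strfind(entry, ' - ');       % assumed singer/song separator
    if isempty(idx)
        singer{r} = entry;             % no separator: keep the whole text
        song{r}   = '';
    else
        singer{r} = entry(1:idx(1)-1);
        song{r}   = entry(idx(1)+3:end);
    end
end
```

After the scraping loop has run, replacing the placeholder with `raw = top(2:k,2);` should apply the same cleanup to the real entries.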

Result of running the script