PHP Web Crawler: Implementing Data Mining Features


weixin_30455023

Tags: php, crawler

Original link: http://www.cnblogs.com/wrpuser/p/8425243.html


A small PHP tool that fetches today's weather in real time

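// Fetch the page content from the weather forecast site
$html = file_get_contents("http://www.weather.com.cn/weather1d/101210101.shtml");
// Regular expression with two named groups: tianqi (weather) and wendu (temperature)
$reg = '#hour3data.+?\[".+?,.+?,(?<tianqi>.+?),(?<wendu>.+?),#';
// If the fetch and the match both succeed, print the weather and temperature
if($html !== false && preg_match($reg, $html, $mat)){
    echo "Today: ".$mat['tianqi'].", temperature ".$mat['wendu'];
}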
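To see what the named groups capture without hitting the network, the same pattern can be run against a sample string. This is only a sketch: the helper name `extractWeather` and the sample line are made up for illustration, and the real `hour3data` payload on the site may look different.

```php
<?php
// Hypothetical helper: returns the first weather/temperature pair, or null if nothing matches.
function extractWeather(string $html): ?array
{
    // Same pattern as above: skip two comma-separated fields, then capture two.
    $reg = '#hour3data.+?\[".+?,.+?,(?<tianqi>.+?),(?<wendu>.+?),#';
    if (preg_match($reg, $html, $mat) !== 1) {
        return null;
    }
    return ['tianqi' => $mat['tianqi'], 'wendu' => $mat['wendu']];
}

// Made-up sample in the assumed "time,icon,weather,temperature,..." layout.
$sample = 'var hour3data={"1d":["08d08h,d01,Cloudy,11C,South,3,0"]}';
$weather = extractWeather($sample);
if ($weather !== null) {
    echo "Today: " . $weather['tianqi'] . ", temperature " . $weather['wendu'] . "\n";
}
```

Returning null on a failed match lets the caller tell "page layout changed" apart from an empty result.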

Fetch a nickname and avatar from a QQ number

$url = "http://r.pengyou.com/fcg-bin/cgi_get_portrait.fcg?uins=1579715173";
$html = file_get_contents($url);
// Group 1 captures the avatar URL, group 2 the nickname
$reg = '#.+?\["(.+?)",.+?,.+?,.+?,.+?,.+?,"(.+?)"#';
if($html !== false && preg_match($reg, $html, $mat)){
    // Because of hotlink protection, Tencent's avatar URL cannot be used directly, so download the image locally first
    file_put_contents("1.jpg",file_get_contents($mat[1]));
    echo "<img src='./1.jpg' />".$mat[2];
}
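Since the nickname comes from a remote service and is echoed straight into HTML, it may be worth escaping it first. The sketch below separates parsing from rendering; `parsePortrait`, `renderProfile`, and the sample response body are hypothetical, and it assumes the nickname is valid UTF-8 (otherwise `htmlspecialchars` returns an empty string, so a charset conversion would be needed first).

```php
<?php
// Hypothetical helper: pulls the avatar URL and nickname out of the response body.
function parsePortrait(string $body): ?array
{
    // Same pattern as above: group 1 is the avatar URL, group 2 the nickname.
    $reg = '#.+?\["(.+?)",.+?,.+?,.+?,.+?,.+?,"(.+?)"#';
    if (preg_match($reg, $body, $mat) !== 1) {
        return null;
    }
    return ['avatar' => $mat[1], 'nickname' => $mat[2]];
}

// Hypothetical helper: builds the HTML snippet with both values escaped.
function renderProfile(string $localImage, string $nickname): string
{
    return "<img src='" . htmlspecialchars($localImage, ENT_QUOTES) . "' />"
        . htmlspecialchars($nickname, ENT_QUOTES);
}

// Made-up response body; the real service's format is an assumption here.
$sample = 'portraitCallBack({"10001":["http://example.com/a.jpg",100,-1,0,0,0,"<b>Alice</b>",0]})';
$profile = parsePortrait($sample);
if ($profile !== null) {
    echo renderProfile('./1.jpg', $profile['nickname']), "\n";
}
```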

Look up location information from an IP address

$ip = "14.215.177.38";
$html = file_get_contents("http://ip.chinaz.com/".$ip);
$regex = '#<p class="WhwtdWrap bor-b1s col-gray03">[\s\S]+?<span class="Whwtdhalf w50-0">(.+?)</span>[\s\S]+?</p>#';
if($html !== false && preg_match($regex, $html, $mat)){
    // Group 1 holds the location text
    echo $mat[1];
}
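If the IP ever comes from user input rather than a hard-coded string, validating it before concatenating it into the URL is a cheap safeguard. A minimal sketch using PHP's built-in `filter_var`; the helper name `buildLookupUrl` is hypothetical, and restricting to IPv4 is an assumption based on the example address.

```php
<?php
// Hypothetical helper: returns the lookup URL, or null when $ip is not a valid IPv4 address.
function buildLookupUrl(string $ip): ?string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }
    return "http://ip.chinaz.com/" . $ip;
}

$url = buildLookupUrl("14.215.177.38");
if ($url === null) {
    echo "Invalid IP address\n";
} else {
    echo $url, "\n";
}
```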

Collect every chapter of a given novel from Qidian and merge them into one txt file

$html = file_get_contents("http://book.qidian.com/info/1004608738");
// Group 1 captures each chapter link, group 2 the chapter title
$regex = '#<li data-rid="\d+?"><a href="(.+?)"[\s\S]+?>(.+?)</a>[\s\S]+?</li>#';
if($html !== false && preg_match_all($regex, $html, $mats)){
    foreach($mats[1] as $k => $v){
        // The captured links have no scheme, so prepend "http:"
        $html1 = file_get_contents("http:".$v);
        $regex1 = '#<div class="read-content j_readContent">([\s\S]+?)</div>#';
        // Match the chapter body
        if($html1 !== false && preg_match($regex1, $html1, $mat)){
            // Strip HTML tags and whitespace
            $mat[1] = preg_replace('#<.+?>|\s+?#', "",$mat[1]);
            $content = "\r\n".$mats[2][$k]."\r\n".$mat[1];
            file_put_contents("1.txt", $content, FILE_APPEND);
        } else {
            echo "Chapter content did not match";
        }
        echo $mats[2][$k]."\n";
    }
}
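The cleanup step above removes all tags and all whitespace, so paragraph breaks are lost in the output file. As an alternative (not the original author's approach), the sketch below turns paragraph and line-break tags into line breaks before stripping the rest. Both helper names are hypothetical, and it assumes chapter bodies use `<p>` or `<br>` tags.

```php
<?php
// Hypothetical helper: converts a chapter's HTML body to plain text, one paragraph per line.
function chapterHtmlToText(string $html): string
{
    // Turn each opening <p ...> or <br> tag into a line break.
    $text = preg_replace('#<(?:p|br)\b[^>]*>#i', "\r\n", $html);
    // Drop every remaining tag.
    $text = strip_tags($text);
    // Trim each line and discard the empty ones.
    $lines = array_map('trim', explode("\r\n", $text));
    $lines = array_filter($lines, function ($line) {
        return $line !== '';
    });
    return implode("\r\n", $lines);
}

// Hypothetical helper: adds a scheme to protocol-relative links such as "//host/path".
function absoluteUrl(string $href, string $scheme = 'http'): string
{
    if (strncmp($href, '//', 2) === 0) {
        return $scheme . ':' . $href;
    }
    return $href;
}

$text = chapterHtmlToText("<p>  First</p><p>Second<br/>Third</p>");
$link = absoluteUrl("//example.com/chapter/1");
```

When crawling many chapters in a loop, adding a short pause between requests (for example `sleep(1)`) is a reasonable courtesy to the target site.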

Reposted from: https://www.cnblogs.com/wrpuser/p/8425243.html
