PHP crawler framework QueryList by example: scraping data into a MySQL database

Latest recommended article published on 2024-06-28 16:14:27

小张帅三代

Do not reproduce without permission. By 张小三

Article link: https://blog.csdn.net/qq_38313548/article/details/87871413

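Tools to prepare:

QueryList.php (I use version 3). Download: https://github.com/jae-jae/QueryList

phpQuery.php. Download: https://doc.querylist.cc/

phpStudy, which bundles the MySQL database

Note: PHP 5.4.45 is the recommended version; switch to it in phpStudy. The code below uses the mysql_* functions, which were removed in PHP 7.0.

Create the PHP file that connects to the database and name it sql.php: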

<?php
/**
 * @Author: Marte
 * @Date:   2019-02-22 02:19:55
 * @Last Modified by:   Marte
 * @Last Modified time: 2019-02-22 02:25:53
 */
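// MySQL server address and credentials
$dblocation="localhost:3306";
$user="root";
$password="root";

// Open the connection; stop with the MySQL error message if it fails
$con = mysql_connect($dblocation,$user,$password);
if (!$con)
{
    die('Could not connect: ' . mysql_error());
}
// Select the database that will hold the scraped data
if (!mysql_select_db("spider", $con))
{
    die('Could not select database: ' . mysql_error($con));
}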
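On a PHP version where the mysql_* functions are no longer available, a PDO connection along the following lines could stand in for sql.php. This is a minimal sketch, not the author's code: the helper names spider_dsn and spider_connect and the VARCHAR column types are assumptions. Only the database name spider, the root/root credentials, and the column names (taken from the INSERT statement in demo.php) come from the article.

```php
<?php
// Hypothetical helper: builds the PDO DSN for the article's "spider" database.
function spider_dsn($host, $port, $dbname)
{
    return 'mysql:host=' . $host . ';port=' . $port . ';dbname=' . $dbname . ';charset=utf8';
}

// Hypothetical helper: opens the connection and makes PDO throw on errors.
function spider_connect($user, $password)
{
    $pdo = new PDO(spider_dsn('localhost', 3306, 'spider'), $user, $password);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    return $pdo;
}

// Assumed schema for the "bank" table: the column names come from the
// INSERT statement in demo.php, the VARCHAR(64) types are a guess.
$bank_schema = 'CREATE TABLE IF NOT EXISTS bank ('
    . 'name VARCHAR(64), num VARCHAR(64), timess VARCHAR(64), '
    . 'enterprice VARCHAR(64), outprice VARCHAR(64), '
    . 'moneyss VARCHAR(64), timesss VARCHAR(64)'
    . ') DEFAULT CHARSET=utf8';

// Usage (requires a running MySQL server, so it is left commented out):
// $pdo = spider_connect('root', 'root');
// $pdo->exec($bank_schema);
```

With PDO in exception mode, a failed query raises a PDOException, so a try/catch around the crawl loop would actually see database errors.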

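In phpStudy's web root, phpStudy\PHPTutorial\WWW\, create a project folder named QueryList.

Unpack the downloaded QueryList and phpQuery archives, then copy the two files QueryList.php and phpQuery.php from the unpacked folders into the QueryList project folder.

The project folder now holds the following files (demo.php is created in the next step): QueryList.php, phpQuery.php, sql.php, demo.php.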
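The folder layout can be checked with a small preflight script before writing the crawler. This is a sketch under assumptions: the helper name missing_project_files is made up for illustration, and it assumes sql.php sits in the same folder as the two library files, which is what the include in demo.php implies.

```php
<?php
// Hypothetical helper: returns the names of required files missing from $dir.
function missing_project_files($dir)
{
    $required = array('phpQuery.php', 'QueryList.php', 'sql.php');
    $missing = array();
    foreach ($required as $name) {
        if (!is_file($dir . DIRECTORY_SEPARATOR . $name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Example: check the folder this script lives in.
$missing = missing_project_files(__DIR__);
if ($missing) {
    echo 'Missing files: ' . implode(', ', $missing) . "\n";
}

// The mysql_* functions used by sql.php only exist on PHP versions before 7.0.
if (!function_exists('mysql_connect')) {
    echo "mysql_connect() is not available on this PHP build\n";
}
```

Saved inside the QueryList project folder and opened in the browser, the script prints nothing when all three files and the mysql extension are present.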

Create demo.php and put your crawler code in it.

The code is:

<?php
// Load the two library files
require 'phpQuery.php';
require 'QueryList.php';
// Import the QueryList class from its namespace
use QL\QueryList;

// Load the database connection ($con)
include 'sql.php';
try {
    header('Content-type:text/html;charset=utf-8');

    $run_times = 0; // page counter: 0 is index.html, N is index_N.html
    while (true) { // the target site is paginated, so loop over index_*.html
        if ($run_times == 0) {
            // URL of the first page
            $page_url = 'http://www.bankofchina.com/sourcedb/ffx/index.html';
        } else {
            // URL of each following page
            $page_url = 'http://www.bankofchina.com/sourcedb/ffx/index_' . $run_times . '.html';
        }
        $hj = QueryList::Query($page_url,
            array(
                // select the table cells by tag; 'text' takes the text between the tags
                "url" => array('tr td', 'text')
            )
        );

        // Flatten the result into a plain array of cell texts
        $data = $hj->getData(function ($x) {
            return $x['url'];
        });
        //print_r($data);

        // Stop once a page yields no cells; this assumes that a page past the
        // last one contains no matching 'tr td' elements
        if (count($data) == 0) {
            break;
        }

        $sige = 7; // every 7 consecutive cells form one table row, i.e. one database record
        $model = array(); // collects the 7 cells of the current record
        for ($i = 0; $i < count($data); $i++) { // count($data) is the number of scraped cells
            // escape each value before it is placed in the SQL string
            array_push($model, mysql_real_escape_string($data[$i], $con));
            $sige--;
            if ($sige == 0) {
                // the values to insert are the 7 elements of the current record
                $sql = "insert into bank (name,num,timess,enterprice,outprice,moneyss,timesss) VALUES ('" . implode("','", $model) . "')";
                mysql_query($sql, $con); // run the insert
                //echo $sql;
                $model = array(); // empty the record buffer
                $sige = 7; // reset the cell counter
            }
        }
        $run_times++; // move on to the next page
    }

    mysql_close($con); // the loop has ended, so all pages are done; close the connection
} catch (Exception $e) {
    mysql_close($con); // an exception was thrown while crawling: close the connection and print it
    echo $e;
}
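The three ideas in the loop (page numbering, grouping cells in sevens, and the INSERT) can be pulled into small pieces that run without the network or a database. The following is a sketch, not the author's code: the helper names page_url and rows_from_cells are made up, the index_N.html numbering is taken from the listing above and not verified against the site, and the placeholder statement assumes a prepared-statement API such as PDO or mysqli in place of mysql_query.

```php
<?php
// Hypothetical helper: page 0 is index.html, page N is index_N.html.
function page_url($page)
{
    $base = 'http://www.bankofchina.com/sourcedb/ffx/';
    return ($page == 0) ? ($base . 'index.html') : ($base . 'index_' . $page . '.html');
}

// Hypothetical helper: split the flat list of cell texts into records of
// $width cells each, dropping a trailing incomplete group.
function rows_from_cells(array $cells, $width = 7)
{
    $rows = array();
    foreach (array_chunk($cells, $width) as $chunk) {
        if (count($chunk) == $width) {
            $rows[] = $chunk;
        }
    }
    return $rows;
}

// INSERT with placeholders, meant for a prepared statement, so that cell
// texts are never concatenated into the SQL string.
$insert_sql = 'INSERT INTO bank (name,num,timess,enterprice,outprice,moneyss,timesss) '
    . 'VALUES (?,?,?,?,?,?,?)';

// Made-up sample cells standing in for one scraped page.
$cells = array('a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'b1', 'b2');
$rows = rows_from_cells($cells);
echo count($rows) . " complete row(s)\n";
```

With a PDO connection, each element of $rows could be passed directly to execute() on a statement prepared from $insert_sql, since every row holds exactly seven values in column order.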

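Open the site in a browser and request demo.php. Once the request succeeds, the data is stored in the database. To see the results on the page, uncomment the echo and print_r statements in the code above.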
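As an alternative to raw print_r output, the grouped records could be shown as an HTML table. This is a minimal sketch with a made-up helper name, rows_to_html, and made-up sample rows; it escapes every cell because the text comes from a remote page.

```php
<?php
// Hypothetical helper: render grouped rows as an HTML table, escaping every cell.
function rows_to_html(array $rows)
{
    $html = "<table border=\"1\">\n";
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $cell) {
            $html .= '<td>' . htmlspecialchars($cell, ENT_QUOTES, 'UTF-8') . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

// Made-up sample rows; in demo.php these would be the 7-cell records.
echo rows_to_html(array(array('USD', '<b>1</b>'), array('EUR', '2')));
```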
