Scraping with QueryList in ThinkPHP5 and storing the results in a MySQL database

QueryList 4 tutorial:

https://doc.querylist.cc/site/index/doc/45

Run the following composer command in the ThinkPHP5 project root to install QueryList:

composer require jaeger/querylist

If the following error appears:

Loading composer repositories with package information
Updating dependencies (including require-dev)

    Authentication required (packagist.phpcomposer.com):
      Username:

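then Composer is asking for credentials on the packagist.phpcomposer.com mirror. In that case, point Composer at a different mirror with: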

composer config -g repo.packagist composer https://packagist.laravel-china.org

1- Using QueryList in the Index controller:


use think\Db;
use QL\QueryList;

public function qulist(){
    $data = QueryList::get('http://maoyan.com/board/4')
    // Define the scraping rules: field => [CSS selector, attribute]
    ->rules([
        // Poster image URL
        "src"=>array(".board-wrapper dd img.board-img","data-src"),
        // Film title
        "name"=>array(".board-wrapper dd .movie-item-info .name","html"),
        // Leading actors
        "star"=>array(".board-wrapper dd .movie-item-info .star","html"),
        // Release date
        "releasetime"=>array(".board-wrapper dd .movie-item-info .releasetime","html"),
    ])
    ->query()->getData();
    // Convert the result collection to a plain array
    $excel_array=$data->all();
    $city = [];
    foreach($excel_array as $k=>$v) {
        $city[$k]['src'] = $v['src'];
        $city[$k]['name'] = $v['name'];
        $city[$k]['star'] = $v['star'];
        $city[$k]['releasetime'] = $v['releasetime'];
    }
    // Bulk-insert all rows into the article table
    Db::name("article")->insertAll($city);
}

If no error occurs, the scraped rows are inserted into the database.
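Because the rules above extract each field as "html", the stored values may still carry inner markup and surrounding whitespace. A minimal sketch of a cleanup step follows; the helper name cleanRow and the sample row are hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: normalise one scraped row before it is inserted.
// strip_tags() removes any inner markup returned by the "html" rules,
// and trim() drops surrounding whitespace and line breaks.
function cleanRow(array $row): array
{
    $clean = [];
    foreach ($row as $field => $value) {
        $clean[$field] = trim(strip_tags((string) $value));
    }
    return $clean;
}

// Made-up sample row, for illustration only.
$sample = [
    'src'  => ' http://example.com/poster.jpg ',
    'name' => '<a href="/films/1">Example Film</a>',
];
$cleaned = cleanRow($sample);
```

In the controller, the four field assignments inside the foreach could then be written as $city[$k] = cleanRow($v);, assuming the rows contain only the four fields defined in the rules.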


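2- To scrape the following pages as well, build each URL from the pagination pattern (here the offset parameter grows by 10 per page):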

public function qulist(){
    for($i=0;$i<20;$i++){
        // Each page lists 10 entries, so the offset is the page index times 10
        $page=$i*10;
        $data = QueryList::get('http://maoyan.com/board/4?offset='.$page)
            // Define the scraping rules: field => [CSS selector, attribute]
            ->rules([
                // Poster image URL
                "src"=>array(".board-wrapper dd img.board-img","data-src"),
                // Film title
                "name"=>array(".board-wrapper dd .movie-item-info .name","html"),
                // Leading actors
                "star"=>array(".board-wrapper dd .movie-item-info .star","html"),
                // Release date
                "releasetime"=>array(".board-wrapper dd .movie-item-info .releasetime","html"),
            ])
            ->query()->getData();
        $excel_array=$data->all();
        $city = [];
        foreach($excel_array as $k=>$v) {
            $city[$k]['src'] = $v['src'];
            $city[$k]['name'] = $v['name'];
            $city[$k]['star'] = $v['star'];
            $city[$k]['releasetime'] = $v['releasetime'];
        }
        // Skip the insert when a page returned no rows
        if (!empty($city)){
            Db::name("article")->insertAll($city);
        }
    }
}
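The offset pattern used in the loop above can be isolated in a small helper, which makes the page count and page size explicit. This is only a sketch; the function name buildPageUrls and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: build the list of page URLs from the offset pattern.
function buildPageUrls(string $baseUrl, int $pages, int $pageSize = 10): array
{
    $urls = [];
    for ($i = 0; $i < $pages; $i++) {
        // Page 0 has offset 0, page 1 has offset 10, and so on.
        $urls[] = $baseUrl . '?offset=' . ($i * $pageSize);
    }
    return $urls;
}

// Build the URLs for the first three pages.
$urls = buildPageUrls('http://maoyan.com/board/4', 3);
```

Each entry in $urls can then be passed to QueryList::get() in place of the string concatenation inside the loop.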

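3- Scrape the following pages, check whether each row already exists in the database (skip it if it does, insert it if it does not), and sleep a few seconds between requests: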

public function qulist(){
    // Lift the default 30-second execution limit so the script is not aborted
    set_time_limit(0);
    for($i=0;$i<20;$i++){
        // Each page lists 10 entries, so the offset is the page index times 10
        $page=$i*10;
        $data = QueryList::get('http://maoyan.com/board/4?offset='.$page)
            // Define the scraping rules: field => [CSS selector, attribute]
            ->rules([
                // Poster image URL
                "src"=>array(".board-wrapper dd img.board-img","data-src"),
                // Film title
                "name"=>array(".board-wrapper dd .movie-item-info .name","html"),
                // Leading actors
                "star"=>array(".board-wrapper dd .movie-item-info .star","html"),
                // Release date
                "releasetime"=>array(".board-wrapper dd .movie-item-info .releasetime","html"),
            ])
            ->query()->getData();
        $excel_array=$data->all();
        $city = [];
        foreach($excel_array as $k=>$v) {
            // Look for an existing row with the same image URL
            $find = Db::name("article")->where("src", "=", $v["src"])->find();
            if (empty($find)) {
                // Not stored yet, so queue it for insertion
                $city[] = [
                    'src' => $v['src'],
                    'name' => $v['name'],
                    'star' => $v['star'],
                    'releasetime' => $v['releasetime'],
                ];
            }
        }
        // Insert only when this page produced new rows
        if (!empty($city)){
            Db::name("article")->insertAll($city);
        }
        // Pause for 2 seconds before requesting the next page
        sleep(2);
    }
}
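The loop above runs one find() query per scraped row. As an alternative sketch, the duplicate check can be done in memory against a list of src values that are already stored. The helper name filterNewRows and the sample data are hypothetical; in the controller the existing values could be loaded once with something like Db::name("article")->column("src"), assuming the table is small enough to hold in memory.

```php
<?php
// Hypothetical helper: keep only rows whose "src" is not already stored.
function filterNewRows(array $rows, array $existingSrcs): array
{
    // array_flip() turns the list into a lookup table keyed by src.
    $seen = array_flip($existingSrcs);
    $fresh = [];
    foreach ($rows as $row) {
        if (!isset($seen[$row['src']])) {
            $fresh[] = $row;
            // Remember it so duplicates inside the same batch are skipped too.
            $seen[$row['src']] = true;
        }
    }
    return $fresh;
}

// Made-up data, for illustration only.
$existing = ['a.jpg'];
$rows = [
    ['src' => 'a.jpg', 'name' => 'A'],
    ['src' => 'b.jpg', 'name' => 'B'],
    ['src' => 'b.jpg', 'name' => 'B again'],
];
$fresh = filterNewRows($rows, $existing);
```

This trades one query per row for a single query per run, at the cost of keeping all stored src values in memory.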