QueryList 4 tutorial:
https://doc.querylist.cc/site/index/doc/45
Run the following Composer command in the ThinkPHP5 project root to install QueryList:
composer require jaeger/querylist
If the following error appears:
Loading composer repositories with package information
Updating dependencies (including require-dev)
Authentication required (packagist.phpcomposer.com):
Username:
If this happens, point Composer's global Packagist repository at a different mirror:
composer config -g repo.packagist composer https://packagist.laravel-china.org
1- The following example uses QueryList in the Index controller:
use QL\QueryList;
use think\Db; // required for Db::name() below

public function qulist()
{
    $data = QueryList::get('http://maoyan.com/board/4')
        // Set the scraping rules: field name => [CSS selector, attribute]
        ->rules([
            // Image URL
            "src" => array(".board-wrapper dd img.board-img", "data-src"),
            // Movie title
            "name" => array(".board-wrapper dd .movie-item-info .name", "html"),
            // Leading actors
            "star" => array(".board-wrapper dd .movie-item-info .star", "html"),
            // Release date
            "releasetime" => array(".board-wrapper dd .movie-item-info .releasetime", "html"),
        ])
        ->query()->getData();
    // Convert the result collection into a plain array
    $excel_array = $data->all();
    $city = [];
    foreach ($excel_array as $k => $v) {
        $city[$k]['src'] = $v['src'];
        $city[$k]['name'] = $v['name'];
        $city[$k]['star'] = $v['star'];
        $city[$k]['releasetime'] = $v['releasetime'];
    }
    // Insert all rows into the article table in one call
    Db::name("article")->insertAll($city);
}
If no error occurs, the scraped rows are inserted into the database.
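Values scraped with the "html" attribute may carry surrounding whitespace, and a row may lack a field if a selector matches nothing. The following is a minimal sketch, in plain PHP, of cleaning rows before inserting them. The helper name normalizeRows and the sample data are hypothetical and are not part of QueryList or ThinkPHP.

```php
<?php
// Hypothetical helper, not part of QueryList or ThinkPHP.
// Keeps only the listed columns of each scraped row, trims whitespace,
// and fills missing columns with an empty string.
function normalizeRows(array $rows, array $columns): array
{
    $clean = [];
    foreach ($rows as $row) {
        $item = [];
        foreach ($columns as $column) {
            $item[$column] = isset($row[$column]) ? trim((string) $row[$column]) : '';
        }
        $clean[] = $item;
    }
    return $clean;
}

// Made-up sample data standing in for the result of $data->all().
$sample = [
    ['src' => 'a.jpg', 'name' => ' Movie A ', 'star' => "\n Actor X \n", 'extra' => 'ignored'],
    ['src' => 'b.jpg', 'name' => 'Movie B'],
];
$rows = normalizeRows($sample, ['src', 'name', 'star', 'releasetime']);
```

In the controller, the cleaned array could be passed to insertAll() in place of the hand-built $city array.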
2- To scrape the following pages as well, build each page URL from the site's pagination pattern (here the offset parameter grows by 10 per page):
public function qulist()
{
    for ($i = 0; $i < 20; $i++) {
        // Page offset: 0, 10, 20, ...
        $page = $i * 10;
        $data = QueryList::get('http://maoyan.com/board/4?offset=' . $page)
            // Set the scraping rules: field name => [CSS selector, attribute]
            ->rules([
                // Image URL
                "src" => array(".board-wrapper dd img.board-img", "data-src"),
                // Movie title
                "name" => array(".board-wrapper dd .movie-item-info .name", "html"),
                // Leading actors
                "star" => array(".board-wrapper dd .movie-item-info .star", "html"),
                // Release date
                "releasetime" => array(".board-wrapper dd .movie-item-info .releasetime", "html"),
            ])
            ->query()->getData();
        // Convert the result collection into a plain array
        $excel_array = $data->all();
        $city = [];
        foreach ($excel_array as $k => $v) {
            $city[$k]['src'] = $v['src'];
            $city[$k]['name'] = $v['name'];
            $city[$k]['star'] = $v['star'];
            $city[$k]['releasetime'] = $v['releasetime'];
        }
        // Insert the rows of this page
        Db::name("article")->insertAll($city);
    }
}
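The offset pattern used in the loop above can be isolated into a small function, which makes the list of URLs easy to inspect before any request is sent. This is only a sketch; buildPageUrls is a hypothetical helper name, and the page size of 10 is taken from the loop above.

```php
<?php
// Hypothetical helper: builds the page URLs for an offset-based pager.
// $pageSize is the step between offsets (10 in the loop above).
function buildPageUrls(string $baseUrl, int $pages, int $pageSize = 10): array
{
    $urls = [];
    for ($i = 0; $i < $pages; $i++) {
        $urls[] = $baseUrl . '?offset=' . ($i * $pageSize);
    }
    return $urls;
}

// First three pages of the board used in this tutorial.
$urls = buildPageUrls('http://maoyan.com/board/4', 3);
```

Each entry of $urls could then be passed to QueryList::get() in a foreach loop instead of concatenating the offset inline.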
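3- Scrape the following pages while checking whether each record already exists in the database: skip it if it does, insert it if it does not, and sleep a few seconds between requests:
public function qulist()
{
    set_time_limit(0); // Lift PHP's execution time limit (30 seconds by default) so the script is not aborted
    for ($i = 0; $i < 20; $i++) {
        // Page offset: 0, 10, 20, ...
        $page = $i * 10;
        $data = QueryList::get('http://maoyan.com/board/4?offset=' . $page)
            // Set the scraping rules: field name => [CSS selector, attribute]
            ->rules([
                // Image URL
                "src" => array(".board-wrapper dd img.board-img", "data-src"),
                // Movie title
                "name" => array(".board-wrapper dd .movie-item-info .name", "html"),
                // Leading actors
                "star" => array(".board-wrapper dd .movie-item-info .star", "html"),
                // Release date
                "releasetime" => array(".board-wrapper dd .movie-item-info .releasetime", "html"),
            ])
            ->query()->getData();
        // Convert the result collection into a plain array
        $excel_array = $data->all();
        $city = [];
        foreach ($excel_array as $k => $v) {
            // Look for an existing row with the same image URL
            $find = Db::name("article")->where("src", "=", $v["src"])->find();
            if (empty($find)) {
                // Not stored yet: queue the row for insertion
                $city[$k]['src'] = $v['src'];
                $city[$k]['name'] = $v['name'];
                $city[$k]['star'] = $v['star'];
                $city[$k]['releasetime'] = $v['releasetime'];
            }
        }
        // Insert only when this page produced new rows
        if (!empty($city)) {
            Db::name("article")->insertAll($city);
        }
        // Pause for 2 seconds before requesting the next page
        sleep(2);
    }
}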
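The loop above runs one lookup query per scraped row. As a possible alternative, the duplicate check can be done in memory against a list of src values that are already stored. The sketch below is plain PHP; filterNewRows is a hypothetical helper, and the sample data is made up. In ThinkPHP5 the list of existing values could presumably be fetched once per page with a where(..., 'in', ...) condition followed by column('src'), but that part is an assumption and is not shown here.

```php
<?php
// Hypothetical helper: returns only the rows whose key value is not already known.
// $existingKeys is the list of values already stored (for example, src URLs).
function filterNewRows(array $rows, array $existingKeys, string $key = 'src'): array
{
    // Flip the list so membership checks are simple isset() lookups.
    $seen = array_flip($existingKeys);
    $new = [];
    foreach ($rows as $row) {
        $value = $row[$key];
        if (!isset($seen[$value])) {
            $new[] = $row;
            $seen[$value] = true; // also skips duplicates within the same page
        }
    }
    return $new;
}

// Made-up sample: 'a.jpg' is already stored, 'b.jpg' appears twice on the page.
$scraped = [
    ['src' => 'a.jpg', 'name' => 'A'],
    ['src' => 'b.jpg', 'name' => 'B'],
    ['src' => 'b.jpg', 'name' => 'B again'],
];
$fresh = filterNewRows($scraped, ['a.jpg']);
```

Because the result is re-indexed from 0, it can be handed to insertAll() directly when it is not empty.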