《PHP用CURL多线程并发采集》要点:
本文介绍了PHP用CURL多线程并发采集,希望对您有用。如果有疑问,可以联系我们。
相关主题:PHP开发
CURL多线程并发采集一个案例(本站注:其实不是真正的多线程):
$urls = [
'http://m.doudou360.com/bus/i/live.ashx?lid=1132',
'http://m.doudou360.com/bus/i/live.ashx?lid=5491',
'http://m.doudou360.com/bus/i/live.ashx?lid=4501',
'http://m.doudou360.com/bus/i/live.ashx?lid=4531',
'http://m.doudou360.com/bus/i/live.ashx?lid=1131',
'http://m.doudou360.com/bus/i/live.ashx?lid=5492',
'http://m.doudou360.com/bus/i/live.ashx?lid=4502',
'http://m.doudou360.com/bus/i/live.ashx?lid=4532',
];
$mh = curl_multi_init(); //1.初始化
$ch = [];
foreach ( $urls as $i => $url ) {
$ch[$i] = curl_init($url);
curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch[$i], CURLOPT_TIMEOUT, 5);
curl_setopt($ch[$i], CURLOPT_HTTPHEADER, ['Cookie:CurAreaCode=xiamen']);
curl_multi_add_handle($mh, $ch[$i]); //2.循环增加ch句柄到批处理会话mh
}
/*
下面注释代码是网上相关文章较多使用的一种curl_multi_exec执行方式:
do {
curl_multi_exec($mh, $active);
} while ( $active );
DO-WHILE在整个URL请求期内是死循环,当采集较多URL时容易导致CPU占用100%,不推荐使用!
*/
//推荐写法
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
if (curl_multi_select($mh) != -1) {
do {
$mrc = curl_multi_exec($mh, $active); //3.运行当前cURL句柄的子连接
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
}
foreach ( $urls as $i => $url ) {
$data[] = curl_multi_getcontent($ch[$i]); //4.采集数据
curl_multi_remove_handle($mh, $ch[$i]); //5.移除句柄资源
curl_close($ch[$i]); //6.关闭cURL会话
}
curl_multi_close($mh); //7.关闭一组cURL句柄
print_r($data);
API返回的是json数据,数据格式:Array
(
[success] => 1
[message] =>
[result] => Array
(
[name] => 1路
[cur] => Array
(
[lid] => 1182 //线路ID
[sid] => 1155
[dire] => 下行
[time] => Array
(
[0] => 05:50 //首班时间
[1] => 01:00 //末班时间
)
)
[oppo] => Array
(
[lid] => 1181 //反向线路ID
[sid] => 1154
)
[stas] => Array
(
[0] => 火车站小广场
[1] => 梧村车站
[2] => 金榜公园
[3] => 文灶
[4] => 后江埭
[5] => 二市
[6] => 斗西路口
[7] => 眼科医院
[8] => 中华城
[9] => 镇海路
[10] => 大生里
[11] => 博物馆
[12] => 理工学院(思明)
[13] => 厦大
)
[live] => Array //实时公交信息
(
[0] => Array
(
[0] => 6 //公交位置
[1] => 0 //公交状态 0:到达 1:开往
[2] => 2 //公交数量
)
[1] => Array
(
[0] => 13
[1] => 1
[2] => 1
)
)
[next] => 22:30 //下一班公交发车时间
)
)