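I previously wrote an article about tracking monitoring URLs, "Checking ad monitoring URLs with the guzzlehttp library". The requirements have kept evolving, and the expected result is now a trace of every URI and status code along the way, so I decided to write a follow-up on that problem.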
1. Goal: track the redirect path and status information
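If you only need the redirect information, setting track_redirects to true inside the allow_redirects option is enough, and an on_redirect callback can record the details of each hop.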
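use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\UriInterface;

// Called once per redirect with the request that was redirected,
// the redirect response, and the URI being redirected to.
$onRedirect = function (
    RequestInterface $request,
    ResponseInterface $response,
    UriInterface $uri
) {
    echo 'Redirecting! ' . $request->getUri() . ' to ' . $uri . "\n";
};
$conf = [
    'allow_redirects' => [
        'max' => 10, // allow at most 10 redirects.
        'strict' => true, // use "strict" RFC compliant redirects.
        'referer' => true, // add a Referer header
        'on_redirect' => $onRedirect,
        'track_redirects' => true
    ]
];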
// Output
Redirecting! http://sohu.com to https://www.sohu.com/
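With track_redirects enabled, Guzzle also records the hops on the final response in the X-Guzzle-Redirect-History and X-Guzzle-Redirect-Status-History headers. The following is a minimal sketch of pairing those two lists into hops. pairRedirectHops is a hypothetical helper (not part of Guzzle), and the sample values are taken from the sohu.com example in this article. Note that the status of the final response is not part of these headers.

```php
<?php
declare(strict_types=1);

/**
 * Pair each redirect hop with the status code that triggered it.
 * Hypothetical helper for illustration, not a Guzzle API.
 *
 * $history  : values of X-Guzzle-Redirect-History (redirect targets, in order)
 * $statuses : values of X-Guzzle-Redirect-Status-History (status of each redirect response)
 */
function pairRedirectHops(string $startUrl, array $history, array $statuses): array
{
    $hops = [];
    $from = $startUrl;
    foreach ($history as $i => $to) {
        $hops[] = [
            'from' => $from,
            'to'   => $to,
            // The i-th status belongs to the response that sent us to the i-th target.
            'code' => isset($statuses[$i]) ? (int) $statuses[$i] : null,
        ];
        $from = $to;
    }
    return $hops;
}

// With a real response these arrays would come from
// $response->getHeader('X-Guzzle-Redirect-History') and
// $response->getHeader('X-Guzzle-Redirect-Status-History').
$hops = pairRedirectHops('http://sohu.com', ['https://www.sohu.com/'], ['307']);
echo json_encode($hops, JSON_UNESCAPED_SLASHES), "\n";
```

Each entry describes one redirect, so a request that was never redirected yields an empty list.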
2. Goal: track every request URL and status information
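The configuration above records the redirect hops, but the redirect callback never fires for the final response, so the final URL and its status code are not captured together. The goal of this article is a trace of the form [[url, code], [url, code]...], where the URLs include both the initially requested URL and the final URL reached. The request options include on_stats, a callback invoked each time the handler finishes sending a request, and the requirement above can be built on it.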
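use GuzzleHttp\TransferStats;

$client->request('GET', 'http://httpbin.org/stream/1024', [
    'on_stats' => function (TransferStats $stats) {
        echo $stats->getEffectiveUri() . "\n"; // URI that was actually requested
        echo $stats->getTransferTime() . "\n"; // transfer time in seconds
        var_dump($stats->getHandlerStats()); // handler-specific transfer details
    }
]);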
Part of the implementation is shown below:
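$trace = [];
$conf = [
    'allow_redirects' => [
        'max' => 5, // allow at most 5 redirects.
        'strict' => true, // use "strict" RFC compliant redirects.
        'referer' => true, // add a Referer header
        'track_redirects' => true
    ],
    'timeout' => 10,
    'headers' => $this->headers,
    // Guzzle 6 and later read cURL options from the top-level 'curl' key;
    // the nested 'config' => ['curl' => ...] form belongs to Guzzle 5.
    'curl' => [
        CURLOPT_SSL_VERIFYHOST => false, // disables host name verification
        CURLOPT_MAXFILESIZE => 1024 * 1024 // refuse downloads larger than 1 MiB
    ],
    // Fires once per transfer, so every redirect hop and the final response are recorded.
    'on_stats' => function (TransferStats $stats) use (&$trace) {
        // No response exists when the transfer itself failed (for example on a timeout).
        if (!$stats->hasResponse()) {
            return;
        }
        $trace[] = [
            'code' => $stats->getResponse()->getStatusCode(),
            'url' => (string) $stats->getEffectiveUri()
        ];
    }
];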
// Result
[
{"code":307,"url":"http:\/\/sohu.com"},
{"code":200,"url":"https:\/\/www.sohu.com\/"}
]
A trace in this form is easy to display and easy to understand when reviewing monitoring results.
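To try the on_stats approach without network access, here is a rough sketch built on Guzzle's MockHandler. It assumes guzzlehttp/guzzle (version 6 or 7) is installed via Composer; the example.com URLs and the queued 307 and 200 responses are made up for illustration and stand in for a real server.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install of guzzlehttp/guzzle

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;
use GuzzleHttp\TransferStats;

// Canned responses: one redirect, then the final page.
$mock = new MockHandler([
    new Response(307, ['Location' => 'https://www.example.com/']),
    new Response(200, [], 'ok'),
]);

// HandlerStack::create() adds the default middleware, including redirect handling.
$client = new Client(['handler' => HandlerStack::create($mock)]);

$trace = [];
$client->request('GET', 'http://example.com', [
    'allow_redirects' => ['max' => 5, 'strict' => true],
    'on_stats' => function (TransferStats $stats) use (&$trace) {
        // Skip transfers that produced no response at all.
        if (!$stats->hasResponse()) {
            return;
        }
        $trace[] = [
            'code' => $stats->getResponse()->getStatusCode(),
            'url'  => (string) $stats->getEffectiveUri(),
        ];
    },
]);

echo json_encode($trace, JSON_UNESCAPED_SLASHES), "\n";
```

The trace should have one entry per response the mock handed out, in the same shape as the result shown earlier: the redirect hop first, then the final response.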
References
1 https://guzzle-cn.readthedocs.io/zh_CN/latest/overview.html
2 https://guzzle-cn.readthedocs.io/zh_CN/latest/request-options.html#on-stats
3 https://blog.csdn.net/cjqh_hao/article/details/107880981