DEDE collection: collected articles have duplicate titles, URLs, and content
File path: include/dedehttpdown.class.php
The URL to be collected looks like this:
http://www.a.com/b.php?url=a/resources/43/356.html
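The URL that the DEDE source code actually requests becomes the following, without the trailing query parameters. Every detail page that differs only in its query therefore resolves to the same address, which is why the collected articles come out duplicated: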
http://www.a.com/b.php
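The effect can be reproduced outside DEDE with a minimal sketch. It assumes the URL is split with PHP's `parse_url` and then rebuilt from scheme, host, and path only; the helper name `rebuildWithoutQuery` and the second sample URL are made up for illustration and are not part of the DEDE source.

```php
<?php
// Hypothetical helper (not part of DEDE): rebuilds a URL from scheme,
// host and path only, so the query component is silently dropped.
function rebuildWithoutQuery(string $url): string
{
    $parts = parse_url($url);
    return $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '');
}

// Two different detail pages that differ only in the query string.
$first  = 'http://www.a.com/b.php?url=a/resources/43/356.html';
$second = 'http://www.a.com/b.php?url=a/resources/41/59.html';

// Both calls return http://www.a.com/b.php, so the two pages are
// indistinguishable once the query is gone.
echo rebuildWithoutQuery($first), "\n";
echo rebuildWithoutQuery($second), "\n";

// The query is still available as a separate component of the parsed URL.
echo parse_url($first, PHP_URL_QUERY), "\n";
```

Because both addresses collapse to the same script, each request returns the same page regardless of which detail page was intended.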
function PrivateStartSession($requestType="GET")
{
    if ($this->m_scheme == "https") {
        $this->m_port = "443";
    }
    // Original concatenation: it leaves out the query, so the URL is incomplete
    // and detail pages that carry parameters cannot be collected.
    // Example of the parsed URL parts:
    // Array ( [scheme] => http [host] => www.xx.com [path] => /b.php [query] => url=a%2Fresources%2F41%2F59.html )
    $url = $this->m_scheme.'://'.$this->m_host.':'.$this->m_port.$this->m_path;
    // Add this line: collect with the original URL directly.
    // It overwrites the value built on the previous line.
    $url = $this->m_url;
    if (function_exists('curl_init') && function_exists('curl_exec')) {
        $this->m_ch = curl_init();
        curl_setopt($this->m_ch, CURLOPT_URL, $url);
        // ...
So the code is modified as shown above.
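The change above uses the original URL as-is. As a point of comparison only, the following sketch keeps the piece-by-piece concatenation but appends the query when one exists. It is not taken from the DEDE source: the helper name `rebuildWithQuery` and the default-port rule (443 for https, 80 otherwise) are assumptions, and it expects a well-formed absolute URL.

```php
<?php
// Hypothetical helper (not part of DEDE): rebuilds a URL from its parsed
// parts and keeps the query string when one is present.
function rebuildWithQuery(string $url): string
{
    $parts  = parse_url($url);
    $scheme = $parts['scheme'] ?? 'http';
    // Assumed default ports: 443 for https, 80 otherwise.
    $port   = $parts['port'] ?? ($scheme === 'https' ? 443 : 80);

    $rebuilt = $scheme . '://' . $parts['host'] . ':' . $port . ($parts['path'] ?? '/');
    if (isset($parts['query']) && $parts['query'] !== '') {
        $rebuilt .= '?' . $parts['query'];
    }
    return $rebuilt;
}

// Prints http://www.a.com:80/b.php?url=a/resources/43/356.html
echo rebuildWithQuery('http://www.a.com/b.php?url=a/resources/43/356.html'), "\n";
```

Using the original URL directly, as in the modification above, is the smaller change; a rebuild like this would only matter if the separately parsed parts need to stay authoritative.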