How to Modify PHPCMS V9 So It Can Collect HTTPS Pages

Latest recommended article published on 2023-12-17 12:09:21

qq_37308775

Original link: https://www.42rw.com/jiaocheng/202202/79.html

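Have you noticed that the PHPCMS V9 collection module can collect some pages but not others? After looking into it, I found the pattern. Pages served over plain HTTP are collected without problems, but pages served over SSL-encrypted HTTPS are not. There is always a way around a problem like this, so I am recording the fix here for anyone else who runs into it.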

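The main reason PHPCMS fails on HTTPS pages is that it fetches pages with file_get_contents(). On many servers, that function cannot open https:// URLs. This happens, for example, when PHP's openssl extension is not loaded, so no https stream wrapper is available, or when certificate verification fails. Fetching the page with curl instead avoids this. (PHP's curl extension must be enabled; you can check for it in the phpinfo() output.)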
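
To see whether this applies to your own server, you can run a small check like the one below. It is a sketch and not part of PHPCMS; the function name https_wrapper_status() is made up for illustration. It reports three things:

- whether file_get_contents() may open remote URLs at all (allow_url_fopen);
- whether the https:// stream wrapper is registered;
- whether the openssl extension, which provides that wrapper, is loaded.

```php
<?php
// Hypothetical diagnostic (not part of PHPCMS): can file_get_contents() open https:// URLs here?
function https_wrapper_status(): array
{
    return [
        // file_get_contents() may only open remote URLs when allow_url_fopen is on
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // the https:// wrapper is registered only when the openssl extension is loaded
        'https_wrapper'   => in_array('https', stream_get_wrappers(), true),
        'openssl_loaded'  => extension_loaded('openssl'),
    ];
}

// Print one "name  yes/no" line per check
foreach (https_wrapper_status() as $check => $ok) {
    echo str_pad($check, 16), $ok ? 'yes' : 'no', PHP_EOL;
}
```

If allow_url_fopen or https_wrapper shows "no", file_get_contents() cannot fetch HTTPS pages on that server, which matches the symptom described above.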

1. BT Panel (宝塔) users can install curl from the SSH terminal with the following command:

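yum -y install curl

Note that this installs the system curl package. PHP's own curl extension must also be enabled, so that curl_init() exists, before the code in the next step will work.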
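
To confirm that PHP itself can use curl for HTTPS, you can run a quick check like this. It is again a hypothetical helper, not part of PHPCMS. It tests that the curl extension is loaded and that the underlying libcurl was built with SSL/TLS support.

```php
<?php
// Hypothetical check (not part of PHPCMS): can PHP's curl extension fetch https:// URLs?
function curl_https_ready(): bool
{
    // curl_version() only exists when PHP's curl extension is loaded
    if (!function_exists('curl_version')) {
        return false;
    }
    $info = curl_version();
    // the 'features' bitmask has CURL_VERSION_SSL set when libcurl was built with SSL/TLS
    return ($info['features'] & CURL_VERSION_SSL) !== 0;
}

echo curl_https_ready() ? "curl can fetch HTTPS\n" : "curl extension missing or built without SSL\n";
```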

2. Open phpcms\modules\collection\classes\collection.class.php and add the following method to the class:

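protected static function curl_request($url) {
    if (!function_exists('curl_init')) {
        throw new Exception('PHP curl extension is not installed');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);              // do not include response headers in the result
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);      // return the page as a string instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);      // follow HTTP redirects
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);  // skip certificate checks (insecure, but avoids CA bundle problems)
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);  // skip host name checks as well
    $result = curl_exec($ch);                         // page HTML, or false on failure
    curl_close($ch);
    return $result;
}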
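
The method above turns off certificate checks and returns false without saying why a request failed. If you want more information when a page cannot be fetched, a variant along these lines could be used instead.

This is a sketch under my own assumptions, not PHPCMS code: the name curl_fetch(), the 30-second default timeout and the $verifyTls switch are all illustrative. Certificate verification stays on by default. Pass false only if the server lacks a CA bundle, which is the trade-off the original method makes.

```php
<?php
// Hypothetical variant of curl_request() (not part of PHPCMS): same curl options,
// plus a timeout, error reporting, and TLS verification on by default.
function curl_fetch(string $url, bool $verifyTls = true, int $timeout = 30): string
{
    if (!function_exists('curl_init')) {
        throw new RuntimeException('PHP curl extension is not installed');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);          // body only, no response headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);      // give up after $timeout seconds
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, $verifyTls);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, $verifyTls ? 2 : 0);
    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    if (PHP_VERSION_ID < 80000) {
        curl_close($ch); // on PHP 8+ the handle is freed when $ch goes out of scope
    }
    if ($body === false || $errno !== 0) {
        throw new RuntimeException("curl error $errno: $error");
    }
    return $body;
}

// Usage (not run here): $html = curl_fetch('https://www.example.com/list.html');
```

Throwing an exception instead of returning false means the caller sees the actual curl error, such as a timeout or a certificate problem, rather than an empty collection result.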

3. In the same file, find the get_html function.

Replace:

protected static function get_html($url, &$config) {
    if (!empty($url) && $html = @file_get_contents($url)) {
        if ($syscharset != $config['sourcecharset'] && $config['sourcetype'] != 4) {
            $html = iconv($config['sourcecharset'], CHARSET.'//TRANSLIT//IGNORE', $html);
        }
        return $html;
    } else {
        return false;
    }
}

With:

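protected static function get_html($url, &$config) {
    $url = trim($url);
    if (empty($url)) {
        return false;
    }
    // HTTPS pages are fetched with curl; other URLs still use file_get_contents()
    if (strtolower(substr($url, 0, 6)) == 'https:') {
        $html = self::curl_request($url);
    } else {
        $html = @file_get_contents($url);
    }
    if (!$html) {
        return false;
    }
    // $syscharset was never defined in the original code; use the CHARSET constant
    // that the iconv() call below already converts to
    $syscharset = CHARSET;
    if ($syscharset != $config['sourcecharset'] && $config['sourcetype'] != 4) {
        $html = iconv($config['sourcecharset'], CHARSET.'//TRANSLIT//IGNORE', $html);
    }
    return $html;
}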
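
The new get_html() makes two decisions: which fetcher to use, and whether to convert the charset. Both can be pulled out and tested on their own. The sketch below does that with two hypothetical helpers, pick_fetcher() and to_site_charset(), which are not part of PHPCMS.

Compared with a plain substr() prefix test, parse_url() plus strtolower() also treats an upper-case "HTTPS://" as HTTPS. It also does not mistake a string that merely starts with the letters "https" (for example "httpsfoo") for an HTTPS URL.

```php
<?php
// Hypothetical helpers (not PHPCMS code) isolating the two decisions get_html() makes.

// Route a URL to a fetcher: 'curl' for https, 'file_get_contents' otherwise.
function pick_fetcher(string $url): string
{
    // parse_url() returns null (or false) when there is no scheme; cast to '' in that case
    $scheme = strtolower((string) parse_url(trim($url), PHP_URL_SCHEME));
    return $scheme === 'https' ? 'curl' : 'file_get_contents';
}

// Convert fetched HTML to the site charset, mirroring the iconv() call in get_html().
// sourcetype 4 is left untouched because the original code skips it too.
function to_site_charset(string $html, string $sourceCharset, string $siteCharset, int $sourcetype): string
{
    if (strcasecmp($sourceCharset, $siteCharset) === 0 || $sourcetype == 4) {
        return $html;
    }
    $converted = iconv($sourceCharset, $siteCharset . '//TRANSLIT//IGNORE', $html);
    // fall back to the raw bytes if iconv() reports a failure
    return $converted === false ? $html : $converted;
}

echo pick_fetcher('https://www.example.com/list.html'), PHP_EOL; // curl
echo pick_fetcher('http://www.example.com/list.html'), PHP_EOL;  // file_get_contents
```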

All done. Run the collection task again to check the result.