Methods for Requesting Remote URL Content in PHP

github_zwl

PHP has two main ways to request remote URL content: fopen/file_get_contents and cURL.

1. Differences between fopen/file_get_contents and cURL

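(1) fopen/file_get_contents performs a fresh DNS lookup on every request and does not cache DNS results. cURL caches DNS results automatically, so repeated requests for pages or images on the same domain need only one lookup. This cuts the number of DNS queries considerably, which is one reason cURL usually performs better than fopen/file_get_contents.
(2) For HTTP requests, fopen/file_get_contents uses http_fopen_wrapper, which does not keep connections alive, whereas cURL can. When many URLs are requested in a row, cURL is therefore more efficient.
(3) cURL can simulate many kinds of requests, such as POSTing data or submitting forms, and lets you customize the request as needed. fopen/file_get_contents sends a GET request by default; other methods, headers, and a request body have to be supplied through a stream context, as the POST test in section 2 shows.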
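
The DNS caching and keep-alive points above only pay off when the same cURL handle is reused across requests. The following is a minimal sketch of that pattern, not code from the original article: fetch_all() is a hypothetical helper name, and the 10-second default timeout is an assumed value.

```php
<?php
// Sketch: fetch several URLs through one cURL handle so that libcurl can reuse
// its DNS cache and, where the server allows keep-alive, the open connection.
// fetch_all() is a hypothetical helper name, not part of the original article.
function fetch_all(array $urls, int $timeoutSeconds = 10): array
{
    $results = [];
    if ($urls === []) {
        return $results; // nothing to fetch, so no handle is needed
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    foreach ($urls as $url) {
        // Only the URL changes; all other options stay set on the handle.
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        // A failed transfer is recorded as null instead of aborting the loop.
        $results[$url] = ($body === false) ? null : $body;
    }

    return $results;
}
```

Calling curl_init() inside the loop instead would create a fresh handle for every URL and lose the reuse that points (1) and (2) describe.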

2. How to handle file_get_contents when the remote server is down. See this article: http://www.cnblogs.com/scofi/articles/3607529.html

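Our company often needs to call an HTTP API provided by a third party and display the returned information on a web page. The code looked like this: file_get_contents("http://example.com/").

One day the operations team reported that a server had gone down, and the cause was traced to the file_get_contents function. So how can a single function bring down a server?

A closer look showed that the third party's API had failed, and that failure is what took our server down.

Problem analysis: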

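Our code calls file_get_contents("http://example.com/") to fetch the content returned by a URL. If the third party's URL responds slowly or fails, the PHP process on our server keeps waiting for it. php.ini has a max_execution_time parameter that sets the maximum execution time of a PHP script, but under php-cgi (php-fpm) it does not help here; on non-Windows systems it counts only the script's own execution time, not time spent waiting on stream operations. What actually controls the maximum execution time of a PHP script is the following parameter in the php-fpm.conf configuration file: <value name="request_terminate_timeout">0s</value>. The default is 0 seconds, which means there is no limit and the script keeps running. As requests pile up, every php-cgi process ends up blocked in file_get_contents(), the Nginx+PHP web server can no longer handle new PHP requests, and Nginx returns "502 Bad Gateway" to users. CPU utilization reaches 100%, and after a while the server goes down.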

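Solving the problem:

Now that the cause is known, how do we fix it?

The first idea was to set PHP's timeout with set_time_limit so that the script would not hang. After the code went live, the server still went down, as if the setting had no effect. Further reading showed that set_time_limit limits the execution time of the PHP script, not the time file_get_contents spends reading a URL. Calling set_time_limit has the same effect as changing max_execution_time in php.ini.

To set a timeout for file_get_contents, use the timeout option of its resource $context parameter, as follows: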

 
    $opts = array(
        'http' => array(
            'method'  => "GET",
            'timeout' => 10,
        )
    );
    $context = stream_context_create($opts);
    $html = file_get_contents('http://www.example.com', false, $context);
    echo $html;

The timeout in this code is the timeout, in seconds, for file_get_contents when reading the URL.
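
A timeout only helps if the caller also handles the failure it produces: when the read times out or the host is unreachable, file_get_contents() returns false and emits a warning. The sketch below wraps the call accordingly. The name fetch_with_timeout() and the choice to return null on failure are assumptions made for illustration, not part of the original article.

```php
<?php
// Sketch: wrap file_get_contents() with a per-request timeout and turn
// failure (false) into null so callers can degrade gracefully.
// fetch_with_timeout() is a hypothetical helper name.
function fetch_with_timeout(string $url, float $timeoutSeconds = 10.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds, // read timeout in seconds
        ],
    ]);

    // @ suppresses the warning PHP emits on failure; the return value is checked instead.
    $body = @file_get_contents($url, false, $context);

    return ($body === false) ? null : $body;
}
```

A page could then call fetch_with_timeout('http://www.example.com', 3.0) and render a placeholder when the result is null, rather than leaving a worker process blocked on the third-party API.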

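Another reported way to change the URL read timeout is to change the value of default_socket_timeout in php.ini, or to call ini_set('default_socket_timeout', 10);. I have not tested this, so I do not know whether it works.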
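
For reference, the PHP manual describes the http context timeout option as falling back to default_socket_timeout when it is not set, and lists 60 seconds as that setting's default. On that basis, a minimal sketch of the untested approach could look like this; the value '10' is illustrative only.

```php
<?php
// Sketch: set the process-wide fallback timeout for socket-based streams.
// The documented default is 60 seconds; '10' here is an illustrative value.
$previous = ini_set('default_socket_timeout', '10');

// ini_set() returns the old value on success and false on failure.
if ($previous === false) {
    error_log('could not change default_socket_timeout');
}

// Later stream calls that pass no explicit 'timeout' context option
// are expected to fall back to this value.
$effective = ini_get('default_socket_timeout');
```

Unlike the per-request context option, this changes the fallback for every socket-based stream opened afterwards in the same request, so it is the blunter of the two tools.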

With this fix in place, the server no longer went down.

http://www.cnblogs.com/scofi/articles/3607533.html

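The previous article used stream_context_create to set the timeout for file_get_contents. So what exactly does this function do?

According to the documentation, stream_context_create creates and returns a stream context with the given options applied. It can be used with fopen(), file_get_contents(), and similar functions to set timeouts, proxy servers, request methods, and headers. That makes it considerably more powerful than a timeout setting alone. Let's test it: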
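
Before the POST test, here is a small sketch of how those options combine in a single context. The helper name build_http_context() and the proxy address tcp://proxy.example.com:8080 are hypothetical; the option keys (method, header, timeout, proxy, request_fulluri) are the ones listed in the PHP manual for the http wrapper.

```php
<?php
// Sketch: one context that combines timeout, request method, headers and a proxy.
// build_http_context() and the proxy address are hypothetical.
function build_http_context(string $method, array $headers, float $timeoutSeconds, ?string $proxy = null)
{
    $http = [
        'method'  => $method,
        'header'  => $headers,        // array of "Name: value" strings
        'timeout' => $timeoutSeconds, // read timeout in seconds
    ];

    if ($proxy !== null) {
        $http['proxy'] = $proxy;         // e.g. "tcp://proxy.example.com:8080"
        $http['request_fulluri'] = true; // send the full URI in the request line
    }

    return stream_context_create(['http' => $http]);
}

$context = build_http_context('GET', ['Accept: text/html'], 5.0, 'tcp://proxy.example.com:8080');

// stream_context_get_options() shows what the context actually holds.
$options = stream_context_get_options($context);
```

The resulting $context can be passed as the third argument of file_get_contents() or the fourth argument of fopen().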

request.php, the page that sends the request:

 
    <?php
    // Form fields to send, URL-encoded into the request body
    $data = array("name" => 'test_name', "content" => 'test_con');
    $data = http_build_query($data);

    $opts = array(
        'http' => array(
            'method'  => "POST",
            'header'  => "Content-type: application/x-www-form-urlencoded\r\n" .
                         "Content-length:" . strlen($data) . "\r\n" .
                         "Cookie: foo=bar\r\n" .
                         "\r\n",
            'content' => $data,
        )
    );

    $cxContext = stream_context_create($opts);
    $sFile = file_get_contents("http://127.0.0.1/reponse.php", false, $cxContext);
    echo $sFile;
    ?>

reponse.php, the page being requested:

 
    <?php
    // Dump whatever POST fields and cookies the request carried
    var_dump($_POST);
    var_dump($_COOKIE);
    ?>

The output is:

string(132) "array(2) { ["content"]=> string(8) "test_con" ["name"]=> string(9) "test_name" } array(1) { ["foo"]=> string(3) "bar" } "

This shows that file_get_contents can send POST data and cookies to the target URL and get the response content.
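
The test above shows the body coming back, but not whether the request succeeded. After an http:// call, PHP fills the local variable $http_response_header with the response header lines, and setting 'ignore_errors' => true in the http options makes the body available even for 4xx and 5xx responses. The sketch below pulls the status code out of such header lines; http_status_from_headers() is a hypothetical helper name.

```php
<?php
// Sketch: extract the HTTP status code from the header lines PHP collects
// for an http:// request. http_status_from_headers() is a hypothetical helper.
function http_status_from_headers(array $headerLines): ?int
{
    $status = null;
    foreach ($headerLines as $line) {
        // After redirects there may be several status lines; keep the last one.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m) === 1) {
            $status = (int) $m[1];
        }
    }
    return $status;
}
```

In request.php this could be called as http_status_from_headers($http_response_header) right after file_get_contents(), in the same scope as that call.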

3. Summary of cURL usage: http://www.cnblogs.com/scofi/articles/3607538.html

(1) Fetching data with cURL via GET

 
    <?php
    $url = 'http://www.example.com';

    // Initialize a cURL handle
    $ch = curl_init();

    // Set the URL to fetch
    curl_setopt($ch, CURLOPT_URL, $url);

    // Return the result as a string instead of printing it to the screen
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    // Follow redirects and fetch the final page
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    $data = curl_exec($ch);
    curl_close($ch);
    echo $data;
    ?>
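
The timeout problem from section 2 applies to cURL as well, since a slow remote server can still hold a worker process for a long time. The sketch below adds the two standard timeout options and checks for failure. The helper name curl_get(), its return shape, and the 3 and 10 second defaults are assumptions made for illustration.

```php
<?php
// Sketch: a GET request with explicit cURL timeouts and error reporting.
// curl_get() and its return shape are hypothetical.
function curl_get(string $url, int $connectTimeout = 3, int $totalTimeout = 10): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $totalTimeout,   // seconds allowed for the whole transfer
    ]);

    $body = curl_exec($ch);

    if ($body === false) {
        // curl_error() describes why the transfer failed (timeout, refused, DNS, ...).
        return ['ok' => false, 'error' => curl_error($ch), 'body' => null];
    }

    return ['ok' => true, 'error' => null, 'body' => $body];
}
```

A caller can branch on $result['ok'] and log $result['error'] instead of echoing an empty response.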

(2) Sending data with cURL via POST

 
    <?php
    // Send $arr_data as a URL-encoded POST body to $url and print the response
    function curl_post($url, $arr_data) {
        $post_data = http_build_query($arr_data);
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
        $data = curl_exec($ch);
        curl_close($ch);
        echo $data;
    }

    $arr_post = array(
        'name' => 'test_name',
        'age'  => 1
    );
    curl_post("http://www.example.com/", $arr_post);
    ?>

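(3) Fetching pages through a proxy. Why use a proxy? Take Google as an example: if you scrape Google's data too frequently within a short time, your requests stop succeeding because Google restricts your IP address. At that point you can switch to a proxy and try again.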

 
    <?php
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://google.com");
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    // Whether to tunnel through the HTTP proxy
    curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
    // The proxy address must be passed as a string
    curl_setopt($ch, CURLOPT_PROXY, '125.21.23.6:8080');
    // If the proxy requires a password, add this:
    // curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');

    $result = curl_exec($ch);
    curl_close($ch);
    ?>

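(4) Keeping the current site's session across the call. When implementing synchronized user login, the session has to be shared. To keep using the current site's session, put the session ID into the HTTP request.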

 
    <?php
    // $url is assumed to hold the target address; the original snippet does not define it.
    $session_str = session_name() . '=' . session_id() . '; path=/; domain=.example.com';

    // Write the session data and end the session
    session_write_close();

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Send the session cookie along with the request
    curl_setopt($ch, CURLOPT_COOKIE, $session_str);
    $ret = curl_exec($ch);
    curl_close($ch);
    ?>
