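Garbled output usually has one of the following causes:
1. The page is fetched successfully and only the content is garbled: convert the encoding with iconv() or mb_convert_encoding().
For example: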
header('content-type:text/html;charset=utf-8;');
$url = 'http://weibo.com/5589516034/profile?topnav=1&wvr=6&is_all=1';
$file = file_get_contents($url);
// Convert the fetched data from GB2312 to UTF-8, then print it
$output = iconv('gb2312', 'utf-8', $file);
var_dump($output);
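If some pages are already UTF-8, converting them again from GB2312 would corrupt them. Below is a minimal sketch of a guarded conversion using mb_check_encoding() and mb_convert_encoding(). The helper name toUtf8 and the fallback encoding 'CP936' (mbstring's name for GBK, a superset of GB2312) are assumptions for illustration and are not part of the snippet above.

```php
<?php
// Hypothetical helper: return $raw as UTF-8.
// Assumes any input that is not valid UTF-8 is encoded in $fallback.
function toUtf8(string $raw, string $fallback = 'CP936'): string
{
    // Already valid UTF-8: leave it untouched to avoid double conversion
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise convert from the assumed source encoding
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

// Usage: simulate a GBK-encoded page body, then normalise it
$gbkBytes = mb_convert_encoding('中文', 'CP936', 'UTF-8');
echo toUtf8($gbkBytes), "\n";
```

The check only tells you whether the bytes are valid UTF-8. It cannot tell which legacy encoding a page really uses, so the fallback still has to match the site.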
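2. When the content cannot be fetched at all, there are two cases: the site uses hotlink protection, or the response is compressed and has to be decompressed.
2.1 Hotlink protection is the easier case: make the request look like it comes from a browser.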
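A minimal sketch of simulating a browser with cURL, assuming the site checks the Referer and User-Agent headers. The helper names browserCurlOptions and fetchAsBrowser, the User-Agent string, and the example.com URLs are placeholders, not values from the original post.

```php
<?php
// Hypothetical helper: cURL options that make the request resemble a browser visit
function browserCurlOptions(string $url, string $referer): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects like a browser would
        CURLOPT_REFERER        => $referer, // many hotlink checks only look at this header
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    ];
}

// Hypothetical helper: returns the response body, or false on failure
function fetchAsBrowser(string $url, string $referer)
{
    $ch = curl_init();
    curl_setopt_array($ch, browserCurlOptions($url, $referer));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Usage (placeholder URLs):
// $html = fetchAsBrowser('http://example.com/page.html', 'http://example.com/');
```

The Referer is normally a page on the same site as the resource being fetched.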
2.2 When the response has to be decompressed, we can use the following approach:
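<?php
header("content-type:text/html;charset=utf-8");
$url = "http://wthrcdn.etouch.cn/WeatherApi?city=%E5%8C%97%E4%BA%AC";
// The compress.zlib:// wrapper decompresses the gzip-encoded response while it is read
$xml = simplexml_load_file("compress.zlib://".$url);
// Round-trip through JSON to turn the SimpleXMLElement into a plain array
$json = json_encode($xml);
$arr = json_decode($json, true);
print_r($arr);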
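If the body has already been downloaded as a string (for example with file_get_contents()), it can also be decompressed afterwards. This is a minimal sketch that assumes the server sent gzip data. The helper name maybeGunzip is hypothetical, and the zlib extension must be enabled.

```php
<?php
// Hypothetical helper: decompress $body if it looks like gzip, otherwise return it unchanged
function maybeGunzip(string $body): string
{
    // gzip streams start with the magic bytes 0x1f 0x8b
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body;
    }
    $decoded = gzdecode($body);
    // gzdecode() returns false on corrupt input; fall back to the raw bytes
    return $decoded === false ? $body : $decoded;
}

// Usage: simulate a compressed response, then decode it
$compressed = gzencode('<city>北京</city>');
echo maybeGunzip($compressed), "\n";
```

With cURL, setting CURLOPT_ENCODING to an empty string asks libcurl to request and decode every compression format it supports, which makes a manual step like this unnecessary.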
When the page requires a cookie-authenticated login
If none of the above solves the problem, use the following approach.
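A minimal sketch of sending a browser cookie along with the request, assuming the headers are passed as a $header array through cURL's CURLOPT_HTTPHEADER option. The helper name fetchWithHeader, the User-Agent string, and the cookie value are placeholders for illustration.

```php
<?php
// Placeholder: paste the Cookie value copied from a logged-in browser session here
$cookie = 'PASTE_COOKIE_FROM_BROWSER_HERE';

// Raw header lines sent with the request
$header = [
    'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'Cookie: ' . $cookie,
];

// Hypothetical helper: GET $url with the given header lines.
// Returns the response body, or false on failure.
function fetchWithHeader(string $url, array $header)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Usage:
// echo fetchWithHeader('http://weibo.com/5589516034/profile?topnav=1&wvr=6&is_all=1', $header);
```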
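The cookie entry in the $header array above is obtained by logging in with a browser, opening the developer console, and pasting the cookie value in.
Cookies obtained this way expire after a limited time.
The following simulated login obtains a cookie that is saved to a file and can be reused over a longer period.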
<?php
header('Content-type:text/html;charset=utf-8');

$loginUrl = 'https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.15)&_=1403138799543';

// Create a folder named "cookie" next to this script, and inside it create
// the files cookie1.txt, cookie2.txt and cookie3.txt.
$cookie_file1 = dirname(__FILE__).'/cookie/cookie1.txt';
$cookie_file2 = dirname(__FILE__).'/cookie/cookie2.txt';
$cookie_file3 = dirname(__FILE__).'/cookie/cookie3.txt';

$u = 'your passport account here'; // must not contain Chinese characters
$p = 'your passport password here';
$username = base64_encode($u);
$password = $p;

// Log in to Sina Passport
$loginData = array();
$loginData['entry'] = 'sso';
$loginData['gateway'] = '1';
$loginData['from'] = 'null';
$loginData['savestate'] = '30';
$loginData['useticket'] = '0';
$loginData['pagerefer'] = '';
$loginData['vsnf'] = '1';
$loginData['su'] = $username;
$loginData['service'] = 'sso';
$loginData['sp'] = $password;
$loginData['sr'] = '1920*1080';
$loginData['encoding'] = 'UTF-8';
$loginData['cdult'] = '3';
$loginData['domain'] = 'sina.com.cn';
$loginData['prelt'] = '0';
$loginData['returntype'] = 'TEXT';
$login = json_decode(loginPost($loginUrl, $loginData), true);

// Fetch the Weibo cookie: read cookies from cookie1.txt, save the new ones to cookie2.txt
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $login['crossDomainUrlList'][0]);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file2);
$return = curl_exec($ch);
curl_close($ch);

// Log in to Weibo with the cookie just obtained, following redirects automatically
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://weibo.com/at/weibo?topnav=1&wvr=5");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file2);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file3);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$return = curl_exec($ch);
//$info = curl_getinfo($ch);
curl_close($ch);
echo $return;

// Use a regular expression to extract the content you need
$preg = '#<title>(.*)</title>#iUs';
preg_match_all($preg, $return, $arr);
var_dump($arr);

// POST the login form and save the returned cookies to cookie1.txt.
// Note: certificate verification is disabled here.
function loginPost($url, $data) {
    global $cookie_file1;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($data));
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file1);
    $return = curl_exec($ch);
    curl_close($ch);
    return $return;
}