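PHP supports libcurl, a library that lets you connect to and communicate with many kinds of servers over many protocols. libcurl currently supports the http, https, ftp, gopher, telnet, dict, file, and ldap protocols. It also supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploads (which can also be done with PHP's ftp extension), HTTP form-based uploads, proxies, cookies, and user authentication.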
As I understand it, using curl comes down to four steps:
- 1. $ch = curl_init();
- 2. curl_setopt($ch,CURLOPT_URL,$url);
- 3. curl_exec($ch);
- 4. curl_close($ch);
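The four steps above can be wrapped into one small function. This is only a minimal sketch: the name fetch_url is something I made up for illustration, and the error handling (checking for false and reading curl_error) is my own addition on top of the four steps.

```php
<?php
// Hypothetical helper (the name fetch_url is illustrative, not part of PHP).
function fetch_url(string $url): string
{
    $ch = curl_init();                               // 1. create the handle
    curl_setopt($ch, CURLOPT_URL, $url);             // 2. set options
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  //    return the body instead of printing it
    $body = curl_exec($ch);                          // 3. run the request
    if ($body === false) {
        // curl_exec returns false on failure; curl_error explains why
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl request failed: $error");
    }
    curl_close($ch);                                 // 4. release the handle
    return $body;
}

// Usage (needs network access, so it is left commented out):
// echo fetch_url('http://www.baidu.com');
```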
curl_init
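curl_init() initializes a new session and returns a cURL handle for use with curl_setopt(), curl_exec(), and curl_close(). If the optional URL parameter is provided, CURLOPT_URL is set to its value; otherwise you can set it manually with curl_setopt().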
curl_setopt
bool curl_setopt ( resource $ch , int $option , mixed $value ) The second parameter accepts a very large number of options, and they are hard to remember, so a few examples help.
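CURLOPT_URL: the URL to request.
CURLOPT_HEADER: 0 or non-zero. With 0 the response headers are not included in the output; with a non-zero value they are (by default they are not included).
CURLOPT_NOBODY: 0 or non-zero. With 0 the response body is output (the default); with a non-zero value the body is left out.
CURLOPT_RETURNTRANSFER: 0 or non-zero. With a non-zero value the response is not printed directly but is returned by curl_exec instead (default 0).
Of the four options above, CURLOPT_NOBODY is the least useful; the others matter a lot. CURLOPT_URL needs no explanation. CURLOPT_HEADER lets you inspect the response headers to see what came back, and CURLOPT_RETURNTRANSFER stores the response in a variable so that you can extract whatever you need from it.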
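When CURLOPT_HEADER and CURLOPT_RETURNTRANSFER are both on, curl_exec returns the headers and the body in one string. Here is a rough sketch of how the two could be separated using curl_getinfo with CURLINFO_HEADER_SIZE. The function names split_response and fetch_with_headers are my own, made up for illustration.

```php
<?php
// Hypothetical helper: cut a raw response into headers and body
// at the byte offset reported by CURLINFO_HEADER_SIZE.
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Hypothetical wrapper showing where the header size comes from.
function fetch_with_headers(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return instead of print
    $raw = curl_exec($ch);
    if ($raw === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl request failed: $error");
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);
    return split_response($raw, $headerSize);
}
```

With the parts separated, the headers can be checked for the status line or cookies while the body goes on to whatever parsing is needed.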
Requesting a static page
$url = "http://www.baidu.com";
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_HEADER,1);//include the response headers in the output
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);//have curl_exec return the response instead of printing it
$c = curl_exec($ch);
curl_close($ch);
echo $c;
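CURLOPT_POST: makes an HTTP POST request. The request's Content-Type defaults to application/x-www-form-urlencoded, just like an HTML form submission (this is my own translation of the manual; in case I got it wrong, here is the original English).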
TRUE to do a regular HTTP POST. This POST is the normal application/x-www-form-urlencoded kind, most commonly used by HTML forms.
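CURLOPT_POSTFIELDS: the data to post. The value can be an array, as with an HTML form POST, or a single string of key=value pairs joined together in the same format that appears in the request.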
The full data to post in a HTTP "POST" operation. To post a file, prepend a filename with @ and use the full path. This can either be passed as a urlencoded string like 'para1=val1&para2=val2&...' or as an array with the field name as key and field data as value.
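CURLOPT_COOKIEJAR: the name of the file in which cookies are saved once the connection ends.
CURLOPT_COOKIEFILE: the name of the file containing the cookie data. The cookie file can be in Netscape format, or simply plain HTTP-style headers dumped into a file.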
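To make the two forms of CURLOPT_POSTFIELDS concrete, here is a small sketch. As far as I know, a string is sent as application/x-www-form-urlencoded, while an array makes curl send multipart/form-data, so http_build_query is a handy way to turn an array into the string form. The function name post_form and the field values are made up for illustration.

```php
<?php
// Illustrative field values only.
$fields = ['para1' => 'val1', 'para2' => 'val 2'];

// http_build_query produces the urlencoded string form.
$encoded = http_build_query($fields);
// $encoded is now "para1=val1&para2=val+2"

// Hypothetical helper: POST the fields as a urlencoded string.
function post_form(string $url, array $fields): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    // Passing a string keeps the Content-Type as application/x-www-form-urlencoded;
    // passing the array itself would switch to multipart/form-data.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl POST failed: $error");
    }
    curl_close($ch);
    return $body;
}
```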
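Simulating a login with POST
Simulating a login to cnblogs is about the simplest case: there is no captcha, and the site does not seem to check the request headers or the referer.
<?php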
define('SCRIPT_ROOT',dirname(__FILE__).'/');
$url = "http://www.cnblogs.com/wtcsy/admin/EditPosts.aspx";
$name = "wtcsy";
$password = "XXXXXXX";
$post = "__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwULLTE1MzYzODg2NzZkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBQtjaGtSZW1lbWJlcm1QYDyKKI9af4b67Mzq2xFaL9Bt&__EVENTVALIDATION=%2FwEWBQLWwpqPDQLyj%2FOQAgK3jsrkBALR55GJDgKC3IeGDE1m7t2mGlasoP1Hd9hLaFoI2G05&tbUserName=$name&tbPassword=$password&btnLogin=%E7%99%BB++%E5%BD%95&txtReturnUrl=http%3A%2F%2Fhome.cnblogs.com%2F";
$cookie = SCRIPT_ROOT."cookie.tmp";
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_POST, 1);
curl_setopt($ch,CURLOPT_COOKIEJAR, $cookie); //save the cookies to this file
curl_setopt($ch,CURLOPT_POSTFIELDS,$post);
curl_exec($ch);
curl_close($ch);
//use the saved cookies to request other addresses
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,"http://www.cnblogs.com/wtcsy/admin/EditPosts.aspx");
curl_setopt($ch,CURLOPT_COOKIEFILE,$cookie);
curl_exec($ch);
curl_close($ch);
?>
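The login example hard-codes __VIEWSTATE and __EVENTVALIDATION, and those values may go stale when the login page changes. One possible way around that is to fetch the login page first and read the hidden inputs out of it. The sketch below is my own assumption of how that could look: extract_hidden_fields is a made-up name, and the regex is naive (it expects double-quoted attributes; DOMDocument would be more robust).

```php
<?php
// Hypothetical helper: collect name => value for every hidden <input> in a page.
function extract_hidden_fields(string $html): array
{
    $fields = [];
    // Find each <input ... type="hidden" ...> tag.
    preg_match_all('/<input[^>]*type="hidden"[^>]*>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        // Pull the name and value attributes out of the tag.
        if (preg_match('/name="([^"]*)"/i', $tag, $n)
            && preg_match('/value="([^"]*)"/i', $tag, $v)) {
            $fields[$n[1]] = html_entity_decode($v[1]);
        }
    }
    return $fields;
}

// Illustrative input; a real page would come from a first curl request.
$html = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDw==" />';
$hidden = extract_hidden_fields($html);

// The hidden fields can then be merged with the login fields and encoded.
$post = http_build_query($hidden + ['tbUserName' => 'wtcsy']);
```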
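Submitting data with POST
This one has a captcha. The captcha image has to be requested first to obtain the cookie, and then the data is submitted, so two requests are made.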
<?php
define('SCRIPT_ROOT',dirname(__FILE__).'/');
$img_url = "http://www.web-youhua.com/include/getcode.asp";
$post_url = "http://www.web-youhua.com/include/comment_save.asp?comment_add=add";
$post = array(
"id"=>"3159",
"submit"=>"XXXX",
);
$header = array(
"Content-Type: application/x-www-form-urlencoded"
);
if(isset($_POST["imgcode"])){
$post["num"] = $_POST["imgcode"];
$post["user"] = $_POST["user"];
$post["content"] = $_POST["content"];
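$post = http_build_query($post); // send a urlencoded string so the body matches the Content-Type header above (an array would be sent as multipart/form-data)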
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$post_url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch,CURLOPT_COOKIEFILE,SCRIPT_ROOT."cookie.tmp");
curl_setopt($ch,CURLOPT_POST, 1);
curl_setopt($ch,CURLOPT_POSTFIELDS,$post);
curl_exec($ch);
curl_close($ch);
exit();
}
if(isset($_REQUEST["action"])){
$action = $_REQUEST["action"];
if($action == "img"){
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$img_url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_COOKIEJAR, SCRIPT_ROOT."cookie.tmp");
$content = curl_exec($ch);
curl_close($ch);
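header('Content-Type: image/png'); // text/png is not a valid MIME type; adjust this if the captcha is not a PNG
echo $content;
exit(); // stop here so the HTML form below is not appended to the image data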
}
}
?>
<img src="?action=img"/>
<form method="post">
<div>
Captcha<input name="imgcode" type="text">
</div>
<div>
Name<input type="text" maxlength="40" size="25" name="user">
</div>
<div>
Content<textarea rows="6" cols="75" name="content"></textarea>
</div>
<div>
<input type="submit" name="submit" value="tijiao">
</div>
</form>
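Requesting someone else's resources adds load to their server, so they often take precautions, such as checking your referer and your client information, that is, the request headers. If there is no referer or no browser information, the server may decide the request did not come from a normal browser and refuse to return the correct data.
In that case we can fake the origin, pretending to have arrived from another page, by setting the Referer request header and manually configuring the User-Agent, Accept, and Accept-Language headers.
One remaining problem is how to see the request headers that curl actually sends. I could not see them in fiddler or httpwatch, which only captured the response headers, so I used wireshark, which captures everything that passes through the network card.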
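As an alternative to packet capture, curl itself can report the request headers it sent: set CURLINFO_HEADER_OUT with curl_setopt before the request and read it back with curl_getinfo afterwards. A minimal sketch follows; the function name sent_request_headers is made up for illustration.

```php
<?php
// Hypothetical helper: return the raw request headers curl sent for a URL.
function sent_request_headers(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Ask curl to remember the outgoing request headers.
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);
    curl_exec($ch);
    // Read the remembered request headers back after the transfer.
    $sent = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    curl_close($ch);
    return is_string($sent) ? $sent : '';
}

// Usage (needs network access, so it is left commented out):
// echo sent_request_headers('http://www.163.com');
```

This only shows what curl sent, not what arrived on the wire, so wireshark is still the tool for checking the traffic itself.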
Let's try requesting www.163.com
<?php
// GET a static page
$header = array(
"Cookie: _ntes_nnid=ebf4fa5f49213b4b4f97ab4c817cdd5b,1359287760647; _ntes_nuid=ebf4fa5f49213b4b4f97ab4c817cdd5b; vjuids=-6869a0ddd.13a4d936ce5.0.b1a806897d8fd; vjlast=1349921238.1359287761.12; USERTRACK=116.204.127.149.1357977352300774; Province=020; City=0755; locOfCh=shenzhen; __ntes__test__cookies=1359287768445; locOfCity=sz; news_cate=default_news; adRandomCookie=3",
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.163.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
//curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11');
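//With the line above commented out, the request lands on the 3g (mobile) home page; sending this User-Agent gets the desktop home page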
$htmls = curl_exec($ch);
curl_close($ch);
echo $htmls.'<hr />';
?>
Let's try requesting mail.10086.cn
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://mail.10086.cn');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_HEADER,0);
$htmls = curl_exec($ch);
curl_close($ch);
echo $htmls.'<hr />';
If we add some necessary information (User-Agent, Accept, Accept-Language, Cookie), the server returns content, as follows
<?php
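// GET a static page
$header = array(
// The next entry was copied from a captured request. It is the request line, not a header field; curl builds the real request line from CURLOPT_URL and the request method.
"GET http://mail.10086.cn/default.html HTTP/1.1",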
"Host: mail.10086.cn",
"User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3",
//"Accept-Encoding: gzip, deflate",
"Connection: keep-alive",
"Cookie: WT_FPC=id=24e37bdbbf503d770581347846806030:lv=1359105398139:ss=1359105398139; isSave=t; key=uid%3dpL7s%2fq7Mca84um3ur%2bAefzT2WL6xf3lwgOXdy1Sb56g%3d%26vid%3d1B1B2051A120A1D1FE8D5DA747FED0CC; Key=uid%3DMTU4OTk4NzY4NDF2M1drNHxFOTJzJm9Z%26vid%3Ddfde76d9e25578db1d84f54b1c223ade",
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://mail.10086.cn');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_HEADER,0);
$htmls = curl_exec($ch);
curl_close($ch);
echo $htmls.'<hr />';
?>
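The same browser-like request can also be put together with curl's dedicated options instead of writing every header by hand. Below is a rough sketch under my own assumptions: browser_like_options is a made-up name, and the header values are simply copied from the example above. Setting CURLOPT_ENCODING to an empty string asks curl to advertise every encoding it supports and decode the response itself, which avoids the problem of a hand-written Accept-Encoding header returning compressed data.

```php
<?php
// Hypothetical helper: build an options array for curl_setopt_array.
function browser_like_options(string $url, string $referer): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_REFERER        => $referer,   // sets the Referer header
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0',
        CURLOPT_ENCODING       => '',         // accept any supported encoding and decode it automatically
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3',
        ],
    ];
}

// Usage (needs network access, so it is left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, browser_like_options('http://mail.10086.cn', 'http://mail.10086.cn/'));
// $htmls = curl_exec($ch);
// curl_close($ch);
```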
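At work I have to translate text into several languages. I used to batch-translate through Baidu's translation page with fiddler (which still needed a few manual steps). Since learning a bit of curl, I extract the text once and translate it directly, which saves at least four manual steps.
An example that calls the Baidu translation endpoint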
<?php
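$header = array (
// The next entry was copied from a captured request. It is the request line, not a header field; curl builds the real request line from CURLOPT_URL and CURLOPT_POST.
"POST /transcncontent HTTP/1.1",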
"Host: fanyi.baidu.com",
"User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3",
//"Accept-Encoding: gzip, deflate",
"Connection: keep-alive",
"Content-Type: application/x-www-form-urlencoded; charset=UTF-8",
"X-Requested-With: XMLHttpRequest",
"Referer: http://fanyi.baidu.com/translate.html",
//"Content-Length: 113",
"Cookie: BAIDUID=3AA047015250B59286C7A793D349A6CE:FG=1; BDUT=7oio3AA047015250B59286C7A793D349A6CE13bd02e37843; BDREFER=%7Burl%3A%22http%3A//news.baidu.com/%3F%22%2Cword%3A%22%22%7D; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1358599413,1359101453; H_PS_PSSID=1446_1884_1788; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1359101459",
"Pragma: no-cache",
"Cache-Control: no-cache",
);
function baiduTranslate($ch="",$val="",$header="",$cn=false){
if($ch == "" || $val == "" || $header == ""){
return false;
}
$word = urlencode($val); //The text must be UTF-8; if it is not, convert it to UTF-8 first and then urlencode it
$data = "ie=utf-8&source=txt&query=$word&t=1359102851602&token=3aae3cb6dfb405eb71cfdf6c4dea87cd&from=zh&to=en";
$url = "http://fanyi.baidu.com/transcontent";
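//Baidu Translate sometimes returns nothing on the first try and the page then calls another endpoint, so the request header and the POST parameters have to be changed accordingly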
/*if($cn){
echo "1111111";
$header[0] = "POST /transcontent HTTP/1.1";
$data = "ie=utf-8&source=txt&query=$word&t=1359441358504&token=3aae3cb6dfb405eb71cfdf6c4dea87cd&from=zh&to=en&has=0&type=1";
}*/
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$data);
$content = curl_exec($ch);
$is = collect($content,$cn);
return $is ? $is : $val;
}
function collect($str,$cn=false){
if($str==""){
return false;
}
$pattern = "/\"dst\":\"([^\"]+)\"/";
preg_match($pattern,$str,$m);
if(isset($m[1])){
return $m[1];
}else{
return false;
}
}
$ch = curl_init();
$translate_arr = array();
//$is = baiduTranslate($ch,"就是这样的",$header);
$words = array("你好","什么情况","你个猪","二逼青年欢乐多","就是这样的","你妹的","天王盖地虎","宝塔镇妖河","地振高冈,一派西山千古秀","门朝大海,三河合水万年流");
foreach($words as $i=>$val){
$translate_arr[$val] = baiduTranslate($ch,$val,$header);
}
curl_close($ch); // release the handle shared by all the requests
print_r($translate_arr);
?>
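The collect function above pulls "dst" out with a regex, which leaves JSON escapes such as \u00e9 undecoded. If the response is JSON, json_decode could be used instead. The sketch below is only an assumption about the response shape: I do not know the exact structure Baidu returns, so it searches the decoded array recursively for the first "dst" string. The names find_dst and collect_json are made up for illustration.

```php
<?php
// Hypothetical helper: depth-first search for the first string stored under the key "dst".
function find_dst($node): ?string
{
    if (!is_array($node)) {
        return null;
    }
    foreach ($node as $key => $value) {
        if ($key === 'dst' && is_string($value)) {
            return $value;
        }
        $found = find_dst($value);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

// Hypothetical replacement for collect(): returns null when nothing is found.
function collect_json(string $json): ?string
{
    // json_decode returns null for invalid JSON, which find_dst treats as "not found".
    return find_dst(json_decode($json, true));
}

// Illustrative response; the real structure may differ.
$sample = '{"data":[{"src":"\u4f60\u597d","dst":"caf\u00e9"}]}';
$translated = collect_json($sample);
```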
I have not learned HTTPS/SSL yet. Once I do, I hope to write an HTTPS login simulation.