How can I send cookies with PHP curl other than through CURLOPT_COOKIEFILE?

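I am scraping some content from a website after submitting a form. The problem is that the script fails intermittently, roughly 2 runs out of every 5. I am using PHP curl, with COOKIEFILE and COOKIEJAR to handle the cookies. However, when I compared the headers sent by my browser (visiting the target site with Live HTTP Headers enabled) against the headers sent by PHP, I found many differences.

My browser sends far more cookie variables than PHP curl does. I think the difference may be because JavaScript sets most of those cookies, but I am not sure.

I am using the following code for the scrape, followed by the headers sent by PHP curl and by the browser: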

// Temporary file that holds the cookies between the two requests
$ckfile = tempnam("/tmp", 'cookiename');

// Step 1: submit the first form
$url = 'https://www.domain.com/firststep';
$poststring = 'variable1=4&variable2=5';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);   // file the received cookies are saved to
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);  // file the cookies to send are loaded from
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $poststring);
$output = curl_exec($ch);
curl_close($ch);

// Step 2: submit the next form, reusing the same cookie file
$url = 'https://www.domain.com/nextstep';
$poststring = 'variableB1=4&variableB2=5';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $poststring);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);    // record the outgoing request headers
$output = curl_exec($ch);
$headers = curl_getinfo($ch, CURLINFO_HEADER_OUT);
curl_close($ch);

// Show the request headers that PHP curl actually sent
print_r($headers);

// Gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
User-Agent: Mozilla
Host: domain.subdomain.nl
Accept: */*
Cookie: JSESSIONID=7BC2A5277A4EB07D9A7237A707BE1366; www-20480=MIFBNLFDFAAA
Content-Length: 187
Content-Type: application/x-www-form-urlencoded

// Whereas Live HTTP Headers gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
Host: domain.subdomain.nl
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: nl,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: https://domain.subdomain.nl/dd/doffers.html?returnUrl=https%3A%2F%2Fttcc.subdomain.nl%2Fdd%2Fpreferences.html%3FValueChanged%3Dfalse&BEGBA=&departureDate=13-06-2013&extChangeTime=&pax2=0&bp=&pax1=1&pax4=0&bk=&pax3=0&shopId=&xtpage=&partner=NSINT&bc=&xt_pc=&ov=&departureTime=&comfortClass=2&destination=DEBHF&thalysTicketless=&beneUser=&debugDOffer=&logonId=&valueChanged=&iDomesticOrigin=&rp=&returnTime=&locale=nl_NL&vu=&thePassWeekend=false&returnDate=&xtsite=&pax=A&lc2=&lc1=&lc4=&lc3=&lc6=&lc5=&BECRA=&passType2=&custId=&lc9=&iDomesticDestination=&passType1=A&lc7=&lc8=&origin=NLASC&toporef=&pid=&passType4=&returnTimeType=1&passType3=&departureTimeType=1&socusId=&idr3=&xtn2=&loyaltyCard=&idr2=&idr1=&thePassBusiness=false&cid=14812
Content-Length: 219
Cookie: subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

AJAXREQUEST=_viewRoot&doffersForm=doffersForm&doffersForm%3AvalueChanged=&doffersForm%3ArequestValid=true&javax.faces.ViewState=j_id3&doffersForm%3Aj_id937=doffersForm%3Aj_id937&valueChanged=false&AJAX%3AEVENTS_COUNT=1&

I would like to use:

$headers = array();
$headers[] = 'Cookie: ' . $cookie;

and:

curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

where:

$cookie = 'subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133';
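
Rather than hard-coding one long string, the cookie value could be assembled from individual name/value pairs. The following is only a minimal sketch: parse_cookie_header() and build_cookie_header() are hypothetical helper names, not part of PHP or curl. It splits each pair on the first "=" only, because a value such as __utmz contains "=" signs itself (utmcsr, utmccn and utmcmd are parts of that one value rather than separate cookies).

```php
<?php
/**
 * Split a Cookie header value ("a=1; b=2") into a name => value array.
 * Hypothetical helper, written for illustration only.
 */
function parse_cookie_header(string $header): array
{
    $cookies = [];
    foreach (explode(';', $header) as $pair) {
        $pair = trim($pair);
        if ($pair === '') {
            continue; // ignore empty segments such as a trailing ";"
        }
        // Limit of 2: split on the first "=" only, the value may contain more
        $parts = explode('=', $pair, 2);
        $cookies[$parts[0]] = $parts[1] ?? '';
    }
    return $cookies;
}

/**
 * Join a name => value array back into a Cookie header value.
 * Hypothetical helper, written for illustration only.
 */
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}
```

The resulting string could then be used in the 'Cookie: ' header line shown above, or passed to the CURLOPT_COOKIE option, which sets the contents of the Cookie header directly.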

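Some of the values in the cookie above can perhaps be extracted from the site's content, but not all of them. Others I could perhaps read from $ckfile, but I do not know how to do that. In particular, I cannot obtain utma, utmc, utmz, utmcsr, utmccn and utmcmd from anywhere; I think these are generated by JavaScript.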
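
As for reading values back from $ckfile: curl stores the jar in the Netscape cookie file format, one cookie per line with seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value). A minimal sketch of a reader follows, assuming that format; parse_cookie_jar() is a hypothetical helper name, and the sketch ignores domain, path and expiry, so it is only suitable for inspecting what the jar contains.

```php
<?php
/**
 * Read a Netscape-format cookie jar into a name => value array.
 * Hypothetical helper: domain, path and expiry are not taken into account.
 */
function parse_cookie_jar(string $path): array
{
    $cookies = [];
    if (!is_readable($path)) {
        return $cookies;
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $cookies;
    }
    foreach ($lines as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookies carry this prefix on the domain field
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            continue; // ordinary comment line
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        // Field 5 is the cookie name, field 6 its value
        $cookies[$fields[5]] = $fields[6];
    }
    return $cookies;
}
```

Calling print_r(parse_cookie_jar($ckfile)) after the first request has finished would show which cookies the server actually set, which can then be compared with the browser's Cookie header.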

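Question 1: Since PHP curl sends only a few cookie variables while the browser sends many more, is the cookie handling in my current code wrong? Further: could the other differences between the headers sent by the browser and by PHP curl be a problem for getting the correct content returned?

Question 2: Are the missing cookie variables missing because JavaScript sets those cookies?

Question 3: What is the best way to handle cookies so that all required cookies are sent to the remote server?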
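
One way to narrow down the questions above is to make the curl request resemble the browser request more closely and see whether the failures change. The sketch below is only an illustration under assumptions: browser_like_options() and post_step() are hypothetical helper names, the header values are copied from the Live HTTP Headers dump, and nothing in the post confirms that the server depends on any of them. It also reuses one handle for both steps, so cookies received in step 1 are held by that handle for step 2.

```php
<?php
/**
 * Options shared by every step. Hypothetical helper; the values are taken
 * from the browser dump in the question, not from any server requirement.
 * Certificate verification is left at curl's default here.
 */
function browser_like_options(string $cookieFile, string $referer): array
{
    return [
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 30,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0',
        CURLOPT_REFERER        => $referer,
        CURLOPT_ENCODING       => '', // accept and decode any encoding curl supports
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: nl,en-us;q=0.7,en;q=0.3',
        ],
    ];
}

/**
 * POST one step on an existing handle and return the response body
 * (or false on failure, as curl_exec() does).
 */
function post_step($ch, string $url, string $postString)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postString);
    return curl_exec($ch);
}

// Usage sketch with the URLs and POST strings from the question:
// $ch = curl_init();
// curl_setopt_array($ch, browser_like_options($ckfile, 'https://www.domain.com/firststep'));
// $first  = post_step($ch, 'https://www.domain.com/firststep', 'variable1=4&variable2=5');
// $second = post_step($ch, 'https://www.domain.com/nextstep', 'variableB1=4&variableB2=5');
// curl_close($ch);
```

If the failure rate stays the same with these headers in place, the header differences are less likely to be the cause, which leaves the cookies set by JavaScript as the remaining difference to investigate.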

Any help is very welcome!
