urllib2.urlopen(req).read() returns a 403 error: what to do?

Latest recommended article published 2022-11-09 08:44:42

weixin_30585437

Tags: python

Original link: http://www.cnblogs.com/shanguanghui/p/3662647.html

http://www.douban.com/group/topic/18095751/

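# Python 2 (urllib2). The url below is a placeholder; substitute the page you are fetching.
import urllib2
from urlparse import urlparse

url = 'http://www.example.com/'
HOST = urlparse(url).netloc  # the Host header must match the target host

# Browser-like request headers
heads = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
         'Accept-Charset': 'GB2312,utf-8;q=0.7,*;q=0.7',
         'Accept-Language': 'zh-cn,zh;q=0.5',
         'Cache-Control': 'max-age=0',
         'Connection': 'keep-alive',
         'Host': HOST,
         'Keep-Alive': '115',
         'Referer': url,
         'User-Agent': 'Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.14) Gecko/20110221 Ubuntu/10.10 (maverick) Firefox/3.6.14'}

# The cookie processor stores cookies the site sets and sends them back on later requests
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)
req = urllib2.Request(url)
opener.addheaders = heads.items()
page = opener.open(req).read()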

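Some sites are configured with anti-crawler measures: a GET request made with urllib2 returns 403, while the same URL loads normally in a browser. In that case, make the urllib2 request look more like a browser request:
1. Add cookies
2. Add HTTP headers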
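
For readers on Python 3, where urllib2 no longer exists, the two steps above can be sketched with urllib.request. This is a minimal sketch under assumptions, not the original author's code: make_browser_like_opener is a hypothetical helper name, EXAMPLE_URL is a placeholder, and the header values are copied from the snippet above.

```python
# Python 3 sketch: urllib.request takes the place of Python 2's urllib2.
import urllib.request
from http.cookiejar import CookieJar


def make_browser_like_opener(url, user_agent):
    """Return (opener, cookie_jar) set up with cookies and browser-style headers.

    Hypothetical helper for illustration; it is not part of urllib.
    """
    jar = CookieJar()
    # Step 1: cookies. The processor stores cookies and replays them on later requests.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Step 2: HTTP headers. These are sent with every request made through this opener.
    opener.addheaders = [
        ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
        ("Accept-Language", "zh-cn,zh;q=0.5"),
        ("Referer", url),
        ("User-Agent", user_agent),
    ]
    return opener, jar


EXAMPLE_URL = "http://www.example.com/"  # placeholder, not a real target
EXAMPLE_UA = ("Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.14) "
              "Gecko/20110221 Ubuntu/10.10 (maverick) Firefox/3.6.14")

opener, jar = make_browser_like_opener(EXAMPLE_URL, EXAMPLE_UA)
# To actually fetch the page (requires network access):
# page = opener.open(EXAMPLE_URL).read()
```

Building the opener does not touch the network; only the commented-out opener.open call does. Whether a given site stops returning 403 depends on what its anti-crawler rules actually check.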

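Do not include these two headers (Accept-Encoding, If-Modified-Since) in the request:
'Accept-Encoding':'gzip,deflate', (the response body comes back compressed, so read() returns gzip data rather than HTML)
'If-Modified-Since':'Fri, 04 Mar 2011 06:35:06 GMT', (the server may answer 304 Not Modified, which urllib2 raises as an error)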
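
If Accept-Encoding has to stay in the request for some reason, the body must be decompressed by hand before use. The following is a minimal Python 3 sketch of that step, assuming the server reports the compression in a Content-Encoding response header; decode_body is a hypothetical helper name, not a urllib function.

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Decompress a response body according to its Content-Encoding value.

    Hypothetical helper for illustration; content_encoding may be None.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            return zlib.decompress(raw)  # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(raw, -zlib.MAX_WBITS)  # raw deflate stream
    return raw  # no compression or unknown encoding: return the bytes unchanged


# Usage with a response object (requires network access):
# resp = opener.open(req)
# body = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
sample = gzip.compress(b"<html>hello</html>")
html = decode_body(sample, "gzip")
```

Leaving Accept-Encoding out, as advised above, avoids this step entirely at the cost of a larger download.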

Reposted from: https://www.cnblogs.com/shanguanghui/p/3662647.html
