Python requests redirects: requests.exceptions.TooManyRedirects: Exceeded 30 redirects

I was trying to crawl this page using the python-requests library:

import requests
from lxml import etree, html

url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031'
r = requests.get(url)
tree = etree.HTML(r.text)
print tree

But I got the error in the title (TooManyRedirects: Exceeded 30 redirects).

I tried passing the allow_redirects parameter, but got the same error:

r = requests.get(url, allow_redirects=True)

I even tried sending headers and data along with the URL, but I'm not sure whether this is the correct way to do it:

headers = {'content-type': 'text/html'}

payload = {'ie':'UTF8','node':'976419031'}

r = requests.post(url,data=payload,headers=headers,allow_redirects=True)

How can I resolve this error? Out of curiosity I also tried BeautifulSoup 4 with urllib2 and got a different error of the same kind:

page = BeautifulSoup(urllib2.urlopen(url))

urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Permanently

Solution

Amazon is redirecting your request to http://www.amazon.in/b?ie=UTF8&node=976419031, which in turn redirects to http://www.amazon.in/electronics/b?ie=UTF8&node=976419031, after which you have entered a loop:

>>> loc = url
>>> seen = set()
>>> while True:
...     r = requests.get(loc, allow_redirects=False)
...     loc = r.headers['location']
...     if loc in seen: break
...     seen.add(loc)
...     print loc
...
http://www.amazon.in/b?ie=UTF8&node=976419031
http://www.amazon.in/electronics/b?ie=UTF8&node=976419031
>>> loc
'http://www.amazon.in/b?ie=UTF8&node=976419031'

So your original URL A redirects to a new URL B, which redirects to C, which redirects back to B, and so on.
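The A, B, C, B pattern described above can be detected with a small helper. This is only a sketch, written for Python 3 rather than the Python 2 used in the question. The names trace_redirects, get_location and location_via_requests are made up for illustration and are not part of requests. The lookup is passed in as a function so the loop detection can be tried without touching the network.

```python
def trace_redirects(start_url, get_location, max_hops=30):
    """Follow redirects one hop at a time.

    Returns (chain, loop_url). chain lists every distinct URL visited, in
    order. loop_url is the first redirect target that was already visited,
    or None if the chain ended (or max_hops ran out) without repeating.
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    for _ in range(max_hops):
        target = get_location(current)
        if target is None:
            # No Location header: this is the final page.
            return chain, None
        if target in seen:
            # Redirected to a URL we have already visited: a loop.
            return chain, target
        seen.add(target)
        chain.append(target)
        current = target
    return chain, None


def location_via_requests(url):
    """Hypothetical lookup: ask the server where url redirects to."""
    # Imported here so trace_redirects itself has no third-party dependency.
    import requests
    from urllib.parse import urljoin

    response = requests.get(url, allow_redirects=False)
    target = response.headers.get('Location')
    # Location may be relative, so resolve it against the current URL.
    return urljoin(url, target) if target else None
```

Calling trace_redirects(url, location_via_requests) would reproduce the interactive session above. For a dry run, a dict's get method can stand in for the server, e.g. trace_redirects('A', {'A': 'B', 'B': 'C', 'C': 'B'}.get).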

Apparently Amazon does this based on the User-Agent header. The following works:

>>> s = requests.Session()
>>> s.headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36'
>>> r = s.get(url)
>>> r
<Response [200]>

This creates a session (for ease of reuse and for cookie persistence) and sets its User-Agent header to a copy of the Chrome user agent string. The request succeeds (returns a 200 response).
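For use in a script, the same idea might be wrapped as follows. This is a sketch under assumptions, not the answer's own code: make_session and fetch are hypothetical helper names, the redirect limit of 10 and the timeout of 10 seconds are arbitrary choices, and whether a browser-like User-Agent is still enough for this site is not guaranteed. Session.max_redirects and requests.exceptions.TooManyRedirects are real parts of requests.

```python
import requests

# The Chrome User-Agent string quoted in the answer above. Assumption: any
# mainstream browser string would serve the same purpose.
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/34.0.1847.131 Safari/537.36')


def make_session(user_agent=BROWSER_UA, max_redirects=10):
    """Return a Session that sends a browser-like User-Agent header."""
    session = requests.Session()
    session.headers['User-Agent'] = user_agent
    # max_redirects defaults to 30; a lower limit makes a loop fail sooner.
    session.max_redirects = max_redirects
    return session


def fetch(session, url, timeout=10):
    """Return the Response, or None if the redirect limit was exceeded."""
    try:
        return session.get(url, timeout=timeout)
    except requests.exceptions.TooManyRedirects:
        return None
```

With these helpers, fetch(make_session(), url) returns a Response when the request succeeds and None when the server keeps redirecting in a loop.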
