Python 3 urllib.request crashes: how to get around HTTP Error 403: Forbidden with urllib.request

Hi. Not every time, but sometimes when trying to access the LSE (London Stock Exchange) page, I am thrown the ever-annoying HTTP Error 403: Forbidden message.

Does anyone know how I can overcome this issue using only standard Python modules (so, sadly, no Beautiful Soup)?

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
infile = urllib.request.urlopen(url)  # Open the URL
data = infile.read().decode('ISO-8859-1')  # Read the content as string decoded with ISO-8859-1
print(data)  # Print the data to the screen

However, every now and then this is the error I am shown:

Traceback (most recent call last):
  File "/home/ubuntu/workspace/programming_practice/Assessment/Summative/removingThe403Error.py", line 5, in <module>
    webpage = urlopen(req).read().decode('ISO-8859-1')
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Process exited with code: 1

Link to a list of all the modules that are okay: https://docs.python.org/3.4/py-modindex.html

Many thanks in advance.

Solution

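This is probably due to mod_security or a similar server-side security feature that blocks requests from known non-browser user agents. By default, urllib identifies itself with a User-Agent of the form Python-urllib/3.4, so you need to send a browser-like User-Agent header instead of the default one.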
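To see what urllib announces itself as, you can inspect the default headers on a freshly built opener. This is only a small sketch for checking the default; the variable names opener and default_headers are mine, and the version suffix depends on the Python you run.

```python
import urllib.request

# A fresh opener carries the headers urllib adds to every request by default.
opener = urllib.request.build_opener()

# addheaders is a list of (name, value) tuples; turn it into a dict for lookup.
default_headers = dict(opener.addheaders)

# The value has the form "Python-urllib/<major>.<minor>".
print(default_headers["User-agent"])
```

A server that filters on this string can reject the request before looking at anything else, which would explain the 403.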

Here, I corrected your code:

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"

# Open the URL as a browser, not as Python urllib
page = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(page).read()
data = infile.decode('ISO-8859-1')  # Read the content as string decoded with ISO-8859-1
print(data)  # Print the data to the screen

Next, you can use BeautifulSoup to scrape the HTML.
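Since the 403 only shows up some of the time, it may also help to catch the error instead of letting the script crash. Below is a minimal sketch using only the standard library; the helper names build_request and fetch, the 10-second timeout, and the choice to return None on failure are my own assumptions, not part of the original answer.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a Request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, encoding="ISO-8859-1", timeout=10):
    """Return the decoded page body, or None if the request fails."""
    request = build_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode(encoding)
    except urllib.error.HTTPError as err:
        # HTTPError must be caught first because it subclasses URLError.
        print("Server refused the request:", err.code, err.reason)
    except urllib.error.URLError as err:
        print("Could not reach the server:", err.reason)
    return None
```

Calling fetch(url) with the URL from the question should then give you either the page text or None with a short message, rather than a traceback. Whether the server accepts the spoofed header is up to the site, so a 403 can still come back.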
