python3 urllib.request crashes: how to get around HTTP Error 403: Forbidden with urllib.request in Python 3

Latest recommended article published 2024-06-25 08:34:47

星球研究所


Article tags: python3 urllib.request crash

Hi. Not every time, but sometimes, when I try to fetch the LSE page I get the ever-annoying HTTP Error 403: Forbidden message.

Does anyone know how I can overcome this issue using only standard Python modules (so sadly no Beautiful Soup)?

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
infile = urllib.request.urlopen(url)  # Open the URL
data = infile.read().decode('ISO-8859-1')  # Read the content as a string decoded with ISO-8859-1
print(data)  # Print the data to the screen

However, every now and then I get this error:

Traceback (most recent call last):
  File "/home/ubuntu/workspace/programming_practice/Assessment/Summative/removingThe403Error.py", line 5, in <module>
    webpage = urlopen(req).read().decode('ISO-8859-1')
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Process exited with code: 1

Here is the list of modules I am allowed to use: https://docs.python.org/3.4/py-modindex.html

Many thanks in advance.

Solution

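This is probably due to mod_security or a similar server-side feature that blocks requests from known bot user agents. By default urllib identifies itself with a User-Agent such as Python-urllib/3.4, which is easy to detect. You need to send a browser-like User-Agent header so the request looks like it comes from a browser, not from Python urllib.

Here is your code, corrected: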
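To see what the server sees, the default User-Agent can be inspected without making any request. This is a minimal sketch; the variable names are only illustrative, and the exact version suffix depends on the Python you run.

```python
import urllib.request

# build_opener() returns an OpenerDirector whose addheaders list holds
# the headers urllib attaches to every request by default.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Typically something like "Python-urllib/3.x", which is easy for a
# server-side filter to recognise and reject.
default_user_agent = default_headers["User-agent"]
print(default_user_agent)
```

A filter that matches on this string would explain why the plain urlopen(url) call is rejected.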

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"

# Open the URL as a browser, not as Python urllib
page = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(page).read()
data = infile.decode('ISO-8859-1')  # Read the content as a string decoded with ISO-8859-1
print(data)  # Print the data to the screen
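Because the 403 only shows up some of the time, it may also help to catch the exception so the script does not exit with a traceback. The following is a sketch using only standard modules. The helper names build_request and fetch, and the opener parameter, are hypothetical and not part of the original answer.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, opener=urllib.request.urlopen, encoding="ISO-8859-1"):
    """Return the decoded page, or None if the server answers with an HTTP error.

    `opener` is a hypothetical hook (defaulting to urlopen) so the function
    can be exercised without network access.
    """
    try:
        with opener(build_request(url)) as response:
            return response.read().decode(encoding)
    except urllib.error.HTTPError as err:
        # err.code is the status (e.g. 403); err.reason is the text (e.g. "Forbidden")
        print("Request failed:", err.code, err.reason)
        return None
```

Calling fetch(url) with the LSE URL from the question would then return either the page text or None, and the caller can decide whether to retry.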

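Next, you can parse the HTML, for example with BeautifulSoup, or with the standard library's html.parser module if you are limited to standard modules.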
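Since the question rules out third-party packages, the parsing step could be sketched with html.parser from the standard library. The class name LinkCollector, the function extract_links, and the sample HTML are all made up for illustration. A real page would need a handler written around its actual markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Feed the HTML text to the parser and return the collected links."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Invented sample input, standing in for the downloaded `data` string
sample = '<p><a href="/a.html">A</a> and <a href="http://example.com/b">B</a></p>'
print(extract_links(sample))
```

The same pattern extends to other tags by overriding handle_starttag, handle_data, and handle_endtag as needed.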
