Python urllib – Python 3 urllib

The Python urllib module lets us access URL data programmatically.

Python urllib

  • We can use Python urllib to get website content in a Python program.
  • We can also use it to call REST web services.
  • We can make GET and POST HTTP requests.
  • This module allows us to make HTTP as well as HTTPS requests.
  • We can send request headers and also get information about response headers.

Python urllib GET example

Let's start with a simple example that reads the content of the Wikipedia home page.

import urllib.request

response = urllib.request.urlopen('https://www.wikipedia.org')

print(response.read())

The response's read() method returns a bytes object. The code above prints the raw HTML returned by the Wikipedia home page. It is not easy for a human to read, but we can use an HTML parser to extract useful information from it.
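Because read() returns bytes, the body usually has to be decoded before it can be handled as text. The following is a minimal sketch of that step. It assumes a data: URL as a stand-in for a real site so that it runs without network access; with an HTTP response the same headers.get_content_charset() call reads the charset from the Content-Type header, and the 'utf-8' fallback is an assumption, not something the server guarantees.

```python
import urllib.request

# A data: URL stands in for a real website so the sketch runs offline.
url = 'data:text/plain;charset=utf-8,Hello%2C%20urllib'

with urllib.request.urlopen(url) as response:
    raw = response.read()  # bytes, not str
    # Charset declared in the Content-Type header; fall back to UTF-8 (assumption).
    charset = response.headers.get_content_charset() or 'utf-8'
    text = raw.decode(charset)

print(type(raw).__name__, type(text).__name__)
print(text)
```

Using the response in a with block also closes the connection once the body has been read.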

Python urllib request with header

Let's see what happens when we run the above program against JournalDev.

import urllib.request

response = urllib.request.urlopen('https://www.journaldev.com')

print(response.read())

We will get the error message below.

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/pankaj/Documents/PycharmProjects/BasicPython/urllib/urllib_example.py
Traceback (most recent call last):
  File "/Users/pankaj/Documents/PycharmProjects/BasicPython/urllib/urllib_example.py", line 3, in <module>
    response = urllib.request.urlopen('https://www.journaldev.com')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
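The traceback ends in urllib.error.HTTPError, which a program can catch instead of crashing. The sketch below shows one way to do that; the fetch() helper is a hypothetical name, and the two URLs are offline stand-ins (a data: URL that succeeds and an unknown scheme that fails) chosen so the example runs without network access.

```python
import urllib.error
import urllib.request


def fetch(url):
    """Return the response body as bytes, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403.
        print('HTTP error:', err.code, err.reason)
    except urllib.error.URLError as err:
        # The request never got a valid answer (bad scheme, DNS failure, ...).
        print('URL error:', err.reason)
    return None


print(fetch('data:text/plain,ok'))       # succeeds without a network
print(fetch('nosuch://example.invalid'))  # fails: unknown URL scheme
```

HTTPError is a subclass of URLError, so it has to be caught first; otherwise the more general handler would hide the status code.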

The request fails with 403 Forbidden because my server rejects requests that do not appear to come from a browser; by default urllib identifies itself with a User-Agent of the form Python-urllib/3.x. We can usually overcome this error by sending a browser-like User-Agent header in the request. Let's look at the modified program.

import urllib.request

# Request with Header Data to send User-Agent header
url = 'https://www.journaldev.com'

headers = {}
headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'

request = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(request)

print(resp.read())

We create the request headers as a dictionary and pass it to the Request object. The above program prints the HTML data received from the JournalDev home page.
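It can help to check what a Request object will send before opening it. The sketch below builds a Request without contacting the server and inspects its headers. The shortened User-Agent string and the extra Accept header are illustrative assumptions, not values from the program above.

```python
import urllib.request

# Building a Request does not contact the server, so this runs offline.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i686)'}  # shortened, illustrative value
request = urllib.request.Request('https://www.journaldev.com', headers=headers)

# Request stores header names via str.capitalize(), so the key becomes 'User-agent'.
print(request.header_items())
print(request.get_header('User-agent'))  # the stored value
print(request.get_header('User-Agent'))  # None: this lookup is case-sensitive

# Headers can also be added after the Request has been created.
request.add_header('Accept', 'text/html')
print(request.has_header('Accept'))
```

The capitalization only affects lookups on the Request object; HTTP header names themselves are case-insensitive, so servers treat User-agent and User-Agent the same way.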

Python urllib REST Example

REST web services are accessed over HTTP, so we can easily call them using the urllib module. I have a simple JSON-based demo REST web service running on my local machine, created using JSON Server. It's a great Node module for running dummy JSON REST web services for testing purposes.

import urllib.request

# JSON Server serves plain HTTP by default, so the scheme is http
response = urllib.request.urlopen('http://localhost:3000/employees')

print(response.read())

Notice that the console output is the JSON data, printed as a bytes object.
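To work with the JSON data rather than just print it, the bytes can be decoded and parsed with the standard json module. This is a minimal sketch under stated assumptions: the sample record is made up, and a data: URL replaces the local /employees service so that the example runs without JSON Server.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical sample payload; the real service would return its own records.
sample = [{'id': 1, 'name': 'David', 'salary': '9988'}]

# A data: URL stands in for the local REST service so the sketch runs offline.
url = 'data:application/json;charset=utf-8,' + urllib.parse.quote(json.dumps(sample))

with urllib.request.urlopen(url) as response:
    employees = json.loads(response.read().decode('utf-8'))

# The parsed result is ordinary Python data (a list of dicts here).
for employee in employees:
    print(employee['name'], employee['salary'])
```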

Python urllib response headers

We can get the response headers by calling the info() method on the response object. It returns a dictionary-like message object (http.client.HTTPMessage for HTTP responses), so we can also extract a specific header by name; the lookup is case-insensitive.

import urllib.request

# JSON Server serves plain HTTP by default, so the scheme is http
response = urllib.request.urlopen('http://localhost:3000/employees')

print(response.info())

print('Response Content Type is = ', response.info()["content-type"])

Output:

X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
X-Content-Type-Options: nosniff
Content-Type: application/json; charset=utf-8
Content-Length: 260
ETag: W/"104-LQla2Z3Cx7OedNGjbuVMiKaVNXk"
Date: Wed, 09 May 2018 19:26:20 GMT
Connection: close


Response Content Type is =  application/json; charset=utf-8
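The header object supports a few more dictionary-style operations than the single lookup shown above. The sketch below illustrates them on a data: URL, an offline stand-in for the demo service that only carries Content-type and Content-length headers; the X-Powered-By lookup is there to show what happens when a header is missing.

```python
import urllib.request

# A data: URL stands in for the demo service so the sketch runs offline.
with urllib.request.urlopen('data:text/plain;charset=utf-8,Hello') as response:
    headers = response.headers  # the same object that response.info() returns

    # Lookup by name ignores case.
    print(headers['Content-Type'])
    print(headers['content-type'])

    # get() lets us supply a default for a header that is not present.
    print(headers.get('X-Powered-By', 'not set'))

    # Iterate over all header name/value pairs.
    for name, value in headers.items():
        print(name, '=', value)
```

Indexing a missing header with square brackets returns None rather than raising KeyError, which differs from a plain dict.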

Python urllib POST

Let's look at an example of a POST method call.

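import urllib.request
import urllib.parse

# JSON Server serves plain HTTP by default, so the scheme is http
post_url = 'http://localhost:3000/employees'

# urlencode() produces form-encoded data, so the Content-Type must say so
headers = {'Content-Type': 'application/x-www-form-urlencoded'}

# POST request encoded data (must be bytes)
post_data = urllib.parse.urlencode({'name': 'David', 'salary': '9988'}).encode('ascii')

# Automatically uses the POST method because the request has data
request = urllib.request.Request(post_url, data=post_data, headers=headers)
post_response = urllib.request.urlopen(request)

print(post_response.read())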

When we call the urlopen function with a request that has data, it automatically uses the HTTP POST method. The image below shows the output of the above POST call for my demo service.
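The rule that data implies POST can be checked without sending anything, because Request.get_method() reports the method that would be used. The sketch below also shows a JSON body and an explicit method as alternatives. It is illustrative only: the plain-HTTP localhost URL and the /1 record path are assumptions about the demo service, and no request is actually sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed demo service URL; the requests below are built but never sent.
url = 'http://localhost:3000/employees'
employee = {'name': 'David', 'salary': '9988'}

# Form-encoded body: urlencode() returns str, so it is encoded to bytes.
form_body = urllib.parse.urlencode(employee).encode('ascii')
form_request = urllib.request.Request(url, data=form_body)

# JSON body: the Content-Type header has to match the encoding of the data.
json_body = json.dumps(employee).encode('utf-8')
json_request = urllib.request.Request(
    url, data=json_body, headers={'Content-Type': 'application/json'})

# The method can also be set explicitly; '/1' is a hypothetical record id.
put_request = urllib.request.Request(url + '/1', data=json_body, method='PUT')

print(form_body)
print(urllib.request.Request(url).get_method())  # no data
print(form_request.get_method())                 # data present
print(json_request.get_method())
print(put_request.get_method())
```

If no Content-Type header is given for a request with data, urllib sends application/x-www-form-urlencoded by default, which is why a JSON body needs the header set explicitly.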

Reference: API Doc

Translated from: https://www.journaldev.com/20795/python-urllib-python-3-urllib
