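urllib is Python's built-in HTTP request library. It consists of four modules:
urllib.request: request module (opens URLs)
urllib.error: exception handling module
urllib.parse: URL parsing module
urllib.robotparser: robots.txt parsing module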
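Here is a short offline sketch of the two helper modules listed above. The URLs and the robots.txt rules are made up for illustration. `urlparse`, `parse_qs`, `urlencode` and `RobotFileParser.parse`/`can_fetch` are standard-library calls.

```python
import urllib.parse
import urllib.robotparser

# urllib.parse: split a URL into its components and work with query strings.
parts = urllib.parse.urlparse("https://www.python.org/search/?q=urllib&page=2")
print(parts.scheme, parts.netloc, parts.path)  # https www.python.org /search/
query = urllib.parse.parse_qs(parts.query)
print(query)  # {'q': ['urllib'], 'page': ['2']}

# urlencode builds a query string; spaces become '+' by default.
encoded = urllib.parse.urlencode({"q": "http client", "lang": "zh"})
print(encoded)  # q=http+client&lang=zh

# urllib.robotparser: decide whether a crawler may fetch a URL.
# These robots.txt rules are hypothetical, parsed from a string instead of a site.
rules = """
User-agent: *
Disallow: /private/
""".splitlines()
rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
```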
Usage:
>>> import urllib.request
>>> response=urllib.request.urlopen('http://www.baidu.com')
>>> print(response.read().decode('utf-8'))
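The following sketch is a slightly more defensive variant of the call above. The `with` block closes the connection, `timeout` keeps the call from hanging, and the body is decoded using the charset from the Content-Type header rather than a hardcoded 'utf-8'. It serves a page from a throwaway local server so that it runs without internet access. `PageHandler`, `fetch_text` and the Latin-1 test page are hypothetical stand-ins for a real site such as www.baidu.com.

```python
import http.server
import threading
import urllib.request


class PageHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real site: serves a small Latin-1 page."""

    def do_GET(self):
        body = "<p>caf\u00e9</p>".encode("iso-8859-1")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=iso-8859-1")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_text(url, timeout=10):
    """GET url and decode the body with the charset declared by the server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Port 0 lets the OS pick any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

text = fetch_text(url)  # text == "<p>caf\u00e9</p>"

server.shutdown()
server.server_close()
```

On this Latin-1 body, a plain `.decode('utf-8')` would raise UnicodeDecodeError. The header-driven version decodes the body correctly.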
Response
Response type
Open cmd, start jupyter notebook, and run:
import urllib.request
response=urllib.request.urlopen('https://www.python.org')
print(type(response))
Output: <class 'http.client.HTTPResponse'>, so urlopen returns an http.client.HTTPResponse object.
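HTTPResponse is a file-like object, so a large body does not have to be read in one go. The sketch below runs against a hypothetical local server (`BigHandler`, with a made-up 100,000-byte payload) so that it works offline. It confirms the object type and then reads the body in 8 KiB chunks.

```python
import http.client
import http.server
import threading
import urllib.request

PAYLOAD = b"x" * 100_000  # hypothetical 100,000-byte body


class BigHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, format, *args):
        pass  # silence per-request logging


server = http.server.HTTPServer(("127.0.0.1", 0), BigHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

with urllib.request.urlopen(url, timeout=10) as response:
    is_http_response = isinstance(response, http.client.HTTPResponse)
    print(is_http_response)  # True
    received = 0
    while True:
        chunk = response.read(8192)  # at most 8 KiB per call
        if not chunk:  # b"" signals the end of the body
            break
        received += len(chunk)

print(received)  # 100000

server.shutdown()
server.server_close()
```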
Status code and response headers
import urllib.request
response=urllib.request.urlopen('https://www.python.org')
print(response.status)
print(response.getheaders())
print(response.getheader('Server'))
Output:
200
[('Server', 'nginx'), ('Content-Type', 'text/html; charset=utf-8'), ('X-Frame-Options', 'SAMEORIGIN'), ('x-xss-protection', '1; mode=block'), ('X-Clacks-Overhead', 'GNU Terry Pratchett'), ('Via', '1.1 varnish'), ('Content-Length', '48806'), ('Accept-Ranges', 'bytes'), ('Date', 'Sun, 08 Jul 2018 06:45:33 GMT'), ('Via', '1.1 varnish'), ('Age', '2310'), ('Connection', 'close'), ('X-Served-By', 'cache-iad2127-IAD, cache-lax8643-LAX'), ('X-Cache', 'HIT, HIT'), ('X-Cache-Hits', '3, 362'), ('X-Timer', 'S1531032333.310203,VS0,VE0'), ('Vary', 'Cookie'), ('Strict-Transport-Security', 'max-age=63072000; includeSubDomains')]
nginx
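The example above does not show what happens on errors. For 4xx and 5xx status codes, urlopen raises urllib.error.HTTPError instead of returning a response, and the exception still carries the status code and headers. The sketch below uses a hypothetical local server (`StatusHandler`, where "/" answers 200 and every other path 404). `get_status` is an illustrative helper name.

```python
import http.server
import threading
import urllib.error
import urllib.request


class StatusHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: "/" returns 200, every other path returns 404."""

    def do_GET(self):
        if self.path == "/":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "Not Found")

    def log_message(self, format, *args):
        pass  # silence per-request logging


def get_status(url):
    """Return (status, Content-Type) for url, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, response.getheader("Content-Type")
    except urllib.error.HTTPError as err:
        # urlopen raised for the error status; the exception still holds it.
        return err.code, err.headers.get("Content-Type")


server = http.server.HTTPServer(("127.0.0.1", 0), StatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

ok = get_status(base + "/")
missing = get_status(base + "/missing")
print(ok)          # (200, 'text/plain')
print(missing[0])  # 404

server.shutdown()
server.server_close()
```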
Request: build a urllib.request.Request object and pass it to urlopen
import urllib.request
request=urllib.request.Request('https://python.org')
response=urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
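A Request object can carry more than the URL: it can also hold request headers, a body, and the HTTP method. The sketch below uses a hypothetical local echo server (`EchoHandler`) so the result can be checked offline. The User-Agent string and the form field are made-up values.

```python
import http.server
import json
import threading
import urllib.parse
import urllib.request


class EchoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical echo server: replies with what it received, as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = {
            "method": self.command,
            "user_agent": self.headers.get("User-Agent"),
            "body": self.rfile.read(length).decode("utf-8"),
        }
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


server = http.server.HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# The request body must be bytes: urlencode builds the string, encode() makes bytes.
data = urllib.parse.urlencode({"name": "python"}).encode("utf-8")
request = urllib.request.Request(
    url,
    data=data,
    headers={"User-Agent": "Mozilla/5.0 (demo)"},  # hypothetical UA string
    method="POST",
)
with urllib.request.urlopen(request, timeout=10) as response:
    echoed = json.loads(response.read().decode("utf-8"))

print(echoed)  # {'method': 'POST', 'user_agent': 'Mozilla/5.0 (demo)', 'body': 'name=python'}

server.shutdown()
server.server_close()
```

Setting the User-Agent explicitly also replaces urllib's default "Python-urllib/3.x" value, which some sites block.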