HTTP Requests and Responses

Latest recommended article published on 2021-08-23 17:32:42

LA Lee

Category: Web Crawling Knowledge. Tags: Python web crawling

Article link: https://blog.csdn.net/DFIE1234/article/details/88648679

I. Common Ways to Send Requests

1. The urllib package (Python 3)

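Python 2 shipped both urllib and urllib2. urllib offered the lower-level interface, and urllib2 wrapped it with further abstractions.
Python 3 merged the two into a single standard-library package named urllib, organized into submodules.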

2. The urllib3 library

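The Python 3 standard-library urllib is enough for basic crawling, but it lacks some key features. The third-party library urllib3, which is not part of the standard library, provides them, for example connection pool management.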

3. The requests library

requests is built on urllib3 but offers a friendlier, easier-to-use API. It is the more widely used of the three.

II. Basic Usage of the urllib Package (Python 3)

urllib.request
- Opens URLs and reads from or writes to them
urllib.parse
- Parses URLs
urllib.error
- Defines the exceptions raised by urllib.request
urllib.robotparser
- Parses robots.txt files
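Of the four submodules, urllib.robotparser is the only one not demonstrated later in this article, so here is a minimal sketch of it. The robots.txt content, the crawler name "MyCrawler", and the example.com URLs are all made up for illustration, and the rules are parsed from a string so that no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline instead of being downloaded.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts an iterable of lines

# "MyCrawler" is an assumed user-agent name for this sketch.
allowed_home = parser.can_fetch("MyCrawler", "http://example.com/index.html")
allowed_private = parser.can_fetch("MyCrawler", "http://example.com/private/a.html")

print(allowed_home)     # True
print(allowed_private)  # False
```

For a real site you would typically call set_url() with the site's robots.txt address and then read() before asking can_fetch().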

1. urllib.request

1.1 The urllib.request.urlopen function

urlopen(url, data=None)
url: a URL string or a Request object.
data: the data to submit. When it is provided, the HTTP request is sent as a POST.

from urllib.request import urlopen

response = urlopen('http://www.bing.com')  # GET request
with response:  # the response supports the context-manager protocol
    print(1, type(response))    # http.client.HTTPResponse, a file-like object
    print(2, response.status)   # status code
    print(3, response.reason)   # OK
    print(4, response.geturl()) # the real URL after any redirects
    print(5, response.read())   # the HTML of the page, as bytes

# Output
1 <class 'http.client.HTTPResponse'> 
2 200
3 OK
4 http://cn.bing.com/?setmkt=zh-CN
5 b'<!DOCTYPE html>......</script></html>'
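The urlopen call above assumes the request succeeds. When it does not, the exceptions come from urllib.error, as noted in the module list. The following is a minimal sketch of catching them. The helper name fetch and its return convention are assumptions made for this example, not part of urllib.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Hypothetical helper: return (status, body), with status None on a connection-level failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except HTTPError as exc:
        # The server answered with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return exc.code, exc.read()
    except URLError:
        # No usable response at all: DNS failure, refused connection, and so on.
        return None, b''
```

A call such as fetch('http://www.bing.com') would then return the status code together with the page bytes, and an unreachable address would yield (None, b'') instead of a traceback.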

1.2 The urllib.request.Request class

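Request(url, data=None, headers={})
The constructor builds a request object. You can pass a dict of headers. Whether data is supplied determines if the request is sent as a GET or a POST (both are covered later).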

from urllib.request import Request, urlopen

url = 'http://www.bing.com/'
ua_list = [
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36",  # Chrome
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; zh-CN) AppleWebKit/537.36 (KHTML, like Gecko) Version/5.0.1 Safari/537.36",  # Safari
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0",  # Firefox
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"  # IE
]

request = Request(url)
request.add_header('User-Agent', ua_list[1])  # pretend to be a browser

response = urlopen(request, timeout=20)  # urlopen accepts a Request object or a URL string

with response:
    pass
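The statement that data decides between GET and POST can be checked without sending anything, because a Request object reports the method it would use. This is a small sketch. The form field is illustrative, and the httpbin.org URL is only used to construct the objects.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://httpbin.org/post'

get_request = Request(url)                                        # no data
post_request = Request(url, data=urlencode({'age': '6'}).encode())  # data must be bytes
head_request = Request(url, method='HEAD')                        # explicit override

print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
print(head_request.get_method())  # HEAD
```

The method keyword is how other verbs such as HEAD or PUT can be requested when the default choice is not what you want.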

2. urllib.parse

This module handles URL encoding and decoding.

urllib.parse.urlencode

The first argument of urlencode must be a dict or a sequence of two-element tuples.

from urllib import parse

u = parse.urlencode({
    'url': "https://cn.bing.com/search?q=python语言"
})
print(u)

# Output
url=https%3A%2F%2Fcn.bing.com%2Fsearch%3Fq%3Dpython%E8%AF%AD%E8%A8%80
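The example above covers encoding from a dict only. As a complementary sketch, the snippet below passes a sequence of two-element tuples instead and then reverses the process with parse_qs and unquote. The parameter names q and page are made up for illustration.

```python
from urllib.parse import parse_qs, unquote, urlencode

# A sequence of two-element tuples keeps the parameters in a fixed order.
pairs = [('q', 'python'), ('page', '2')]
query = urlencode(pairs)
print(query)  # q=python&page=2

# parse_qs turns a query string back into a dict of lists.
decoded_query = parse_qs(query)
print(decoded_query)  # {'q': ['python'], 'page': ['2']}

# unquote reverses percent-encoding on a single value.
encoded = 'https%3A%2F%2Fcn.bing.com%2Fsearch%3Fq%3Dpython%E8%AF%AD%E8%A8%80'
decoded_url = unquote(encoded)
print(decoded_url)  # https://cn.bing.com/search?q=python语言
```

parse_qs returns lists because a key may legally appear more than once in a query string.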

3. GET requests

from urllib.request import Request, urlopen
from urllib.parse import urlencode

keyword = input('Search for >> ')
data = urlencode({
    'q': keyword
})

# Build the URL: the encoded query string goes after the '?'
base_url = 'http://cn.bing.com/search'
url = '{}?{}'.format(base_url, data)

# Set the User-Agent header so the request looks like it comes from a browser
userAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36"
request = Request(url, headers={'User-agent': userAgent})
response = urlopen(request)

with response:
    pass    # process the response here

print("=======END==========")
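The URL-building step of the GET example can be wrapped in a small function and inspected before anything is sent. This is only a sketch: build_get_request is a hypothetical helper name, and the User-Agent string is a placeholder rather than a real browser signature.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_request(base_url, params, user_agent):
    """Hypothetical helper: build a GET Request with an encoded query string."""
    url = '{}?{}'.format(base_url, urlencode(params))
    return Request(url, headers={'User-Agent': user_agent})


request = build_get_request(
    'http://cn.bing.com/search',
    {'q': 'python urllib'},
    'Mozilla/5.0 (placeholder agent for this sketch)',
)

print(request.full_url)      # http://cn.bing.com/search?q=python+urllib
print(request.get_method())  # GET
```

Note that urlencode turns the space in the keyword into a plus sign, which is the conventional form encoding for query strings.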

4. POST requests

from urllib.request import Request, urlopen
from urllib.parse import urlencode

url = 'http://httpbin.org/post'  # http://httpbin.org/ is a site for testing HTTP requests
request = Request(url)
request.add_header(
    'User-agent',
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36"
)

data = urlencode({'name':'张三,@=/&*', 'age':'6'})

# data can also be passed through the Request class, e.g. Request(url, data=data.encode())
response = urlopen(request, data=data.encode())  # POST request, data submitted as a form

with response:
    print(response.read())


# Output
b'{\n  "args": {}, \n  "data": "", \n  "files": {}, \n  "form": {\n    "age": "6", \n    "name": "\\u5f20\\u4e09,@=/&*"\n  }, \n  "headers": {\n    "Accept-Encoding": "identity", \n    "Content-Length": "47", \n    "Content-Type": "application/x-www-form-urlencoded", \n    "Host": "httpbin.org", \n    "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36"\n  }, \n  "json": null, \n  "origin": "114.250.100.128, 114.250.100.128", \n  "url": "https://httpbin.org/post"\n}\n'
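The raw bytes above are a JSON document, so they are easier to work with once decoded. The sketch below uses a shortened copy of that response body (only the form and url fields are kept) rather than a live request, so it runs offline.

```python
import json

# Shortened copy of the httpbin response shown above, kept as bytes like response.read().
body = b'{"form": {"age": "6", "name": "\\u5f20\\u4e09,@=/&*"}, "url": "https://httpbin.org/post"}'

payload = json.loads(body.decode('utf-8'))
form = payload['form']

print(form['name'])  # 张三,@=/&*
print(form['age'])   # 6
```

The \u5f20\u4e09 escapes in the raw output are just the JSON encoding of the submitted name, and they come back as the original characters after parsing.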

III. Basic Usage of the urllib3 Library

import urllib3

# Open a URL and get a response object back
url = 'https://movie.douban.com/'
userAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36"

# The pool manager creates and reuses connection pools
with urllib3.PoolManager() as http:
    response = http.request('GET', url, headers={
        'User-Agent':userAgent
    })
    print(response.status)  # status code
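To make the connection pool idea a little more concrete, the sketch below asks a PoolManager which pool it would use for several URLs without sending any request. It assumes urllib3 is installed. The /top250 path and the num_pools value are illustrative choices, not taken from the article.

```python
import urllib3

with urllib3.PoolManager(num_pools=4) as http:
    # connection_from_url only looks up (or creates) the pool; nothing is sent.
    pool_a = http.connection_from_url('https://movie.douban.com/')
    pool_b = http.connection_from_url('https://movie.douban.com/top250')  # assumed path
    pool_c = http.connection_from_url('https://httpbin.org/')

    same_host_shares_pool = pool_a is pool_b
    different_host_shares_pool = pool_a is pool_c

print(same_host_shares_pool)       # True
print(different_host_shares_pool)  # False
```

Requests to the same scheme, host, and port go through one pool, which is what lets urllib3 reuse open connections instead of reconnecting each time.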

IV. Basic Usage of the requests Library (Most Common)

import requests

url = 'https://movie.douban.com/'
userAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36"

response = requests.request('GET', url, headers={'User-Agent': userAgent})  # send the request

with response:
    print(response.url)             # https://movie.douban.com/
    print(response.status_code)     # 200
    print(response.request.headers) # request headers
    print(response.text)            # the HTML content
    print(response.headers)         # response headers
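For comparison with the urllib GET and POST sections, here is a rough sketch of how requests handles query strings and form data. It assumes requests is installed and only prepares the requests without sending them, so it runs offline. The search keyword and the User-Agent value are placeholders.

```python
import requests

user_agent = 'Mozilla/5.0 (placeholder agent for this sketch)'

# GET: params are encoded and appended to the URL for you.
get_prepared = requests.Request(
    'GET', 'http://cn.bing.com/search',
    params={'q': 'python urllib'},
    headers={'User-Agent': user_agent},
).prepare()
print(get_prepared.url)  # http://cn.bing.com/search?q=python+urllib

# POST: a dict passed as data is form-encoded, no manual urlencode or encode() needed.
post_prepared = requests.Request(
    'POST', 'http://httpbin.org/post',
    data={'age': '6'},
).prepare()
print(post_prepared.body)                     # age=6
print(post_prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
```

In everyday code the shortcuts requests.get(url, params=...) and requests.post(url, data=...) perform the same preparation and then send the request.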
