python3 urllib post: Python crawlers with the urllib module, POST requests

Latest recommended article published on 2024-05-06 12:28:37

weixin_39958248


Tags: python3 urllib post

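This program uses 'http://httpbin.org/post' as the example target.

Steps:

Import urllib.request

Import urllib.parse

URL-encode the form data, then convert it to UTF-8 bytes: data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')

Open the page: response = urllib.request.urlopen('URL', data = data)

Read the response body: html = response.read()

Print:

1. Without decode

print(html) # prints the raw bytes literal (b'...') on a single line, with line breaks shown as \n escapes, which is hard to read

2. With decode

print(html.decode()) # prints the text with real line breaks and indentation, which is much easier to read

Check the result of the request:

a. response.status # 200 means the request succeeded; for an error status such as 404 (page not found), urlopen raises urllib.error.HTTPError instead of returning a response

b. response.getcode() # returns the same status code as response.status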
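The status check can be tried end to end without touching the public network. The following is a minimal sketch, not the article's program: it starts a throwaway local server and posts to it. The `EchoHandler` class, the `/post` and `/missing` paths, and the `post_form` helper are all hypothetical names made up for this illustration. It uses an opener built with `ProxyHandler({})`, which behaves like `urlopen` but ignores proxy settings so the request really reaches localhost.

```python
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: echoes the POST body on /post, 404 elsewhere."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if self.path == "/post":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Same behaviour as urlopen, but with proxy settings ignored (assumption: local demo only)
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def post_form(url, fields):
    """POST form fields; return (status code, decoded body or None on an HTTP error)."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    try:
        with opener.open(url, data=data, timeout=5) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive here as exceptions, not as return values
        code = err.code
        err.close()
        return code, None


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

ok_result = post_form(base + "/post", {"word": "hello"})
missing_result = post_form(base + "/missing", {"word": "hello"})

server.shutdown()
server.server_close()

print(ok_result)       # (200, 'word=hello')
print(missing_result)  # (404, None)
```

Catching `urllib.error.HTTPError` is what makes the 404 case observable; without the `try`/`except`, the second call would stop the program with a traceback.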

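1. Program without decode:

import urllib.request
import urllib.parse

# URL-encode the form fields, then convert the resulting string to UTF-8 bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')
# Passing data makes urlopen send a POST request instead of a GET
response = urllib.request.urlopen('http://httpbin.org/post', data = data )
html = response.read()
print(html)
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())

Output: the body is printed as a single bytes literal (b'...') with line breaks shown as \n escapes.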

2. Program with decode:

import urllib.request
import urllib.parse

# URL-encode the form fields, then convert the resulting string to UTF-8 bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data = data )
html = response.read()
# decode() turns the bytes into str (UTF-8 by default), so line breaks print as real lines
print(html.decode())
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())

Output:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "word": "hello"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Connection": "close",
    "Content-Length": "10",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.4"
  },
  "json": null,
  "origin": "106.14.17.222",
  "url": "http://httpbin.org/post"
}
------------------------------------------------------------------
------------------------------------------------------------------
200
200
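Because the response body is JSON, the decoded text can be handed to `json.loads` to get at individual fields. The snippet below is only a sketch: `sample_body` is a shortened, hand-copied version of the output shown here, standing in for what `response.read()` would return, so no live request is made.

```python
import json

# Shortened copy of the response body, as the bytes that response.read() would return
sample_body = (
    b'{"args": {}, "data": "", "form": {"word": "hello"}, '
    b'"json": null, "url": "http://httpbin.org/post"}'
)

text = sample_body.decode("utf-8")  # bytes -> str
result = json.loads(text)           # str -> dict

print(result["form"]["word"])  # hello
print(result["json"])          # None (JSON null becomes Python None)
print(result["url"])           # http://httpbin.org/post
```

The `form` entry is where the server reports the POSTed fields, so it is a convenient place to confirm that `word=hello` arrived as intended.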

Why is the bytes conversion needed?

Consider the same request without it:

data = urllib.parse.urlencode({'word': 'hello'}) # no bytes conversion, data is a str
response = urllib.request.urlopen('http://httpbin.org/post', data = data )
html = response.read()

Error message:

Traceback (most recent call last):
  File "/usercode/file.py", line 15, in <module>
    response = urllib.request.urlopen('http://httpbin.org/post', data = data )
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 453, in open
    req = meth(req)
  File "/usr/lib/python3.4/urllib/request.py", line 1104, in do_request_
    raise TypeError(msg)
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

So for a POST request, the body must be encoded to bytes before it is passed to urlopen.
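The difference between the two values can be inspected offline. This is a small sketch under the assumption that building a `urllib.request.Request` object is an acceptable stand-in for the `urlopen` call: constructing the object sends nothing, and the variable names `query`, `data`, and `request` are chosen here for illustration.

```python
import urllib.parse
import urllib.request

query = urllib.parse.urlencode({'word': 'hello'})  # str, rejected as a POST body
data = query.encode('utf-8')                       # bytes, accepted as a POST body

# Building a Request does not send anything; the body type is checked when it is opened
request = urllib.request.Request('http://httpbin.org/post', data=data)

print(type(query).__name__, repr(query))  # str 'word=hello'
print(type(data).__name__, repr(data))    # bytes b'word=hello'
print(request.get_method())               # POST, because data is not None
```

`query.encode('utf-8')` and `bytes(query, encoding='utf-8')` produce the same bytes object, so either spelling fixes the TypeError.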

class bytes([source[, encoding[, errors]]])

Return a new “bytes” object, which is an immutable sequence of integers in the range 0 <= x < 256.

Accordingly, constructor arguments are interpreted as for bytearray().
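A few lines make the properties in that description concrete. This is an illustrative sketch; the names `encoded` and `immutable_error` are made up here.

```python
# Build bytes from a str; an encoding is required when the source is a str
encoded = bytes('word=hello', encoding='utf-8')

print(encoded == 'word=hello'.encode('utf-8'))  # True: both spellings give the same bytes
print(encoded[0])                               # 119: indexing yields an integer, here ord('w')
print(list(encoded[:4]))                        # [119, 111, 114, 100]
print(encoded.decode('utf-8'))                  # word=hello

try:
    encoded[0] = 87  # bytes is immutable, so item assignment fails
except TypeError as exc:
    immutable_error = str(exc)
    print(immutable_error)
```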
