一木.溪桥 Learns Web Scraping, Part 03: The urllib Request Module, urllib.request, urllib.parse.urlencode, urllib.parse.quote(str), and .unquote()

一木.溪桥

Category: Python web scraping. Tags: python

Article link: https://blog.csdn.net/fafrfu/article/details/113659554

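This article introduces how Python's built-in urllib module is used in web scraping. It covers the commonly used parts of urllib.request, such as urlopen and Request, and the urlencode and quote functions in urllib.parse. It walks through GET and POST requests, with particular attention to query parameters that contain Chinese characters and to URL decoding. It closes with several scraping exercises, such as saving Baidu search results and sending a POST request to Youdao Translate.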

一木.溪桥 is learning web scraping from Jerry at Logic Education

Series 07: Python Web Scraping
一木.溪桥 Learns Web Scraping, Part 03: The urllib Request Module, urllib.request, urllib.parse.urlencode, urllib.parse.quote(str), and parse.unquote()
Date: January 26, 2021

Learning objectives:

The urllib request module
urllib.request
urllib.parse.urlencode
urllib.parse.quote(str)
parse.unquote()
A urllib POST example
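
As a quick preview of the urllib.parse functions listed above, here is a minimal sketch of percent-encoding and decoding a query value that contains Chinese characters. The parameter name `wd` and the search term are hypothetical values chosen for illustration; they are not taken from the course material.

```python
from urllib import parse

# Hypothetical query parameter containing Chinese characters.
params = {'wd': '海贼王'}

# urlencode takes a dict and returns a percent-encoded "key=value" string.
query = parse.urlencode(params)
print(query)    # wd=%E6%B5%B7%E8%B4%BC%E7%8E%8B

# quote percent-encodes a single string (UTF-8 by default).
encoded = parse.quote('海贼王')
print(encoded)  # %E6%B5%B7%E8%B4%BC%E7%8E%8B

# unquote reverses the encoding and returns the original text.
decoded = parse.unquote(encoded)
print(decoded)  # 海贼王
```

The practical difference is the input type: urlencode expects a dict (or a sequence of pairs) and builds the whole query string, while quote handles one string at a time.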

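Course content:

Request modules for web scraping

urllib

Why learn urllib?

Some older scraping projects are built on urllib
Scraping work sometimes calls for requests and urllib together
It is built into Python, so there is nothing extra to install
urllib is still very capable in some areas, such as URL encoding with urllib.parse and one-call file downloads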

Quick start with urllib

Example: download an image from the web

# Method 1: open() and close()

import requests


url = 'https://dss1.bdstatic.com/70cFuXSh_Q1YnxGkpoWK1HF6hhy/it/u=1603365312,' \
      '3218205429&fm=26&gp=0.jpg'
resp = requests.get(url)
fn = open('code.jpg', 'wb')  # save as code.jpg (the URL points to a .jpg); 'wb' writes binary data
# resp.content holds the raw bytes (an image is binary data);
# resp.text is the str that requests produces by decoding content with the encoding it guesses.
fn.write(resp.content)

fn.close()

# Method 2: with open(), which closes the file automatically, so no close() call is needed

import requests


url = 'https://dss1.bdstatic.com/70cFuXSh_Q1YnxGkpoWK1HF6hhy/it/u=1603365312,' \
      '3218205429&fm=26&gp=0.jpg'
resp = requests.get(url)
with open('code2.jpg', 'wb') as file_obj:
    file_obj.write(resp.content)

# Method 3: urlretrieve from request, a submodule of Python's built-in urllib package

from urllib import request


url = 'https://dss1.bdstatic.com/70cFuXSh_Q1YnxGkpoWK1HF6hhy/it/u=1603365312,' \
      '3218205429&fm=26&gp=0.jpg'
request.urlretrieve(url, 'code3.jpg')       # arguments: the URL, then the local file name (code3.jpg)
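
urlretrieve sends urllib's default request headers, and some sites may reject those. As a possible fourth variant, the sketch below downloads the same kind of file with urllib.request.Request and urlopen, both named in the learning objectives. The helper name `download` and the `'Mozilla/5.0'` User-Agent value are assumptions made for illustration, not part of the original lesson.

```python
from urllib import request


def download(url, filename, user_agent='Mozilla/5.0'):
    """Fetch url and write the raw bytes to filename; return the byte count.

    `download` is a hypothetical helper; the User-Agent value is an assumed placeholder.
    """
    # Request lets us attach headers before the request is sent.
    req = request.Request(url, headers={'User-Agent': user_agent})
    # urlopen returns a response object that can be used as a context manager.
    with request.urlopen(req) as resp, open(filename, 'wb') as file_obj:
        data = resp.read()  # read the whole body as bytes
        file_obj.write(data)
    return len(data)
```

Calling `download(url, 'code4.jpg')` with the image URL from the methods above would save the file the same way urlretrieve does, with the added option of setting headers.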