An Overview of the urllib Library

Latest recommended article published on 2021-09-14 09:31:42

半日闲12138


Category: Web crawlers. Tags: crawler learning

Article link: https://blog.csdn.net/feiYu12138/article/details/102486986


# coding:utf-8
import urllib.request

url = "https://www.baidu.com"

response = urllib.request.urlopen(url)

print(response)  # the HTTPResponse object; its repr shows a memory address

print(response.url)  # the URL that was fetched

print(response.status)  # 200

#print(response.headers)

#print(response.read().decode("utf-8"))  # print the page source

# with open("a.html", "w", encoding="utf-8") as fp:
#     fp.write(response.read().decode("utf-8"))


# with open("b.html", "wb") as fp:
#     fp.write(response.read())


html = response.read()
print(type(html))  # <class 'bytes'>

# The body can only be read once: a second read() returns b"",
# so decode the bytes that were already read.
html2 = html.decode("utf-8")
print(type(html2))  # <class 'str'>
"""
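File encoding
In Python 3 a str holds Unicode text; that is how characters are kept in memory.
On disk, text is stored as bytes, so a str has to be encoded before it can be written to a file.
In practice this means:
a file opened in 'w' (text) mode accepts only str, and Python encodes it with the file's encoding;
a file opened in 'wb' (binary) mode accepts only bytes.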
"""
"""
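Web page encoding works much the same way as file encoding.
If the page returned by urlopen is read with read() and decoded with decode('utf-8'),
the result is a str and must be written to a file opened in 'w' mode.
If it is only read with read() and not decoded, the result is bytes and must be written in 'wb' mode.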
"""

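The rule about 'w' and 'wb' can be checked without touching the network. The following is a minimal sketch, assuming a temporary directory and a sample string invented for illustration; the file names a.html and b.html simply mirror the commented-out lines in the code above.

```python
import os
import tempfile

text = "urllib 库解析"           # str: Unicode text held in memory
data = text.encode("utf-8")      # bytes: the form that is written to disk

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "a.html")
    bin_path = os.path.join(tmp, "b.html")

    # Text mode: pass a str; Python encodes it with the given encoding.
    with open(text_path, "w", encoding="utf-8") as fp:
        fp.write(text)

    # Binary mode: pass bytes; nothing is encoded or decoded.
    with open(bin_path, "wb") as fp:
        fp.write(data)

    # Read both files back as raw bytes to compare what landed on disk.
    with open(text_path, "rb") as fp:
        text_file_bytes = fp.read()
    with open(bin_path, "rb") as fp:
        bin_file_bytes = fp.read()

    # Writing a str to a binary-mode file raises TypeError.
    try:
        with open(bin_path, "wb") as fp:
            fp.write(text)
        mixed_ok = True
    except TypeError:
        mixed_ok = False

print(type(text), type(data))
print(text_file_bytes == bin_file_bytes)
print(mixed_ok)
```

Both files hold the same bytes; the only difference is whether the encoding step is done by open() or by the caller.
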
urllib.request: sends HTTP requests and returns the responses
urllib.error: defines the exceptions raised by urllib.request
urllib.parse: parses and manipulates URLs
urllib.robotparser: parses a site's robots.txt file
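To give a feel for the modules other than urllib.request, here is a short offline sketch. The query parameters, the example.com URLs, and the robots.txt rules are made up for illustration; the calls themselves (urlparse, parse_qs, urlencode, RobotFileParser.parse, can_fetch) are standard-library functions.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.robotparser import RobotFileParser

# urllib.parse: split a URL into its components.
parts = urlparse("https://www.baidu.com/s?wd=urllib&pn=10")
print(parts.scheme, parts.netloc, parts.path)
print(parse_qs(parts.query))

# urllib.parse: build a percent-encoded query string from a dict.
query = urlencode({"wd": "爬虫", "pn": 0})
print(query)

# urllib.error: HTTPError is a subclass of URLError, so catching
# URLError also catches HTTP status errors.
print(issubclass(HTTPError, URLError))

# urllib.robotparser: parse() accepts the lines of a robots.txt file,
# so the rules can be tested without downloading anything.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = RobotFileParser()
rp.parse(robots_lines)
allowed_index = rp.can_fetch("*", "https://example.com/index.html")
allowed_private = rp.can_fetch("*", "https://example.com/private/data.html")
print(allowed_index, allowed_private)
```

In a real crawler, RobotFileParser would normally be pointed at a live robots.txt with set_url() and read() instead of a hand-written list of lines.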

The urllib.request module provides the most basic way to construct HTTP requests. With it you can simulate the way a browser issues a request, and it also handles authentication, redirections, cookies, and more.
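One way to picture "simulating a browser" is a Request object that carries a User-Agent header, sent through an opener that keeps cookies. This is only a sketch: the User-Agent string is a placeholder and fetch is a hypothetical helper, not part of urllib. Nothing is sent over the network until fetch is called.

```python
import http.cookiejar
import urllib.request

url = "https://www.baidu.com"

# A Request object bundles the URL with headers; building it sends nothing.
req = urllib.request.Request(
    url,
    headers={"User-Agent": "Mozilla/5.0 (example)"},  # placeholder value
)
print(req.get_method())              # GET, because no data was attached
print(req.get_header("User-agent"))  # Request stores header names capitalized like this

# An opener with a cookie handler remembers cookies between requests.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)


def fetch(request, timeout=10):
    """Hypothetical helper: send the request, return (status, decoded body)."""
    with opener.open(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling fetch(req) needs network access; any cookies the server sets would then show up in cookie_jar and be sent back on later requests made through the same opener.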

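Error: SyntaxError: Non-UTF-8 code starting with
Fix: add #coding:utf-8 as the first line of the file, and make sure the file itself is saved as UTF-8 (the declaration has to match the encoding the file is actually saved in).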
