Using urllib in Python

晴泪

Article link: https://blog.csdn.net/xxt201/article/details/123166474

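This article introduces how Python's urllib package works with web page URLs, including the read(), readline(), and readlines() functions and the getcode() method, and highlights the simplicity and advantages of the requests module.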

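The urllib package is used to work with web page URLs, and it can be used to fetch (scrape) web page data.

The urllib package contains the following modules:

urllib.request - opens and reads URLs.
urllib.error - contains the exceptions raised by urllib.request.
urllib.parse - parses URLs.
urllib.robotparser - parses robots.txt files.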
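
As a quick offline tour of the modules listed above, the sketch below uses urllib.parse to split and build URLs and urllib.robotparser to evaluate a rule set. The example.com URLs and the robots.txt rules are made-up values for illustration, not taken from any real site.

```python
from urllib.parse import urlencode, urljoin, urlparse
from urllib.robotparser import RobotFileParser

# urllib.parse: split a URL into its components.
parts = urlparse("https://www.example.com/search?q=python&page=2#top")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# urllib.parse: build a query string from a dict.
query = urlencode({"q": "python urllib", "page": 2})
print(query)

# urllib.parse: resolve a relative link against a base URL.
full = urljoin("https://www.example.com/docs/index.html", "api.html")
print(full)

# urllib.robotparser: parse rules from a list of lines (hypothetical rules).
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /private/"])
allowed_public = rules.can_fetch("*", "https://www.example.com/index.html")
allowed_private = rules.can_fetch("*", "https://www.example.com/private/data.html")
print(allowed_public, allowed_private)
```

For a live site, RobotFileParser can instead be pointed at the site's robots.txt with set_url() and read().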

urllib.request

urllib.request can simulate the process of a browser issuing a request.
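
One way to read "simulating a browser" is sending browser-like request headers. The sketch below builds a urllib.request.Request with a User-Agent header and inspects it without sending anything. The header string and the example.com URL are illustrative assumptions.

```python
from urllib.request import Request

# Hypothetical browser-like User-Agent string, used only for illustration.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

req = Request(
    "https://www.example.com/",
    headers={"User-Agent": USER_AGENT},
)

# Request stores header names capitalized, so the key becomes "User-agent".
print(req.get_method())              # GET, because no data was attached
print(req.full_url)
print(req.get_header("User-agent"))

# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Passing a Request object instead of a plain string to urlopen is what allows custom headers to be attached.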

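Syntax:

urllib.request.urlopen(url, data=None, [timeout, ]*, context=None)

url is the address to open, data is optional bytes to send to the server (which turns the request into a POST), timeout is the timeout in seconds, and context is an optional ssl.SSLContext.

Example: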

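read() - reads the entire content of the page

# Use read() to fetch the entire content of the page
from urllib.request import urlopen          # import the urlopen function from the urllib.request module

myURL = urlopen("https://www.runoob.com/")  # request the page
# Get the page's HTML source as bytes.
print(myURL.read())                         # read() returns the whole page by default; read(n) returns at most n bytes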

Output: the raw HTML of the page, printed as a bytes object (b'...').
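
Because read() returns bytes and its optional argument is a byte count, a short sketch may help. To stay runnable without network access, it opens a data: URL (which urlopen also accepts) built from a made-up HTML snippet. With a real site, the same calls apply to an http(s) URL.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Made-up page content, embedded in a data: URL so no network is needed.
html = "<html>\n<body>Hello, urllib</body>\n</html>\n"
url = "data:text/html;charset=utf-8," + quote(html)

with urlopen(url) as response:
    head = response.read(6)   # at most the first 6 bytes
    rest = response.read()    # everything that is left

print(head)
text = (head + rest).decode("utf-8")  # bytes -> str
print(text)
```

Decoding with the page's actual character set (UTF-8 is assumed here) turns the bytes into a regular string that is easier to search or print.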

readline() - reads one line of the content

from urllib.request import urlopen

myURL = urlopen("https://www.runoob.com/")
line = myURL.readline() # read one line of the page
print(line)

Output: the first line of the page, printed as a bytes object.

readlines() - reads the entire content and returns it as a list of lines.

from urllib.request import urlopen

myURL = urlopen("https://www.runoob.com/")
lines = myURL.readlines() # read all lines and return them as a list
for i in lines:            # iterate over the list with a for loop
    print(i)

Output: every line of the page, each printed as a bytes object.
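
The line-oriented methods can be compared side by side. The sketch below uses a data: URL with made-up content as an offline stand-in for a real page. It suggests that readline() returns one line as bytes (newline included), that readlines() returns the lines not yet read as a list, and that the response can also be iterated directly.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Made-up three-line document standing in for a real page.
content = "first line\nsecond line\nthird line\n"
url = "data:text/plain;charset=utf-8," + quote(content)

with urlopen(url) as response:
    first = response.readline()       # b"first line\n"
    remaining = response.readlines()  # list of the lines not yet read

# Iterating the response yields lines one at a time, without building a list first.
with urlopen(url) as response:
    decoded = [line.decode("utf-8").rstrip("\n") for line in response]

print(first)
print(remaining)
print(decoded)
```

For large pages, iterating over the response is usually gentler on memory than readlines(), since only one line is held at a time.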

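getcode() - returns the HTTP status code of the response

A status of 200 means the page was fetched successfully; 404 means the page does not exist. Note that urlopen does not return a 404 response: it raises urllib.error.HTTPError, whose code attribute holds the status. Since Python 3.9, getcode() is deprecated in favor of the status attribute.

Example: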

import urllib.error
import urllib.request

myURL1 = urllib.request.urlopen("https://www.baidu.com/")
print(myURL1.getcode())   # 200

try:
    myURL2 = urllib.request.urlopen("https://www.baidu.com/aa")
except urllib.error.HTTPError as e:   # raised for error statuses such as 404
    if e.code == 404:
        print(404)   # 404

Output: per the comments in the code, 200 followed by 404.
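
The same status-code handling can be tried without depending on an external site. The sketch below starts a throwaway local server from the standard library and wraps the try/except pattern in a helper. DemoHandler and fetch_status are hypothetical names introduced for this illustration, and the helper reads the status attribute rather than calling getcode().

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: "/" answers 200, every other path answers 404."""

    def do_GET(self):
        if self.path == "/":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Ignore any configured proxy so requests reach the local demo server directly.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url, timeout=5):
    """Return the HTTP status code for url, including error statuses."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:
        return e.code


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

ok_status = fetch_status(base + "/")
missing_status = fetch_status(base + "/aa")
print(ok_status, missing_status)

server.shutdown()
server.server_close()
```

Returning e.code from the except branch lets the caller treat success and error statuses uniformly. Network-level failures such as an unreachable host raise urllib.error.URLError instead and are deliberately left unhandled here.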