[Web Scraping] Introduction to the BeautifulSoup4 Module

Latest recommended article published 2024-04-22 10:50:04

huihuihhh


Category: Python web; tags: bs4, web scraping

Article link: https://blog.csdn.net/huihuihhh/article/details/80934092


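1. BeautifulSoup4 Basics
2. Working with Tags in BeautifulSoup4
3. Regular Expressions
- - Common regular expression symbols
- - Finding images with regular expressions
4. Miscellaneous
- - Getting the attribute dictionary
- - Lambda expressions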

1. BeautifulSoup4 Basics

- Install BeautifulSoup4 with pip

pip install BeautifulSoup4

- Import the BeautifulSoup4 module

import bs4

- Create a bs4.BeautifulSoup object

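# Import the urllib.request module
import urllib.request
# urlopen() returns a response object; html.read() returns its body as bytes (the page can also be fetched in other ways)
html = urllib.request.urlopen("http://pythonscraping.com/pages/page1.html")
# Use the parser from the Python standard library, "html.parser"; other parsers (which must be installed separately) also work
soup = bs4.BeautifulSoup(html.read(), "html.parser")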
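As a minimal sketch of the same construction step without network access, the snippet below parses an inline HTML string instead of a fetched page. The `sample_html` string and its contents are assumptions made up for illustration; only `bs4.BeautifulSoup` and the `"html.parser"` argument come from the text above.

```python
import bs4

# Hypothetical sample markup (an assumption, not the page fetched above)
sample_html = "<html><head><title>Demo</title></head><body><h1>Hello</h1></body></html>"

# BeautifulSoup accepts either a str or bytes as its first argument
soup = bs4.BeautifulSoup(sample_html, "html.parser")

# Show the type of the parsed document and the text inside <title>
print(type(soup))
print(soup.title.string)
```

Parsing a local string like this can be a convenient way to try out selectors before pointing the code at a live URL.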

- Find elements in the bs4 object

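# Method 1: access the tag name directly as an attribute of the bs4 object; the path can span several levels, and the result is the same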
print(soup.h1)
print(soup.html.h1)
print(soup.html.body.h1)
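The claim that the short and the fully qualified paths give the same result can be checked on a small inline document. This is only a sketch: `sample_html` is a made-up string, not the page used above.

```python
import bs4

# Hypothetical sample markup (an assumption for illustration)
sample_html = "<html><body><h1>An Interesting Title</h1><div>text</div></body></html>"
soup = bs4.BeautifulSoup(sample_html, "html.parser")

# Attribute access returns the first matching tag at or below the current node
short = soup.h1
full = soup.html.body.h1

# Compare the two results and show the heading text
print(short == full)
print(short.get_text())

# A tag name that does not occur in the document yields None rather than an error
missing = soup.table
print(missing)
```

Since a missing tag comes back as `None`, chaining several levels (for example `soup.table.tr`) on a page that lacks the outer tag raises an `AttributeError`, so a `None` check before going deeper is a reasonable precaution.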

# Method 2: use the find method; it returns a bs4.element.Tag (or None if nothing matches)
name = soup.find("span", {"class": "red"})
pri
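The listing above is cut off in the source, so here is a self-contained sketch of the same kind of `find` call. The `sample_html` markup, the variable names, and the `"blue"` class are assumptions chosen for illustration; they are not taken from the original article.

```python
import bs4

# Hypothetical sample markup (an assumption for illustration)
sample_html = """
<div>
  <span class="red">first red</span>
  <span class="green">a green one</span>
  <span class="red">second red</span>
</div>
"""
soup = bs4.BeautifulSoup(sample_html, "html.parser")

# find() returns only the first tag matching the name and attribute filter
first_red = soup.find("span", {"class": "red"})
print(type(first_red))
print(first_red.get_text())

# When nothing matches, find() returns None, so check before using the result
missing = soup.find("span", {"class": "blue"})
if missing is None:
    print("no blue span found")
```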
