python day3

hastings2k

Category: python

Article link: https://blog.csdn.net/hastings2k/article/details/62442776

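Web Crawler, Week 2

W2.1 Installing the Beautiful Soup Library

Using the BeautifulSoup library

from bs4 import BeautifulSoup  # BeautifulSoup is a class
soup = BeautifulSoup("<p>HTML markup to parse</p>", "html.parser")  # two arguments: the markup to parse, and the parser; bs4 supports four parsers (html.parser, lxml, xml, html5lib)

The lines above import a type named "BeautifulSoup" from the bs4 library. You can also import the bs4 library directly: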

import bs4
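
To make the two import styles concrete, here is a minimal sketch. The HTML snippet and the variable names (`html`, `soup_a`, `soup_b`, `same_class`) are made up for illustration; only `BeautifulSoup` and the `"html.parser"` parser name come from the notes above.

```python
import bs4
from bs4 import BeautifulSoup

# A tiny HTML snippet invented for this illustration.
html = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# Both spellings refer to the same class, so either import style works.
soup_a = BeautifulSoup(html, "html.parser")
soup_b = bs4.BeautifulSoup(html, "html.parser")

same_class = BeautifulSoup is bs4.BeautifulSoup
print(same_class)            # whether the two names are the same object
print(type(soup_a))          # the type of the parsed document
print(soup_a.title.string)   # the text inside <title>
```

With `import bs4` alone, the class has to be written as `bs4.BeautifulSoup` every time, which is probably why the `from ... import` form is the more common one.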

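An HTML document, its tag tree, and a BeautifulSoup object can be treated as equivalent: each one corresponds to the other two.

You can also supply the HTML document by opening a file, as follows:

with open("D://demo.html") as f:  # the file is closed again once parsing is done
    soup2 = BeautifulSoup(f, "html.parser")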
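
A self-contained sketch of the file-based approach, assuming a throwaway file in a temporary directory instead of the `D://demo.html` path from the notes. The file contents and the names `html`, `path`, and `p_text` are hypothetical.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical page content, written to a temporary file for the demo.
html = "<html><body><p>From a file</p></body></html>"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)

    # BeautifulSoup accepts an open file object as well as a string.
    with open(path, encoding="utf-8") as f:
        soup2 = BeautifulSoup(f, "html.parser")

# The document was read while the file was open, so it stays usable afterwards.
p_text = soup2.p.string
print(p_text)
```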

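The BeautifulSoup class has five basic elements: Tag, Name, Attributes, NavigableString (the string inside a tag), and Comment.

soup.a  # soup.<tag>: returns the first matching tag in the tag tree, here the first <a> tag
soup.a.name  # the name of the a tag
soup.a.parent.name  # the name of a's parent node
tag = soup.a
tag.attrs  # the tag's attributes, as a dict mapping attribute names to values
tag.attrs['class']  # look up one attribute by name; class can hold several values, so this is a list
type(tag.attrs)  # type() reports an object's type (hard going for someone who never learned Python)
tag.string  # the string inside the tag; it reaches through a single nested child tag, and is None when the tag has several children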
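
The five elements can be tried out on a small document. This is only a sketch: the HTML below and its URLs are invented for the example, and the imports from `bs4.element` are used just to check types.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Invented markup: no whitespace between tags, so there are no stray text nodes.
html = (
    "<html><body>"
    '<p class="title note"><b>Bold text</b></p>'
    '<a href="http://example.com/one" class="link">First link</a>'
    '<a href="http://example.com/two">Second link</a>'
    "<div><!--a comment--></div>"
    "<ul><li>one</li><li>two</li></ul>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

tag = soup.a                        # Tag: the first <a> in the tree
print(isinstance(tag, Tag))
print(tag.name)                     # Name: 'a'
print(tag.parent.name)              # name of the enclosing tag: 'body'
print(tag.attrs)                    # Attributes: a dict
print(tag.attrs["class"])           # multi-valued attribute: a list
print(type(tag.string))             # NavigableString

print(soup.p.string)                # reaches through the single <b> child
print(soup.ul.string)               # None: <ul> has more than one child
print(type(soup.div.string))        # Comment: the comment markers are stripped
```

Note that a Comment prints like ordinary text, so checking the type is the reliable way to tell a comment apart from a normal string.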

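The tag tree can be traversed downward, upward, and sideways (between siblings).

Traversal relies on iterator types, which are meant to be consumed in a loop rather than indexed directly. The loop looks like this: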

for pars in soup.title.parents:
	print(pars)
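
A short sketch of all three directions on an invented document. The attributes `.children`, `.descendants`, `.parents`, `.next_siblings`, and `.previous_siblings` are part of bs4; the markup and variable names are assumptions made for the example.

```python
from bs4 import BeautifulSoup

html = (
    "<html><head><title>Demo</title></head>"
    "<body><p>one</p><p>two</p><p>three</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Downward: direct children, then every node below <body>.
child_names = [child.name for child in soup.body.children]
descendant_count = len(list(soup.body.descendants))

# Upward: every ancestor of <title>, ending at the document object itself.
parent_names = [parent.name for parent in soup.title.parents]

# Sideways: siblings after the first <p> and before the last <p>.
first_p = soup.p
last_p = soup.find_all("p")[-1]
next_texts = [str(sib.string) for sib in first_p.next_siblings]
prev_texts = [str(sib.string) for sib in last_p.previous_siblings]

print(child_names)
print(descendant_count)
print(parent_names)
print(next_texts)
print(prev_texts)
```

The last ancestor is the BeautifulSoup object itself, whose name is `[document]`. In real pages the whitespace between tags shows up as string nodes among the siblings, so sibling loops usually need a type check.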

The prettify() method formats the HTML with line breaks and indentation so that the page is easier to read.
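
A minimal sketch of prettify() on a one-line fragment. The fragment and the names `pretty` and `stripped_lines` are made up for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>Hello</p></div>", "html.parser")

# prettify() returns a string with each tag and each string on its own line.
pretty = soup.prettify()
print(pretty)

# Ignoring the indentation, the lines appear in document order.
stripped_lines = [line.strip() for line in pretty.splitlines()]
```

prettify() can also be called on a single tag, for example `soup.p.prettify()`, to format just that subtree.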

bs4 converts the documents it reads to Unicode and uses UTF-8 by default when writing them out.
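
A small sketch of that behavior, assuming the input arrives as GBK-encoded bytes. The sample text and variable names are invented, and `from_encoding` is passed explicitly so the example does not depend on encoding detection.

```python
from bs4 import BeautifulSoup

# Invented input: a paragraph of Chinese text encoded as GBK bytes.
raw = "<p>中文</p>".encode("gbk")

# Tell bs4 the source encoding; the parsed tree holds Unicode strings.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")
text = soup.p.string
print(text)

# encode() with no arguments writes the document out as UTF-8 bytes.
out = soup.encode()
print(out)
```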
