Getting tags when scraping HTML: parsing and classifying HTML with Python BeautifulSoup (partial)

Latest recommended article published on 2023-05-19 16:28:13

南方姑娘走在成都街头


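html = '''
<html><head><title>The Domouse's story</title></head>
<body>
<p name="dromouse">The Dormouse's story</p>
<p>Once upon a time there were little sisters;and their names were
<a>Lacle</a> and <a>Tillie</a>
and they lived at bottom of a well.</p>
<p>...</p>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.prettify())  # pretty-print the parsed tree; missing closing tags are filled in automatically

print(soup.title.string)  # text of the <title> tag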

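Output:

<html>
 <head>
  <title>
   The Domouse's story
  </title>
 </head>
 <body>
  <p name="dromouse">
   The Dormouse's story
  </p>
  <p>
   Once upon a time there were little sisters;and their names were
   <a>
    Lacle
   </a>
   and
   <a>
    Tillie
   </a>
   and they lived at bottom of a well.
  </p>
  <p>
   ...
  </p>
 </body>
</html>

The Domouse's story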
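The automatic completion of missing tags is easier to see on a smaller fragment. The following is a minimal sketch, not part of the original article: the fragment is made up for illustration, and it assumes Python's built-in `html.parser` in place of `lxml` so that it runs without the extra dependency (unlike `lxml`, `html.parser` does not wrap the result in `<html>` and `<body>`).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: neither <p> nor <b> is closed.
broken = "<p>unclosed <b>bold"

# html.parser ships with Python; the article itself uses 'lxml'.
soup = BeautifulSoup(broken, "html.parser")

# Serialising the tree shows the closing tags that were filled in.
repaired = str(soup)
print(repaired)
```

Printing `soup.prettify()` instead of `str(soup)` would show the same repaired tree, indented one tag per line.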

Selecting elements

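html = '''
<html><head><title>The Domouse's story</title></head>
<body>
<p name="dromouse">The Dormouse's story</p>
<p>Once upon a time there were little sisters;and their names were
<a>Lacle</a> and <a>Tillie</a>
and they lived at bottom of a well.</p>
<p>...</p>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.title)
# <title>The Domouse's story</title>

print(type(soup.title))
# <class 'bs4.element.Tag'>

print(soup.head)
# <head><title>The Domouse's story</title></head>

print(soup.p)  # when several tags match, only the first one is returned
# <p name="dromouse">The Dormouse's story</p>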
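Since attribute-style access only ever returns the first match, it may help to see it next to `find` and `find_all`. This is a rough sketch under stated assumptions: the two-paragraph document is hypothetical, and `html.parser` is used instead of `lxml` to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical document with two <p> tags and no <h1>.
doc = "<html><body><p>first</p><p>second</p></body></html>"
soup = BeautifulSoup(doc, "html.parser")

first_p = soup.p             # attribute access: first match only
same_p = soup.find("p")      # the explicit call that does the same thing
all_p = soup.find_all("p")   # every match, in document order
missing = soup.h1            # no such tag in the document, so this is None

print(first_p.string)
print(first_p is same_p)
print([p.string for p in all_p])
print(missing)
```

Because a missing tag yields `None`, chaining such as `soup.h1.string` raises `AttributeError`; checking for `None` first avoids that.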

Getting the tag name:

html = '''
<html><head><title>The Domouse's story</title></head>
<body>
<p name="dromouse">The Dormouse's story</p>
<p>Once upon a time there were little sisters;and their names were
<a>Lacle</a> and <a>Tillie</a>
and they lived at bottom of a well.</p>
<p>...</p>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.title.name)
# title
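One possible use of `.name` is to classify the tags in a document by counting how often each name occurs. The sketch below is only an illustration: the sample markup is invented, and `html.parser` stands in for `lxml`.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup used only for this example.
doc = "<div><p>one</p><p>two</p><a>link</a></div>"
soup = BeautifulSoup(doc, "html.parser")

# find_all(True) matches every tag; .name gives each tag's name.
names = [tag.name for tag in soup.find_all(True)]
counts = Counter(names)

print(names)
print(counts["p"])
```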

Getting attributes:

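html = '''
<html><head><title>The Domouse's story</title></head>
<body>
<p name="dromouse">The Dormouse's story</p>
<p>Once upon a time there were little sisters;and their names were
<a>Lacle</a> and <a>Tillie</a>
and they lived at bottom of a well.</p>
<p>...</p>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.p.attrs['name'])  # attrs is a dict of all attributes on the tag
# dromouse

print(soup.p['name'])  # shorthand for the same lookup
# dromouse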
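Both lookups above raise `KeyError` when the attribute is absent. A small sketch of the safer alternatives follows; the `class` value and the missing `id` are assumptions added for illustration, and `html.parser` replaces `lxml` so the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical tag: has name and class attributes, but no id.
doc = '<p name="dromouse" class="title intro">text</p>'
soup = BeautifulSoup(doc, "html.parser")
p = soup.p

name_attr = p["name"]          # raises KeyError if the attribute is absent
id_attr = p.get("id")          # returns None instead of raising
classes = p["class"]           # class is multi-valued, so this is a list
has_name = p.has_attr("name")  # explicit presence check

print(name_attr, id_attr, classes, has_name)
```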

Getting the tag's text content:

html = '''
<html><head><title>The Domouse's story</title></head>
<body>
<p name="dromouse">The Dormouse's story</p>
<p>Once upon a time there were little sisters;and their names were
<a>Lacle</a> and <a>Tillie</a>
and they lived at bottom of a well.</p>
<p>...</p>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.p.string)
# The Dormouse's story
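`.string` only works when a tag has a single child; for a tag that mixes text and nested tags it returns `None`, and `get_text()` is the usual fallback. A minimal sketch, with an invented fragment and `html.parser` assumed in place of `lxml`:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: <p> contains text, a nested <b>, and more text.
doc = "<p>Hello <b>world</b>!</p>"
soup = BeautifulSoup(doc, "html.parser")

direct = soup.b.string          # <b> has a single text child
mixed = soup.p.string           # <p> has three children, so this is None
full_text = soup.p.get_text()   # concatenates all descendant text

print(direct)
print(mixed)
print(full_text)
```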

Finding tags by name

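html = '''
<h4>Hello</h4>
<ul>
<li>Foo</li>
<li>Bar</li>
<li>Jay</li>
</ul>
<ul>
<li>Foo</li>
<li>Bar</li>
</ul>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.find_all('ul'))  # returns a list of every matching tag

print(type(soup.find_all('ul')[0]))  # each item in the list is a Tag

Output:

[<ul>
<li>Foo</li>
<li>Bar</li>
<li>Jay</li>
</ul>, <ul>
<li>Foo</li>
<li>Bar</li>
</ul>]
<class 'bs4.element.Tag'>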
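Because every item returned by `find_all` is itself a `Tag`, the search can be repeated inside each result. The sketch below groups list items by their parent list; the markup is a simplified stand-in for the example above, and `html.parser` is assumed instead of `lxml`.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical version of the two lists used above.
doc = """
<ul><li>Foo</li><li>Bar</li><li>Jay</li></ul>
<ul><li>Foo</li><li>Bar</li></ul>
"""
soup = BeautifulSoup(doc, "html.parser")

# Outer find_all walks the <ul> tags; inner find_all is scoped to each one.
items_per_list = [
    [li.string for li in ul.find_all("li")]
    for ul in soup.find_all("ul")
]

print(items_per_list)
```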
