Learning the select method in bs4

最低调的奢华

Article link: https://blog.csdn.net/weixin_46700209/article/details/116665971

All of the examples below search the following HTML:

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

1. Find the a tags

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc,"lxml")
print(soup.select('a'))
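`select()` always returns a list of matching `Tag` objects, which is empty when nothing matches. The sketch below compares it with `select_one()` and `find_all()`. It assumes a trimmed-down copy of `html_doc` and swaps in Python's built-in `"html.parser"` so that lxml does not need to be installed:

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of html_doc (assumption: enough for this demo)
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

# "html.parser" ships with Python, so no extra parser package is needed
soup = BeautifulSoup(html, "html.parser")

links = soup.select("a")              # every matching <a>, as a list
first = soup.select_one("a")          # only the first match
missing = soup.select_one("table")    # no <table> here, so this is None

print(len(links))                     # 3
print(first["id"])                    # link1
print(missing)                        # None

# find_all("a") locates the same tags in the same order
print([a["id"] for a in links] == [a["id"] for a in soup.find_all("a")])  # True
```

Use `select_one()` when you only need the first hit; it saves writing `[0]` and does not raise `IndexError` when nothing matches.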

2. Find by class name: class="sister"

# Class selector: ".intro" selects every element with class="intro",
# so class="sister" is written as ".sister"
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc,"lxml")
print(soup.select('.sister'))
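A class selector matches a class *word*, not the whole `class` attribute, so a tag with several classes still matches. The following sketch uses hypothetical markup (including a look-alike class `sister-note`) and `"html.parser"` instead of lxml to show this, along with chained classes and the `find_all(class_=...)` equivalent:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one tag has two classes, another has a similar-looking class name
html = """
<a class="sister" id="link1">Elsie</a>
<a class="sister special" id="link2">Lacie</a>
<p class="sister-note" id="note">Not a sister link</p>
"""
soup = BeautifulSoup(html, "html.parser")

# ".sister" matches any element whose class list contains the word "sister"
by_class = [tag["id"] for tag in soup.select(".sister")]
print(by_class)        # ['link1', 'link2']

# Prefixing a tag name and chaining classes narrows the match
both = [tag["id"] for tag in soup.select("a.sister.special")]
print(both)            # ['link2']

# find_all spells it class_ because "class" is a Python keyword
print([tag["id"] for tag in soup.find_all(class_="sister")])  # ['link1', 'link2']
```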

3. Find by id

# Id selector: "#firstname" selects the element with id="firstname",
# so id="link1" is written as "#link1"
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc,"lxml")
print(soup.select('#link3'))
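Because an id is meant to be unique, `select_one()` is often the more natural call. A comma-separated selector list picks up several ids at once, and the results come back in document order rather than in the order written. This sketch assumes the three links from `html_doc` and uses `"html.parser"`:

```python
from bs4 import BeautifulSoup

html = """
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
"""
soup = BeautifulSoup(html, "html.parser")

# A single Tag (or None) instead of a one-element list
tillie = soup.select_one("#link3")
print(tillie.get_text())                   # Tillie

# A selector list returns the union of matches, in document order
names = [a.get_text() for a in soup.select("#link3, #link1")]
print(names)                               # ['Elsie', 'Tillie']

# Equivalent lookup without CSS: both return the same Tag object
print(soup.find(id="link3") is tillie)     # True
```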

4. Hierarchical lookup with the child combinator (>)

print(soup.select('head > title'))
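`head > title` only matches a `<title>` that is a *direct* child of `<head>`. A space instead of `>` matches any descendant, and `+` / `~` walk across siblings. The sketch below uses a simplified version of the page and a small hypothetical helper, `ids()`, that lists the `id` of each match:

```python
from bs4 import BeautifulSoup

html = """<html><head><title>The Dormouse's story</title></head><body>
<p class="story"><a id="link1">Elsie</a> <a id="link2">Lacie</a> <a id="link3">Tillie</a></p>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

def ids(selector):
    """Hypothetical helper: return the id of every element matched by selector."""
    return [tag["id"] for tag in soup.select(selector)]

print(ids("body > a"))     # [] : the <a> tags are children of <p>, not of <body>
print(ids("body a"))       # ['link1', 'link2', 'link3'] : descendant combinator
print(ids("p > a"))        # ['link1', 'link2', 'link3'] : direct children of <p>
print(ids("#link1 + a"))   # ['link2'] : the sibling immediately after #link1
print(ids("#link1 ~ a"))   # ['link2', 'link3'] : every later sibling
```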

5. Read a tag's text content

print(soup.select('title')[0].string)
print(soup.select('title')[0].get_text())
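For `<title>` both lines print the same thing, but `.string` and `get_text()` diverge on tags with mixed content. `.string` only works when a tag has exactly one child and follows it downward; otherwise it returns `None`. `get_text()` concatenates all text inside the tag. A sketch with two paragraphs modeled on `html_doc`, parsed with `"html.parser"`:

```python
from bs4 import BeautifulSoup

html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were <a>Elsie</a> and <a>Lacie</a>.</p>
"""
soup = BeautifulSoup(html, "html.parser")

title_p = soup.select_one("p.title")
story_p = soup.select_one("p.story")

# <p class="title"> has a single child (<b>), so .string follows it down to the text
print(title_p.string)          # The Dormouse's story

# <p class="story"> mixes text and tags, so .string cannot pick one and returns None
print(story_p.string)          # None

# get_text() joins every piece of text inside the tag
print(story_p.get_text())      # Once upon a time there were Elsie and Lacie.
```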

6. Get the second a tag

second_a = soup.select('a')[1]  # index 1 is the second match
print(second_a)
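Indexing into the result raises `IndexError` when there are fewer matches than expected. The sketch below shows a guarded index, the CSS `:nth-of-type()` alternative (which counts among *siblings*, so it only agrees with `[1]` when all the `<a>` tags share one parent, as they do here), and the `limit` argument. The markup is a trimmed assumption based on `html_doc`:

```python
from bs4 import BeautifulSoup

html = """
<p class="story">
<a id="link1">Elsie</a>, <a id="link2">Lacie</a> and <a id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.select("a")
# Guard the index so a short result list gives None instead of IndexError
second = links[1] if len(links) > 1 else None
print(second["id"])                        # link2

# CSS alternative: the <a> that is the 2nd <a> among its siblings
css_second = soup.select_one("a:nth-of-type(2)")
print(css_second["id"])                    # link2

# limit= stops collecting once enough matches are found
print(len(soup.select("a", limit=2)))      # 2
```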

7. Get all p tags with class="story"

stories = soup.select('.story')            # class selector
print(stories)
stories = soup.select('p[class="story"]')  # attribute selector on <p>
print(stories)
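The two selectors give the same result on `html_doc`, but they are not equivalent. `.story` matches any tag carrying the class word `story`. `p[class="story"]` requires the whole attribute value to be exactly `story`, while `[class~="story"]` matches it as one space-separated word. A sketch on hypothetical markup, with the same kind of hypothetical `ids()` helper:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a second class on one <p>, and a non-<p> tag with the class
html = """
<p class="story" id="s1">First story</p>
<p class="story intro" id="s2">Second story</p>
<div class="story" id="s3">Not a paragraph</div>
"""
soup = BeautifulSoup(html, "html.parser")

def ids(selector):
    """Hypothetical helper: return the id of every element matched by selector."""
    return [tag["id"] for tag in soup.select(selector)]

print(ids(".story"))               # ['s1', 's2', 's3'] : any tag with the class
print(ids("p.story"))              # ['s1', 's2'] : only <p> tags
print(ids('p[class="story"]'))     # ['s1'] : whole attribute must equal "story"
print(ids('p[class~="story"]'))    # ['s1', 's2'] : "story" as one space-separated word
```

For "all p tags with class story", `p.story` is usually the intended selector.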

8. Get the href attribute of the a tags

atags = soup.select('a')
for a in atags:
    href = a['href']
    print(href)
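`a['href']` raises `KeyError` on an `<a>` without an `href`. Real pages also often use relative links. The sketch below uses hypothetical markup containing both cases. It compares `.get()` with the `a[href]` attribute selector and resolves relative links with `urllib.parse.urljoin`, assuming the page URL is `http://example.com/`:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: one link is relative, one <a> has no href at all
html = """
<a href="http://example.com/elsie">Elsie</a>
<a name="top">No link here</a>
<a href="/lacie">Lacie</a>
"""
soup = BeautifulSoup(html, "html.parser")

# .get() returns None for the tag without href instead of raising KeyError
print([a.get("href") for a in soup.select("a")])
# ['http://example.com/elsie', None, '/lacie']

# The [href] attribute selector skips <a> tags that have no href
hrefs = [a["href"] for a in soup.select("a[href]")]

# Resolve relative links against the page URL (assumed: http://example.com/)
absolute = [urljoin("http://example.com/", h) for h in hrefs]
print(absolute)   # ['http://example.com/elsie', 'http://example.com/lacie']
```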
