BeautifulSoup

chen_zan_yu_

Category column: 人工智能实训 (AI Practical Training)

Article link: https://blog.csdn.net/chen_zan_yu_/article/details/106672171

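This post shows how to parse an HTML page with the BeautifulSoup library. It sends an HTTP request to fetch the page, builds a BeautifulSoup parser object from the response, looks up tags such as title, a, and img, and reads their attributes. It also demonstrates query operations: find_all, find, and CSS selectors.

(Summary generated automatically by CSDN.)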
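
The query operations named in the summary can be sketched on a small inline document, so no network request is needed. This is a minimal illustration rather than the author's own code; the HTML snippet and the variable names are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
sample_html = """
<html><head><title>Demo page</title></head>
<body>
<div class="list">
  <a href="/first" class="item">First</a>
  <a href="/second" class="item">Second</a>
  <img src="logo.png" alt="logo">
</div>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Attribute access: a Tag can be indexed like a dict for its HTML attributes
first_link = soup.a
print(first_link["href"])       # /first
print(first_link.get("title"))  # None, because the attribute is missing
print(soup.img.attrs)           # {'src': 'logo.png', 'alt': 'logo'}

# find returns the first match (or None); find_all returns every match
first_item = soup.find("a", class_="item")
print(first_item.text)          # First
hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)                    # ['/first', '/second']

# select takes a CSS selector and returns all matches;
# select_one returns only the first match
texts = [a.text for a in soup.select("div.list > a.item")]
print(texts)                    # ['First', 'Second']
img_alt = soup.select_one("img")["alt"]
print(img_alt)                  # logo
```

Indexing with `tag["name"]` raises `KeyError` when the attribute is absent, so `tag.get("name")` is the safer choice for optional attributes.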

from bs4 import BeautifulSoup
print("path-",BeautifulSoup)

import requests

headers = {
    "User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"
}
response_get = requests.get(url="http://www.qianlima.com/zb/area_305/",headers=headers)
print("code-",response_get.status_code)

# Set the response encoding
response_get.encoding = "utf-8"

# Get the response body as text
html = response_get.text

# print(html)

# Instantiate the BeautifulSoup object: argument 1 is the data source, argument 2 is the parser
soup = BeautifulSoup(html,"html.parser")

# 1. Tag objects: accessing a tag name as an attribute returns the first occurrence of that tag in the document
title = soup.title
print(title,type(title))

a = soup.a
print(a,type(a))

img = soup.img
print(img,type(img))

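# 2. Working with Tag contents
#   text: returns all text inside the tag, including text in descendant tags
#   string: returns the tag's only child string (also when the only child is a tag with one string);
#           returns None when the tag has several children
#   No content: returns None, which cannot be operated on further; doing so raises an exception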
print(title.text)
print(a.string)
# print(img.string.strip())
# AttributeError: 'NoneType' object has no attribut
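
The difference between `text` and `string` noted in the comments above can be checked on a small inline document. This is a minimal sketch; the HTML snippet and the variable names are made up for illustration and do not come from the page fetched above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
sample_html = """
<html><head><title>Demo page</title></head>
<body>
<p id="single">only text</p>
<p id="mixed">outer <b>inner</b> tail</p>
<img src="logo.png" alt="logo">
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

single = soup.find("p", id="single")
mixed = soup.find("p", id="mixed")
img = soup.img

# One child string: text and string agree
print(single.text)    # only text
print(single.string)  # only text

# Several children: text joins all nested strings, string is None
print(mixed.text)     # outer inner tail
print(mixed.string)   # None

# A tag with no children has no string, so guard before calling str methods
img_string = img.string
cleaned = img_string.strip() if img_string is not None else ""
print(repr(cleaned))  # ''
```

The guard on `img_string` avoids the `AttributeError` shown above, which occurs because `strip()` is called on `None`.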
