Web Crawler Lesson 4: Web Page Parsing

Latest recommended article published on 2023-09-24 23:12:05

韩淼燃

Category: Python crawler project course. Tags: xpath, lxml

Article link: https://blog.csdn.net/weixin_36691991/article/details/89162685


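This post shows how to parse web pages with BeautifulSoup4, walks through its installation, and links to the official documentation. It also covers XPath, recommends an XPath tutorial, and explains how to install the lxml library.

Summary generated by CSDN using AI.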

Using BeautifulSoup4 (documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html)

1. Install: pip install BeautifulSoup4

'''
Using bs4
'''
import re
from bs4 import BeautifulSoup

# HTML used for testing
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">
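
The listing above is cut off in this copy. As a rough illustration of the same kind of parsing, here is a minimal, self-contained sketch. The markup in `sample_html`, its URLs, and the variable names are assumptions made up for this example, not the author's test data, and it uses Python's bundled `html.parser` so that only BeautifulSoup4 itself needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only (not the author's test document).
sample_html = """
<html><head><title>Sample page</title></head>
<body>
<p class="title"><b>Sample heading</b></p>
<p class="story">Links:
<a href="http://example.com/one" class="item" id="link1">One</a>,
<a href="http://example.com/two" class="item" id="link2">Two</a>
</p>
</body></html>
"""

# "html.parser" ships with Python, so no extra parser package is required.
soup = BeautifulSoup(sample_html, "html.parser")

# Text inside the <title> tag.
page_title = soup.title.string

# find() returns the first matching tag; attributes are read like dict keys.
first_link = soup.find("a")

# find_all() returns every match; class_ filters on the CSS class.
all_hrefs = [a["href"] for a in soup.find_all("a", class_="item")]

# Tags can also be looked up by id; get_text() returns the tag's text content.
link_by_id = soup.find(id="link2").get_text()

print(page_title)
print(first_link["id"])
print(all_hrefs)
print(link_by_id)
```

If lxml is installed, passing "lxml" instead of "html.parser" as the second argument selects that parser; the lookup calls stay the same.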
