Parsing HTML pages with the BeautifulSoup library


韦远科


Article link: https://blog.csdn.net/weiyuanke/article/details/16986639


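BeautifulSoup is an open-source library for working with HTML and XML. It is built on top of third-party XML and HTML parsers and is responsible for navigating and manipulating the parse tree they produce.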

The optional HTML/XML parser libraries include lxml and html5lib.
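
To make that division of labour concrete, here is a minimal sketch. It assumes Python's built-in "html.parser" so that no extra parser has to be installed; the markup string and variable names are made up for illustration. Passing "lxml" or "html5lib" as the second argument should select those parsers instead, provided they are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
markup = "<html><body><div id='main'><p>Hello</p><p>World</p></div></body></html>"

# The second argument names the underlying parser; "html.parser" ships with Python.
soup = BeautifulSoup(markup, "html.parser")

# BeautifulSoup then lets us walk the parse tree the parser built.
div = soup.find("div", id="main")
paragraphs = [p.get_text() for p in div.find_all("p")]
print(paragraphs)

# The tree can also be modified in place: rename a tag and replace its text.
first = div.p
first.name = "h1"
first.string = "Greeting"
print(div)
```

The parser only turns markup into a tree; everything after the constructor call is BeautifulSoup's own tree API, which is why the rest of the code stays the same when the parser is swapped.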

1. Installation

pip install beautifulsoup4

2. Usage

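import urllib.request
import bs4

url = "http://www.example.com/1.html"

# Parse the downloaded bytes with html5lib; from_encoding says the page is GBK-encoded.
soup = bs4.BeautifulSoup(urllib.request.urlopen(url), "html5lib", from_encoding="gbk")

# With no parser named, bs4 falls back to the best parser that is installed.
soup = bs4.BeautifulSoup(urllib.request.urlopen(url), from_encoding="gbk")
# Markup can also be passed in directly; from_encoding only applies to bytes, not str.
soup = bs4.BeautifulSoup(b"<html>... ....</html>", from_encoding="gbk")

# find_all returns every matching tag; class_ filters on the CSS class attribute.
catlog = soup.find_all('div', class_="globalCrumbs")

result = {}
title = soup.find_all('div', class_="articleTitle2011")
for e in title:
    print(e)
    result["title"] = e.h1.text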
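
The same lookups can be tried offline. The sketch below is only an illustration: the page content is invented, encoded to GBK bytes to mimic a downloaded page, and parsed with the built-in "html.parser" (an assumption made to avoid extra dependencies). The class names are the ones used in the usage section.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, encoded as GBK bytes to mimic a downloaded page.
page = (
    '<html><body>'
    '<div class="globalCrumbs">Home / News</div>'
    '<div class="articleTitle2011"><h1>示例标题</h1></div>'
    '</body></html>'
).encode("gbk")

# from_encoding only matters for bytes input; it tells bs4 how to decode them.
soup = BeautifulSoup(page, "html.parser", from_encoding="gbk")

result = {}
# class_ has a trailing underscore because "class" is a reserved word in Python.
for div in soup.find_all("div", class_="articleTitle2011"):
    if div.h1 is not None:  # guard against a matching div that has no <h1>
        result["title"] = div.h1.get_text(strip=True)

crumbs = [d.get_text(strip=True) for d in soup.find_all("div", class_="globalCrumbs")]
print(result, crumbs)
```

Checking `div.h1 is not None` is a defensive choice: attribute-style access such as `div.h1` yields None when no such child exists, so reading `.text` on it directly would raise an AttributeError.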
