Parsing HTML in a Python crawler with BeautifulSoup

Latest recommended article published on 2024-04-26 15:09:14

巫升权


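Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_35201989/article/details/113984720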

Parsing HTML and XML strings with BeautifulSoup

Example:

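#!/usr/bin/python
# -*- coding: UTF-8 -*-

from bs4 import BeautifulSoup
import re

# The document to parse: the "three sisters" sample from the Beautiful Soup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# Create a BeautifulSoup object from the HTML string.
# In Python 3 html_doc is already a str, so from_encoding (which applies only to bytes input) is omitted.
soup = BeautifulSoup(html_doc, 'html.parser')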

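# Print the first title tag
print(soup.title)

# Print the tag name of the first title tag
print(soup.title.name)

# Print the text content of the first title tag
print(soup.title.string)

# Print the tag name of the first title tag's parent
print(soup.title.parent.name)

# Print the first p tag
print(soup.p)

# Print the class attribute of the first p tag
print(soup.p['class'])

# Print the href attribute of the first a tag
print(soup.a['href'])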

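# A tag's attributes can be added, deleted, or modified.
# They are manipulated the same way as dictionary entries.

# Change the href attribute of the first a tag to http://www.baidu.com/
soup.a['href'] = 'http://www.baidu.com/'

# Add a name attribute to the first a tag
soup.a['name'] = u'百度'

# Delete the class attribute of the first a tag
del soup.a['class']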

# Print all child nodes of the first p tag
print(soup.p.contents)

# Print the first a tag
print(soup.a)

# Print all a tags as a list
print(soup.find_all('a'))

# Print the first tag whose id attribute equals link3
print(soup.find(id="link3"))

# Get all the text content
print(soup.get_text())

# Print all attributes of the first a tag
print(soup.a.attrs)

for link in soup.find_all('a'):
    # Get the href attribute of each link
    print(link.get('href'))

# Loop over the child nodes of soup.p and print each one
for child in soup.p.children:
    print(child)

# Regular-expression match: tags whose name contains "b"
for tag in soup.find_all(re.compile("b")):
    print(tag.name)
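The dictionary-style attribute handling used in the example can be tried in isolation. The following is a minimal Python 3 sketch, not part of the original listing: the one-link `html` string and the variable names `original_class`, `missing_class`, and `has_id` are assumptions made for illustration. It shows that `class` comes back as a list, and that `get()` is a safer way to read an attribute that may have been deleted.

```python
from bs4 import BeautifulSoup

# Hypothetical one-link document, used only for this illustration.
html = (
    '<p class="story">See '
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a></p>'
)
soup = BeautifulSoup(html, "html.parser")
link = soup.a

# class is a multi-valued attribute, so Beautiful Soup returns it as a list.
original_class = link["class"]

# Modify, add, and delete attributes with dictionary syntax.
link["href"] = "http://www.baidu.com/"
link["name"] = "百度"
del link["class"]

# Indexing a missing attribute raises KeyError; get() returns None instead.
missing_class = link.get("class")
has_id = "id" in link.attrs

print(original_class)
print(link.attrs)
print(missing_class, has_id)
```

Reading with `link.get(...)` rather than `link[...]` is usually the better choice inside a crawler loop, since real pages often omit attributes that the sample document happens to have.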

Crawler design approach:

Detailed manual:
