Python crawler tag extraction rules: in a Python crawler, how to use HTMLParser to parse and get the text from multiple tags...

Latest recommended article published on 2024-04-27 22:25:09

Gitaco

Article tag: Python crawler tag extraction rules

Article link: https://blog.csdn.net/weixin_30577815/article/details/112881597

Question

When writing a web crawler in Python, how can HTMLParser be used to parse an HTML document and get the text from multiple tags?

For example:

text1

text2

text3

text4

text5

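When the handle_data function is used for processing, only the text inside the tags is retrieved, that is, text1, text2 and text4. The other two, text3 and text5, cannot be retrieved.

Any guidance from the experts would be appreciated!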

Solution

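from html.parser import HTMLParser  # Python 3; in Python 2 the module was named HTMLParser

class MyParser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []
        self.flag = 0  # 1 right after a closing span tag, otherwise 0

    def handle_data(self, data):
        data = data.strip()
        # Print only non-empty text that directly follows a closing span tag.
        if data and self.flag:
            print("handle_data", data)

    def handle_starttag(self, tag, attrs):
        # Any opening tag clears the flag.
        self.flag = 0

    def handle_endtag(self, tag):
        tag = tag.strip()
        if tag == "span":
            self.flag = 1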

handle_starttag: each time it encounters one beginning with "
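
The flag-based approach can be tried end to end with a small Python 3 sketch. The markup in the question was lost from the page, so SAMPLE_HTML below is a hypothetical input, and the class name TailTextParser and the attributes after_span and texts are illustrative names, not part of the original answer. It also collects the text in a list instead of printing it.

```python
from html.parser import HTMLParser


class TailTextParser(HTMLParser):
    """Collects text nodes that directly follow a closing </span> tag."""

    def __init__(self):
        super().__init__()
        self.after_span = False  # True right after a </span> was seen
        self.texts = []          # collected text, in document order

    def handle_starttag(self, tag, attrs):
        # Any opening tag ends the "just after </span>" state.
        self.after_span = False

    def handle_endtag(self, tag):
        # HTMLParser passes tag names to the handlers in lower case.
        if tag == "span":
            self.after_span = True

    def handle_data(self, data):
        data = data.strip()
        if data and self.after_span:
            self.texts.append(data)


# Hypothetical input (assumption): the original example markup is not available.
SAMPLE_HTML = (
    "<div>"
    "<span>text1</span>"
    "<span>text2</span>text3"
    "<span>text4</span>text5"
    "</div>"
)

parser = TailTextParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.texts)
```

With this assumed input the list holds only the text that sits outside the span elements. Dropping the after_span check in handle_data would collect every non-empty text node instead.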
