Python Basics: HTMLParser

Latest recommended article published on 2024-06-15 11:43:27

法迪

Category: Python Basics. Tags: python, html, html parsing

Article link: https://blog.csdn.net/su749520/article/details/78858566

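Parsing an HTML page

Writing a search engine involves two steps:
1. The first step is to use a crawler to fetch the pages of the target website.
2. The second step is to parse the fetched HTML pages.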
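The two steps above can be sketched roughly as follows. This is only an illustration, not the article's own code: `LinkCollector`, `fetch`, and `extract_links` are hypothetical names, the URLs are placeholders, and `fetch` assumes UTF-8 when the server does not report a charset. Only standard-library calls are used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                self.links.append(urljoin(self.base_url, value))


def fetch(url, timeout=10):
    """Step 1: download a page and decode it to text.

    Assumption: fall back to UTF-8 if the response declares no charset.
    """
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')


def extract_links(html, base_url):
    """Step 2: parse the HTML and return the links found in it."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Offline usage with a literal string (no network access needed)
sample = '<a href="/a.html">A</a> <a href="https://example.org/b">B</a>'
print(extract_links(sample, 'https://example.com/'))
```

In a real crawler, the output of `fetch(url)` would be passed to `extract_links`, and the returned links would be queued for the next round of fetching.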

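Example

The script below subclasses HTMLParser from the standard library's html.parser module and prints every tag, text run, and comment the parser encounters.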

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Python Basics: HTMLParser

data ='''<html>
    <head>
        <!-- head -->
    </head>
    <body>
        <!-- test html parser -->
    </body>
</html>'''

from html.parser import HTMLParser
from html.entities import name2codepoint

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print('<%s>' % tag)

    def handle_endtag(self, tag):
        print('</%s>' % tag)

    def handle_startendtag(self, tag, attrs):
        print('<%s/>' % tag)

    def handle_data(self, data):
        print(data)

    def handle_comment(self, data):
        print('   <!--', data, '-->')

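    # Note: on Python 3.5+ convert_charrefs defaults to True, so the two
    # handlers below are only called when the parser is created with
    # MyHTMLParser(convert_charrefs=False).
    def handle_entityref(self, name):
        print('&%s;' % name)

    def handle_charref(self, name):
        print('&#%s;' % name)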

parser = MyHTMLParser()
parser.feed(data)

Output

D:\PythonProject>python main.py
<html>


<head>


   <!--  head  -->


</head>


<body>


   <!--  test html parser  -->


</body>


</html>
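
The blank lines in the output come from handle_data, which also receives the whitespace (newlines and indentation) between tags. For indexing page text, one might skip those whitespace-only runs. The sketch below is one possible way to do that; `TextExtractor` and `extract_text` are hypothetical names that are not part of the original script. It relies on the Python 3.5+ default convert_charrefs=True, so entities such as &amp; arrive in handle_data already decoded.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keep only non-blank text runs."""

    def __init__(self):
        # convert_charrefs defaults to True on Python 3.5+, so character
        # references are decoded before handle_data sees them
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore runs that are only whitespace between tags
            self.chunks.append(text)


def extract_text(html):
    """Return the list of non-blank text runs found in html."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.chunks


page = '<html>\n  <body>\n    <p>Tom &amp; Jerry</p>\n  </body>\n</html>'
print(extract_text(page))  # prints ['Tom & Jerry']
```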
