Learn now: https://edu.csdn.net/course/play/24756/280673?utm_source=blogtoedu
lxml is an HTML/XML parser. Its main job is to parse HTML/XML documents and extract data from them.
Basic usage:
    from lxml import etree

    # The markup is deliberately incomplete (no closing </li>, </ul>, </div>);
    # etree.HTML() repairs it while parsing.
    text = '''
    <div>
    <ul>
    <li class="item-0"><a href="www.baidu.com">baidu</a>
    <li class="item-1"><a href="https://blog.csdn.net/qq_25343557">myblog</a>
    <li class="item-2"><a href="https://www.csdn.net/">csdn</a>
    <li class="item-3"><a href="https://hao.360.cn/?a1004">hao123</a>
    '''

    # Parse the string into an HTML element tree
    html = etree.HTML(text)
    print(html)

    # Serialize the tree back to an HTML string
    result = etree.tostring(html).decode('utf-8')
    print(result)
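Since extraction is the other half of what lxml is for, here is a minimal sketch of pulling data out of the same snippet with XPath. The XPath expressions and the variable names (`link_texts`, `link_hrefs`, `item1_text`) are illustrative choices, not part of the original notes.

```python
from lxml import etree

text = '''
<div>
<ul>
<li class="item-0"><a href="www.baidu.com">baidu</a>
<li class="item-1"><a href="https://blog.csdn.net/qq_25343557">myblog</a>
<li class="item-2"><a href="https://www.csdn.net/">csdn</a>
<li class="item-3"><a href="https://hao.360.cn/?a1004">hao123</a>
'''

# Parse the (incomplete) markup; the HTML parser fills in the missing tags
html = etree.HTML(text)

# Text of every link that sits directly inside an <li>
link_texts = html.xpath('//li/a/text()')

# The href attribute of the same links
link_hrefs = html.xpath('//li/a/@href')

# Only the link inside the <li> whose class is "item-1"
item1_text = html.xpath('//li[@class="item-1"]/a/text()')

print(link_texts)
print(link_hrefs)
print(item1_text)
```

`xpath()` returns a plain Python list, so the results can be indexed or looped over directly.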
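Reading the markup from a file:
    from lxml import etree

    # Read and parse the file. etree.parse() uses the XML parser by default,
    # so hello.html must be well-formed; for ordinary HTML, pass
    # etree.HTMLParser() as the second argument.
    html = etree.parse('hello.html')
    result = etree.tostring(html).decode('utf-8')
    print(result)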
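By default `etree.parse()` uses the XML parser, which raises an error on HTML that is not well-formed. The sketch below shows one way to handle that by passing an `etree.HTMLParser()`. The file contents and the temporary directory are assumptions made so that the example runs on its own; in practice `hello.html` would be your own file.

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-in for the contents of hello.html (note the missing </li>)
sample = '<div><ul><li class="item-0"><a href="https://www.csdn.net/">csdn</a></ul></div>'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'hello.html')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(sample)

    # The HTML parser tolerates and repairs incomplete markup
    parser = etree.HTMLParser()
    tree = etree.parse(path, parser)

# Serialize the repaired tree; pretty_print adds line breaks for readability
result = etree.tostring(tree, pretty_print=True).decode('utf-8')
print(result)
```

The serialized output includes the closing tags and the `<html>`/`<body>` wrappers that the parser added.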