Mastering lxml in Python (Part 2): Wrapping lxml

craftsman2020

Article link: https://blog.csdn.net/craftsman2020/article/details/108392013

Series contents

Chapter 1: XML Crash Course
Chapter 2: Mastering lxml in Python (Part 1)
Chapter 3: Mastering lxml in Python (Part 2): Wrapping lxml

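Preface

Building on the previous two articles, this one goes a step further and wraps lxml in a small class. The code below covers the common lxml operations: adding, deleting, modifying, and finding nodes, plus reading and writing XML.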

1. Wrapping lxml in Python

Please credit the source when reposting.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author:   craftsman2020
# @Date  :   2020/08/11 13:30

from __future__ import print_function
from lxml import etree


class XML(object):

    def __init__(self):
        self.et = etree.ElementTree()
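        self.root = self.et.getroot()  # None until read_file() or read_str() is called
        # Drop whitespace-only text nodes while parsing so that pretty_print can re-indent the output
        self.parser = etree.XMLParser(remove_blank_text=True)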

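    def read_file(self, file_path):
        """
        Parse an XML file.
        :param file_path: path to the XML file
        :return: the root node, an Element object
        """
        self.et = etree.parse(file_path, self.parser)
        self.root = self.et.getroot()
        return self.root

    def read_str(self, text):
        """
        Parse a string as XML.
        :param text: the XML string
        :return: the root node, an Element object
        """
        # Note: self.parser is not applied here, so whitespace in the string is preserved.
        # Use etree.fromstring(text, self.parser) instead to strip blank text as read_file does.
        self.root = etree.fromstring(text)
        # The parser argument of etree.ElementTree() only matters when parsing a file,
        # so take the tree that already owns the root.
        self.et = self.root.getroottree()
        return self.root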

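    def find_direct_child(self, tag):
        """
        Find the first node with the given tag among the root's direct children.
        :param tag: a tag name, or a path of tag names such as 'a/b'. String.
        :return: an Element object, or None if nothing matches
        """
        return self.root.find(tag)

    def find_direct_nodes(self, tag):
        """
        Find all nodes with the given tag among the root's direct children.
        :param tag: a tag name, or a path of tag names such as 'a/b'. String.
        :return: a list of Element objects
        """
        return self.root.findall(tag)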

    def match_all_nodes(self, tag, att_dic=None):
        """
        Match every node in the tree (the root and all descendants) that satisfies the conditions.
        :param tag: tag name of the node
        :param att_dic: dict of attributes that must all match
        :return: a list of the matching Element objects
        """
        att_dic = att_dic or {}
        # iter() replaces the deprecated getiterator(); it walks the root and all descendants
        return [elem for elem in self.root.iter()
                if elem.tag == tag
                and all(elem.get(att) == att_dic.get(att) for att in att_dic)]

    def is_exist(self, tag, att_dic=None):
        """
        Check whether any node in the tree satisfies the conditions.
        :param tag: tag name of the node
        :param att_dic: dict of attributes that must all match
        :return: True if at least one node matches, otherwise False
        """
        return bool(self.match_all_nodes(tag, att_dic))

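    def add_child_nodes(self, node, site=-1):
        """
        Add a child node to the root.
        :param node: Element object
        :param site: insert position; the default -1 appends at the end
        :return:
        """
        if site == -1:
            self.root.append(node)
        elif site >= 0:
            self.root.insert(site, node)

    def add_node_to_parent(self, parent, child, site=-1):
        """
        Add a child node to a parent node.
        :param parent: parent node; it is looked up again in the tree by tag and attributes
        :param child: child node
        :param site: insert position; the default -1 appends at the end
        :return:
        """
        # Raises IndexError if no node in the tree matches the parent's tag and attributes
        parent_node = self.match_all_nodes(parent.tag, parent.attrib)[0]
        if site == -1:
            parent_node.append(child)
        elif site >= 0:
            parent_node.insert(site, child)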

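    @staticmethod
    def update_nodes_properties(nodelist, attrib, delete=False):
        """
        Add, modify, or delete attributes on nodes.
        :param nodelist: list of nodes
        :param attrib: dict of attributes
        :param delete: delete the attributes instead of setting them; default False
        :return:
        """
        for node in nodelist:
            for att in attrib:
                if delete:
                    # pop() with a default does not raise when the attribute is absent
                    node.attrib.pop(att, None)
                else:
                    node.set(att, attrib[att])

    @staticmethod
    def update_nodes_texts(nodelist, text, add=False, delete=False):
        """
        Add, modify, or delete the text of nodes.
        :param nodelist: list of nodes
        :param text: text
        :param add: append to the existing text instead of replacing it; default False
        :param delete: clear the text; default False
        :return:
        """
        for node in nodelist:
            if delete:
                node.text = ""
            elif add:
                # The original fell through to "node.text = text" here, which discarded the appended text
                node.text = (node.text or "") + text
            else:
                node.text = text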

    def del_nodes(self, tag, att_dic=None):
        """
        Delete all nodes that satisfy the conditions.
        :param tag: tag name of the node
        :param att_dic: dict of attributes that must all match
        """
        for node in self.match_all_nodes(tag, att_dic):
            parent_node = node.getparent()
            if parent_node is not None:  # the root has no parent and cannot be removed
                parent_node.remove(node)

    def del_nodes_by_tag(self, tag):
        """
        Delete every descendant node whose tag is tag.
        :param tag: tag name, string
        """
        etree.strip_elements(self.root, tag)

    def del_all_attrib(self, attrib_name):
        """
        Delete the attribute with this name from the root and all descendant nodes.
        :param attrib_name: attribute name
        """
        etree.strip_attributes(self.root, attrib_name)

    def write(self, out_path):
        """
        Write the tree to an XML file.
        :param out_path: name of the XML file to write
        """
        self.et.write(out_path, pretty_print=True, with_tail=False, encoding="utf-8", xml_declaration=True)
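
The parser in __init__ is created with remove_blank_text=True, and that choice matters for write(): pretty_print generally leaves a document alone where whitespace-only text already sits between elements. The following is a minimal sketch of the effect, using a made-up two-level document rather than the author's sample.xml; the helper name pretty is an assumption for illustration.

```python
from lxml import etree

# Made-up input for illustration; the newlines between tags are whitespace-only text nodes.
SRC = "<root>\n<a><b/></a>\n</root>"


def pretty(text, remove_blank_text):
    """Parse text, then serialize it again with pretty_print=True."""
    parser = etree.XMLParser(remove_blank_text=remove_blank_text)
    root = etree.fromstring(text, parser)
    return etree.tostring(root, pretty_print=True, encoding="unicode")


kept = pretty(SRC, remove_blank_text=False)     # original whitespace is still in the tree
cleaned = pretty(SRC, remove_blank_text=True)   # blank text dropped, so lxml can indent
print(cleaned)
```

With the blank text removed, the serializer is free to indent each nesting level by two spaces.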

2. Walkthrough

The code is as follows (example):

xml = XML()
# Read the XML file
xml.read_file('./sample.xml')
# Print the root node
print(xml.root)

<Element TradingAccounts at 0x4b1fc88>

# Find the node whose tag is Constants; returns an Element object
Constants = xml.find_direct_child('Constants')
# Print the attributes of Constants
print(Constants.attrib)

{'cpu': '10', 'path': '/home/DOTA/Trade', 'ProjectName': 'DOTA'}

# Get the value of the cpu attribute on the Constants node
print(xml.find_direct_child('Constants').get('cpu'))

# Find the first node matching the path Strategies/Strategy and print its attributes
print(xml.find_direct_child('Strategies/Strategy').attrib)

{'commission': 'flase', 'name': 'CTA01', 'trade': 'true'}
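
find_direct_child and find_direct_nodes delegate to Element.find and Element.findall, so the path rules of those methods apply. The sketch below illustrates them on a hypothetical stand-in document; it only mimics the shape suggested by the output above and is not the author's sample.xml.

```python
from lxml import etree

# Hypothetical stand-in for sample.xml; only the nesting matters here.
DOC = (
    "<TradingAccounts>"
    "<Constants cpu='10'/>"
    "<Strategies>"
    "<Strategy name='CTA01'/>"
    "<Strategy name='CTA02'/>"
    "</Strategies>"
    "</TradingAccounts>"
)
root = etree.fromstring(DOC)

first = root.find("Strategies/Strategy")      # first match along the path
every = root.findall("Strategies/Strategy")   # list with all matches
missing = root.find("NoSuchTag")              # no match gives None, not an exception
nested = root.find("Strategy")                # a bare tag only looks at direct children
anywhere = root.find(".//Strategy")           # './/' searches all descendants

print(first.get("name"), len(every), missing, nested, anywhere.get("name"))
```

Because a failed find returns None, chaining .attrib or .get() directly onto the result raises AttributeError when the tag is absent, so a None check is worth adding in real code.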

# Find all nodes matching the target tag or path
Accounts = xml.find_direct_nodes('Accounts/Account')
print(Accounts)

[<Element Account at 0x4b7fe08>, <Element Account at 0x4b7fb48>, <Element Account at 0x4b15188>]

# Print the first matched node
print(Accounts[0])

<Element Account at 0x4b7fe08>

# All nodes that satisfy the conditions
xx = xml.match_all_nodes(tag="Strategy", att_dic={'num': "2", 'prior': "1"})
print([i.attrib for i in xx])

[{'prior': '1', 'name': 'CTA01', 'id': '999', 'num': '2'}]

# Check whether a node exists anywhere among the descendants
print("xml.is_exist('Strategy') = ", xml.is_exist('Strategy'))
print("xml.is_exist('xxxxxx') = ", xml.is_exist('xxxxxx'))

xml.is_exist('Strategy') =  True
xml.is_exist('xxxxxx') =  False
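
The tag-plus-attributes matching done by match_all_nodes can also be written as a single XPath query. The sketch below is an alternative and not the author's method; match_by_xpath is a hypothetical helper, and it assumes plain tag and attribute names without namespaces.

```python
from lxml import etree


def match_by_xpath(root, tag, att_dic=None):
    """Hypothetical helper: match by tag and attributes with one XPath query."""
    att_dic = att_dic or {}
    names = list(att_dic)
    # Build predicates such as [@num=$v0][@prior=$v1]; the values are passed
    # as XPath variables, so quotes inside a value cannot break the expression.
    predicates = "".join("[@%s=$v%d]" % (name, i) for i, name in enumerate(names))
    variables = {"v%d" % i: att_dic[name] for i, name in enumerate(names)}
    # descendant-or-self covers the root as well as every descendant
    return root.xpath("descendant-or-self::%s%s" % (tag, predicates), **variables)


# Made-up document for illustration
root = etree.fromstring(
    "<r>"
    "<Strategy num='2' prior='1' id='999'/>"
    "<g><Strategy num='2' prior='3' id='998'/></g>"
    "</r>"
)
hits = match_by_xpath(root, "Strategy", {"num": "2", "prior": "1"})
print([h.get("id") for h in hits])
```

The explicit loop in the wrapper is easier to step through in a debugger, while the XPath form pushes the filtering into libxml2.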

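# Add child nodes
# SubElement already attaches child1 to xml.root, so no add_child_nodes call is needed for it
child1 = etree.SubElement(_parent=xml.root, _tag='add_sub1', attrib={'id': '1'})
child1.text = 'girl'
# Element creates a detached node, which add_child_nodes then appends to the root
child2 = etree.Element(_tag='add_element1', attrib={'id': '2'})
child2.text = 'i am a boy'
xml.add_child_nodes(child2)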
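
The two ways of creating nodes used above behave differently: etree.SubElement attaches the new node to its parent immediately, while etree.Element returns a detached node. A small sketch on a throwaway tree (the tag names are reused from the walkthrough purely for illustration):

```python
from lxml import etree

root = etree.Element("root")

# SubElement creates the child and attaches it to root in one step.
sub = etree.SubElement(root, "add_sub1", attrib={"id": "1"})
count_after_subelement = len(root)

# Appending an element that is already in the tree moves it; it is not duplicated.
root.append(sub)
count_after_append = len(root)

# Element creates a detached node; it joins the tree only once appended or inserted.
loose = etree.Element("add_element1", attrib={"id": "2"})
parent_before = loose.getparent()
root.insert(0, loose)
order = [child.tag for child in root]

print(count_after_subelement, count_after_append, parent_before, order)
```

This is why passing a SubElement result to append again leaves the child count unchanged.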

# Add a child node to a specific node
grandson1 = etree.Element(_tag='add_grandson1', attrib={'id': '3'})
grandson1.text = 'my name is grandson'
xml.add_node_to_parent(parent=xml.match_all_nodes('add_sub1', att_dic={'id': '1'})[0],
                       child=grandson1)
grandson2 = etree.Element(_tag='add_grandson2', attrib={'id': '4'})
grandson2.text = 'i like play piano'
xml.add_node_to_parent(parent=xml.match_all_nodes('Strategy',
                       att_dic={'name': "CTA01", 'num': "3", 'prior': "1",  'id': "997"})[0],
                       child=grandson2)

# Match all nodes that satisfy the conditions; returns a list
target_list = xml.match_all_nodes('Strategy', att_dic={'name': "CTA01"})
print('target_list = ', target_list)

target_list =  [<Element Strategy at 0x6012408>, <Element Strategy at 0x6012808>, <Element Strategy at 0x60115c8>, <Element Strategy at 0x60127c8>]

# Delete the given attribute from the specified nodes
xml.update_nodes_properties(target_list, attrib={'name': "CTA01"}, delete=True)

# Append text to the specified nodes
xml.update_nodes_texts(target_list, text='### 2020 ###', add=True)

# Delete the nodes that satisfy the conditions
xml.del_nodes(tag='Strategy', att_dic={'name': "CTA02"})
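
del_nodes relies on parent.remove(node). In lxml the tail text (the text that follows an element's closing tag) belongs to that element, so it disappears together with the removed node. When the tail must survive, it can be handed to a neighbour first. The helper below is a hypothetical sketch of that idea and is not part of the wrapper class.

```python
from lxml import etree


def remove_keep_tail(node):
    """Hypothetical helper: remove node from its parent but keep its tail text."""
    parent = node.getparent()
    if parent is None:
        return  # the root has no parent and cannot be removed
    if node.tail:
        previous = node.getprevious()
        if previous is not None:
            # Hand the tail to the preceding sibling
            previous.tail = (previous.tail or "") + node.tail
        else:
            # No preceding sibling: the tail becomes part of the parent's text
            parent.text = (parent.text or "") + node.tail
    parent.remove(node)


# Made-up fragments for illustration
plain = etree.fromstring("<r><a/>hello</r>")
plain.remove(plain[0])                      # plain remove: "hello" goes away too
plain_result = etree.tostring(plain, encoding="unicode")

kept = etree.fromstring("<r>x<a/>hello<b/>bye</r>")
remove_keep_tail(kept[0])                   # removes <a/>, keeps "hello"
kept_result = etree.tostring(kept, encoding="unicode")

print(plain_result, kept_result)
```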

# Delete all nodes with the given tag directly
xml.del_nodes_by_tag('Strategy')

# Delete the given attribute from all nodes
xml.del_all_attrib('name')

# Write the XML to a file
xml.write('./lxml_example.xml')

xml_str = """
      <root>
        <a x='123'>aText
            <b/>
            <c/>
        </a>hello
        <a y='3'>Text
            <b/>
            <b/>
        </a>
      </root>
"""

# Parse the string
my_xml = XML()
my_xml.read_str(xml_str)
# Print the root node
print(my_xml.root)
print(type(my_xml.root))

<Element root at 0x6012848>
<class 'lxml.etree._Element'>

# Delete all nodes whose tag is a
etree.strip_elements(my_xml.root, 'a')
etree.dump(my_xml.root)

<root>
        </root>
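
In the output above the tail text "hello" is gone as well, because etree.strip_elements removes tail text by default. Its with_tail keyword controls this. A minimal sketch on a shortened, made-up version of the string:

```python
from lxml import etree


def strip_a(with_tail):
    """Strip every <a> element from a small made-up fragment."""
    # "hello" is the tail text of the first <a>
    root = etree.fromstring("<root><a>text<b/></a>hello<a/></root>")
    etree.strip_elements(root, "a", with_tail=with_tail)
    return etree.tostring(root, encoding="unicode")


default_result = strip_a(with_tail=True)      # tails are removed with the elements
keep_tail_result = strip_a(with_tail=False)   # tail text stays in the parent

print(default_result)
print(keep_tail_result)
```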

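3. Getting the code and the XML file

Download them via the link below, or leave a comment below mentioning the author to have them sent to you.
Code and XML file download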

4. Appendix

XML Crash Course
Mastering lxml in Python (Part 1)

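Summary

That is all for today. Building on the previous two articles, this one wraps lxml in a class to deepen the reader's command of the library. Real fluency still takes hands-on practice. When time permits, later articles will apply the wrapper to some projects.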
