Advanced Python 08: Parsing XML in Python

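This article introduces three ways to process XML in Python: SAX, DOM and ElementTree. It focuses on parsing XML with SAX, covering the concepts of the parser and the event handler, and shows how to use SAX to parse an XML file and print movie details. It also provides examples of parsing with DOM and of writing and updating XML files.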

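Parsing XML in Python

XML stands for eXtensible Markup Language. It is designed to transport and store data.

XML is a set of rules for defining semantic markup. The markup divides a document into parts and identifies those parts. It is also a meta-markup language: it defines the syntax used to define other domain-specific, semantic, structured markup languages.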

 

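How Python parses XML

The common XML programming interfaces are DOM and SAX. They process XML files differently and suit different situations. Python offers three ways to parse XML:

  1. SAX (Simple API for XML): the Python standard library includes a SAX parser. SAX uses an event-driven model: while parsing, it fires events one by one and calls user-defined callback functions to process the XML file. (Fast and light on memory, but the user has to implement the callbacks.)
  2. DOM (Document Object Model): parses the XML data into a tree in memory, and you work with the XML by operating on that tree. (The whole document has to be mapped to a tree in memory, so it is slower and uses more memory.)
  3. ElementTree: works like a lightweight DOM with a convenient, friendly API. The code is easy to use, fast, and light on memory.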
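The article lists ElementTree but only demonstrates SAX and DOM, so here is a minimal sketch of the third approach using `xml.etree.ElementTree` from the standard library. The `SAMPLE` string and the `list_movies` helper are assumptions made up for illustration; the `moie` tag simply mirrors the spelling used in the movie.xml sample later in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped like a cut-down movie.xml.
SAMPLE = """
<collection shelf="New Arrivals">
  <moie title="Enemy Behind">
    <type>War, Thriller</type>
    <year>2003</year>
  </moie>
  <moie title="Ishtar">
    <type>Comedy</type>
  </moie>
</collection>
""".strip()


def list_movies(xml_text):
    """Return (title, type, year) tuples; year is None when <year> is missing."""
    root = ET.fromstring(xml_text)
    result = []
    for movie in root.findall("moie"):
        title = movie.get("title")          # attribute lookup
        movie_type = movie.findtext("type")  # text of the first <type> child
        year = movie.findtext("year")        # None if the child is absent
        result.append((title, movie_type, year))
    return result


for row in list_movies(SAMPLE):
    print(row)
```

`findtext` returns `None` for a missing child, which avoids the index errors that a missing element would cause with `getElementsByTagName(...)[0]` in DOM code.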

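Parsing XML with SAX in Python

SAX involves two parts: a parser and an event handler.

Parser: reads the XML file and sends events to the event handler, such as element start and element end.

Event handler: responds to the events and processes the XML data passed to it.

To process XML with SAX in Python you need the parse function from xml.sax and the ContentHandler class from xml.sax.handler.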
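To make the parser/handler split concrete, the following sketch records every event the parser sends. `EventRecorder` and the tiny `<note>` document are hypothetical names and data chosen for illustration, not part of the original article.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that stores each event it receives, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only text between elements
        if content.strip():
            self.events.append(("text", content))


recorder = EventRecorder()
xml.sax.parseString(b'<note id="1"><to>xuan</to></note>', recorder)
for event in recorder.events:
    print(event)
```

The handler never reads the document itself; it only reacts to what the parser reports, which is why SAX can work on inputs too large to hold in memory.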

 

Create and return a new parser object:

      xml.sax.make_parser([parser_list])

 

Create a SAX parser and parse an XML document:

          xml.sax.parse(xmlfile, contenthandler[, errorhandler])

 

Create an XML parser and parse a string:

          xml.sax.parseString(xmlstring, contenthandler[, errorhandler])

 

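         parser_list:   optional list of parsers to try

             xmlfile:   name of the XML file

      contenthandler:   must be a ContentHandler object (from xml.sax.handler)

        errorhandler:   optional; if given, must be a SAX ErrorHandler object

           xmlstring:   the XML string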
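As a rough illustration of the optional errorhandler argument, the sketch below passes a custom `ErrorHandler` to `xml.sax.parseString`. `RecordingErrorHandler` and `check_xml` are hypothetical names; the handler only remembers the message and then re-raises, which is also what the default handler does for fatal errors.

```python
import xml.sax


class RecordingErrorHandler(xml.sax.ErrorHandler):
    """Hypothetical error handler: remembers fatal errors, then re-raises."""

    def __init__(self):
        self.messages = []

    def fatalError(self, exception):
        self.messages.append(exception.getMessage())
        raise exception


def check_xml(xml_bytes):
    """Return (is_well_formed, recorded_messages) for the given XML bytes."""
    errors = RecordingErrorHandler()
    try:
        # A plain ContentHandler ignores every event; only errors matter here
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler(), errors)
    except xml.sax.SAXParseException:
        return False, errors.messages
    return True, errors.messages


print(check_xml(b"<a><b/></a>"))  # well-formed
print(check_xml(b"<a><b></a>"))   # <b> is never closed
```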

 

XML file: movie.xml

<collection shelf="New Arrivals">
<moie title="Enemy Behind">
  <type>War, Thriller</type>
  <format>DVD</format>
  <year>2003</year>
  <rating>PG</rating>
  <stars>10</stars>
  <description>Talk about a US-Japan war</description>
</moie>
<moie title="TransFormers">
  <type>Anime, Science Fiction</type>
  <format>DVD</format>
  <year>1989</year>
  <rating>R</rating>
  <stars>8</stars>
  <description>A schientific fiction</description>
</moie>
<moie title="Trigun">
  <type>Anime, Action</type>
  <format>DVD</format>
  <episodes>4</episodes>
  <rating>PG</rating>
  <stars>10</stars>
  <description>Vash the Stampede!</description>
</moie>
<moie title="Ishtar">
  <type>Comedy</type>
  <format>VHS</format>
  <rating>PG</rating>
  <stars>2</stars>
  <description>Viewable boredom</description>
</moie>
</collection>

 

Parsing the XML with SAX

 

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
author: xuandc
time: 2020-03-22
Parse XML with SAX
"""
import xml.sax
import xml.sax.handler


class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Handle the element-start event
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "moie":
            print("********Movie*******")
            title = attributes["title"]
            print("Title:", title)

    # Handle the element-end event
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""

    # Handle character data. Note: SAX may deliver long text in several
    # chunks; this handler assumes each short value arrives in one call.
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespace processing
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    # Install our ContentHandler subclass
    Handler = MovieHandler()
    parser.setContentHandler(Handler)

    parser.parse("movie.xml")
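The handler above prints values as it goes and keeps only the last text chunk it saw. As a hedged alternative sketch (not the author's method), the variant below buffers text chunks and collects each movie into a dict, so the result can be reused after parsing. `MovieCollector`, `FIELDS` and the inline `SAMPLE` are hypothetical names introduced for this example.

```python
import xml.sax

# Child elements we want to keep for each movie (assumed from movie.xml)
FIELDS = ("type", "format", "year", "rating", "stars", "description")


class MovieCollector(xml.sax.ContentHandler):
    """Hypothetical variant: collects each movie into a dict."""

    def __init__(self):
        super().__init__()
        self.movies = []      # finished movies
        self._current = None  # dict for the movie being read
        self._buffer = []     # text chunks of the element being read

    def startElement(self, tag, attributes):
        if tag == "moie":
            self._current = {"title": attributes["title"]}
        self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so append instead of overwrite
        self._buffer.append(content)

    def endElement(self, tag):
        if tag == "moie":
            self.movies.append(self._current)
            self._current = None
        elif tag in FIELDS and self._current is not None:
            self._current[tag] = "".join(self._buffer).strip()
        self._buffer = []


SAMPLE = b"""<collection shelf="New Arrivals">
<moie title="Trigun">
  <type>Anime, Action</type>
  <stars>10</stars>
</moie>
</collection>"""

collector = MovieCollector()
xml.sax.parseString(SAMPLE, collector)
print(collector.movies)
```

Keying on the `tag` argument of `endElement` rather than on a "current tag" field also means optional elements such as `<year>` or `<episodes>` need no special handling.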

 

Parsing the XML with DOM

 

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
author: xuandc
time: 2020-03-22
Parse XML with xml.dom
"""

import xml.dom.minidom

# Open the XML document with the minidom parser
DOMTree = xml.dom.minidom.parse("movie.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all the movies in the collection
movies = collection.getElementsByTagName("moie")
print(movies)
# Print the details of each movie
for movie in movies:
    print("*********movie**********")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))
    # Names end in _node to avoid shadowing the built-ins type() and format()
    type_node = movie.getElementsByTagName('type')[0]
    print("Type : %s" % type_node.childNodes[0].data)
    format_node = movie.getElementsByTagName('format')[0]
    print("Format : %s" % format_node.childNodes[0].data)
    rating_node = movie.getElementsByTagName('rating')[0]
    print("Rating : %s" % rating_node.childNodes[0].data)
    description_node = movie.getElementsByTagName('description')[0]
    print("Description : %s" % description_node.childNodes[0].data)
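The listing above indexes `[0]` directly, which raises an IndexError when an element is absent (in movie.xml, for example, not every movie has a `<year>`). One possible way to guard against that is a small helper such as the hypothetical `child_text` below; the inline `SAMPLE` is likewise made up for illustration.

```python
from xml.dom.minidom import parseString


def child_text(node, tag, default=None):
    """Text of the first <tag> descendant of node, or default if missing/empty."""
    matches = node.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return default
    return matches[0].firstChild.data


# Hypothetical sample: the second movie has no <year> element
SAMPLE = """<collection>
<moie title="Enemy Behind"><year>2003</year></moie>
<moie title="Trigun"><episodes>4</episodes></moie>
</collection>"""

doc = parseString(SAMPLE)
years = {
    movie.getAttribute("title"): child_text(movie, "year")
    for movie in doc.getElementsByTagName("moie")
}
print(years)
```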

 

 

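Writing XML files

Writing can mean either creating a new XML file or appending elements to an existing one. There are two ways to get a dom object:

1. Create it with dom = minidom.Document()

2. Obtain it by parsing: dom = parse("./xuan.xml")

 

Create a new element node with createElement()

Create a text node with createTextNode()

Attach the text node to the element node

Attach the element node to its parent element

The steps below cover parsing, creating and updating XML

XML file: xuan.xml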
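The steps above can be sketched with a document built from scratch via `minidom.Document()`. The element names mirror the customers example used in this article; the variable names are otherwise arbitrary.

```python
from xml.dom import minidom

# Option 1: start from an empty document
doc = minidom.Document()
root = doc.createElement("customers")
doc.appendChild(root)

# Create a new element node and give it an attribute
customer = doc.createElement("customer")
customer.setAttribute("ID", "C001")
root.appendChild(customer)

# Create a text node, attach it to its element, then attach the element
name = doc.createElement("name")
name.appendChild(doc.createTextNode("xuan"))
customer.appendChild(name)

print(doc.toprettyxml(indent="  "))
```

Nodes are always created through the document (`doc.createElement`, `doc.createTextNode`) and only become part of the tree once `appendChild` attaches them to a parent.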

<?xml version="1.0" encoding="utf-8"?>

<!-- This is list of customers -->
<customers>
  <customer ID="C001">
    <name>xuan</name>
    <phone>13457705763</phone>
    <school>dfeadfadfa</school>
  </customer>
  <customer ID="C002">
    <name>xuan1</name>
    <phone>13457705763</phone>
    <school>adfaefafg</school>
  </customer>
</customers>

 

From parsing to writing and updating

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

from xml.dom.minidom import parse


# Parse the XML with DOM
def readXML():
    domTree = parse("xuan.xml")
    # Document root element
    rootNode = domTree.documentElement
    print(rootNode.nodeName)

    # All customer records
    customers = rootNode.getElementsByTagName("customer")
    print("****Student information****")
    for customer in customers:
        if customer.hasAttribute("ID"):
            print("ID:", customer.getAttribute("ID"))
            # name element
            name = customer.getElementsByTagName("name")[0]
            print(name.nodeName, ":", name.childNodes[0].data)
            # phone element
            phone = customer.getElementsByTagName("phone")[0]
            print(phone.nodeName, ":", phone.childNodes[0].data)
            # school element
            schools = customer.getElementsByTagName("school")[0]
            print(schools.nodeName, ":", schools.childNodes[0].data)
        print(" ")


# Add customer C003 and write the result to a new XML file
def writeXML():
    domTree = parse("xuan.xml")
    # Document root element
    rootNode = domTree.documentElement

    # Create a new customer node
    customer_node = domTree.createElement("customer")
    customer_node.setAttribute("ID", "C003")

    # Create the name node and set its text value
    name_node = domTree.createElement("name")
    name_value = domTree.createTextNode("xuan11")
    # Attach the text node to name_node
    name_node.appendChild(name_value)
    customer_node.appendChild(name_node)

    # Create the phone node and set its text value
    phone_node = domTree.createElement("phone")
    phone_text_value = domTree.createTextNode("1345630")
    # Attach the text node to phone_node
    phone_node.appendChild(phone_text_value)
    customer_node.appendChild(phone_node)

    # Create the school node; its content is a CDATA section
    schools_node = domTree.createElement("school")
    cdata_text_value = domTree.createCDATASection("welcome to 河池学院")
    schools_node.appendChild(cdata_text_value)
    customer_node.appendChild(schools_node)
    # Append the customer node to the root node
    rootNode.appendChild(customer_node)

    # Open the file as UTF-8 so it matches the declared encoding
    with open('xuan1.xml', 'w', encoding='utf-8') as f:
        domTree.writexml(f, addindent='  ', encoding='utf-8')
    print("Created a new XML file")


# Modify the contents of the XML file
def updateXML():
    domTree = parse("xuan.xml")
    rootNode = domTree.documentElement

    names = rootNode.getElementsByTagName("name")
    for name in names:
        if name.childNodes[0].data == "xuan":
            pn = name.parentNode
            phone = pn.getElementsByTagName("phone")[0]
            # Text node data should be a string, not an int
            phone.childNodes[0].data = "13457705763"

    with open('xuan.xml', 'w', encoding='utf-8') as f:
        domTree.writexml(f, addindent='  ', encoding='utf-8')
    print("Updated the file")


if __name__ == '__main__':
    readXML()
    writeXML()
    updateXML()
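For experimenting without touching files on disk, the update step can also be sketched on an in-memory string. `update_phone`, the inline `SAMPLE` and the placeholder number are assumptions for illustration only; the logic follows the same name lookup as updateXML().

```python
from xml.dom.minidom import parseString


def update_phone(xml_text, customer_name, new_phone):
    """Return the XML with the phone of the named customer replaced."""
    doc = parseString(xml_text)
    for name in doc.getElementsByTagName("name"):
        if name.firstChild is not None and name.firstChild.data == customer_name:
            phone = name.parentNode.getElementsByTagName("phone")[0]
            phone.firstChild.data = new_phone  # keep the value a string
    return doc.documentElement.toxml()


# Hypothetical one-customer document in the shape of xuan.xml
SAMPLE = (
    '<customers><customer ID="C001"><name>xuan</name>'
    '<phone>13457705763</phone></customer></customers>'
)

print(update_phone(SAMPLE, "xuan", "000-0000"))
```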

 
