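Python XML Parsing
XML stands for eXtensible Markup Language. It is designed for transporting and storing data.
XML is a set of rules for defining semantic markup. The tags divide a document into parts and identify those parts. XML is also a meta-markup language: it provides the syntax used to define other semantic, structured markup languages for specific domains.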
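Parsing XML in Python
The most common XML programming interfaces are DOM and SAX. They process XML files in different ways and suit different situations. Python offers three ways to parse XML:
- SAX (Simple API for XML): the Python standard library includes a SAX parser. SAX uses an event-driven model: while parsing, it fires events one by one and calls user-defined callback functions to process the XML file. (Fast and low on memory, but the user has to implement the callbacks.)
- DOM (Document Object Model): parses the XML data into a tree in memory, and the XML is manipulated by operating on that tree. (The whole document has to be mapped to an in-memory tree, so it is slower and uses more memory.)
- ElementTree: a lightweight, DOM-like API that is convenient and friendly to use. The code is easy to work with, it is fast, and it uses little memory.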
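ElementTree is listed above but not demonstrated elsewhere in this post, so here is a minimal sketch of it. The sample string and the helper name list_movies are assumptions made for illustration; the element names follow the movie.xml file used in the later sections.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped like the movie.xml file used later on.
SAMPLE = """<collection shelf="New Arrivals">
  <moie title="Enemy Behind"><format>DVD</format><stars>10</stars></moie>
  <moie title="Ishtar"><format>VHS</format><stars>2</stars></moie>
</collection>"""


def list_movies(xml_text):
    """Return (title, format, stars) tuples for every <moie> element."""
    root = ET.fromstring(xml_text)
    result = []
    for movie in root.findall("moie"):
        result.append((
            movie.get("title"),             # attribute lookup
            movie.findtext("format"),       # text of the first <format> child
            int(movie.findtext("stars")),   # text converted to a number
        ))
    return result


movies = list_movies(SAMPLE)
for title, fmt, stars in movies:
    print("%s (%s): %d stars" % (title, fmt, stars))
```

To read from a file instead of a string, ET.parse("movie.xml").getroot() returns the same kind of root element.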
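Parsing XML with SAX in Python
SAX involves two parts: a parser and an event handler.
Parser: reads the XML file and sends events to the event handler, such as element start and element end.
Event handler: responds to the events and processes the XML data passed to it.
To process XML with SAX in Python you need the parse function from xml.sax and the ContentHandler class from xml.sax.handler.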
Create and return a new parser object:
xml.sax.make_parser([parser_list])
Create a SAX parser and parse an XML document:
xml.sax.parse(xmlfile, contenthandler[, errorhandler])
Create a SAX parser and parse an XML string:
xml.sax.parseString(xmlstring, contenthandler[, errorhandler])
Parameters:
parser_list: optional list of parsers to try
xmlfile: the name of the XML file
contenthandler: must be a ContentHandler object (from xml.sax.handler)
errorhandler: optional; if given, it must be a SAX ErrorHandler object
xmlstring: the XML string
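As a small illustration of the signatures above, the sketch below passes a handler to xml.sax.parseString. The TagCounter class and the inline XML are hypothetical and exist only for this example.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts how often each tag is opened."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.counts = {}

    def startElement(self, tag, attributes):
        # Called once for every opening tag the parser encounters.
        self.counts[tag] = self.counts.get(tag, 0) + 1


handler = TagCounter()
# The XML is passed as bytes here; the handler is the second argument.
xml.sax.parseString(b"<a><b/><b/><c>text</c></a>", handler)
print(handler.counts)
```

xml.sax.parse works the same way, except that its first argument is a file name or file object rather than the XML content itself.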
The XML file movie.xml
<collection shelf="New Arrivals">
    <moie title="Enemy Behind">
        <type>War, Thriller</type>
        <format>DVD</format>
        <year>2003</year>
        <rating>PG</rating>
        <stars>10</stars>
        <description>Talk about a US-Japan war</description>
    </moie>
    <moie title="TransFormers">
        <type>Anime, Science Fiction</type>
        <format>DVD</format>
        <year>1989</year>
        <rating>R</rating>
        <stars>8</stars>
        <description>A scientific fiction</description>
    </moie>
    <moie title="Trigun">
        <type>Anime, Action</type>
        <format>DVD</format>
        <episodes>4</episodes>
        <rating>PG</rating>
        <stars>10</stars>
        <description>Vash the Stampede!</description>
    </moie>
    <moie title="Ishtar">
        <type>Comedy</type>
        <format>VHS</format>
        <rating>PG</rating>
        <stars>2</stars>
        <description>Viewable boredom</description>
    </moie>
</collection>
Parsing the XML with SAX
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
author: xuandc
time: 2020-03-22
Parse XML with SAX
"""
import xml.sax
import xml.sax.handler


class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Handle element-start events
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "moie":
            print("********Movie*******")
            title = attributes["title"]
            print("Title:", title)

    # Handle element-end events
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""

    # Handle character data (the text inside an element)
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    # Install the custom ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)
    parser.parse("movie.xml")
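One caveat about the handler above: a SAX parser is allowed to deliver the text of a single element in several characters() calls, so a handler that overwrites its field on every call can end up with only the last chunk. A minimal sketch of a more defensive variant, which buffers the chunks and joins them when the element ends, might look like this. The class name TextCollector and the sample XML are assumptions for illustration.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that buffers text until the element ends."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.buffer = []   # text chunks of the element currently open
        self.fields = {}   # tag name -> complete text

    def startElement(self, tag, attributes):
        self.buffer = []

    def characters(self, content):
        # May be called more than once per element, so only append here.
        self.buffer.append(content)

    def endElement(self, tag):
        text = "".join(self.buffer).strip()
        if text:
            self.fields[tag] = text
        self.buffer = []


collector = TextCollector()
xml.sax.parseString(
    b"<moie><type>War, Thriller</type>"
    b"<description>Tom &amp; Jerry</description></moie>",
    collector,
)
print(collector.fields)
```

The entity in the description is there because entities are a typical place where a parser splits text into several chunks. Joining the buffer gives the full string regardless of how it was split.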
Parsing XML with DOM
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
author: xuandc
time: 2020-03-22
Parse XML with xml.dom
"""
import xml.dom.minidom

# Open the XML document with the minidom parser
DOMTree = xml.dom.minidom.parse("movie.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all the movies in the collection
movies = collection.getElementsByTagName("moie")
print(movies)

# Print the details of each movie
for movie in movies:
    print("*********movie**********")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))
    # Named movie_type / movie_format to avoid shadowing the built-ins
    movie_type = movie.getElementsByTagName('type')[0]
    print("Type : %s" % movie_type.childNodes[0].data)
    movie_format = movie.getElementsByTagName('format')[0]
    print("Format : %s" % movie_format.childNodes[0].data)
    rating = movie.getElementsByTagName('rating')[0]
    print("Rating : %s" % rating.childNodes[0].data)
    description = movie.getElementsByTagName('description')[0]
    print("Description : %s" % description.childNodes[0].data)
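The script above skips year, probably because not every movie in movie.xml has one: indexing getElementsByTagName(...)[0] raises IndexError when the element is missing. A small helper along the following lines could make optional elements safe to read. The name child_text and its default parameter are hypothetical, not part of minidom.

```python
from xml.dom.minidom import parseString


def child_text(parent, tag, default=""):
    """Return the text of the first <tag> under parent, or default."""
    nodes = parent.getElementsByTagName(tag)
    # No such element, or the element is empty: fall back to the default.
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data


# Hypothetical fragment modelled on the "Trigun" entry, which has no <year>.
doc = parseString(
    '<moie title="Trigun"><episodes>4</episodes><rating>PG</rating></moie>'
)
movie = doc.documentElement
print("Rating :", child_text(movie, "rating"))
print("Year   :", child_text(movie, "year", "n/a"))
```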
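Writing XML files
Writing can mean either creating a new XML file or appending elements to an existing one. The DOM object is obtained in one of two ways:
1. Create it with dom = minidom.Document()
2. Get it by parsing an existing file: dom = parse("./xuan.xml")
The nodes are then built as follows:
Create a new element node with createElement()
Create a text node with createTextNode()
Attach the text node to the element node
Attach the element node to its parent element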
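The steps above, starting from an empty minidom.Document(), can be sketched as follows. The function name build_customers is hypothetical; the element names and values are borrowed from the xuan.xml file shown in this post.

```python
from xml.dom import minidom


def build_customers():
    """Build a small <customers> document from scratch."""
    dom = minidom.Document()

    # The root element hangs directly off the document.
    root = dom.createElement("customers")
    dom.appendChild(root)

    # Create an element node and set an attribute on it.
    customer = dom.createElement("customer")
    customer.setAttribute("ID", "C001")
    root.appendChild(customer)

    # Create a text node, attach it to its element,
    # then attach the element to its parent.
    name = dom.createElement("name")
    name.appendChild(dom.createTextNode("xuan"))
    customer.appendChild(name)
    return dom


dom = build_customers()
xml_text = dom.toprettyxml(indent="  ")
print(xml_text)
```

toprettyxml returns the document as an indented string. Writing that string to a file opened with encoding="utf-8" would complete the "create a new XML file" case.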
The following example walks through parsing, creating, and updating XML.
The XML file xuan.xml
<?xml version="1.0" encoding="utf-8"?>
<!-- This is list of customers -->
<customers>
    <customer ID="C001">
        <name>xuan</name>
        <phone>13457705763</phone>
        <school>dfeadfadfa</school>
    </customer>
    <customer ID="C002">
        <name>xuan1</name>
        <phone>13457705763</phone>
        <school>adfaefafg</school>
    </customer>
</customers>
From parsing to writing and updating
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from xml.dom.minidom import parse


# Parse the XML with DOM
def readXML():
    domTree = parse("xuan.xml")
    # Document root element
    rootNode = domTree.documentElement
    print(rootNode.nodeName)
    # All customer entries
    customers = rootNode.getElementsByTagName("customer")
    print("****Customer information****")
    for customer in customers:
        if customer.hasAttribute("ID"):
            print("ID:", customer.getAttribute("ID"))
        # name element
        name = customer.getElementsByTagName("name")[0]
        print(name.nodeName, ":", name.childNodes[0].data)
        # phone element
        phone = customer.getElementsByTagName("phone")[0]
        print(phone.nodeName, ":", phone.childNodes[0].data)
        # school element
        schools = customer.getElementsByTagName("school")[0]
        print(schools.nodeName, ":", schools.childNodes[0].data)
        print(" ")


# Add customer C003 and write the result to a new XML file
def writeXML():
    domTree = parse("xuan.xml")
    # Document root element
    rootNode = domTree.documentElement
    # Create a new customer node
    customer_node = domTree.createElement("customer")
    customer_node.setAttribute("ID", "C003")
    # Create the name node and its text value
    name_node = domTree.createElement("name")
    name_value = domTree.createTextNode("xuan11")
    # Attach the text node to name_node
    name_node.appendChild(name_value)
    customer_node.appendChild(name_node)
    # Create the phone node and its text value
    phone_node = domTree.createElement("phone")
    phone_text_value = domTree.createTextNode("1345630")
    # Attach the text node to phone_node
    phone_node.appendChild(phone_text_value)
    customer_node.appendChild(phone_node)
    # Create the school node; its content is a CDATA section
    schools_node = domTree.createElement("school")
    cdata_text_value = domTree.createCDATASection("welcome to 河池学院")
    schools_node.appendChild(cdata_text_value)
    customer_node.appendChild(schools_node)
    # Add the customer node to the root node
    rootNode.appendChild(customer_node)
    # Open the file as UTF-8 so it matches the declared encoding
    with open('xuan1.xml', 'w', encoding='utf-8') as f:
        domTree.writexml(f, addindent=' ', encoding='utf-8')
    print("Created a new XML file")


# Modify the content of the XML file
def updateXML():
    domTree = parse("xuan.xml")
    rootNode = domTree.documentElement
    names = rootNode.getElementsByTagName("name")
    for name in names:
        if name.childNodes[0].data == "xuan":
            pn = name.parentNode
            phone = pn.getElementsByTagName("phone")[0]
            # Text node data must be a string; an int cannot be serialized
            phone.childNodes[0].data = "13457705763"
    with open('xuan.xml', 'w', encoding='utf-8') as f:
        domTree.writexml(f, addindent=' ', encoding='utf-8')
    print("Updated the file")


if __name__ == '__main__':
    readXML()
    writeXML()
    updateXML()