Parsing XML Files with Python (Part 2)

阴雨绵绵的雾都

Article link: https://blog.csdn.net/Yejianyun1/article/details/52163517

Getting the data values between tags:

Take <format>DVD</format> inside a movie node: format is an element node, and the value DVD between its tags is stored in a text node that is the child of format.

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A scientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>
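The examples below read the sample from a file named a.xml. To check the text-node point above without creating a file, here is a minimal sketch that parses a trimmed copy of the sample from a string with xml.dom.minidom.parseString. The trimmed XML string and the variable names are illustrative assumptions, not part of the original post.

```python
import xml.dom.minidom

# A trimmed copy of the sample document above (illustrative only)
SAMPLE = """<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <format>DVD</format>
   <year>2003</year>
</movie>
</collection>"""

dom = xml.dom.minidom.parseString(SAMPLE)
fmt = dom.getElementsByTagName("format")[0]

# <format> itself is an element node ...
print(fmt.nodeType == fmt.ELEMENT_NODE)            # True
# ... and the string "DVD" lives in its child, a text node
text_node = fmt.firstChild
print(text_node.nodeType == text_node.TEXT_NODE)   # True
print(text_node.data)                              # DVD
```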

Method 1, parsing the whole XML file with xml.dom.minidom (the sample above is assumed to be saved as a.xml):

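# coding=utf-8
import xml.dom.minidom

# Read and parse the XML file into a DOM tree
dom = xml.dom.minidom.parse("a.xml")
root = dom.documentElement

# Name of the root element: "collection"
print(root.nodeName)
# Node value: always None for element nodes
print(root.nodeValue)
# Node type: 1, which equals the constant ELEMENT_NODE
print(root.nodeType)
print(root.ELEMENT_NODE)

# getElementsByTagName returns every matching descendant as a NodeList
movies = root.getElementsByTagName('movie')
print('========movies========')
print('movies: %s' % movies.length)
for movie in movies:
    # Read the value of the title attribute
    print(movie.getAttribute('title'))
    type_nodes = movie.getElementsByTagName('type')
    for type_node in type_nodes:
        print("=====type========")
        print(type_node.nodeName)
        # firstChild is the text node inside <type>; .data is its string,
        # i.e. the value between the tags
        print(type_node.firstChild.data)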

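The approach above reads the entire XML file into memory and builds a tree of nodes (the DOM). After that, you can use various methods to reach any node in the tree, along with its attributes and values.

getElementsByTagName returns the elements with a given tag name anywhere below a node, not only its direct children. Because there may be more than one match, the result is a list (a NodeList), which you can index or loop over like an array.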
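To make the "anywhere below a node" point concrete, the sketch below uses a small hypothetical document string and contrasts getElementsByTagName, which searches the whole subtree, with childNodes, which holds only direct children. The document string is an assumption for illustration.

```python
import xml.dom.minidom

# Hypothetical document: each <format> is a grandchild of the root, not a child
DOC = ("<collection><movie title='A'><format>DVD</format></movie>"
       "<movie title='B'><format>VHS</format></movie></collection>")

root = xml.dom.minidom.parseString(DOC).documentElement

# Searches the whole subtree, so both <format> grandchildren are found
formats = root.getElementsByTagName("format")
print(formats.length)              # 2
print(formats[1].firstChild.data)  # VHS

# childNodes holds only the direct children of root
print([n.nodeName for n in root.childNodes])  # ['movie', 'movie']

# Keep only element nodes among the direct children
movies = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
print([m.getAttribute("title") for m in movies])  # ['A', 'B']
```

The document is written without line breaks between tags, so childNodes contains no whitespace text nodes here; in an indented file you would need the nodeType filter shown in the last step.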

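The firstChild attribute returns the first child node of the selected node, and .data returns that node's data (for a text node, its string). To see why this matters, consider: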

<movie title="Ishtar">
   <type>
      <format>VHS</format>
      Comedy
   </type>
   <type>1234556</type>
   
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>

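In this XML, firstChild of the first <type> does not give you "Comedy". Because of the line break and indentation right after <type>, the first child is a whitespace-only text node, so its data prints as blank; the <format>VHS</format> element comes next, and "Comedy" is only in the third child. If there were no whitespace, firstChild would be the <format> element itself, which has no .data attribute at all.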
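One way to avoid the firstChild pitfall is to collect the data of every text child instead of trusting firstChild. The sketch below follows the getText pattern shown in the Python minidom documentation and parses the Ishtar example above from a string; the helper name element_text is a hypothetical choice for illustration.

```python
import xml.dom.minidom

# The Ishtar example above, trimmed and given as a string
ISHTAR = """<movie title="Ishtar">
   <type>
      <format>VHS</format>
      Comedy
   </type>
   <type>1234556</type>
</movie>"""

def element_text(element):
    """Hypothetical helper: join the element's direct text children, skipping child elements."""
    parts = [node.data for node in element.childNodes
             if node.nodeType == node.TEXT_NODE]
    return "".join(parts).strip()

types = xml.dom.minidom.parseString(ISHTAR).getElementsByTagName("type")

# firstChild of the first <type> is only the whitespace before <format>
print(repr(types[0].firstChild.data))
print(element_text(types[0]))  # Comedy
print(element_text(types[1]))  # 1234556
```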

Method 2, using xml.etree.ElementTree: this approach is mainly used to iterate over all the child nodes under a given node.

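# coding=utf-8
from xml.etree import ElementTree as ET

per = ET.parse('a.xml')
# Every <year> that is a child of a <movie> under the root
p = per.findall('./movie/year')

print(p)
for x in p:
    print(x.text)
    print(x.tag)

# Every <type> that is a child of a <movie> under the root
p = per.findall('./movie/type')
for x in p:
    print(x.text)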

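findall takes a path that specifies at which level of the tree to look: './movie/year' matches every year element that is a child of a movie element directly under the root.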
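findall accepts the limited XPath subset that ElementTree supports. The sketch below parses a trimmed copy of the sample with ET.fromstring, which returns the root Element (ET.parse returns an ElementTree, whose findall behaves like calling it on getroot()), and tries a few path forms: a direct-child path, './/' for any depth, an attribute predicate, and findtext with a default. The trimmed XML is an assumption for illustration.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the sample collection (illustrative only)
XML = """<collection shelf="New Arrivals">
<movie title="Enemy Behind"><format>DVD</format><year>2003</year></movie>
<movie title="Trigun"><format>DVD</format><episodes>4</episodes></movie>
<movie title="Ishtar"><format>VHS</format></movie>
</collection>"""

root = ET.fromstring(XML)

# ./movie/year: <year> children of <movie> children of root
years = [y.text for y in root.findall("./movie/year")]
print(years)  # ['2003']

# .//format: <format> elements at any depth below root
formats = [f.text for f in root.findall(".//format")]
print(formats)  # ['DVD', 'DVD', 'VHS']

# [@title='Trigun']: filter <movie> elements by an attribute value
trigun = root.find("./movie[@title='Trigun']")
print(trigun.findtext("episodes"))  # 4

# findtext returns the default when no element matches
print(root.findtext("./movie/rating", default="n/a"))  # n/a
```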

# coding=utf-8
from xml.etree import ElementTree as ET

per = ET.parse('a.xml')

print("======tag=====")
p = per.findall('./movie')
# Tags of the direct children of the first <movie>, in document order
# (list(element) replaces getchildren(), which Python 3.9 removed)
for x in list(p[0]):
    print(x.tag)
print("=====text======")
# Tag and text of every child of every <movie>
for x in p:
    for y in x:
        print("tag : %s , Value : %s" % (y.tag, y.text))

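getchildren() returns all child elements in document order, and the loop above prints each child's tag name (child.tag) and its text (child.text). Note that Element.getchildren() was deprecated and then removed in Python 3.9; on current Python, use list(element) or iterate over the element directly.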
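Since getchildren() is gone in Python 3.9 and later, the sketch below iterates an element directly to get its children, reads an attribute through .get(), and uses iter() to walk the whole subtree. The one-movie XML string is an assumption for illustration.

```python
from xml.etree import ElementTree as ET

# One <movie> from the sample, as a string (illustrative only)
MOVIE = """<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
</movie>"""

movie = ET.fromstring(MOVIE)

# Attributes live in the .attrib dict; .get() reads one of them
print(movie.get("title"))  # Transformers

# Iterating an element yields its direct children in document order,
# which is what getchildren() used to return
pairs = [(child.tag, child.text) for child in movie]
for tag, text in pairs:
    print("tag : %s , Value : %s" % (tag, text))

# iter() walks the whole subtree, starting with the element itself
print([e.tag for e in movie.iter()])  # ['movie', 'type', 'format', 'year']
```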