Chapter 8: XML and XML-RPC

weixin_34259232

Tags: python, data structures and algorithms

Original link: http://blog.51cto.com/liandesinian/1565474

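XML is derived from SGML (Standard Generalized Markup Language) and is essentially a way of marking up text. Tags are enclosed in angle brackets ("<>"), which makes them easy to parse. An XML document is hierarchical, as in the following example: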

<?xml version="1.0" encoding="ISO-8859-1"?>
<!--Sample XML Document-test.xml-->
<book>
    <title>Sample XML Thing</title>
    <author>
        <name><first>Ben</first> <last>Smith</last></name>
        <affiliation>Spring</affiliation>
    </author>
    
    <chapter number="1">
        <title>First Chapter</title>
        <para>
            I think widgets are great.<company>Spring</company>
        </para>
    </chapter>
</book>

The hierarchical structure of the XML document is clear from this example.

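When working with XML documents, you normally use an existing XML parsing library. These libraries usually present a document in one of two ways: as a tree or as a stream of events. An event-based parser scans the document and notifies your program whenever something of interest appears. The HTMLParser example in Chapter 7 is an event-based parser. With an event-based parser, you write in advance what the program should do when something of interest turns up in the document.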
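A minimal sketch of the event-based style, assuming Python 3 and the standard xml.sax module. The TitleHandler class and the shortened SAMPLE document are illustrative names introduced here, not part of the original text. The parser calls the handler methods as it meets each start tag, run of text, and end tag:

```python
import xml.sax

# A shortened version of the sample document, kept in memory for the demo.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<book>
    <title>Sample XML Thing</title>
    <chapter number="1">
        <title>First Chapter</title>
    </chapter>
</book>"""


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not currently inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


handler = TitleHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)  # ['Sample XML Thing', 'First Chapter']
```

The handler never sees the whole document at once; it only keeps the state it needs (here, whether it is inside a title), which is the main trade-off against the tree-based approach.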

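A tree-based parser, by contrast, scans the entire document and represents it as a nested data structure. Once you have the parsed tree, you can walk it and pick out the information you need. Another advantage of a tree-based parser is that you can modify the data once it is in memory and then write the modified XML document back to disk.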

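Python supports both approaches. Its SAX module (xml.sax) provides event-based parsing, and its DOM module (xml.dom) provides tree-based parsing. This chapter covers only the DOM module.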

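#!/usr/bin/env python
# -*- coding: utf-8 -*-

from xml.dom import minidom, Node

def scanNode(node, level=0):
    # Describe the node by its class name, adding the tag name for elements.
    msg = node.__class__.__name__
    if node.nodeType == Node.ELEMENT_NODE:
        msg += ", tag: " + node.tagName
    print(" " * level * 4, msg)
    # hasChildNodes is a method, so it has to be called.
    if node.hasChildNodes():
        for child in node.childNodes:
            scanNode(child, level + 1)

# test.xml is the sample document shown above.
doc = minidom.parse('test.xml')
scanNode(doc)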

The code above produces the following output:

 Document
     Comment
     Element, tag: book
         Text
         Element, tag: title
             Text
         Text
         Element, tag: author
             Text
             Element, tag: name
                 Element, tag: first
                     Text
                 Text
                 Element, tag: last
                     Text
             Text
             Element, tag: affiliation
                 Text
             Text
         Text
         Element, tag: chapter
             Text
             Element, tag: title
                 Text
             Text
             Element, tag: para
                 Text
                 Element, tag: company
                     Text
                 Text
             Text
         Text

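An Element object represents a pair of start and end tags in the document. A Text object represents actual text; Text objects are often created even for runs of whitespace. In the source code, parse() loads and parses the XML document and returns the top-level node. Walking down from that node to visit the other nodes is exactly what scanNode() does.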
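Because whitespace also becomes Text nodes, code that walks the tree usually has to filter by node type. The following is a rough sketch, assuming Python 3. The helper names element_children and text_of, and the shortened SAMPLE string, are made up for this illustration:

```python
from xml.dom import minidom, Node

SAMPLE = """<book>
    <title>Sample XML Thing</title>
    <author>
        <name><first>Ben</first> <last>Smith</last></name>
    </author>
</book>"""


def element_children(node):
    """Return only the Element children, skipping Text and Comment nodes."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]


def text_of(node):
    """Concatenate all text found below a node, depth first."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts)


doc = minidom.parseString(SAMPLE)
book = doc.documentElement

# The raw child list includes the whitespace-only Text nodes.
print(len(book.childNodes))                          # 5
print([e.tagName for e in element_children(book)])   # ['title', 'author']

name = doc.getElementsByTagName("name")[0]
print(text_of(name))                                 # Ben Smith
```

Note that the single space between the first and last name is itself a Text node, so dropping every whitespace-only node would change the extracted name.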

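Besides using the DOM library to parse an XML document into a tree of objects, you can work in the other direction: build a tree of objects and use the library functions to write it out as an XML document. You can also obtain a tree from an existing document, modify it, and write out the result.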
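The modify-and-write-back case might look like the sketch below, assuming Python 3. The one-line SAMPLE document and the added note element are invented for the illustration:

```python
from xml.dom import minidom

SAMPLE = '<book><title>Sample XML Thing</title><chapter number="1"/></book>'

doc = minidom.parseString(SAMPLE)

# Change the text of the existing <title>.
title = doc.getElementsByTagName("title")[0]
title.firstChild.data = "Revised XML Thing"

# Change an attribute and append a brand-new element.
chapter = doc.getElementsByTagName("chapter")[0]
chapter.setAttribute("number", "2")
note = doc.createElement("note")
note.appendChild(doc.createTextNode("added later"))
doc.documentElement.appendChild(note)

# Serialize just the root element (no XML declaration).
result = doc.documentElement.toxml()
print(result)
# <book><title>Revised XML Thing</title><chapter number="2"/><note>added later</note></book>
```

To save the result to disk instead of printing it, the document's writexml() method accepts an open file object.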

#!/usr/bin/env python

from xml.dom import minidom, Node

doc=minidom.Document()

doc.appendChild(doc.createComment("Sample XML Document"))

#Generate the book

book=doc.createElement('book')
doc.appendChild(book)

#The title

title=doc.createElement('title')
title.appendChild(doc.createTextNode('Sample XML Thing'))
book.appendChild(title)

#The author section

author=doc.createElement('author')
book.appendChild(author)
name=doc.createElement('name')
author.appendChild(name)
firstname=doc.createElement('first')
name.appendChild(firstname)
firstname.appendChild(doc.createTextNode('Ben'))
name.appendChild(doc.createTextNode(' '))
lastname=doc.createElement('last')
lastname.appendChild(doc.createTextNode('Smith'))
name.appendChild(lastname)

affiliation=doc.createElement('affiliation')
author.appendChild(affiliation)
affiliation.appendChild(doc.createTextNode('Spring'))

#The chapter

chapter=doc.createElement('chapter')
book.appendChild(chapter)
chapter.setAttribute('number', '1')
title=doc.createElement('title')
chapter.appendChild(title)
title.appendChild(doc.createTextNode('First Chapter'))

para=doc.createElement('para')
chapter.appendChild(para)
para.appendChild(doc.createTextNode('I think widgets are great.'))
company=doc.createElement('company')
para.appendChild(company)
company.appendChild(doc.createTextNode('Spring'))

para.appendChild(doc.createTextNode('.'))

print(doc.toprettyxml(indent=' '))

It produces the following:

<?xml version="1.0" ?>
<!--Sample XML Document-->
<book>
 <title>Sample XML Thing</title>
 <author>
  <name>
   <first>Ben</first>
    
   <last>Smith</last>
  </name>
  <affiliation>Spring</affiliation>
 </author>
 <chapter number="1">
  <title>First Chapter</title>
  <para>
   I think widgets are great.
   <company>Spring</company>
   .
  </para>
 </chapter>
</book>

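If you replace toprettyxml() on the last line with toxml() (with no arguments), the entire document is written on a single line. This is a faithful representation of the DOM tree, with no added whitespace, but it is not pleasant for a person to read.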
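The difference can be checked directly. This is a small sketch, assuming Python 3 and a deliberately tiny document. It compares the two serializations and then re-parses each to show that pretty-printing introduces extra whitespace Text nodes:

```python
from xml.dom import minidom

# Build a tiny document: <book><title>Sample XML Thing</title></book>
doc = minidom.Document()
book = doc.createElement("book")
doc.appendChild(book)
title = doc.createElement("title")
title.appendChild(doc.createTextNode("Sample XML Thing"))
book.appendChild(title)

compact = doc.toxml()                 # everything on one line
pretty = doc.toprettyxml(indent=" ")  # indented, one element per line

print(compact)
print(pretty)

# Re-parsing shows the cost of pretty-printing: the added newlines and
# indentation come back as whitespace-only Text nodes.
compact_children = len(minidom.parseString(compact).documentElement.childNodes)
pretty_children = len(minidom.parseString(pretty).documentElement.childNodes)
print(compact_children, pretty_children)  # 1 3
```

This is why toxml() is the safer choice when another program will read the output, and toprettyxml() is better kept for human inspection.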

XML-RPC

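XML-RPC is a language-independent way of sending requests and responses across a network. It uses XML as its underlying data representation, but XML-RPC users do not actually need to know XML.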
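To see the XML that normally stays hidden, the standard xmlrpc.client module in Python 3 can encode and decode a call without touching the network. In this sketch the method name sample.add is hypothetical and only serves as a label for the request:

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method "sample.add" with two integers.
request = xmlrpc.client.dumps((2, 3), methodname="sample.add")
print(request)

# Decode it again: loads() returns the parameter tuple and the method name.
params, method = xmlrpc.client.loads(request)
print(params, method)
```

The printed request is a methodCall document in which each argument is wrapped in a typed value element. A client library builds and parses this for you, which is why users rarely need to read the XML themselves.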

The example used in the book depends on a service run by O'Reilly that was shut down in 2006, so it could not be tested.
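As a stand-in for the unavailable service, an XML-RPC exchange can be tried entirely on the local machine. This is not the book's example. It is a rough sketch, assuming Python 3 and its standard xmlrpc.server and xmlrpc.client modules, with a made-up add procedure and a server bound to the loopback interface:

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """Hypothetical remote procedure used only for this demonstration."""
    return a + b


# Port 0 asks the operating system for any free port on the loopback interface.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]

# Serve requests in a background thread so the client can run in this script.
worker = threading.Thread(target=server.serve_forever, daemon=True)
worker.start()

try:
    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d/" % port)
    total = proxy.add(2, 3)  # looks like a local call, travels as XML over HTTP
    print(total)             # 5
finally:
    server.shutdown()
    server.server_close()
```

The client code contains no XML at all: the proxy object turns the attribute access into a request, sends it, and converts the response back into a Python integer.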

Reposted from: https://blog.51cto.com/liandesinian/1565474
