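The previous post gave a brief introduction to Python's basic syntax, mainly from the point of view of someone who uses C or C++. This post goes into detail on how to use the ElementTree library. ElementTree has been part of the Python standard library since Python 2.5; the module itself is written in Python, and a C-accelerated implementation (cElementTree) ships alongside it.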
Reading XML
from xml.etree.ElementTree import ElementTree, Element
import sys

def ReadFromXml(path):
    '''Read an XML file and parse it.

    author: limin
    path: the file path
    return: ElementTree
    '''
    tree = ElementTree()
    tree.parse(path)
    return tree
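As a quick check of the function above, the following sketch writes a tiny PDML-like document to a temporary file and parses it. The sample string and the name `read_from_xml` are made up for illustration; `ET.parse` is the module-level shortcut that builds the `ElementTree` and parses the file in one call.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical miniature of the PDML file, for illustration only.
SAMPLE = '<pdml version="0"><packet><proto name="geninfo"/></packet></pdml>'


def read_from_xml(path):
    """Parse the file at `path` and return an ElementTree."""
    return ET.parse(path)


# Write the sample to a temporary file so the function has something to read.
fd, tmp_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as f:
    f.write(SAMPLE)

try:
    tree = read_from_xml(tmp_path)
    root_tag = tree.getroot().tag
finally:
    os.remove(tmp_path)  # clean up the temporary file

print(root_tag)
```

If the XML is already in a string rather than a file, `ET.fromstring` returns the root element directly and no file is needed.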
We again use the XML file from the previous post as the example:
<?xml version="1.0"?>
<pdml version="0" creator="wireshark/1.20.0.1">
<packet>
<proto name="geninfo" pos="0" showname="General information" size="98">
<field name="num" pos="0" show="1246" showname="Number" value="4de" size="98"/>
<field name="len" pos="0" show="98" showname="Frame Length" value="62" size="98"/>
<field name="caplen" pos="0" show="98" showname="Captured Length" value="62" size="98"/>
<field name="timestamp" pos="0" show="Mar 6, 2013 18:28:28.729395000 China Standard Time" showname="Captured Time" value="1362565708.729395000" size="98"/>
</proto>
<proto name="frame" showname="Frame 1246: 98 bytes on wire (784 bits), 98 bytes captured (784 bits)" size="98" pos="0">
<field name="frame.time" showname="Arrival Time: Mar 6, 2013 18:28:28.729395000 China Standard Time" size="0" pos="0" show="Mar 6, 2013 18:28:28.729395000"/>
<field name="frame.time_epoch" showname="Epoch Time: 1362565708.729395000 seconds" size="0" pos="0" show="1362565708.729395000"/>
<field name="frame.time_delta" showname="Time delta from previous captured frame: 0.000475000 seconds" size="0" pos="0" show="0.000475000"/>
<field name="frame.time_delta_displayed" showname="Time delta from previous displayed frame: 0.000000000 seconds" size="0" pos="0" show="0.000000000"/>
<field name="frame.time_relative" showname="Time since reference or first frame: 93.072253000 seconds" size="0" pos="0" show="93.072253000"/>
<field name="frame.number" showname="Frame Number: 1246" size="0" pos="0" show="1246"/>
<field name="frame.len" showname="Frame Length: 98 bytes (784 bits)" size="0" pos="0" show="98"/>
<field name="frame.cap_len" showname="Capture Length: 98 bytes (784 bits)" size="0" pos="0" show="98"/>
<field name="frame.marked" showname="Frame is marked: False" size="0" pos="0" show="0"/>
<field name="frame.ignored" showname="Frame is ignored: False" size="0" pos="0" show="0"/>
<field name="frame.protocols" showname="Protocols in frame: eth:ip:udp:mmtss:sicap" size="0" pos="0" show="eth:ip:udp:mmtss:sicap"/>
<field name="frame.coloring_rule.name" showname="Coloring Rule Name: SICAP" size="0" pos="0" show="SICAP"/>
<field name="frame.coloring_rule.string" showname="Coloring Rule String: sicap" size="0" pos="0" show="sicap"/>
</proto>
<proto name="eth" showname="Ethernet II, Src: 192.168.254.1 (00:0f:bb:69:93:ee), Dst: DCT-INF (00:10:18:cb:b5:fd)" size="14" pos="0">
<field name="eth.dst" showname="Destination: DCT-INF (00:10:18:cb:b5:fd)" size="6" pos="0" show="00:10:18:cb:b5:fd" value="001018cbb5fd">
<field name="eth.addr" showname="Address: DCT-INF (00:10:18:cb:b5:fd)" size="6" pos="0" show="00:10:18:cb:b5:fd" value="001018cbb5fd"/>
<field name="eth.ig" showname=".... ...0 .... .... .... .... = IG bit: Individual address (unicast)" size="3" pos="0" show="0" value="0" unmaskedvalue="001018"/>
<field name="eth.lg" showname=".... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)" size="3" pos="0" show="0" value="0" unmaskedvalue="001018"/>
</field>
<field name="eth.src" showname="Source: 192.168.254.1 (00:0f:bb:69:93:ee)" size="6" pos="6" show="00:0f:bb:69:93:ee" value="000fbb6993ee">
<field name="eth.addr" showname="Address: 192.168.254.1 (00:0f:bb:69:93:ee)" size="6" pos="6" show="00:0f:bb:69:93:ee" value="000fbb6993ee"/>
<field name="eth.ig" showname=".... ...0 .... .... .... .... = IG bit: Individual address (unicast)" size="3" pos="6" show="0" value="0" unmaskedvalue="000fbb"/>
<field name="eth.lg" showname=".... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)" size="3" pos="6" show="0" value="0" unmaskedvalue="000fbb"/>
</field>
<field name="eth.type" showname="Type: IP (0x0800)" size="2" pos="12" show="0x0800" value="0800"/>
</proto>
<proto name="ip" showname="Internet Protocol, Src: 192.168.254.68 (192.168.254.68), Dst: DCT-INF (192.168.254.2)" size="20" pos="14">
<field name="ip.version" showname="Version: 4" size="1" pos="14" show="4" value="45"/>
<field name="ip.hdr_len" showname="Header length: 20 bytes" size="1" pos="14" show="20" value="45"/>
<field name="ip.dsfield" showname="Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)" size="1" pos="15" show="0" value="00">
<field name="ip.dsfield.dscp" showname="0000 00.. = Differentiated Services Codepoint: Default (0x00)" size="1" pos="15" show="0x00" value="0" unmaskedvalue="00"/>
<field name="ip.dsfield.ect" showname=".... ..0. = ECN-Capable Transport (ECT): 0" size="1" pos="15" show="0" value="0" unmaskedvalue="00"/>
<field name="ip.dsfield.ce" showname=".... ...0 = ECN-CE: 0" size="1" pos="15" show="0" value="0" unmaskedvalue="00"/>
</field>
<field name="ip.len" showname="Total Length: 84" size="2" pos="16" show="84" value="0054"/>
<field name="ip.id" showname="Identification: 0xce64 (52836)" size="2" pos="18" show="0xce64" value="ce64"/>
<field name="ip.flags" showname="Flags: 0x00" size="1" pos="20" show="0x00" value="00">
<field name="ip.flags.rb" showname="0... .... = Reserved bit: Not set" size="1" pos="20" show="0" value="00"/>
<field name="ip.flags.df" showname=".0.. .... = Don't fragment: Not set" size="1" pos="20" show="0" value="00"/>
<field name="ip.flags.mf" showname="..0. .... = More fragments: Not set" size="1" pos="20" show="0" value="00"/>
</field>
<field name="ip.frag_offset" showname="Fragment offset: 0" size="2" pos="20" show="0" value="0000"/>
<field name="ip.ttl" showname="Time to live: 254" size="1" pos="22" show="254" value="fe"/>
<field name="ip.proto" showname="Protocol: UDP (17)" size="1" pos="23" show="17" value="11"/>
<field name="ip.checksum" showname="Header checksum: 0x709b [correct]" size="2" pos="24" show="0x709b" value="709b">
<field name="ip.checksum_good" showname="Good: True" size="2" pos="24" show="1" value="709b"/>
<field name="ip.checksum_bad" showname="Bad: False" size="2" pos="24" show="0" value="709b"/>
</field>
<field name="ip.src" showname="Source: 192.168.254.68 (192.168.254.68)" size="4" pos="26" show="192.168.254.68" value="c0a8fe44"/>
<field name="ip.addr" showname="Source or Destination Address: 192.168.254.68 (192.168.254.68)" hide="yes" size="4" pos="26" show="192.168.254.68" value="c0a8fe44"/>
<field name="ip.src_host" showname="Source Host: 192.168.254.68" hide="yes" size="4" pos="26" show="192.168.254.68" value="c0a8fe44"/>
<field name="ip.host" showname="Source or Destination Host: 192.168.254.68" hide="yes" size="4" pos="26" show="192.168.254.68" value="c0a8fe44"/>
<field name="ip.dst" showname="Destination: DCT-INF (192.168.254.2)" size="4" pos="30" show="192.168.254.2" value="c0a8fe02"/>
<field name="ip.addr" showname="Source or Destination Address: DCT-INF (192.168.254.2)" hide="yes" size="4" pos="30" show="192.168.254.2" value="c0a8fe02"/>
<field name="ip.dst_host" showname="Destination Host: DCT-INF" hide="yes" size="4" pos="30" show="DCT-INF" value="c0a8fe02"/>
<field name="ip.host" showname="Source or Destination Host: DCT-INF" hide="yes" size="4" pos="30" show="DCT-INF" value="c0a8fe02"/>
</proto>
<proto name="udp" showname="User Datagram Protocol, Src Port: 35429 (35429), Dst Port: rfe (5002)" size="8" pos="34">
<field name="udp.srcport" showname="Source port: 35429 (35429)" size="2" pos="34" show="35429" value="8a65"/>
<field name="udp.dstport" showname="Destination port: rfe (5002)" size="2" pos="36" show="5002" value="138a"/>
<field name="udp.port" showname="Source or Destination Port: 35429" hide="yes" size="2" pos="34" show="35429" value="8a65"/>
<field name="udp.port" showname="Source or Destination Port: 5002" hide="yes" size="2" pos="36" show="5002" value="138a"/>
<field name="udp.length" showname="Length: 64" size="2" pos="38" show="64" value="0040"/>
<field name="udp.checksum_coverage" showname="Checksum coverage: 64" hide="yes" size="0" pos="38" show="64"/>
<field name="udp.checksum" showname="Checksum: 0xf3f8 [validation disabled]" size="2" pos="40" show="0xf3f8" value="f3f8">
<field name="udp.checksum_good" showname="Good Checksum: False" size="2" pos="40" show="0" value="f3f8"/>
<field name="udp.checksum_bad" showname="Bad Checksum: False" size="2" pos="40" show="0" value="f3f8"/>
</field>
</proto>
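After calling the read function, you can see how the file is organized in memory.
In memory the file is a tree structure. The outermost layer is the ElementTree object with its built-in methods, and inside it is the content of the file.
To work with the data ElementTree has read, you first need to get the root. For this file, the root is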
<pdml version="0" creator="wireshark/1.20.0.1">
The root is obtained with
root = tree.getroot()
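The root node exposes the following:
children: the child nodes of root (in code, iterate over the element to reach them)
attrib: the attributes, i.e. this part of the XML file:
version="0" creator="wireshark/1.20.0.1"
text: the character data that is not enclosed in <>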
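The properties listed above can be inspected directly on the root element. This is a minimal sketch on a shortened, made-up copy of the file's first line; `SAMPLE` is an assumption for illustration, not the full capture.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the real PDML file.
SAMPLE = '<pdml version="0" creator="wireshark/1.20.0.1"><packet/></pdml>'

root = ET.fromstring(SAMPLE)  # fromstring returns the root Element directly

tag = root.tag          # element name
attrs = root.attrib     # attributes as a plain dict
text = root.text        # text before the first child, or None if there is none
children = list(root)   # child elements, in document order

print(tag, attrs, text, [c.tag for c in children])
```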
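Expanding the children of root
shows root's child node, the part enclosed by <packet></packet> in the XML file, along with packet's child nodes proto, the parts enclosed by <proto></proto>.
At this point things are basically clear: the ElementTree library reads the XML into memory and links it together as a layered hierarchy.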
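The packet/proto layering can also be walked in code instead of in a debugger. The sketch below assumes a trimmed, made-up sample with two proto nodes; `findall` returns direct children with the given tag, and `find` accepts the limited XPath syntax ElementTree supports, including `[@attr='value']` predicates.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample; the real file has many more fields.
SAMPLE = """<pdml version="0">
  <packet>
    <proto name="geninfo"><field name="num" show="1246"/></proto>
    <proto name="ip"><field name="ip.ttl" show="254"/></proto>
  </packet>
</pdml>"""

root = ET.fromstring(SAMPLE)

# Level by level: root -> packet -> proto.
proto_names = [
    proto.get("name")
    for packet in root.findall("packet")
    for proto in packet.findall("proto")
]

# Jump straight to one proto with a path expression.
ip_proto = root.find("packet/proto[@name='ip']")
ttl = ip_proto.find("field").get("show")

print(proto_names, ttl)
```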
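Reading data from the XML file
As described above, what we want to read are the attrib, tag, tail, and text values of the nodes at each level.
Ways to read them
Direct iteration
Python 2.7 supports iterating over a node directly, as in the following code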
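To make the four values concrete, here is a small sketch on a made-up element that contains both text and a child. The element content is an assumption chosen so that `text` and `tail` are both non-empty; the PDML file itself carries almost everything in attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with mixed content, for illustration only.
elem = ET.fromstring('<field name="demo">inner text<child/>after child</field>')
child = elem[0]  # elements support indexing into their children

elem_tag = elem.tag        # name of the element
elem_attrib = elem.attrib  # attribute dict
elem_text = elem.text      # text between <field ...> and the first child
child_tail = child.tail    # text between </child> (here <child/>) and the next tag
child_text = child.text    # None: <child/> has no content

print(elem_tag, elem_attrib, elem_text, child_tail, child_text)
```

Note that the text following a child belongs to that child's `tail`, not to the parent's `text`.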
# parseSiCapHead, insertRecSendFieldIntoList, GetFieldinfo, templist and
# SicapMessageList are defined elsewhere in the script.
for messagelstParaNode in node:  # 1st-level <field>
    if messagelstParaNode.attrib['name'] == 'sicap.header':  # sicap header
        SicapHead = parseSiCapHead(messagelstParaNode)
        insertRecSendFieldIntoList(SicapMessageList, SicapHead)
    else:
        if messagelstParaNode.attrib['name'].find('sicap.') == 0:  # parameter field, 1st-level message type
            templist.insert(0, messagelstParaNode.attrib['showname'])
            templist.insert(1, SicapHead['value'])
            SicapMessageList.insert(0, templist)
            for message2stParaNode in messagelstParaNode:  # parameter field, 2nd level
                GetFieldinfo(SicapMessageList, message2stParaNode)
                for message3stParaNode in message2stParaNode:  # parameter field, 3rd level
                    GetFieldinfo(SicapMessageList, message3stParaNode)
                    for message4stParaNode in message3stParaNode:  # parameter field, 4th level
                        GetFieldinfo(SicapMessageList, message4stParaNode)
                        for message5stParaNode in message4stParaNode:  # parameter field, 5th level
                            GetFieldinfo(SicapMessageList, message5stParaNode)
                            for message6stParaNode in message5stParaNode:  # parameter field, 6th level
                                GetFieldinfo(SicapMessageList, message6stParaNode)
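In this code, node is the root node object, and six levels of nodes are parsed in total. At each level, the individual values can be accessed as follows.
attrib:
In ElementTree, attrib is stored as a dictionary, so a value can be read by its key, for example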
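Fixed-depth nested loops stop working once a capture nests deeper than six levels. One possible alternative, shown here as a sketch rather than as the original script's approach, is a recursive walk that visits fields at any depth. The helper name `walk_fields` and the sample data are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt modeled on the "eth" proto in the sample file.
SAMPLE = """<proto name="eth">
  <field name="eth.dst">
    <field name="eth.addr"/>
    <field name="eth.ig"/>
  </field>
  <field name="eth.type"/>
</proto>"""


def walk_fields(node, depth=1, out=None):
    """Collect (depth, name) for every descendant of `node`, depth-first."""
    if out is None:
        out = []
    for child in node:
        out.append((depth, child.get("name")))
        walk_fields(child, depth + 1, out)  # recurse into nested fields
    return out


proto = ET.fromstring(SAMPLE)
visited = walk_fields(proto)

# When the depth does not matter, iter() flattens the same traversal.
flat_names = [f.get("name") for f in proto.iter("field")]

print(visited)
print(flat_names)
```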
messagelstParaNode.attrib['name']
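Because attrib is an ordinary dictionary, indexing a missing key raises KeyError. In the sample file not every field has every attribute (for example, `hide` and `value` appear only on some), so `Element.get` with a default can be the safer choice. A minimal sketch on a single field copied from the sample:

```python
import xml.etree.ElementTree as ET

field = ET.fromstring('<field name="ip.ttl" show="254" value="fe"/>')

name = field.attrib["name"]     # direct dictionary access
missing = field.get("hide")     # absent attribute: returns None
hide = field.get("hide", "no")  # absent attribute with an explicit default

try:
    field.attrib["hide"]        # direct indexing of a missing key
    raised = False
except KeyError:
    raised = True

# Attribute values are always strings; convert explicitly when needed.
ttl = int(field.attrib["value"], 16)

print(name, missing, hide, raised, ttl)
```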