Checking whether an XML iter is empty in Python: parsing blank XML tags with lxml and Python

When parsing XML documents in the format of:

    <Root>
        <Foo>
            <Bar>
                <Car>
                    <Color>Blue</Color>
                    <Make>Chevy</Make>
                    <Model>Camaro</Model>
                </Car>
            </Bar>
        </Foo>
    </Root>

I use the following code:

    carData = element.xpath('//Root/Foo/Bar/Car/node()[text()]')
    parsedCarData = [{field.tag: field.text for field in carData} for action in carData]
    print(parsedCarData[0]['Color'])  # Blue

This code does not work if a tag is empty, such as:

    <Root>
        <Foo>
            <Bar>
                <Car>
                    <Color>Blue</Color>
                    <Make>Chevy</Make>
                    <Model></Model>
                </Car>
            </Bar>
        </Foo>
    </Root>

Using the same code as above:

    carData = element.xpath('//Root/Foo/Bar/Car/node()[text()]')
    parsedCarData = [{field.tag: field.text for field in carData} for action in carData]
    print(parsedCarData[0]['Model'])  # KeyError

How would I parse this blank tag?

Solution

You're putting in a [text()] filter, which explicitly asks only for elements that have text nodes in them... and then you're unhappy when it doesn't give you elements without text nodes?

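Leave that filter out, and you'll get your Model element. Select with * rather than node(), because node() also returns the whitespace text nodes between elements, and those have no .tag:

    >>> import lxml.etree
    >>> s = '''
    ... <Root>
    ...   <Car>
    ...     <Color>Blue</Color>
    ...     <Make>Chevy</Make>
    ...     <Model/>
    ...   </Car>
    ... </Root>'''
    >>> e = lxml.etree.fromstring(s)
    >>> carData = e.xpath('Car/*')
    >>> carData
    [<Element Color at 0x...>, <Element Make at 0x...>, <Element Model at 0x...>]
    >>> {field.tag: field.text for field in carData}
    {'Color': 'Blue', 'Make': 'Chevy', 'Model': None}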
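To see the difference the predicate makes, here is a small self-contained sketch that runs both XPath expressions against the same document. The sample XML and the helper name car_fields are assumptions made for illustration and are not part of the original question.

```python
import lxml.etree

# Sample document (assumed for illustration): Model is present but empty.
XML = """
<Root>
  <Car>
    <Color>Blue</Color>
    <Make>Chevy</Make>
    <Model/>
  </Car>
</Root>
"""


def car_fields(root, xpath):
    """Map each matched element's tag to its text (None for empty elements)."""
    return {child.tag: child.text for child in root.xpath(xpath)}


root = lxml.etree.fromstring(XML)

# '*[text()]' keeps only child elements that contain a text node.
with_filter = car_fields(root, 'Car/*[text()]')

# '*' keeps every child element, empty or not.
without_filter = car_fields(root, 'Car/*')

print(with_filter)
print(without_filter)

# If None is inconvenient downstream, substitute an empty string.
normalized = {tag: (text or '') for tag, text in without_filter.items()}
print(normalized)
```

With the filter in place the Model key is missing entirely, which is what produces the KeyError; without it the key is present and maps to None.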

That said, if your immediate goal is to iterate over the nodes in the tree, you might consider using lxml.etree.iterparse() instead. It hands you elements as they are parsed, so, as long as you clear each element once you are done with it, you avoid holding a full tree in memory, which is much more efficient than building a tree and then iterating over it with XPath. (Think SAX, but without the insane and painful API.)

Implementing with iterparse could look like this:

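    import io
    import lxml.etree

    def get_cars(infile):
        in_car = False
        current_car = {}
        for event, element in lxml.etree.iterparse(infile, events=('start', 'end')):
            if event == 'start':
                # Text is not reliably available on 'start'; only track state here.
                if element.tag == 'Car':
                    in_car = True
                    current_car = {}
                continue
            # Everything below handles 'end' events.
            if not in_car:
                continue
            if element.tag == 'Car':
                in_car = False
                yield current_car
                element.clear()  # release the finished subtree
                continue
            current_car[element.tag] = element.text

    xml = b'<Root><Car><Color>Blue</Color><Make>Chevy</Make><Model/></Car></Root>'
    for car in get_cars(infile=io.BytesIO(xml)):
        print(car)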

...it's more code, but (if we weren't using an in-memory buffer for the example) it could process a file much larger than could fit in memory.
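If only the Car elements matter, lxml's iterparse() also accepts a tag argument, which removes the need to track state by hand. The following is a sketch along those lines; the function name iter_cars and the sample bytes are made up for illustration.

```python
import io

import lxml.etree


def iter_cars(source):
    """Yield one {tag: text} dict per <Car>, clearing each element afterwards."""
    # Only 'end' events for <Car> are delivered, so its children are fully parsed.
    for _event, car in lxml.etree.iterparse(source, events=('end',), tag='Car'):
        yield {child.tag: child.text for child in car}
        # Drop the element's children so memory use stays bounded.
        car.clear()


# Hypothetical sample data; a real caller would pass a file path or open file.
sample = io.BytesIO(
    b'<Root>'
    b'<Car><Color>Blue</Color><Make>Chevy</Make><Model/></Car>'
    b'<Car><Color>Red</Color><Make>Ford</Make><Model>Mustang</Model></Car>'
    b'</Root>'
)

cars = list(iter_cars(sample))
for car in cars:
    print(car)
```

The empty Model element still appears in the first dict with a value of None, since no text filter is involved.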
