Processing SVG in Python: parsing SVG files with Python LXML

I'm trying to parse .svg files from http://kanjivg.tagaini.net/ , but I can't successfully extract the information inside.

A part of 0f9ab.svg looks like this:

My .py file:

import lxml.etree as ET

svg = ET.parse('0f9ab.svg')

print(svg)  # <lxml.etree._ElementTree object at 0x...>

# AttributeError: 'lxml.etree._ElementTree' object has no attribute 'tag'
print(svg.tag)

# TypeError: 'lxml.etree._ElementTree' object is not subscriptable
print(svg[0])

# TypeError: 'lxml.etree._ElementTree' object is not iterable
for child in svg:
    print(child)

# None
print(svg.find("./svg"))

# []
print(svg.findall("//g"))

# []
print(svg.xpath("//g"))

Purpose

I tried all kinds of operations I could think of, but nothing gets me any data from the .svg file.

I want to extract the kanji (Japanese character) in kvg:element="kanji" (which are at different depth levels).

Question

Is using lxml the wrong package for this?

If not, how do I extract information from my parsed .svg file?

Other solution

I could of course just read the file as a string and search for kvg:element=", but I would like a proper way of extracting data from XML / SVG.

I used xmltodict before, but my code became really messy extracting kvg:element, because they were at different depth levels.

Solution

.parse() returns an ElementTree, which represents the tree as a whole. To query individual nodes, you need an Element, most likely the root element of the tree.

Replace part of your code with this:

xml = ET.parse('0f9ab.svg')
svg = xml.getroot()
print(svg)  # <Element {http://www.w3.org/2000/svg}svg at 0x...>

and I think you'll have some success.

Note also that .findall() requires a relative path and, in your case, a namespace qualifier:

print(svg.findall(".//{http://www.w3.org/2000/svg}g"))
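To collect the kvg:element values at any depth, one option is to walk every g element with .iter() and read the namespaced attribute. The following is only a minimal sketch under some assumptions: the inline SAMPLE document and its placeholder values "A", "B", "C" are made up for illustration, the helper name kvg_elements is hypothetical, and the URI bound to the kvg: prefix is assumed to be http://kanjivg.tagaini.net (check the namespace declaration in your own file). The fallback import is there only so the sketch also runs without lxml installed; the calls used exist in both libraries.

```python
try:
    import lxml.etree as ET
except ImportError:
    # The calls used below (fromstring, iter, get) also exist in the stdlib.
    import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
KVG_NS = "http://kanjivg.tagaini.net"  # assumption: URI bound to the kvg: prefix

# Made-up sample with nested <g> elements; the values are placeholders.
SAMPLE = """\
<svg xmlns="http://www.w3.org/2000/svg" xmlns:kvg="http://kanjivg.tagaini.net">
  <g id="outer" kvg:element="A">
    <g id="inner1" kvg:element="B">
      <path d="M0,0"/>
    </g>
    <g id="inner2">
      <g id="deep" kvg:element="C"/>
    </g>
  </g>
</svg>
"""


def kvg_elements(root):
    """Return the kvg:element value of every <g> that has one, in document order."""
    attr = "{%s}element" % KVG_NS   # namespaced attribute name
    tag = "{%s}g" % SVG_NS          # namespaced tag name
    # .iter(tag) visits matching elements at every depth level.
    return [g.get(attr) for g in root.iter(tag) if g.get(attr) is not None]


root = ET.fromstring(SAMPLE)
print(kvg_elements(root))  # ['A', 'B', 'C']
```

For a real file, the root would come from ET.parse('0f9ab.svg').getroot() instead of ET.fromstring(SAMPLE). If the list comes back empty, the kvg: prefix is probably bound to a different URI than the one assumed here.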
