Python ElementTree garbled characters: ElementTree won't parse special characters with Python 2.7

I had to rewrite my Python script from Python 3 to Python 2, and after that I ran into a problem parsing special characters with ElementTree.

This is a piece of my xml:

Avsättning egenavgifter

This is the output when I parse this row:

('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avs\xc3\xa4ttning egenavgifter')

So it seems to be a problem with the character "ä".

This is how I do it in the code:

sys.setdefaultencoding( "UTF-8" )

xmltree = ET()
xmltree.parse("xxxx.xml")
printAccountPlan(xmltree)

def printAccountPlan(xmltree):
    # i is an element taken from xmltree; the loop that produces it is not shown
    print("account:",str(i.attrib['number']), "AccountType:",str(i.attrib['type']),"Name:",str(i.text))

Does anyone have an idea how to get ElementTree to parse the character "ä", so that the result looks like this:

('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')

Solution

You're running into two separate differences between Python 2 and Python 3 at the same time, which is why you're getting unexpected results.

The first difference is one you're probably already aware of: Python's print statement in version 2 became a print function in version 3. That change is creating a special circumstance in your case, which I'll get to a little later. But briefly, this is the difference in how 'print' works:

In Python 3:

>>> # Two arguments 'Hi' and 'there' get passed to the function 'print'.
>>> # They are joined with a space separator and printed.
>>> print('Hi', 'there')
Hi there

In Python 2:

>>> # 'print' is a statement which doesn't need parentheses.
>>> # The parentheses instead create a tuple containing two elements,
>>> # 'Hi' and 'there'. This tuple is then printed.
>>> print('Hi', 'there')
('Hi', 'there')
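If the script has to stay on Python 2.7, one option is to opt in to the Python 3 print function through a `__future__` import, which Python 2.6 and later accept. The sketch below is only an illustration; the name `words` is made up for the example.

```python
# Must appear before any other statement in the module.
from __future__ import print_function

words = ('Hi', 'there')

# With the import above, print is a function on Python 2.7 as well:
# each argument is converted with str() and joined with a space,
# instead of the whole parenthesised group being shown as a tuple.
print('Hi', 'there')   # prints: Hi there

# Unpacking a tuple passes its elements as separate arguments.
print(*words)          # prints: Hi there
```

On Python 3 the import is accepted and has no effect, so the same file runs on both versions.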

The second difference is that tuples print themselves by calling repr() on each of their elements. In Python 3, repr() displays unicode as you want. But in Python 2, repr() of a byte string uses escape sequences for any byte values which fall outside the printable ASCII range (e.g., larger than 127). This is why you're seeing \xc3\xa4, which are the two UTF-8 bytes that encode "ä".
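The escaping can be reproduced on its own, without ElementTree. The following sketch assumes the text was encoded to UTF-8 bytes, which is what the output in the question suggests; the variable names are illustrative only. It is written so that it runs on both Python 2.7 and Python 3.

```python
from __future__ import print_function

# The account name as a text (unicode) string; \xe4 is the character "a with diaeresis".
name = u'Avs\xe4ttning egenavgifter'

# Encoding to UTF-8 gives a byte string in which that one character takes two bytes.
encoded = name.encode('utf-8')

# repr() of the byte string escapes the two non-ASCII bytes as \xc3\xa4.
# (Python 3 additionally prefixes the result with b.)
escaped = repr(encoded)
print(escaped)

# The bytes are intact: decoding them gives back the original text.
roundtrip = encoded.decode('utf-8')
print(roundtrip == name)
```

The escapes are therefore only a display form of the bytes, not a sign that the data was damaged during parsing.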

You may decide to resolve this issue, or not, depending on what your goal is with your code. The representation of a tuple in Python 2 uses escape characters because it's not designed to be displayed to an end-user. It's more for your internal convenience as a developer, for troubleshooting and similar tasks. If you're simply printing it for yourself, then you may not need to change a thing, because Python is showing you that the encoded bytes for that non-ASCII character are correctly there in your string. If you do want to display something to the end-user which has the format of how tuples look, then one way to do it (which retains correct printing of unicode) is to manually create the formatting, like this:

def printAccountPlan(xmltree):
    data = (i.attrib['number'], i.attrib['type'], i.text)
    print "('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % data

# Produces this:
# ('account:', '89890000', 'AccountType:', 'Kostnad', 'Name:', 'Avsättning egenavgifter')
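For a version that can be run end to end, here is a hedged sketch of the same manual-formatting idea. The XML layout is an assumption: the question's markup did not survive, so the element names `accounts` and `account` are hypothetical, and only the attribute names `number` and `type` come from the question's code. The helper `format_account` is likewise invented for illustration.

```python
from __future__ import print_function, unicode_literals

import xml.etree.ElementTree as ET

# Hypothetical document: the element names are assumed, not taken from the question.
# \u00e4 is the character "a with diaeresis"; the document is passed to the parser as UTF-8 bytes.
XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<accounts>'
    '<account number="89890000" type="Kostnad">Avs\u00e4ttning egenavgifter</account>'
    '</accounts>'
).encode('utf-8')


def format_account(elem):
    """Build the tuple-like line by hand, so no repr() escaping is involved."""
    return "('account:', '%s', 'AccountType:', '%s', 'Name:', '%s')" % (
        elem.attrib['number'], elem.attrib['type'], elem.text)


root = ET.fromstring(XML)
lines = [format_account(elem) for elem in root.iter('account')]
for line in lines:
    print(line)
```

Building the string first and printing it afterwards keeps the text as unicode until the last step. On Python 2, whether the final print shows "ä" correctly still depends on the encoding of the terminal or output stream.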
