Summary:
By default, Python 2 does not accept Chinese characters in source files, not even in comments.
But, and note the but...
1. Chinese characters in Python.
This is of course an encoding issue; for the details see:
https://www.python.org/dev/peps/pep-0263/
The Python developers noticed the problem and provided a fix (you can declare the encoding of your source code):
> Defining the Encoding
> Python will default to ASCII as standard encoding if no other
> encoding hints are given.
> ***To define a source code encoding, a magic comment must
> be placed into the source files either as first or second
> line in the file, such as:***
> # coding=<encoding name>
> or (using formats recognized by popular editors)
> #!/usr/bin/python
> # -*- coding: <encoding name> -*-
> or
> #!/usr/bin/python
> # vim: set fileencoding=<encoding name> :
> More precisely, the first or second line must match the regular
> expression "^[ \t\v]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)".
> The first group of this
> expression is then interpreted as encoding name. If the encoding
> is unknown to Python, an error is raised during compilation. There
> must not be any Python statement on the line that contains the
> encoding declaration. If the first line matches the second line
> is ignored.
>
> To aid with platforms such as Windows, which add Unicode BOM marks
> to the beginning of Unicode files, the UTF-8 signature
> '\xef\xbb\xbf' will be interpreted as 'utf-8' encoding as well
> (even if no magic encoding comment is given).
>
> If a source file uses both the UTF-8 BOM mark signature and a
> magic encoding comment, the only allowed encoding for the comment
> is 'utf-8'. Any other encoding will cause an error.
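To see what the rule quoted above accepts, here is a minimal sketch that applies the same regular expression to the first two lines of a source text. The function name `declared_encoding` is my own, not something Python provides, and the real interpreter performs extra checks (the BOM handling, for example) that this sketch skips.

```python
import re

# Regular expression copied from the PEP 263 text quoted above.
CODING_RE = re.compile(r"^[ \t\v]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_text):
    """Return the encoding named on line 1 or 2, or None if neither line matches."""
    for line in source_text.split("\n")[:2]:
        match = CODING_RE.match(line)
        if match:
            # The first matching line wins, so a match on line 1 hides line 2.
            return match.group(1)
    return None
```

For example, `declared_encoding("# coding=utf-8\nkk = 1\n")` returns `'utf-8'`, while a declaration that only appears on the third line is ignored and gives `None`.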
So you can declare the source encoding in your code (on the first or second line):
For example, test.py:
**# coding=utf-8**
kk='文字'
print kk
Run:
$ python test.py
Output:
文字
Without the "# coding=utf-8" line:
kk='文字'
print kk
Output:
SyntaxError: Non-ASCII character '\xe6' in file nouse2.py on line 9, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details
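The '\xe6' in that message is not random: it is the first byte of '文' once the file is saved as UTF-8, and an ASCII-only reader cannot decode it. A small sketch of this, written for Python 3 (an assumption on my part, so that the literal can be typed without any declaration; the rest of this post uses Python 2):

```python
text = '文字'
raw = text.encode('utf-8')  # the bytes an editor writes to disk when saving as UTF-8
first_byte = raw[:1]        # the byte named in the SyntaxError message

try:
    raw.decode('ascii')     # roughly what an ASCII-only reader attempts
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False        # 0xe6 is outside the ASCII range
```

Here `first_byte` is `b'\xe6'` and `ascii_ok` ends up `False`, which is the same complaint the interpreter makes when no encoding is declared.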
2. Reading XML files in Python
The XML file to read is in VOC2007 format:
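<annotation>
    <folder>VOC2007</folder>
    <filename>000001.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>341012865</flickrid>
    </source>
    <owner>
        <flickrid>Fried Camels</flickrid>
        <name>Jinky the Fruit Bat</name>
    </owner>
    <size>
        <width>353</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>dog</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
    <object>
        <name>person</name>
        <pose>Left</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>8</xmin>
            <ymin>12</ymin>
            <xmax>352</xmax>
            <ymax>498</ymax>
        </bndbox>
    </object>
</annotation>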
How to read it:
import xml.etree.ElementTree as ET  # XML parsing library
import os
import cPickle
import numpy as np

def readxml(filename):
    tree = ET.parse(filename)  # load and parse the XML file; tree wraps the root node
    objs = tree.findall('object')  # find every <object> node under the root
    num_objs = len(objs)  # number of annotated objects
    for ix, obj in enumerate(objs):  # iterate over the index and content of objs
        bbox = obj.find('bndbox')  # bounding-box node; VOC2007 names it <bndbox>
        x1 = float(bbox.find('xmin').text)
        y1 = float(bbox.find('ymin').text)
        x2 = float(bbox.find('xmax').text)
        y2 = float(bbox.find('ymax').text)
******
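For a version that can be run without the dataset, here is a self-contained sketch of the same idea. The inline XML snippet, the function name `read_boxes`, and the returned tuple layout are illustrative assumptions of mine; it parses from a string with `ET.fromstring` instead of from a file, and it assumes the box node is named `bndbox`.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation, trimmed to the fields used below.
SAMPLE_XML = """
<annotation>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""


def read_boxes(xml_text):
    """Return one (name, x1, y1, x2, y2) tuple per <object> node."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall('object'):
        bbox = obj.find('bndbox')
        # Read the four corner coordinates in a fixed order.
        coords = [float(bbox.find(tag).text)
                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((obj.find('name').text,) + tuple(coords))
    return boxes


boxes = read_boxes(SAMPLE_XML)
```

Returning plain tuples keeps the sketch independent of numpy; collecting the coordinates into an array would be a natural next step.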
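3. Comparing Chinese strings in Python.
First, some background:
The project does vehicle detection with faster-rcnn, so I wrote my own annotation tool in MATLAB; it stores parameters such as name, color, and pose as Chinese characters. faster-rcnn, however, expects the VOC annotation format, where these parameters are all in English, so I modified faster-rcnn's data-loading interface to convert the Chinese class names into English strings.
In the XML files I annotated, the header line declares the encoding as utf-8.
Suppose I use the method from section 2 to read the content of a node: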
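name = obj.find('name').text
and the content of name in the XML is "宠物" (pet), so name = 宠物.
type() then shows the type of name:
print type(name)
Output:
<type 'unicode'>
As for what unicode means, look it up yourself.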
Now, if I want to check whether name is "宠物" and, if so, set it to "chongwu", the following code does it:
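name = obj.find('name').text  # name is "宠物", a unicode object here
if name == '宠物'.decode('utf-8'):  # the literal '宠物' is a byte string (<type 'str'>), so decode it to unicode first
    name = 'chongwu'
'宠物'.decode('utf-8') decodes the UTF-8 bytes of "宠物" into a unicode object (assuming the source file is saved as UTF-8, as in section 1), so the two sides can be compared. (Before decoding, "宠物" is <type 'str'>, so the comparison does not match.)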
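If there are several Chinese labels to convert, the same decode-then-compare idea can be wrapped in a small helper. This is only a sketch: `CLASS_NAME_MAP`, `to_text`, and `translate_label` are hypothetical names of mine, and the single map entry is the example from above. It assumes byte strings are UTF-8 and leaves unknown labels unchanged.

```python
# -*- coding: utf-8 -*-

# Hypothetical lookup table from Chinese labels to the English names the loader expects.
CLASS_NAME_MAP = {u'宠物': 'chongwu'}


def to_text(value, encoding='utf-8'):
    """Decode byte strings to text; return text values untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def translate_label(raw_name):
    """Map a Chinese label to its English name, or return it unchanged if unknown."""
    name = to_text(raw_name)
    return CLASS_NAME_MAP.get(name, name)
```

Both `translate_label(u'宠物')` and `translate_label(u'宠物'.encode('utf-8'))` return `'chongwu'`, because the byte string is decoded before the lookup.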