I have a .csv file which is encoded in UTF-8.
I am working with Python 2.7.
Something interesting happens on Ubuntu.
When I print out the results of the file like this:
import csv

with open("file.csv", "r") as file:
    myFile = csv.reader(file, delimiter = ",")
    for row in myFile:
        print row
I get escape sequences like \xc3\x, \xa1\, .... Note that row is a list, and every element of the list is shown as a quoted string ('...') in the output.
When I print out the results like this:
with open("file.csv", "r") as file:
    myFile = csv.reader(file, delimiter = ",")
    for row in myFile:
        print ",".join(row)
Everything is decoded fine. Note that every row from my original file is one big string here.
Why is that?
Solution
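This happens because printing a list calls str() on the list, and a list builds its string form from the repr() of each element. In Python 2, repr() of a byte string escapes every non-ASCII byte as \xNN, so you see the raw UTF-8 bytes. Printing a string directly uses str(), which writes the bytes unchanged, and your terminal then renders them as UTF-8. Example: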
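# -*- coding: utf-8 -*-
# Python 2: a plain '...' literal is a byte string (UTF-8 encoded here), not a unicode object
unicode_str = 'åäö'
unicode_str_list = [unicode_str, unicode_str]
print 'unwrapped:', unicode_str        # raw bytes go to the terminal, which renders them as UTF-8
print 'in list:', unicode_str_list     # str() of a list uses repr() on each element
print 'repr:', repr(unicode_str)
print 'str:', str(unicode_str)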
Produces:
unwrapped: åäö
in list: ['\xc3\xa5\xc3\xa4\xc3\xb6', '\xc3\xa5\xc3\xa4\xc3\xb6']
repr: '\xc3\xa5\xc3\xa4\xc3\xb6'
str: åäö
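If you want to work with real characters rather than raw bytes while keeping each row as a list, one option is to decode every cell yourself. This is only a minimal sketch, assuming the file really is UTF-8. decode_row is a hypothetical helper name, not part of the csv module, and the sample row is made up for illustration. It is written so that it runs on both Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def decode_row(row, encoding="utf-8"):
    """Return a copy of row with every byte-string cell decoded to text.

    decode_row is a hypothetical helper for illustration only.
    On Python 2, csv.reader yields byte strings, so every cell is decoded.
    On Python 3, cells read from a text-mode file are already text and are
    left untouched.
    """
    return [
        cell.decode(encoding) if isinstance(cell, bytes) else cell
        for cell in row
    ]


# Made-up sample row: the UTF-8 bytes for "åäö", plus a plain ASCII cell.
raw_row = [b"\xc3\xa5\xc3\xa4\xc3\xb6", b"abc"]
decoded = decode_row(raw_row)

print(len(raw_row[0]))   # 6: two bytes per character in UTF-8
print(len(decoded[0]))   # 3: three actual characters after decoding
```

Note that on Python 2, printing the decoded list still shows escapes (u'\xe5\xe4\xf6'), because a list always uses repr() on its elements. To get readable output, print the cells individually or join them into one string first, as in your second snippet.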