Differences between the Python terminal and an editor: Python unicode equality comparison fails in the terminal but works in the Spyder editor...

weixin_39683858

Article link: https://blog.csdn.net/weixin_39683858/article/details/111740417


I need to compare a unicode string coming from a utf-8 file with a constant defined in the Python script.

I'm using Python 2.7.6 on Linux.

If I run the script below in Spyder (a Python editor), it works, but if I invoke it from a terminal, the test fails. Do I need to import or define something in the terminal before invoking the script?

Script ("pythonscript.py"):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import csv

some_french_deps = []

idata_raw = csv.DictReader(open("utf8_encoded_data.csv", 'rb'), delimiter=";")
for rec in idata_raw:
    depname = unicode(rec['DEP'],'utf-8')
    some_french_deps.append(depname)

test1 = "Tarn"
test2 = "Rhône-Alpes"

if test1==some_french_deps[0]:
    print "Tarn test passed"
else:
    print "Tarn test failed"
if test2==some_french_deps[2]:
    print "Rhône-Alpes test passed"
else:
    print "Rhône-Alpes test failed"

utf8_encoded_data.csv:

DEP
Tarn
Lozère
Rhône-Alpes
Aude

Run output from Spyder editor:

Tarn test passed
Rhône-Alpes test passed

Run output from terminal:

$ ./pythonscript.py
Tarn test passed
./pythonscript.py:20: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if test2==some_french_deps[2]:
Rhône-Alpes test failed

Solution

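You are comparing a byte string (type str) with a unicode value. When the two types are compared, Python 2 implicitly decodes the byte string using the system default encoding. Spyder has changed that default from ASCII to UTF-8, and your byte strings are encoded as UTF-8, so under Spyder the comparison succeeds. In the terminal the default is still ASCII: "Tarn" is pure ASCII and decodes fine, but "Rhône-Alpes" contains non-ASCII bytes, so the decode fails, Python issues the UnicodeWarning, and the two values are treated as unequal.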
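To illustrate the implicit step, here is a minimal sketch that performs the same decode explicitly with each codec. The helper name decodes_with is made up for this example, and the non-ASCII character is written as an escape so the snippet does not depend on the source file encoding. It is written to run on both Python 2.7 and Python 3.

```python
def decodes_with(raw_bytes, encoding):
    """Return True if raw_bytes can be decoded with the given encoding.

    Hypothetical helper for illustration only; it mirrors the decode that
    Python 2 attempts implicitly when comparing str with unicode.
    """
    try:
        raw_bytes.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# UTF-8 encoded byte strings, like the literals in a UTF-8 source file.
tarn_bytes = u"Tarn".encode("utf-8")              # ASCII characters only
rhone_bytes = u"Rh\u00f4ne-Alpes".encode("utf-8")  # contains a non-ASCII character

for name, raw in [("Tarn", tarn_bytes), ("Rhone-Alpes", rhone_bytes)]:
    print("%s: ascii=%s utf-8=%s" % (
        name, decodes_with(raw, "ascii"), decodes_with(raw, "utf-8")))
```

Only the ASCII decode of the second value fails, which matches the terminal run, where only the Rhône-Alpes test fails.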

The solution is to stop using byte strings and use unicode literals for your two test values instead:

test1 = u"Tarn"
test2 = u"Rhône-Alpes"
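As a rough end-to-end sketch of the fixed comparison, the snippet below decodes the data once and compares unicode with unicode. Two things here are assumptions made for portability, not part of the original script: the CSV file is replaced by an in-memory byte stream, and csv.DictReader is swapped for a plain line-by-line decode so the same code runs on Python 2.7 and Python 3. The function name read_departments is likewise made up.

```python
import io

# In-memory stand-in for utf8_encoded_data.csv (assumption for this sketch).
csv_bytes = u"DEP\nTarn\nLoz\u00e8re\nRh\u00f4ne-Alpes\nAude\n".encode("utf-8")


def read_departments(stream):
    """Decode each UTF-8 line to unicode and drop the DEP header row."""
    rows = [line.decode("utf-8").strip() for line in stream]
    return rows[1:]


some_french_deps = read_departments(io.BytesIO(csv_bytes))

# Unicode literals, so both sides of each comparison are unicode.
test1 = u"Tarn"
test2 = u"Rh\u00f4ne-Alpes"

tarn_ok = test1 == some_french_deps[0]
rhone_ok = test2 == some_french_deps[2]

print("Tarn test passed" if tarn_ok else "Tarn test failed")
print("Rhone-Alpes test passed" if rhone_ok else "Rhone-Alpes test failed")
```

Because no byte string takes part in the comparison, the result no longer depends on the default encoding of the environment that runs the script.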

Changing the system default encoding is, in my opinion, a terrible idea. Your code should use Unicode correctly instead of relying on implicit conversions; changing the rules of implicit conversion only increases the confusion and does not make the task any easier.
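One way to avoid relying on implicit conversion, sketched under the assumption that incoming byte strings are UTF-8, is to normalize values explicitly at the point where they enter the program. The helper name to_unicode is hypothetical. The check uses bytes, which is an alias for str on Python 2.6+, so the snippet behaves the same on Python 2.7 and Python 3.

```python
def to_unicode(value, encoding="utf-8"):
    """Return value as unicode text, decoding byte strings explicitly.

    Hypothetical helper; the UTF-8 default is an assumption about the input.
    """
    if isinstance(value, bytes):  # bytes is str on Python 2
        return value.decode(encoding)
    return value


byte_value = u"Rh\u00f4ne-Alpes".encode("utf-8")  # e.g. raw data read from a file
text_value = u"Rh\u00f4ne-Alpes"                  # e.g. a unicode literal

# Both sides are unicode after normalization, so no implicit decode occurs.
same = to_unicode(byte_value) == to_unicode(text_value)
print("values match: %s" % same)
```

The encoding is named in the code itself, so the comparison gives the same answer in Spyder and in a terminal.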