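OK, I have a MySQL database set up. Most of the tables are latin1, and Django handles them fine. But some of them are UTF-8, and Django does not handle those correctly.
Here is a sample table (these tables all come from django-geonames):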
DROP TABLE IF EXISTS `geoname`;
SET @saved_cs_client = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `geoname` (
`id` int(11) NOT NULL,
`name` varchar(200) NOT NULL,
`ascii_name` varchar(200) NOT NULL,
`latitude` decimal(20,17) NOT NULL,
`longitude` decimal(20,17) NOT NULL,
`point` point default NULL,
`fclass` varchar(1) NOT NULL,
`fcode` varchar(7) NOT NULL,
`country_id` varchar(2) NOT NULL,
`cc2` varchar(60) NOT NULL,
`admin1_id` int(11) default NULL,
`admin2_id` int(11) default NULL,
`admin3_id` int(11) default NULL,
`admin4_id` int(11) default NULL,
`population` int(11) NOT NULL,
`elevation` int(11) NOT NULL,
`gtopo30` int(11) NOT NULL,
`timezone_id` int(11) default NULL,
`moddate` date NOT NULL,
PRIMARY KEY (`id`),
KEY `country_id_refs_iso_alpha2_e2614807` (`country_id`),
KEY `admin1_id_refs_id_a28cd057` (`admin1_id`),
KEY `admin2_id_refs_id_4f9a0f7e` (`admin2_id`),
KEY `admin3_id_refs_id_f8a5e181` (`admin3_id`),
KEY `admin4_id_refs_id_9cc00ec8` (`admin4_id`),
KEY `fcode_refs_code_977fe2ec` (`fcode`),
KEY `timezone_id_refs_id_5b46c585` (`timezone_id`),
KEY `geoname_52094d6e` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
SET character_set_client = @saved_cs_client;
Now, if I fetch data from the table directly using MySQLdb and a cursor, I get the text in the correct encoding:
>>> import MySQLdb
>>> from django.conf import settings
>>>
>>> conn = MySQLdb.connect (host = "localhost",
... user = settings.DATABASES['default']['USER'],
... passwd = settings.DATABASES['default']['PASSWORD'],
... db = settings.DATABASES['default']['NAME'])
>>> cursor = conn.cursor ()
>>> cursor.execute("select name from geoname where name like 'Uni%Hidalgo'");
1L
>>> g = cursor.fetchone()
>>> g[0]
'Uni\xc3\xb3n Hidalgo'
>>> print g[0]
Unión Hidalgo
But if I try to use the Geoname model (which is actually a django.contrib.gis.db.models.Model), it fails:
>>> from geonames.models import Geoname
>>> g = Geoname.objects.get(name__istartswith='Uni',name__icontains='Hidalgo')
>>> g.name
u'Uni\xc3\xb3n Hidalgo'
>>> print g.name
UniÃ³n Hidalgo
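Clearly there is an encoding error here. In both cases the database returns 'Uni\xc3\xb3n Hidalgo', but Django is (incorrectly?) turning '\xc3\xb3n' into 'Ã³n'.
What can I do to fix this?
Update
OK, this is weird: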
>>> c = unicode('Uni\xc3\xb3n Hidalgo','utf-8')
>>> c
u'Uni\xf3n Hidalgo'
>>> print c
Unión Hidalgo
If I force Python to decode the string from UTF-8 into unicode, it works. However, this reproduces the error:
>>> c = unicode('Unión Hidalgo','latin1')
>>> c
u'Uni\xc3\xb3n Hidalgo'
>>> print c
UniÃ³n Hidalgo
So, I guess MySQL is sending UTF-8, but telling Python it is latin1?
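To make that guess concrete, here is a minimal sketch of the suspected mix-up and of how it could be reversed. It is written for Python 3 (the sessions above are Python 2), and `fix_mojibake` is a hypothetical helper name of my own, not something provided by Django or MySQLdb. It only illustrates the "UTF-8 bytes read as latin1" hypothesis; it does not show where in the stack the wrong decoding actually happens.

```python
# Python 3 sketch; the interactive sessions in this post are Python 2.

# The bytes the raw MySQLdb cursor returned for the name column.
RAW = b"Uni\xc3\xb3n Hidalgo"


def fix_mojibake(text):
    """Reverse a 'UTF-8 bytes decoded as latin1' mistake.

    Hypothetical helper for illustration only. If the text cannot be
    round-tripped this way, it is returned unchanged.
    """
    try:
        # latin1 maps code points 0-255 back to the same byte values,
        # so this recovers the original bytes before decoding them properly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


correct = RAW.decode("utf-8")    # the intended text, u'Uni\xf3n Hidalgo'
garbled = RAW.decode("latin-1")  # same value the Geoname model returned

print(ascii(garbled))                     # 'Uni\xc3\xb3n Hidalgo'
print(fix_mojibake(garbled) == correct)   # True
```

Note that this kind of round trip is only a diagnostic: a string that was decoded correctly but happens to survive the latin1/UTF-8 round trip would be altered too, so the real fix presumably belongs in the connection or table character set settings rather than in post-processing.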