mysql数据排序python,mysql-python排序规则问题:如何强制unicode数据类型?

在更改数据库字段的字符集后,发现MySQL连接返回的数据类型发生了变化,导致Python接收到的是二进制字符串而非Unicode。通过创建自定义类型转换字典并传递给`connect()`方法,可以强制MySQLdb返回Unicode对象。具体操作包括移除FLAG.BINARY的特殊转换并为所有情况添加解码器。此外,也可以通过设置`use_unicode=0`并自行处理编码,但不推荐。
摘要由CSDN通过智能技术生成

For certain purposes I had to change field collations from utf8_unicode_ci to utf8_bin in a database. It turned out that the change lead to changes in datatypes that come to python.

The question is how to force mysql-python to return unicode objects to python.

Here is a sample that shows the problem (explicit charset forces use_unicode=1):

>>> con = MySQLdb.connect(..., charset='utf8')

>>> c = c.cursor()

>>> c.execute('SELECT %s COLLATE utf8_bin', u'м')

1L

>>> c.fetchone()

('\xd0\xbc',)

>>> c.description

(("'\xd0\xbc' COLLATE utf8_bin", 253, 2, 3, 3, 31, 0),)

>>> c.execute('SELECT %s COLLATE utf8_unicode_ci', u'м')

1L

>>> c.fetchone()

(u'\u043c',)

>>> c.description

(("'\xd0\xbc' COLLATE utf8_unicode_ci", 253, 2, 3, 3, 31, 0),)

In my database the fields are of type VARCHAR, but after the change they behave like BINARY which is not what I want.

解决方案

It turns out, that the problem is rather awkward. In short, most variaties and species in MySQL string datatypes map to a single datatype in MySQL's interface with an additional BINARY flag.

Thus, MySQL's VARCHAR, VARBINARY, and a string literal map to the same MySQLdb.constants.FIELD_TYPE.VAR_STRING type in column type definitions, but having an additional MySQLdb.constants.FLAG.BINARY flag when the type is VARBINARY or a string collated with a *_bin collation.

Even though there is a MySQLdb.constants.FIELD_TYPE.VARCHAR type, I failed to find out when it is used. As I said, MySQL VARCHAR columns maps to FIELD_TYPE.VAR_STRING.

The solution becomes rather fragile, if your application uses true binary strings (for example, you store images and fetch them with the same connection as text), since it assumes decoding all binary strings to unicode. Though, it works.

As official docs states:

Because MySQL returns all data as strings and expects you to convert it yourself. This would be a real pain in the ass, but in fact, _mysql can do this for you. (And MySQLdb does do this for you.) To have automatic type conversion done, you need to create a type converter dictionary, and pass this to connect() as the conv keyword parameter.

In practice, real pain in the ass might be the process of constructing your own converters dictionary. But you can import the default one from MySQLdb.converters.conversions and patch it, or even patch it on an instance of the Connection. The trick is to remove a special converter for a FLAG.BINARY flag and add a decoder for all cases. If you explicitly specify a charset parameter for MySQLdb.connect, it forces use_unicode=1 parameter, which adds the decoder for you, but you can do it yourself:

>>> con = MySQLdb.connect(**params)

>>> con.converter[FIELD_TYPE.VAR_STRING]

[(128, ), (None, )]

>>> con.converter[FIELD_TYPE.VAR_STRING] = [(None, con.string_decoder)]

>>> c = con.cursor()

>>> c.execute("SELECT %s COLLATE utf8_bin", u'м')

1L

>>> c.fetchone()

(u'\u043c',)

You might probably need to make the same hack for FIELD_TYPE.STRING if required.

Another solution is to pass explicit use_unicode=0 to MySQLdb.connect and make all decodings in your code, but I would not.

Hope, this might be useful to someone.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值