Here is my serialized string, which contains Chinese characters: 'a:2:{s:3:"key";s:0:"";s:8:"solution";a:1:{i:0;a:1:{i:0;a:3:{s:4:"text";s:6:"**你好";s:3:"fig";N;s:5:"score";i:0;}}}}'
Here is my Python script:
^{pr2}$
The error is as follows:

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    print phpserialize.loads(phpstring,decode_strings=True)
  File "/usr/lib/python2.7/dist-packages/phpserialize.py", line 522, in loads
    object_hook, array_hook)
  File "/usr/lib/python2.7/dist-packages/phpserialize.py", line 512, in load
    return _unserialize()
  File "/usr/lib/python2.7/dist-packages/phpserialize.py", line 497, in _unserialize
    return array_hook(_load_array())
  File "/usr/lib/python2.7/dist-packages/phpserialize.py", line 463, in _load_array
    item = _unserialize()
  File "/usr/lib/python2.7/dist-packages/phpserialize.py", line 490, in _unserialize
    _expect(b'"')
  File "/usr/lib/python2.7/dist-packages/phpserialize.py", line 444, in _expect
    raise ValueError('failed expectation, expected %r got %r' % (e, v))
ValueError: failed expectation, expected '"' got 'o'
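I have found the cause.
In PHP, the Chinese characters are encoded and serialized as UTF-8, and the text value is recorded as 6 characters long: 'a:2:{s:3:"key";s:0:"";s:8:"solution";a:1:{i:0;a:1:{i:0;a:3:{s:4:"text";s:6:"**\u4f60\u597d";s:3:"fig";N;s:5:"score";i:0;}}}}'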
But once the string reaches Python, the same text value is 8 characters long (8 bytes, because 你 and 好 each take 3 bytes in UTF-8): 'a:2:{s:3:"key";s:0:"";s:8:"solution";a:1:{i:0;a:1:{i:0;a:3:{s:4:"text";s:6:"**\xe4\xbd\xa0\xe5\xa5\xbd";s:3:"fig";N;s:5:"score";i:0;}}}}'
So when I change the length in the string from 6 to 8, it loads correctly in Python.
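PHP's `s:N` prefix counts bytes, not characters, which is why the two lengths above disagree. The quick check below is written for Python 3 (the post itself runs Python 2.7, where the value is already a byte `str`). It confirms that the escapes in both strings are the same two characters and compares the length of `**你好` under different counts. The GBK line is an assumption added only for comparison: 6 is exactly the byte length in a two-byte encoding such as GBK, which may hint that the data was serialized in such an encoding and converted to UTF-8 later, but the post does not confirm this.

```python
text = "**你好"

# The \u escapes and the \x escapes in the two strings are the same two characters.
assert "\u4f60\u597d" == "你好"
assert "你好".encode("utf-8") == b"\xe4\xbd\xa0\xe5\xa5\xbd"

# Python 3 str length counts code points; PHP's s:N counts bytes.
print(len(text))                   # 4 code points
print(len(text.encode("utf-8")))   # 8 bytes: 2 ASCII + 3 bytes per CJK character
print(len(text.encode("gbk")))     # 6 bytes: 2 ASCII + 2 bytes per CJK character (assumed comparison)
```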
However, my database holds millions of serialized strings waiting to be processed in Python.
How can I load these Unicode strings correctly in Python?
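Since editing the length from 6 to 8 by hand fixes the parse, one possible workaround is to recompute every `s:N` prefix automatically before parsing. The sketch below (Python 3; `PHP_STRING` and `fix_string_lengths` are hypothetical names, not part of `phpserialize`) rewrites each `s:N:"...";` token with the real byte length of its content. It assumes the input is UTF-8 bytes as fetched from the database and that no string value contains the two-byte sequence `";`, because the stored lengths can no longer be trusted to find where a string really ends.

```python
import re

# One PHP string token: s:<len>:"<content>";
# Assumption: no string value contains the sequence '";' itself, since the
# stored <len> is wrong and cannot be used to locate the real end.
PHP_STRING = re.compile(rb's:(\d+):"(.*?)";', re.DOTALL)


def fix_string_lengths(data: bytes) -> bytes:
    """Rewrite every s:N prefix so that N equals the byte length of its content."""
    def repl(match):
        content = match.group(2)
        # PHP's s:N is a byte count, so len() of the raw bytes is the right value.
        return b's:%d:"%s";' % (len(content), content)

    return PHP_STRING.sub(repl, data)


if __name__ == "__main__":
    broken = ('a:2:{s:3:"key";s:0:"";s:8:"solution";a:1:{i:0;a:1:{i:0;a:3:'
              '{s:4:"text";s:6:"**你好";s:3:"fig";N;s:5:"score";i:0;}}}}').encode("utf-8")
    # Prints the same structure with s:6 replaced by s:8 for "**你好".
    print(fix_string_lengths(broken).decode("utf-8"))
```

The repaired bytes can then be passed to `phpserialize.loads(fixed, decode_strings=True)` as in the original script, one row at a time as rows are fetched. The function is idempotent, so rows whose lengths are already correct pass through unchanged; rows with `";` inside a value would need a real token scanner instead of this regex.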