Python translate method: transliterating Cyrillic letters with string.translate in Python?

I'm getting a UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-51: ordinal not in range(128) exception when trying to use string.maketrans in Python. I'm rather discouraged by this error in the following code (gist):

# -*- coding: utf-8 -*-

import string

def translit1(string):
    """ This function works just fine """
    capital_letters = {
        u'А': u'A',
        u'Б': u'B',
        u'В': u'V',
        u'Г': u'G',
        u'Д': u'D',
        u'Е': u'E',
        u'Ё': u'E',
        u'Ж': u'Zh',
        u'З': u'Z',
        u'И': u'I',
        u'Й': u'Y',
        u'К': u'K',
        u'Л': u'L',
        u'М': u'M',
        u'Н': u'N',
        u'О': u'O',
        u'П': u'P',
        u'Р': u'R',
        u'С': u'S',
        u'Т': u'T',
        u'У': u'U',
        u'Ф': u'F',
        u'Х': u'H',
        u'Ц': u'Ts',
        u'Ч': u'Ch',
        u'Ш': u'Sh',
        u'Щ': u'Sch',
        u'Ъ': u'',
        u'Ы': u'Y',
        u'Ь': u'',
        u'Э': u'E',
        u'Ю': u'Yu',
        u'Я': u'Ya'
    }
    lower_case_letters = {
        u'а': u'a',
        u'б': u'b',
        u'в': u'v',
        u'г': u'g',
        u'д': u'd',
        u'е': u'e',
        u'ё': u'e',
        u'ж': u'zh',
        u'з': u'z',
        u'и': u'i',
        u'й': u'y',
        u'к': u'k',
        u'л': u'l',
        u'м': u'm',
        u'н': u'n',
        u'о': u'o',
        u'п': u'p',
        u'р': u'r',
        u'с': u's',
        u'т': u't',
        u'у': u'u',
        u'ф': u'f',
        u'х': u'h',
        u'ц': u'ts',
        u'ч': u'ch',
        u'ш': u'sh',
        u'щ': u'sch',
        u'ъ': u'',
        u'ы': u'y',
        u'ь': u'',
        u'э': u'e',
        u'ю': u'yu',
        u'я': u'ya'
    }
    translit_string = ""
    for index, char in enumerate(string):
        if char in lower_case_letters.keys():
            char = lower_case_letters[char]
        elif char in capital_letters.keys():
            char = capital_letters[char]
            if len(string) > index+1:
                if string[index+1] not in lower_case_letters.keys():
                    char = char.upper()
            else:
                char = char.upper()
        translit_string += char
    return translit_string

def translit2(text):
    """ This method should be more easy to grasp,
    but throws exception:
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-51: ordinal not in range(128)
    """
    symbols = string.maketrans(u"абвгдеёзийклмнопрстуфхъыьэАБВГДЕЁЗИЙКЛМНОПРСТУФХЪЫЬЭ",
                               u"abvgdeezijklmnoprstufh'y'eABVGDEEZIJKLMNOPRSTUFH'Y'E")
    sequence = {
        u'ж':'zh',
        u'ц':'ts',
        u'ч':'ch',
        u'ш':'sh',
        u'щ':'sch',
        u'ю':'ju',
        u'я':'ja',
        u'Ж':'Zh',
        u'Ц':'Ts',
        u'Ч':'Ch'
    }
    for char in sequence.keys():
        text = text.replace(char, sequence[char])
    return text.translate(symbols)

if __name__ == "__main__":
    print translit1(u"Привет") # prints Privet as expected
    print translit2(u"Привет") # throws exception: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-51: ordinal not in range(128)

Original trace:

Traceback (most recent call last):
  File "translit_error.py", line 124, in <module>
    print translit2(u"Привет") # throws exception: UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-51: ordinal not in range(128)
  File "translit_error.py", line 103, in translit2
    u"abvgdeezijklmnoprstufh'y'eABVGDEEZIJKLMNOPRSTUFH'Y'E")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-51: ordinal not in range(128)

I mean, why is Python's string.maketrans trying to use the ASCII table at all? And how come English alphabet letters are outside the 0-128 range?

$ python -c "print ord(u'A')"
65
$ python -c "print ord(u'z')"
122
$ python -c "print ord(u\"'\")"
39

After several hours on this issue I am absolutely exhausted.

Can someone explain what is happening and how to fix it?

Solution

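string.maketrans only works on byte strings, so Python 2 first tries to encode your unicode arguments with the default ASCII codec. The error is about the Cyrillic first argument (52 characters, positions 0-51), not about the Latin letters. translate also behaves differently when used with unicode strings: instead of a maketrans table, you have to provide a dictionary that maps ord(search) to ord(replace):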

symbols = (u"абвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ",
           u"abvgdeejzijklmnoprstufhzcss_y_euaABVGDEEJZIJKLMNOPRSTUFHZCSS_Y_EUA")

# map each Cyrillic code point to the code point of its Latin replacement
tr = {ord(a): ord(b) for a, b in zip(*symbols)}

# for Python 2.6 and earlier (no dict comprehensions):
# tr = dict([(ord(a), ord(b)) for (a, b) in zip(*symbols)])

text = u'Добрый Ден'
print text.translate(tr) # looks good
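The table above maps one character to one character, so letters such as ж or щ lose their usual digraphs. The values of a translate dictionary may also be strings (including the empty string), both for Python 3's str.translate and for Python 2's unicode.translate, which allows a single-pass table. The following is only a sketch under some assumptions: it is written for Python 3, it borrows the letter choices from translit1 in the question, and the names SINGLE, MULTI, build_table and translit are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch for Python 3; all names here are hypothetical.

# One-to-one letters, same choices as translit1 in the question.
SINGLE = (
    "абвгдеёзийклмнопрстуфхыэ",
    "abvgdeeziyklmnoprstufhye",
)

# Letters that become several characters, or nothing at all.
MULTI = {
    "ж": "zh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "sch",
    "ю": "yu", "я": "ya",
    "ъ": "", "ь": "",  # hard and soft signs are dropped
}


def build_table():
    """Build an {ordinal: replacement string} table for str.translate."""
    table = {ord(src): dst for src, dst in zip(*SINGLE)}
    table.update((ord(src), dst) for src, dst in MULTI.items())
    # Derive the capital letters from the lower-case entries.
    for code, dst in list(table.items()):
        table[ord(chr(code).upper())] = dst.capitalize()
    return table


TABLE = build_table()


def translit(text):
    """Transliterate Russian Cyrillic text in a single pass."""
    return text.translate(TABLE)


print(translit("Привет"))     # Privet
print(translit("Щука и ёж"))  # Schuka i ezh
```

Unlike translit1, this sketch always capitalizes only the first letter of a digraph, so an all-caps word such as ЩУКА comes out as SchUKA rather than SCHUKA.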

That said, I'd second the suggestion not to reinvent the wheel and to use an established library: http://pypi.python.org/pypi/Unidecode
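As for the positions quoted in the error message, they can be checked directly. The sketch below is an approximation written for Python 3: it encodes the two maketrans arguments from the question to ASCII explicitly, as a stand-in for the implicit conversion Python 2 attempts when a unicode object is passed where a byte string is expected. The helper name ascii_failure_span is made up for illustration.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch for Python 3; ascii_failure_span is a hypothetical helper.

# The two arguments passed to string.maketrans in the question.
cyrillic = "абвгдеёзийклмнопрстуфхъыьэАБВГДЕЁЗИЙКЛМНОПРСТУФХЪЫЬЭ"
latin = "abvgdeezijklmnoprstufh'y'eABVGDEEZIJKLMNOPRSTUFH'Y'E"


def ascii_failure_span(text):
    """Return the inclusive (start, end) span ASCII cannot encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.end is exclusive; the error message prints end - 1.
        return exc.start, exc.end - 1
    return None


print(len(cyrillic))                 # 52
print(ascii_failure_span(cyrillic))  # (0, 51)
print(ascii_failure_span(latin))     # None
```

The span 0-51 covers all 52 characters of the Cyrillic argument, while the Latin argument encodes without trouble, which is why checking ord() of the English letters shows nothing wrong.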
