Does Python's encode produce bytecode? Using a variable to convert a byte sequence to UTF-8 in Python

I have the following problem:

From a SQL Server database I am reading data using the Python module pypyodbc and ODBC Driver 13 for SQL Server, and writing it to txt files.

The database contains all kinds of special characters, and they are read as:

'PR\xc3\x86KVAL'

The '\xc3\x86' part is a sequence of UTF-8 bytes and should be interpreted as such. The other characters should be interpreted as shown. Decoded as UTF-8, '\xc3\x86' becomes Æ.

If I type the value as b'PR\xc3\x86KVAL', Python recognizes it as a bytes object and I can decode it to PRÆKVAL. See below:

s = b'PR\xc3\x86KVAL'

print(s)

bb = s.decode('utf-8')

print(bb)

The problem is that I don't know how to get 'PR\xc3\x86KVAL' recognized as a bytes object.

I want the value that has to be decoded to be a variable, so that all data from the database can flow through it.

I also tried ast.literal_eval(r"b'PR\xc3\x86KVAL'"), but variables won't work this way.

Solution

Since you start out with PR\xc3\x86KVAL as a text string and decode indeed expects a raw byte sequence, you need to convert the text string into a bytes object. But when converting from one "encoding" value to another, Python needs to know what encoding it is starting with!

The easiest way to do so is to encode the string explicitly, using an encoding that does not change the special characters. You must be careful, because it is entirely possible for a character code to be translated to something else, destroying its meaning.

You can see that with a simple example: attempting to tell Python this should be plain ASCII fails, for an obvious reason.

>>> s = 'PR\xc3\x86KVAL'.encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128)

Even though there are more than 1,000 questions on Stack Overflow about this, the reason for the failure should be easy to understand. All an encoder/decoder pair does is translate each character from 'source' to 'destination'. This can only work if the character in question actually exists in both the 'source' and 'destination' encodings. Suppose you want to translate a Greek character β to a Russian б, then the source must be able to decode the Greek character (because that is what you entered it in) and the destination must be able to encode the Russian character.
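The "must exist in the destination encoding" rule can be checked directly. The following is a minimal sketch; the helper name `can_encode` and the choice of `iso8859_7` (a Greek code page) and `cp866` (a Cyrillic one) are assumptions made for illustration only.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode('β', 'iso8859_7'))  # Greek letter, Greek code page
print(can_encode('б', 'iso8859_7'))  # Cyrillic letter, Greek code page
print(can_encode('б', 'cp866'))      # Cyrillic letter, Cyrillic code page
```

Only the first and last calls succeed, because each letter exists only in the code page for its own alphabet.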

So you must be careful to choose an encoding that maps the character \x86 in your input string to the byte 0x86 and not to anything else (in cp866, for example, the byte 0x86 stands for Ж).
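To see how the same byte takes on a different meaning depending on the code page, here is a small sketch. The variable names are illustrative, not from the original answer.

```python
raw = b'\x86'

# The same single byte, read through two different code pages.
as_cp866 = raw.decode('cp866')
as_latin1 = raw.decode('latin1')

print(as_cp866)          # Ж
print(ascii(as_latin1))  # '\x86'

# latin1 gives back exactly the byte we started with.
print(as_latin1.encode('latin1') == raw)  # True
```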

Fortunately, as quoted from https://stackoverflow.com/a/2617930/2564301, there is an encoding that does not mess up things:

Pass data.decode('latin1') to the codec. latin1 maps bytes 0-255 to Unicode characters 0-255, which is kinda elegant.
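The one-to-one mapping described in that quote can be verified for all 256 byte values. This is only a sanity check written for this discussion, not code from the linked answer.

```python
all_bytes = bytes(range(256))
text = all_bytes.decode('latin1')

# Every byte value n becomes the Unicode code point n ...
maps_one_to_one = [ord(ch) for ch in text] == list(range(256))
# ... and encoding again restores the original bytes unchanged.
round_trip_ok = text.encode('latin1') == all_bytes

print(maps_one_to_one, round_trip_ok)  # True True
```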

and so this should work:

>>> s = 'PR\xc3\x86KVAL'.encode('latin1')

>>> print(s)

b'PR\xc3\x86KVAL'

Now s is a proper bytes object, so you can decode it at will:

>>> bb = s.decode('utf-8')

>>> print(bb)

PRÆKVAL

Done!
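Since the question asks for this to work on a variable, the two steps can be wrapped in a function. The sketch below makes some assumptions: the name `repair_utf8` is hypothetical, and the fallback of returning the value unchanged when it cannot be re-encoded or decoded is a design choice made here, not part of the original answer.

```python
def repair_utf8(text):
    """Reinterpret a str whose characters are really UTF-8 bytes.

    Values that cannot be round-tripped (characters outside latin1, or
    bytes that are not valid UTF-8) are returned unchanged.
    """
    try:
        return text.encode('latin1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Hypothetical values standing in for rows read from the database.
rows = ['PR\xc3\x86KVAL', 'PLAIN', 'PRÆKVAL']
for value in rows:
    print(repair_utf8(value))
```

Plain ASCII values pass through untouched, and a value that is already correct ('PRÆKVAL') is left alone because its latin1 bytes are not valid UTF-8.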
