Deleting the last character in Python: how to remove the last UTF-8 character from a Python string

I have a string containing utf-8 encoded text. I need to remove the last utf-8 character.

So far I have tried

msg = msg[:-1]

but this only removes the last byte. It works as long as the last character is an ASCII character, but it breaks when the last character is a multibyte character.

Solution

The simplest way is to decode your UTF-8 bytes to Unicode text:

without_last = msg.decode('utf8')[:-1]

You can always encode the result back to UTF-8 afterwards with .encode('utf8').
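As a minimal sketch of that round trip, assuming Python 3 and that msg is a bytes object holding valid UTF-8 (the helper name drop_last_char is made up for illustration):

```python
def drop_last_char(data: bytes) -> bytes:
    """Remove the last code point from UTF-8 encoded bytes (hypothetical helper)."""
    text = data.decode("utf-8")        # bytes -> str; raises UnicodeDecodeError on invalid input
    return text[:-1].encode("utf-8")   # drop one code point, then str -> bytes


msg = "caf\u00e9".encode("utf-8")      # the final character takes two bytes in UTF-8
print(len(msg), len(drop_last_char(msg)))
```

Note that this removes one code point, not one user-perceived character: a letter followed by a combining accent is two code points, so only the accent would be dropped.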

The alternative is to search backwards for a UTF-8 start byte. A UTF-8 byte sequence always starts with a byte whose most significant bit is 0 (a single-byte ASCII character) or whose two most significant bits are 11 (the lead byte of a multibyte sequence), while continuation bytes always start with the bits 10:

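    # find the starting byte of the last code point
    pos = len(msg) - 1
    # On Python 3, indexing bytes yields an int; on Python 2, use ord(msg[pos]) instead
    while pos > -1 and msg[pos] & 0xC0 == 0x80:
        # byte at pos is a continuation byte (bit 7 set, bit 6 not)
        pos -= 1
    # pos now points at the start byte; slice it off along with everything after it
    msg = msg[:pos]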
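The byte scan above can also be wrapped in a small function. The following is only a sketch, assuming Python 3 and a bytes input; the name strip_last_utf8_char is hypothetical, and the extra guards for empty or malformed input are an addition that the loop above does not have:

```python
def strip_last_utf8_char(data: bytes) -> bytes:
    """Drop the last UTF-8 sequence from data without decoding it (hypothetical helper)."""
    pos = len(data) - 1
    # Continuation bytes have the bit pattern 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
    # Stop at index 0 so malformed input made only of continuation bytes cannot underflow.
    while pos > 0 and data[pos] & 0xC0 == 0x80:
        pos -= 1
    # pos is the index of the start byte, or -1 for empty input; clamp before slicing.
    return data[:max(pos, 0)]


euro = "a\u20ac".encode("utf-8")       # 'a' followed by a three-byte sequence
print(strip_last_utf8_char(euro))
```

Unlike the decode approach, this never validates the input, so it is best suited to data already known to be well-formed UTF-8.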
