Python emoji encoding reference: treating Python 3 emoji characters as Unicode

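This post discusses how to handle strings containing emoji in Python 3, with the goal of converting each emoji to its Unicode escape form. It presents code that converts any character occupying two UTF-16 code units into the corresponding pair of surrogate characters. The process turned out to be more involved than expected, but the resulting string has the expected length and content.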

I have a string in Python 3 that has emojis in it, and I want to treat the emojis as their Unicode representation. I need to do some manipulation on the emoji in this format.

s = '😬 😎 hello'

This treats each emoji as its own character, such that len(s) == 9 and s[0] == '😬'.
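As a minimal illustration of that behaviour (the variable names here are only for the example): Python 3 indexes a str by code point, so each emoji counts once, while the same text takes 11 code units when encoded as UTF-16.

```python
s = '\U0001f62c \U0001f60e hello'  # the same text as '😬 😎 hello'

# len() counts code points: 2 emoji + 2 spaces + 5 letters.
length = len(s)

# Each emoji is a single code point above U+FFFF.
first_code_point = hex(ord(s[0]))

# UTF-16 needs two 16-bit code units per emoji, so the count grows by 2.
utf16_units = len(s.encode('utf-16-be')) // 2

print(length, first_code_point, utf16_units)
```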

I want to change the format of the string so that it is in Unicode points, such that

s = '😬 😎 hello'

u = to_unicode(s) # Some function to change the format.

print(u) # '\ud83d\ude2c \ud83d\ude0e hello'

u[0] == '\ud83d' and u[1] == '\ude2c'

len(u) == 11

Any thoughts on creating a function to_unicode that will take s and change it into u? I could be thinking about how strings and Unicode work in Python 3 wrong, so any help or corrections would be greatly appreciated.

Solution

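Here's some code that will take any character that maps into two UTF-16 words and convert it to the corresponding pair of surrogate characters. Characters that fit in a single UTF-16 word pass through unchanged.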

s = '\U0001f62c \U0001f60e hello'

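def pairup(b):
    # Combine each pair of bytes into one 16-bit big-endian code unit.
    return [(b[i] << 8 | b[i+1]) for i in range(0, len(b), 2)]

def utf16(c):
    # Encode one character as UTF-16 (big-endian, no BOM), then turn each
    # 16-bit code unit into its own character.
    e = c.encode('utf_16_be')
    return ''.join(chr(x) for x in pairup(e))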

u = ''.join(utf16(c) for c in s)

print(repr(u))

print(u[0] == '\ud83d' and u[1] == '\ude2c')

print(len(u))

Output:

'\ud83d\ude2c \ud83d\ude0e hello'

True

11
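For reference, the two UTF-16 words for a character above U+FFFF can also be computed directly from its code point with the standard surrogate arithmetic, without going through encode(). This is only a sketch; surrogate_pair is a hypothetical helper name, not part of the answer's code.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError('code point is not in a supplementary plane')
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = surrogate_pair(ord('\U0001f62c'))
print(hex(high), hex(low))  # 0xd83d 0xde2c
```

This agrees with the '\ud83d\ude2c' pair printed above for the first emoji.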

I thought this was going to be a no-brainer, but it turned out to be trickier than I expected. Especially since I didn't understand the problem properly the first time through.
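Once the manipulation is done, the surrogate form probably needs converting back, since a string of lone surrogates cannot be encoded as UTF-8 (which is why the code above prints repr(u), not u itself). One way to do that, sketched here with a hypothetical from_surrogates helper, is the 'surrogatepass' error handler:

```python
def from_surrogates(u):
    """Merge surrogate pairs in u back into single code points.

    Hypothetical helper for illustration only. 'surrogatepass' lets the
    surrogate characters through the encoder as raw 16-bit units; decoding
    the bytes as UTF-16 then recombines each pair.
    """
    return u.encode('utf-16-be', 'surrogatepass').decode('utf-16-be')

u = '\ud83d\ude2c \ud83d\ude0e hello'
s = from_surrogates(u)
print(len(u), len(s))  # 11 9
```

Here s compares equal to the original '\U0001f62c \U0001f60e hello'.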
