I need to get the octal escape sequence for UTF-8 characters in Python and was wondering whether there's any simpler way of doing what I want to do, e.g. something in the standard library that I overlooked. I have a makeshift string manipulation function but I'm hoping there is a better solution.
I want to get from (e.g.): 𐅥
To: \360\220\205\245
Right now I'm doing this:
char = '\U00010165' # this is how Python hands it over to me
char = str(char.encode())
# char = "b'\xf0\x90\x85\xa5'"
arr = char[4:-1].split("\\x")
# arr = ['f0', '90', '85', 'a5']
char = ''
for i in arr:
    char += '\\' + str(oct(int(i, 16)))
# char = \0o360\0o220\0o205\0o245
char = char.replace("0o", "")
Any suggestions?
Solution
Use format(i, '03o') to produce a zero-padded, three-digit octal number without the leading 0o prefix, or use str.format() to include the literal backslash as well:
>>> format(16, '03o')
'020'
>>> '\\{:03o}'.format(16)
'\\020'
Then just loop over the encoded bytes value; in Python 3, iterating over a bytes object yields each byte as an integer:
char = ''.join(['\\{:03o}'.format(c) for c in char.encode('utf8')])
Demo:
>>> char = '\U00010165'
>>> ''.join(['\\{:03o}'.format(c) for c in char.encode('utf8')])
'\\360\\220\\205\\245'
>>> print(''.join(['\\{:03o}'.format(c) for c in char.encode('utf8')]))
\360\220\205\245
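If you need this in more than one place, the one-liner can be wrapped in a small helper. This is only a sketch: the name to_octal_escapes and its encoding parameter are made up for illustration and are not part of the standard library. It assumes Python 3, where iterating over bytes yields integers.

```python
def to_octal_escapes(text, encoding="utf-8"):
    """Return every encoded byte of `text` as a backslash plus three octal digits.

    Hypothetical helper (name and signature are assumptions, not a stdlib API).
    """
    # Iterating over a bytes object in Python 3 yields ints in range(256),
    # so each one can be formatted directly with the '03o' format spec.
    return "".join("\\{:03o}".format(byte) for byte in text.encode(encoding))


print(to_octal_escapes("\U00010165"))  # \360\220\205\245
print(to_octal_escapes("A"))           # \101
```

Because every byte is escaped, plain ASCII characters come out as octal escapes too; filter on the byte value first if you only want non-ASCII bytes escaped.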