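Emoji are simply Unicode characters; most of them lie outside the Basic Multilingual Plane and take four bytes in UTF-8.
So we can recognize emoji by their Unicode code points: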
<U+1F300> - <U+1F5FF> # symbols & pictographs
<U+1F600> - <U+1F64F> # emoticons
<U+1F680> - <U+1F6FF> # transport & map symbols
<U+2600> - <U+2B55> # other
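As a rough illustration of the ranges above, a single character can be checked against them by its code point. This is only a sketch: the names EMOJI_RANGES and is_emoji are made up for this example, and the ranges are just the four listed here, not a complete emoji definition.

```python
# Code point ranges copied from the list above (not a complete emoji set).
EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # symbols & pictographs
    (0x1F600, 0x1F64F),  # emoticons
    (0x1F680, 0x1F6FF),  # transport & map symbols
    (0x2600, 0x2B55),    # other
]

def is_emoji(ch):
    """Return True if the single character ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)

print(is_emoji('\U0001F604'))  # U+1F604 is in the emoticons range
print(is_emoji('t'))           # an ordinary letter is not
```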
The goal is to match the text that sits between two emoji.
For example:
😄testtest😄
Code:
import re

readline = ['😄testtest😄']
# Character class covering the emoji ranges listed above
emoji = '[\U0001F300-\U0001F64F\U0001F680-\U0001F6FF\u2600-\u2B55]'
# Lazily capture the text between two emoji
pat = re.compile(emoji + '(.*?)' + emoji)
for line in readline:
    print(pat.findall(line))
Result:
['testtest']
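When a line contains more than two emoji, the pattern above consumes the closing emoji, so that emoji cannot also open the next segment. One possible variation, which is not part of the original code, checks the closing emoji with a lookahead instead. The names EMOJI and pat_shared and the sample line are made up for this sketch.

```python
import re

# Same character class as in the code above.
EMOJI = '[\U0001F300-\U0001F64F\U0001F680-\U0001F6FF\u2600-\u2B55]'
# The closing emoji is only checked by the lookahead, not consumed,
# so it can also serve as the opening emoji of the next match.
pat_shared = re.compile(EMOJI + '(.*?)(?=' + EMOJI + ')')

# Hypothetical sample: four emoji separating three pieces of text.
line = '\U0001F604foo\U0001F604bar\U0001F680baz\u2600'
print(pat_shared.findall(line))  # ['foo', 'bar', 'baz']
```

With the consuming pattern, the same line would yield only 'foo' and 'baz', because the second emoji is used up as the end of the first match.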