Parsing Emoji in Strings with Python
I recently came across a package that can parse the emoji in strings in bulk, and wanted to share it.
1. A quick tutorial: parsing emoji with advertools
1.1 Installation
pip install advertools
1.2 Documentation link
1.3 Emoji search
advertools can search for emoji related to a keyword:
>>> import advertools as adv
>>> adv.emoji_search('dog')
codepoint status emoji name group sub_group
0 1F436 fully-qualified 🐶 dog face Animals & Nature animal-mammal
1 1F415 fully-qualified 🐕 dog Animals & Nature animal-mammal
2 1F9AE fully-qualified 🦮 guide dog Animals & Nature animal-mammal
3 1F415 200D 1F9BA fully-qualified 🐕🦺 service dog Animals & Nature animal-mammal
4 1F32D fully-qualified 🌭 hot dog Food & Drink food-prepared
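To see roughly what a name-based emoji search does, here is a minimal standard-library sketch. It is not how advertools works: it assumes that scanning single code points in U+1F300 to U+1FAFF and matching the keyword against the official Unicode character name (via `unicodedata.name`) is a good enough approximation. Unicode names can differ from the CLDR short names advertools shows, and multi-code-point emoji such as "service dog" (a ZWJ sequence) have no single name, so they will not be found. The function `search_emoji_by_name` is a hypothetical helper for illustration.

```python
import re
import unicodedata


def search_emoji_by_name(keyword, start=0x1F300, end=0x1FAFF):
    """Hypothetical helper: return (codepoint, char, name) tuples for single
    code points in [start, end] whose Unicode name contains keyword as a word."""
    # Unicode names are upper case, e.g. "DOG FACE"; match whole words only
    pattern = re.compile(r"\b" + re.escape(keyword.upper()) + r"\b")
    results = []
    for cp in range(start, end + 1):
        char = chr(cp)
        name = unicodedata.name(char, "")  # "" for unassigned code points
        if name and pattern.search(name):
            results.append((f"{cp:X}", char, name.lower()))
    return results


for row in search_emoji_by_name("dog"):
    print(row)
```

Results come back in code point order rather than advertools' ordering, and which emoji appear depends on the Unicode version bundled with your Python.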
1.4 Emoji extraction and analysis
>>> import advertools as adv
# Sample posts
>>> posts = ['I am grinning 😀','A grinning cat 😺',
...          'hello! 😀😀😀 💛💛', 'Just text']
# extract_emoji returns a dict with information about the emoji found in the list of strings
>>> emoji_summary = adv.extract_emoji(posts)
>>> emoji_summary.keys()
dict_keys(['emoji', 'emoji_text', 'emoji_flat', 'emoji_flat_text',
'emoji_counts', 'emoji_freq', 'top_emoji', 'top_emoji_text',
'top_emoji_groups', 'top_emoji_sub_groups', 'overview'])
# emoji: the list of emoji found in each string
>>> emoji_summary["emoji"]
[['😀'], ['😺'], ['😀', '😀', '😀', '💛', '💛'], []]
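For intuition about what the `emoji` key collects, here is a rough stdlib sketch. It is not advertools' implementation: the regex is an assumption that covers only a few common emoji blocks (U+1F300 to U+1FAFF and U+2600 to U+27BF), so it will split or miss multi-code-point emoji such as ZWJ sequences, flags, and keycaps, which advertools handles using its full emoji list. `extract_emoji_lists` is a hypothetical helper.

```python
import re

# Assumption: a simplified pattern for single-code-point emoji in a few blocks
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def extract_emoji_lists(texts):
    """Hypothetical helper: one list of matched emoji per input string."""
    return [EMOJI_RE.findall(text) for text in texts]


posts = ['I am grinning 😀', 'A grinning cat 😺', 'hello! 😀😀😀 💛💛', 'Just text']
print(extract_emoji_lists(posts))
```

On these four sample posts it yields the same lists as `emoji_summary["emoji"]`.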
# emoji_text: the same lists, with each emoji given by its text name
>>> emoji_summary["emoji_text"]
[['grinning face'], ['grinning cat'], ['grinning face', 'grinning face', 'grinning face', 'yellow heart', 'yellow heart'], []]
# emoji_flat: a single flat list of all emoji from all strings
>>> emoji_summary['emoji_flat']
['😀', '😺', '😀', '😀', '😀', '💛', '💛']
# Same as above, but as text names
>>> emoji_summary['emoji_flat_text']
['grinning face', 'grinning cat', 'grinning face', 'grinning face', 'grinning face', 'yellow heart', 'yellow heart']
# emoji_counts: the number of emoji in each string
>>> emoji_summary['emoji_counts']
[1, 1, 5, 0]
# emoji_freq: (number of emoji, number of strings) pairs, i.e. one string contains no emoji, two strings contain one emoji, ...
>>> emoji_summary['emoji_freq']
[(0, 1), (1, 2), (5, 1)]
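The two keys above are related: `emoji_freq` is a histogram of `emoji_counts`. A minimal sketch of that relationship with `collections.Counter`, starting from the per-post lists shown for `emoji_summary["emoji"]` (this illustrates the relationship, not advertools' source code):

```python
from collections import Counter

# Per-post emoji lists, copied from emoji_summary["emoji"] above
emoji_lists = [['😀'], ['😺'], ['😀', '😀', '😀', '💛', '💛'], []]

# Number of emoji in each post
emoji_counts = [len(lst) for lst in emoji_lists]
# How many posts have each emoji count, sorted by the count
emoji_freq = sorted(Counter(emoji_counts).items())

print(emoji_counts)
print(emoji_freq)
```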
# top_emoji: each emoji that appears and its count, in descending order
>>> emoji_summary['top_emoji']
[('😀', 4), ('💛', 2), ('😺', 1)]
# Same as above, but as text names
>>> emoji_summary['top_emoji_text']
[('grinning face', 4), ('yellow heart', 2), ('grinning cat', 1)]
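The `top_emoji` ranking can be reproduced by flattening the per-post lists and counting with `Counter.most_common`. This is a sketch of the idea only; how advertools breaks ties between equally frequent emoji is not something this example establishes (`most_common` keeps first-seen order for ties).

```python
from collections import Counter

# Per-post emoji lists, copied from emoji_summary["emoji"] above
emoji_lists = [['😀'], ['😺'], ['😀', '😀', '😀', '💛', '💛'], []]

# Equivalent of emoji_flat: all emoji from all posts in one list
emoji_flat = [e for lst in emoji_lists for e in lst]
# (emoji, count) pairs sorted by count, highest first
top_emoji = Counter(emoji_flat).most_common()

print(top_emoji)
```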
# top_emoji_groups: each emoji group that appears and its count, in descending order
>>> emoji_summary['top_emoji_groups']
[('Smileys & Emotion', 7)]
# top_emoji_sub_groups: each emoji sub-group that appears and its count, in descending order
>>> emoji_summary['top_emoji_sub_groups']
[('face-smiling', 4), ('emotion', 2), ('cat-face', 1)]
# overview: overall summary statistics
>>> emoji_summary['overview']
{'num_posts': 4, 'num_emoji': 7, 'emoji_per_post': 1.75, 'unique_emoji': 3}
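The `overview` numbers follow directly from the per-post lists: 7 emoji over 4 posts gives 7 / 4 = 1.75 emoji per post, and there are 3 distinct emoji. A small sketch of that arithmetic (an illustration, not advertools' code):

```python
# Per-post emoji lists, copied from emoji_summary["emoji"] above
emoji_lists = [['😀'], ['😺'], ['😀', '😀', '😀', '💛', '💛'], []]
emoji_flat = [e for lst in emoji_lists for e in lst]

overview = {
    'num_posts': len(emoji_lists),                       # 4 posts
    'num_emoji': len(emoji_flat),                        # 7 emoji in total
    'emoji_per_post': len(emoji_flat) / len(emoji_lists),  # 7 / 4
    'unique_emoji': len(set(emoji_flat)),                # distinct emoji
}
print(overview)
```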
2. Using these functions to process and derive features from text containing emoji
2.1 Create a sample DataFrame
>>> import pandas as pd
>>> import advertools as adv
>>> posts = ['I am grinning 😀','A grinning cat 😺',
...          'hello! 😀😀😀 💛💛', 'Just text']
>>> df = pd.DataFrame({"texts":posts})
>>> df
texts
0 I am grinning 😀
1 A grinning cat 😺
2 hello! 😀😀😀 💛💛
3 Just text
2.2 Create the features have_emoji and emoji_count
>>> emoji_summary = adv.extract_emoji(df["texts"].apply(str).tolist())
>>> df["emoji_count"] = emoji_summary["emoji_counts"]
>>> df["have_emoji"] = df["emoji_count"] != 0
>>> df
texts emoji_count have_emoji
0 I am grinning 😀 1 True
1 A grinning cat 😺 1 True
2 hello! 😀😀😀 💛💛 5 True
3 Just text 0 False
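2.3 Create the feature emoji_counts_with_freq
For each string, emoji_counts_with_freq is the sum, over the emoji it contains, of that emoji's count in the string multiplied by its total frequency across all posts. For example, "hello! 😀😀😀 💛💛" gives 3 × 4 + 2 × 2 = 16.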
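>>> # Total frequency of each emoji across all posts, keyed by text name
>>> emoji_dct = dict(emoji_summary["top_emoji_text"])
>>> def get_freq(x):
...     # x is one post's list of emoji names; add the total frequency once per occurrence
...     count = 0
...     for i in x:
...         count += emoji_dct[i]
...     return count
...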
>>> df["emoji_text"] = emoji_summary["emoji_text"]
>>> df["emoji_counts_with_freq"] = df["emoji_text"].apply(get_freq)
>>> df.drop(columns=["emoji_text"])
texts emoji_count have_emoji emoji_counts_with_freq
0 I am grinning 😀 1 True 4
1 A grinning cat 😺 1 True 1
2 hello! 😀😀😀 💛💛 5 True 16
3 Just text 0 False 0
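The same feature can be checked without pandas. This sketch groups each post's emoji with `Counter` first, so it computes the "count in post × total frequency" form of the definition directly; it should agree with the per-occurrence loop in `get_freq`. The helper name `counts_with_freq` is hypothetical.

```python
from collections import Counter

# Per-post emoji names, copied from emoji_summary["emoji_text"] above
emoji_text = [
    ['grinning face'],
    ['grinning cat'],
    ['grinning face'] * 3 + ['yellow heart'] * 2,
    [],
]
# Total frequency of each emoji across all posts (same as dict(top_emoji_text))
total_freq = Counter(name for post in emoji_text for name in post)


def counts_with_freq(post):
    """Hypothetical helper: sum of (count in this post) * (total frequency)."""
    return sum(n * total_freq[name] for name, n in Counter(post).items())


result = [counts_with_freq(p) for p in emoji_text]
print(result)
```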
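2.4 Create the feature texts_with_emoji
texts_with_emoji appends the text names of a string's emoji to the original string.
>>> def get_emoji_text(x):
...     # x is one row; append each emoji's text name to the original text
...     text = x["texts"]
...     emojis = x["emoji_text"]
...     for i in emojis:
...         text += " "
...         text += i
...     return text
...
>>> df["texts_with_emoji"] = df.apply(get_emoji_text, axis=1)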
>>> df.drop(columns=["emoji_text"],inplace=True)
>>> df
texts emoji_count have_emoji emoji_counts_with_freq texts_with_emoji
0 I am grinning 😀 1 True 4 I am grinning 😀 grinning face
1 A grinning cat 😺 1 True 1 A grinning cat 😺 grinning cat
2 hello! 😀😀😀 💛💛 5 True 16 hello! 😀😀😀 💛💛 grinning face grinning face grin...
3 Just text 0 False 0 Just text
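The row-wise loop in `get_emoji_text` can also be written with `str.join`. Below is a plain-list sketch; with pandas, the resulting list could be assigned back to a column, e.g. `df["texts_with_emoji"] = texts_with_emoji`. Posts without emoji are left unchanged because joining a one-element list adds no separator.

```python
posts = ['I am grinning 😀', 'A grinning cat 😺', 'hello! 😀😀😀 💛💛', 'Just text']
# Per-post emoji names, copied from emoji_summary["emoji_text"] above
emoji_text = [
    ['grinning face'],
    ['grinning cat'],
    ['grinning face'] * 3 + ['yellow heart'] * 2,
    [],
]

# Original text followed by each emoji's name, separated by single spaces
texts_with_emoji = [" ".join([text, *names]) for text, names in zip(posts, emoji_text)]
print(texts_with_emoji)
```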