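Emotion analysis often uses a lexicon-based approach to determine the emotions expressed in a sentence.
This post uses Python and the NRC lexicon to count, for each sentence, how many words are associated with each of the 8 emotions (the lexicon's Positive and Negative sentiment columns are counted as well).
NRC lexicon download link: NRC Word-Emotion Association Lexicon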
# Load libraries for emotion analysis
import nltk
import pandas as pd
from nltk import word_tokenize
from nltk.stem.snowball import SnowballStemmer
from tqdm.auto import tqdm  # progress bar that works in notebooks and terminals

# Tokenizer models required by word_tokenize
# (newer NLTK releases may also need nltk.download('punkt_tab'))
nltk.download('punkt')
# Use the NRC lexicon to obtain 8 emotions plus positive/negative sentiment
def text_emotion(df, column):
    '''
    INPUT: DataFrame, name of the text column (string)
    OUTPUT: a copy of the DataFrame with ten new columns: the eight emotions
            plus Positive and Negative
    '''
    new_df = df.copy()
    # Reading .xlsx files with pandas requires the openpyxl package
    xlsx = pd.read_excel('./NRC-Emotion-Lexicon-v0.92-In105Languages-Nov2017Translations.xlsx')
    emolex_df = xlsx[['English', 'Positive', 'Negative', 'Anger', 'Anticipation', 'Disgust', 'Fear', 'Joy',
                      'Sadness', 'Surprise', 'Trust']]
    emotions = emolex_df.columns.drop('English')
    emo_df = pd.DataFrame(0, index=df.index, columns=emotions)
    stemmer = SnowballStemmer("english")
    with tqdm(total=len(new_df)) as pbar:
        for i, row in new_df.iterrows():
            pbar.update(1)
            document = word_tokenize(row[column])
            for word in document:
                # Note: stemming happens before the lookup, so only words whose
                # stem appears in the lexicon are matched
                word = stemmer.stem(word.lower())
                emo_score = emolex_df[emolex_df.English == word]
                if not emo_score.empty:
                    for emotion in emotions:
                        # Take the scalar from the first matching row, not the whole Series
                        emo_df.at[i, emotion] += emo_score[emotion].iloc[0]
    new_df = pd.concat([new_df, emo_df], axis=1)
    return new_df
df_sentiments = text_emotion(df, 'column_name')
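To make the counting step easier to see in isolation, here is a minimal sketch of the same lexicon-lookup idea using only the standard library. The lexicon entries are made up for illustration and are not actual NRC values, the name `emotion_totals` is hypothetical, and a simple regex stands in for `word_tokenize`; no stemming is applied.

```python
import re
from collections import Counter

# Toy lexicon with made-up entries for illustration only; these are NOT
# actual NRC values. Each word maps to the emotions it is associated with.
TOY_LEXICON = {
    "happy": {"Joy", "Trust"},
    "afraid": {"Fear"},
    "storm": {"Fear", "Sadness"},
}

def emotion_totals(sentence, lexicon):
    """Count how many tokens in `sentence` are associated with each emotion."""
    totals = Counter()
    # Simple regex tokenizer (an assumption; the post uses nltk.word_tokenize).
    for token in re.findall(r"[a-z']+", sentence.lower()):
        # Words missing from the lexicon contribute nothing.
        totals.update(lexicon.get(token, ()))
    return dict(totals)

totals = emotion_totals("I was happy until the storm made me afraid.", TOY_LEXICON)
print(totals)
```

A dictionary lookup per token avoids filtering the whole lexicon DataFrame for every word, which may matter on larger corpora.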
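PS: The code above extracts emotions from English sentences. For another language, replace "English" in the code with the language you need. That covers the lexicon column name; the SnowballStemmer language will likely need to change as well.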
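As a rough sketch of what switching languages amounts to, the example below builds the word lookup from whichever language column is requested. The rows, the `"Spanish"` column name, and the helper names `build_index` and `score` are all hypothetical and not taken from the NRC file; check the actual column headers in the spreadsheet before adapting it. Stemming is left out here, since an English stemmer would not fit other languages.

```python
# Toy rows shaped like a multilingual lexicon sheet. The "Spanish" column
# name and every value here are hypothetical, not taken from the NRC file.
ROWS = [
    {"English": "happy", "Spanish": "feliz", "Joy": 1, "Fear": 0},
    {"English": "afraid", "Spanish": "asustado", "Joy": 0, "Fear": 1},
]
EMOTIONS = ["Joy", "Fear"]

def build_index(rows, word_col):
    """Map each word in the chosen language column to its emotion flags."""
    return {row[word_col].lower(): {e: row[e] for e in EMOTIONS} for row in rows}

def score(tokens, index):
    """Sum the emotion flags over all tokens found in the index."""
    totals = {e: 0 for e in EMOTIONS}
    for token in tokens:
        for emotion, flag in index.get(token.lower(), {}).items():
            totals[emotion] += flag
    return totals

# Only the column name changes between languages.
english = score(["Happy", "and", "afraid"], build_index(ROWS, "English"))
spanish = score(["feliz", "pero", "asustado"], build_index(ROWS, "Spanish"))
print(english, spanish)
```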
Reference: Silmarillion-NLP