Getting 8 Emotions with the NRC Lexicon

Latest recommended article published on 2024-12-10 18:28:51

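Emotion analysis often uses a lexicon-based approach to determine the emotions expressed in a sentence.

Here we use Python and the NRC lexicon to compute, for each sentence, the total count of words associated with each of the 8 emotions.

NRC lexicon download link: NRC Word-Emotion Association Lexicon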
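
To make the lexicon-based idea concrete before loading the real file, here is a minimal sketch in plain Python. The `TOY_LEXICON` dictionary and the `count_emotions` helper are hypothetical names made up for illustration; the entries are not actual NRC data, and the regex tokenizer is a simplifying assumption rather than the NLTK tokenizer used in this article.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; these are NOT real NRC entries.
TOY_LEXICON = {
    "happy": {"joy", "trust"},
    "storm": {"fear", "sadness"},
    "gift": {"joy", "surprise", "anticipation"},
}


def count_emotions(sentence, lexicon=TOY_LEXICON):
    """Count how many tokens in the sentence are associated with each emotion."""
    counts = Counter()
    # Simple lowercase word tokenization (an assumption; the article uses NLTK).
    for token in re.findall(r"[a-z']+", sentence.lower()):
        # Words missing from the lexicon contribute nothing.
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts


print(count_emotions("A happy gift arrived before the storm"))
```

Each word found in the lexicon adds one to every emotion it is associated with, so the result is a per-emotion total for the sentence.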

# load libraries for emotion analysis
import nltk
import pandas as pd
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize
from tqdm.auto import tqdm  # picks a notebook or console progress bar automatically

# Tokenizer models used by word_tokenize; newer NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')

# Use the NRC Lexicon to obtain 8 emotions (plus positive/negative sentiment)
def text_emotion(df, column):
    '''
    INPUT: DataFrame, name of the text column (string)
    OUTPUT: a copy of the DataFrame with ten new columns, one per NRC category
            (8 emotions plus Positive and Negative)
    '''

    new_df = df.copy()

    xlsx = pd.read_excel('./NRC-Emotion-Lexicon-v0.92-In105Languages-Nov2017Translations.xlsx')
    emolex_df = xlsx[['English', 'Positive','Negative','Anger', 'Anticipation', 'Disgust', 'Fear','Joy',
                      'Sadness', 'Surprise', 'Trust']]
    emotions = emolex_df.columns.drop('English')
    # One row of zero counts per document, aligned with the input index
    emo_df = pd.DataFrame(0, index=df.index, columns=emotions)

    stemmer = SnowballStemmer("english")

    with tqdm(total=len(new_df)) as pbar:
        for i, row in new_df.iterrows():
            pbar.update(1)
            document = word_tokenize(row[column])
            for English in document:
                # Note: tokens are stemmed but the lexicon words are not, so a stem
                # such as "happi" (from "happy") may find no match in the lexicon
                English = stemmer.stem(English.lower())
                emo_score = emolex_df[emolex_df.English == English]
                if not emo_score.empty:
                    for emotion in list(emotions):
                        # Sum the matching rows to get a scalar before adding
                        emo_df.at[i, emotion] += emo_score[emotion].sum()

    new_df = pd.concat([new_df, emo_df], axis=1)

    return new_df


df_sentiments = text_emotion(df, 'column_name')
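
The function above filters the whole lexicon table once per token, which can be slow on large corpora. As a sketch of one possible alternative, the lexicon can be indexed by word once and then looked up with `reindex`. Everything here is illustrative: `toy_lexicon`, `toy_df`, and `emotion_totals` are hypothetical names, the lexicon values are made up rather than taken from the NRC file, and whitespace splitting replaces NLTK tokenization and stemming to keep the example self-contained.

```python
import pandas as pd

# Hypothetical toy lexicon with the same layout as the columns selected above;
# the values are made up for illustration and are not real NRC entries.
toy_lexicon = pd.DataFrame(
    {
        "English": ["happy", "storm", "gift"],
        "Joy": [1, 0, 1],
        "Fear": [0, 1, 0],
        "Surprise": [0, 0, 1],
    }
)


def emotion_totals(texts, lexicon):
    """Return one row of emotion totals per text, using an indexed lexicon."""
    # Index by word once so each lookup avoids scanning the whole table.
    indexed = lexicon.drop_duplicates("English").set_index("English")
    rows = []
    for text in texts:
        tokens = text.lower().split()  # whitespace split: a simplifying assumption
        # reindex yields NaN rows for unknown words; sum() skips NaN by default
        rows.append(indexed.reindex(tokens).sum().to_dict())
    return pd.DataFrame(rows, index=texts.index).astype(int)


toy_df = pd.DataFrame({"text": ["happy gift", "storm storm", "nothing here"]})
totals = emotion_totals(toy_df["text"], toy_lexicon)
print(pd.concat([toy_df, totals], axis=1))
```

Repeated words are counted each time they occur, and texts with no lexicon words get a row of zeros, which matches the counting behaviour of `text_emotion`.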