Sarcasm Detection with Self-matching Networks and Low-rank Bilinear Pooling

This post examines the three components of a sarcasm-detection model: a self-matching network, a Bi-LSTM, and Low-rank Bilinear Pooling. The self-matching network captures a sentence's incongruity information from word pairs; the Bi-LSTM captures compositional information from the word sequence; Low-rank Bilinear Pooling fuses the two into the final classification vector.


Click here: paper download

Method overview:

The paper uses three models: a self-matching network, a Bi-LSTM, and the Low-rank Bilinear Pooling method (LBPR).

self-matching network: captures the sentence's incongruity information from the interactions between word pairs
Bi-LSTM: captures the sentence's compositional information from the word sequence
Low-rank Bilinear Pooling method: fuses the incongruity information with the compositional information (a combined sketch of all three parts follows the figure below)

[Figure: overall model architecture]
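Before the per-model details, here is a minimal PyTorch sketch of how the three parts could be wired together. The Hadamard-product form of low-rank bilinear pooling, $f = P^\top(\tanh(U^\top x) \circ \tanh(V^\top y))$, the module name `LowRankBilinearFusion`, the rank, the use of the Bi-LSTM's final hidden states, and the 2-way output are all illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class LowRankBilinearFusion(nn.Module):
    """Low-rank bilinear pooling in the Hadamard-product form
    f = P^T (tanh(U^T x) * tanh(V^T y)). The rank and the tanh gating
    are illustrative choices, not the paper's exact configuration."""
    def __init__(self, dim_x, dim_y, rank, out_dim):
        super().__init__()
        self.U = nn.Linear(dim_x, rank, bias=False)
        self.V = nn.Linear(dim_y, rank, bias=False)
        self.P = nn.Linear(rank, out_dim, bias=False)

    def forward(self, x, y):
        # Project both inputs into a shared rank-dimensional space,
        # fuse them with an element-wise product, then project out.
        return self.P(torch.tanh(self.U(x)) * torch.tanh(self.V(y)))

k, n, hidden = 100, 20, 64           # embedding dim, sentence length, LSTM size
S = torch.randn(1, n, k)             # one sentence: (batch, n, k)

# Compositional information: Bi-LSTM over the word sequence; here the final
# forward/backward hidden states are concatenated as the sentence vector.
bilstm = nn.LSTM(k, hidden, batch_first=True, bidirectional=True)
_, (h_n, _) = bilstm(S)
f_c = torch.cat([h_n[0], h_n[1]], dim=-1)        # (1, 2*hidden)

# Incongruity information: attended vector f_a = S^T a (the attention a is a
# stand-in here; the self-matching network that produces it is sketched below).
a = torch.softmax(torch.randn(1, n), dim=-1)
f_a = torch.bmm(a.unsqueeze(1), S).squeeze(1)    # (1, k)

# Fuse the two views into the final classification vector (2-way output assumed).
fusion = LowRankBilinearFusion(dim_x=k, dim_y=2 * hidden, rank=32, out_dim=2)
logits = fusion(f_a, f_c)
```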

Algorithms of the individual models:

self-matching network

Target: compute the attended feature vector of the input sentence, $f_a \in R^k$, via $f_a = S \cdot a$.
$S \in R^{k \times n}$ is the word-embedding representation of the input sentence.
The problem therefore reduces to computing the self-matched attention vector $a \in R^n$.
Here $k$ is the word-embedding dimension and $n$ is the number of words in the sentence.
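In other words, $f_a$ is just a weighted combination of the column word vectors of $S$; a quick numpy sanity check (the concrete dimensions are arbitrary example values):

```python
import numpy as np

k, n = 100, 20                       # embedding dimension, sentence length
S = np.random.randn(k, n)            # word embeddings as columns: S in R^{k x n}
a = np.random.rand(n)
a /= a.sum()                         # attention weights normalized to sum to 1

f_a = S @ a                          # attended feature vector: f_a in R^k
assert f_a.shape == (k,)
# Equivalent view: f_a is a weighted average of the word vectors.
assert np.allclose(f_a, sum(a[j] * S[:, j] for j in range(n)))
```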

Computing $a \in R^n$:

Since taking the inner product between the vectors of a word pair only captures the correlation between the feature vectors and ignores sentiment information, a new way of scoring word pairs is defined: for each word pair $(e_i, e_j)$, with $e_i, e_j \in R^k$, a pairwise score is computed, and the attention vector $a$ is derived from these scores.
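Since the exact pairwise scoring formula is not reproduced here, the sketch below shows only one plausible scheme: score each pair with a trainable bilinear form $m_{ij} = e_i^\top M e_j$ (a learned generalization of the inner product), take the strongest match per word, and softmax the result into $a$. The matrix $M$, the diagonal masking, and the max-pooling step are assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

class SelfMatchedAttention(nn.Module):
    """Hypothetical self-matched attention: a trainable bilinear score for
    every word pair instead of a plain inner product. The form of the score,
    the diagonal mask, and the max-pooling are assumptions."""
    def __init__(self, k):
        super().__init__()
        self.M = nn.Parameter(torch.randn(k, k) * 0.01)  # learned pair scorer

    def forward(self, E):
        # E: (n, k) word embeddings; m[i, j] = e_i^T M e_j for each pair.
        m = E @ self.M @ E.t()                  # (n, n) pairwise scores
        m.fill_diagonal_(float('-inf'))         # exclude self-pairs
        s, _ = m.max(dim=1)                     # strongest match per word
        return torch.softmax(s, dim=0)          # attention vector a in R^n

n, k = 20, 100
E = torch.randn(n, k)                           # embeddings of one sentence
a = SelfMatchedAttention(k)(E)                  # (n,), sums to 1
f_a = E.t() @ a                                 # attended feature vector in R^k
```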
