Twitter Sentiment Analysis Using ULMFiT

This post shows how to use ULMFiT for Twitter sentiment analysis. It is translated from an article on Medium.

Today, with data in overabundance, companies gather large volumes of data on customer feedback, shopping behavior, and more. By analyzing this data, companies can adapt their digital profile, products, or services to suit new markets and customers. However, it is difficult for a person to interpret data at this scale manually without mistakes or bias.

Sentiment analysis is a method for evaluating whether a piece of text is positive, negative, or neutral. It lets data analysts at multinational corporations gauge public sentiment and understand how consumers perceive a product.

Today, deep learning and natural language processing (NLP) play a significant role in sentiment analysis. This blog applies sentiment analysis to Twitter data scraped for each major U.S. airline. Our goal is to classify customer tweets into three categories: positive, negative, and neutral. There are several machine learning algorithms for classification; one example is K-Nearest Neighbor, explained in one of my earlier blogs, Telecom Industry Customer Churn Prediction with K Nearest Neighbor. In this case, however, we will use NLP techniques because our data is unstructured (raw text). By the end of this blog, we will have built and trained a state-of-the-art (SoTA) machine learning model that classifies tweets by sentiment.

This dataset is available on Kaggle: https://www.kaggle.com/crowdflower/twitter-airline-sentiment.
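
A first look at the data usually starts with the class balance. The sketch below is a minimal illustration that uses only the standard library. It assumes the downloaded CSV has a label column named `airline_sentiment` and a tweet column named `text`; the sample rows are made up so the snippet runs on its own and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def count_sentiments(csv_file, label_column="airline_sentiment"):
    """Count how many rows fall into each sentiment class.

    `csv_file` is any open text file (or file-like object) with a header row.
    The default column name is an assumption about the Kaggle file layout.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Hypothetical stand-in for the downloaded file, so the example is self-contained.
SAMPLE_CSV = """airline_sentiment,airline,text
negative,United,@united my flight was delayed for three hours
positive,Delta,@Delta thanks for the smooth flight!
neutral,Southwest,@SouthwestAir what time does check-in open?
negative,American,@AmericanAir lost my bag again
"""

counts = count_sentiments(io.StringIO(SAMPLE_CSV))
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

With the real download, the same function can be called on a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the CSV was saved.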


Image source: Datacamp.com

We will train a ULMFiT classifier on the labeled Twitter data of major U.S. airlines. We follow the approach Howard and Ruder presented in the paper Universal Language Model Fine-tuning for Text Classification. ULMFiT stands for Universal Language Model Fine-tuning. It is an efficient transfer learning method that can be applied to any NLP task, and it introduces techniques for fine-tuning a language model.
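
Two of the fine-tuning techniques described in the Howard and Ruder paper are slanted triangular learning rates and discriminative fine-tuning. The sketch below writes both out in plain Python to show the arithmetic. The function names are hypothetical, and the default values (`cut_frac=0.1`, `ratio=32`, `lr_max=0.01`, and a per-layer factor of 2.6) are the ones suggested in the paper, not settings confirmed for this blog's notebook.

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at training step t under a slanted triangular schedule.

    The rate rises linearly for the first `cut_frac` of the steps, then decays
    linearly. `ratio` is how much smaller the lowest rate is than `lr_max`.
    """
    cut = max(1, math.floor(total_steps * cut_frac))  # step where the rate peaks
    if t < cut:
        p = t / cut  # warm-up fraction
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # decay fraction
    return lr_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(lr_last, n_layers, factor=2.6):
    """Per-layer learning rates, listed from the first layer to the last.

    The last layer uses `lr_last`; each earlier layer uses the rate of the
    layer after it divided by `factor`.
    """
    return [lr_last / factor ** (n_layers - 1 - i) for i in range(n_layers)]


schedule = [slanted_triangular_lr(t, total_steps=1000) for t in (0, 100, 1000)]
layer_rates = discriminative_lrs(lr_last=0.01, n_layers=3)
print(schedule)
print(layer_rates)
```

The short warm-up followed by a long decay lets the model move quickly to a suitable region of parameter space and then refine, while the smaller rates on earlier layers keep the general language knowledge from pretraining largely intact.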

Image source: Pennylane

We will build the ULMFiT model step by step: data exploration, then text preprocessing, then building the language model, and finally building the classifier model. At the end, we will evaluate the accuracy of our ULMFiT model.
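
As an illustration of the text preprocessing step, here is a minimal sketch of tweet cleaning with regular expressions. The cleaning rules (dropping URLs and @mentions, collapsing whitespace) and the name `clean_tweet` are assumptions chosen for the example; the exact preprocessing used in the notebook is not shown in this excerpt and may differ.

```python
import re

# Patterns for tweet-specific noise (illustrative choices, not the notebook's).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Remove URLs and @mentions, then collapse repeated whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


cleaned = clean_tweet("@united  thanks! see https://t.co/abc123 for details")
print(cleaned)
```

Whether to drop the @mentions is a design choice: in airline tweets the handle often names the airline, so keeping it may help if the airline itself is informative for the task.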

The complete Jupyter notebook for this can be found here: Twitter-Sentiment-Analysis-using-ULMFiT. So let's begin.
