Twitter Dataset Processing: Twitter Data Cleaning and Preprocessing for Data Science

In the past decade, new forms of communication, such as microblogging and text messaging, have emerged and become ubiquitous. While there is no limit to the range of information conveyed by tweets and texts, these short messages are often used to share the opinions and sentiments people have about what is going on in the world around them.

Opinion mining (also known as sentiment analysis or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice-of-the-customer materials such as reviews and survey responses, to online and social media, and to healthcare materials, for applications that range from marketing to customer service to clinical medicine.

Both lexicon-based and machine-learning-based approaches will be used for emoticon-based sentiment analysis. We start with machine-learning-based clustering; within the machine learning approach we use both supervised and unsupervised learning methods. The Twitter data is collected and given to the system as input. The system classifies each tweet as positive, negative, or neutral, and also reports the number of positive, negative, and neutral tweets for each emoticon separately in the output. In addition, the polarity of each individual tweet is determined.
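As a rough illustration of the lexicon-based side only, the sketch below labels each tweet as positive, negative, or neutral from the emoticons it contains, and then tallies those labels per emoticon. The `EMOTICON_POLARITY` table, the function names, and the whitespace tokenization are all assumptions made for this example; the source does not specify the lexicon or the scoring rule actually used.

```python
from collections import Counter, defaultdict

# Hypothetical emoticon lexicon (an assumption, not from the source):
# +1 = positive, -1 = negative, 0 = neutral.
EMOTICON_POLARITY = {
    ":)": 1, ":-)": 1, ":D": 1,
    ":(": -1, ":-(": -1, ":'(": -1,
    ":|": 0, ":-|": 0,
}


def find_emoticons(tweet):
    """Return lexicon emoticons that appear as whitespace-separated tokens."""
    return [tok for tok in tweet.split() if tok in EMOTICON_POLARITY]


def classify(tweet):
    """Label a tweet by the sign of the summed emoticon polarities."""
    score = sum(EMOTICON_POLARITY[e] for e in find_emoticons(tweet))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def counts_per_emoticon(tweets):
    """For each emoticon, count how many tweets containing it got each label."""
    table = defaultdict(Counter)
    for tweet in tweets:
        label = classify(tweet)
        # set() so a tweet that repeats an emoticon is counted once for it
        for emo in set(find_emoticons(tweet)):
            table[emo][label] += 1
    return table


if __name__ == "__main__":
    sample = ["great game :)", "so tired :(", "meh :|", "mixed :) :( :("]
    for emo, counts in counts_per_emoticon(sample).items():
        print(emo, dict(counts))
```

A tweet with no known emoticon falls through to "neutral" here, which is a simplification; a real system might treat such tweets as unlabeled and hand them to the machine learning classifier instead.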

Collection of Data

To collect the Twitter data, we carried out a data mining process, for which we built our own application with the help of the Twitter API. Using the Twitter API, we collected a large dataset. To do this, we had to create a developer account and register our app. On registration we received a consumer key and a consumer secret, which are used in the application settings. From the app's configuration page we also obtained an access token and an access token secret, which give the application access to Twitter on behalf of the account. The process is divided into two sub-processes, which are discussed in the next subsection.
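One way to handle the four credentials described above is to keep them out of the source code and read them from environment variables. The sketch below does only that; the environment variable names, the `TwitterCredentials` class, and `load_credentials` are hypothetical names chosen for this example, and the actual call that hands the values to a Twitter client library is deliberately left out because the source does not say which library was used.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterCredentials:
    """The four OAuth values issued for a registered Twitter app."""
    consumer_key: str
    consumer_secret: str
    access_token: str
    access_token_secret: str


# Hypothetical environment variable names (an assumption for this sketch).
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Build TwitterCredentials from a mapping (defaults to os.environ).

    Raises KeyError listing every missing or empty variable, so a
    misconfigured app fails before any request is attempted.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return TwitterCredentials(
        **{field: env[name] for field, name in ENV_NAMES.items()}
    )
```

The resulting object can then be passed to whichever client library is used to talk to the Twitter API. Accepting an explicit mapping makes the loader easy to exercise without touching the real environment.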
