How to make your own sentiment analyzer using Python and Google’s Natural Language API

Imagine you are a product owner who wants to know what people are saying about your product in social media. Maybe your company launched a new product and you want to know how people reacted to it. You might want to use a sentiment analyzer like MonkeyLearn or Talkwalker. But wouldn’t it be cool if we could make our own sentiment analyzer? Let’s make it then!

In this tutorial, we are going to make a Telegram Bot that will do the sentiment analysis of tweets related to the keyword that we define.

If this is your first time building a Telegram Bot, you might want to read this post first.

Getting started

1. Install the libraries

We are going to use tweepy to gather the tweet data. We will use nltk to help us clean the tweets. Google Natural Language API will do the sentiment analysis. python-telegram-bot will send the result through Telegram chat.

pip3 install tweepy nltk google-cloud-language python-telegram-bot
2. Get Twitter API Keys

To be able to gather the tweets from Twitter, we need to create a developer account to get the Twitter API Keys first.

Go to the Twitter Developer website, and create an account if you don’t have one.

Open the Apps page, click “Create an app”, fill out the form, and click “Create”.

Click on the “Keys and tokens” tab, and copy the API Key and API Secret Key from the “Consumer API keys” section.

Click the “Create” button under “Access token & access token secret” section. Copy the Access Token and Access Token Secret that have been generated.

Great! Now you should have four keys — API Key, API Secret Key, Access Token, and Access Token Secret. Save those keys for later use.

3. Enable Google Natural Language API

We need to enable the Google Natural Language API first if we want to use the service.

Go to Google Developers Console and create a new project (or select the one you have).

In the project dashboard, click “ENABLE APIS AND SERVICES”, and search for Cloud Natural Language API.

Click “ENABLE” to enable the API.

4. Create service account key

If we want to use Google Cloud services like Google Natural Language, we need a service account key. This is like our credential to use Google’s services.

Go to Google Developers Console, click “Credentials” tab, choose “Create credentials” and click “Service account key”.

Choose “App Engine default service account” and JSON as key type, then click “Create”.

A .json file will be downloaded automatically; name it creds.json.

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of our creds.json file in the terminal.

export GOOGLE_APPLICATION_CREDENTIALS='[PATH_TO_CREDS.JSON]'
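
Alternatively, you can set the variable from inside Python before any Google client is created, which is handy when testing in a notebook. A minimal sketch (the path is a placeholder, just like above):

import os

# must run before language.LanguageServiceClient() is instantiated
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '[PATH_TO_CREDS.JSON]'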

If everything is good, then it’s time to write our program.

Write the program

This program will gather the tweets containing the defined keyword from the last 24 hours, up to a maximum of 50 tweets. Then it will analyze the tweets’ sentiment one by one. We will send the result (the average sentiment score) through Telegram chat.

This is a simple workflow of our program.

connect to the Twitter API -> search tweets based on the keyword -> clean all of the tweets -> get tweet’s sentiment score -> send the result

Let’s write a single function for each step of the flow.

1. Connect to the Twitter API

The first thing that we need to do is gather the tweets’ data, so we have to connect to the Twitter API first.

Import the tweepy library.

import tweepy

Define the keys that we generated earlier.

ACC_TOKEN = 'YOUR_ACCESS_TOKEN'
ACC_SECRET = 'YOUR_ACCESS_TOKEN_SECRET'
CONS_KEY = 'YOUR_CONSUMER_API_KEY'
CONS_SECRET = 'YOUR_CONSUMER_API_SECRET_KEY'

Make a function called authentication that connects to the API, taking the four keys as its parameters.

def authentication(cons_key, cons_secret, acc_token, acc_secret):
    auth = tweepy.OAuthHandler(cons_key, cons_secret)
    auth.set_access_token(acc_token, acc_secret)
    api = tweepy.API(auth)
    return api
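
Before moving on, it can be worth checking that the keys actually work. tweepy’s verify_credentials call raises an error if authentication fails, so a quick optional sanity check looks like this:

api = authentication(CONS_KEY, CONS_SECRET, ACC_TOKEN, ACC_SECRET)
print(api.verify_credentials().screen_name)  # prints your handle if the keys are valid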
2. Search the tweets

We can search the tweets using two criteria: time or quantity. If it’s based on time, we define the time interval; if it’s based on quantity, we define the total number of tweets we want to gather. Since we want to gather tweets from the last 24 hours with a maximum of 50 tweets, we will use both criteria.

Since we want to gather the tweets from the last 24 hours, let's take yesterday’s date as our time parameter.

from datetime import datetime, timedelta

today_datetime = datetime.now()
yesterday_datetime = today_datetime - timedelta(days=1)
today_date = today_datetime.strftime('%Y-%m-%d')
yesterday_date = yesterday_datetime.strftime('%Y-%m-%d')

Connect to the Twitter API using the authentication function we defined before.

api = authentication(CONS_KEY,CONS_SECRET,ACC_TOKEN,ACC_SECRET)

Define our search parameters. q is where we define our keyword, since is the start date of our search, result_type='recent' means we take the newest tweets, lang='en' takes English tweets only, and items(total_tweets) is where we define the maximum number of tweets to take.

search_result = tweepy.Cursor(api.search, 
                              q=keyword, 
                              since=yesterday_date,
                              result_type='recent', 
                              lang='en').items(total_tweets)

Wrap that code in a function called search_tweets with keyword and total_tweets as the parameters.

def search_tweets(keyword, total_tweets):
    today_datetime = datetime.now()
    yesterday_datetime = today_datetime - timedelta(days=1)
    today_date = today_datetime.strftime('%Y-%m-%d')
    yesterday_date = yesterday_datetime.strftime('%Y-%m-%d')
    api = authentication(CONS_KEY,CONS_SECRET,ACC_TOKEN,ACC_SECRET)
    search_result = tweepy.Cursor(api.search, 
                                  q=keyword, 
                                  since=yesterday_date, 
                                  result_type='recent', 
                                  lang='en').items(total_tweets)
    return search_result
3. Clean the tweets

Before we analyze the tweets’ sentiment, we need to clean the tweets a little so the Google Natural Language API can identify them better.

We will use nltk and Python’s built-in re module to help us in this process.

import re
from nltk.tokenize import WordPunctTokenizer

We remove the username in every tweet; in practice that means removing everything that begins with @, and we use a regex to do it.

user_removed = re.sub(r'@[A-Za-z0-9]+','',tweet.decode('utf-8'))

We also remove links in every tweet.

link_removed = re.sub('https?://[A-Za-z0-9./]+','',user_removed)

Numbers, along with any other non-letter characters, are also removed from all of the tweets.

number_removed = re.sub('[^a-zA-Z]',' ',link_removed)

Finally, convert all of the characters to lowercase, then remove every unnecessary space.

lower_case_tweet = number_removed.lower()
tok = WordPunctTokenizer()
words = tok.tokenize(lower_case_tweet)
clean_tweet = (' '.join(words)).strip()

Wrap that code into a function called clean_tweets with tweet as the parameter.

def clean_tweets(tweet):
    user_removed = re.sub(r'@[A-Za-z0-9]+','',tweet.decode('utf-8'))
    link_removed = re.sub('https?://[A-Za-z0-9./]+','',user_removed)
    number_removed = re.sub('[^a-zA-Z]', ' ', link_removed)
    lower_case_tweet = number_removed.lower()
    tok = WordPunctTokenizer()
    words = tok.tokenize(lower_case_tweet)
    clean_tweet = (' '.join(words)).strip()
    return clean_tweet
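
To see what the cleaning does end to end, here is an illustrative call (the tweet text is made up):

print(clean_tweets(b'@user Check this out https://t.co/abc123 GREAT product!!'))
# check this out great product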
4. Get tweet’s sentiment

To be able to get a tweet’s sentiment, we will use Google Natural Language API.

The API provides Sentiment Analysis, Entities Analysis, and Syntax Analysis. We will only use the Sentiment Analysis for this tutorial.

In Google’s Sentiment Analysis, there are score and magnitude. Score is the overall sentiment of the text and ranges from -1.0 (very negative) to 1.0 (very positive). Magnitude is the strength of the sentiment and ranges from 0 to infinity.

For the sake of simplicity of this tutorial, we will only consider the score. If you are thinking of doing deep NLP analysis, you should consider the magnitude too.

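Both values come back on the same response object, so if you ever want the magnitude as well, you can read it right next to the score. A self-contained sketch (the same imports and client are introduced step by step below):

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

client = language.LanguageServiceClient()
document = types.Document(content='I really love this product!',
                          type=enums.Document.Type.PLAIN_TEXT)
sentiment = client.analyze_sentiment(document=document).document_sentiment
print('score: {}, magnitude: {}'.format(sentiment.score, sentiment.magnitude))
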
Import the Google Natural Language library.

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

Make a function called get_sentiment_score which takes tweet as the parameter, and returns the sentiment score.

def get_sentiment_score(tweet):
    client = language.LanguageServiceClient()
    document = types\
               .Document(content=tweet,
                         type=enums.Document.Type.PLAIN_TEXT)
    sentiment_score = client\
                      .analyze_sentiment(document=document)\
                      .document_sentiment\
                      .score
    return sentiment_score
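
As a quick check, calling the function on an obviously positive sentence should return a score near the top of the range (the exact value will vary between model versions):

print(get_sentiment_score('I love this, it works great!'))
# prints something close to 0.9 (illustrative, not a guaranteed value)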
5. Analyze the tweets

Let’s make a function that loops over the list of tweets we get from the search_tweets function and gets the sentiment score of every tweet using the get_sentiment_score function. Then we’ll calculate the average. The average score will determine whether the given keyword has a positive, neutral, or negative sentiment.

Define score equal to 0, then use the search_tweets function to get the tweets related to the keyword we defined.

score = 0
tweets = search_tweets(keyword, total_tweets)

Loop through the list of tweets, and clean each one using the clean_tweets function that we created before.

for tweet in tweets:
    cleaned_tweet = clean_tweets(tweet.text.encode('utf-8'))

Get the sentiment score using the get_sentiment_score function, and add sentiment_score to the running score.

for tweet in tweets:
    cleaned_tweet = clean_tweets(tweet.text.encode('utf-8'))
    sentiment_score = get_sentiment_score(cleaned_tweet)
    score += sentiment_score

Let’s print out each tweet and its sentiment so we can see the progress detail in the terminal.

for tweet in tweets:
    cleaned_tweet = clean_tweets(tweet.text.encode('utf-8'))
    sentiment_score = get_sentiment_score(cleaned_tweet)
    score += sentiment_score
    print('Tweet: {}'.format(cleaned_tweet))
    print('Score: {}\n'.format(sentiment_score))

Calculate the average score and assign it to the final_score variable. Wrap all of the code into the analyze_tweets function, with keyword and total_tweets as the parameters.

def analyze_tweets(keyword, total_tweets):
    score = 0
    tweets = search_tweets(keyword, total_tweets)
    for tweet in tweets:
        cleaned_tweet = clean_tweets(tweet.text.encode('utf-8'))
        sentiment_score = get_sentiment_score(cleaned_tweet)
        score += sentiment_score
        print('Tweet: {}'.format(cleaned_tweet))
        print('Score: {}\n'.format(sentiment_score))
    final_score = round((score / float(total_tweets)),2)
    return final_score
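
Note that final_score divides by total_tweets even when the search returns fewer tweets than that, which pulls the average toward zero. If you would rather average over the tweets actually processed, a small variation (a sketch, not part of the original flow) could look like this:

def analyze_tweets(keyword, total_tweets):
    score = 0
    processed = 0  # count the tweets we actually scored
    tweets = search_tweets(keyword, total_tweets)
    for tweet in tweets:
        cleaned_tweet = clean_tweets(tweet.text.encode('utf-8'))
        sentiment_score = get_sentiment_score(cleaned_tweet)
        score += sentiment_score
        processed += 1
    # avoid dividing by zero when no tweets matched the keyword
    return round(score / processed, 2) if processed else 0.0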
6. Send the tweet’s sentiment score

Let’s make the last function in the workflow. This function will take the user’s keyword, calculate the average sentiment score, and send it through the Telegram Bot.

Get the keyword from the user.

keyword = update.message.text

Use the analyze_tweets function to get the final score, passing keyword as the parameter and setting total_tweets to 50, since we want to gather 50 tweets.

final_score = analyze_tweets(keyword, 50)

We define whether a given score is considered negative, neutral, or positive using Google’s suggested score ranges.

if final_score <= -0.25:
    status = 'NEGATIVE ❌'
elif final_score <= 0.25:
    status = 'NEUTRAL ?'
else:
    status = 'POSITIVE ✅'

Lastly, send the final_score and the status through Telegram Bot.

bot.send_message(chat_id=update.message.chat_id,
                 text='Average score for '
                       + str(keyword) 
                       + ' is ' 
                       + str(final_score) 
                       + ' ' 
                       + status)

Wrap the code into a function called send_the_result.

def send_the_result(bot, update):
    keyword = update.message.text
    final_score = analyze_tweets(keyword, 50)
    if final_score <= -0.25:
        status = 'NEGATIVE ❌'
    elif final_score <= 0.25:
        status = 'NEUTRAL ?'
    else:
        status = 'POSITIVE ✅'
    bot.send_message(chat_id=update.message.chat_id,
                     text='Average score for '
                           + str(keyword) 
                           + ' is ' 
                           + str(final_score) 
                           + ' ' 
                           + status)
7. Main program

Lastly, create another function called main to run our program. Don’t forget to change YOUR_TOKEN to your bot’s token.

from telegram.ext import Updater, MessageHandler, Filters

def main():
    updater = Updater('YOUR_TOKEN')
    dp = updater.dispatcher
    dp.add_handler(MessageHandler(Filters.text, send_the_result))
    updater.start_polling()
    updater.idle()
    
if __name__ == '__main__':
    main()

In the end, your code should look like this:

import tweepy
import re

from telegram.ext import Updater, MessageHandler, Filters
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
from datetime import datetime, timedelta
from nltk.tokenize import WordPunctTokenizer


ACC_TOKEN = 'YOUR_ACCESS_TOKEN'
ACC_SECRET = 'YOUR_ACCESS_TOKEN_SECRET'
CONS_KEY = 'YOUR_CONSUMER_API_KEY'
CONS_SECRET = 'YOUR_CONSUMER_API_SECRET_KEY'

def authentication(cons_key, cons_secret, acc_token, acc_secret):
    auth = tweepy.OAuthHandler(cons_key, cons_secret)
    auth.set_access_token(acc_token, acc_secret)
    api = tweepy.API(auth)
    return api
    
def search_tweets(keyword, total_tweets):
    today_datetime = datetime.now()
    yesterday_datetime = today_datetime - timedelta(days=1)
    today_date = today_datetime.strftime('%Y-%m-%d')
    yesterday_date = yesterday_datetime.strftime('%Y-%m-%d')
    api = authentication(CONS_KEY,CONS_SECRET,ACC_TOKEN,ACC_SECRET)
    search_result = tweepy.Cursor(api.search, 
                                  q=keyword, 
                                  since=yesterday_date, 
                                  result_type='recent', 
                                  lang='en').items(total_tweets)
    return search_result

def clean_tweets(tweet):
    user_removed = re.sub(r'@[A-Za-z0-9]+','',tweet.decode('utf-8'))
    link_removed = re.sub('https?://[A-Za-z0-9./]+','',user_removed)
    number_removed = re.sub('[^a-zA-Z]', ' ', link_removed)
    lower_case_tweet = number_removed.lower()
    tok = WordPunctTokenizer()
    words = tok.tokenize(lower_case_tweet)
    clean_tweet = (' '.join(words)).strip()
    return clean_tweet

def get_sentiment_score(tweet):
    client = language.LanguageServiceClient()
    document = types\
               .Document(content=tweet,
                         type=enums.Document.Type.PLAIN_TEXT)
    sentiment_score = client\
                      .analyze_sentiment(document=document)\
                      .document_sentiment\
                      .score
    return sentiment_score

def analyze_tweets(keyword, total_tweets):
    score = 0
    tweets = search_tweets(keyword, total_tweets)
    for tweet in tweets:
        cleaned_tweet = clean_tweets(tweet.text.encode('utf-8'))
        sentiment_score = get_sentiment_score(cleaned_tweet)
        score += sentiment_score
        print('Tweet: {}'.format(cleaned_tweet))
        print('Score: {}\n'.format(sentiment_score))
    final_score = round((score / float(total_tweets)),2)
    return final_score

def send_the_result(bot, update):
    keyword = update.message.text
    final_score = analyze_tweets(keyword, 50)
    if final_score <= -0.25:
        status = 'NEGATIVE ❌'
    elif final_score <= 0.25:
        status = 'NEUTRAL ?'
    else:
        status = 'POSITIVE ✅'
    bot.send_message(chat_id=update.message.chat_id,
                     text='Average score for '
                           + str(keyword) 
                           + ' is ' 
                           + str(final_score) 
                           + ' ' 
                           + status)

def main():
    updater = Updater('YOUR_TOKEN')
    dp = updater.dispatcher
    dp.add_handler(MessageHandler(Filters.text, send_the_result))
    updater.start_polling()
    updater.idle()
    
if __name__ == '__main__':
    main()

Save the file and name it main.py, then run the program.

python3 main.py

Go to your Telegram bot by accessing this URL: https://telegram.me/YOUR_BOT_USERNAME. Type any product, person’s name, or whatever you want and send it to your bot. If everything runs well, there should be a detailed sentiment score for each tweet in the terminal. The bot will reply with the average sentiment score.

For example, I typed valentino rossi and sent it to the bot.

If you managed to follow the steps until the end of this tutorial, that’s awesome! You have your sentiment analyzer now, how cool is that!?

You can also check out my GitHub to get the code. Please do not hesitate to connect and leave a message on my LinkedIn profile if you want to ask about anything.

Please leave a comment if you think there are any errors in my code or writing.

Thank you and good luck! :)

Translated from: https://www.freecodecamp.org/news/how-to-make-your-own-sentiment-analyzer-using-python-and-googles-natural-language-api-9e91e1c493e/
