Find Relevant Top Hashtags Using Python - Part 1

In the age of social media, it is all about reaching a larger audience. One of the many ways of doing so is using hashtags. If you know what the top hashtags are, you can include them in your post and potentially be discovered by more people who are looking at posts associated with those hashtags.

The source code can be found here.

The outline:

Here, we will write code that takes some text input from the user, converts it to a hashtag and then retrieves top tweets with that hashtag from Twitter. The code will then scrape the retrieved tweets for other hashtags and finally return a list of all the hashtags for the user to review or use.

Getting user input

We will ask the user for input and save it to a variable “tag” after converting it to string.

tag = str(input("Please enter your hashtag/text: "))

Cleaning user input

Hashtags contain no spaces, and this code treats them as lower case. Therefore, you have to consider the possibility that the user provides text that is not all lower case and that contains spaces. We will write a function that takes the text, converts it to lower case and removes all spaces. It will also remove "#" from the beginning of the tag in case the user gave a hashtag as input.

def clean_input(tag):
    # Remove spaces, then drop a leading "#" and lower-case the result.
    tag = tag.replace(" ", "")
    if tag.startswith("#"):
        return tag[1:].lower()
    else:
        return tag.lower()
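
As a quick check of the cleaning step, the sketch below restates clean_input and runs it on a few inputs. The sample strings are made up for illustration and are not from the original article.

```python
def clean_input(tag):
    """Return the tag lower-cased, without spaces and without a leading '#'."""
    tag = tag.replace(" ", "")
    if tag.startswith("#"):
        tag = tag[1:]
    return tag.lower()


# Illustrative inputs: a hashtag with a space, plain text, and an already clean tag.
samples = ["#Data Science", "Machine Learning", "python"]
cleaned = [clean_input(s) for s in samples]
print(cleaned)  # ['datascience', 'machinelearning', 'python']
```

Note that only a "#" at the very start is removed; a "#" elsewhere in the text would be left in place.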

Extracting hashtags from tweets

Before we talk about retrieving the tweets, let’s write a function that extracts hashtags from tweets. Here is the code:

def return_all_hashtags(tweets, tag):
    all_hashtags = []
    for tweet in tweets:
        for word in tweet.split():
            # Keep every hashtag except the one the user searched for.
            if word.startswith("#") and word.lower() != "#" + tag.lower():
                all_hashtags.append(word.lower())
    return all_hashtags

The above function takes a list of tweets and the user's hashtag. It loops through the list of tweets, splits each tweet into words, and loops through the words. If a word starts with a hash (#) and is not the same as the user's input tag, it is added to the list of all hashtags. Finally, the function returns that list.
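
One thing to be aware of with splitting on whitespace is that trailing punctuation stays attached, so "#Python," would be kept as "#python," and would not match the user's tag. The sketch below is an alternative using a regular expression, not the article's method. The name extract_hashtags, the pattern and the sample tweets are assumptions made for illustration, and the pattern is a simplification of what Twitter accepts as a hashtag.

```python
import re

# Simplified pattern: a '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")


def extract_hashtags(tweets, tag):
    """Collect lower-cased hashtags from the tweets, skipping the searched tag."""
    excluded = "#" + tag.lower()
    found = []
    for tweet in tweets:
        for match in HASHTAG_PATTERN.findall(tweet):
            hashtag = match.lower()
            if hashtag != excluded:
                found.append(hashtag)
    return found


# Made-up tweets, one with punctuation directly after a hashtag.
sample_tweets = [
    "Learning #Python, one step at a time. #coding",
    "Great #python talk today! #DataScience",
]
print(extract_hashtags(sample_tweets, "python"))  # ['#coding', '#datascience']
```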

Retrieving tweets

We are going to use the Python library "tweepy" to retrieve tweets. To access the Twitter API, you will need to register as a developer and get your consumer key, consumer secret, access token and access token secret. Please refer to the tweepy authentication and Twitter API documentation for how to get these credentials.

Authenticating and setting up your access codes:

import tweepy as tw

# Replace the placeholder strings with your own credentials.
consumer_key = "[Your consumer key]"
consumer_secret = "[Your consumer secret]"
access_token = "[Your access token]"
access_token_secret = "[Your access token secret]"

auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)
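
Hard-coding credentials in a script makes them easy to leak, so one option is to read them from environment variables instead. The following is a minimal sketch under that assumption. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper load_credentials are hypothetical and not part of tweepy or the original article.

```python
import os

# Hypothetical variable names; use whatever names you export in your shell.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values could then be passed to tw.OAuthHandler and auth.set_access_token in place of the literal strings.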

Now let's put it all together in a function that takes the tag as input and, using our previous functions, cleans it, retrieves relevant tweets, extracts hashtags from them and returns the hashtags sorted by frequency, along with their counts.

import tweepy as tw

def get_hashtags(tag):
    search_tag = clean_input(tag)
    # Note: in Tweepy 4.0 and later, api.search was renamed to api.search_tweets.
    tweets = tw.Cursor(api.search,
                       q="#" + search_tag,
                       lang="en").items(200)
    tweets_list = []
    for tweet in tweets:
        tweets_list.append(tweet.text)
    all_tags = return_all_hashtags(tweets_list, search_tag)
    # Count how often each hashtag appears.
    frequency = {}
    for item in set(all_tags):
        frequency[item] = all_tags.count(item)
    # Return the hashtags sorted from most to least frequent.
    return {k: v for k, v in sorted(frequency.items(),
                                    key=lambda item: item[1], reverse=True)}
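
The counting and sorting step can also be written with collections.Counter from the standard library, which counts all items in one pass instead of calling list.count once per distinct hashtag. This is an alternative sketch, not the article's code. The function name count_hashtags and the sample list are made up for illustration.

```python
from collections import Counter


def count_hashtags(all_tags):
    """Return a dict of hashtag -> count, ordered from most to least frequent."""
    # most_common() yields (hashtag, count) pairs sorted by descending count.
    return dict(Counter(all_tags).most_common())


# Illustrative list of already extracted hashtags.
sample_tags = ["#coding", "#datascience", "#coding", "#ai", "#coding", "#ai"]
print(count_hashtags(sample_tags))
# {'#coding': 3, '#ai': 2, '#datascience': 1}
```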

Getting it all together

Finally, we will put it all together and print out the hashtags with their counts.

all_tags = get_hashtags(tag)
for item in all_tags:
    print(item, all_tags[item])

Next

Why limit yourself to Twitter? In part 2, we will add the code for scraping Instagram posts for hashtags.

Translated from: https://medium.com/swlh/find-relevant-top-hashtags-using-python-f3c46844c630
