Big data analysis tag library: tag analysis


This article follows on from the analysis started here.


We are going to look at the tags used in our 60,000 StackOverflow questions with quality ratings. This should give us a better understanding of the data and, with a bit of work, we may already be able to spot some trends.


Introduction

In this article, we want to do a few things with the Tags field. We want to see what the bulk of the questions are about, and also whether some tag combinations are common. All of this will eventually be compared against the quality rating of each post to try to identify trends.

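One way to look for common tag combinations is to count every pair of tags that appears together on a question. The sketch below is a minimal illustration of that idea, not the author's code. It assumes the Tags field holds strings of the form `<tag1><tag2>`, which is what the cleaning functions in this article expect. The sample rows and the `parse_tags` helper are hypothetical. Pairs are sorted so that the same two tags always produce the same key.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample of the Tags field, in the '<tag1><tag2>' format.
raw_tags = [
    "<python><pandas><dataframe>",
    "<python><pandas>",
    "<java><android>",
    "<python><dataframe><pandas>",
]


def parse_tags(text):
    """Hypothetical helper: split '<a><b><c>' into ['a', 'b', 'c']."""
    return text.strip("<>").split("><")


pair_counts = Counter()
for text in raw_tags:
    # Deduplicate and sort so ('pandas', 'python') and ('python', 'pandas')
    # are counted as the same combination.
    tags = sorted(set(parse_tags(text)))
    pair_counts.update(combinations(tags, 2))

print(pair_counts.most_common(3))
```

Joining these pair counts with each question's quality rating would be the next step towards the trends mentioned above.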

To that end, we are going to use lambda functions, write cleaning functions, build a bag of words, create a wordcloud and use nltk's FreqDist.

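As a rough sketch of how these pieces fit together, the example below uses a lambda to parse each Tags string and flattens the results into a bag of words. It then counts the tags and joins them into the single space-separated string a wordcloud is built from. The sample rows are hypothetical. `collections.Counter` stands in for nltk's `FreqDist`; FreqDist is a subclass of Counter and offers the same `most_common()` method, so this sketch needs no third-party packages.

```python
from collections import Counter

# Hypothetical sample of the Tags field ('<tag1><tag2>' strings).
raw_tags = [
    "<python><pandas>",
    "<java><android>",
    "<python><list>",
    "<c++><templates>",
]

# A lambda turns each '<a><b>' string into a list of tag names.
tag_lists = list(map(lambda text: text.strip("<>").split("><"), raw_tags))

# Bag of words: one flat list holding every tag occurrence.
bag_of_words = [tag for tags in tag_lists for tag in tags]

# nltk.FreqDist(bag_of_words) would give the same counts and most_common();
# Counter keeps this sketch dependency-free.
freq = Counter(bag_of_words)
print(freq.most_common(2))

# One space-separated document: the kind of string WordCloud().generate() takes.
document = " ".join(bag_of_words)
print(document)
```

For the actual wordcloud, the tags would first go through the more invasive cleaning function so that tags such as `c++` survive tokenization.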

Imports and cleaning functions

There is nothing fancy about the cleaning functions, but the one we use for our wordclouds is a little more invasive, to get rid of some noise.


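```python
from nltk import FreqDist
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Large figures so the wordclouds stay readable
plt.rcParams['figure.figsize'] = (30, 30)


def wc(text):
    """
    Cleaning function to be used with our first wordcloud.

    Turns '<tag1><tag2>' into 'tag1 tag2' and rewrites characters that
    would otherwise be split on or dropped when the text is cut into words.
    """
    if text:
        tags = text.replace('><', ' ')       # tag separator -> space
        tags = tags.replace('-', '')         # 'objective-c' -> 'objectivec'
        tags = tags.replace('.', 'DOT')      # 'asp.net' -> 'aspDOTnet'
        tags = tags.replace('c++', 'Cpp')    # keep C++ distinct from C
        tags = tags.replace('c#', 'Csharp')  # keep C# distinct from C
        tags = tags.replace('>', '')         # strip the remaining outer brackets
        return tags.replace('<', '')
    else:
        return 'None'


def clean_tags(text):
    """
    Cleaning function for tags: '<tag1><tag2>' -> 'tag1 tag2'.
    """
    if text:
        tags = text.replace('><', ' ')
        tags = tags.replace('>', '')
        return tags.replace('<', '')
    else:
        return 'None'
```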
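To see why `wc` is more invasive than `clean_tags`, it helps to look at how a word-based tokenizer treats raw tag names. The sketch below assumes a tokenizer that keeps only runs of word characters. This roughly approximates how WordCloud splits text into words by default; it is not its exact rule. The `tokens` helper is hypothetical. The `cleaned` string is what `wc('<c++><c#><asp.net><objective-c>')` returns.

```python
import re

# Assumption: a tokenizer keeping runs of word characters (an approximation
# of a wordcloud's default word splitting, not its exact rule).
TOKEN_RE = re.compile(r"\w[\w']*")


def tokens(text):
    """Hypothetical helper: return the words the regex finds in text."""
    return TOKEN_RE.findall(text)


raw = "c++ c# asp.net objective-c"
# wc('<c++><c#><asp.net><objective-c>') returns this string
cleaned = "Cpp Csharp aspDOTnet objectivec"

print(tokens(raw))      # 'c++' and 'c#' both collapse to 'c'; '.' and '-' split tags
print(tokens(cleaned))  # every tag stays a single, distinct word
```

On the raw string, three different tags end up contributing the word `c`, and `asp.net` becomes two unrelated words. That is the noise the replacements in `wc` are meant to remove.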

Wordclouds

WordCloud needs a single document of space-separated words. We are going to create a list of words and then use the
