Making a Word Cloud

Arvin2015

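Tags: word cloud, word segmentation, word frequency

Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/arvin2015/article/details/79280354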

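1. Load the packages for word segmentation and word clouds

# Load the segmentation libraries (Rwordseg depends on rJava)

library("rJava")

library("Rwordseg")

# Load the libraries for drawing the word cloud

library("RColorBrewer")

library("wordcloud")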

2. Import the text data

# Read the data (note that read.csv can, perhaps surprisingly, also read plain .txt files)

myfile <- read.csv(file.choose(), header = FALSE)
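
As a minimal sketch of what this step does, the following writes a throwaway text file and reads it back. The file contents and the names `tmp`, `demo` and `demo.lines` are made up for illustration, and `readLines` is shown only as an alternative that is not part of the original script. One thing worth knowing is that `read.csv` splits each line on ASCII commas, so a line that contains a comma is spread over several columns.

```r
# Write a small throwaway text file so the example is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line of text", "second line, with a comma"), tmp)

# read.csv treats each line as a row and splits it on commas
demo <- read.csv(tmp, header = FALSE, stringsAsFactors = FALSE)
print(dim(demo))

# readLines keeps every line whole, one string per line
demo.lines <- readLines(tmp)
print(length(demo.lines))
```

If the text contains many ASCII commas, reading it line by line may be the safer choice before segmentation.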

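3. Segment the text

# Segment the text and flatten the result into a vector (myfile.res was never defined in the original; it is assumed here to be the text cells of myfile)

myfile.res <- as.character(unlist(myfile))

myfile.words <- unlist(lapply(X = myfile.res, FUN = segmentCN))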

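4. Remove URLs and other unwanted characters; to strip any further special characters, add more gsub calls following the pattern below

# Remove http URLs

myfile.words <- gsub(pattern = "http:[a-zA-Z\\/\\.0-9]+", "", myfile.words)

# Remove line breaks

myfile.words <- gsub("\n", "", myfile.words)

# Remove full-width (ideographic) spaces

myfile.words <- gsub("　", "", myfile.words)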
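
The effect of these substitutions can be checked on a few hand-made tokens. The sample vector `tokens` below is invented for the example, and the full-width space is written as the escape `\u3000` so the snippet does not depend on the file encoding.

```r
# Invented sample tokens: a URL, a token with a line break,
# a token with a leading full-width space, and a clean token
tokens <- c("http://example.com/a1", "word\n", "\u3000cloud", "ok")

# Same three substitutions as in the step above
tokens <- gsub("http:[a-zA-Z\\/\\.0-9]+", "", tokens)
tokens <- gsub("\n", "", tokens)
tokens <- gsub("\u3000", "", tokens)

print(tokens)
```

A token that consisted only of a URL becomes an empty string rather than disappearing; such leftovers are dropped later by the length filter in step 6. The pattern only matches `http:`; writing `https?:` instead would cover `https` links as well.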

5. Remove stop words

# Read the stop-word list (one word per line) as character data

data_stw <- read.table(file = file.choose(), colClasses = "character")

# The first column holds the stop words

stopwords_CN <- data_stw[, 1]

# Drop every token that appears in the stop-word list

myfile.words <- myfile.words[!(myfile.words %in% stopwords_CN)]
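
Stop-word removal boils down to keeping every token that is not in the list. The sketch below uses made-up English tokens and a made-up stop-word list to show that removing one stop word per loop pass and filtering with `%in%` give the same result.

```r
# Toy tokens and a toy stop-word list (both invented for this example)
words <- c("the", "word", "cloud", "of", "the", "text")
stopwords <- c("the", "of")

# Loop form: remove one stop word per pass
kept.loop <- words
for (j in seq_along(stopwords)) {
  kept.loop <- subset(kept.loop, kept.loop != stopwords[j])
}

# Vectorized form: keep the tokens that are not in the list
kept.vec <- words[!(words %in% stopwords)]

print(kept.vec)
print(identical(kept.loop, kept.vec))
```

The vectorized form makes a single pass over the tokens, which tends to matter once the stop-word list has hundreds of entries.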

6. Filter out single-character words

myfile.words<- subset(myfile.words,nchar(as.character(myfile.words))>1)
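
This filter relies on `nchar` counting characters rather than bytes, which is its default behaviour. A small sketch with invented tokens, where the Chinese characters are written as Unicode escapes so the snippet is independent of the file encoding:

```r
# Invented tokens: a two-character Chinese word, a one-character
# Chinese word, a one-letter token and an empty string
words <- c("\u8bcd\u4e91", "\u7684", "R", "")

# Character counts, not byte counts
lengths <- nchar(words)
print(lengths)

# Keep only tokens longer than one character
kept <- subset(words, nchar(as.character(words)) > 1)
print(length(kept))
```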

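7. Count word frequencies

# Count how often each word occurs

myfile.freq <- table(unlist(myfile.words))

# Sort from most to least frequent

myfile.freq <- rev(sort(myfile.freq))

# Build a data frame; because freq is given a table, it expands into two columns, so the counts end up in the column freq.Freq

myfile.freq <- data.frame(word = names(myfile.freq), freq = myfile.freq)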
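
The column names produced by this step are easy to get wrong, so here is a small sketch on invented tokens. It first repeats the construction above and prints the resulting column names, then shows a flatter alternative that is not part of the original script: converting the table with `as.vector` gives a plain `freq` column. The names `counts`, `nested`, `flat` and `frequent` are made up for the example.

```r
# Invented tokens
words <- c("cloud", "word", "cloud", "text", "cloud", "word")

# Same construction as above: count, then sort in decreasing order
counts <- rev(sort(table(unlist(words))))

# Passing the table itself expands it into two columns
nested <- data.frame(word = names(counts), freq = counts)
print(names(nested))

# Alternative (an assumption, not the original code):
# strip the table down to a plain vector of counts
flat <- data.frame(word = names(counts), freq = as.vector(counts),
                   stringsAsFactors = FALSE)
print(flat)

# Frequency threshold on the flat version
frequent <- subset(flat, freq >= 2)
print(frequent$word)
```

With the flat version, later steps would refer to `freq` instead of `freq.Freq`.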

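8. Filter by frequency: drop words that occur only once; adjust the threshold as needed

# The counts are in the column freq.Freq (the same column that step 9 plots), not in a column named freq

myfile.freq2 <- subset(myfile.freq, freq.Freq >= 2)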

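9. Draw the word cloud

# mycolors was not defined in the original; a ColorBrewer palette is assumed here

mycolors <- brewer.pal(8, "Dark2")

# family = "myFont" assumes a font family of that name was registered beforehand (for example with windowsFonts() on Windows); the original does not show this

wordcloud(myfile.freq2$word, myfile.freq2$freq.Freq, random.order = FALSE, random.color = FALSE, colors = mycolors, family = "myFont")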
