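This example shows how to create a word cloud from plain text by reading it into a string array, preprocessing it, and passing it to the wordcloud function. If you have Text Analytics Toolbox installed, you can create word clouds directly from string arrays. For more information, see wordcloud (Text Analytics Toolbox).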
Read the text of Shakespeare's sonnets using the fileread function.
sonnets = fileread('sonnets.txt');
sonnets(1:135)
ans =
'THE SONNETS
by William Shakespeare
I
From fairest creatures we desire increase,
That thereby beauty's rose might never die,'
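As a minimal sketch of what fileread returns, the snippet below writes a small temporary file and reads it back. The file name and its two lines of content are hypothetical stand-ins for sonnets.txt. The point is that the result is a single char vector with the newline characters still embedded, which is why the text is converted and split by line before further processing.

```matlab
% Hypothetical temporary file standing in for sonnets.txt
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'first line\nsecond line\n');
fclose(fid);

txt = fileread(fname);      % entire file contents as one char vector
disp(class(txt))            % class of the value returned by fileread
disp(numel(txt))            % character count, newline characters included
delete(fname);              % clean up the temporary file
```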
Convert the text to a string using the string function. Then split it at the newline characters using the splitlines function.
sonnets = string(sonnets);
sonnets = splitlines(sonnets);
sonnets(10:14)
ans = 5x1 string
" From fairest creatures we desire increase,"
" That thereby beauty's rose might never die,"
" But as the riper should by time decease,"
" His tender heir might bear his memory:"
" But thou, contracted to thine own bright eyes,"
Replace some punctuation characters with spaces.
p = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,p," "); % replace each punctuation mark in p with a space
sonnets(10:14)
ans = 5x1 string
" From fairest creatures we desire increase "
" That thereby beauty's rose might never die "
" But as the riper should by time decease "
" His tender heir might bear his memory "
" But thou contracted to thine own bright eyes "
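Split sonnets into a string array whose elements contain individual words. To do this, join all the string elements into a 1-by-1 string and then split it at the whitespace characters.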
sonnets = join(sonnets);
sonnets = split(sonnets);
sonnets(7:12)
ans = 6x1 string
"From"
"fairest"
"creatures"
"we"
"desire"
"increase"
Remove words with fewer than five characters.
sonnets(strlength(sonnets)<5) = [];
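The preprocessing steps above can be sketched end to end on a tiny sample, which makes each intermediate result easy to inspect. The three-line sample text and the variable names txt, lines, and words are assumptions made for illustration; only the function calls mirror the example.

```matlab
% Hypothetical sample text, not taken from sonnets.txt
txt = sprintf('Roses are red,\nviolets are blue;\nsugar is sweet!');

lines = splitlines(string(txt));     % 3-by-1 string array, one element per line
p = ["." "?" "!" "," ";" ":"];
lines = replace(lines, p, " ");      % each listed punctuation mark becomes a space
words = split(join(lines));          % join into one string, then split at whitespace
words(strlength(words) < 5) = [];    % drop words under five characters (and any empty strings)
disp(words)
```

Because the length filter runs last, short leftovers from the earlier steps are removed along with common short words such as "are" and "is".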
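Convert sonnets to a categorical array and then plot it using wordcloud. The function plots the unique elements of C, with sizes corresponding to their frequency counts.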
C = categorical(sonnets);
figure
wordcloud(C);
title("Sonnets Word Cloud")
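To see the frequency counts that drive the word sizes, one option is to tabulate the categories directly. This is a minimal sketch on a hypothetical six-word sample, not the sonnets data, and it uses the standard categories and countcats functions rather than anything specific to wordcloud.

```matlab
% Hypothetical sample words, not taken from sonnets.txt
words = ["beauty"; "should"; "beauty"; "thine"; "beauty"; "thine"];
C = categorical(words);

names = categories(C);                      % unique words as a cell array
counts = countcats(C);                      % occurrences of each word, in the same order
[counts, order] = sort(counts, 'descend');  % most frequent first
names = names(order);

disp(table(names, counts))
```

Applied to the C built from the sonnets, the same calls would list the words that appear largest in the cloud first.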
Note: This article is adapted from content on the official MATLAB website.