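This tutorial shows how to create a word cloud from plain text by reading the text into a string array, preprocessing it, and passing it to the wordcloud
function. If you have Text Analytics Toolbox™ installed, you can create word clouds directly from string arrays.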
Read the text of Shakespeare's sonnets with the fileread
function, which returns the file contents as a single character vector.
sonnets = fileread('sonnets.txt');
sonnets(1:135)
ans =
'THE SONNETS
by William Shakespeare
I
From fairest creatures we desire increase,
That thereby beauty's rose might never die,'
Convert the text to a string with the string
function. Then split it at newline characters with the splitlines
function.
sonnets = string(sonnets);
sonnets = splitlines(sonnets);
sonnets(10:14)
ans = 5x1 string array
" From fairest creatures we desire increase,"
" That thereby beauty's rose might never die,"
" But as the riper should by time decease,"
" His tender heir might bear his memory:"
" But thou, contracted to thine own bright eyes,"
Replace some punctuation characters with spaces.
p = ["." "?" "!" "," ";" ":"];
sonnets = replace(sonnets,p," ");
sonnets(10:14)
ans = 5x1 string array
" From fairest creatures we desire increase "
" That thereby beauty's rose might never die "
" But as the riper should by time decease "
" His tender heir might bear his memory "
" But thou contracted to thine own bright eyes "
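Split sonnets
into a string array whose elements are individual words. To do this, join all the string elements into a single 1-by-1 string, then split it at whitespace characters.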
sonnets = join(sonnets);
sonnets = split(sonnets);
sonnets(7:12)
ans = 6x1 string array
"From"
"fairest"
"creatures"
"we"
"desire"
"increase"
Remove words with fewer than five characters.
sonnets(strlength(sonnets)<5) = [];
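The deletion above relies on logical indexing. Here is a minimal sketch of the same idea on a small, made-up word list (the variable names words and isShort are illustrative only and do not appear in the original example):

```matlab
% Hypothetical sample data, not taken from sonnets.txt
words = ["From"; "fairest"; ""; "we"; "desire"];

% strlength returns the number of characters in each element
isShort = strlength(words) < 5;   % logical mask: true where the word is too short

% Assigning [] to the masked elements deletes them from the array
words(isShort) = [];

disp(words)
```

Because an empty string has length 0, this filter also discards any empty elements left over from the earlier split step.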
Convert sonnets
to a categorical array, then plot it with wordcloud.
The function plots the unique elements of C,
sizing each word according to its frequency count.
C = categorical(sonnets);
figure
wordcloud(C);
title("Sonnets Word Cloud")
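To check which words should appear largest, you can tabulate the frequency counts yourself. The sketch below runs the same preprocessing steps on a short made-up text instead of sonnets.txt and then counts the categories. The sample text and the names txt, lines, words, names, counts, and T are assumptions made for illustration. This is one possible way to inspect the counts, not necessarily how wordcloud computes them internally.

```matlab
% Hypothetical sample text standing in for the contents of sonnets.txt
txt = sprintf('sweet love, renew thy force!\nsweet silent thought: sweet love remembered.');

% Same preprocessing steps as in the tutorial
lines = splitlines(string(txt));                          % one string per line
lines = replace(lines, ["." "?" "!" "," ";" ":"], " ");   % punctuation -> spaces
words = split(join(lines));                               % one string per word
words(strlength(words) < 5) = [];                         % drop short (and empty) words

% Count how often each remaining word occurs
C = categorical(words);
names = categories(C);            % unique words as a cell array, sorted alphabetically
counts = countcats(C);            % number of occurrences of each category
[counts, idx] = sort(counts, 'descend');
T = table(names(idx), counts, 'VariableNames', {'Word', 'Count'});

disp(T)
```

The words at the top of T are the ones you would expect to be drawn largest in the cloud.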