29. MATLAB NLP: Simple Text Analysis with the Text Analytics Toolbox

waiting不是违停

Article link: https://blog.csdn.net/weixin_44737922/article/details/105179952

Based on the official MATLAB tutorial "Create Simple Text Model for Classification".

1. Class-frequency histogram:

Functions used: categorical (creates a categorical array) and histogram (plots how often each category occurs).

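How data.event_type changes after data.event_type = categorical(data.event_type);

The quotation marks "" around each value disappear, because the column is now a categorical array rather than a string array.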
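A minimal sketch of the conversion on a few hypothetical labels (made up for illustration, not taken from the tutorial's dataset). The string array is displayed with quotes and the categorical array is not; categories and countcats then return the class names and the per-class counts, which are the same numbers the histogram draws as bars.

```matlab
% Hypothetical labels standing in for data.event_type
labels = ["Hail"; "Flood"; "Hail"; "Thunderstorm Wind"; "Hail"];
disp(labels)              % string array: values shown in quotes

c = categorical(labels);  % convert strings to a categorical array
disp(c)                   % categorical array: no quotes

cats = categories(c);     % unique class names, sorted alphabetically
counts = countcats(c);    % number of elements in each class
disp(table(cats, counts))
```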

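% Assumes data is a table loaded beforehand, with string columns event_type and event_narrative
% Convert the class labels from strings to categories
data.event_type = categorical(data.event_type);
% Plot how many reports belong to each class
figure
h = histogram(data.event_type);
xlabel("Class")
ylabel("Frequency")
title("Class Distribution")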

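The histogram shows how many reports fall into each class. For a real NLP task the text itself would also need cleaning: removing stop words, as well as words that are too long or too short.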
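That cleanup could look like the following minimal sketch, assuming the Text Analytics Toolbox is installed. The two sentences are hypothetical, and the length limits (2 and 15 characters) are illustrative choices, not values from the post.

```matlab
% Hypothetical report texts
str = ["The storm knocked down several large trees."; ...
       "Large hail was reported near the river."];

documents = tokenizedDocument(str);          % split text into tokens
documents = lower(documents);                % normalize case
documents = erasePunctuation(documents);     % strip punctuation
documents = removeStopWords(documents);      % drop common words such as "the"
documents = removeShortWords(documents, 2);  % drop words with 2 or fewer characters
documents = removeLongWords(documents, 15);  % drop words with 15 or more characters

details = tokenDetails(documents);           % one row per remaining token
tokens = details.Token;
disp(tokens')
```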

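2. Remove low-frequency classes

Some classes occur too rarely, so remove them using the counts from the histogram.

This uses MATLAB's object-oriented syntax: h is a Histogram object, and its BinCounts and Categories properties are read with dot notation.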

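% Read the per-class counts and class names from the Histogram object
classCounts = h.BinCounts;
classNames = h.Categories;
% Find the classes with fewer than 10 observations
idxLowCounts = classCounts < 10;
infrequentClasses = classNames(idxLowCounts);
% Delete every row whose label is one of those classes
idxInfrequent = ismember(data.event_type,infrequentClasses);
data(idxInfrequent,:) = [];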
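To check the logic without plotting, this sketch applies the same filter to a hypothetical six-row table. It uses countcats in place of the histogram's BinCounts, and a threshold of 2 instead of 10 so the toy data has something to remove. It also calls removecats afterwards: deleting rows does not delete the category itself, so without it the removed class would still be listed by categories.

```matlab
% Hypothetical data in the same layout as the post's table
event_type = categorical(["Hail"; "Hail"; "Flood"; "Hail"; "Tornado"; "Flood"]);
event_narrative = ["r1"; "r2"; "r3"; "r4"; "r5"; "r6"];
data = table(event_type, event_narrative);

minCount = 2;                                   % illustrative threshold (the post uses 10)
classNames = categories(data.event_type);
classCounts = countcats(data.event_type);       % same values as h.BinCounts, as a column
infrequentClasses = classNames(classCounts < minCount);

idxInfrequent = ismember(data.event_type, infrequentClasses);
data(idxInfrequent,:) = [];                     % drop rows of rare classes
data.event_type = removecats(data.event_type);  % drop the now-unused categories
disp(data)
```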

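The same idea also works at the word level. Given a bag-of-words model bag built from the training documents, remove the words that appear only rarely, then remove the documents left empty together with their labels in YTrain: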

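% Remove words that appear 2 or fewer times in total
bag = removeInfrequentWords(bag,2);
% Remove documents that no longer contain any words
[bag,idx] = removeEmptyDocuments(bag);
YTrain(idx) = [];   % keep the labels aligned with the remaining documents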
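A self-contained sketch of this word-level filtering on hypothetical training texts (Text Analytics Toolbox assumed). Only "hail" appears more than twice in total, so removeInfrequentWords(bag,2) keeps just that word; the two documents left without any words are then removed, and their labels are deleted from YTrain so rows and labels stay aligned.

```matlab
% Hypothetical training texts and labels
textDataTrain = ["hail damaged cars"; "flood closed roads"; "hail hail"; "tornado"];
YTrain = categorical(["Hail"; "Flood"; "Hail"; "Tornado"]);

documents = tokenizedDocument(textDataTrain);
bag = bagOfWords(documents);             % word counts per document

bag = removeInfrequentWords(bag, 2);     % keep words seen more than 2 times in total
[bag, idx] = removeEmptyDocuments(bag);  % idx = indices of the removed documents
YTrain(idx) = [];                        % delete the matching labels

disp(bag.Vocabulary)
disp(YTrain)
```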

3. Split into training and test sets

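Use cvpartition to hold out 10% of the data as the test set and keep the remaining 90% as the training set. Because the labels data.event_type are passed as the first argument, the split is stratified, so each class keeps roughly the same proportion in both sets.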

% Stratified holdout split: 10% test, 90% training
cvp = cvpartition(data.event_type,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
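A minimal sketch of the stratified holdout on hypothetical labels (60/30/10 examples per class, fixed random seed; Statistics and Machine Learning Toolbox assumed). It uses the function forms training(cvp) and test(cvp) and shows that every observation lands in exactly one of the two sets, with the test counts per class printed for comparison.

```matlab
rng(0)   % fix the random seed so the split is reproducible
labels = categorical([repmat("Hail",60,1); repmat("Flood",30,1); repmat("Tornado",10,1)]);

cvp = cvpartition(labels, 'Holdout', 0.1);   % stratified 90/10 split
idxTrain = training(cvp);                    % logical index of training rows
idxTest = test(cvp);                         % logical index of test rows

fprintf("train: %d, test: %d\n", cvp.TrainSize, cvp.TestSize)
disp(table(categories(labels), countcats(labels(idxTest)), ...
    'VariableNames', {'Class', 'TestCount'}))
```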

Extract the text and the labels:

textDataTrain = dataTrain.event_narrative;
textDataTest = dataTest.event_narrative;
YTrain = dataTrain.event_type;
YTest = dataTest.event_type;

Then train a classifier on the training data with supervised learning.
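One way to continue, sketched under assumptions: the official example proceeds along these lines (bag-of-words features plus a linear ECOC classifier), but the tiny texts below are hypothetical stand-ins for textDataTrain/textDataTest, and both the Text Analytics Toolbox and the Statistics and Machine Learning Toolbox are required. On data this small the accuracy value is not meaningful; the sketch only shows the sequence of calls.

```matlab
% Hypothetical stand-ins for textDataTrain/YTrain and textDataTest/YTest
textDataTrain = ["large hail fell"; "hail damaged roofs"; "river flood closed roads"; "flood water rose"];
YTrain = categorical(["Hail"; "Hail"; "Flood"; "Flood"]);
textDataTest = ["hail on cars"; "flood near river"];
YTest = categorical(["Hail"; "Flood"]);

% Bag-of-words features from the training text
documentsTrain = tokenizedDocument(textDataTrain);
bag = bagOfWords(documentsTrain);
XTrain = bag.Counts;                          % sparse document-by-word count matrix

% Multiclass model built from linear binary learners
mdl = fitcecoc(XTrain, YTrain, 'Learners', 'linear');

% Encode the test text with the training vocabulary, then predict
documentsTest = tokenizedDocument(textDataTest);
XTest = encode(bag, documentsTest);
YPred = predict(mdl, XTest);
acc = sum(YPred == YTest) / numel(YTest)
```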
