Common Preprocessing Methods for Text Modeling

Latest recommended article published 2024-03-14 09:31:31

Author: mmc2015

Category: Machine Learning / Text Mining
Tags: machine learning, data mining, text modeling, feature preprocessing

Article link: https://blog.csdn.net/mmc2015/article/details/46730289

I have recently been reading about text modeling. Given a large chunk of text, how do you turn it into a model?

Take a MeTA configuration as an example:

[[analyzers]] 
method = "ngram-word" 
ngram = 1 
	[[analyzers.filter]] 
	type = "whitespace-tokenizer" 
	[[analyzers.filter]] 
	type = "lowercase" 
	[[analyzers.filter]] 
	type = "alpha" 
	[[analyzers.filter]] 
	type = "length" 
	min = 2 
	max = 35 
	[[analyzers.filter]] 
	type = "list" 
	file = "lemur-stopwords.txt" 
	[[analyzers.filter]] 
	type = "porter2-stemmer"

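This tells MeTA how to process the text before indexing the documents. “ngram = 1” configures MeTA to use unigrams (single words). Each “[[analyzers.filter]]” tag defines a text filter that applies a particular function to the text. The filters run in the order listed: tokenize on whitespace, lowercase, keep alphabetic characters, keep tokens of 2 to 35 characters, drop stop words from “lemur-stopwords.txt”, and stem with the Porter2 stemmer.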
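To make the filter chain in the configuration above concrete, here is a minimal Python sketch that mimics each step in order. It is only an approximation, not MeTA's implementation: the stop-word set is a tiny hypothetical stand-in for lemur-stopwords.txt, `crude_stem` is a toy suffix stripper rather than the real Porter2 algorithm, the "alpha" step is assumed to strip non-letter characters (ASCII only here), and the names `analyze`, `crude_stem`, and `STOPWORDS` are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for lemur-stopwords.txt (the real list is much longer).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in"}


def crude_stem(token):
    """Toy suffix stripper. NOT Porter2; only a placeholder for the stemming step."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, min_len=2, max_len=35):
    """Approximate the configured filter chain and return unigram counts."""
    tokens = text.split()                                   # whitespace-tokenizer
    tokens = [t.lower() for t in tokens]                    # lowercase
    tokens = [re.sub(r"[^a-z]", "", t) for t in tokens]     # alpha (assumed: keep letters only)
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]  # length, min = 2, max = 35
    tokens = [t for t in tokens if t not in STOPWORDS]      # list (stop-word removal)
    tokens = [crude_stem(t) for t in tokens]                # stemmer placeholder
    return Counter(tokens)                                  # ngram = 1: count single words


counts = analyze("The cats are chasing a mouse; the mouse escaped!")
print(counts)
```

The resulting counts are the bag-of-words representation that would then be indexed. For real stemming, a library implementation of the Snowball English (Porter2) stemmer could replace `crude_stem`.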
