sklearn.feature_extraction.text.CountVectorizer

Latest recommended article published on 2023-02-10 15:45:14

飞奔的帅帅


Category: Python basics
Tags: sklearn.feature_extraction.tex, CountVector, python, text feature extraction

Article link: https://blog.csdn.net/ustbbsy/article/details/80047916


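This article covers the parameters of sklearn.feature_extraction.text.CountVectorizer, usage examples, and the concept of 2-grams. The class is used for text feature extraction, and the article compares it with related tools such as TfidfVectorizer. It explains the token_pattern parameter in detail and shows how 1-grams and 2-grams differ in use.

Summary generated by CSDN using AI technology.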

1. Parameters

sklearn.feature_extraction.text.CountVectorizer is one of the text feature extraction tools provided by sklearn.feature_extraction.text.

sklearn.feature_extraction.text provides four text feature extraction tools:

CountVectorizer
TfidfVectorizer
TfidfTransformer
HashingVectorizer
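
To make the relationship between these four tools concrete, here is a minimal sketch on a made-up two-sentence corpus (the corpus and variable names are assumptions for illustration, not from the original article). It shows that CountVectorizer produces raw token counts, that TfidfVectorizer gives the same result as CountVectorizer followed by TfidfTransformer, and that HashingVectorizer needs no fitted vocabulary.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical toy corpus, used only for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# CountVectorizer: learn a vocabulary, then count each token per document.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(corpus)  # sparse document-term matrix

# Recover the feature names in column order from the fitted vocabulary.
vocab = sorted(count_vec.vocabulary_, key=count_vec.vocabulary_.get)
print(vocab)
print(counts.toarray())

# TfidfVectorizer in one step versus CountVectorizer + TfidfTransformer.
tfidf_direct = TfidfVectorizer().fit_transform(corpus)
tfidf_two_step = TfidfTransformer().fit_transform(counts)
print(np.allclose(tfidf_direct.toarray(), tfidf_two_step.toarray()))

# HashingVectorizer: stateless, maps tokens to a fixed number of columns.
hashed = HashingVectorizer(n_features=16).transform(corpus)
print(hashed.shape)
```

HashingVectorizer trades the readable vocabulary for a fixed memory footprint, so it suits large or streaming corpora, while the other three keep a token-to-column mapping that can be inspected.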

Let's look at the parameters of this class:

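sklearn.feature_extraction.text.CountVectorizer(
input='content',         # input type: one of 'filename', 'file', or 'content' (the raw text)
encoding='utf-8',        # encoding used to decode the input; utf-8 by default
decode_error='strict',   # how decoding errors are handled: one of {'strict', 'ignore', 'replace'}
strip_accents=None,      # accent removal: one of {'ascii', 'unicode', None}; 'ascii' is fast but only handles characters with a direct ASCII mapping, 'unicode' handles any character but is slower
lowercase=True,          # convert text to lowercase before tokenizing
preprocessor=None,
tokenizer=None,
stop_words=None,         # stop words: very frequent words that carry little meaning, e.g. a, the, an
token_pattern=r'(?u)\b\w\w+\b',  # regex that defines a token; the default keeps tokens of 2 or more word characters
ngram_range=(1, 1),      # (min_n, max_n) range of n-gram sizes to extract
analyzer='word',
ma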
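
The effect of token_pattern, ngram_range, stop_words, and strip_accents is easiest to see on a small example. The sketch below uses made-up sentences and a hypothetical helper, vocab_of, that are not part of the original article; it only relies on the documented CountVectorizer parameters listed above.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample documents, used only for illustration.
docs = ["I love a good book", "a good book is a good friend"]


def vocab_of(vectorizer):
    """Return the learned feature names in column order (helper for this sketch)."""
    return sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)


# Default token_pattern keeps only tokens with 2+ word characters,
# so the single-character words "I" and "a" are dropped.
default_vocab = vocab_of(CountVectorizer().fit(docs))
print(default_vocab)

# A looser pattern keeps single-character tokens as well.
single_char_vocab = vocab_of(CountVectorizer(token_pattern=r"(?u)\b\w+\b").fit(docs))
print(single_char_vocab)

# ngram_range=(2, 2) extracts 2-grams (pairs of adjacent tokens) only;
# ngram_range=(1, 1), the default, extracts 1-grams (single tokens).
bigram_vocab = vocab_of(CountVectorizer(ngram_range=(2, 2)).fit(docs))
print(bigram_vocab)

# A custom stop word list removes the listed tokens from the vocabulary.
stop_vocab = vocab_of(CountVectorizer(stop_words=["is"]).fit(docs))
print(stop_vocab)

# strip_accents folds accented characters; lowercase=True folds case.
accent_vocab = vocab_of(CountVectorizer(strip_accents="unicode").fit(["Café CAFE"]))
print(accent_vocab)
```

Note that the 2-grams are built from the tokens that survive token_pattern, which is why "love good" appears even though "a" sits between the two words in the original sentence.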
