python tfidf feature transformation: getting weighted features from a TfidfVectorizer transform

Latest recommended article published on 2021-01-21 21:58:08

我要抢一个娘亲

Tags: python tfidf feature transformation

Article link: https://blog.csdn.net/weixin_42502811/article/details/112835081

Details of how scikit-learn computes tf-idf are available in the scikit-learn documentation. Below is an example that uses word n-grams.

from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.metrics.pairwise import cosine_similarity

# Train the vectorizer

text="this is a simple example"

singleTFIDF = TfidfVectorizer(ngram_range=(1,2)).fit([text])

singleTFIDF.vocabulary_ # show each term and its column index in the matrix

# Analyse the training string - text

single=singleTFIDF.transform([text])

single.toarray() # displays the resulting matrix - all values are equal because every vocabulary term occurs exactly once in text

# Analyse three new strings with the trained vectorizer

doc_1 = ['is this example working', 'hopefully it is a good example', 'no matching words here']

query = singleTFIDF.transform(doc_1)

query.toarray() # displays the resulting matrix - only matched terms have non-zero values

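# Compute the cosine similarity between text and each string in doc_1 - the second string has only two matching terms, so it scores lower than the first; the third has none, so it scores 0

cos_similarity = cosine_similarity(single, query) # cosine_similarity accepts sparse matrices directly, so the dense .A conversion is not needed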

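Output: cos_similarity is a 1×3 array holding one score per string in doc_1: about 0.655 for the first string (three matching terms), about 0.535 for the second (two matching terms), and 0 for the third (no matching terms).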
