sklearn.feature_extraction.text.TfidfVectorizer
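from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
"""
tf-idf(t,d) = tf(t,d) * idf(t)
idf(t) = log(n_d / df(d,t)) + 1
Smoothed version: idf(t) = log((1 + n_d) / (1 + df(d,t))) + 1
tf(t,d) is the tf value: the frequency of term t within a given document d. As the formula shows, tf is determined jointly by the term and the document.
idf(t) is term t's ...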
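As a rough check of the two idf formulas above, the sketch below recomputes them by hand with NumPy and compares the result with what `TfidfVectorizer` reports. The three-sentence corpus and the helper name `fit_and_recompute` are made up for illustration and are not from the original post. `norm=None` is passed so that the default L2 row normalization does not hide the raw tf * idf products.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus (an assumption for illustration only).
corpus = [
    "the cat sat",
    "the dog sat",
    "the cat ran",
]
n_d = len(corpus)  # number of documents


def fit_and_recompute(smooth):
    """Fit TfidfVectorizer and recompute idf / tf-idf by hand for comparison."""
    # norm=None keeps the unnormalized tf * idf values.
    vec = TfidfVectorizer(smooth_idf=smooth, norm=None)
    tfidf = vec.fit_transform(corpus).toarray()

    # Terms ordered by their column index in the fitted vocabulary.
    terms = sorted(vec.vocabulary_, key=vec.vocabulary_.get)

    # tf(t,d): raw count of term t in document d.
    tf = np.array([[doc.split().count(t) for t in terms] for doc in corpus],
                  dtype=float)
    # df(d,t): number of documents that contain term t.
    df = (tf > 0).sum(axis=0)

    if smooth:
        idf = np.log((1 + n_d) / (1 + df)) + 1  # smoothed version
    else:
        idf = np.log(n_d / df) + 1              # plain version
    return vec.idf_, idf, tfidf, tf * idf


idf_sk_smooth, idf_manual_smooth, tfidf_sk_smooth, tfidf_manual_smooth = fit_and_recompute(True)
idf_sk_plain, idf_manual_plain, tfidf_sk_plain, tfidf_manual_plain = fit_and_recompute(False)

print(np.allclose(idf_sk_smooth, idf_manual_smooth))
print(np.allclose(idf_sk_plain, idf_manual_plain))
```

In this toy corpus "the" appears in every document, so its idf is log(1) + 1 = 1 under both formulas: the trailing "+ 1" is what keeps such a term from being zeroed out entirely.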
Original post
2020-11-15 19:46:49