sklearn.feature_extraction.text.TfidfVectorizer.fit_transform(text)
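def normal_test():
    from sklearn.feature_extraction.text import TfidfVectorizer
    corpus = [
        'This is the first document.',
        'This document is the second document.',
    ]
    vectorizer = TfidfVectorizer()
    # Learn the vocabulary and IDF weights, then return the sparse TF-IDF matrix
    X = vectorizer.fit_transform(corpus)
    # Each printed entry is (document index, term index) followed by the weight
    print(X)

normal_test()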
output:
(0, 0) 0.40909010368335985
(0, 1) 0.5749618667993135
(0, 4) 0.40909010368335985
(0, 2) 0.40909010368335985
(0, 5) 0.40909010368335985
(1, 3) 0.4691317250431934
(1, 0) 0.6675821723880022
(1, 4) 0.3337910861940011
(1, 2) 0.3337910861940011
(1, 5) 0.3337910861940011
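- Function overview:
Computes the TF-IDF weight of each term within the document it appears in, that is, the term frequency multiplied by the inverse document frequency.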
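
To make the numbers in the output above easier to read, here is a minimal pure-Python sketch that recomputes them by hand. It assumes the default TfidfVectorizer settings (lowercasing, tokens of two or more word characters, smooth_idf=True, norm='l2'); the names docs, vocab, idf, tfidf_row and rows are illustrative and not part of scikit-learn.

```python
import math
import re
from collections import Counter

corpus = [
    'This is the first document.',
    'This document is the second document.',
]

# Lowercase and keep tokens of two or more word characters,
# mirroring the default token_pattern r"(?u)\b\w\w+\b".
docs = [re.findall(r"\b\w\w+\b", text.lower()) for text in corpus]

# Column indices follow the alphabetically sorted vocabulary.
vocab = sorted({tok for doc in docs for tok in doc})
n_docs = len(docs)

# Smoothed IDF (assumes smooth_idf=True): ln((1 + n) / (1 + df)) + 1
df = Counter(tok for doc in docs for tok in set(doc))
idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}


def tfidf_row(doc):
    """Return the L2-normalised TF-IDF weights for one tokenised document."""
    tf = Counter(doc)
    raw = [tf[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(w * w for w in raw))
    return [w / norm for w in raw]


rows = [tfidf_row(doc) for doc in docs]

# Print the non-zero entries as (document index, term index), term, weight.
for i, row in enumerate(rows):
    for j, weight in enumerate(row):
        if weight > 0:
            print((i, j), vocab[j], weight)
```

Under these assumptions, term index 0 is "document" and index 1 is "first". "first" gets the largest weight in document 0 because it occurs in only one document, while "document" gets the largest weight in document 1 because it occurs there twice.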