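Words and terms: used as feature values
Method 1: sklearn.feature_extraction.text.CountVectorizer(stop_words=[])
Returns the number of times each word occurs, as a term-count matrix. stop_words=[] is the list of stop words to exclude from the vocabulary.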
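To make the effect of stop_words concrete, here is a minimal sketch. The two sample sentences and the stop-word list are invented for illustration and are not from the original notes; the vocabulary is read from the fitted vocabulary_ attribute.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample documents, chosen only for this illustration.
docs = ["life is short", "life is long and life is good"]

# Without stop words, every token of two or more characters becomes a feature.
plain = CountVectorizer()
plain.fit_transform(docs)
plain_vocab = sorted(plain.vocabulary_)

# With a stop-word list, the listed tokens are dropped from the vocabulary.
filtered = CountVectorizer(stop_words=["is", "and"])
filtered_counts = filtered.fit_transform(docs)
filtered_vocab = sorted(filtered.vocabulary_)

print(plain_vocab)
print(filtered_vocab)
print(filtered_counts.toarray())  # one row per document, one column per word
```

The columns of the count matrix follow the alphabetical order of the remaining vocabulary, so each row can be read against filtered_vocab.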
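·CountVectorizer.fit_transform(X)  X: an iterable of text strings, such as a list of documents (a single bare string is rejected). Returns: a sparse matrix
·CountVectorizer.inverse_transform(X)  X: an array or sparse matrix. Returns: the data in its form before the transformation (the terms present in each document)
·CountVectorizer.get_feature_names()  Returns: the list of words. In scikit-learn 1.0 and later this method is get_feature_names_out(); get_feature_names() was removed in 1.2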
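The methods listed above can be tried together in a short sketch. The sample documents are assumptions made up for this example. Note that inverse_transform gives back only which terms occur in each document, not their original order or counts.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample documents, chosen only for this illustration.
docs = ["enjoy life", "this is the life"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse matrix: documents x words

# Words ordered by their column index in the matrix.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# For each document, the terms whose count is non-zero.
recovered = [sorted(terms.tolist()) for terms in vectorizer.inverse_transform(counts)]

print(issparse(counts), counts.shape)
print(vocab)
print(counts.toarray())
print(recovered)
```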
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

data = ["Maybe it was better to just really enjoy life. this is the life",
        "享受生活,顺其自然。这就是生活"]
transfer = CountVectorizer()  # instantiate a transformer
data_new = transfer.fit_transform(data)  # call fit_transform()
# print(data_new)
# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2
print(transfer.get_feature_names_out())
print(data_new.toarray())
# build a two-dimensional table: data=pd.DataFrame(data_new
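The last line of the code above is cut off in the source, so the intended DataFrame call is not known. The following is only one possible way to turn a count matrix into a two-dimensional table; the use of the vocabulary as column labels and the second sample document are assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Maybe it was better to just really enjoy life. this is the life",
        "enjoy life"]  # second document is a hypothetical addition

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Assumption: label each column with its word, ordered by column index.
columns = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# toarray() converts the sparse matrix to a dense array that pandas accepts.
table = pd.DataFrame(counts.toarray(), columns=columns)
print(table)
```

Each row of the table is a document and each column a word, so table["life"] gives the count of "life" in every document.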