http://scikit-learn.org/stable/modules/feature_extraction.html
Writing this while sick, from an internet cafe... your support is appreciated.
1、First, two concepts need to be distinguished: feature extraction and feature selection (feature extraction is very different from feature selection). The former consists in transforming arbitrary data, such as text or images, into numerical features usable for machine learning. The latter is a machine learning technique applied to these features (that is, choosing the better features from those that have already been extracted).
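To make the distinction concrete, here is a minimal plain-Python sketch. It is not scikit-learn code: the helpers `extract_features` and `select_features` and the sample records are hypothetical, made up for illustration. Extraction turns dicts into numeric vectors (string values become one-hot columns), and selection then keeps only the columns that actually vary.

```python
# Plain-Python illustration of the two concepts; not scikit-learn's implementation.
# extract_features / select_features are hypothetical helper names.

def extract_features(records):
    """Feature extraction: turn dict records into numeric vectors.

    String values become one-hot columns named "key=value";
    numeric values keep their key as the column name.
    """
    names = set()
    for record in records:
        for key, value in record.items():
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    names = sorted(names)
    index = {name: i for i, name in enumerate(names)}

    matrix = []
    for record in records:
        row = [0.0] * len(names)
        for key, value in record.items():
            if isinstance(value, str):
                row[index[f"{key}={value}"]] = 1.0
            else:
                row[index[key]] = float(value)
        matrix.append(row)
    return names, matrix


def select_features(names, matrix):
    """Feature selection: keep only columns whose values differ across rows."""
    keep = [j for j in range(len(names))
            if len({row[j] for row in matrix}) > 1]
    kept_names = [names[j] for j in keep]
    kept_matrix = [[row[j] for j in keep] for row in matrix]
    return kept_names, kept_matrix


# Made-up sample data: "country" is identical in every record.
records = [
    {"city": "Dubai", "temperature": 33.0, "country": "UAE"},
    {"city": "Sharjah", "temperature": 31.0, "country": "UAE"},
]

names, X = extract_features(records)                      # step 1: extraction
selected_names, X_selected = select_features(names, X)    # step 2: selection
```

Extraction produces four columns here, including a constant `country=UAE` column. Selection works on that already-numeric matrix and drops the constant column, since it carries no information for telling the records apart. Real selection methods use stronger criteria than "does the column vary", but the order of the two steps is the same.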
The rest of this post is split into four main parts, with the focus on 4、text feature extraction.
2、Loading features from dicts
The relevant class is DictVectorizer. One example is enough: