Non-numeric features
First, any non-numeric column has to be converted to a numeric representation or one-hot encoded.
- In sklearn, DictVectorizer can do this:
DictVectorizer implements what is called one-of-K or “one-hot” coding for categorical (aka nominal, discrete) features. Categorical features are “attribute-value” pairs where the value is restricted to a list of discrete possibilities without ordering (e.g. topic identifiers, types of objects, tags, names…).
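To make the one-of-K idea concrete, here is a minimal sketch on toy data. The records and the field names `city` and `temperature` are made up for illustration and do not come from the original dataset. String values are expanded into one indicator column per distinct value, while numeric values are passed through unchanged.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy records: "city" is categorical, "temperature" is numeric.
records = [
    {"city": "Beijing", "temperature": 12.0},
    {"city": "Shanghai", "temperature": 18.0},
    {"city": "Beijing", "temperature": 15.0},
]

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix.
vec = DictVectorizer(sparse=False)
encoded = vec.fit_transform(records)

# Column names learned during fitting, sorted alphabetically by default.
feature_names = vec.feature_names_

print(feature_names)  # ['city=Beijing', 'city=Shanghai', 'temperature']
print(encoded)
```

Each distinct string value of `city` becomes its own 0/1 column, so the first row is encoded as `[1., 0., 12.]`.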
Example code:
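from sklearn.feature_extraction import DictVectorizer

# sparse=False makes the vectorizer return a dense array
dt = DictVectorizer(sparse=False)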
x_train = dt.