Hands-On Machine Learning: Reading Notes
Soft Margin Classification
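In Scikit-Learn's SVC class, the hyperparameter C controls how much tolerance there is for margin violations: the smaller C is, the wider the margin and the more violations are allowed. When using soft margin classification, lowering C therefore acts as regularization.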
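To make the effect of C concrete, here is a minimal sketch that counts margin violations for a small and a large C on the iris petal features. It is an illustration rather than part of the original notes: the helper name count_margin_violations, the two C values (0.01 and 100), and the tolerance of 1e-2 are assumptions chosen for the demonstration. It uses SVC(kernel="linear") instead of LinearSVC so that the solver converges reliably for large C. A sample is counted as a violation when its signed decision score y * f(x) falls below 1.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]              # petal length, petal width
y = (iris["target"] == 2).astype(int)    # 1 if Iris-Virginica, else 0
y_signed = 2 * y - 1                     # map labels {0, 1} to {-1, +1}


def count_margin_violations(C, tol=1e-2):
    """Hypothetical helper: fit a linear SVM and count samples inside the margin."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C))
    model.fit(X, y)
    scores = model.decision_function(X)  # signed distance-like score f(x)
    # A sample violates the margin when y * f(x) < 1 (tol absorbs solver noise).
    return int(np.sum(y_signed * scores < 1 - tol))


violations_small_C = count_margin_violations(0.01)  # strong regularization
violations_large_C = count_margin_violations(100)   # weak regularization
print("C=0.01:", violations_small_C, "violations")
print("C=100: ", violations_large_C, "violations")
```

The smaller C should report more violations, because the optimizer trades a wider margin for more points inside it.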
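The code below loads the iris dataset, scales the features, and then trains a linear SVM model. It uses the LinearSVC class with C=1 and the hinge loss function.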
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
iris = datasets.load_iris()  # load the iris dataset
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 if Iris-Virginica, else 0.0
# Pipeline expects a list of (name, estimator) steps
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])
svm_clf.fit(X, y)
Pipeline(memory=None,
steps=[('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('linear_svc', LinearSVC(C=1, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='hinge', max_iter=1000, multi_class='ovr',
penalty='l2', random_state=None, tol=0.0001, verbose=0))])
svm_clf.predict([[5.5, 1.7]])