Stanford ML Course: Python Port (Week 7, Programming Exercise ex6_1)

This post completes course exercise ex6 in Python. The exercise introduction reads:

In the first half of this exercise, you will be using support vector machines (SVMs) with various example 2D datasets. Experimenting with these datasets will help you gain an intuition of how SVMs work and how to use a Gaussian kernel with SVMs.

The code is below (compared with my earlier code, the overall structure is simpler and the code is easier to read):

import matplotlib.pyplot as plt
import scipy.io as sio
import numpy as np
#import scipy.optimize as op
import plotData as pltD
from sklearn import svm
import visualizeBoundary as vb
import gaussianKernel as gk
from sklearn.model_selection import GridSearchCV

print('Loading and Visualizing Data ...\n')
# Load from ex6data1: 
# You will have X, y in your environment

# Raw string so the backslashes in the Windows path are not treated as escape sequences
Data = sio.loadmat(r'D:\exercise\machine-learning-ex6\machine-learning-ex6\ex6\ex6data1.mat')
X = Data['X']
y = Data['y'].flatten()

plt.figure()
pltD.plotData(X, y)

'''
%% ==================== Part 2: Training Linear SVM ====================
%  The following code will train a linear SVM on the dataset and plot the
%  decision boundary learned.
%

% Load from ex6data1: 
% You will have X, y in your environment
'''
C = 1
clf = svm.SVC(C=C, kernel='linear', tol=1e-3)  # C must be passed by keyword in recent scikit-learn
clf.fit(X, y)
plt.figure()
pltD.plotData(X, y)
vb.visualizeBoundary(clf, X, 0, np.max(X[:,0]), 1.5, np.max(X[:,1]))
plt.title('decision boundary with C = {}'.format(C))

C = 100
clf = svm.SVC(C=C, kernel='linear', tol=1e-3)  # C must be passed by keyword in recent scikit-learn
clf.fit(X, y)
plt.figure()
pltD.plotData(X, y)
vb.visualizeBoundary(clf, X, 0, np.max(X[:,0]), 1.5, np.max(X[:,1]))
plt.title('decision boundary with C = {}'.format(C))

'''
%% =============== Part 3: Implementing Gaussian Kernel ===============
%  You will now implement the Gaussian kernel to use
%  with the SVM. You should complete the code in gaussianKernel.m
%
'''

print('Evaluating the Gaussian Kernel')
x1 = np.array([1, 2, 1])
x2 = np.array([0, 4, -1])
sigma = 2
sim = gk.gaussianKernel(x1, x2, sigma)
print('Gaussian kernel between x1 = [1, 2, 1], x2 = [0, 4, -1], sigma = {} : {:0.6f}\n'
      '(for sigma = 2, this value should be about 0.324652)'.format(sigma, sim))

'''
%% =============== Part 4: Visualizing Dataset 2 ================
%  The following code will load the next dataset into your environment and 
%  plot the data. 
%
'''
print('Loading and Visualizing Data ...\n')
Data2 = sio.loadmat(r'D:\exercise\machine-learning-ex6\machine-learning-ex6\ex6\ex6data2.mat')
X2 = Data2['X']
y2 = Data2['y'].flatten()

plt.figure()
pltD.plotData(X2, y2)

'''
%% ========== Part 5: Training SVM with RBF Kernel (Dataset 2) ==========
%  After you have implemented the kernel, we can now use it to train the 
%  SVM classifier.
% 
'''
C = 1
sigma = 0.1
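# Use scikit-learn's built-in RBF kernel, exp(-gamma * ||x - x'||^2).
# gamma = 1 / (2 * sigma^2) makes it match the Gaussian kernel with bandwidth sigma.
clf = svm.SVC(C=C, kernel='rbf', gamma=1 / (2 * sigma ** 2))
clf.fit(X2, y2)
plt.figure()
pltD.plotData(X2, y2)
vb.visualizeBoundary(clf, X2, 0, np.max(X2[:,0]), 0.4, np.max(X2[:,1]))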
plt.title('decision boundary with C = {}'.format(C))

'''
%% =============== Part 6: Visualizing Dataset 3 ================
%  The following code will load the next dataset into your environment and 
%  plot the data. 
%
'''

print('Loading and Visualizing Data ...\n')
Data3 = sio.loadmat(r'D:\exercise\machine-learning-ex6\machine-learning-ex6\ex6\ex6data3.mat')
X3 = Data3['X']
y3 = Data3['y'].flatten()
Xval = Data3['Xval']
yval = Data3['yval'].flatten()

plt.figure()
pltD.plotData(X3, y3)

'''
%% ========== Part 7: Training SVM with RBF Kernel (Dataset 3) ==========

%  This is a different dataset that you can use to experiment with. Try
%  different values of C and sigma here.
% 

% Load from ex6data3: 
% You will have X, y in your environment

'''
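# Train on the training split (X3, y3); Xval and yval are held out for validation.
# C and sigma still hold the values set in Part 5 (C = 1, sigma = 0.1).
clf = svm.SVC(C=C, kernel='rbf', gamma=1 / (2 * sigma ** 2))
clf.fit(X3, y3)
plt.figure()
pltD.plotData(X3, y3)
vb.visualizeBoundary(clf, X3, -0.6, np.max(X3[:,0]), -0.8, np.max(X3[:,1]))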
plt.title('decision boundary with C = {}'.format(C))


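# My machine is underpowered and this ran very slowly, so for a long time I got no result.
# It is frustrating that it struggles even with this little data.

# GridSearchCV automates hyperparameter tuning: pass in the candidate parameters
# and it returns the best score and the best parameter combination.
# https://blog.csdn.net/u012969412/article/details/72973055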
clf3 = svm.SVC(kernel='rbf', class_weight='balanced')
C_list = np.array([0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30])
gamma_list = np.array([0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30])
param_test = [{'kernel': ['rbf'], 'C': C_list, 'gamma': gamma_list}]
grid = GridSearchCV(clf3, param_grid = param_test, cv=3)
grid1 = grid.fit(X3, y3)
score = grid.score(Xval, yval)
print('Accuracy: {}'.format(score))
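
The script above calls gk.gaussianKernel from a helper module whose source is not shown in this post. As a minimal sketch, assuming the helper follows the course definition K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2)), it could look like the function below. The name sigma_to_gamma is hypothetical and introduced only for illustration: it shows how sigma maps to the gamma parameter of scikit-learn's 'rbf' kernel, which computes exp(-gamma * ||x1 - x2||^2).

```python
import numpy as np


def gaussianKernel(x1, x2, sigma):
    """Gaussian (RBF) similarity between two vectors (assumed course definition)."""
    x1 = np.asarray(x1, dtype=float).ravel()
    x2 = np.asarray(x2, dtype=float).ravel()
    diff = x1 - x2
    # Squared Euclidean distance divided by 2 * sigma^2, then negated and exponentiated.
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma):
    """Hypothetical helper: gamma for which exp(-gamma * d^2) equals the kernel above."""
    return 1.0 / (2.0 * float(sigma) ** 2)


if __name__ == "__main__":
    x1 = np.array([1, 2, 1])
    x2 = np.array([0, 4, -1])
    # The exercise states this should be about 0.324652 for sigma = 2.
    print('{:0.6f}'.format(gaussianKernel(x1, x2, 2)))
    print(sigma_to_gamma(0.1))
```

With this mapping, sigma = 0.1 corresponds to gamma of about 50, whereas sigma to the power of -2 gives 100, that is, a narrower kernel than the one the exercise describes.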

The results are as follows:

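This raised a question for me about the n_jobs parameter of GridSearchCV.

My earlier understanding was that, within limits, more parallel workers means a faster run, so I set n_jobs to -1, which uses as many workers as my machine has CPU cores.

However, the search then ran extremely slowly, while with the default value the result came back almost instantly. I asked some more experienced people and came away with the following conclusions:

  • Parallelism in Python is still not very dependable; it is better suited to multi-process workloads
  • There may be a bug inside the library itself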
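
One way to check this on your own machine is to time the same search with and without parallel workers. The sketch below is only an illustration under stated assumptions: it uses synthetic data from a hypothetical make_toy_data helper rather than ex6data3, and time_grid_search is a name introduced here, not part of the original script. On a problem this small, the cost of starting worker processes and sending data to them can exceed the fitting time itself, which would be consistent with the slowdown described above. Actual timings depend on the machine and on the scikit-learn and joblib versions.

```python
import time

import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV


def make_toy_data(n_samples=200, seed=0):
    """Synthetic 2D data standing in for ex6data3 (an assumption, not the course data)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    # Points inside a circle get label 1, so the true boundary is non-linear.
    y = (np.sum(X ** 2, axis=1) < 0.5).astype(int)
    return X, y


def time_grid_search(X, y, n_jobs=None):
    """Fit the same 8 x 8 grid as the post; return (elapsed seconds, fitted search)."""
    values = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]
    param_grid = {'C': values, 'gamma': values}
    search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid=param_grid,
                          cv=3, n_jobs=n_jobs)
    start = time.perf_counter()
    search.fit(X, y)
    return time.perf_counter() - start, search


if __name__ == "__main__":
    X, y = make_toy_data()
    # n_jobs=None is the default (sequential); -1 asks for one worker per CPU core.
    for n_jobs in (None, -1):
        seconds, search = time_grid_search(X, y, n_jobs=n_jobs)
        print('n_jobs={}: {:.2f} s, best params {}, CV score {:.3f}'.format(
            n_jobs, seconds, search.best_params_, search.best_score_))
```

The fitted search object also exposes best_params_ and best_score_, so the selected C and gamma can be inspected directly instead of only printing the validation accuracy.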

Here are some links that describe problems similar to mine:

https://stackoverflow.com/questions/50993867/increasing-n-jobs-has-no-effect-on-gridsearchcv

https://github.com/dmlc/xgboost/issues/2163

If you are on a Mac, this may also be related:

https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux

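I have only just started blogging, both to record my learning progress and to deepen my own understanding. If anything here is wrong or inappropriate, corrections are welcome.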
