编程作业（六）

最新推荐文章于 2023-06-12 21:03:43 发布

RookieFCB

最新推荐文章于 2023-06-12 21:03:43 发布

阅读量499

点赞数

分类专栏： Andrew NG机器学习笔记

本文链接：https://blog.csdn.net/u013058162/article/details/78340056

版权

Andrew NG机器学习笔记专栏收录该内容

56 篇文章 2 订阅

订阅专栏

支持向量机

本部分练习，我们将在2D示例数据集上使用支持向量机。通过在这些数据集上使用支持向量机，将帮助我们初识支持向量机的运行原理，以及如何使用高斯核函数支持向量机。

任务一示例数据集1

本任务要求我们修改参数C的值，观察支持向量机对数据集的判定边界。ex6.m文件中已将相关代码备好，其代码如下：

%% =============== Part 1: Loading and Visualizing Data ================
%  We start the exercise by first loading and visualizing the dataset. 
%  The following code will load the dataset into your environment and plot
%  the data.
%

fprintf('Loading and Visualizing Data ...\n')

% Load from ex6data1: 
% You will have X, y in your environment
load('ex6data1.mat');

% Plot training data
plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ==================== Part 2: Training Linear SVM ====================
%  The following code will train a linear SVM on the dataset and plot the
%  decision boundary learned.
%

% Load from ex6data1: 
% You will have X, y in your environment
load('ex6data1.mat');

fprintf('\nTraining Linear SVM ...\n')

% You should try to change the C value below and see how the decision
% boundary varies (e.g., try C = 1000)
C = 1;
model = svmTrain(X, y, C, @linearKernel, 1e-3, 20);
visualizeBoundaryLinear(X, y, model);

fprintf('Program paused. Press enter to continue.\n');
pause;

当参数C = 1时，其运行结果如下：

C = 1

当参数C = 100时，其运行结果如下：

C = 100

任务二高斯核函数支持向量机

本部分我们使用高斯核函数支持向量机对数据集进行非线性地划分。

首先我们需要使用高斯核函数构建新的特征变量fi，i = 1, 2, 3, ......因此，我们需要在gaussianKernel.m文件中，将高斯核函数补充完整。

高斯核函数

参考代码如下：

sim = exp(-(x1 - x2)' * (x1 - x2) / (2 * sigma * sigma));

在ex6.m文件中运行如下部分代码，可得到的结果为：0.324652。

%% =============== Part 3: Implementing Gaussian Kernel ===============
%  You will now implement the Gaussian kernel to use
%  with the SVM. You should complete the code in gaussianKernel.m
%
fprintf('\nEvaluating the Gaussian Kernel ...\n')

x1 = [1 2 1]; x2 = [0 4 -1]; sigma = 2;
sim = gaussianKernel(x1, x2, sigma);

fprintf(['Gaussian Kernel between x1 = [1; 2; 1], x2 = [0; 4; -1], sigma = %f :' ...
         '\n\t%f\n(for sigma = 2, this value should be about 0.324652)\n'], sigma, sim);

fprintf('Program paused. Press enter to continue.\n');
pause;

我们需要导入新的数据集-样例数据集2，通过使用高斯核函数支持向量机，对该数据集进行非线性划分。如下为ex6.m文件中已备好的相关代码。

%% =============== Part 4: Visualizing Dataset 2 ================
%  The following code will load the next dataset into your environment and 
%  plot the data. 
%

fprintf('Loading and Visualizing Data ...\n')

% Load from ex6data2: 
% You will have X, y in your environment
load('ex6data2.mat');

% Plot training data
plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ========== Part 5: Training SVM with RBF Kernel (Dataset 2) ==========
%  After you have implemented the kernel, we can now use it to train the 
%  SVM classifier.
% 
fprintf('\nTraining SVM with RBF Kernel (this may take 1 to 2 minutes) ...\n');

% Load from ex6data2: 
% You will have X, y in your environment
load('ex6data2.mat');

% SVM Parameters
C = 1; sigma = 0.1;

% We set the tolerance and max_passes lower here so that the code will run
% faster. However, in practice, you will want to run the training to
% convergence.
model= svmTrain(X, y, C, @(x1, x2) gaussianKernel(x1, x2, sigma)); 
visualizeBoundary(X, y, model);

fprintf('Program paused. Press enter to continue.\n');
pause;

其运行结果为：

为了更进一步熟悉使用高斯核函数支持向量机，我们导入新的数据集——样例数据集3。在ex6data3.mat文件中，其将该数据集分为了两部分，即训练集和交叉验证集。我们需要将dataset3Params.m文件补充完整，即我们需要在参数C∈{0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30}和参数σ∈{0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30}中找到一个最优值，使其能够对数据集进行正确地划分。

dataset3Params.m文件的参考代码如下：

C_temp = [0.01; 0.03; 0.1; 0.3; 1; 3; 10; 30];  
sigma_temp = [0.01; 0.03; 0.1; 0.3; 1; 3; 10; 30];

error_val = zeros(length(C_temp), length(sigma_temp));

for i = 1 : length(C_temp)
    for j = 1 : length(sigma_temp)
        model = svmTrain(X, y, C_temp(i), @(x1, x2) gaussianKernel(x1, x2, sigma_temp(j)));
        predictions = svmPredict(model, Xval);
        error_val(i, j) = mean(double(predictions ~= yval));
    end
end

[I, J] = find(error_val ==  min(error_val(:)));    % 找最小元素位置
C = C_temp(I)                                      % 1
sigma = sigma_temp(J)                              % 0.100

ex6.m文件中本部分代码如下：

%% =============== Part 6: Visualizing Dataset 3 ================
%  The following code will load the next dataset into your environment and 
%  plot the data. 
%

fprintf('Loading and Visualizing Data ...\n')

% Load from ex6data3: 
% You will have X, y in your environment
load('ex6data3.mat');

% Plot training data
plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ========== Part 7: Training SVM with RBF Kernel (Dataset 3) ==========

%  This is a different dataset that you can use to experiment with. Try
%  different values of C and sigma here.
% 

% Load from ex6data3: 
% You will have X, y in your environment
load('ex6data3.mat');

% Try different SVM Parameters here
[C, sigma] = dataset3Params(X, y, Xval, yval);

% Train the SVM
model= svmTrain(X, y, C, @(x1, x2) gaussianKernel(x1, x2, sigma));
visualizeBoundary(X, y, model);

fprintf('Program paused. Press enter to continue.\n');
pause;

其运行结果为：

垃圾邮件分类器

通过使用支持向量机实现垃圾邮件过滤器。

任务一电子邮件预处理

首先，我们导入样本数据，其内容如下：

众所周知，每封电子邮件都包含有类似于URLs和电子邮件地址等内容，但其主体内容是不同的。因此，为了提高分类效率，我们对这些类似的内容进行归一化处理，例如：将URLs用“httpaddr”文本代替等。该功能代码已在processEmail.m文件中准备好了。

我们运行ex6_spam.m文件中该部分代码，结果如下：

==== Processed Email ====

anyon know how much it cost to host a web portal well it depend on how mani
visitor you re expect thi can be anywher from less than number buck a month
to a coupl of dollarnumb you should checkout httpaddr or perhap amazon ecnumb
if your run someth big to unsubscrib yourself from thi mail list send an
email to emailaddr

任务二词汇表

本部分我们利用已有的词汇表与样例邮件中的单词构建映射关系。因此，我们需要对processEmail.m文件进行补充，构建单词表映射关系，即样例邮件中的某一单词属于词汇表，则将其添加至word_indices变量；若不属于则跳过，验证下一单词。

processEmail.m文件的参考代码如下：

    for i = 1 : length(vocabList)
        if (strcmp(vocabList{i}, str))
            word_indices = [word_indices; i];
        end
    end

任务三提取样例邮件的特征变量

本部分将word_indices变量中的数据转换为特性变量x的向量模式，即

其中，若第i个单词在样例邮件中，则xi = 1；否则xi = 0。

emailFeatures.m文件中的参考代码如下：

x(word_indices) = 1;

任务四使用支持向量机训练

本部分通过使用支持向量机模型对训练集进行训练，对交叉验证集进行验证。本部分代码如下：

%% =========== Part 3: Train Linear SVM for Spam Classification ========
%  In this section, you will train a linear classifier to determine if an
%  email is Spam or Not-Spam.

% Load the Spam Email dataset
% You will have X, y in your environment
load('spamTrain.mat');

fprintf('\nTraining Linear SVM (Spam Classification)\n')
fprintf('(this may take 1 to 2 minutes) ...\n')

C = 0.1;
model = svmTrain(X, y, C, @linearKernel);

p = svmPredict(model, X);

fprintf('Training Accuracy: %f\n', mean(double(p == y)) * 100);

%% =================== Part 4: Test Spam Classification ================
%  After training the classifier, we can evaluate it on a test set. We have
%  included a test set in spamTest.mat

% Load the test dataset
% You will have Xtest, ytest in your environment
load('spamTest.mat');

fprintf('\nEvaluating the trained Linear SVM on a test set ...\n')

p = svmPredict(model, Xtest);

fprintf('Test Accuracy: %f\n', mean(double(p == ytest)) * 100);
pause;

任务五预测垃圾邮件

本部分使用支持向量机对新的电子邮件进行预测。本部分代码为：

%% =================== Part 6: Try Your Own Emails =====================
%  Now that you've trained the spam classifier, you can use it on your own
%  emails! In the starter code, we have included spamSample1.txt,
%  spamSample2.txt, emailSample1.txt and emailSample2.txt as examples. 
%  The following code reads in one of these emails and then uses your 
%  learned SVM classifier to determine whether the email is Spam or 
%  Not Spam

% Set the file to be read in (change this to spamSample2.txt,
% emailSample1.txt or emailSample2.txt to see different predictions on
% different emails types). Try your own emails as well!
filename = 'spamSample1.txt';

% Read and predict
file_contents = readFile(filename);
word_indices  = processEmail(file_contents);
x             = emailFeatures(word_indices);
p = svmPredict(model, x);

fprintf('\nProcessed %s\n\nSpam Classification: %d\n', filename, p);
fprintf('(1 indicates spam, 0 indicates not spam)\n\n');

RookieFCB

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
编程作业（六）

支持向量机支持向量机本部分练习，我们将在2D示例数据集上使用支持向量机。通过在这些数据集上使用支持向量机，将帮助我们初识支持向量机的运行原理，以及如何使用高斯核函数支持向量机。任务一示例数据集1本任务要求我们修改参数C的值，观察支持向量机对数据集的判定边界。ex6.m文件中已将相关代码备好，其代码如下：%% =============== Part 1: Loading and Vis
复制链接

扫一扫