[Original] Andrew Ng ML Exercise Solutions 4: machine_learning_ex6


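I've been a bit busy lately, so updates are slow; thanks for your patience.
[Image: exercise requirements]
The requirements are shown above; my implementation follows:
(The code for this exercise is fairly simple, so I won't go into detail. Throughout the course, Andrew Ng stresses that mature SVM libraries already exist; the point of the exercise is to understand SVMs and use them fluently, not to worry about how an SVM library is implemented.)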
1. Implement gaussianKernel, which computes the similarity between two samples

% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return the similarity between x1
%               and x2 computed using a Gaussian kernel with bandwidth
%               sigma
%
%


sim = exp(-(x1 - x2)'*(x1 - x2)./(2*sigma*sigma));

% =============================================================

Let's look at the output

Evaluating the Gaussian Kernel ...
Gaussian Kernel between x1 = [1; 2; 1], x2 = [0; 4; -1], sigma = 2.000000 :
	0.324652
(for sigma = 2, this value should be about 0.324652)
Program paused. Press enter to continue.
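
The value in this output can be checked by hand. Written out, the line of code computes the standard Gaussian (RBF) kernel, and plugging in the test vectors printed above gives the expected number:

```latex
K(x_1, x_2) = \exp\!\left(-\frac{\lVert x_1 - x_2 \rVert^2}{2\sigma^2}\right)

x_1 - x_2 = (1,\,-2,\,2)^\top, \qquad \lVert x_1 - x_2 \rVert^2 = 1 + 4 + 4 = 9

K = \exp\!\left(-\frac{9}{2 \cdot 2^2}\right) = \exp(-1.125) \approx 0.324652
```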


2. Implement dataset3Params: from two 8-element vectors of candidate values, pick the (C, sigma) pair with the smallest prediction_error

% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return the optimal C and sigma
%               learning parameters found using the cross validation set.
%               You can use svmPredict to predict the labels on the cross
%               validation set. For example, 
%                   predictions = svmPredict(model, Xval);
%               will return the predictions on the cross validation set.
%
%  Note: You can compute the prediction error using 
%        mean(double(predictions ~= yval))
%

Clist = [ 0.01 0.03 0.1 0.3 1 3 10 30];
Sigmalist = [ 0.01 0.03 0.1 0.3 1 3 10 30];
prediction_error = [];
for i=1:8
    for j=1:8
        model = svmTrain(X, y, Clist(i), @(x1, x2) gaussianKernel(x1, x2, Sigmalist(j)));
        predictions = svmPredict(model,Xval);
        prediction_error((i-1)*8 + j) = mean(double(predictions ~= yval));
    end
end

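% p is the flat position (i-1)*8 + j of the smallest error
[~,p] = min(prediction_error,[],2);

% Recover i and j from p. Subtract 1 before dividing: with the plain
% floor(p/8) + 1 and mod(p,8), any p that is a multiple of 8 (j = 8)
% would give the wrong C index and a sigma index of 0.
C = Clist(floor((p - 1)/8) + 1);
sigma = Sigmalist(mod(p - 1, 8) + 1);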

% =========================================================================
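
The loop above stores the 64 errors in a flat vector at position (i-1)*8 + j, so the winning position has to be mapped back to a pair (i, j). Below is a minimal sketch of one alternative, assuming the errors are kept in an 8x8 matrix instead (rows for C, columns for sigma) and the built-in ind2sub does the index conversion. The matrix `err` and its values are placeholders for illustration, not real results from the exercise:

```octave
Clist = [0.01 0.03 0.1 0.3 1 3 10 30];
Sigmalist = [0.01 0.03 0.1 0.3 1 3 10 30];

% Hypothetical error grid: err(i, j) is the validation error for
% Clist(i) and Sigmalist(j). Placeholder values only.
err = ones(8, 8);
err(5, 3) = 0.03;   % pretend the best pair is C = 1, sigma = 0.1

% min over all entries returns a linear (column-major) index;
% ind2sub converts it back to row and column subscripts.
[~, p] = min(err(:));
[i, j] = ind2sub(size(err), p);

C = Clist(i);
sigma = Sigmalist(j);
fprintf('C = %g, sigma = %g\n', C, sigma);
```

Letting ind2sub do the conversion avoids hand-written floor/mod arithmetic, which is easy to get wrong at the boundaries.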

Let's look at the SVM decision boundary
[Figure: SVM decision boundary]
Looks good, moving on
3. Implement processEmail

    % ====================== YOUR CODE HERE ======================
    % Instructions: Fill in this function to add the index of str to
    %               word_indices if it is in the vocabulary. At this point
    %               of the code, you have a stemmed word from the email in
    %               the variable str. You should look up str in the
    %               vocabulary list (vocabList). If a match exists, you
    %               should add the index of the word to the word_indices
    %               vector. Concretely, if str = 'action', then you should
    %               look up the vocabulary list to find where in vocabList
    %               'action' appears. For example, if vocabList{18} =
    %               'action', then, you should add 18 to the word_indices 
    %               vector (e.g., word_indices = [word_indices ; 18]; ).
    % 
    % Note: vocabList{idx} returns the word with index idx in the
    %       vocabulary list.
    % 
    % Note: You can use strcmp(str1, str2) to compare two strings (str1 and
    %       str2). It will return 1 only if the two strings are equivalent.
    %
    for idx=1:length(vocabList)
        % Curly braces extract the string itself; vocabList(idx) would
        % return a 1x1 cell instead
        if strcmp(str, vocabList{idx})
            word_indices = [word_indices;idx];
        end
    end
    % =============================================================
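
The explicit loop over the vocabulary can also be written as a single lookup, because strcmp accepts a cell array and returns one logical value per entry. Below is a minimal sketch under that approach; the three-word `vocabList` and the `words` list are made up for illustration and stand in for the real 1899-word vocabulary and the stemmed email tokens:

```octave
vocabList = {'aa'; 'action'; 'zip'};       % hypothetical tiny vocabulary
words = {'action', 'unknown', 'zip'};      % hypothetical stemmed words

word_indices = [];
for k = 1:numel(words)
    str = words{k};
    % strcmp compares str against every vocabulary entry at once;
    % find returns the matching position, or an empty result if none
    idx = find(strcmp(str, vocabList));
    word_indices = [word_indices; idx];    % appending an empty idx is a no-op
end

disp(word_indices');
```

Words that are not in the vocabulary simply contribute nothing, which matches the behaviour of the loop version.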

Let's look at the output

=========================
Word Indices: 
 86 916 794 1077 883 370 1699 790 1822 1831 883 431 1171 794 1002 1893 1364 592 1676 238 162 89 688 945 1663 1120 1062 1699 375 1162 479 1893 1510 799 1182 1237 810 1895 1440 1547 181 1699 1758 1896 688 1676 992 961 1477 71 530 1699 531

Program paused. Press enter to continue.

4. Implement emailFeatures

% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return a feature vector for the
%               given email (word_indices). To help make it easier to 
%               process the emails, we have already pre-processed each
%               email and converted each word in the email into an index in
%               a fixed dictionary (of 1899 words). The variable
%               word_indices contains the list of indices of the words
%               which occur in one email.
% 
%               Concretely, if an email has the text:
%
%                  The quick brown fox jumped over the lazy dog.
%
%               Then, the word_indices vector for this text might look 
%               like:
%               
%                   60  100   33   44   10     53  60  58   5
%
%               where, we have mapped each word onto a number, for example:
%
%                   the   -- 60
%                   quick -- 100
%                   ...
%
%              (note: the above numbers are just an example and are not the
%               actual mappings).
%
%              Your task is to take one such word_indices vector and construct
%              a binary feature vector that indicates whether a particular
%              word occurs in the email. That is, x(i) = 1 when word i
%              is present in the email. Concretely, if the word 'the' (say,
%              index 60) appears in the email, then x(60) = 1. The feature
%              vector should look like:
%
%              x = [ 0 0 0 0 1 0 0 0 ... 0 0 0 0 1 ... 0 0 0 1 0 ..];
%
%


for idx=1:length(word_indices)
    x(word_indices(idx)) = 1;
end

% =========================================================================

Let's look at the output

==== Processed Email ====

anyon know how much it cost to host a web portal well it depend on how mani 
visitor you re expect thi can be anywher from less than number buck a month 
to a coupl of dollarnumb you should checkout httpaddr or perhap amazon ecnumb 
if your run someth big to unsubscrib yourself from thi mail list send an 
email to emailaddr 

=========================
Length of feature vector: 1899
Number of non-zero entries: 45
Program paused. Press enter to continue.
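
The count of 45 non-zero entries is smaller than the number of word indices printed in step 3 because some words occur more than once in the email, and a binary feature vector records each word only once. The sketch below reproduces this with the indices copied from that output, using vectorized indexing in place of the loop; the variable names `n`, `num_indices` and `num_nonzero` are my own:

```octave
% Word indices copied from the "Word Indices" output of step 3
word_indices = [86 916 794 1077 883 370 1699 790 1822 1831 883 431 1171 ...
                794 1002 1893 1364 592 1676 238 162 89 688 945 1663 1120 ...
                1062 1699 375 1162 479 1893 1510 799 1182 1237 810 1895 ...
                1440 1547 181 1699 1758 1896 688 1676 992 961 1477 71 530 ...
                1699 531]';

n = 1899;                 % vocabulary size stated in the exercise
x = zeros(n, 1);
x(word_indices) = 1;      % repeated indices just set the same entry again

num_indices = numel(word_indices);   % 53 indices, repeats included
num_nonzero = nnz(x);                % 45 distinct words
fprintf('%d indices, %d non-zero entries\n', num_indices, num_nonzero);
```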

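Finally, I actually found a real spam message to try.
I masked the phone numbers with xxxx. These people are annoying, but their privacy still deserves some respect.
Create a new file spamSample12019.txt and test it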

Dear: 
You need to invoice, it is worth paying attention!
Professional agent to open. Each. The place. Zheng. Regulations. Send. Ticket points discount! Manager Zhang 1326516xxxx WeChat / QQ: 180890xxxx

Let's look at the classification result

==== Processed Email ====

dear you need to invoic it is worth pai attent profession agent to open each 
the place zheng regul send ticket point discount manag zhang number wechat qq 
number 

=========================

Processed spamSample12019.txt

Spam Classification: 1
(1 indicates spam, 0 indicates not spam)

As a humble guy who works on low-level circuits, all I can say after seeing this is: 666 (impressive).

That's all, thanks everyone.
