Machine Learning Class Notes (15)

数据纵横

Article link: https://blog.csdn.net/github_27432191/article/details/51339904


<matlab>
% Find closest cluster members
idx = findClosestCentroids(X, centroids);

% Essentially, we have now represented the image X in terms of
% the indices in idx.

% We can now recover the image from the indices (idx) by mapping
% each pixel (specified by its index in idx) to the centroid value
X_recovered = centroids(idx,:);

% Reshape the recovered image into proper dimensions
X_recovered = reshape(X_recovered, img_size(1), img_size(2), 3);
<matlab>

Calling idx = findClosestCentroids(X, centroids) returns a $size(X,1)\times1$ column vector of centroid indices.

Calling X_recovered = centroids(idx,:) returns a $size(X,1){\times}size(X,2)$ matrix, with one centroid row per pixel.
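To make the two shapes concrete, here is a minimal sketch on toy data. The body of findClosestCentroids is not shown in these notes, so the loop below is only an assumed stand-in (nearest centroid by squared Euclidean distance), and all values are made up.

```matlab
% Toy data: 4 "pixels" with 3 color channels and K = 2 centroids (made-up values)
X = [0.9 0.1 0.1; 0.8 0.2 0.1; 0.1 0.1 0.9; 0.2 0.1 0.8];
centroids = [1 0 0; 0 0 1];

m = size(X, 1);
K = size(centroids, 1);
idx = zeros(m, 1);
for i = 1:m
    % Squared Euclidean distance from pixel i to every centroid
    diffs = centroids - repmat(X(i, :), K, 1);
    [~, idx(i)] = min(sum(diffs .^ 2, 2));
end

% Indexing centroids with idx picks one centroid row per pixel
X_recovered = centroids(idx, :);

disp(size(idx));          % size(X,1) x 1
disp(size(X_recovered));  % size(X,1) x size(X,2)
```

Only idx and the K centroid rows need to be stored to rebuild X_recovered, which is where the compression comes from.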

<matlab>
% Instructions: Compute the projection of the data using only the 
% top K eigenvectors in U (first K columns). 
% For the i-th example X(i,:), the projection onto the k-th 
% eigenvector is given as follows:
% x = X(i, :)';
% projection_k = x' * U(:, k);
<matlab>

Calling x = X(i, :)'; yields a $size(X,2)\times1$ column vector.
Calling projection_k = x' * U(:, k); yields a $1\times1$ scalar.
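A small sketch of the same projection on toy data follows. The matrix U here is hand-picked for illustration rather than computed by PCA, and X is assumed to be already mean-normalized. The vectorized line at the end is one possible way to project every example at once.

```matlab
% Toy data: 3 examples, 2 features (made-up values, assumed mean-normalized)
X = [1 1; -1 -1; 2 2];
% Assumed orthonormal directions as the columns of U (not computed by PCA here)
U = [1 1; 1 -1] / sqrt(2);
K = 1;

i = 1;
k = 1;
x = X(i, :)';                  % size(X,2) x 1 column vector
projection_k = x' * U(:, k);   % 1 x 1 scalar

% Vectorized form: project every example onto the first K columns of U
Z = X * U(:, 1:K);             % size(X,1) x K
```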
Estimate $\mu$ and $\sigma$ from the sample values.

Each feature follows its own Gaussian distribution.
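As a sketch of what this estimation might look like, the code below fits one Gaussian per feature and multiplies the per-feature densities to get p(x). The data are made up, and dividing the variance by m (rather than m - 1) is an assumption that follows the usual course convention.

```matlab
% Toy training set: 3 examples, 2 features (made-up values)
X = [1 10; 2 12; 3 14];

mu = mean(X, 1);          % per-feature mean, 1 x n
sigma2 = var(X, 1, 1);    % per-feature variance, normalized by m (assumption)

% Density of a new example under the independent-features model
x = [2 12];
p_each = exp(-(x - mu) .^ 2 ./ (2 * sigma2)) ./ sqrt(2 * pi * sigma2);
p = prod(p_each);         % p(x) = product over features
```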

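Having a way to evaluate the algorithm's performance makes it easier to choose features.

Training set : cross-validation set : test set = 6:2:2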
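One possible way to produce a 6:2:2 split is to shuffle the example indices and slice them. The dataset size m = 10 and the variable names below are hypothetical.

```matlab
% Hypothetical dataset size, chosen only for illustration
m = 10;
order = randperm(m);              % random permutation of 1..m

n_train = round(0.6 * m);         % 60% for training
n_cv    = round(0.2 * m);         % 20% for cross-validation

train_idx = order(1:n_train);
cv_idx    = order(n_train + 1 : n_train + n_cv);
test_idx  = order(n_train + n_cv + 1 : end);   % remaining examples for testing
```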

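For skewed datasets:
1. Count the true positives, false positives, false negatives, and true negatives
2. Compute precision and recall
3. Compute the $F_1$ score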
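The three steps above can be sketched as follows, using made-up labels and predictions where 1 marks an anomaly.

```matlab
% Made-up ground truth and predictions (1 = anomaly, 0 = normal)
y    = [1 0 0 1 0 0 1 0]';
pred = [1 0 1 0 0 0 1 0]';

% Step 1: confusion-matrix counts
tp = sum((pred == 1) & (y == 1));
fp = sum((pred == 1) & (y == 0));
fn = sum((pred == 0) & (y == 1));
tn = sum((pred == 0) & (y == 0));

% Step 2: precision and recall
prec = tp / (tp + fp);
rec  = tp / (tp + fn);

% Step 3: F1 score
F1 = 2 * prec * rec / (prec + rec);
```

On a skewed dataset, accuracy alone can look high for a classifier that always predicts 0, which is why precision, recall, and $F_1$ are used instead.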
Use the cross-validation set to choose the threshold $\varepsilon$, then use the test set to evaluate the algorithm's performance.
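In the usual anomaly detection setup, the quantity tuned on the cross-validation set is the threshold $\varepsilon$ on p(x): an example is flagged when p(x) is below it. The sketch below scans candidate thresholds and keeps the one with the best $F_1$. The densities p_cv and labels y_cv are made up, and the grid of 100 candidates is an arbitrary choice.

```matlab
% Made-up densities p(x) on the cross-validation set and their labels
p_cv = [0.30 0.25 0.02 0.28 0.01 0.27]';
y_cv = [0 0 1 0 1 0]';

best_eps = 0;
best_F1 = 0;
for epsilon = linspace(min(p_cv), max(p_cv), 100)
    pred = (p_cv < epsilon);            % flag low-density examples as anomalies
    tp = sum(pred & (y_cv == 1));
    fp = sum(pred & (y_cv == 0));
    fn = sum(~pred & (y_cv == 1));
    if tp == 0
        continue;                       % F1 would be zero or undefined; skip
    end
    prec = tp / (tp + fp);
    rec  = tp / (tp + fn);
    F1 = 2 * prec * rec / (prec + rec);
    if F1 > best_F1
        best_F1 = F1;
        best_eps = epsilon;
    end
end
```

The chosen best_eps would then be applied once to the test set to report the final score.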

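Anomaly detection: when there are many kinds of anomalies, a small number of positive examples is not enough to learn all of them, and future anomalies may look nothing like the ones seen so far.
Supervised learning: there are enough positive examples, and future positive examples resemble those in the training set.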
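When large numbers of both positive and negative examples are available, an anomaly detection problem can also be handled with a supervised learning algorithm.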

Use hist(x_i) to inspect the distribution of $x_i$.
Transform $x_i$ so that it looks closer to a Gaussian distribution.
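A minimal sketch of such a transformation is shown below. The data are made up, the log and square-root transforms are just two common candidates, and the skew helper is a hand-written moment estimate introduced here for illustration (it is not a built-in).

```matlab
% Made-up, right-skewed feature values
x = [1 2 2 3 4 5 8 13 40 120]';

x_log  = log(x + 1);   % the shift of 1 is an arbitrary choice
x_sqrt = x .^ 0.5;     % another candidate transform

% Hand-written sample skewness (values near 0 suggest a more symmetric shape)
skew = @(v) mean(((v - mean(v)) ./ std(v)) .^ 3);
skew_raw = skew(x);
skew_log = skew(x_log);

% Uncomment to compare the histograms visually:
% hist(x); figure; hist(x_log);
```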

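Look at the anomalies the algorithm failed to flag and use them as inspiration for new features that separate those points from normal examples.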

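$\Sigma(1,1)$ controls how quickly the density falls off along $x_1$
$\Sigma(2,2)$ controls how quickly the density falls off along $x_2$

$\Sigma(1,2)$ and $\Sigma(2,1)$ control the correlation between $x_1$ and $x_2$

The multivariate distribution is centered at $(\mu(1),\mu(2))$

Estimate $\mu$ and $\Sigma$ from the sample values.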
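The estimation of $\mu$ and $\Sigma$, and the resulting density, could be sketched like this. The data are made up, and normalizing the covariance by m is an assumption that follows the usual course convention.

```matlab
% Toy training set: 4 examples, 2 correlated features (made-up values)
X = [1 2; 2 1; 3 4; 4 3];
[m, n] = size(X);

mu = mean(X, 1);                    % 1 x n mean vector
Xc = X - repmat(mu, m, 1);          % centered data
Sigma = (Xc' * Xc) / m;             % n x n covariance, normalized by m (assumption)

% Multivariate Gaussian density at a query point
x = [2.5 2.5];
d = (x - mu)';
p = exp(-0.5 * d' * (Sigma \ d)) / sqrt((2 * pi) ^ n * det(Sigma));
```

Using Sigma \ d instead of inv(Sigma) * d avoids forming the inverse explicitly.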

The original model is a special case of the multivariate Gaussian model, namely the case where $\Sigma$ is diagonal.
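A short derivation shows why. The symbols $n$ (number of features) and $\sigma_j^2$ (variance of feature $j$) are notation introduced here for illustration. The multivariate density is

$$p(x;\mu,\Sigma)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$

If $\Sigma$ is diagonal with $\Sigma(j,j)=\sigma_j^2$, then $|\Sigma|=\prod_{j=1}^{n}\sigma_j^2$ and $(x-\mu)^T\Sigma^{-1}(x-\mu)=\sum_{j=1}^{n}\frac{(x_j-\mu(j))^2}{\sigma_j^2}$, so the density factors into one univariate Gaussian per feature:

$$p(x;\mu,\Sigma)=\prod_{j=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{(x_j-\mu(j))^2}{2\sigma_j^2}\right)$$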
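When m>10n, the multivariate Gaussian model captures correlations between features automatically, which saves the work of manually creating extra features to catch such anomalies.
If $\Sigma$ is still non-invertible even though m>n, check for redundant features.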
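One way to check for redundancy is to compare the rank of $\Sigma$ with the number of features. In the made-up data below the second feature is exactly twice the first, so $\Sigma$ is singular even though m > n.

```matlab
% Made-up data: 5 examples, 3 features; column 2 is exactly 2 * column 1
X = [1 2 3; 2 4 5; 3 6 8; 4 8 9; 5 10 12];
[m, n] = size(X);

Xc = X - repmat(mean(X, 1), m, 1);   % centered data
Sigma = (Xc' * Xc) / m;              % covariance, normalized by m (assumption)

% A rank below n means at least one feature is a linear combination of others
is_singular = rank(Sigma) < n;
```

Dropping one of the linearly dependent columns would be a natural next step before fitting the model again.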
