1 Clustering
1.1 Introduction to Unsupervised Learning
Unsupervised learning uses an unlabeled training set. We don’t have the vector of expected results y; we only have a dataset of features in which to find structure. Unsupervised learning is good for:
- Market segmentation
- Social network analysis
- Organizing computer clusters
- Astronomical data analysis
1.2 K-Means Algorithm
- Initialize K points, called centroids, randomly.
- Assign each example to the one of the K groups whose centroid it is closest to.
- Move each centroid to the average (mean) of all the examples assigned to its group.
- Repeat steps 2 and 3 until the clusters stop changing.
Main variables
K: number of clusters
Training set: x(1), x(2), …, x(m)  # no y(i)
n: number of features
x(i): n x 1 vector
Note: we do not use the x0 = 1 convention here
Randomly initialize K cluster centroids mu(1), mu(2), ..., mu(K)
Repeat:
    for i = 1 to m:
        c(i) := index (from 1 to K) of the cluster centroid closest to x(i)   # assign-cluster step
    for k = 1 to K:
        mu(k) := average (mean) of the points assigned to cluster k           # move-centroid step
- Assign-cluster step:
c(i) := argmin over k of || x(i) - mu(k) ||^2
- Move-centroid step:
mu(k) := (1/|Ck|) * (x(k1) + x(k2) + … + x(k|Ck|))   # x(k1), …, x(k|Ck|) are the training examples assigned to cluster k, and |Ck| is how many there are
(a small runnable sketch of both steps follows at the end of this section)
If no example is assigned to a cluster centroid, we can randomly re-initialize that centroid to a new point or eliminate that cluster group.
K-means can still be useful on non-separated clusters, e.g. splitting T-shirt customers into S, M, and L sizes.
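A minimal Octave sketch of the two K-means steps on toy 2-D data (my own illustration, not from ex7; variable names are arbitrary and it relies on Octave's automatic broadcasting):
X = [1 1; 1.2 0.8; 5 5; 5.2 4.9];           % m = 4 toy examples, n = 2 features
K = 2;
idx = zeros(1, size(X,1));
centroids = X(randperm(size(X,1), K), :);   % random init: pick K distinct training examples
for iter = 1:10,
  % assign-cluster step: c(i) = index of the closest centroid
  for i = 1:size(X,1),
    [~, idx(i)] = min(sum((centroids - X(i,:)).^2, 2));
  end;
  % move-centroid step: mu(k) = mean of the examples assigned to cluster k
  for k = 1:K,
    centroids(k,:) = mean(X(idx == k, :), 1);
  end;
end;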
1.3 Optimization Objective
Cost function
J(c(1), …, c(m), mu(1), …, mu(K)) = (1/m) * sum over i of || x(i) - mu(c(i)) ||^2
J is called the distortion of the training examples.
Optimization objective
Minimize J over the c's and the mu's, i.e. minimize the average squared distance of every training example to its assigned centroid.
- In the assign-cluster step: minimize J over c(1), …, c(m), holding mu(1), …, mu(K) fixed.
- In the move-centroid step: minimize J over mu(1), …, mu(K), holding c(1), …, c(m) fixed.
Therefore it is not possible for the cost function to increase during K-means; J should decrease (or stay the same) on every iteration.
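A one-line Octave check of the distortion (my own sketch; it assumes X is m x n with examples as rows, idx holds c(i), and centroids holds the mu(k) as rows, as in ex7):
J = mean(sum((X - centroids(idx, :)).^2, 2));   % (1/m) * sum over i of ||x(i) - mu(c(i))||^2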
1.4 Recommended Method for Random Initialization
- Have K < m
- Randomly pick K training examples
- Set mu(1), …, mu(K) to these K training examples
K-means can get stuck in local optima. To decrease the chance, run K-means many times with different random initializations and keep the clustering with the lowest cost J; this helps especially when K is small (e.g. K = 2–10).
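A sketch of that multi-init loop (my own; the random init is inlined as above, and runkMeans is assumed to behave like the ex7 helper that returns the final centroids and assignments):
bestJ = Inf;
for t = 1:100,
  initial_centroids = X(randperm(size(X,1), K), :);         % random init as described above
  [centroids, idx] = runkMeans(X, initial_centroids, 10);   % assumed ex7-style helper
  J = mean(sum((X - centroids(idx, :)).^2, 2));             % distortion from section 1.3
  if (J < bestJ),
    bestJ = J; best_centroids = centroids; best_idx = idx;
  end;
end;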
1.5 Choosing the Number of Clusters
The elbow method: plot the distortion J against the number of clusters K and pick the K at the "elbow", where J stops decreasing rapidly. Often the curve is ambiguous, so K is also commonly chosen based on the downstream purpose.
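A sketch of the elbow plot (my own; bestKMeansJ is a hypothetical helper, not an ex7 function, wrapping the multi-init loop from section 1.4 and returning the lowest J found for a given K):
Ks = 1:10;
Js = arrayfun(@(K) bestKMeansJ(X, K), Ks);
plot(Ks, Js, '-o'); xlabel('number of clusters K'); ylabel('distortion J');   % look for the elbow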
2 Dimensionality Reduction
2.1 Motivation for Dimensionality Reduction
- Data Compression: 2D to 1D, or 3D to 2D, …, or nD to kD
- Visualization: lower-dimensional data can be plotted, and several features can be combined into one so that a plot outlines the main structure of the data.
2.2 Principal Component Analysis (PCA) Problem Formulation
- 2D (x1, x2) to 1D (z): find a direction onto which to project the data so that the projection error is small; a good direction z gives small projection errors, a bad direction z’ gives large ones.
- PCA vs. linear regression: linear regression minimizes the vertical distance to the line (the error in predicting y), while PCA minimizes the orthogonal projection distance and treats all features equally (there is no y).
2.3 PCA Algorithm
- Data preprocessing before PCA
- Mean normalization: mu(j) = (1/m) * sum over i of xj(i), then replace each xj(i) with xj(i) - mu(j)
- Feature scaling (if features are on different scales): xj(i) = xj(i) / s(j), where s(j) is the standard deviation (or range) of feature j
- Compute the “covariance matrix”: Sigma = (1/m) * X' * X (an n x n matrix)
- Compute the “eigenvectors” of the covariance matrix: [U, S, V] = svd(Sigma)
- Take the first k columns of U as Ureduce (n x k) and compute z(i) = Ureduce' * x(i) (see the sketch below)
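A minimal Octave sketch of these steps (my own; it assumes X is m x n with examples as rows, already mean-normalized and scaled, and that k has been chosen):
m = size(X, 1);
Sigma = (1/m) * (X' * X);     % n x n covariance matrix
[U, S, V] = svd(Sigma);       % columns of U are the principal directions
Ureduce = U(:, 1:k);          % keep the first k components (n x k)
Z = X * Ureduce;              % row i of Z is z(i) = Ureduce' * x(i)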
2.4 Reconstruction from Compressed Representation
Since z = Ureduce' * x,
x_approx = Ureduce * z   # x_approx is the approximation of the original data
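Continuing the sketch from 2.3 (my own):
X_approx = Z * Ureduce';      % m x n; row i is x_approx(i) = Ureduce * z(i)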
2.5 Choosing the Number of Principal Components
One way is to use the following convention:
- Compute the average squared projection error: (1/m) * sum over i of || x(i) - x_approx(i) ||^2
- Compute the total variation in the data: (1/m) * sum over i of || x(i) ||^2
- Choose k to be the smallest value such that:
(average squared projection error) / (total variation) <= 0.01, i.e. 99% of the variance is retained
Fortunately we can use the svd function to ease the process: with [U, S, V] = svd(Sigma), choose the smallest k such that sum(i=1..k) S(i,i) / sum(i=1..n) S(i,i) >= 0.99.
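A sketch of picking k from the diagonal of S returned by svd(Sigma) in section 2.3 (my own):
s = diag(S);                     % the diagonal entries S(1,1), ..., S(n,n)
retained = cumsum(s) / sum(s);   % variance retained for k = 1, 2, ..., n
k = find(retained >= 0.99, 1);   % smallest k retaining at least 99% of the variance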
2.6 Advice for applying PCA
- Speed up supervised learning
Suppose the training set is (x(1), y(1)), (x(2), y(2)), …, (x(m), y(m))
Extract all the inputs: x(1), x(2), …, x(m)
Use PCA to reduce the dimension, giving z(1), z(2), …, z(m) (e.g. reduce n = 10000 to k = 1000)
Use the new training set: (z(1), y(1)), (z(2), y(2)), …, (z(m), y(m)); a pipeline sketch follows at the end of this list
- Bad use of PCA: to prevent overfitting
Because PCA throws away some of the information in x(i) without ever looking at the labels y(i).
Use regularization instead!
- When is the right time to use PCA?
First try with the original/raw data x(i) without PCA. Only if that doesn’t do what we want, implement PCA and consider using z(i).
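A pipeline sketch for the speed-up use case (my own; featureNormalize, pca, and projectData are assumed to behave like the ex7 helpers, and trainModel is a hypothetical placeholder for whatever supervised learner is used):
[X_norm, mu, sigma] = featureNormalize(X_train);   % learn the mapping on the training inputs only
[U, S] = pca(X_norm);                              % assumed ex7-style helper
Z_train = projectData(X_norm, U, k);               % z(i) replaces x(i)
model = trainModel(Z_train, y_train);              % hypothetical supervised learner
Z_new = projectData((X_new - mu) ./ sigma, U, k);  % apply the SAME mu, sigma, U to new inputs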
3 ex7
3.1 K-means find closest centroids
% my initial code here; the running result is ok, but it failed to pass the grader after submitting,
% although ex7.m got the expected answer.
%m = size(X, 1);
%for i = 1:m,
% xi = X(i,:);
% clist = zeros(1, size(X,2));
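%   note: clist is probably mis-sized here -- it should be zeros(1, K) rather than zeros(1, size(X,2));
%   when K < n the leftover zero entries win the min below, which may be why the grader rejected this version.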
%   for each x(i) (of the m examples), compute the distance from x(i) to each of the K centroids,
%   keep the K results in clist (a vector), then use min to get the index of the closest one.
% for j = 1:K,
% miuj = centroids(j,:);
% clist(j) = (xi-miuj) * (xi-miuj)';
% end;
% [vmin, indexmin] = min(clist);
% idx(i) = indexmin;
%end;
% had to reference an online solution for the version below, which then passed the grader.
% this method is much cleaner
m = size(X, 1);
for i = 1:m,
% start with centroid 1 as the current minimum, then compare against the rest
minIndex = 1;
minDist = (X(i,:)-centroids(1,:)) * (X(i,:)-centroids(1,:))';
for j = 2:K,
curDist = (X(i,:)-centroids(j,:)) * (X(i,:)-centroids(j,:))';
if (curDist < minDist),
minIndex = j;
minDist = curDist;
end;
end;
idx(i) = minIndex;
end;
Is there a vectorized (matrix) method?
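One possibility (my own sketch, not from the assignment): compute all m x K squared distances at once and take the row-wise minimum; it assumes X (m x n) and centroids (K x n) as in the exercise and uses Octave's automatic broadcasting.
dists = sum(X.^2, 2) + sum(centroids.^2, 2)' - 2 * X * centroids';  % dists(i,j) = ||X(i,:) - centroids(j,:)||^2
[~, idx] = min(dists, [], 2);                                       % idx(i) = index of the closest centroid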
3.2 K-means compute centroid means
% in order to use matrix operations, expand the vector idx into an indicator matrix, with a 1 in column idx(i) of row i
Idx_matrix = zeros(m, K);
for i = 1:m,
Idx_matrix(i,idx(i)) = 1;
end;
% in Idx_matrix' (K x m):
% the number of 1s in row k is the number of examples that are closest to centroid k
centroids = Idx_matrix' * X;
% this gives, for each k, the sum of the x(i) that are closest to centroid k, but not yet the mean,
% so next we count the number of x(i) closest to centroid k, aka |Ck|
idx_vec = zeros(K, 1);
for j = 1:K,
idx_vec(j,1) = sum(Idx_matrix(:,j)); % count the 1s in column j of Idx_matrix, i.e. row j of Idx_matrix'
end;
% dividing by |Ck| for each mu(k): sum(Idx_matrix) gives exactly these counts, so the code is simpler:
centroids = centroids ./ (sum(Idx_matrix))';
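One edge case from section 1.2: if a cluster ends up with no assigned examples, sum(Idx_matrix) contains a zero and the division above produces NaN for that centroid. A minimal guard (my own sketch, not required by the grader; reuses X, m, Idx_matrix, centroids from above):
counts = (sum(Idx_matrix))';                             % K x 1, number of examples per cluster
empty = find(counts == 0);                               % clusters with no assigned examples
centroids(empty, :) = X(randi(m, numel(empty), 1), :);   % re-initialize them to random examples (section 1.2)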