Machine Learning Week 8: Unsupervised Learning and ex7


1 Clustering

1.1 Introduction to Unsupervised Learning

Unsupervised learning uses an unlabeled training set: we have no vector of expected results, only a dataset of features in which to find structure. Unsupervised learning is good for:

  • Market segmentation
  • Social network analysis
  • Organizing computer clusters
  • Astronomical data analysis

1.2 K-Means Algorithm

  1. Randomly initialize K points called centroids.
  2. Assign each example to whichever of the K groups has the closest centroid.
  3. Move each centroid to the mean of all the examples assigned to its group.
  4. Re-run steps 2 and 3 until the clusters stop changing.

Main variables
K: number of clusters
Training set: x(1), x(2), …, x(m) (no labels y(i))
n: number of features
x(i): an n*1 vector
Note: we do not use the x0 = 1 convention here.

Randomly initialize K cluster centroids mu(1), mu(2), ..., mu(K)
Repeat:
	for i = 1 to m:
		c(i) := index (from 1 to K) of the cluster centroid closest to x(i)  # assign cluster step
	for k = 1 to K:
		mu(k) := average (mean) of the points assigned to cluster k  # move centroid step
  • assign cluster step
    c(i) = argmin over k of ||x(i) - mu(k)||^2
  • move centroid step
    mu(k) = (1/|Ck|) * (x(k1) + x(k2) + …), where the x(ki) are the |Ck| training examples assigned to group k (note: |Ck| is the size of the group, not the number of features n).
    If no example is assigned to a cluster centroid, we can randomly re-initialize that centroid at a new point or eliminate that cluster group. A runnable sketch of the loop follows.
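Putting the two steps together, here is a minimal Octave sketch of the loop (my own sketch, not the course's runkMeans.m; assumes X is m x n and initial_centroids is K x n):

function [centroids, idx] = simpleKMeans(X, initial_centroids, max_iters)
  m = size(X, 1);
  K = size(initial_centroids, 1);
  centroids = initial_centroids;
  idx = zeros(m, 1);
  for iter = 1:max_iters
    % assign cluster step: index of the closest centroid for each example
    for i = 1:m
      dists = sum((centroids - X(i, :)) .^ 2, 2);  % squared distances to all K centroids
      [~, idx(i)] = min(dists);
    end
    % move centroid step: mean of the examples assigned to each cluster
    for k = 1:K
      members = (idx == k);
      if any(members)
        centroids(k, :) = mean(X(members, :), 1);
      end
      % an empty cluster is simply left where it is in this sketch
    end
  end
end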

K-means can still be useful on non-separated clusters, e.g., splitting T-shirt customers into S, M, and L sizes.

1.3 Optimization Objective

Cost function
J(c(1), …, c(m), mu(1), …, mu(K)) = (1/m) * sum over i of ||x(i) - mu_c(i)||^2
where mu_c(i) is the centroid of the cluster to which x(i) is currently assigned. J is called the distortion of the training examples.
Optimization objective
Minimize J over the c and mu variables, i.e., minimize the average of the squared distances from every training example to its centroid.
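For concreteness, the distortion can be computed in one line of Octave (assuming idx holds the assignments c(i) and centroids holds the mu(k) as rows):

% J = mean squared distance from each example to its assigned centroid
J = mean(sum((X - centroids(idx, :)) .^ 2, 2));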

  • in the assign cluster step
    minimize J with respect to c(1), …, c(m), holding mu(1), …, mu(K) fixed.
  • in the move centroid step
    minimize J with respect to mu(1), …, mu(K), holding c(1), …, c(m) fixed.

With K-means it is not possible for the cost function to increase: every iteration should leave J the same or lower.

1.4 Recommended Method for Random Initialization

  1. Have K < m.
  2. Randomly pick K training examples.
  3. Set mu(1), …, mu(K) to these K training examples.
    K-means can get stuck in local optima. To decrease the chance, run it from many different random initializations and keep the lowest-distortion result; this matters most when K is small (say K < 10). A sketch follows.
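A minimal Octave sketch of this initialization and of keeping the best of several restarts (this is the same recipe ex7's kMeansInitCentroids.m asks for; simpleKMeans is the sketch from section 1.2):

% pick K distinct training examples as the initial centroids
function centroids = kMeansInitCentroids(X, K)
  randidx = randperm(size(X, 1));   % random permutation of the example indices
  centroids = X(randidx(1:K), :);   % the first K become the centroids
end

% run K-means from many random initializations, keep the lowest-distortion run
best_J = Inf;
for t = 1:100
  init_centroids = kMeansInitCentroids(X, K);
  [centroids, idx] = simpleKMeans(X, init_centroids, 10);
  J = mean(sum((X - centroids(idx, :)) .^ 2, 2));   % distortion of this run
  if J < best_J
    best_J = J;
    best_centroids = centroids;
  end
end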

1.5 Choosing the Number of Clusters

The elbow method: run K-means for a range of values of K, plot the distortion J against K, and pick the K at the "elbow" where the curve stops dropping steeply. If the elbow is ambiguous, choose K based on the downstream purpose instead.
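A quick sketch of producing the elbow plot, reusing the helpers above (the range 1:10 and the 10 iterations are arbitrary choices):

% plot distortion J against the number of clusters K
Ks = 1:10;
Js = zeros(size(Ks));
for K = Ks
  init_centroids = kMeansInitCentroids(X, K);
  [centroids, idx] = simpleKMeans(X, init_centroids, 10);
  Js(K) = mean(sum((X - centroids(idx, :)) .^ 2, 2));
end
plot(Ks, Js, '-o'); xlabel('K'); ylabel('distortion J');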

2 Dimensionality Reduction

2.1 Motivation for Dimensionality Reduction

  1. Data Compression: 2D to 1D, or 3D to 2D, …, or nD to kD.
  2. Visualization: data reduced to two or three dimensions can be plotted, and several raw features can be combined into one axis that summarizes most of them.

2.2 Principal Component Analysis (PCA) Problem Formulation

  1. Reducing 2D (x1, x2) to 1D: PCA finds a direction to project the data onto so as to minimize the projection error. A direction z with small projection error is good; a direction z' with large projection error is not.
    (figure: projections onto a good direction z vs. a bad direction z')
  2. PCA vs. linear regression: linear regression minimizes the vertical distances from the points to the line (to predict y from x), whereas PCA minimizes the orthogonal projection distances and treats all features alike; there is no y to predict.
    (figure: vertical residuals for regression vs. orthogonal projections for PCA)

2.3 PCA Algorithm

  1. Data preprocessing before PCA
  • mean normalization
    mu(j) = (1/m) * sum over i of x(j)(i)
    x(j) := x(j) - mu(j)
  • feature scaling (when features are on different scales)
    x(j) := x(j) / s(j)
  2. Compute the "covariance matrix"
    Sigma = (1/m) * sum over i of x(i) * x(i)'  (an n x n matrix; vectorized: Sigma = (1/m) * X' * X)
  3. Compute the "eigenvectors" of the covariance matrix
    [U, S, V] = svd(Sigma)
  4. Take the first k columns of the U matrix and compute z
    Ureduce = U(:, 1:k);  z = Ureduce' * x  (z is k x 1)
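The whole pipeline in one minimal Octave sketch (X is m x n with examples as rows, k is the target dimension; with rows as examples the projection z = Ureduce' * x becomes Z = X * Ureduce):

mu = mean(X);                                    % 1. mean normalization
s = std(X);                                      %    and feature scaling
X_norm = (X - mu) ./ s;
Sigma = (1 / size(X, 1)) * (X_norm' * X_norm);   % 2. covariance matrix (n x n)
[U, S, V] = svd(Sigma);                          % 3. eigenvectors of Sigma
Ureduce = U(:, 1:k);                             % 4. first k principal components (n x k)
Z = X_norm * Ureduce;                            %    m x k compressed representation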

2.4 Reconstruction from Compressed Representation

Since z = Ureduce' * x,
x_approx = Ureduce * z  # x_approx approximates the original (normalized) data
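In matrix form, continuing the sketch from section 2.3 (rows as examples):

X_approx = Z * Ureduce';   % m x n approximation, lies in the k-dimensional subspace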

2.5 Choosing the Number of Principal Components

One common convention is the following:

  • Compute the average squared projection error: (1/m) * sum over i of ||x(i) - x_approx(i)||^2
  • Compute the total variation in the data: (1/m) * sum over i of ||x(i)||^2
  • Choose k to be the smallest value such that:
    (average squared projection error) / (total variation) <= 0.01
    that is, "99% of the variance is retained."
    Fortunately we can use the svd function to avoid recomputing this ratio for every candidate k: with [U, S, V] = svd(Sigma), pick the smallest k such that
    (S(1,1) + … + S(k,k)) / (S(1,1) + … + S(n,n)) >= 0.99
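A sketch of that check in Octave, using the diagonal of the S returned by svd(Sigma):

sv = diag(S);                       % the n singular values
retained = cumsum(sv) / sum(sv);    % fraction of variance retained for each k
k = find(retained >= 0.99, 1);      % smallest k retaining at least 99%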

2.6 Advice for applying PCA

  1. Speed up Supervised learning
    Suppose the training set is (x(1), y(1)), (x(2), y(2)), …, (x(m), y(m)).
    Extract all the inputs: x(1), x(2), …, x(m).
    Use PCA to reduce their dimension, giving z(1), z(2), …, z(m) (e.g., reduce n = 10000 to k = 1000).
    Use the new training set (z(1), y(1)), (z(2), y(2)), …, (z(m), y(m)).
  2. Bad use of PCA: trying to prevent overfitting
    PCA may discard information that is important for predicting y(i), since it never looks at the labels.
    Use regularization instead!
  3. When is the right time to use PCA?

First try with the original/raw data x(i) without PCA. Only if that doesn't do what we want should we implement PCA and consider using z(i).
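One detail worth making explicit: the PCA mapping (mu, s, Ureduce) should be fit on the training inputs only and then reused unchanged on cross-validation and test inputs. A sketch:

mu = mean(Xtrain);  s = std(Xtrain);              % fit the mapping on the training set
Xn = (Xtrain - mu) ./ s;
[U, S, V] = svd((1 / size(Xn, 1)) * (Xn' * Xn));
Ureduce = U(:, 1:k);
Ztrain = Xn * Ureduce;
Ztest = ((Xtest - mu) ./ s) * Ureduce;            % apply the identical mapping to test data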

3 ex7

3.1 K-means find closest centroids

% My initial code is below. It ran correctly (ex7.m got the expected answer),
% but it failed to pass the grader after submitting.
%m = size(X, 1);
%for i = 1:m,
%    xi = X(i,:);
%    clist = zeros(1, K);  % was zeros(1, size(X,2)); that only worked because Octave grows vectors on assignment
%    % for each x(i) (of the m examples), compute the distance from xi to each of the K centroids,
%    % keep the K results in clist (a vector), then use min to get the index
%    for j = 1:K,
%        miuj = centroids(j,:);
%        clist(j) = (xi-miuj) * (xi-miuj)';
%    end;
%    [vmin, indexmin] = min(clist);
%    idx(i) = indexmin;
%end;

% I had to reference an online solution for the version below; this one passed the grader.
% This method is also much cleaner.
m = size(X, 1);
for i = 1:m,
    % start with centroid 1 as the current minimum
    minIndex = 1;
    minDist = (X(i,:)-centroids(1,:)) * (X(i,:)-centroids(1,:))';
    for j = 2:K,
        curDist = (X(i,:)-centroids(j,:)) * (X(i,:)-centroids(j,:))';
        if (curDist < minDist),
            minIndex = j;
            minDist = curDist;
        end;
    end;
    idx(i) = minIndex;
end;

Is there a matrix (vectorized) method?
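Yes. One vectorized sketch (relies on Octave's automatic broadcasting): expand ||x - mu||^2 = ||x||^2 - 2*x*mu' + ||mu||^2 and compute all m x K squared distances at once:

% D(i, j) = squared distance from X(i,:) to centroids(j,:)
D = sum(X .^ 2, 2) - 2 * X * centroids' + sum(centroids .^ 2, 2)';
[~, idx] = min(D, [], 2);   % idx(i) = index of the closest centroid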

3.2 K-means compute centroid means

% To use matrix operations, expand the vector idx into an m x K indicator matrix with a 1 in column idx(i) of row i.
Idx_matrix = zeros(m, K);
for i = 1:m,
    Idx_matrix(i,idx(i)) = 1;
end;

% In Idx_matrix' (K x m), the number of 1s in row k is the count of
% examples that are closest to centroid k.
centroids = Idx_matrix' * X;
% This gives, for each k, the sum of the x(i) closest to centroid k, not yet the mean,
% so next we count the number of x(i) closest to centroid k, i.e., |Ck|:
for j = 1:K,
    idx_vec(j,1) = sum(Idx_matrix(:,j));  % count the 1s in column j of Idx_matrix, i.e., |Cj|
end;

% Divide row k of centroids by |Ck|. But sum(Idx_matrix) computes all K counts at once,
% so the loop above is unnecessary and the code simplifies to:
centroids = centroids ./ (sum(Idx_matrix))';

3.3 PCA (not fully understood for a long time; maybe redo it later)
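For that later revisit, a minimal sketch of what the exercise's three PCA functions compute (assuming the usual ex7 signatures; X is already mean-normalized by the time pca is called):

function [U, S] = pca(X)
  m = size(X, 1);
  Sigma = (1 / m) * (X' * X);   % covariance matrix of the normalized data
  [U, S, V] = svd(Sigma);       % columns of U are the principal components
end

function Z = projectData(X, U, K)
  Z = X * U(:, 1:K);            % project each row of X onto the top K components
end

function X_rec = recoverData(Z, U, K)
  X_rec = Z * U(:, 1:K)';       % approximate reconstruction in the original space
end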

3.4 Submitted OK
