Principal Component Analysis

An important machine learning method for dimensionality reduction is called Principal Component Analysis. It uses simple matrix operations from linear algebra and statistics to calculate a projection of the original data into the same number of dimensions or fewer.

After completing this tutorial, you will know:

  • The procedure for calculating the Principal Component Analysis and how to choose principal components.
  • How to calculate the Principal Component Analysis from scratch in NumPy.
  • How to calculate the Principal Component Analysis for reuse on more data in scikit-learn.

1.1 Tutorial Overview

This tutorial is divided into 3 parts; they are:

  • What is Principal Component Analysis
  • Calculate Principal Component Analysis
  • Principal Component Analysis in scikit-learn

1.2 What is Principal Component Analysis

Principal Component Analysis, or PCA for short, is a method for reducing the dimensionality of data. The PCA method can be described and implemented using the tools of linear algebra. PCA is an operation applied to a dataset, represented by an n × m matrix A, that results in a projection of A which we will call B.
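Using the same symbols (A for the original n × m data matrix and B for its projection), the standard eigendecomposition route can be sketched as:

```latex
\begin{aligned}
M &= \operatorname{mean}(A)\\
C &= A - M\\
V &= \operatorname{cov}(C)\\
(\lambda, W) &= \operatorname{eig}(V)\\
B &= W^{T} C^{T}
\end{aligned}
```

Here W holds the eigenvectors (the principal components) and λ the eigenvalues; components with small eigenvalues can be dropped to reduce the dimensionality. This mirrors the step-by-step calculation in the next section.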


1.3 Calculate Principal Component Analysis

There is no pca() function in NumPy, but we can easily calculate the Principal Component Analysis step-by-step using NumPy functions. The example below defines a small 3 × 2 matrix, centers the data in the matrix, calculates the covariance matrix of the centered data, and then the eigendecomposition of the covariance matrix. The eigenvectors and eigenvalues are taken as the principal components and singular values and used to project the original data.

# Example of calculating a PCA manually
# principal component analysis
from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# define matrix
A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print(A)
# column means
M = mean(A.T, axis=1)
# center columns by subtracting column means
C = A - M
# calculate covariance matrix of centered matrix
V = cov(C.T)
# factorize covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)
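As a sanity check, the same result can be obtained from the singular value decomposition of the centered data, which avoids forming the covariance matrix explicitly. This is a minimal sketch of that alternative route, not part of the original tutorial code:

```python
# Alternative sketch: PCA via the SVD of the centered data,
# which avoids forming the covariance matrix explicitly.
from numpy import array
from numpy import mean
from numpy.linalg import svd

# define matrix
A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
# center columns by subtracting column means
C = A - mean(A, axis=0)
# singular value decomposition of the centered matrix
U, s, Vt = svd(C)
# eigenvalues of the covariance matrix are s^2 / (n - 1)
values = s**2 / (C.shape[0] - 1)
print(values)
# project the data onto the principal directions (rows of Vt)
P = C.dot(Vt.T)
print(P)
```

Up to the arbitrary sign of each component, `values` and `P` match the eigenvalues and projection produced by the eigendecomposition above.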

1.4 Principal Component Analysis in scikit-learn

We can calculate a Principal Component Analysis on a dataset using the PCA() class in the scikit-learn library. The benefit of this approach is that once the projection is calculated, it can be applied to new data again and again quite easily. When creating the class, the number of components can be specified as a parameter. The class is first fit on a dataset by calling the fit() function, and then the original dataset or other data can be projected into a subspace with the chosen number of dimensions by calling the transform() function. Once fit, the singular values and principal components can be accessed on the PCA class via the explained_variance_ and components_ attributes. The example below demonstrates using this class by first creating an instance, fitting it on a 3 × 2 matrix, accessing the values and vectors of the projection, and transforming the original data.

# Example of calculating a PCA with scikit-learn
# principal component analysis with scikit-learn
from numpy import array
from sklearn.decomposition import PCA
# define matrix
A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print(A)
# create the transform
pca = PCA(2)
# fit transform
pca.fit(A)
# access values and vectors
print(pca.components_)
print(pca.explained_variance_)
# transform data
B = pca.transform(A)
print(B)

Running the example first prints the 3 × 2 data matrix, then the principal components and values, followed by the projection of the original matrix. We can see that, allowing for some very minor floating point rounding, we achieve the same principal components, singular values, and projection as in the previous example.
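The reuse benefit mentioned earlier can be illustrated by fitting the transform on one dataset and then projecting new, unseen rows. This is a sketch; the rows in A_new are invented for illustration, and the signs of the projected values may be flipped relative to the manual example because the sign of each eigenvector is arbitrary:

```python
# Fit the PCA transform once, then reuse it on new data.
# The rows in A_new are made up for illustration.
from numpy import array
from sklearn.decomposition import PCA

# define matrix
A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
# fit the transform on the original data
pca = PCA(2)
pca.fit(A)
# project new, unseen data with the same number of columns
A_new = array([
    [2, 3],
    [4, 5]
])
B_new = pca.transform(A_new)
print(B_new)
```

The fitted object remembers the column means and components from A, so transform() can be called on any new data with the same number of columns.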

1.5 Summary

In this tutorial, you discovered the Principal Component Analysis machine learning method for dimensionality reduction. Specifically, you learned:

  • The procedure for calculating the Principal Component Analysis and how to choose principal components.
  • How to calculate the Principal Component Analysis from scratch in NumPy.
  • How to calculate the Principal Component Analysis for reuse on more data in scikit-learn.