PCA: Principal Component Analysis for Dimensionality Reduction
An important technique for handling datasets with a large number of features (dimensions).
The volume of data grows every second, and extracting insights from it to solve problems has become crucial. As the number of features grows, so does the dimensionality of the dataset, and a machine learning model has to cope with increasingly complex data. At the same time, many features contribute little to the model or are correlated with other features. Principal Component Analysis (PCA) is a way to reduce the number of dimensions and remove correlated features from the dataset.
The article is divided into the following sections:
1. Definition of PCA
2. Need for and advantages of PCA
3. Real-time usage/applications
4. Steps to perform PCA
- Data standardization
- Computing the covariance matrix
- Determining eigenvalues and eigenvectors
- Computing PCA features
5. Implementing PCA on the MNIST dataset using Python
6. Conclusion
What is PCA?
Principal Component Analysis (PCA) is a dimensionality reduction technique that identifies correlations and patterns in a dataset so that it can be transformed into a dataset with significantly fewer dimensions while retaining most of the important information.
Need for PCA
A dataset with a large number of features takes longer to train a model on, and it makes data processing and exploratory data analysis (EDA) more convoluted.
Advantages of PCA
- Reduces training time.
- Removes correlated features (removes noise).
- Eases data exploration (EDA).
- Makes data easier to visualize (in at most three dimensions).
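The point about removing correlated features can be checked on a small example. The sketch below assumes hypothetical synthetic data with two strongly correlated features. It projects the centered data onto the eigenvectors of its covariance matrix and compares the correlation before and after the projection.

```python
import numpy as np

# Hypothetical toy data: x2 is strongly correlated with x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=1000)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=1000)
X = np.column_stack([x1, x2])

# Center the data and compute its covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh is intended for symmetric matrices such as a covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Project the data onto the eigenvectors (the principal directions).
scores = X_centered @ eigenvectors

corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]

print("correlation between original features:", corr_before)
print("correlation between PCA features:", corr_after)
```

The original features are almost perfectly correlated, while the projected features are uncorrelated up to floating-point error. This is why dropping the low-variance component discards little information.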
Real-time applications
PCA is used for dimensionality reduction in domains such as face recognition, computer vision, image compression, image detection, object detection, and image classification.
Steps to perform PCA:
Data Standardization
- Standardization is all about scaling the data in such a way that all the values/variables are in a similar range. Standardization means rescaling data to have a mean of 0 and a