Hands-On Machine Learning with Scikit-Learn & TensorFlow Exercise Q&A Chapter08

Latest recommended article published on 2020-07-08 23:54:48

Leonardo Liu

Categories: Python, Machine Learning, Scikit-Learn, Hands-On ML with sklearn & TensorFlow Exercise Q&A. Tags: Machine Learning, HandsOn, Dimensionality Reduction

Article link: https://blog.csdn.net/leowinbow/article/details/88749061

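Summary: This article covers the main motivations for dimensionality reduction, such as speeding up training, visualizing data, and saving space, and discusses its drawbacks, including information loss, added computational cost, and harder interpretation. It explains the curse of dimensionality, i.e. the problems that arise in high-dimensional spaces. PCA, a common dimensionality reduction method, can be applied to nonlinear datasets, although it may lose a lot of information. The reduction usually cannot be reversed perfectly, because information is lost along the way. Depending on the dataset, running PCA on 1,000-dimensional data with a 95% explained variance ratio can yield any number of dimensions between 1 and 1,000. Depending on the scenario, you can choose vanilla PCA, Incremental PCA, Randomized PCA, or Kernel PCA. A dimensionality reduction algorithm's performance can be evaluated through its reconstruction error, and chaining two different dimensionality reduction algorithms can sometimes achieve similar results in less time.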

Q1. What are the main motivations for reducing a dataset's dimensionality? What are the main drawbacks?

A1:

Motivations:

To speed up a subsequent training algorithm.
To visualize the data and gain insight into the most important features.
To save space (compression).

Drawbacks:

Some information is lost, possibly degrading the performance of subsequent training algorithms.
It can be computationally intensive.
It adds some complexity to your Machine Learning pipelines.
Transformed features are often hard to interpret or reconstruct.
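
The trade-off between saved space and lost information can be illustrated with a small sketch. It assumes NumPy is available and uses a synthetic dataset; the sizes, seed, and noise level are illustrative choices, not taken from the book. PCA is done by hand through SVD here rather than with Scikit-Learn's PCA class.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for repeatability only

# Synthetic data (an assumption for illustration): 200 samples in 10
# dimensions that lie close to a 2D plane, plus a little noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# PCA via SVD on the centered data.
mean = X.mean(axis=0)
X_centered = X - mean
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

d = 2                                  # number of dimensions to keep
W = Vt[:d].T                           # projection matrix, shape (10, d)
X_reduced = X_centered @ W             # compressed data, shape (200, d)
X_recovered = X_reduced @ W.T + mean   # approximate reconstruction

# Fraction of variance preserved by the first d components.
explained = (s[:d] ** 2).sum() / (s ** 2).sum()
# Information lost, measured as mean squared reconstruction error.
reconstruction_mse = np.mean((X - X_recovered) ** 2)
# Values stored after reduction relative to the original (ignores W and mean).
compression_ratio = X_reduced.size / X.size

print(f"explained variance kept:    {explained:.4f}")
print(f"reconstruction MSE:         {reconstruction_mse:.6f}")
print(f"stored values vs. original: {compression_ratio:.0%}")
```

The reconstruction error is small but not zero: the variance that was dropped cannot be recovered, which is the first drawback listed above.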

Q2. What is the curse of dimensionality?

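A2: The curse of dimensionality refers to the fact that many problems that do not exist in low-dimensional space arise in high-dimensional space. In Machine Learning, a common manifestation is that randomly sampled high-dimensional vectors are generally very sparse, which increases the risk of overfitting and makes it hard to identify patterns in the data without plenty of training data.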
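
One way to get a feel for this is to measure how far apart random points are as the number of dimensions grows. The sketch below uses only the Python standard library (Python 3.8+ for math.dist); the function name mean_pairwise_distance, the number of pairs, and the seed are hypothetical choices made for illustration.

```python
import math
import random


def mean_pairwise_distance(dim, n_pairs=500, seed=0):
    """Estimate the average Euclidean distance between two points drawn
    uniformly at random from the unit hypercube in `dim` dimensions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a = [rng.random() for _ in range(dim)]
        b = [rng.random() for _ in range(dim)]
        total += math.dist(a, b)
    return total / n_pairs


for dim in (2, 100, 1000):
    print(f"{dim:>5} dimensions: {mean_pairwise_distance(dim):.2f}")
```

In 2D the average comes out at roughly 0.52, and it keeps growing with the number of dimensions. Points in a high-dimensional dataset therefore tend to be far from one another, so any new instance is likely to be far from every training instance.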

Q3. Once a dataset's dimensionality has been reduced, is it possible to reverse the operation? If so, how? If not,