Hands-On Machine Learning with Scikit-Learn and TensorFlow, Study Notes Part 6: Dimensionality Reduction in Machine Learning

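This post introduces dimensionality reduction techniques in machine learning, covering why dimensionality reduction matters and the concepts of projection and manifold learning. It walks through PCA, Incremental PCA, Kernel PCA, and LLE: PCA suits linear data, LLE suits nonlinear data, and Incremental PCA makes PCA practical on large datasets. Dimensionality reduction helps speed up training, but it may lose information.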

Why Reduce Dimensionality

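Many machine learning problems involve thousands or even millions of features per training instance. This not only makes training very slow, it also makes a good solution harder to find. The problem is often called the curse of dimensionality, so effectively reducing the number of features improves efficiency and speeds up training. Dimensionality reduction is also very useful for data visualization: reducing the data to two (or three) dimensions makes it possible to plot a high-dimensional training set on a graph.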

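In summary, reducing the dimensionality of the training set before training a model will generally speed up training, but it does not always lead to a better or simpler solution; that depends on the dataset.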

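Dimensionality reduction is lossy: it discards some information, so the original data generally cannot be recovered exactly.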
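To make the information loss concrete, here is a minimal sketch that reduces a toy dataset with PCA, maps it back to the original space, and measures the reconstruction error. The random data, the choice of two components, and the variable names are all assumptions made for illustration; they do not come from the notes.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples, 5 features, no exact low-dimensional structure.
X = rng.normal(size=(200, 5))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)                 # 5 features -> 2 features
X_recovered = pca.inverse_transform(X_reduced)   # back to 5 features, approximately

# Mean squared reconstruction error: nonzero because the three
# discarded components carried real variance.
reconstruction_mse = np.mean((X - X_recovered) ** 2)
print(X_reduced.shape, X_recovered.shape)
print(f"reconstruction MSE: {reconstruction_mse:.3f}")
```

The recovered array has the original shape, but it is only an approximation of X: whatever lay along the dropped directions is gone.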

Main Approaches to Dimensionality Reduction

Projection

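In most real-world problems, the training instances in a high-dimensional space actually lie within (or close to) a much lower-dimensional subspace, so the feature space can be changed by projecting the data onto that subspace. In many cases, however, the subspace twists and turns; projection is then a poor choice, and manifold learning can be used instead.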
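As a rough sketch of projection, the snippet below builds 3D points that lie close to a 2D plane and projects them onto that plane with NumPy. The synthetic data, the noise level, and names such as plane_basis and W2 are hypothetical, and SVD is only one way to find the subspace.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
# Hypothetical data: points on a 2D plane embedded in 3D, plus a little noise.
coords_2d = rng.normal(size=(m, 2))
plane_basis = np.array([[1.0, 0.0, 0.5],
                        [0.0, 1.0, 0.3]])
X = coords_2d @ plane_basis + 0.01 * rng.normal(size=(m, 3))

# Center the data, then find orthonormal directions sorted by decreasing variance.
X_centered = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

W2 = Vt[:2].T              # 3x2 matrix whose columns span the best-fit plane
X_2d = X_centered @ W2     # coordinates of each point within that plane

# Fraction of the total variance that survives the projection.
variance_kept = (s[:2] ** 2).sum() / (s ** 2).sum()
print(X_2d.shape, variance_kept)
```

Because the points were generated close to a plane, almost all of the variance survives; data lying on a curved surface would not project this cleanly.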

Manifold Learning

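A 2D manifold is a 2D shape that can be bent and twisted in a higher-dimensional space. More generally, a d-dimensional manifold is a part of an n-dimensional space (where d < n) that locally resembles a d-dimensional hyperplane.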

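Many dimensionality reduction algorithms work by modeling the manifold on which the training instances lie; this is called manifold learning. It relies on the manifold assumption, also called the manifold hypothesis, which holds that most real-world high-dimensional datasets lie close to a much lower-dimensional manifold.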
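A small sketch of manifold learning, using LLE (named in the summary at the top of this post) from scikit-learn. The Swiss roll dataset and the hyperparameter values are my own choices for illustration; the notes do not specify a dataset or any settings.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Hypothetical example data: a 2D sheet rolled up in 3D space.
X, t = make_swiss_roll(n_samples=500, noise=0.1, random_state=42)

# n_neighbors=10 is an illustrative value, not a recommendation.
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)   # 3D coordinates -> 2D embedding

print(X.shape, X_unrolled.shape)
```

Here t is the position of each point along the roll; plotting X_unrolled colored by t is a common way to check visually whether the sheet was unrolled sensibly.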

Main Dimensionality Reduction Techniques

PCA

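PCA is a linear, unsupervised, global dimensionality reduction algorithm. It works best on linear data, but it can significantly reduce the dimensionality of most datasets, even highly nonlinear ones, because it can at least eliminate useless dimensions. If there are no useless dimensions, however, reducing dimensionality with PCA loses too much information.
PCA assumes the dataset is centered on the origin. It projects the data onto new coordinate axes, choosing the axes that preserve the most variance; the unit vector that defines the i-th new axis is called the i-th principal component.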
There are two ways to implement it:

  • PCA via singular value decomposition (SVD)
  • PCA via eigenvalue decomposition of the covariance matrix
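The two routes can be compared in a few lines of NumPy. This is a sketch under assumptions: the SVD is applied to the centered data matrix (a common formulation; the notes do not spell out the details), and the random data and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data with correlated features: 300 samples, 4 features.
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
X_centered = X - X.mean(axis=0)
m = X_centered.shape[0]

# Route 1: eigendecomposition of the covariance matrix.
cov = X_centered.T @ X_centered / (m - 1)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]             # re-sort to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
_, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
svd_variances = s ** 2 / (m - 1)              # singular values -> variances

# The variances agree, and the directions agree up to sign.
same_variances = np.allclose(eigvals, svd_variances)
same_directions = np.allclose(np.abs(eigvecs.T @ Vt.T), np.eye(4), atol=1e-6)
print(same_variances, same_directions)
```

The columns of eigvecs and the rows of Vt are both the principal components; each may differ only by a sign flip, which is why the comparison uses absolute values.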

See the PCA section for details.

Implementing PCA with sklearn and obtaining the principal components
Parameters:

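  • n_components: the number of dimensions of the new feature space
  • whiten: whether to whiten the reduced data, that is, rescale each projected feature to unit variance; defaults to False
  • svd_solver: the method used for the singular value decomposition (SVD), with 4 possible values: {'auto', 'full', 'arpack', 'randomized'}. randomized is generally suited to cases where the dataset is large, has many dimensions, and also ...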
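The parameters above can be tried out with a short sketch. The random dataset, the choice of three components, and the 0.95 variance threshold are illustrative assumptions, not values from the notes.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data with correlated features: 500 samples, 10 features.
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))

# Keep 3 components, whiten the output, and use the exact (full) SVD solver.
pca = PCA(n_components=3, whiten=True, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(pca.components_.shape)            # one principal component per row
print(pca.explained_variance_ratio_)    # share of variance kept by each component
print(X_reduced.std(axis=0, ddof=1))    # close to 1 per column because whiten=True

# A float n_components between 0 and 1 asks for the smallest number of
# components whose cumulative explained variance exceeds that fraction.
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X)
print(pca_95.n_components_)
```

After fitting, the principal components are the rows of pca.components_, and explained_variance_ratio_ shows how much variance each one preserves.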