Data Scaling for Machine Learning: The Essential Guide

weixin_26750481

Original article: https://towardsdatascience.com/data-scaling-for-machine-learning-the-essential-guide-d6cfda3e3d6b

You may come across datasets with a lot of built-in numerical noise, such as features with very high variance or features measured on very different scales, so good preprocessing is a must before you even think about machine learning. A common preprocessing solution for this type of problem is standardization.

您可能会遇到带有大量内置数字噪声的数据集，例如方差或不同比例的数据，因此，在考虑机器学习之前，必须进行良好的预处理。针对此类问题的良好预处理解决方案通常称为标准化 。

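Standardization is a preprocessing method that transforms continuous data so that each feature has a mean of 0 and a standard deviation of 1, the same center and spread as a standard normal distribution. It does not change the shape of a feature's distribution, so a skewed feature stays skewed. When working with scikit-learn this is often a necessary step, because many models assume that features are centered and on comparable scales (and some assume roughly normally distributed data); if they aren't, you risk biasing your model.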
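In formula terms, standardization replaces each value x of a feature with its z-score, z = (x - μ) / σ, where μ is the feature's mean and σ its standard deviation. Below is a minimal pure-Python sketch for a single feature. It assumes the values fit in a list. The helper name `standardize` and the income values are made up for illustration. It uses the population standard deviation, which matches what scikit-learn's `StandardScaler` does:

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores (x - mean) / std, using the population std (ddof=0)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread: map every value to 0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical feature values, for illustration only.
incomes = [30_000.0, 45_000.0, 60_000.0, 75_000.0, 90_000.0]
z = standardize(incomes)
print([round(v, 3) for v in z])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

The result has mean 0 and standard deviation 1, but the relative spacing of the values is unchanged. Constant features are mapped to 0 rather than causing a division by zero. `StandardScaler` gives the same result, since it assigns zero-variance features a scale of 1.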

标准化是一种预处理方法，用于转换连续数据以使其看起来呈正态分布。在scikit-learn这通常是必要的步骤，因为许多模型都假设您正在训练的数据是正态分布的，如果不是，则可能会使模型存在风险。

You can standardize your data in different ways. In this article we're going to talk about a popular one: data scaling, or, to be more precise, standard scaling.
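In scikit-learn, standard scaling is provided by `sklearn.preprocessing.StandardScaler`. The minimal sketch below assumes NumPy and scikit-learn are installed and uses a small, made-up two-feature array. The main design point is to fit the scaler on the training data only, then reuse the mean and standard deviation it learned to transform the test data. This way, no information about the test set leaks into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data for illustration: column 0 is age in years, column 1 is income.
X_train = np.array([
    [25.0, 40_000.0],
    [35.0, 60_000.0],
    [45.0, 80_000.0],
    [55.0, 100_000.0],
])
X_test = np.array([
    [30.0, 50_000.0],
    [50.0, 90_000.0],
])

scaler = StandardScaler()
# fit_transform() learns per-column mean_ and scale_ (population std) from X_train, then scales it.
X_train_scaled = scaler.fit_transform(X_train)
# transform() applies the training statistics to unseen data; it does not refit.
X_test_scaled = scaler.transform(X_test)

print(scaler.mean_)                 # per-feature means learned from X_train
print(X_train_scaled.mean(axis=0))  # approximately [0, 0]
print(X_train_scaled.std(axis=0))   # approximately [1, 1]
print(X_test_scaled)
```

After scaling, age and income are expressed in comparable units (standard deviations from the training mean), even though their raw ranges differ by three orders of magnitude.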

您可以通过不同的方式标准化数据，在本文中，我们将讨论流行的数据缩放方法- 数据缩放。 或使用标准比例缩放来更精确。

It's also important to note that standardization is applied to continuous, numerical data. There are a few scenarios in which you'll want to use it:

还需要注意的是， 标准化是一种应用于连续数值数据的预处理方法，在几种不同的情况下，您都可以使用它：

When working with any kind of model that relies on a distance metric or operates in a linear space, such as KNN, linear regression, or K-means
When one or more features in your dataset have high variance: this could bias a model that assumes the data is normally distributed, if a feature has a variance that's an order…
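The first scenario is easy to demonstrate. With Euclidean distance, the feature with the largest numeric range dominates. The pure-Python sketch below finds the nearest neighbour of row 0, first on the raw data and then after standardizing each column. It assumes made-up (age, income) rows and uses two hypothetical helpers, `zscore_columns` and `nearest_to_first`. `math.dist` requires Python 3.8 or later.

```python
import math
from statistics import mean, pstdev

# Made-up rows for illustration: (age in years, income in dollars).
rows = [
    (25.0, 50_000.0),
    (60.0, 51_000.0),
    (26.0, 56_000.0),
    (40.0, 20_000.0),
    (45.0, 90_000.0),
]

def zscore_columns(data):
    """Standardize each column to mean 0 and population std 1."""
    columns = list(zip(*data))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in data]

def nearest_to_first(data):
    """Index of the row closest to row 0 by Euclidean distance, excluding row 0 itself."""
    return min(range(1, len(data)), key=lambda i: math.dist(data[0], data[i]))

print(nearest_to_first(rows))                  # 1: the income gap dominates the distance
print(nearest_to_first(zscore_columns(rows)))  # 2: both features now contribute
```

On the raw data, the 60-year-old is row 0's nearest neighbour, because a 1,000-dollar income gap outweighs a 35-year age gap. After scaling, each feature is measured in units of its own standard deviation, and the 26-year-old becomes the nearest neighbour. A KNN or K-means model would see the same shift.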