An Essential Guide to Data Scaling for Machine Learning


You may come across datasets with a lot of built-in numerical noise, such as high variance or differently scaled features, so good preprocessing is a must before you even think about machine learning. A common preprocessing solution for this type of problem is standardization.


Standardization is a preprocessing method that transforms continuous data so that each feature is centered at zero with unit variance, which makes it look more like standard normally distributed data. In scikit-learn this is often a necessary step because many models may behave badly if individual features do not more or less look like standard normally distributed data, and if they don't, you risk biasing your model.


You can standardize your data in different ways. This article covers one popular method: data scaling, or, to be more precise, standard scaling.
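
To make the idea concrete, here is a minimal sketch of standard scaling using only Python's standard library. The helper name `standard_scale` and the income values are hypothetical, chosen for illustration. The sketch subtracts the mean and divides by the population standard deviation, which is the same transformation scikit-learn's `StandardScaler` applies to each feature by default.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant feature has no spread, so map every value to zero.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


# Hypothetical feature column (e.g. incomes in dollars).
incomes = [30_000, 45_000, 60_000, 75_000, 90_000]
scaled = standard_scale(incomes)

print(scaled)          # values centered around zero
print(fmean(scaled))   # approximately 0
print(pstdev(scaled))  # approximately 1
```

After scaling, the column has a mean of roughly zero and a standard deviation of roughly one, regardless of the units the original values were measured in.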


It’s also important to note that standardization is a preprocessing method applied to continuous, numerical data, and there are a few different scenarios in which you want to use it:


  1. When working with any kind of model that uses a linear distance metric or operates on a linear space, such as KNN, linear regression, or K-means
  2. When a feature or features in your dataset have high variance: this could bias a model that assumes the data is normally distributed, if a feature has a variance that’s an order
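
The effect on distance-based models can be sketched with a small, made-up example. The data and the helper names (`standard_scale`, `euclidean`, `nearest`) are assumptions for illustration, not part of the original article. With raw values, the income column dominates the Euclidean distance simply because its numbers are larger; after standard scaling, both features contribute on a comparable footing and the nearest neighbor changes.

```python
import math
from statistics import fmean, pstdev


def standard_scale(column):
    """Scale one feature column to zero mean and unit variance."""
    mean, std = fmean(column), pstdev(column)
    return [(x - mean) / std for x in column]


def euclidean(a, b):
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return min(others, key=lambda j: euclidean(points[i], points[j]))


# Hypothetical people: age in years, income in dollars.
ages = [25, 60, 26, 40]
incomes = [50_000, 51_000, 56_000, 90_000]

raw = list(zip(ages, incomes))
scaled = list(zip(standard_scale(ages), standard_scale(incomes)))

# Raw data: income differences swamp age differences, so the
# 25-year-old is matched with the 60-year-old (index 1).
print(nearest(raw, 0))

# Scaled data: both features matter, so the 25-year-old is
# matched with the 26-year-old (index 2).
print(nearest(scaled, 0))
```

This is the kind of situation KNN or K-means runs into when features are left on very different scales: the feature with the largest numeric range decides the outcome almost by itself.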