Study Notes on "Machine Learning for OpenCV": Data Preprocessing

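These are study notes on "Machine Learning for OpenCV", covering feature standardization, feature normalization, and scaling features to a specific range. Using scikit-learn's preprocessing module, the scale() function standardizes features, the normalize() function normalizes them, and the MinMaxScaler class scales features to the range 0 to 1. Verification consists of checking the statistical properties of the standardized and normalized data.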

I. Feature Standardization

1.1 Feature standardization:

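Standardization is the process of scaling data so that it has zero mean (a mean of 0) and unit variance (a variance of 1). We can standardize the data manually: subtract the feature's mean (μ) from each data point, then divide by the feature's standard deviation (σ), not its variance. That is, for each feature value x, compute (x - μ) / σ.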
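The manual calculation can be sketched in NumPy as follows. This is a minimal illustration, assuming that μ and σ are computed per feature (per column) and that σ is the population standard deviation (ddof=0), which is what preprocessing.scale() uses by default. The names mu, sigma, and X_manual are made up for this example.

```python
import numpy as np

# Example matrix: rows are samples, columns are features
X = np.array([[1., -2., 2.], [3., 0., 0.], [0., 1., -1.]])

mu = X.mean(axis=0)     # per-column mean
sigma = X.std(axis=0)   # per-column population standard deviation (ddof=0)

# Standardize: subtract the mean, divide by the standard deviation
X_manual = (X - mu) / sigma

print(X_manual)
print(X_manual.mean(axis=0))  # each entry is 0 up to floating-point error
print(X_manual.std(axis=0))   # each entry is 1 up to floating-point error
```

Broadcasting applies mu and sigma to every row, so no explicit loop over samples is needed.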

1.2 Implementation with scikit-learn

Use the scale() function from the preprocessing module.

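1.3 Verification

Because each column holds one feature, a correctly standardized feature matrix should have a mean equal (or very close) to 0 in every column and a standard deviation, and hence a variance, of 1 in every column. This can be checked by calling the matrix's mean() and std() methods with axis=0.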

# -*- coding:utf-8 -*-
from sklearn import preprocessing
import numpy as np

X = np.array(
	[[1., -2., 2.], [3., 0., 0.], [0., 1., -1.]]
)   # Create a 2-D array with three rows and three columns
x_scaled = preprocessing.scale(X)   # standardizes each column (feature) by default
print(x_scaled)
"""
Output:
[[-0.26726124 -1.33630621  1.33630621]
 [ 1.33630621  0.26726124 -0.26726124]
 [-1.06904497  1.06904497 -1.06904497]]
"""
# Verify the mean and standard deviation of each column
print(x_scaled.mean(axis=0))  # close to 0, e.g. [7.40148683e-17 0.00000000e+00 0.00000000e+00]
print(x_scaled.std(axis=0))   # approximately [1. 1. 1.]
 

II. Feature Normalization

Normalization is the process of scaling individual samples so that they have unit norm. In scikit-learn, data can be normalized using norma
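The summary at the top of these notes names normalize() from the preprocessing module as the normalization function. A minimal sketch of how it might be used is given below, assuming the same example matrix as in the standardization section. The choice of the L1 and L2 norms is an assumption made for illustration and is not taken from the notes.

```python
import numpy as np
from sklearn import preprocessing

# Example matrix: rows are samples, columns are features
X = np.array([[1., -2., 2.], [3., 0., 0.], [0., 1., -1.]])

# Scale each row (sample) to unit L2 norm; axis=1 is the default
x_l2 = preprocessing.normalize(X, norm='l2')
# Scale each row so that its absolute values sum to 1 (unit L1 norm)
x_l1 = preprocessing.normalize(X, norm='l1')

print(x_l2)
# Verification: every row should have norm 1
print(np.linalg.norm(x_l2, axis=1))
print(np.abs(x_l1).sum(axis=1))
```

Unlike standardization, which works column by column, normalization here works row by row, so the check is done along axis=1.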
