Machine Learning for OpenCV study notes, day 1


Part 1: Introduction to the basics

In OpenCV, image pixel values are given either as 32-bit floating-point numbers (between 0 and 1) or as 8-bit unsigned integers (between 0 and 255).
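A minimal NumPy sketch of converting between the two representations. The small image `img_u8` is a made-up example; a real image would usually come from a loader such as cv2.imread. Dividing by 255 maps [0, 255] to [0, 1] as float32, and the reverse conversion rescales, rounds and clips back to 8 bits:

```python
import numpy as np

# Hypothetical 2x3 grayscale image stored as 8-bit unsigned integers (0-255)
img_u8 = np.array([[0, 64, 128],
                   [192, 230, 255]], dtype=np.uint8)

# Convert to 32-bit floats in [0, 1]
img_f32 = img_u8.astype(np.float32) / 255.0

# Convert back to 8-bit: rescale, round, and clip to the valid range
img_back = np.clip(np.round(img_f32 * 255.0), 0, 255).astype(np.uint8)

print(img_f32.dtype, img_f32.min(), img_f32.max())
print(np.array_equal(img_u8, img_back))
```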

1. NumPy

We use the powerful NumPy package for matrix operations.

import numpy as np
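A few everyday matrix operations, as an illustrative sketch. The arrays are made up for the example:

```python
import numpy as np

# A 2x3 matrix and a 3x2 matrix built from Python lists / ranges
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

print(A.shape)         # (2, 3)
print(A.T.shape)       # transpose: (3, 2)
print(A * 2)           # element-wise scaling
print(A @ B)           # matrix product: [[16, 22], [34, 49]]
print(A.mean(axis=0))  # column means: [2.5, 3.5, 4.5]
```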

2. scikit-learn

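In Python, we use scikit-learn to download public datasets from the internet. Datasets could also be downloaded directly from www.mldata.org, although that site has since gone offline; newer scikit-learn versions fetch datasets from OpenML instead.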

from sklearn import datasets

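# fetch_mldata was removed in scikit-learn 0.22 because mldata.org went offline;
# fetch_openml downloads the same MNIST data (70,000 images of 784 pixels) from openml.org
mnist = datasets.fetch_openml('mnist_784', version=1)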
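Fetching MNIST needs a network connection. As an offline sketch of the same idea, scikit-learn also bundles small datasets that load without downloading, for example `load_iris`. The returned object exposes the feature matrix as `.data` and the labels as `.target`, the same attributes used for the downloaded MNIST data:

```python
from sklearn import datasets

# Bundled with scikit-learn, so no download is needed
iris = datasets.load_iris()

print(iris.data.shape)     # (150, 4): 150 samples, 4 features each
print(iris.target.shape)   # (150,): one class label per sample
print(iris.target_names)   # names of the 3 classes
```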

3. Matplotlib

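Matplotlib is a multi-platform data visualization library built on NumPy arrays, and we use it to look at our data. We first import matplotlib and then its submodule matplotlib.pyplot. In practice, matplotlib.pyplot is what we usually use for plotting.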

import matplotlib as mpl
import matplotlib.pyplot as plt
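The linspace() function builds the x coordinates: linspace(0, 10, 100) returns 100 evenly spaced points from 0 to 10 (both ends included), at which the function is then evaluated.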

import numpy as np
x=np.linspace(0,10,100)
plt.plot(x,np.sin(x))
# Save the figure to a file
plt.savefig('name.png')
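When running without a display, for example on a server, one common approach is to select Matplotlib's non-interactive `Agg` backend before importing pyplot and write the figure straight to disk. This sketch uses a hypothetical output file name, `sin.png`, in a temporary directory:

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files only
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)   # 100 evenly spaced points in [0, 10]
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_xlabel('x')
ax.set_ylabel('sin(x)')

# Hypothetical output location, chosen only for this example
out_path = os.path.join(tempfile.mkdtemp(), 'sin.png')
fig.savefig(out_path)
plt.close(fig)

print(os.path.exists(out_path))
```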

Summary:

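NumPy handles data manipulation, scikit-learn handles data acquisition, and Matplotlib handles data visualization. The example below uses all three to load the digits dataset and display its first ten images: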

import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
digits = datasets.load_digits()
print(digits.data.shape)
print(digits.images.shape)
plt.figure(figsize=(14,4))
for image_index in range(10):
    subplot_index=image_index+1
    plt.subplot(2,5,subplot_index)
    plt.imshow(digits.images[image_index,:,:],cmap='gray')
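plt.show()  # I use PyCharm; without plt.show() the figure is not displayed
The result is a 2 × 5 grid showing the first ten handwritten digits in grayscale.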
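The two shapes printed above are (1797, 64) and (1797, 8, 8). `digits.images` holds each sample as an 8 × 8 image, while `digits.data` holds the same pixels flattened into 64-element rows, which is the layout most learning algorithms expect. A quick check of that relationship:

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Flatten every 8x8 image into a 64-element row
flattened = digits.images.reshape(len(digits.images), -1)

print(flattened.shape)                         # (1797, 64)
print(np.array_equal(flattened, digits.data))  # True
```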


Part 2: Supervised learning

Setting up a machine learning model in OpenCV follows these steps:

 initialize -> set parameters -> train the model -> predict labels -> score the model

Common scores for a classification model are:

a) accuracy

b) precision

c) recall
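For a binary classifier, write TP, FP, FN and TN for the numbers of true positives, false positives, false negatives and true negatives. The three scores are then defined as:

```latex
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
```

Precision asks what fraction of the predicted positives are truly positive. Recall asks what fraction of the truly positive samples the model found.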

1. Understanding accuracy

import numpy as np
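# np.random.seed(42) fixes the random seed at 42. Without a fixed seed, every run
# (and every reader's run) generates different random numbers; with the same seed,
# the generated numbers, and therefore the results, are identical.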
np.random.seed(42)  
y_true = np.random.randint(0,2,size=5)
print(y_true)
y_pred=np.ones(5,dtype=np.int32)
print(y_pred)
print(np.sum(y_pred==y_true)/len(y_true))
from sklearn import metrics
print(metrics.accuracy_score(y_true, y_pred))

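As shown above, with the seed fixed, the randomly generated ground-truth vector is [0,1,0,0,0] and the prediction vector is [1,1,1,1,1]. Only one of the five predictions is correct, so the accuracy is 1/5 = 0.2. The metrics module in sklearn computes the accuracy directly, and running the code shows that both approaches give 0.2.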
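Accuracy alone can be misleading when the classes are imbalanced. This illustrative sketch uses the same ground truth [0, 1, 0, 0, 0]. A trivial "model" that always predicts 0 scores a higher accuracy (0.8) than the all-ones prediction above, even though it never finds the single positive sample. This kind of failure is what precision and recall help expose:

```python
import numpy as np
from sklearn import metrics

y_true = np.array([0, 1, 0, 0, 0])
y_pred_ones = np.ones(5, dtype=np.int32)    # the prediction used above
y_pred_zeros = np.zeros(5, dtype=np.int32)  # always predict the majority class

print(metrics.accuracy_score(y_true, y_pred_ones))   # 0.2
print(metrics.accuracy_score(y_true, y_pred_zeros))  # 0.8
```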


2. Understanding precision and recall


(Figure: precision and recall; image from Wikipedia)

import numpy as np
from sklearn import metrics

# Fix the random seed so the output is reproducible
np.random.seed(42)
y_true = np.random.randint(0, 2, size=5)   # [0 1 0 0 0]
y_pred = np.ones(5, dtype=np.int32)         # [1 1 1 1 1]
truly_a_positive = (y_true == 1)
predicted_a_positive = (y_pred == 1)
# You thought it was a 1, and it actually was a 1
true_positive = np.sum(predicted_a_positive * truly_a_positive)
print(true_positive)
# You thought it was a 1, and it actually was a 0
false_positive = np.sum((y_pred == 1) * (y_true == 0))
print(false_positive)
# You thought it was a 0, and it actually was a 1
false_negative = np.sum((y_pred == 0) * (y_true == 1))
print(false_negative)
# You thought it was a 0, and it actually was a 0
true_negative = np.sum((y_pred == 0) * (y_true == 0))
print(true_negative)
# To make sure we did it right, recompute the accuracy from the counts
accuracy = (true_positive + true_negative) / len(y_true)
print(accuracy)
# Precision: of all samples predicted positive, how many are truly positive
precision = true_positive / (true_positive + false_positive)
print(precision)
# Recall: of all truly positive samples, how many were predicted positive
recall = true_positive / (true_positive + false_negative)
print(recall)
# Verify with scikit-learn
print(metrics.precision_score(y_true, y_pred))
print(metrics.recall_score(y_true, y_pred))
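With the ground truth [0,1,0,0,0] and every prediction equal to 1, there is 1 true positive, 4 false positives, 0 false negatives and 0 true negatives. The accuracy is therefore 0.2, the precision is 1/(1+4) = 0.2, and the recall is 1/(1+0) = 1.0.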
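scikit-learn can also produce all four counts at once with `metrics.confusion_matrix`. For binary labels `[0, 1]`, flattening the 2 × 2 matrix gives the order TN, FP, FN, TP. This sketch cross-checks the hand-computed precision and recall against `precision_score` and `recall_score`:

```python
import numpy as np
from sklearn import metrics

y_true = np.array([0, 1, 0, 0, 0])
y_pred = np.ones(5, dtype=np.int32)

# Rows are true labels, columns are predicted labels
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 0 4 0 1

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, metrics.precision_score(y_true, y_pred))
print(recall, metrics.recall_score(y_true, y_pred))
```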

3. Regression problems

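Points 1 and 2 above apply to classification problems, whose outputs are discrete. In regression problems the output is continuous, so the scores a), b) and c) above no longer apply. Instead, we score the model with scikit-learn's mean squared error (mean_squared_error), explained variance (explained_variance_score) and R² score (r2_score). Take the continuous function sin(x) as an example: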

import numpy as np
x=np.linspace(0,10,100)
y_ture = np.sin(x) + np.random.rand(x.size) - 0.5
y_pred = np.sin(x)
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.figure(figsize=(10,6))
plt.plot(x,y_pred, linewidth=4, label = 'model')
plt.plot(x,y_ture,'o',label = 'data')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='lower left')
plt.show()
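The result is a plot of the sin(x) model curve (labelled "model") together with the noisy data points (labelled "data").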



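(x itself is evenly spaced. The noise added to the true values, however, is random and no seed is set, so the plot and the error values computed below differ from run to run.)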
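If you want reproducible numbers, one option is to fix the seed before generating the noise, as in the classification example. This is an assumption; the original code does not do it. The helper `make_noisy_sin` is hypothetical and is used only to show that the same seed yields identical noisy data:

```python
import numpy as np

def make_noisy_sin(seed):
    """Hypothetical helper: the noisy sin(x) data from above, with a fixed seed."""
    rng = np.random.RandomState(seed)
    x = np.linspace(0, 10, 100)
    # Uniform noise in [-0.5, 0.5) added to sin(x)
    return np.sin(x) + rng.rand(x.size) - 0.5

a = make_noisy_sin(42)
b = make_noisy_sin(42)
c = make_noisy_sin(7)
print(np.array_equal(a, b))  # same seed, same data
print(np.array_equal(a, c))  # different seed, different data
```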

1. To measure how accurate the model's predictions are, we compute its mean squared error (MSE). There are two ways to do this:

Method 1:

mse = np.mean((y_ture-y_pred)**2)
print(mse)

Method 2:

from sklearn import metrics
# verify our math
print(metrics.mean_squared_error(y_ture, y_pred))

Output: both methods print the same value.
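Written out, for n samples with true values y_i and predictions ŷ_i, the quantity both methods compute is:

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```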

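2. Another common score is the explained variance. It compares the variance of the prediction errors with the variance of the true data, showing how much of the spread in the data the model accounts for. Again, we compute it both by hand and with scikit-learn.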

Method 1:

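# Fraction of variance unexplained (FVU): residual variance / data variance
fvu = np.var(y_ture - y_pred) / np.var(y_ture)
print(fvu)
# Explained variance = 1 - FVU
print(1.0-fvu)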

Method 2:

# verify our math
print(metrics.explained_variance_score(y_ture,y_pred))

Output: in my run, the explained variance score is 0.8420...
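In formula form, method 1 computes the fraction of variance unexplained (FVU) and subtracts it from 1. This matches the definition used by explained_variance_score:

```latex
\mathrm{FVU} = \frac{\operatorname{Var}(y - \hat{y})}{\operatorname{Var}(y)},
\qquad
\text{explained variance} = 1 - \mathrm{FVU}
```

Unlike the MSE, this score ignores a constant offset in the errors. Shifting every prediction by the same amount does not change the variance of y - ŷ.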

3. R² score
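The R² score compares the model's squared error with that of always predicting the mean ȳ. Dividing the numerator and denominator by n turns the sums into the MSE and the variance of y, which is the population variance that np.var computes by default. This is why method 1 below can reuse mse:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{MSE}}{\operatorname{Var}(y)},
\qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```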

Method 1:

r2 = 1.0 - mse / np.var(y_ture)
print(r2)

Method 2:

print(metrics.r2_score(y_ture, y_pred))


Output: in my run, both methods give an R² score of 0.8394...

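The better the model fits the data, the closer the R² score is to 1. Because the noise is random, it is normal for your results to differ from mine.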
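A "model" that ignores x and always predicts the mean of the data makes a useful reference point. This illustration uses a fixed seed of 0 so the numbers are reproducible. The mean-only baseline scores an R² of 0, while the sin(x) model scores close to 1. The hand-computed 1 - MSE / Var(y) agrees with r2_score:

```python
import numpy as np
from sklearn import metrics

rng = np.random.RandomState(0)  # fixed seed, an assumption for reproducibility
x = np.linspace(0, 10, 100)
y_true = np.sin(x) + rng.rand(x.size) - 0.5
y_model = np.sin(x)                                  # the sin(x) model
y_baseline = np.mean(y_true) * np.ones_like(y_true)  # always predict the mean

r2_model = metrics.r2_score(y_true, y_model)
r2_baseline = metrics.r2_score(y_true, y_baseline)

# R^2 by hand: 1 - MSE / Var(y)
r2_by_hand = 1.0 - np.mean((y_true - y_model) ** 2) / np.var(y_true)

print(r2_model, r2_by_hand)
print(r2_baseline)
```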

Chapter 1, A Taste of Machine Learning, will gently introduce you to the different subfields of machine learning, and explain how to install OpenCV and other essential tools in the Python Anaconda environment.

Chapter 2, Working with Data in OpenCV and Python, will show you what a typical machine learning workflow looks like, and where data comes into play. I will explain the difference between training and test data, and show you how to load, store, manipulate, and visualize data with OpenCV and Python.

Chapter 3, First Steps in Supervised Learning, will introduce you to the topic of supervised learning by reviewing some core concepts, such as classification and regression. You will learn how to implement a simple machine learning algorithm in OpenCV, how to make predictions about the data, and how to evaluate your model.

Chapter 4, Representing Data and Engineering Features, will teach you how to get a feel for some common and well-known machine learning datasets and how to extract the interesting stuff from your raw data.

Chapter 5, Using Decision Trees to Make a Medical Diagnosis, will show you how to build decision trees in OpenCV, and use them in a variety of classification and regression problems.

Chapter 6, Detecting Pedestrians with Support Vector Machines, will explain how to build support vector machines in OpenCV, and how to apply them to detect pedestrians in images.

Chapter 7, Implementing a Spam Filter with Bayesian Learning, will introduce you to probability theory, and show you how you can use Bayesian inference to classify emails as spam or not.

Chapter 8, Discovering Hidden Structures with Unsupervised Learning, will talk about unsupervised learning algorithms such as k-means clustering and Expectation-Maximization, and show you how they can be used to extract hidden structures in simple, unlabeled datasets.

Chapter 9, Using Deep Learning to Classify Handwritten Digits, will introduce you to the exciting field of deep learning. Starting with the perceptron and multi-layer perceptrons, you will learn how to build deep neural networks in order to classify handwritten digits from the extensive MNIST database.

Chapter 10, Combining Different Algorithms into an Ensemble, will show you how to effectively combine multiple algorithms into an ensemble in order to overcome the weaknesses of individual learners, resulting in more accurate and reliable predictions.

Chapter 11, Selecting the Right Model with Hyper-Parameter Tuning, will introduce you to the concept of model selection, which allows you to compare different machine learning algorithms in order to select the right tool for the task at hand.

Chapter 12, Wrapping Up, will conclude the book by giving you some useful tips on how to approach future machine learning problems on your own, and where to find information on more advanced topics.