Machine Learning for OpenCV Study Notes, Day 2

Part 3: Implementing the Algorithms

1. k-Nearest Neighbors (k-NN)

How it works: we start from a collection of samples, the training set, in which every sample carries a label. When a new, unlabeled sample arrives, we compare each of its features with the corresponding features of the samples in the training set, and the algorithm returns the class labels of the most similar (nearest) training samples.
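To make the mechanism concrete, here is a minimal from-scratch sketch (this is not the OpenCV implementation used below; the function name knn_predict, the Euclidean distance, and the majority vote are illustrative choices):

import numpy as np

def knn_predict(train_data, labels, query, k=3):
    # Euclidean distance from the query point to every training sample
    dists = np.linalg.norm(train_data - query, axis=1)
    # indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # majority vote over the neighbors' labels
    return np.argmax(np.bincount(labels[nearest].ravel()))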

We can implement this algorithm with OpenCV's cv2.ml.KNearest_create function. Using it generally follows these steps:

generate the training data -> pick the query point(s) -> find the k nearest training points to each query point -> collect those neighbors' labels -> output the predicted label

So first we need to generate the training data:

import numpy as np
import cv2
import matplotlib.pyplot as plt
plt.style.use('ggplot')

np.random.seed(42)
# draw one data point: two random integers between 0 and 100
single_data_point = np.random.randint(0, 100, 2)
print(single_data_point)
# draw the corresponding label: either 0 or 1
single_label = np.random.randint(0, 2)
print(single_label)

We first load the modules we need. As a quick test, we draw two random integers in [0, 100) as one data point and generate its label, 0 or 1.

# Generate the training data
def generate_data(num_samples, num_features=2):
    """Randomly generates a number of data points"""
    data_size = (num_samples, num_features)
    train_data = np.random.randint(0, 100, size=data_size)
    labels_size = (num_samples, 1)
    labels = np.random.randint(0, 2, size=labels_size)
    return train_data.astype(np.float32), labels

train_data, labels = generate_data(11)
#print(train_data)
plt.plot(train_data[:,0],train_data[:,1],'sb')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
Running the program above gives:

The printed point has coordinates [51, 92] and label 0. Plotting the eleven random points (all drawn as blue squares here) produces the figure below:


We now split the eleven points above into two different label groups and plot each group in the coordinate system. In addition, following the working principle above, we add a green circle as the new point to classify, train on the data, and use the learned model to predict this newly generated point, printing the result:

# Visualize the whole data
def plot_data(all_blue, all_red, newcomer):
    """Plot both classes and the new point to classify."""
    plt.figure(figsize=(10, 6))
    plt.scatter(all_blue[:, 0], all_blue[:, 1], c='b', marker='s', s=180)
    plt.scatter(all_red[:, 0], all_red[:, 1], c='r', marker='^', s=180)
    plt.plot(newcomer[:, 0], newcomer[:, 1], 'go', markersize=14)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

# boolean masks split the points according to their labels
blue = train_data[labels.ravel() == 0]
red = train_data[labels.ravel() == 1]

# Training the classifier
knn = cv2.ml.KNearest_create()
knn.train(train_data, cv2.ml.ROW_SAMPLE, labels)

# Predicting the label of the new data point
newcomer, _ = generate_data(1)  # change the argument to predict more points
plot_data(blue, red, newcomer)
ret, results, neighbor, dist = knn.findNearest(newcomer, 6)
print('Predicted label:\t', results)
print("Neighbor's label:\t", neighbor)
print('Distance to neighbor:\t', dist)

knn.setDefaultK(6)  # set the number of neighbors k used by predict
print(knn.predict(newcomer))
The output is:

(figure and console output: the plotted classes with the green query point, plus the predicted label, the six neighbors' labels, and their distances)

When using k-NN we cannot know a suitable value of k in advance; the best we can do is try a series of k values until we find one that works well. This is acceptable for simple problems, but it becomes impractical for complex ones.
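As a minimal sketch of that search (assuming a labeled hold-out set generated the same way as the training data; the accuracy computation is illustrative, and with random labels it will hover around chance):

test_data, test_labels = generate_data(20)  # hypothetical hold-out set
for k in range(1, 11):
    _, results, _, _ = knn.findNearest(test_data, k)
    accuracy = np.mean(results == test_labels.astype(np.float32))
    print('k = %d, accuracy = %.2f' % (k, accuracy))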


2. Predicting Continuous Outputs with a Regression Model

We use the classic Boston housing prices example to illustrate regression problems.

The first step, as before, is to obtain the dataset through scikit-learn:

import numpy as np
import cv2
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn import metrics
from sklearn import model_selection
from sklearn import linear_model

plt.style.use('ggplot')
plt.rcParams.update({'font.size':16})

#download the dataset
boston = datasets.load_boston()  # load the dataset (note: load_boston was removed in scikit-learn 1.2)
The Boston dataset contains 506 data points, each with 13 features, and the target we want to predict is the house price. Once the data is loaded, we prepare the training set.
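A quick sanity check of those numbers, just inspecting the loaded object:

print(boston.data.shape)    # (506, 13): 506 samples, 13 features
print(boston.target.shape)  # (506,): one house price per sample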

The second step is to set up the model:

linreg = linear_model.LinearRegression()
Then we split the data into a training set and a test set:

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    boston.data, boston.target, test_size=0.1, random_state=42)

In scikit-learn, training is done with fit (the counterpart of OpenCV's train method). We fit the model and compute the training mean squared error along the way:

linreg.fit(X_train, y_train)
# mean squared error on the training set
metrics.mean_squared_error(y_train, linreg.predict(X_train))
# R^2 score on the training set
linreg.score(X_train, y_train)

Finally, let us test the trained model:

y_pred = linreg.predict(X_test)
metrics.mean_squared_error(y_test,y_pred)
The code for visualizing the result is:

plt.figure(figsize=(10,6))
plt.plot(y_test,linewidth=3,label='ground truth')
plt.plot(y_pred,linewidth=3,label='predicted')
plt.legend(loc='best')
plt.xlabel('test data points')
plt.ylabel('target value')
plt.show()

# compute and display the R^2 score on the plot
plt.figure(figsize=(10,6))
plt.plot(y_test,y_pred,'o')
plt.plot([-10,60],[-10,60],'k--')
plt.axis([-10,60,-10,60])
plt.xlabel('ground truth')
plt.ylabel('predicted')
scorestr = r'R$^2$ = %.3f' % linreg.score(X_test,y_test)
errstr = 'MSE = %.3f' % metrics.mean_squared_error(y_test,y_pred)
plt.text(-5, 50 , scorestr, fontsize= 12)
plt.text(-5, 45 , errstr, fontsize= 12)
plt.show()
The output of the program above is:




3. The Overfitting Problem

This section shows that overfitting also degrades the performance of linear models, which motivates the concept and use of regularization. The two most common kinds are L1 and L2 regularization:

L1 regularization sums the absolute values of all the weights w; L2 regularization sums the squares of all the weights w.
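Up to scaling constants, the penalized objectives are (α is the regularization strength exposed by scikit-learn):

J(w) = ||y - Xw||^2 + α · Σ|w_i|       (L1, Lasso)
J(w) = ||y - Xw||^2 + α · Σ w_i^2      (L2, Ridge)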

In code, the main change is to replace the model line from Section 2:

linreg = linear_model.LinearRegression()

with the L1-regularized version:

lassoreg = linear_model.Lasso()

or the L2-regularized version:

ridgereg = linear_model.Ridge()

Everything else stays the same as in Section 2. With L1 regularization, the output is:



while with L2 regularization the result is:

We can see that the model performs better under L2 regularization.
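To compare the three models side by side, one could fit them in a loop (a sketch reusing the train/test split from Section 2; the loop itself is my own addition, not code from the book):

for name, model in [('linear', linear_model.LinearRegression()),
                    ('lasso (L1)', linear_model.Lasso()),
                    ('ridge (L2)', linear_model.Ridge())]:
    model.fit(X_train, y_train)
    print('%s: R^2 = %.3f, MSE = %.3f' % (
        name, model.score(X_test, y_test),
        metrics.mean_squared_error(y_test, model.predict(X_test))))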
