Recommended MATLAB Toolboxes for Computer Vision / Image Processing

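MATLAB is used constantly in computer vision and image processing research. It ships with many image processing and computer vision functions, but specialized work calls for specialized tools, and in-depth research on vision algorithms will sooner or later outgrow MATLAB's built-in functionality. This post collects some of the better MATLAB computer vision toolboxes, in the hope that they will help researchers in China.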

VLFeat: famous and widely used

Project website: http://www.vlfeat.org

License: BSD

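A well-known open-source computer vision / image processing project, probably not much less famous than OpenCV. It won first prize in the ACM Open Source Software Competition 2010. It is written in C, provides both C and MATLAB interfaces, and implements a large number of computer vision algorithms, including:

  • Common image processing functions, including color space conversion and geometric transformations (complementing MATLAB); common machine learning algorithms, including GMM, SVM and k-means; and plotting utilities for image processing.
  • Feature extraction, including covariant detectors, HOG, SIFT and MSER. VLFeat provides the vl_covdet() function as a framework that conveniently unifies the so-called “co-variant feature detectors”, including DoG, Harris-Affine and Harris-Laplace, and can extract SIFT or raw-patch descriptors.
  • Superpixel segmentation, including the common Quick shift and SLIC algorithms.
  • Advanced clustering algorithms, such as integer k-means (IKM), a hierarchical version of integer k-means (HIKM), and the Agglomerative Information Bottleneck (AIB) algorithm, which determines the number of clusters automatically based on mutual information.
  • High-dimensional feature matching with randomized kd-trees.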

The complete list of VLFeat features can be viewed here.

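MexOpenCV: calling OpenCV from MATLAB

Project website: http://www.cs.sunysb.edu/~kyamagu/mexopencv/

The author, Kota Yamaguchi, is a PhD at Stony Brook University. Some time ago he built his own tooling that compiles OpenCV code into MEX interfaces usable from MATLAB, and it quickly became popular. This summer the project was absorbed into OpenCV as a module, apparently through a Google Summer of Code (GSoC) project. Recently (around September or October) it was merged into the main OpenCV package. If you are interested, try the modules/matlab directory of the OpenCV repository on GitHub; it should be officially released in the OpenCV 3 alpha in October. OpenCV now has both Python and MATLAB bindings (very impressive). I won't go into the specific features: since this is a binding for OpenCV, it can naturally use the vast majority of OpenCV's algorithms. For example: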

% load an image (MATLAB)
I = imread('cameraman.tif');
% compute the DFT (OpenCV)
If = cv.dft(I, cv.DFT_COMPLEX_OUTPUT);

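Peter Kovesi's toolbox: lightweight and easy to use, focused on image processing

Project website: http://www.csse.uwa.edu.au/~pk/research/matlabfns/

Peter currently works at The University of Western Australia. He wrote his own set of MATLAB computer vision functions. The "toolbox" is really a collection of many .m files, implemented entirely in MATLAB, with no compilation or installation required. It also supports Octave, so even without MATLAB you can do image processing in Octave with it. He works single-handedly, yet his toolbox is quite well known. Whenever your research needs some small MATLAB computer vision function, just download a few .m files from his homepage into your own folder. The toolbox focuses mainly on image processing algorithms and also includes some basic 3D vision algorithms.

The website has descriptions and downloads for all of the functions. Highly recommended; you can also learn a lot about the algorithms from it.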

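Machine Vision Toolbox: focused on machine vision and 3D vision

Project website: http://www.petercorke.com/Machine_Vision_Toolbox.html

License: LGPL

I had never used this toolbox before, but recently discovered that it is remarkably powerful and quite relevant to my own work. The toolbox focuses on machine vision. Its author is another Peter, Peter Corke, who is well known in the robotics community. In 2011 he wrote the book Robotics, Vision & Control, which covers color, camera models, 3D vision, control and related topics, and this toolbox accompanies the book. It includes a large number of commonly used small vision and image processing functions, which I won't list. Here are a few features that other toolboxes generally lack:

  • A MATLAB implementation of bag of words
  • Implementations of various camera models, including ordinary cameras, fisheye cameras and catadioptric cameras. If you work on robot vision with assorted wide-angle cameras mounted on the robot, these models will be very useful
  • Simple built-in camera calibration
  • Functions related to epipolar geometry
  • An implementation of Plücker coordinates, very useful for generalized camera models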

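Piotr’s Image & Video Matlab Toolbox: focused on object recognition

Project website: http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html

License: Simple BSD

(One more toolbox.) Written by Piotr Dollár of UCSD, this toolbox focuses on feature extraction and classification algorithms for object recognition and detection. It is specialized and polished, mainly implementing the algorithms from several of Dollár's object detection papers, so it should be very handy for object recognition research. Its image and matrix manipulation functions also complement MATLAB's Image Processing Toolbox. Its main modules are:

  • channels: image feature extraction, including HOG. Dollár's work proposed Channel Features [1], so this module mainly provides the basic operations needed to extract them, such as gradients and convolution
  • classify: fast classification-related algorithms, including random ferns, RBF functions and PCA
  • detector: the detection algorithm that goes with Channel Features [1]
  • filters: common image filters
  • images: common image and video operations, including some very practical functions
  • matlab: general MATLAB utilities for matrix computation, display and variable manipulation; very practical
  • videos: common video operations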

1. P. Dollár, Z. Tu, P. Perona and S. Belongie, “Integral Channel Features”, BMVC 2009.

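DIPUM Toolbox: companion to a classic textbook

Project website: http://www.imageprocessingplace.com/DIPUM_Toolbox_2/DIPUM_Toolbox_2.htm

This is the companion toolbox to Gonzalez's well-known textbook Digital Image Processing Using MATLAB, the MATLAB counterpart of his classic Digital Image Processing. It mainly implements the image processing algorithms from the book, and its reputation needs no introduction. The encrypted P-code (.p) files can be downloaded free online and used directly in MATLAB, which makes it a good hands-on toy for getting started with image processing.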

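MATLAB Functions for Multiple View Geometry: another classic-textbook companion

Project website: http://www.robots.ox.ac.uk/~vgg/hzbook/code/

License: MIT

This one accompanies another famous textbook, Multiple View Geometry in Computer Vision. Every researcher working on 3D vision should study the book carefully, and a Chinese translation appeared in China long ago. Zisserman, who co-authored the book with Hartley, provides MATLAB implementations of some of its algorithms. They are excellent supplementary material for understanding the theory in depth.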

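Other toolboxes

  • DIPImage & DIPLib: image processing functionality with MATLAB and C interfaces. Fairly old; probably few people use or even know of it now.
  • Matlab CVPR toolbox: MATLAB functionality for computer vision and pattern recognition. It does not seem to be widely used.
  • Toolboxes for related fields, such as machine learning and Markov random fields; I may write about these when I get the chance.
  • Toolboxes for specific tasks, such as camera calibration. There are quite a few worth recommending, and I may write about them later.
  • Some open-source MATLAB toolboxes can be found at this link.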

Reposted from 计算机视觉研究笔记 (Computer Vision Research Notes)

Viola–Jones object detection framework–Rapid Object Detection using a Boosted Cascade of S…

ACCEPTED CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2001

Rapid Object Detection using a Boosted Cascade of Simple Features

Paul Viola (viola@merl.com)
Mitsubishi Electric Research Labs, 201 Broadway, 8th FL, Cambridge, MA 02139

Michael Jones (mjones@crl.dec.com)
Compaq CRL, One Cambridge Center, Cambridge, MA 02142

Abstract

This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers [6]. The third contribution is a method for combining increasingly more complex classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.

1. Introduction

This paper brings together new algorithms and insights to construct a framework for robust and extremely rapid object detection. This framework is demonstrated on, and in part motivated by, the task of face detection. Toward this end we have constructed a frontal face detection system which achieves detection and false positive rates which are equivalent to the best published results [16, 12, 15, 11, 1]. This face detection system is most clearly distinguished from previous approaches in its ability to detect faces extremely rapidly. Operating on 384 by 288 pixel images, faces are detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences, or pixel color in color images, has been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.

There are three main contributions of our object detection framework. We will introduce each of these ideas briefly below and then describe them in detail in subsequent sections.

The first contribution of this paper is a new image representation called an integral image that allows for very fast feature evaluation. Motivated in part by the work of Papageorgiou et al., our detection system does not work directly with image intensities [10]. Like these authors we use a set of features which are reminiscent of Haar basis functions (though we will also use related filters which are more complex than Haar filters). In order to compute these features very rapidly at many scales we introduce the integral image representation for images. The integral image can be computed from an image using a few operations per pixel. Once computed, any one of these Haar-like features can be computed at any scale or location in constant time.

The second contribution of this paper is a method for constructing a classifier by selecting a small number of important features using AdaBoost [6]. Within any image sub-window the total number of Haar-like features is very large, far larger than the number of pixels. In order to ensure fast classification, the learning process must exclude a large majority of the available features, and focus on a small set of critical features. Motivated by the work of Tieu and Viola, feature selection is achieved through a simple modification of the AdaBoost procedure: the weak learner is constrained so that each weak classifier returned can depend on only a single feature [2]. As a result each stage of the boosting process, which selects a new weak classifier, can be viewed as a feature selection process. AdaBoost provides an effective learning algorithm and strong bounds on generalization performance [13, 9, 10].

The third major contribution of this paper is a method for combining successively more complex classifiers in a cascade structure which dramatically increases the speed of the detector by focusing attention on promising regions of the image. The notion behind focus of attention approaches is that it is often possible to rapidly determine where in an image an object might occur [17, 8, 1]. More complex processing is reserved only for these promising regions. The key measure of such an approach is the “false negative” rate of the attentional process. It must be the case that all, or almost all, object instances are selected by the attentional filter.

We will describe a process for training an extremely simple and efficient classifier which can be used as a “supervised” focus of attention operator. The term supervised refers to the fact that the attentional operator is trained to detect examples of a particular class. In the domain of face detection it is possible to achieve fewer than 1% false negatives and 40% false positives using a classifier constructed from two Haar-like features. The effect of this filter is to reduce by over one half the number of locations where the final detector must be evaluated.

Those sub-windows which are not rejected by the initial classifier are processed by a sequence of classifiers, each slightly more complex than the last. If any classifier rejects the sub-window, no further processing is performed.  The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Geman and colleagues [1, 4].

An extremely fast face detector will have broad practical applications. These include user interfaces, image databases, and teleconferencing. In applications where rapid frame-rates are not necessary, our system will allow for significant additional post-processing and analysis. In addition our system can be implemented on a wide range of small low power devices, including hand-helds and embedded processors. In our lab we have implemented this face detector on the Compaq iPaq handheld and have achieved detection at two frames per second (this device has a low power 200 MIPS StrongARM processor which lacks floating point hardware).

The remainder of the paper describes our contributions and a number of experimental results, including a detailed description of our experimental methodology.  Discussion of closely related work takes place at the end of each section.

2. Features

Our object detection procedure classifies images based on the value of simple features.  There are many motivations for using features rather than the pixels directly. The most common reason is that features can act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data.  For this system there is also a second critical motivation for features:  the feature based system operates much faster than a pixel-based system.

The simple features used are reminiscent of Haar basis functions which have been used by Papageorgiou et al. [10]. More specifically, we use three kinds of features. The value of a two-rectangle feature is the difference between the sum of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent (see Figure 1). A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. Finally a four-rectangle feature computes the difference between diagonal pairs of rectangles.
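
To make the three feature types concrete, here is a minimal MATLAB sketch that evaluates a horizontal variant of each by brute-force pixel sums on a small patch, with no integral image involved. MATLAB is used because it is the language of the toolbox examples earlier in this post. The sign convention (grey minus white), the test patch and the box sizes are assumptions chosen only for illustration.

```matlab
% Brute-force evaluation of the three rectangle-feature types on a patch.
% Sign convention (assumed): feature = sum(grey boxes) - sum(white boxes).
P = reshape(1:36, 6, 6);                              % 6x6 test patch, a linear ramp
rsum = @(r, c, h, w) sum(sum(P(r:r+h-1, c:c+w-1)));   % sum of the h-by-w box at (r, c)

% Two-rectangle (horizontal): white box on the left, grey box on the right.
r = 1; c = 1; h = 4; w = 2;
two_rect = rsum(r, c+w, h, w) - rsum(r, c, h, w);

% Three-rectangle: the two outside boxes subtracted from the center box.
w3 = 2;
three_rect = rsum(r, c+w3, h, w3) - (rsum(r, c, h, w3) + rsum(r, c+2*w3, h, w3));

% Four-rectangle: difference between the two diagonal pairs of 3x3 boxes.
h4 = 3; w4 = 3;
four_rect = (rsum(1, 4, h4, w4) + rsum(4, 1, h4, w4)) ...
          - (rsum(1, 1, h4, w4) + rsum(4, 4, h4, w4));
```

On this linear ramp the four-rectangle feature comes out as zero because the two diagonal pairs cancel. It only responds to structure that varies along both axes at once.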

Given that the base resolution of the detector is 24×24, the exhaustive set of rectangle features is quite large, over 180,000. Note that unlike the Haar basis, the set of rectangle features is overcomplete.¹
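
Counting such a set is a short exercise. The sketch below enumerates every position and integer scaling of five base shapes in a 24×24 window: horizontal and vertical two-rectangle features, horizontal and vertical three-rectangle features, and one four-rectangle feature. The choice of shapes and the rule that box sizes are integer multiples of the base shape are assumptions. Under them the loop arrives at 162,336, fewer than the "over 180,000" quoted above, so the paper presumably used a somewhat different counting convention. Either way, the set exceeds the 576 pixels by a factor of roughly 280.

```matlab
% Count every position and integer scaling of five base rectangle shapes
% inside a 24x24 detection window. Shapes are [width height] (assumed set).
W = 24; H = 24;
shapes = [2 1; 1 2; 3 1; 1 3; 2 2];
total = 0;
for s = 1:size(shapes, 1)
  for w = shapes(s, 1):shapes(s, 1):W       % widths: multiples of the base width
    for h = shapes(s, 2):shapes(s, 2):H     % heights: multiples of the base height
      total = total + (W - w + 1) * (H - h + 1);   % placements of a w-by-h feature
    end
  end
end
disp(total)
```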

Figure 1: Example rectangle features shown relative to the enclosing detection window. The sum of the pixels which lie within the white rectangles is subtracted from the sum of pixels in the grey rectangles. Two-rectangle features are shown in (A) and (B). Figure (C) shows a three-rectangle feature, and (D) a four-rectangle feature.

2.1. Integral Image

Rectangle features can be computed very rapidly using an intermediate representation for the image which we call the integral image.² The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$

1 A complete basis has no linear dependence between basis elements and has the same number of elements as the image space, in this case 576. The full set of 180,000 features is many times over-complete.

2 There is a close relation to “summed area tables” as used in graphics [3]. We choose a different name here in order to emphasize its use for the analysis of images, rather than for texture mapping.


Figure 2: The sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A+B, at location 3 is A+C, and at location 4 is A+B+C+D. The sum within D can be computed as 4+1-(2+3).

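Here ii(x, y) is the integral image and i(x, y) is the original image. Using the following pair of recurrences:

$$s(x, y) = s(x, y-1) + i(x, y)$$
$$ii(x, y) = ii(x-1, y) + s(x, y)$$

(where s(x, y) is the cumulative row sum, s(x, -1) = 0, and ii(-1, y) = 0) the integral image can be computed in one pass over the original image.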
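
The pair of recurrences can be checked with a short MATLAB sketch that follows them literally in a single pass and compares the result with the vectorized form, two cumulative sums. Taking x as the column index and y as the row index is an assumption about the axis orientation, and the 3×4 test image is arbitrary.

```matlab
% Integral image via the recurrences
%   s(x,y)  = s(x,y-1) + i(x,y)    with s(x,-1) = 0
%   ii(x,y) = ii(x-1,y) + s(x,y)   with ii(-1,y) = 0
% MATLAB indexes img(row, col); x is taken as the column, y as the row.
img = double(reshape(1:12, 3, 4));   % small 3x4 test image
[nr, nc] = size(img);
s  = zeros(nr, nc);
ii = zeros(nr, nc);
for x = 1:nc                         % one pass over the image
  for y = 1:nr
    if y > 1, s(y, x) = s(y-1, x) + img(y, x); else, s(y, x) = img(y, x); end
    if x > 1, ii(y, x) = ii(y, x-1) + s(y, x); else, ii(y, x) = s(y, x); end
  end
end
% Vectorized equivalent: cumulative sum down the rows, then across the columns.
ii_vec = cumsum(cumsum(img, 1), 2);
```

In practice the `cumsum` form is the idiomatic choice in MATLAB; the loop is there only to mirror the recurrences term by term.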

Using the integral image any rectangular sum can be computed in four array references (see Figure 2).  Clearly the difference between two rectangular sums can be computed in eight references. Since the two-rectangle features defined above involve adjacent rectangular sums they can be computed in six array references, eight in the case of the three-rectangle features, and nine for four-rectangle features.
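
The reference counts above can be made concrete in MATLAB. The sketch pads the integral image with a leading row and column of zeros, so that rectangles touching the image border need no special cases. It then computes a rectangle sum from four references and a horizontal two-rectangle feature from six. The padding scheme, the test image and the variable names are assumptions made for this example.

```matlab
% Rectangle sums from a zero-padded integral image: P(r+1, c+1) holds the
% sum of img(1:r, 1:c), and the first row and column of P are zero.
img = magic(8);
P = zeros(size(img) + 1);
P(2:end, 2:end) = cumsum(cumsum(double(img), 1), 2);

% Sum of img(r1:r2, c1:c2) with four array references; in the labels of
% Figure 2 this is 4 + 1 - (2 + 3).
rectSum = @(r1, c1, r2, c2) P(r2+1, c2+1) + P(r1, c1) - P(r1, c2+1) - P(r2+1, c1);

% Two horizontally adjacent h-by-w boxes with top-left pixel (r, c) share an
% edge, so their difference needs only six references, A..F.
r = 2; c = 2; h = 3; w = 2;
A = P(r, c);     B = P(r, c+w);     C = P(r, c+2*w);
D = P(r+h, c);   E = P(r+h, c+w);   F = P(r+h, c+2*w);
left  = E - B - D + A;        % box covering columns c .. c+w-1
right = F - C - E + B;        % box covering columns c+w .. c+2*w-1
two_rect = right - left;      % equals F - C - 2*E + 2*B + D - A
```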

2.2. Feature Discussion

Rectangle features are somewhat primitive when compared with alternatives such as steerable filters [5, 7]. Steerable filters, and their relatives, are excellent for the detailed analysis of boundaries, image compression, and texture analysis. In contrast rectangle features, while sensitive to the presence of edges, bars, and other simple image structure, are quite coarse. Unlike steerable filters the only orientations available are vertical, horizontal, and diagonal. The set of rectangle features does however provide a rich image representation which supports effective learning. In conjunction with the integral image, the efficiency of the rectangle feature set provides ample compensation for their limited flexibility.

3. Learning Classification Functions

Given a feature set and a training set of positive and negative images, any number of machine learning approaches could be used to learn a classification function. In our system a variant of AdaBoost is used both to select a small set of features and train the classifier [6]. In its original form, the AdaBoost learning algorithm is used to boost the classification performance of a simple (sometimes called weak) learning algorithm. There are a number of formal guarantees provided by the AdaBoost learning procedure. Freund and Schapire proved that the training error of the strong classifier approaches zero exponentially in the number of rounds. More importantly a number of results were later proved about generalization performance [14]. The key insight is that generalization performance is related to the margin of the examples, and that AdaBoost achieves large margins rapidly.

Recall that there are over 180,000 rectangle features associated with each image sub-window, a number far larger than the number of pixels. Even though each feature can be computed very efficiently, computing the complete set is prohibitively expensive. Our hypothesis, which is borne out by experiment, is that a very small number of these features can be combined to form an effective classifier. The main challenge is to find these features.

In support of this goal, the weak learning algorithm is designed to select the single rectangle feature which best separates the positive and negative examples (this is similar to the approach of [2] in the domain of image database retrieval). For each feature, the weak learner determines the optimal threshold classification function, such that the minimum number of examples are misclassified. A weak classifier h_j(x) thus consists of a feature f_j, a threshold θ_j and a parity p_j indicating the direction of the inequality sign:

$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases}$$

Here x is a 24×24 pixel sub-window of an image. See Table 1 for a summary of the boosting process.

In practice no single feature can perform the classification task with low error. Features which are selected in early rounds of the boosting process had error rates between 0.1 and 0.3. Features selected in later rounds, as the task becomes more difficult, yield error rates between 0.4 and 0.5.

3.1. Learning Discussion

Many general feature selection procedures have been proposed (see chapter 8 of [18] for a review). Our final application demanded a very aggressive approach which would discard the vast majority of features. For a similar recognition problem Papageorgiou et al. proposed a scheme for feature selection based on feature variance [10]. They demonstrated good results selecting 37 features out of a total 1734 features.

Roth et al. propose a feature selection process based on the Winnow exponential perceptron learning rule [11]. The Winnow learning process converges to a solution where many of these weights are zero. Nevertheless a very large number of features are retained (perhaps a few hundred or thousand).

Table 1: The AdaBoost algorithm for classifier learning. Each round of boosting selects one feature from the 180,000 potential features.
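
As a companion to Table 1, here is a minimal MATLAB sketch of discrete AdaBoost with single-feature threshold classifiers, in the form commonly described for this detector. Weights start at 1/(2l) for positives and 1/(2m) for negatives and are normalized every round. Each round keeps the best stump, sets β_t = ε_t/(1 − ε_t), down-weights correctly classified examples by β_t, and uses α_t = log(1/β_t) in a final vote against ½Σα_t. The toy data, the number of rounds T and the 1e-10 guard against a zero error are assumptions, and this is not the authors' code.

```matlab
% AdaBoost with decision stumps. X: n-by-k matrix of feature values
% (row = window, column = feature f_j); y: labels, 1 = face, 0 = non-face.
X = [8 1; 3 2; 6 3; 1 6; 9 7; 4 8];     % hypothetical toy data
y = [1; 1; 1; 0; 0; 0];
T = 3;                                  % boosting rounds (assumed)

[n, k] = size(X);
l = sum(y == 1); m = sum(y == 0);
w = zeros(n, 1);
w(y == 1) = 1 / (2 * l);                % initial weights for positives
w(y == 0) = 1 / (2 * m);                % initial weights for negatives
stump = @(f, theta, p) double(p .* f < p .* theta);
alpha = zeros(T, 1); feat = zeros(T, 1); thr = zeros(T, 1); par = zeros(T, 1);

for t = 1:T
  w = w / sum(w);                       % normalize the weights
  best_err = Inf;
  for j = 1:k                           % best single-feature classifier
    for theta = X(:, j)'
      for p = [1, -1]
        err = sum(w .* abs(stump(X(:, j), theta, p) - y));
        if err < best_err
          best_err = err; feat(t) = j; thr(t) = theta; par(t) = p;
        end
      end
    end
  end
  best_err = max(best_err, 1e-10);      % guard: avoid beta = 0 (assumed)
  beta = best_err / (1 - best_err);
  e = double(stump(X(:, feat(t)), thr(t), par(t)) ~= y);  % 1 = misclassified
  w = w .* beta .^ (1 - e);             % shrink weights of correct examples
  alpha(t) = log(1 / beta);
end

% Strong classifier: 1 if sum_t alpha_t * h_t(x) >= 0.5 * sum_t alpha_t.
score = zeros(n, 1);
for t = 1:T
  score = score + alpha(t) * stump(X(:, feat(t)), thr(t), par(t));
end
pred = double(score >= 0.5 * sum(alpha));
```

The toy data are separable by the second feature alone, so every round picks the same stump. The example is only a smoke check of the bookkeeping, not a demonstration of boosting on hard data.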

3.2. Learning Results

While details on the training and performance of the final system are presented in Section 5, several simple results merit discussion. Initial experiments demonstrated that a frontal face classifier constructed from 200 features yields a detection rate of 95% with a false positive rate of 1 in 14084. These results are compelling, but not sufficient for many real-world tasks. In terms of computation, this classifier is probably faster than any other published system, requiring 0.7 seconds to scan a 384 by 288 pixel image. Unfortunately, the most straightforward technique for improving detection performance, adding features to the classifier, directly increases computation time.

For the task of face detection, the initial rectangle features selected by AdaBoost are meaningful and easily interpreted. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks (see Figure 3). This feature is relatively large in comparison with the detection sub-window, and should be somewhat insensitive to size and location of the face. The second feature selected relies on the property that the eyes are darker than the bridge of the nose.

Figure 3: The first and second features selected by AdaBoost. The two features are shown in the top row and then overlaid on a typical training face in the bottom row. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks. The feature capitalizes on the observation that the eye region is often darker than the cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose.

4. The Attentional Cascade

This section describes an algorithm for constructing a cascade of classifiers which achieves increased detection performance while radically reducing computation time. The key insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances (i.e. the threshold of a boosted classifier can be adjusted so that the false negative rate is close to zero). Simpler classifiers are used to reject the majority of sub-windows before more complex classifiers are called upon to achieve low false positive rates.

The overall form of the detection process is that of a degenerate decision tree, what we call a “cascade” (see Figure 4). A positive result from the first classifier triggers the evaluation of a second classifier which has also been adjusted to achieve very high detection rates. A positive result from the second classifier triggers a third classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window.
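
The degenerate-decision-tree evaluation is easy to sketch in MATLAB. Each stage computes its weighted vote of single-feature classifiers and compares it with a stage threshold. A score below the threshold rejects the window at once, so later stages are never computed. The two stages, their parameters and the three feature vectors standing in for sub-windows are all hypothetical, not trained values.

```matlab
% Cascade evaluation with early rejection. Each stage holds single-feature
% weak classifiers (feature index, theta, parity, alpha) and a threshold.
stump = @(f, theta, p) double(p * f < p * theta);
stages = { struct('feat', [1 2], 'theta', [5 3], 'p', [1 -1], ...
                  'alpha', [1.0 0.8], 'thresh', 0.9), ...
           struct('feat', 3, 'theta', 2, 'p', 1, 'alpha', 2.0, 'thresh', 1.0) };

windows = [4 5 1;      % feature values of three hypothetical sub-windows
           6 5 1;
           4 5 3];
nw = size(windows, 1);
is_face = false(nw, 1);
n_stages_run = zeros(nw, 1);
for i = 1:nw
  x = windows(i, :);
  is_face(i) = true;
  for s = 1:numel(stages)
    st = stages{s};
    n_stages_run(i) = s;                 % this stage is evaluated
    score = 0;
    for t = 1:numel(st.feat)
      score = score + st.alpha(t) * stump(x(st.feat(t)), st.theta(t), st.p(t));
    end
    if score < st.thresh                 % negative outcome: reject now
      is_face(i) = false;
      break;
    end
  end
end
```

The second window is rejected by the first stage after two features, and only the first window passes every stage.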

Stages in the cascade are constructed by training classifiers using AdaBoost and then adjusting the threshold to minimize false negatives. Note that the default AdaBoost threshold is designed to yield a low error rate on the training data. In general a lower threshold yields higher detection rates and higher false positive rates.
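
The threshold trade-off can be illustrated with validation scores, meaning the values of Σ_t α_t h_t(x) that a boosted stage assigns to validation windows. The sketch compares the usual AdaBoost default, half the total vote ½Σα_t, with the largest threshold that still keeps a target fraction of the faces. All scores, the total vote and the target are invented for illustration.

```matlab
% Lowering a stage threshold to reach a target detection rate on validation
% data. All scores are hypothetical values of sum_t alpha_t * h_t(x).
pos_scores = [2.1 1.7 1.5 0.9 0.6];              % face windows
neg_scores = [1.6 1.0 0.8 0.5 0.4 0.3 0.2 0.1];  % non-face windows
alpha_sum = 3.0;                                 % hypothetical sum of alpha_t
default_thresh = 0.5 * alpha_sum;                % usual AdaBoost default
target_det = 1.0;                                % keep 100% of the faces

% Largest threshold that still accepts the required number of positives.
sorted_pos = sort(pos_scores, 'descend');
k = ceil(target_det * numel(pos_scores));
thresh = sorted_pos(k);

det_default = mean(pos_scores >= default_thresh);
fp_default  = mean(neg_scores >= default_thresh);
det_rate = mean(pos_scores >= thresh);
fp_rate  = mean(neg_scores >= thresh);
```

With these numbers the default threshold of 1.5 detects 60% of the faces at a 12.5% false positive rate. Lowering the threshold to 0.6 detects all of them, but the false positive rate rises to 37.5%.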

Figure 4: Schematic depiction of the detection cascade. A series of classifiers are applied to every sub-window. The initial classifier eliminates a large number of negative examples with very little processing. Subsequent layers eliminate additional negatives but require additional computation. After several stages of processing the number of sub-windows has been reduced radically. Further processing can take any form such as additional stages of the cascade (as in our detection system) or an alternative detection system.

For example an excellent first stage classifier can be constructed from a two-feature strong classifier by reducing the threshold to minimize false negatives. Measured against a validation training set, the threshold can be adjusted to detect 100% of the faces with a false positive rate of 40%. See Figure 3 for a description of the two features used in this classifier.

Computation of the two feature classifier amounts to about 60 microprocessor instructions. It seems hard to imagine that any simpler filter could achieve higher rejection rates. By comparison, scanning a simple image template, or a single layer perceptron, would require at least 20 times as many operations per sub-window.

The structure of the cascade reflects the fact that within any single image an overwhelming majority of sub-windows are negative. As such, the cascade attempts to reject as many negatives as possible at the earliest stage possible. While a positive instance will trigger the evaluation of every classifier in the cascade, this is an exceedingly rare event.

Much like a decision tree, subsequent classifiers are trained using those examples which pass through all the previous stages. As a result, the second classifier faces a more difficult task than the first. The examples which make it through the first stage are “harder” than typical examples. The more difficult examples faced by deeper classifiers push the entire receiver operating characteristic (ROC) curve downward. At a given detection rate, deeper classifiers have correspondingly higher false positive rates.

4.1. Training a Cascade of Classifiers

The cascade training process involves two types of tradeoffs. In most cases classifiers with more features will achieve higher detection rates and lower false positive rates. At the same time classifiers with more features require more time to compute. In principle one could define an optimization framework in which: i) the number of classifier stages, ii) the number of features in each stage, and iii) the threshold of each stage, are traded off in order to minimize the expected number of evaluated features. Unfortunately finding this optimum is a tremendously difficult problem.

In practice a very simple framework is used to produce an effective classifier which is highly efficient. Each stage in the cascade reduces the false positive rate and decreases the detection rate. A target is selected for the minimum reduction in false positives and the maximum decrease in detection. Each stage is trained by adding features until the target detection and false positive rates are met (these rates are determined by testing the detector on a validation set). Stages are added until the overall target for false positive and detection rate is met.
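
The outer loop of this framework amounts to simple bookkeeping. Each stage is trained only on windows that survived the earlier stages, so the overall false positive rate and detection rate are products of the per-stage rates. The MATLAB sketch below assumes that every stage exactly meets hypothetical per-stage targets and adds stages until an assumed overall false positive target is reached.

```matlab
% Idealized bookkeeping for "add stages until the overall target is met",
% assuming (hypothetically) each stage exactly meets its per-stage targets.
f_stage  = 0.5;      % max false positive rate per stage (assumed)
d_stage  = 0.995;    % min detection rate per stage (assumed)
F_target = 1e-3;     % overall false positive target (assumed)

F = 1; D = 1; n = 0;
while F > F_target
  n = n + 1;
  F = F * f_stage;   % overall rates are products of the stage rates
  D = D * d_stage;
end
fprintf('%d stages: F = %.2e, D = %.3f\n', n, F, D);
```

With these numbers the loop stops at 10 stages, with F = 2^-10 ≈ 9.8e-4 and D = 0.995^10 ≈ 0.951. Even a high per-stage detection rate erodes noticeably over many stages.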

4.2. Detector Cascade Discussion

The complete face detection cascade has 38 stages with over 6000 features. Nevertheless the cascade structure results in fast average detection times. On a difficult dataset, containing 507 faces and 75 million sub-windows, faces are detected using an average of 10 feature evaluations per sub-window. In comparison, this system is about 15 times faster than an implementation of the detection system constructed by Rowley et al.³ [12]
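
The low average cost follows from the early rejections. Stage k is reached only by windows that passed every earlier stage, so the expected number of features per window is the sum over stages of the stage size times the fraction of windows reaching that stage. In the MATLAB sketch below, only the two-feature first stage with its 40% false positive (pass) rate comes from the text. The later stage sizes and pass rates are hypothetical.

```matlab
% Expected number of features evaluated per (mostly negative) sub-window.
n_feat    = [2 10 25 50];          % features per stage (only stage 1 from the text)
pass_rate = [0.4 0.2 0.1 0.05];    % fraction of negatives passing each stage
reach = [1, cumprod(pass_rate(1:end-1))];   % fraction reaching stage k
expected_features = sum(n_feat .* reach);
```

With these made-up numbers the expectation is 2 + 4 + 2 + 0.4 = 8.4 features per window, even though the four stages hold 87 features in total.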

A notion similar to the cascade appears in the face detection system described by Rowley et al. in which two detection networks are used [12]. Rowley et al. used a faster yet less accurate network to prescreen the image in order to find candidate regions for a slower more accurate network. Though it is difficult to determine exactly, it appears that Rowley et al.’s two network face system is the fastest existing face detector.⁴


The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Amit and Geman [1]. Unlike techniques which use a fixed detector, Amit and Geman propose an alternative point of view where unusual co-occurrences of simple image features are used to trigger the evaluation of a more complex detection process. In this way the full detection process need not be evaluated at many of the potential image locations and scales. While this basic insight is very valuable, in their implementation it is necessary to first evaluate some feature detector at every location. These features are then grouped to find unusual co-occurrences. In practice, since the form of our detector and the features that it uses are extremely efficient, the amortized cost of evaluating our detector at every scale and location is much faster than finding and grouping edges throughout the image.


In recent work Fleuret and Geman have presented a face detection technique which relies on a "chain" of tests in order to signify the presence of a face at a particular scale and location [4]. The image properties measured by Fleuret and Geman, disjunctions of fine scale edges, are quite different than rectangle features which are simple, exist at all scales, and are somewhat interpretable. The two approaches also differ radically in their learning philosophy. The motivation for Fleuret and Geman's learning process is density estimation and density discrimination, while our detector is purely discriminative. Finally the false positive rate of Fleuret and Geman's approach appears to be higher than that of previous approaches like Rowley et al. and this approach. Unfortunately the paper does not report quantitative results of this kind. The included example images each have between 2 and 10 false positives.


5 Results

A 38 layer cascaded classifier was trained to detect frontal upright faces. To train the detector, a set of face and non-face training images were used. The face training set consisted of 4916 hand labeled faces scaled and aligned to a base resolution of 24 by 24 pixels. The faces were extracted from images downloaded during a random crawl of the world wide web. Some typical face examples are shown in Figure 5. The non-face subwindows used to train the detector come from 9544 images which were manually inspected and found to not contain any faces. There are about 350 million subwindows within these non-face images.


The number of features in the first five layers of the detector is 1, 10, 25, 25 and 50 features respectively. The remaining layers have increasingly more features. The total number of features in all layers is 6061.


Each classifier in the cascade was trained with the 4916 training faces (plus their vertical mirror images for a total of 9832 training faces) and 10,000 non-face sub-windows (also of size 24 by 24 pixels) using the Adaboost training procedure. For the initial one feature classifier, the non-face training examples were collected by selecting random sub-windows from a set of 9544 images which did not contain faces. The non-face examples used to train subsequent layers were obtained by scanning the partial cascade across the non-face images and collecting false positives. A maximum of 10000 such non-face sub-windows were collected for each layer.


Figure 5: Example of frontal upright face images used for training

Speed of the Final Detector

The speed of the cascaded detector is directly related to the number of features evaluated per scanned sub-window. Evaluated on the MIT+CMU test set [12], an average of 10 features out of a total of 6061 are evaluated per sub-window. This is possible because a large majority of sub-windows are rejected by the first or second layer in the cascade. On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about 0.067 seconds (using a starting scale of 1.25 and a step size of 1.5 described below). This is roughly 15 times faster than the Rowley-Baluja-Kanade detector [12] and about 600 times faster than the Schneiderman-Kanade detector [15].

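The low average cost comes from early rejection. A sub-window pays for stage i only if it survived every earlier stage, so the expected number of features per window is Σ n_i · P(window reaches stage i). The C++ sketch below computes that sum. Only the stage sizes 1, 10, 25, 25, 50 come from the text. The pass fractions in the example are made-up numbers, not measurements from the paper. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Expected number of features evaluated per sub-window.
// featuresPerStage[i]: number of features in stage i.
// passFraction[i]:     fraction of the windows reaching stage i that also pass it.
double expectedFeatures(const std::vector<int>& featuresPerStage,
                        const std::vector<double>& passFraction) {
    double expected = 0.0;
    double reach = 1.0;  // probability that a window reaches the current stage
    for (std::size_t i = 0; i < featuresPerStage.size(); ++i) {
        expected += reach * featuresPerStage[i];  // every window reaching stage i pays for all its features
        reach *= passFraction[i];                 // only the survivors continue
    }
    return expected;
}
```

Assume a 50% pass rate at each of the first five stages. The expected cost is then 18.5 feature evaluations (1 + 5 + 6.25 + 3.125 + 3.125), even though those five stages hold 111 features in total.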

Image Processing

All example sub-windows used for training were variance normalized to minimize the effect of different lighting conditions. Normalization is therefore necessary during detection as well. The variance of an image sub-window can be computed quickly using a pair of integral images. Recall that $\sigma^2 = \frac{1}{N}\sum x^2 - m^2$, where $\sigma$ is the standard deviation, $m$ is the mean, $x$ is the pixel value within the sub-window, and $N$ is the number of pixels in the sub-window. The mean of a sub-window can be computed using the integral image. The sum of squared pixels is computed using an integral image of the image squared (i.e. two integral images are used in the scanning process). During scanning the effect of image normalization can be achieved by post-multiplying the feature values rather than pre-multiplying the pixels.

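Here is a C++ sketch of the two-integral-image trick. It assumes a small grayscale image stored as nested vectors. One table holds pixel sums and one holds squared-pixel sums. Any rectangle's variance then follows from σ² = (1/N)Σx² − m², using four lookups in each table. The names `integralImage`, `rectSum` and `windowVariance` are illustrative, not taken from the paper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Table = std::vector<std::vector<double>>;

// Integral image with an extra zero row and column:
// ii[y][x] = sum of img (or img squared) over rows < y and columns < x.
Table integralImage(const Table& img, bool squared) {
    std::size_t h = img.size(), w = img.empty() ? 0 : img[0].size();
    Table ii(h + 1, std::vector<double>(w + 1, 0.0));
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x) {
            double v = squared ? img[y][x] * img[y][x] : img[y][x];
            ii[y + 1][x + 1] = v + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
        }
    return ii;
}

// Sum over the rectangle [x, x+w) x [y, y+h) with four lookups.
double rectSum(const Table& ii, std::size_t x, std::size_t y, std::size_t w, std::size_t h) {
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
}

// Variance of a sub-window: sigma^2 = (1/N) * sum(x^2) - m^2.
double windowVariance(const Table& ii, const Table& ii2,
                      std::size_t x, std::size_t y, std::size_t w, std::size_t h) {
    double n = static_cast<double>(w * h);
    double mean = rectSum(ii, x, y, w, h) / n;
    return rectSum(ii2, x, y, w, h) / n - mean * mean;
}
```

During detection, each feature value can then be scaled by 1/σ instead of rescaling every pixel of the window.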

Scanning the Detector

The final detector is scanned across the image at multiple scales and locations. Scaling is achieved by scaling the detector itself, rather than scaling the image. This process makes sense because the features can be evaluated at any scale with the same cost. Good results were obtained using a set of scales a factor of 1.25 apart.


The detector is also scanned across location. Subsequent locations are obtained by shifting the window some number of pixels Δ. This shifting process is affected by the scale of the detector: if the current scale is S the window is shifted by [SΔ], where [] is the rounding operation.


The choice of Δ affects both the speed of the detector as well as accuracy. The results we present are for Δ = 1.0. We can achieve a significant speedup by setting Δ = 1.5 with only a slight decrease in accuracy.

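The scanning scheme can be sketched as a window enumerator: the detector is scaled by 1.25 per step, and at scale S the window moves round(SΔ) pixels. This simplified C++ illustration assumes a square base detector (24 pixels in the text). It only enumerates positions and evaluates no features. `Window` and `scanWindows` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Window { int x, y, size; };

// Enumerate square detector windows. The detector (not the image) grows by
// scaleFactor per step, and the shift at scale s is round(s * delta) pixels.
std::vector<Window> scanWindows(int imgW, int imgH, int baseSize,
                                double scaleFactor, double delta) {
    std::vector<Window> windows;
    if (scaleFactor <= 1.0) return windows;  // scales must grow or the loop never ends
    for (double s = 1.0; ; s *= scaleFactor) {
        int size = static_cast<int>(std::lround(baseSize * s));
        if (size > imgW || size > imgH) break;  // detector no longer fits
        int step = static_cast<int>(std::lround(s * delta));
        if (step < 1) step = 1;                 // always move at least one pixel
        for (int y = 0; y + size <= imgH; y += step)
            for (int x = 0; x + size <= imgW; x += step)
                windows.push_back({x, y, size});
    }
    return windows;
}
```

Raising delta from 1.0 to 1.5 makes the step at scale 1 equal to 2 pixels, because `std::lround` rounds halves away from zero. That roughly quarters the number of windows at that scale.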

Integration of Multiple Detections

Since the final detector is insensitive to small changes in translation and scale, multiple detections will usually occur around each face in a scanned image. The same is often true of some types of false positives. In practice it often makes sense to return one final detection per face. Toward this end it is useful to postprocess the detected sub-windows in order to combine overlapping detections into a single detection.


In these experiments detections are combined in a very simple fashion.  The set of detections are first partitioned into disjoint subsets. Two detections are in the same subset if their bounding regions overlap.  Each partition yields a single final detection.  The corners of the final bounding region are the average of the corners of all detections in the set.

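The merging rule puts overlapping detections in the same subset and returns one averaged box per subset. That amounts to finding connected components over the overlap relation. Below is a C++ sketch using union-find, assuming each box is given by its two corners. Overlap is taken transitively, so a chain of pairwise-overlapping boxes ends up in one subset. The names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// A detection given by its top-left (x0, y0) and bottom-right (x1, y1) corners.
struct Box { double x0, y0, x1, y1; };

bool overlaps(const Box& a, const Box& b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

// Union-find root lookup with path halving.
std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Group overlapping detections and return one box per group whose corners are
// the averages of the group's corners, in order of first appearance.
std::vector<Box> mergeDetections(const std::vector<Box>& dets) {
    const std::size_t n = dets.size();
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (overlaps(dets[i], dets[j])) {
                std::size_t ri = findRoot(parent, i), rj = findRoot(parent, j);
                parent[ri] = rj;  // join the two subsets
            }

    std::vector<Box> merged;
    std::vector<int> counts;
    std::vector<std::size_t> groupOf(n, n);  // root -> output slot; n means "not assigned yet"
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t r = findRoot(parent, i);
        if (groupOf[r] == n) {
            groupOf[r] = merged.size();
            merged.push_back({0.0, 0.0, 0.0, 0.0});
            counts.push_back(0);
        }
        Box& s = merged[groupOf[r]];
        s.x0 += dets[i].x0; s.y0 += dets[i].y0;
        s.x1 += dets[i].x1; s.y1 += dets[i].y1;
        ++counts[groupOf[r]];
    }
    for (std::size_t k = 0; k < merged.size(); ++k) {
        merged[k].x0 /= counts[k]; merged[k].y0 /= counts[k];
        merged[k].x1 /= counts[k]; merged[k].y1 /= counts[k];
    }
    return merged;
}
```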

Experiments on a Real-World Test Set

We tested our system on the MIT+CMU frontal face test set [12]. This set consists of 130 images with 507 labeled frontal faces. A ROC curve showing the performance of our detector on this test set is shown in Figure 6. To create the ROC curve the threshold of the final layer classifier is adjusted from -∞ to +∞. Adjusting the threshold to +∞ will yield a detection rate of 0.0 and a false positive rate of 0.0. Adjusting the threshold to -∞, however, increases both the detection rate and false positive rate, but only to a certain point. Neither rate can be higher than the rate of the detection cascade minus the final layer. In effect, a threshold of -∞ is equivalent to removing that layer. Further increasing the detection and false positive rates requires decreasing the threshold of the next classifier in the cascade. Thus, in order to construct a complete ROC curve, classifier layers are removed. We use the number of false positives as opposed to the rate of false positives for the x-axis of the ROC curve to facilitate comparison with other systems. To compute the false positive rate, simply divide by the total number of sub-windows scanned. In our experiments, the number of sub-windows scanned is 75,081,800.

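The ROC construction changes only the final layer's threshold. That threshold can only accept or reject windows that have already passed every earlier layer. The C++ sketch below makes this concrete with hypothetical scores. With the threshold at -∞, every surviving window is accepted, so the detection rate is capped by the earlier layers. The sketch also turns a false positive count into a rate by dividing by the number of scanned sub-windows. All names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct RocPoint { double detectionRate; int falsePositives; };

// faceScores / nonFaceScores: final-layer scores of windows that passed every
// earlier layer. totalFaces counts all labeled faces, including those rejected earlier.
RocPoint rocPoint(const std::vector<double>& faceScores,
                  const std::vector<double>& nonFaceScores,
                  int totalFaces, double threshold) {
    int detected = 0, fp = 0;
    for (double s : faceScores) if (s >= threshold) ++detected;
    for (double s : nonFaceScores) if (s >= threshold) ++fp;
    return {static_cast<double>(detected) / totalFaces, fp};
}

// False positive rate = false positive count / number of sub-windows scanned.
double falsePositiveRate(int falsePositives, double windowsScanned) {
    return falsePositives / windowsScanned;
}
```

With the 75,081,800 sub-windows scanned here, 10 false positives correspond to a rate of about 1.3 × 10⁻⁷.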

Unfortunately, most previous published results on face detection have only included a single operating regime (i.e. a single point on the ROC curve). To make comparison with our detector easier we have listed our detection rate for the false positive rates reported by the other systems. Table 2 lists the detection rate for various numbers of false detections for our system as well as other published systems. For the Rowley-Baluja-Kanade results [12], a number of different versions of their detector were tested, yielding a number of different results; they are all listed under the same heading. For the Roth-Yang-Ahuja detector [11], they reported their result on the MIT+CMU test set with the 5 images containing line-drawn faces removed.


Figure 6:   ROC curve for our face detector on the MIT+CMU test set. The detector was run using a step size of 1.0 and starting scale of 1.0 (75,081,800 sub-windows scanned).


Figure 7 shows the output of our face detector on some test images from the MIT+CMU test set.


Figure 7: Output of our face detector on a number of test images from the MIT+CMU test set.


A simple voting scheme to further improve results

In table 2 we also show results from running three detectors (the 38 layer one described above plus two similarly trained detectors) and outputting the majority vote of the three detectors. This improves the detection rate as well as eliminating more false positives. The improvement would be greater if the detectors were more independent. The correlation of their errors results in a modest improvement over the best single detector.

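The voting scheme can be sketched in C++ under a simplifying assumption that does not come from the paper: all three detectors are run on the same list of candidate windows, so their outputs can be compared index by index. A window is kept when at least two of the three detectors accept it. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Majority vote of three detectors over the same list of candidate windows:
// a window is kept when at least two of the three detectors accept it.
std::vector<bool> majorityVote(const std::vector<bool>& a,
                               const std::vector<bool>& b,
                               const std::vector<bool>& c) {
    std::vector<bool> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = (static_cast<int>(a[i]) + static_cast<int>(b[i]) + static_cast<int>(c[i])) >= 2;
    return out;
}
```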

Table 2: Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.


6 Conclusions

We have presented an approach for object detection which minimizes computation time while achieving high detection accuracy. The approach was used to construct a face detection system which is approximately 15 times faster than any previous approach.


This paper brings together new algorithms, representations, and insights which are quite generic and may well have broader application in computer vision and image processing.


Finally this paper presents a set of detailed experiments on a difficult face detection dataset which has been widely studied. This dataset includes faces under a very wide range of conditions including: illumination, scale, pose, and camera variation. Experiments on such a large and complex dataset are difficult and time consuming. Nevertheless systems which work under these conditions are unlikely to be brittle or limited to a single set of conditions. More importantly conclusions drawn from this dataset are unlikely to be experimental artifacts.


References


[1] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree classifiers, 1997.

[2] Anonymous. Anonymous. In Anonymous, 2000.

[3] F. Crow. Summed-area tables for texture mapping. In Proceedings of SIGGRAPH, volume 18(3), pages 207–212, 1984.

[4] F. Fleuret and D. Geman. Coarse-to-fine face detection. Int. J. Computer Vision, 2001.

[5] William T. Freeman and Edward H. Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891–906, 1991.

[6] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory: Eurocolt '95, pages 23–37. Springer-Verlag, 1995.

[7] H. Greenspan, S. Belongie, R. Goodman, P. Perona, S. Rakshit, and C. Anderson. Overcomplete steerable pyramid filters and rotation invariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994.

[8] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Patt. Anal. Mach. Intell., 20(11):1254–1259, November 1998.

[9] Edgar Osuna, Robert Freund, and Federico Girosi. Training support vector machines: an application to face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.

[10] C. Papageorgiou, M. Oren, and T. Poggio. A general framework for object detection. In International Conference on Computer Vision, 1998.

[11] D. Roth, M. Yang, and N. Ahuja. A SNoW-based face detector. In Neural Information Processing 12, 2000.

[12] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. In IEEE Patt. Anal. Mach. Intell., volume 20, pages 22–38, 1998.

[13] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat., 26(5):1651–1686, 1998.

[14] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. In Proceedings of the Fourteenth International Conference on Machine Learning, 1997.

[15] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In International Conference on Computer Vision, 2000.

[16] K. Sung and T. Poggio. Example-based learning for view-based face detection. In IEEE Patt. Anal. Mach. Intell., volume 20, pages 39–51, 1998.

[17] J.K. Tsotsos, S.M. Culhane, W.Y.K. Wai, Y.H. Lai, N. Davis, and F. Nuflo. Modeling visual-attention via selective tuning. Artificial Intelligence Journal, 78(1-2):507–545, October 1995.

[18] Andrew Webb. Statistical Pattern Recognition. Oxford University Press, New York, 1999.

  Matlab implementation Viola Jones Detection

This article is reposted from: cnblogs (All Posts)

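Simulating piecewise interpolation for a BP neural network in Matlab

CORDIC is perhaps the hardware implementation technique most worth studying, yet it is hard to implement well. Its advantage is that the same hardware can implement many different functions, but its performance is poor. High-order polynomial approximation can achieve very low error, but it is generally unsuitable for hardware. A very effective alternative is the table-driven method, but a large table brings substantial overhead.

Here we describe the approach that is now most widely used and, for FPGAs, arguably the best current option: a combination of low-order polynomials (mainly linear) and small lookup tables. The main challenges of this technique are choosing the interpolation points and keeping the lookup tables small. Low-order interpolation has three main advantages. First, the same hardware structure can implement many functions, because only the coefficients of the low-order polynomials change. Second, it suits current FPGA architectures well, thanks to their built-in multipliers, adders and memory cores.

For most functions, interpolating over uniformly spaced intervals is not ideal, and for the sigmoid, more intervals are clearly needed. A hardware implementation, however, must map the argument to the right interval quickly. Given a choice of intervals and linear interpolation, the key question is how a function value is associated with each interval. The most common approach is simply to pick the midpoint of the interval, i.e. for x∈[L,U], f(x) = f(L/2 + U/2), or to choose a value that minimizes the absolute error. This is not a good approach. Over a fixed interval, the best function value generally does not correspond to the midpoint. It depends on the function's curvature, and the relative error may matter more than the absolute error. For example, the sigmoid f(x) = 1/(1+e^-x) is point-symmetric about (0, 1/2), i.e. f(-x) = 1 - f(x). Yet the relative error becomes noticeably larger on only one side: the negative side, where f(x) is small. The error on each side depends on the interval, so in general the absolute error is neither constant nor linear.

The general approach is as follows. Let I = [L,U] be an interval with L < U, and let f: I → R be the function to approximate. Let g: I → R be a linear function g(x) = c1 + c2·x with constants c1 and c2. Our goal is to study the relative error function (f(x) − g(x))/f(x).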
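To see the interval-by-interval construction outside Matlab, here is a C++ sketch of the same scheme the Matlab code in this post uses. It places uniform knots on [0.5, 1], draws a line g(x) = c1 + c2·x through the sigmoid values at both ends of each interval, and measures the relative error (f − g)/f. The struct and function names are hypothetical. It does not implement the non-uniform or error-optimized placement discussed above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Piecewise-linear approximation of the sigmoid on [lo, hi] with `pieces`
// uniform intervals; on interval i, g(x) = c1[i] + c2[i] * x passes through
// the sigmoid values at both knots of that interval.
struct PiecewiseLinear {
    double lo, hi;
    std::vector<double> c1, c2;

    PiecewiseLinear(double lo_, double hi_, int pieces) : lo(lo_), hi(hi_) {
        double h = (hi - lo) / pieces;
        for (int i = 0; i < pieces; ++i) {
            double tL = lo + i * h, tU = lo + (i + 1) * h;
            double slope = (sigmoid(tU) - sigmoid(tL)) / (tU - tL);
            c2.push_back(slope);
            c1.push_back(sigmoid(tU) - slope * tU);  // intercept, as in the Matlab code
        }
    }

    double operator()(double x) const {
        int n = static_cast<int>(c1.size());
        int i = static_cast<int>((x - lo) / (hi - lo) * n);  // uniform knots: direct index
        if (i < 0) i = 0;
        if (i >= n) i = n - 1;  // x == hi belongs to the last interval
        return c1[i] + c2[i] * x;
    }
};

// Largest relative error |f - g| / f over `samples` evenly spaced points in [lo, hi].
double maxRelativeError(const PiecewiseLinear& p, int samples) {
    double worst = 0.0;
    for (int k = 0; k < samples; ++k) {
        double x = p.lo + (p.hi - p.lo) * k / (samples - 1);
        double f = sigmoid(x);
        worst = std::fmax(worst, std::fabs((f - p(x)) / f));
    }
    return worst;
}
```

The sigmoid is concave on [0.5, 1], so each chord lies below the curve and f − g is non-negative there. With 8 intervals, the relative error stays below 10⁻⁴. Uniform knots make the interval lookup a single multiply, which is the hardware-friendly property mentioned above.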

Below is Matlab code that approximates the sigmoid by piecewise linear interpolation on [0.5, 1]:

%% Interpolation error
t = 0.5:1/16:1;          % 9 knots -> 8 intervals on [0.5, 1]
x = linspace(0.5, 1);    % evaluation points (must exist before f2 is built)
for i = 1:8
    sigL(i) = 1/(1 + exp(-t(i)));        % sigmoid at the left knot
    sigU(i) = 1/(1 + exp(-t(i+1)));      % sigmoid at the right knot
    c2(i) = (sigU(i)-sigL(i))/(t(i+1)-t(i));   % slope of the chord
    c1(i) = sigU(i)-c2(i)*t(i+1);              % intercept of the chord
end
% piecewise-linear approximation: line i on [t(i), t(i+1)), last interval closed at x = 1
f2 = zeros(size(x));
for i = 1:8
    if i < 8
        idx = x >= t(i) & x < t(i+1);
    else
        idx = x >= t(i) & x <= t(i+1);
    end
    f2(idx) = c1(i) + c2(i)*x(idx);
end
f1 = 1./(1+exp(-x));     % exact sigmoid
figure(1)
plot(x,f2,'g');          % piecewise function
wucha = (f1-f2)./f1*2000;  % relative error, scaled by 2000 so it is visible on the plot
hold on;
plot(x,wucha,'r');
axis([0.5,1,-0.2,0.8]);

The simulation plot is shown below.

 

This article is reposted from: CSDN blog

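COS Weekly Picks: Talking About Money Doesn't Hurt Feelings

This week's contributors: 谢益辉, 冷静, 施涛, 肖楠

  • Talking money doesn't hurt feelings: Revolutions ran a survey and found that people who know R earn around 110,000 US dollars a year, more than those with MapReduce or Hadoop skills. Oh, don't talk to me about money, how vulgar! If you can't resist, come and read the dazzling headline "R skills attract the highest salaries".
  • Choices in life: an uplifting, inspirational post from the third reply in a thread shows how much our field of statistics is envied. It is also a reminder to myself: picking one of two options is a choice, but switching because you can't cope is not a choice; it is escape.
  • Markov chains: if you are not yet familiar with something (wine, romance) but want to appear knowledgeable and at ease, let a Markov chain tell you how to express your opinions. With theoretical support behind you, don't you feel more confident?
  • R vs. Matlab: a short series comparing the strengths of R and Matlab. Round one goes to Matlab. Interested readers can try it themselves.
  • R: R 3.0.3 has been released. Take a first look~
  • R Markdown: the new version of R Markdown actually has super powers too, though it is not yet time to promote it; still, maybe take a look?
  • R functions: statistics on the most commonly used functions in R.
  • Big data in everyday life: what is big data, and what can we do with it in daily life? See what people have to say (from Quora; image-heavy, click with care).
  • Why are there so few women scientists? Are girls not smart enough? Are girls who study math not cool enough? Is the gap a matter of fact or a product of culture? See what women scientists have to say. English version
  • Graphical models / review the old to learn the new: Prof. Eric Xing's Probabilistic Graphical Models course has finally concluded. If you took Prof. Daphne Koller's online PGM course, this is a good review; one lecture a day keeps the brain sharp. Note: no subtitles and an accent, but you get used to it :). Click the link to refresh your mind XD.
  • A JavaScript implementation of Hamiltonian Monte Carlo. The editor doesn't fully understand it but finds it impressive, so no verdict for now.
  • Bayes: better no book than blind faith in books; see the Bayesian expert Xi'an question the posterior predictive p-values in the book Bayesian Data Analysis.
  • Bayes: the "eight schools" example taught me Bayesian statistics, says Phillip Price: click here. The example demonstrates the most basic Bayesian hierarchical model.
This article is reposted from: Capital of Statistics (统计之都)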

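PCA: a Matlab implementation

The PCA procedure is as follows:

1. Remove the mean; 2. compute the covariance matrix; 3. compute the eigenvalues and eigenvectors of the covariance matrix; 4. sort the eigenvalues in descending order, keep the largest ones, and select the corresponding eigenvectors.

The Matlab code below follows these steps.

1. Remove the mean

This uses the Matlab function mean: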

Mean_Image=mean(Train_SET,2);                      % mean of each row = the mean sample
Train_SET=Train_SET-Mean_Image*ones(1,Train_NUM);  % subtract it from every column
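Here is the same step in C++. This sketch assumes the data are stored one sample per column (`X[feature][sample]`), like Train_SET. Subtracting each row's mean from that row is exactly subtracting the mean sample from every column. `Matrix` and `removeMean` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Data matrix stored as X[feature][sample], i.e. one sample per column.
using Matrix = std::vector<std::vector<double>>;

// Subtract the mean sample (the vector of row means) from every column; returns that mean.
std::vector<double> removeMean(Matrix& X) {
    std::vector<double> mean(X.size(), 0.0);
    for (std::size_t r = 0; r < X.size(); ++r) {
        for (double v : X[r]) mean[r] += v;
        mean[r] /= static_cast<double>(X[r].size());
        for (double& v : X[r]) v -= mean[r];
    }
    return mean;
}
```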

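2. Compute the covariance matrix


Definition of covariance: $\mathrm{cov}(X,Y) = E[(X - E[X])(Y - E[Y])]$

Sample estimate: $\mathrm{cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$

*Note that the denominator is (n-1) rather than n, because the sample covariance (and variance) defined this way is an unbiased estimator of the population covariance (variance) (for details see:

http://en.wikipedia.org/wiki/Unbiased_estimator#Sample_variance
http://www.zhihu.com/question/20099757 )

The covariance matrix is:

$C = \begin{pmatrix} \mathrm{cov}(1,1) & \cdots & \mathrm{cov}(1,d) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(d,1) & \cdots & \mathrm{cov}(d,d) \end{pmatrix}$

Its element $a_{ij}$ is the covariance cov(i,j) between variables i and j.

(For the meaning of the covariance matrix, see the blog post: http://blog.csdn.net/ice110956/article/details/14250745 )

Ways to compute it:

A. Sum of outer products:

$C = \frac{1}{n-1}\sum_{j=1}^{n} X_j X_j^{T}$

where $X_j$ is the j-th feature vector (sample) after centering.

After centering, each term $X_j X_j^{T}$ is a d×d matrix whose (p, q) entry is $x_{pj}x_{qj}$.

Summing these terms (and dividing by n-1) gives the covariance matrix.

B. Matrix product:

Likewise, the sum of outer products above can be obtained directly as a matrix product. Let X be the centered feature matrix (one sample per column); then

$C = \frac{1}{n-1} X X^{T}$

C. Built-in cov():

Use Matlab's built-in covariance function cov() directly. Note that cov treats each row as an observation and each column as a variable, so transpose the data first when samples are stored as columns.

We use the matrix form, which gives the following code: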

R=Train_SET*Train_SET'/(Train_NUM-1);
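For reference, here is the matrix form C = X Xᵀ/(n − 1) written out in C++. The sketch assumes X is already centered and stored one sample per column. Each entry is computed as a sum of products over the samples, which is also the outer-product sum of method A. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // X[feature][sample]

// Covariance matrix R = X * X' / (n - 1) for a centered d x n data matrix X.
Matrix covariance(const Matrix& X) {
    std::size_t d = X.size(), n = X[0].size();
    Matrix R(d, std::vector<double>(d, 0.0));
    for (std::size_t i = 0; i < d; ++i)
        for (std::size_t j = 0; j < d; ++j) {
            for (std::size_t k = 0; k < n; ++k) R[i][j] += X[i][k] * X[j][k];
            R[i][j] /= static_cast<double>(n - 1);  // unbiased (n - 1) denominator
        }
    return R;
}
```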

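3. Compute the eigenvalues and eigenvectors

By the principle of PCA, we look for the transformation matrix that diagonalizes the covariance matrix

(see the blog post: http://blog.csdn.net/ice110956/article/details/14250745 ).

A diagonalizable square matrix A can be written as

$A = Q \Lambda Q^{-1}$

where Q is the matrix of its eigenvectors and $\Lambda$ is the diagonal matrix of its eigenvalues. Rearranging gives $Q^{-1} A Q = \Lambda$. For a symmetric matrix such as the covariance matrix, Q can be chosen orthogonal, so $Q^{T} A Q = \Lambda$.

So all we need are the normalized eigenvectors of the covariance matrix, which form the transformation matrix Q.

Use Matlab's built-in function eig():

Code: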

[V,S]=eig(R);
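**The small-sample-size problem:

The code above suffers from the common small-sample-size problem. When the sample dimension is much larger than the number of samples, the covariance matrix is very large and decomposing it directly is too expensive. So we solve it another way.

SVD (singular value decomposition):

A. Singular values

Let A be an m×n real matrix. Then there exist an m×m orthogonal matrix U and an n×n orthogonal matrix V such that

$A = U S V^{T}$

where S has diagonal entries σ1, σ2, …, σr (σi > 0, i = 1, …, r), r = rank(A), and zeros elsewhere.

Here $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ are the nonzero eigenvalues of $A^{T}A$.

For any matrix A, its singular values are the square roots of the nonzero eigenvalues of AA' or A'A (the two share the same nonzero eigenvalues), and these eigenvalues are all positive.

U is the matrix of unit eigenvectors of AA'.

V is the matrix of unit eigenvectors of A'A.

B. How singular values relate to eigenvalues

Singular values have properties similar to eigenvalues. For a Hermitian (conjugate-symmetric) positive semidefinite matrix, such as a covariance matrix, the singular values equal the eigenvalues. For a general Hermitian matrix they are the absolute values of the eigenvalues, and for other matrices the two generally differ.

If a matrix is viewed as a linear transformation, an eigenvalue measures the "energy" along its eigenvector direction. From the definition, singular values have a similar property.

C. How SVD relates to PCA

From $A = U S V^{T}$ we get $AV = US$, i.e. $u_i = A v_i / \sigma_i$ for each nonzero singular value.

That is, given A and V, we can compute U.

As above, if AA' is too large for the computer to find its eigenvectors U easily, we can instead compute the eigenvectors V of A'*A.

In the small-sample case of PCA, the sample dimension M is much larger than the number of samples N, so the covariance matrix X*X' is M×M and hard to decompose. Following the SVD relationship, we can instead solve for the eigenvectors of X'*X (N×N) and then transform them, which achieves the same result.

(For more on SVD, see:

http://www.cnblogs.com/LeftNotEasy/archive/2011/01/19/svd-and-applications.html

http://szshdy.blog.163.com/blog/static/1322012512010511156587/ )

Using the SVD relationship, the eigenvectors of the covariance matrix are obtained with the following code: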

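R=Train_SET'*Train_SET/(Train_NUM-1);     % N x N instead of M x M
[V,S]=Find_K_Max_Eigen(R,Eigen_NUM);      % top eigenpairs of X'X/(N-1)
disc_value=S;
disc_set=zeros(NN,Eigen_NUM);             % NN: sample dimension
Train_SET=Train_SET/sqrt(Train_NUM-1);
for k=1:Eigen_NUM
    % u = X*v/sqrt((N-1)*lambda) is a unit eigenvector of the covariance X*X'/(N-1)
    disc_set(:,k)=(1/sqrt(disc_value(k)))*Train_SET*V(:,k);
end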
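The small-sample trick can be checked numerically with a C++ sketch. The sketch:

- builds the small n×n matrix XᵀX/(n−1);
- finds its leading eigenpair (λ, v) by power iteration, which stands in for eig only to keep the example dependency-free;
- maps back with u = Xv / √((n−1)λ), the same scaling as disc_set in the Matlab code.

The result should be a unit eigenvector of the d×d covariance XXᵀ/(n−1) with the same eigenvalue λ. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Matrix = std::vector<Vec>;  // X[feature][sample], centered

Vec multiply(const Matrix& A, const Vec& v) {
    Vec out(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t k = 0; k < v.size(); ++k) out[i] += A[i][k] * v[k];
    return out;
}

double vecNorm(const Vec& v) {
    double s = 0.0;
    for (double x : v) s += x * x;
    return std::sqrt(s);
}

// Leading eigenpair of a symmetric positive semidefinite matrix by power iteration.
// Assumes the start vector is not orthogonal to the leading eigenvector and that A v != 0.
double powerIteration(const Matrix& A, Vec& v, int iters = 200) {
    v.resize(A.size());
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = 1.0 + static_cast<double>(i);
    double nv = vecNorm(v);
    for (double& x : v) x /= nv;     // start from a unit vector
    double lambda = 0.0;
    for (int t = 0; t < iters; ++t) {
        Vec w = multiply(A, v);
        lambda = vecNorm(w);          // ||A v|| for unit v -> largest eigenvalue
        for (std::size_t i = 0; i < w.size(); ++i) v[i] = w[i] / lambda;
    }
    return lambda;
}

// Eigen-decompose the n x n matrix X'X/(n-1) instead of the d x d covariance,
// then map back: u = X v / sqrt((n-1) * lambda) has unit length.
Vec leadingPrincipalAxis(const Matrix& X, double& lambda) {
    std::size_t d = X.size(), n = X[0].size();
    Matrix G(n, Vec(n, 0.0));
    for (std::size_t a = 0; a < n; ++a)
        for (std::size_t b = 0; b < n; ++b) {
            for (std::size_t k = 0; k < d; ++k) G[a][b] += X[k][a] * X[k][b];
            G[a][b] /= static_cast<double>(n - 1);
        }
    Vec v;
    lambda = powerIteration(G, v);
    Vec u = multiply(X, v);           // d-dimensional
    double scale = std::sqrt(static_cast<double>(n - 1) * lambda);
    for (double& x : u) x /= scale;
    return u;
}
```

Power iteration assumes a clear gap between the two largest eigenvalues. For several components, a proper eigensolver, as used in the Matlab code, is the better choice.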

4. Complete code

Finally, combining the pieces above gives the complete PCA code:

function [disc_set,disc_value,Mean_Image]=Eigenface_f(Train_SET,Eigen_NUM)
% Train_SET: NN x Train_NUM data matrix, one sample per column
[NN,Train_NUM]=size(Train_SET);
if NN<=Train_NUM
    Mean_Image=mean(Train_SET,2);
    Train_SET=Train_SET-Mean_Image*ones(1,Train_NUM);
    R=Train_SET*Train_SET'/(Train_NUM-1);
    [V,S]=Find_K_Max_Eigen(R,Eigen_NUM);
    disc_value=S;
    disc_set=V;
else % small-sample problem: use the SVD relationship
    Mean_Image=mean(Train_SET,2);
    Train_SET=Train_SET-Mean_Image*ones(1,Train_NUM);
    R=Train_SET'*Train_SET/(Train_NUM-1);
    [V,S]=Find_K_Max_Eigen(R,Eigen_NUM);
    disc_value=S;
    disc_set=zeros(NN,Eigen_NUM);
    Train_SET=Train_SET/sqrt(Train_NUM-1);
    for k=1:Eigen_NUM
        disc_set(:,k)=(1/sqrt(disc_value(k)))*Train_SET*V(:,k);
    end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Eigen_Vector,Eigen_Value]=Find_K_Max_Eigen(Matrix,Eigen_NUM)
% return the Eigen_NUM eigenvectors with the largest eigenvalues, in descending order
[NN,NN]=size(Matrix);
[V,S]=eig(Matrix);      % columns of V are eigenvectors, S is diagonal
S=diag(S);
[S,index]=sort(S);      % ascending order
Eigen_Vector=zeros(NN,Eigen_NUM);
Eigen_Value=zeros(1,Eigen_NUM);
p=NN;
for t=1:Eigen_NUM
    Eigen_Vector(:,t)=V(:,index(p));
    Eigen_Value(t)=S(p);
    p=p-1;
end
This article is reposted from: CSDN blog

Mixed Matlab and C++ programming (with OpenCV)


zouxy09@qq.com

http://blog.csdn.net/zouxy09

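       When running code from other people's papers, I often came across mixed Matlab/C++ programming. In essence, Matlab's mex tool compiles C++ code into executables and function interfaces that Matlab can call. On the one hand, this lets Matlab use functions that have already been written, even if they are written in C++: code knows no borders, and no single language has to rule them all. On the other hand, Matlab can borrow C++'s strengths to cover its own weaknesses. Matlab excels at matrix operations, but its loops are less efficient than C++, for example when creating a Hilbert matrix. So computations with large loops can be handed off to C++.

      Seeing its appeal, I had long wanted to learn it, but the timing was never right, until yesterday. I needed the code released with a paper, but it was written in C++, depended on the OpenCV library, and targeted Linux. That is very different from the platform my lab set for me. We all have to work on Windows + Matlab so that everyone's work in the group stays consistent. So there was no way around it: I had to compile the original author's code into an executable that Matlab supports.

I. Basics

      When compiling C/C++ code with MATLAB, we need to modify the C/C++ code and add a function interface that Matlab supports, so that Matlab can call it. Then we compile it with Matlab's mex tool. The example below walks through these two steps.

      Suppose we have a very simple C++ program that adds two doubles: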

mexAdd.cpp

double add(double x, double y)
{
    return x + y;
}

1. Modify the code file

1) Add the header file mex.h

      Add the following header at the top of the C++ file:

#include "mex.h"

2) Add the interface function mexFunction()

      mexFunction is defined as:

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
}

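       First, this function has no return value. It does not pass the C++ results back to Matlab through a return value but by assigning to the parameter plhs. For example, in Matlab we would normally call this add function like this: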

        >> a = 0.5; b = 0.8;

        >> c = add(a, b);

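        So how does mexFunction pass the inputs a and b to the C++ add function, and how does it return the result to c? All of this heavy lifting is done through the four parameters of mexFunction:

         nlhs: the number of left-hand-side parameters, i.e. the number of variables on the left of the Matlab call. This is the number of return values to pass back to Matlab. In c = add(a, b) there is only one return value, c, so nlhs is 1;

         plhs: pointers to the left-hand-side parameters, i.e. to the function's outputs. It is an array of pointers: each element points to an output of type mxArray. In c = add(a, b) there is only one output, c, so the array holds one pointer, and the result that plhs[0] points to is assigned to c.

         nrhs: the number of right-hand-side parameters, i.e. the number of arguments on the right of the Matlab call. c = add(a, b) passes two arguments, a and b, to the C++ code, so nrhs is 2;

         prhs: pointers to the right-hand-side parameters, analogous to plhs. Because there are two arguments on the right, the array holds two pointers: prhs[0] points to a and prhs[1] points to b. Note that prhs points to const mxArray data, so the inputs it points to must not be modified.

       The basic unit in Matlab is the array, whatever its type (double array, cell array, struct array, ...). So a, b and c are all arrays, and b = 1.1 is a 1×1 double array. In C, a Matlab array is represented by the mxArray type. That is why plhs and prhs are arrays of pointers to mxArray (reference [1]).

       So how do we write the body of mexFunction, and how does this interface function connect Matlab's arguments with the corresponding parameters in the C++ code? Let's first show the complete code.

       The final mexAdd.cpp looks like this: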

mexAdd.cpp

#include "mex.h"

double add(double x, double y)
{
    return x + y;
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double *a;
    double b, c;
    plhs[0] = mxCreateDoubleMatrix(1, 1, mxREAL);  // 1x1 real output matrix
    a = mxGetPr(plhs[0]);                         // a points at the output element
    b = *(mxGetPr(prhs[0]));                      // first input
    c = *(mxGetPr(prhs[1]));                      // second input
    *a = add(b, c);
}

      What does the body of mexFunction mean? Suppose we call the function in Matlab like this:

      >> output = add(0.5, 0.8);

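      Before any computation is done, output has no value. So in the code we create a 1×1 real double matrix with mxCreateDoubleMatrix, which returns a pointer to the newly created mxArray, and make plhs[0] point to it. We then make the pointer a point to the first element of the mxArray that plhs[0] points to (mxGetPr returns a pointer to the first element of an mxArray). Likewise, we take the elements that prhs[0] and prhs[1] point to (0.5 and 0.8) and assign them to b and c. We can then pass b and c to add and store the result in the element that a points to. Because a points to the element of the mxArray that plhs[0] points to, on return that mxArray is assigned to output, and output holds the computed result.

       In practice mexFunction is rarely this simple. We should check the number and types of the user's inputs to make sure they are valid; in the add example, passing a char array would be an error.

       To summarize, a MEX file simply implements an interface that hands results computed in C back to Matlab. If we already have a large program written in C, there is no need to rewrite it in Matlab: we just write an interface and build a MEX file. In addition, computational bottlenecks in a Matlab program (such as loops) can be implemented in C through MEX files to speed them up (reference [1]).

2. Compile the modified C++ file

       After modifying the file, we compile it into an executable that Matlab supports, using Matlab's built-in mex tool. Before compiling, we need to configure the tool and tell it which compiler to use for our C/C++ code. In Matlab, run:

       >> mex -setup

       You are then asked to choose a default compiler. In my case the choices were Matlab's bundled Lcc or the Microsoft Visual C++ 2010 installed on my computer; the latter is the usual choice. Once this is configured, you can compile. There are several ways to call it: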

>> mex XXX.cpp

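>> mex X1.cpp X2.cpp X3.cpp % several dependent cpp files; the generated MEX file is named after X1

>> mex -O X1.cpp  % uppercase O option: optimized build

>> mex -largeArrayDims X1.cpp % on 64-bit systems, select the API for large arrays. The Matlab/C++ interface is based on 32-bit array sizes by default, which keeps C and C++ code from being used for speed on large data. 64-bit systems usually have more resources than 32-bit ones, so this option lets the interface handle large data comfortably.

       There are other compile options similar to gcc's: -I adds an include directory, -L adds a library search directory, -l links an additional library, and so on.

       For our program it is simple: enter mex mexAdd.cpp in the MATLAB command window and it compiles. On success, a file with the same name appears in the same folder, with the extension mexw32 (32-bit systems) or mexw64 (64-bit systems), e.g. mexAdd.mexw32. You can then call it directly in Matlab: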

       >> ans = mexAdd(0.5, 0.8);

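II. Advanced

       Above we handled scalars: a, b and c. In this section we handle 2-D arrays, i.e. images. As a test, we naively implement the following:

        >> [grayImage] = RGB2Gray('imageFile.jpeg');

       That is, we pass an image file name to the C++ code, which reads the image, converts it to grayscale and returns it to Matlab. Reading the image and converting it to gray are done with OpenCV library functions. Silly, isn't it? Matlab already has functions that do the same thing. Right, it is redundant; the point is only to show how 2-D arrays are passed. If you needed, say, the optical flow between two images, though, Matlab might genuinely need OpenCV's help.

       Also, because the cpp file has to link against OpenCV, I wrote a make.m file to standardize the build. It plays the role of a Makefile and implements the rules mex uses to compile this project; see the code below for what it does.

       First, the RGB2Gray.cpp code: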

// Interface: convert an image to gray and return it to Matlab
// Author : zouxy
// Date   : 2014-03-05
// HomePage : http://blog.csdn.net/zouxy09
// Email  : zouxy09@qq.com
#include "opencv2/opencv.hpp"
#include "mex.h"
using namespace cv;

/*******************************************************
Usage:  [imageMatrix] = RGB2Gray('imageFile.jpeg');
Input:  an image file name
Output: the gray image as a matrix that Matlab can read
*******************************************************/
void exit_with_help()
{
    mexPrintf("Usage: [imageMatrix] = RGB2Gray('imageFile.jpg');\n");
}

static void fake_answer(mxArray *plhs[])
{
    // return an empty matrix so the caller still receives an output
    plhs[0] = mxCreateDoubleMatrix(0, 0, mxREAL);
}

void RGB2Gray(const char *filename, mxArray *plhs[])
{
    // read the image (OpenCV loads color images in BGR channel order)
    Mat image = imread(filename);
    if (image.empty()) {
        mexPrintf("can't open input file %s\n", filename);
        fake_answer(plhs);
        return;
    }

    // convert it to gray format
    Mat gray;
    if (image.channels() == 3)
        cvtColor(image, gray, CV_BGR2GRAY);
    else
        image.copyTo(gray);

    // copy into a Matlab double matrix, which is stored column-major
    int rows = gray.rows;
    int cols = gray.cols;
    plhs[0] = mxCreateDoubleMatrix(rows, cols, mxREAL);
    double *imgMat = mxGetPr(plhs[0]);
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            *(imgMat + i + j * rows) = (double)gray.at<uchar>(i, j);
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // expect exactly one string argument: the image file name
    if (nrhs != 1 || !mxIsChar(prhs[0])) {
        exit_with_help();
        fake_answer(plhs);
        return;
    }
    char *filename = mxArrayToString(prhs[0]);  // allocated by Matlab, any length
    if (filename == NULL) {
        mexPrintf("Error: could not read the file name\n");
        exit_with_help();
        fake_answer(plhs);
        return;
    }
    RGB2Gray(filename, plhs);
    mxFree(filename);
}

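       Compared with the previous example, a few things have been added. First, the incoming arguments are checked for errors from Matlab, with some error handling. Second, there is help information. Third comes the main implementation function. The OpenCV image reading and gray conversion are just two function calls, so I won't go into them. The key is how to pass an image, a 2-D array, back to Matlab through mexFunction's parameters. In fact, we only need to be clear about one thing:

        plhs[0] = mxCreateDoubleMatrix(2, 3, mxREAL);

        The matrix this call creates, pointed to by plhs[0], is stored column by column (column-major). If imgMat points to its data, then *(imgMat+1) is element [1, 0], *(imgMat+2) is element [0, 1], and *(imgMat+4) is element [0, 2]. The code above follows this scheme, assigning the pixel values of gray to the corresponding positions of plhs[0]. A direct memory copy might also be possible, but it involves pointer manipulation and the lifetime of the local variable gray. Moreover, OpenCV stores 8-bit pixels row by row while the Matlab matrix holds doubles column by column, so I used the clumsy but safe element-by-element copy above.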
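The column-major rule can be tried without Matlab or OpenCV. This C++ sketch assumes the gray image is a continuous row-major 8-bit buffer, as in a continuous CV_8UC1 Mat. It copies that buffer into a column-major double buffer like the one mxGetPr returns. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear offset of element (i, j) in a rows x cols column-major matrix,
// the layout of the data behind mxGetPr.
std::size_t colMajorIndex(std::size_t i, std::size_t j, std::size_t rows) {
    return i + j * rows;
}

// Copy a row-major 8-bit buffer into a column-major double buffer.
std::vector<double> toColumnMajor(const std::vector<unsigned char>& rowMajor,
                                  std::size_t rows, std::size_t cols) {
    std::vector<double> out(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[colMajorIndex(i, j, rows)] = rowMajor[i * cols + j];
    return out;
}
```

The three offsets quoted above for a 2×3 matrix (1, 2 and 4) are exactly what `colMajorIndex` returns.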

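       Next is make.m. It checks whether your system is 32- or 64-bit to choose the compile options, then adds the OpenCV settings; if you use it, change these to your own OpenCV directories. It then lists the files to compile. Finally, it walks through that list, adds the compile options and compiles every listed file with mex.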

%// This make.m is for MATLAB
%// Function: compile c++ files which rely on OpenCV for Matlab using mex
%// Author : zouxy
%// Date   : 2014-03-05
%// HomePage : http://blog.csdn.net/zouxy09
%// Email  : zouxy09@qq.com

%% Please modify your path of OpenCV
%% If you have any question, please contact Zou Xiaoyi

% Notice: first use "mex -setup" to choose your c/c++ compiler
clear all;

%-------------------------------------------------------------------
%% get the architecture of this computer
is_64bit = strcmp(computer,'MACI64') || strcmp(computer,'GLNXA64') || strcmp(computer,'PCWIN64');

%-------------------------------------------------------------------
%% the configuration of compiler
% You need to modify this configuration according to your own path of OpenCV
% Notice: if your system is 64bit, your OpenCV must be 64bit!
out_dir='./';
CPPFLAGS = ' -O -DNDEBUG -I.\ -ID:\OpenCV_64\include'; % your OpenCV "include" path
LDFLAGS = ' -LD:\OpenCV_64\lib';                       % your OpenCV "lib" path
LIBS = ' -lopencv_core240 -lopencv_highgui240 -lopencv_video240 -lopencv_imgproc240';
if is_64bit
    CPPFLAGS = [CPPFLAGS ' -largeArrayDims'];
end

%% add your files here!
compile_files = {
    % the list of your code files which need to be compiled
    'RGB2Gray.cpp'
};

%-------------------------------------------------------------------
%% compiling...
for k = 1 : length(compile_files)
    str = compile_files{k};
    fprintf('compilation of: %s\n', str);
    str = [str ' -outdir ' out_dir CPPFLAGS LDFLAGS LIBS];
    args = regexp(str, '\s+', 'split');
    mex(args{:});
end

fprintf('Congratulations, compilation successful!!!\n');

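III. Usage and results

1. Compiling

      Run make.m directly in Matlab to generate RGB2Gray.mexw64. Then run in Matlab:

      >> img = RGB2Gray('d:\test.jpg');

      >> imshow(uint8(img));

      This displays the converted result.

Note: all the Matlab commands above assume the current directory is the one containing your cpp files.

IV. References

[1] How to write a mexFunction

[2] Compiling cpp files with mex in Matlab

This article is reposted from: zouxy09's column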

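Learning the Matlab Neural Network Toolbox, Part 2 – zhyhooo

Crab classification

The goal of this example is to determine a crab's sex, male or female, from attributes such as its species and the length and width of its carapace.

The training data have six attributes:

species, frontallip, rearwidth, length, width and depth.

Which physical trait of the crab each attribute measures does not matter here. All we care about is that each sample is a 6-dimensional vector and the output is a 0/1 label (encoded as the two rows of t),

and we want the network model that classifies best.

First, load the sample data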

[x,t] = crab_dataset;
% size(x) = [6, 200];
% size(t) = [2, 200];

Next, initialize the neural network.

The example uses a model with one hidden layer of 10 neurons, created with MATLAB's patternnet() function.

net = patternnet(10);
view(net)

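patternnet() takes three arguments: (hiddenSizes, trainFcn, performFcn). hiddenSizes defaults to 10 and can be a row vector to specify several hidden layers. trainFcn defaults to 'trainscg', and the performance function defaults to 'crossentropy'. For two hidden layers of 10 neurons each, write: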

net = patternnet([10,10]);

Next, train the network. Pass the network model, the input samples and the targets to train(). The toolbox automatically divides the data into training, validation and test sets.

[net,tr] = train(net,x,t);
nntraintool
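The network's division function controls how train() splits the samples. In recent toolbox versions patternnet defaults to dividerand with a 70%/15%/15% training/validation/test split, and tr.trainInd, tr.valInd and tr.testInd record the result. The sketch below imitates such a random split in core MATLAB so that the index sets can be inspected. The ratios are assumed to match those defaults, and this is not the toolbox's own implementation.

```matlab
% Sketch of a random train/validation/test split (ratios assumed 70/15/15).
Q = 200;                                  % number of samples, e.g. size(x,2) for crab_dataset
ratios = [0.70 0.15 0.15];                % train / validation / test
perm = randperm(Q);                       % random order of sample indices
nTrain = round(ratios(1) * Q);
nVal   = round(ratios(2) * Q);
trainInd = perm(1:nTrain);
valInd   = perm(nTrain+1 : nTrain+nVal);
testInd  = perm(nTrain+nVal+1 : end);     % remaining samples form the test set
fprintf('train %d, val %d, test %d\n', numel(trainInd), numel(valInd), numel(testInd));
```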

To view the training progress, click Performance in the training window or call plotperform:

plotperform(tr)

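Testing the classifier

testX = x(:,tr.testInd);
testT = t(:,tr.testInd);
testY = net(testX);
testIndices = vec2ind(testY);

plotconfusion(testT,testY);

The figure below shows the classifier's performance. Green cells mark results that agree with the test labels, and red cells mark results that disagree. The smaller the percentages in the red cells, the smaller the error and the better the classification. If the error is too large, add more training samples and retrain, or increase the number of hidden-layer neurons.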
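The counts behind a confusion plot can also be computed directly, which helps when the plot is not available. The sketch below uses made-up testT/testY values, not results from the crab data. It turns each column into a class index with max, the same idea as vec2ind, and tallies output/target pairs with accumarray. Rows are output classes and columns are target classes, following plotconfusion's layout.

```matlab
% Made-up example: 2 classes, 6 test samples, one column per sample.
testT = [1 1 0 0 1 0;                 % one-hot targets
         0 0 1 1 0 1];
testY = [0.9 0.4 0.2 0.3 0.8 0.6;     % network outputs (scores)
         0.1 0.6 0.8 0.7 0.2 0.4];
[~, targetIdx] = max(testT, [], 1);   % true class of each sample
[~, outIdx]    = max(testY, [], 1);   % predicted class of each sample
nClass = size(testT, 1);
% C(i,j) = number of samples predicted as class i whose target is class j
C = accumarray([outIdx(:) targetIdx(:)], 1, [nClass nClass]);
accuracy = sum(diag(C)) / sum(C(:));
disp(C);
fprintf('accuracy = %.2f%%\n', 100 * accuracy);
```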

Reposted from: cnblogs (博客园), all posts

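BP Neural Networks in MATLAB: From-Scratch Code and Toolbox Implementations

A recent project of mine needs a BP neural network for some flight prediction, so today I borrowed 30 Case Studies of MATLAB Neural Networks (《Matlab神经网络30个案例分析》) from the library. It is a very good book and I recommend it. After studying its code, I implemented the speech-classification example twice: once from scratch, with a few small changes, and once with the toolbox. The toolbox is very convenient, but hand-writing the forward and backward passes of a BP network helps a great deal in understanding how it works. Both implementations are posted below, together with screenshots of the results. BP networks clearly work very well for this kind of nonlinear fitting problem.

(1) From-scratch implementation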

%% Clear environment variables
clc
clear
%% Extract training/prediction data and normalize
% Load the four classes of speech signals
load data1 c1
load data2 c2
load data3 c3
load data4 c4
% Merge the four feature-signal matrices into one matrix
data(1:500,:)     = c1(1:500,:);
data(501:1000,:)  = c2(1:500,:);
data(1001:1500,:) = c3(1:500,:);
data(1501:2000,:) = c4(1:500,:);
% Random ordering of the indices 1..2000
k = rands(1,2000);
[m,n] = sort(k);
%% Extract input and output data
input   = data(:,2:25);
output1 = data(:,1);
% Convert the output from one dimension (class 1-4) to four (one-hot)
output = zeros(2000,4);
for i = 1:2000
    switch output1(i)
        case 1
            output(i,:) = [1 0 0 0];
        case 2
            output(i,:) = [0 1 0 0];
        case 3
            output(i,:) = [0 0 1 0];
        case 4
            output(i,:) = [0 0 0 1];
    end
end
% Randomly take 1500 samples for training and 500 for prediction
input_train  = input(n(1:1500),:)';
output_train = output(n(1:1500),:)';
input_test   = input(n(1501:2000),:)';
output_test  = output(n(1501:2000),:)';
% Normalize the training inputs
[inputn,inputps] = mapminmax(input_train);
%% Initialize sizes, weights and biases
innum  = 24;
midnum = 25;
outnum = 4;
w1 = rands(midnum,innum);
b1 = rands(midnum,1);
w2 = rands(outnum,midnum);
b2 = rands(outnum,1);
w1_1 = w1;
b1_1 = b1;
w2_1 = w2;
b2_1 = b2;
xite = 0.1;   % learning rate
%% Network training
for ii = 1:10
    E(ii) = 0;
    for i = 1:1500
        x = inputn(:,i);
        % Hidden-layer output
        for j = 1:midnum
            I(j) = inputn(:,i)'*w1(j,:)' + b1(j);
            Iout(j) = 1/(1+exp(-I(j)));
        end
        % Output-layer value
        yn = w2*Iout' + b2;
        % Error
        e = output_train(:,i) - yn;
        E(ii) = E(ii) + sum(abs(e));
        % Weight changes
        dw2 = e*Iout;
        db2 = e;
        for j = 1:midnum
            S = 1/(1+exp(-I(j)));
            FI(j) = S*(1-S);
        end
        for k = 1:innum
            for j = 1:midnum
                dw1(j,k) = FI(j)*x(k)*(w2(:,j)'*e);
                db1(j) = FI(j)*(w2(:,j)'*e);
            end
        end
        % Update weights
        w1 = w1_1 + xite*dw1;
        b1 = b1_1 + xite*db1';
        w2 = w2_1 + xite*dw2;
        b2 = b2_1 + xite*db2;
        w1_1 = w1;
        b1_1 = b1;
        w2_1 = w2;
        b2_1 = b2;
    end
end
%% Classify the speech signals
inputn_test = mapminmax('apply',input_test,inputps);
for i = 1:500
    for j = 1:midnum
        I(j) = inputn_test(:,i)'*w1(j,:)' + b1(j);
        Iout(j) = 1/(1+exp(-I(j)));
    end
    fore(:,i) = w2*Iout' + b2;
end
%% Classification error
for i = 1:500
    [~, output_fore(i)] = max(fore(:,i));   % predicted class = largest output
end
error = output_fore - output1(n(1501:2000))';
% Plot predicted vs. actual speech class
figure(1)
plot(output_fore,'r')
hold on
plot(output1(n(1501:2000))','b')
legend('Predicted class','Actual class')
% Plot the error
figure(2)
plot(error)
title('BP network classification error','fontsize',12)
xlabel('Speech signal','fontsize',12)
ylabel('Classification error','fontsize',12)
%print -dtiff -r600 1-4
% Count the misclassified samples in each (true) class
k = zeros(1,4);
for i = 1:500
    if error(i) ~= 0
        [b,c] = max(output_test(:,i));
        k(c) = k(c) + 1;
    end
end
% Count the samples in each class
kk = zeros(1,4);
for i = 1:500
    [b,c] = max(output_test(:,i));
    kk(c) = kk(c) + 1;
end
% Fraction correctly classified in each class
ratio = (kk-k)./kk
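For a single sample, the nested j/k loops in the training code compute the usual backpropagation terms for a sigmoid hidden layer and a linear output layer. The same step can be written in matrix form. The sketch below does this and compares the result with the loop version. The sizes are arbitrary assumptions, and the variable names mirror the code above.

```matlab
% Arbitrary small sizes for the check (the article uses 24-25-4).
innum = 5; midnum = 4; outnum = 3; xite = 0.1;
w1 = rand(midnum, innum) - 0.5;   b1 = rand(midnum, 1) - 0.5;
w2 = rand(outnum, midnum) - 0.5;  b2 = rand(outnum, 1) - 0.5;
x  = rand(innum, 1);              t  = [1; 0; 0];   % one sample and its one-hot target
sigmoid = @(z) 1 ./ (1 + exp(-z));

% --- matrix form of one training step ---
Iout   = sigmoid(w1 * x + b1);     % hidden-layer output (midnum x 1)
yn     = w2 * Iout + b2;           % linear output layer
e      = t - yn;                   % output error
FI     = Iout .* (1 - Iout);       % sigmoid derivative
delta1 = FI .* (w2' * e);          % error propagated back to the hidden layer
dw2 = e * Iout';    db2 = e;
dw1 = delta1 * x';  db1 = delta1;

% --- loop form, as in the article ---
dw1_loop = zeros(midnum, innum);  db1_loop = zeros(midnum, 1);
for j = 1:midnum
    for k = 1:innum
        dw1_loop(j,k) = FI(j) * x(k) * (w2(:,j)' * e);
    end
    db1_loop(j) = FI(j) * (w2(:,j)' * e);
end
fprintf('max difference: %g\n', max(abs(dw1(:) - dw1_loop(:))));

% Weight update
w1 = w1 + xite * dw1;  b1 = b1 + xite * db1;
w2 = w2 + xite * dw2;  b2 = b2 + xite * db2;
```

The error is e = t - yn and the weights are updated with a plus sign, so this step is gradient descent on 0.5*sum(e.^2).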

 

(2) Toolbox implementation

% Clear environment variables
clc
clear
% Load the input/output data
load data1 c1
load data2 c2
load data3 c3
load data4 c4
data(1:500,:)     = c1(1:500,:);
data(501:1000,:)  = c2(1:500,:);
data(1001:1500,:) = c3(1:500,:);
data(1501:2000,:) = c4(1:500,:);
input   = data(:,2:25);
output1 = data(:,1);
% Random ordering of the 2000 samples
k = rands(1,2000);
[m,n] = sort(k);
% 1500 training samples, 500 test samples
input_train  = input(n(1:1500),:)';
output_train = output1(n(1:1500),:)';
input_test   = input(n(1501:2000),:)';
output_test  = output1(n(1501:2000),:)';
% Normalize inputs and outputs
[inputn,inputps]   = mapminmax(input_train);
[outputn,outputps] = mapminmax(output_train);
% Build the BP network (25 hidden neurons)
net = newff(inputn,outputn,25);
% Training parameters
net.trainParam.epochs = 100;
net.trainParam.lr = 0.1;
net.trainParam.goal = 0.00004;
% Train the BP network
net = train(net,inputn,outputn);
% Normalize the prediction inputs
inputn_test = mapminmax('apply',input_test,inputps);
% Network prediction
an = sim(net,inputn_test);
% De-normalize the output
BPoutput = mapminmax('reverse',an,outputps);
% Plot predicted vs. expected output
figure(1)
plot(BPoutput,':og');
hold on
plot(output_test,'-*');
legend('Predicted output','Expected output')
title('BP network prediction output','fontsize',12)
ylabel('Function output','fontsize',12)
xlabel('Sample','fontsize',12)
% Prediction error
error = BPoutput - output_test;
figure(2)
plot(error,'-*')
title('BP network prediction error','fontsize',12)
ylabel('Error','fontsize',12)
xlabel('Sample','fontsize',12)
figure(3)
plot((output_test-BPoutput)./BPoutput,'-*');
title('Neural network prediction error percentage')
errorsum = sum(abs(error))
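By default mapminmax maps each row of its input linearly onto [-1, 1]. The returned settings are reused by 'apply', which scales new inputs, and by 'reverse', which undoes the scaling of the network output. The core-MATLAB sketch below reproduces that row-wise formula, y = (ymax-ymin)*(x-xmin)/(xmax-xmin) + ymin. The struct ps and its fields are hypothetical stand-ins, not the toolbox's settings structure, and constant rows (xmax = xmin) are not handled.

```matlab
% Row-wise min-max scaling to [ymin, ymax], as used for the BP inputs/outputs.
X = [1 2 3 4;                      % made-up data: 2 variables (rows) x 4 samples
     10 20 30 50];
ps.ymin = -1;  ps.ymax = 1;        % target range (mapminmax default is [-1, 1])
ps.xmin = min(X, [], 2);           % per-row minimum, kept for later use
ps.xmax = max(X, [], 2);           % per-row maximum
scale = @(A, p) bsxfun(@plus, (p.ymax - p.ymin) * bsxfun(@rdivide, ...
            bsxfun(@minus, A, p.xmin), p.xmax - p.xmin), p.ymin);
unscale = @(Y, p) bsxfun(@plus, bsxfun(@times, (Y - p.ymin) / (p.ymax - p.ymin), ...
            p.xmax - p.xmin), p.xmin);
Xn    = scale(X, ps);              % like mapminmax(X)
Xnew  = scale([2.5; 40], ps);      % like mapminmax('apply', newX, ps)
Xback = unscale(Xn, ps);           % like mapminmax('reverse', Xn, ps)
disp(Xn);
```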

 

Reposted from: CSDN blog

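A First Look at Using libsvm (MATLAB Examples)

I have been studying SVMs lately, so naturally I could not skip libsvm, written by Chih-Chung Chang and Chih-Jen Lin of National Taiwan University. Here we use libsvm to run a few experiments in MATLAB and look at the results.

First, of course, you need to know how to use libsvm from MATLAB. For details, see my blog post: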

http://blog.csdn.net/urtheappleinmyeye/article/details/20386465


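1. Classifying 16-square checkerboard data

Goal: generate training data on a 16-square (4×4) checkerboard, train a model with SVM, then use it to classify and predict new samples.

Setup:

  1. The training set has 1600 samples, i.e. 100 random points in each square, in 2 classes whose colors alternate across the 16 squares. The test set has 320 samples, i.e. 20 points per square.
  2. Correctly predicted points are drawn in green, and wrongly predicted points in red.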

Results:

The distribution of the generated training samples is shown below:

The distribution of the test samples is shown below:

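The prediction results are shown below (correct classifications in green, errors in red):

Evaluation metrics:

The final prediction accuracy, MSE and other metrics are shown below:

Source code with comments: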


%% Generate 16-square checkerboard training samples, 100 per square
%% The samples are stored in the checkerboard_16 array
train_num = 100;
num = 0;
checkerboard_16 = [];
for i = 1:4
    for j = 1:4
        num = num + 1;
        % 1 -> class 1 (plotted yellow), 0 -> class 0 (green); note it is num+i mod 2
        yellowflag = mod(num+i,2);
        x = randi([100*(i-1) 100*i],train_num,1);  % 100 row coordinates
        y = randi([100*(j-1) 100*j],train_num,1);  % 100 column coordinates
        z = yellowflag*ones(train_num,1);          % class label of this square
        checkerboard_16 = [checkerboard_16; x y z];
    end
end
%% Plot the distribution of the generated training samples
for k = 1:1600
    if checkerboard_16(k,3) == 1
        plot(checkerboard_16(k,2),checkerboard_16(k,1),'yo');
    else
        plot(checkerboard_16(k,2),checkerboard_16(k,1),'go');
    end
    hold on  % hold on after each plot (alternatively, pass vectors to plot)
end
title('Training data distribution');
axis([-10 420 -20 420]);
%% Train the classification model with SVM
checkerboard_16_label = checkerboard_16(:,end);     % class labels
checkerboard_16_data  = checkerboard_16(:,1:end-1); % sample attributes
model = svmtrain(checkerboard_16_label,checkerboard_16_data)
%% Generate test samples on the same checkerboard, 20 per square
%% The samples are stored in the checkerboard_16_test array
test_num = 20;
num = 0;
checkerboard_16_test = [];
for i = 1:4
    for j = 1:4
        num = num + 1;
        redflag = mod(num+i,2);  % same labeling rule as the training data
        x = randi([100*(i-1) 100*i],test_num,1);  % 20 row coordinates
        y = randi([100*(j-1) 100*j],test_num,1);  % 20 column coordinates
        z = redflag*ones(test_num,1);
        checkerboard_16_test = [checkerboard_16_test; x y z];
    end
end
%% Plot the distribution of the test samples
figure;
for k = 1:320
    if checkerboard_16_test(k,3) == 1
        plot(checkerboard_16_test(k,2),checkerboard_16_test(k,1),'yo');
    else
        plot(checkerboard_16_test(k,2),checkerboard_16_test(k,1),'go');
    end
    hold on
end
title('Original test data distribution');
axis([-10 420 -20 420]);
%% Predict with the SVM model
checkerboard_16_test_label = checkerboard_16_test(:,end);
checkerboard_16_test_data  = checkerboard_16_test(:,1:end-1);
[checkerboard_16_predict_label,checkerboard_16_accuracy] = svmpredict(checkerboard_16_test_label,checkerboard_16_test_data,model)
%% Plot the predictions: correct ones in green, wrong ones in red
figure;
for k = 1:320
    if checkerboard_16_predict_label(k) == checkerboard_16_test_label(k)
        plot(checkerboard_16_test(k,2),checkerboard_16_test(k,1),'go');
    else
        plot(checkerboard_16_test(k,2),checkerboard_16_test(k,1),'ro');
    end
    hold on
end
title('Predicted data distribution');
axis([-10 420 -20 420]);
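Both generation loops use the label mod(num+i,2) with num = 4*(i-1)+j, and that expression reduces to mod(i+j,2). The checkerboard data can therefore be built without growing arrays inside loops. Below is a vectorized sketch based on that observation. Its rows are ordered differently from the loop version, and perCell is a hypothetical parameter: 100 for the training set, 20 for the test set.

```matlab
perCell = 100;                                   % points per square (20 for the test set)
[I, J] = ndgrid(1:4, 1:4);                       % row/column index of each square
I = I(:)';  J = J(:)';                           % the 16 squares as row vectors
cellI = repmat(I, perCell, 1);  cellI = cellI(:);   % square row index of every point
cellJ = repmat(J, perCell, 1);  cellJ = cellJ(:);   % square column index of every point
n = numel(cellI);
x = 100*(cellI-1) + randi([0 100], n, 1);        % same range as randi([100*(i-1) 100*i])
y = 100*(cellJ-1) + randi([0 100], n, 1);
z = mod(cellI + cellJ, 2);                       % class label, equal to mod(num+i,2)
checkerboard_16 = [x y z];
```

The result can be passed to libsvm as before, e.g. model = svmtrain(checkerboard_16(:,3), checkerboard_16(:,1:2));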

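Summary:

The results show a prediction accuracy of only 65.9375%, which is rather low. This is because SVM training used the default parameters. In practice these parameters should be optimized, for example by an exhaustive (grid) search.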
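A common way to run this search with libsvm is a grid over exponentially spaced values of C and gamma, scoring each pair by k-fold cross-validation. When the -v option is given, libsvm's MATLAB svmtrain returns the cross-validation accuracy instead of a model. The loop below is a sketch: cvScore is a placeholder so that the search logic runs without libsvm, and the grid ranges are assumptions.

```matlab
% Grid search over C and gamma, keeping the pair with the best score.
log2c = -5:2:15;                   % candidate exponents for C (assumed range)
log2g = -15:2:3;                   % candidate exponents for gamma (assumed range)
% Placeholder scorer so the loop runs without libsvm. With libsvm it could be:
% cvScore = @(c, g) svmtrain(label, data, sprintf('-c %g -g %g -v 5 -q', c, g));
cvScore = @(c, g) -(log2(c) - 3)^2 - (log2(g) + 1)^2;
bestScore = -Inf; bestC = NaN; bestG = NaN;
for lc = log2c
    for lg = log2g
        c = 2^lc;  g = 2^lg;
        s = cvScore(c, g);
        if s > bestScore               % keep the best pair seen so far
            bestScore = s; bestC = c; bestG = g;
        end
    end
end
fprintf('best C = %g, gamma = %g, score = %g\n', bestC, bestG, bestScore);
```

After the search, the final model would be trained on all the training data with the chosen pair, e.g. svmtrain(label, data, sprintf('-c %g -g %g', bestC, bestG)).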

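2. Classifying the UCI iris data

Data:

The data for this experiment is the iris dataset from http://archive.ics.uci.edu/ml/. It has 3 classes (setosa, versicolor and virginica) with 50 samples each, 150 in total. Each sample has 4 attributes: sepal length, sepal width, petal length and petal width.

The data format is shown below:

Goal: use a model trained on the sample data to classify new samples.

Steps:

  1. Split the data into two parts: 40 training samples and 10 test samples per class.
  2. Train an SVM on the training data.
  3. Classify the test data with the learned model.

Code: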


load iris_new.data
iris_train_label = iris_new([1:40 51:90 101:140],end);    % 40 per class for training, 120 in total
iris_train_data  = iris_new([1:40 51:90 101:140],1:end-1);
iris_test_label  = iris_new([41:50 91:100 141:150],end);  % 10 per class for testing, 30 in total
iris_test_data   = iris_new([41:50 91:100 141:150],1:end-1);
save irisdata;
model = svmtrain(iris_train_label,iris_train_data);
[iris_predict_label,iris_accuracy] = svmpredict(iris_test_label,iris_test_data,model)
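The index vectors [1:40 51:90 101:140] assume that the 150 rows are sorted by class in blocks of 50. The sketch below derives the same per-class split from the label column instead, so it still works if the rows come in a different order. Here labels is synthetic; in practice it would be iris_new(:,end).

```matlab
labels = kron((1:3)', ones(50, 1));   % synthetic labels: 50 rows per class, sorted
nTrainPerClass = 40;
trainIdx = [];  testIdx = [];
for c = unique(labels)'
    rows = find(labels == c);         % rows of class c, in file order
    trainIdx = [trainIdx; rows(1:nTrainPerClass)];        %#ok<AGROW>
    testIdx  = [testIdx;  rows(nTrainPerClass+1:end)];    %#ok<AGROW>
end
```

The split is then applied as before, e.g. iris_train_data = iris_new(trainIdx, 1:end-1);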

Results:

The classification accuracy is 100%.

Reposted from: CSDN blog

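Learning the MATLAB Neural Network Toolbox, Part 1 – zhyhooo

1. Neural network design workflow

2. Four levels of neural network design

3. Neural network model

4. Neural network architecture

5. Creating a neural network object

6. Configuring network inputs and outputs

7. Understanding the toolbox's data structures

8. Training the network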

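1. Neural network design workflow

Neural network design can be divided into seven steps:

a. Collect data

b. Create the network

c. Configure the network

d. Initialize the weights and biases

e. Train the network

f. Validate the network

g. Use the network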

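2. Four levels of neural network design

Here the levels refer mainly to how MATLAB's Neural Network Toolbox and its commands are used:

a. The first level is the GUI tools described in "Getting Started with Neural Network Toolbox", which solve fitting, pattern recognition, clustering and time-series problems quickly and conveniently.

b. The second level is using the command line.

c. The third level is customizing the toolbox, choosing parameters to suit your needs.

d. The fourth level is modifying the .m files yourself to fit the problem at hand.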

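3. Neural network model

Basic neuron

The most basic building block of a neural network is the neuron. The figure below shows a single neuron. A neuron has an input p, a weight w for that input and a bias b. These three elements form the input to the transfer function f, which computes the output a. The weight function usually takes the product of w and p, though sometimes |w-p| is used instead (see help nnweight). The net input function n is usually the sum of the weighted inputs wp plus the bias, though sometimes a product is used (see help nnnetinput).

Training a neural network means iterating: at each step w and b are adjusted so that the error between the network output and the target output becomes as small as possible.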

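Transfer functions

Two transfer functions are commonly used: the linear transfer function and the log-sigmoid transfer function. The former is mostly used in the last layer of the network (the output layer), the latter mostly in the intermediate layers (see help nntransfer).

Vector input to a neuron

Usually the input p of each node is a multidimensional (N-dimensional) vector, so each node's weight w is also an N-dimensional vector, while the bias b is still a scalar. The input to the transfer function can then be written as n = w * p + b, and n is still a scalar.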
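Here is a small numeric example of n = w * p + b followed by a transfer function. The sketch defines the log-sigmoid and linear transfer functions as anonymous functions. They compute the same formulas as the toolbox's logsig and purelin but are defined here so that the code runs without the toolbox. The weights, input and bias are made-up values.

```matlab
logsig_  = @(n) 1 ./ (1 + exp(-n));   % log-sigmoid, output in (0, 1)
purelin_ = @(n) n;                     % linear transfer function
w = [0.5 -1 2];                        % 1 x N weight row vector (N = 3)
p = [1; 2; 0.5];                       % N x 1 input vector
b = 0.5;                               % scalar bias
n = w * p + b;                         % net input: 0.5 - 2 + 1 + 0.5 = 0
a_hidden = logsig_(n);                 % hidden-layer style output: 0.5 for n = 0
a_output = purelin_(n);                % output-layer style output: 0
```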

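4. Neural network architecture

By structure, neural networks can be divided into single-layer and multilayer networks. Each layer can contain several nodes (neurons), and together the layers form the complete model.

Single-layer network

The figure below shows a single-layer network model. The input has R elements, and each element p_r is a vector. The layer has S nodes (neurons). w_sr is the weight of node s for input r, and b_s is the bias of node s. The weights W therefore form an S×R matrix, P is an R×N matrix and b is an S-dimensional vector.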
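For a batch of N samples the layer becomes a single matrix product: W*P is S×N, and the bias vector b is added to every column. The sketch uses made-up values and a log-sigmoid transfer function. bsxfun adds b column by column, so the code also runs on releases without implicit expansion.

```matlab
R = 3; S = 2; N = 4;                  % inputs, neurons, samples (arbitrary)
W = [1 0 -1;                          % S x R weights: W(s,r) links input r to neuron s
     0.5 0.5 0.5];
b = [0; -1];                          % S x 1 biases
P = [1 2 3 4;                         % R x N inputs, one sample per column
     0 1 0 1;
     1 1 1 1];
Nnet = bsxfun(@plus, W * P, b);       % S x N net inputs
A = 1 ./ (1 + exp(-Nnet));            % log-sigmoid applied elementwise
```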

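Multilayer network

A multilayer network resembles a multistage amplifier circuit: several single-layer networks connected in series. Each layer has its own weight matrix W and bias vector b, and the output of one layer is the input of the next.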
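Cascading layers is then a loop in which each layer's output becomes the next layer's input. In the sketch below, the per-layer weights, biases and transfer functions are hypothetical values stored in cell arrays. The hidden layer uses tanh, which is mathematically the same function as the toolbox's tansig, and the output layer is linear.

```matlab
W = {[1 -1; 0.5 0.5], [2 -1]};        % layer 1: 2x2, layer 2: 1x2 (made-up)
b = {[0; 0], 0.5};                    % per-layer biases
f = {@tanh, @(n) n};                  % hidden: tanh, output: linear
p = [1; 1];                           % one input sample (2 x 1)
a = p;
for L = 1:numel(W)
    a = f{L}(bsxfun(@plus, W{L} * a, b{L}));   % output of layer L feeds layer L+1
end
```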

5. Creating a neural network object

A simple network model can be created with the feedforwardnet() function:

net = feedforwardnet

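This displays many model properties. dimensions describes the overall structure of the network. connections holds the connection state between parts of the network: 0 means not connected and 1 means connected. The layerConnect matrix gives the connections between layers, with rows as destination layers and columns as source layers.

The key properties are inputs, layers, outputs, biases, inputWeights and layerWeights.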
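To read a connection matrix such as layerConnect, where a 1 in row i, column j means that layer j feeds layer i, calling find() on the matrix is enough. The matrix below is an assumed example for a two-layer feedforward network, with layer 1 feeding layer 2. In practice it would come from net.layerConnect.

```matlab
layerConnect = [0 0;                     % assumed example: row = target, column = source
                1 0];
[target, source] = find(layerConnect);   % row and column of every nonzero entry
for k = 1:numel(target)
    fprintf('layer %d -> layer %d\n', source(k), target(k));
end
```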

6. Configuring network inputs and outputs

The network's inputs and outputs can be configured with the configure() function:

net1 = configure(net, input, target_output);

%% an example of BP network

load data input output

% shuffle index 
k = rand(1,2000); 
[m,n] = sort(k);

% training data: 1900 of the 2000 samples 
% (assumes output is a column vector with one target per row, like input) 
input_train=input(n(1:1900),:)'; 
output_train=output(n(1:1900),:)'; 
% test data: the remaining 100 samples 
input_test=input(n(1901:2000),:)'; 
output_test=output(n(1901:2000),:)';

% normalize training inputs and outputs to [-1, 1] 
[inputn,inputps]=mapminmax(input_train); 
[outputn,outputps]=mapminmax(output_train);

% initialize the NN model and set parameters 
net=newff(inputn,outputn,5); 
net.trainParam.epochs=100; % iteration times 
net.trainParam.lr=0.1; % learning rate 
net.trainParam.goal=0.00004; % performance goal 
net=train(net,inputn,outputn);

% normalize test data 
inputn_test=mapminmax('apply',input_test,inputps); 
% predict output 
an=sim(net,inputn_test); 
% de-normalize the predicted output 
BPoutput=mapminmax('reverse',an,outputps);

% plot predicted output 
figure(1) 
plot(BPoutput,':og') 
hold on 
plot(output_test,'-*'); 
legend('predicted output','expected output') 
title('BP neural network test output','fontsize',12) 
ylabel('output','fontsize',12) 
xlabel('sample','fontsize',12)

% plot errors 
error=BPoutput-output_test; 
figure(2) 
plot(error,'-*') 
title('BP neural network error','fontsize',12) 
ylabel('error','fontsize',12) 
xlabel('sample','fontsize',12)
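The two figures show the error sample by sample. A few scalar summaries are often reported alongside them. The sketch below computes the mean absolute error, the root-mean-square error and the largest absolute error. The vectors are made-up stand-ins shaped like BPoutput and output_test, and err is used instead of error to avoid shadowing MATLAB's error function.

```matlab
BPoutput    = [1.1 1.9 3.2 3.8];        % made-up predictions (row vector)
output_test = [1 2 3 4];                % made-up targets (row vector)
err  = BPoutput - output_test;          % same definition as "error" above
mae  = mean(abs(err));                  % mean absolute error
rmse = sqrt(mean(err .^ 2));            % root-mean-square error
maxe = max(abs(err));                   % worst single-sample error
fprintf('MAE %.4f, RMSE %.4f, max %.4f\n', mae, rmse, maxe);
```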

Reposted from: cnblogs (博客园), all posts
