Classification

http://homepages.inf.ed.ac.uk/rbf/HIPR2/classify.htm

Common Names: Classification

Brief Description

Classification includes a broad range of decision-theoretic approaches to the identification of images (or parts thereof). All classification algorithms are based on the assumption that the image in question depicts one or more features (e.g., geometric parts in the case of a manufacturing classification system, or spectral regions in the case of remote sensing, as shown in the examples below) and that each of these features belongs to one of several distinct and exclusive classes. The classes may be specified a priori by an analyst (supervised classification) or found automatically by clustering the data into sets of prototype classes (unsupervised classification), in which case the analyst merely specifies the number of desired categories. (Classification and segmentation have closely related objectives, as classification is a form of component labeling that can result in segmentation of the various features in a scene.)

How It Works

Image classification analyzes the numerical properties of various image features and organizes data into categories. Classification algorithms typically employ two phases of processing: training and testing. In the initial training phase, characteristic properties of typical image features are isolated and, based on these, a unique description of each classification category, i.e. training class, is created. In the subsequent testing phase, these feature-space partitions are used to classify image features.

The description of training classes is an extremely important component of the classification process. In supervised classification, statistical processes (i.e. based on a priori knowledge of probability distribution functions) or distribution-free processes can be used to extract class descriptors. Unsupervised classification relies on clustering algorithms to automatically segment the training data into prototype classes. In either case, the motivating criteria for constructing training classes are that they be:

  • independent, i.e. a change in the description of one training class should not change the value of another,

  • discriminatory, i.e. different image features should have significantly different descriptions, and

  • reliable, i.e. all image features within a training group should share the common definitive descriptions of that group.

A convenient way of building a parametric description of this sort is via a feature vector $(v_1, v_2, \ldots, v_n)$, where $n$ is the number of attributes which describe each image feature and training class. This representation allows us to consider each image feature as occupying a point, and each training class as occupying a sub-space (i.e. a representative point surrounded by some spread, or deviation), within the n-dimensional classification space. Viewed as such, the classification problem is that of determining to which sub-space class each feature vector belongs.

For example, consider an application where we must distinguish two different types of objects (e.g. bolts and sewing needles) based upon a set of two attribute classes (e.g. length along the major axis and head diameter). If we assume that we have a vision system capable of extracting these features from a set of training images, we can plot the result in the 2-D feature space, shown in Figure 1.




Figure 1 Feature space: + sewing needles, o bolts.

At this point, we must decide how to numerically partition the feature space so that, given the feature vector of a test object, we can determine quantitatively to which of the two classes it belongs. One of the simplest (although not the most computationally efficient) techniques is a supervised, distribution-free approach known as the minimum (mean) distance classifier. This technique is described below.

Minimum (Mean) Distance Classifier

Suppose that each training class is represented by a prototype (or mean) vector:

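$$\mathbf{m}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in \omega_j} \mathbf{x}, \qquad j = 1, 2, \ldots, M$$

where $N_j$ is the number of training pattern vectors from class $\omega_j$ and $M$ is the number of classes. In the example classification problem given above, the two prototypes $\mathbf{m}_{needle}$ and $\mathbf{m}_{bolt}$ are as shown in Figure 2.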




Figure 2 Feature space: + sewing needles, o bolts, * class mean
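As a minimal sketch of the prototype calculation, the snippet below averages the training vectors of each class. The measurements are invented purely for illustration (they are not the data plotted in the figures), and the names `class_means`, `training` and `prototypes` are hypothetical.

```python
from collections import defaultdict

def class_means(samples):
    """Return {label: mean vector} for an iterable of (feature_vector, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for x, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(x)
        # Accumulate the component-wise sum of the vectors in this class.
        sums[label] = [s + xi for s, xi in zip(sums[label], x)]
        counts[label] += 1
    # Divide each sum by N_j, the number of training vectors in the class.
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

# Hypothetical (length, head diameter) measurements, invented for illustration.
training = [
    ((5.0, 0.5), "needle"), ((6.0, 0.5), "needle"), ((7.0, 0.5), "needle"),
    ((3.0, 2.0), "bolt"), ((4.0, 3.0), "bolt"), ((5.0, 4.0), "bolt"),
]
prototypes = class_means(training)
```

With these made-up values the needle prototype is (6.0, 0.5) and the bolt prototype is (4.0, 3.0).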

Based on this, we can assign any given pattern $\mathbf{x}$ to the class of its closest prototype by determining its proximity to each $\mathbf{m}_j$. If Euclidean distance is our measure of proximity, then the distance to the prototype is given by

$$D_j(\mathbf{x}) = \|\mathbf{x} - \mathbf{m}_j\|, \qquad j = 1, 2, \ldots, M$$

It is not difficult to show that this is equivalent to computing

$$d_j(\mathbf{x}) = \mathbf{x}^T \mathbf{m}_j - \frac{1}{2} \mathbf{m}_j^T \mathbf{m}_j, \qquad j = 1, 2, \ldots, M$$

and assigning $\mathbf{x}$ to class $\omega_j$ if $d_j(\mathbf{x})$ yields the largest value.
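The equivalence can be checked by expanding the squared Euclidean distance, writing $d_j(\mathbf{x}) = \mathbf{x}^T \mathbf{m}_j - \frac{1}{2} \mathbf{m}_j^T \mathbf{m}_j$ for the decision function:

```latex
\begin{aligned}
D_j^2(\mathbf{x}) &= \|\mathbf{x} - \mathbf{m}_j\|^2
  = \mathbf{x}^T\mathbf{x} - 2\,\mathbf{x}^T\mathbf{m}_j + \mathbf{m}_j^T\mathbf{m}_j \\
 &= \mathbf{x}^T\mathbf{x} - 2\left(\mathbf{x}^T\mathbf{m}_j - \tfrac{1}{2}\,\mathbf{m}_j^T\mathbf{m}_j\right)
  = \mathbf{x}^T\mathbf{x} - 2\,d_j(\mathbf{x})
\end{aligned}
```

Since $\mathbf{x}^T\mathbf{x}$ is the same for every class, the class with the smallest distance $D_j(\mathbf{x})$ is exactly the class with the largest $d_j(\mathbf{x})$.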

Returning to our example, we can calculate the following decision functions:

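$$d_{needle}(\mathbf{x}) = \mathbf{x}^T \mathbf{m}_{needle} - \frac{1}{2} \mathbf{m}_{needle}^T \mathbf{m}_{needle}$$
$$d_{bolt}(\mathbf{x}) = \mathbf{x}^T \mathbf{m}_{bolt} - \frac{1}{2} \mathbf{m}_{bolt}^T \mathbf{m}_{bolt}$$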
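The following sketch evaluates one decision function per class and picks the largest, then confirms that this agrees with choosing the nearest prototype by Euclidean distance. The prototype values and test points are hypothetical, as are the function names.

```python
import math

def decision_value(x, m):
    """Decision function: dot(x, m) - 0.5 * dot(m, m)."""
    return sum(xi * mi for xi, mi in zip(x, m)) - 0.5 * sum(mi * mi for mi in m)

def classify(x, prototypes):
    """Assign x to the class whose decision function is largest."""
    return max(prototypes, key=lambda label: decision_value(x, prototypes[label]))

def nearest_prototype(x, prototypes):
    """Assign x to the class whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(x, prototypes[label]))

# Hypothetical prototype vectors (length, head diameter), for illustration only.
prototypes = {"needle": (6.0, 0.5), "bolt": (4.0, 3.0)}
test_points = [(6.5, 0.4), (3.5, 2.5), (5.0, 1.0), (5.0, 3.0)]

labels = [classify(p, prototypes) for p in test_points]
# Both rules give the same answer for every test point.
agree = all(classify(p, prototypes) == nearest_prototype(p, prototypes)
            for p in test_points)
```

For these values `labels` is `["needle", "bolt", "needle", "bolt"]`. The decision-function form needs one dot product per class once the constant term has been precomputed, which is the usual reason for preferring it.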

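Finally, the decision boundary which separates class $\omega_i$ from $\omega_j$ is given by values for $\mathbf{x}$ for which

$$d_{ij}(\mathbf{x}) = d_i(\mathbf{x}) - d_j(\mathbf{x}) = 0$$

In the case of the needles and bolts problem, the decision surface is given by:

$$d_{needle,bolt}(\mathbf{x}) = \mathbf{x}^T (\mathbf{m}_{needle} - \mathbf{m}_{bolt}) - \frac{1}{2} (\mathbf{m}_{needle} - \mathbf{m}_{bolt})^T (\mathbf{m}_{needle} + \mathbf{m}_{bolt}) = 0$$

As shown in Figure 3, the surface defined by this decision boundary is the perpendicular bisector of the line segment joining $\mathbf{m}_i$ and $\mathbf{m}_j$.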




Figure 3 Feature space: + sewing needles, o bolts, * class mean, line = decision surface
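A short numerical check of the perpendicular-bisector property, again using hypothetical prototype values: the boundary is written as w.x + b = 0 with w = m_i - m_j, so the midpoint of the two means should lie on it and the two means should fall on opposite sides.

```python
def boundary(m_i, m_j):
    """Return (w, b) such that d_i(x) - d_j(x) = dot(w, x) + b."""
    w = [a - c for a, c in zip(m_i, m_j)]
    b = -0.5 * (sum(a * a for a in m_i) - sum(c * c for c in m_j))
    return w, b

def side(x, w, b):
    """Signed value of the boundary function at x (zero on the boundary)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical prototypes, for illustration only.
m_needle, m_bolt = (6.0, 0.5), (4.0, 3.0)
w, b = boundary(m_needle, m_bolt)

midpoint = tuple((a + c) / 2 for a, c in zip(m_needle, m_bolt))
# A direction perpendicular to w (valid in 2-D): swap components, negate one.
along = (-w[1], w[0])
shifted = tuple(p + t for p, t in zip(midpoint, along))

on_boundary = abs(side(midpoint, w, b)) < 1e-12 and abs(side(shifted, w, b)) < 1e-12
opposite_sides = side(m_needle, w, b) > 0 > side(m_bolt, w, b)
```

Because w is parallel to the segment joining the means, any line on which the boundary function is zero is perpendicular to that segment; passing through the midpoint makes it the bisector.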

In practice, the minimum (mean) distance classifier works well when the distance between means is large compared to the spread (or randomness) of each class with respect to its mean. It is simple to implement, but it carries no general guarantee on its error rate: the well-known bound of twice the ideal (Bayes) error rate applies to the related nearest-neighbour rule in the large-sample limit, not to classification by distance to class means. The statistical, supervised Bayes classifier is a more informed algorithm, as the frequencies of occurrence of the features of interest are used to aid the classification process. Without this information the minimum (mean) distance classifier can yield biased classifications. This is best combatted by applying training patterns at the natural rates at which they arise in the raw training set.

Guidelines for Use

To illustrate the utility of classification (using the minimum (mean) distance classifier), we will consider a remote sensing application. Here, we have a collection of multi-spectral images (i.e. images containing several bands, where each band represents a single electro-magnetic wavelength or frequency) of the planet Earth collected from a satellite. We wish to classify each image pixel into one of several different classes (e.g. water, city, wheat field, pine forest, cloud, etc.) on the basis of the spectral measurement of that pixel.

Consider a set of images of the globe (centered on America) which describe the visible (image bvs1) and infra-red (image bir1) spectrums, respectively. From the histograms of the visible band image (bvs1hst1) and infra-red band image (bir1hst1), we can see that it would be very difficult to find a threshold, or decision surface, with which to segment the images into training classes (e.g. spectral classes which correspond to physical phenomena such as cloud, ground, water, etc.). It is often the case that having a higher dimensionality representation of this information (i.e. using one 2-D histogram instead of two 1-D histograms) facilitates segmentation of regions which might overlap when projected onto a single axis, as shown for some hypothetical data in Figure 4.




Figure 4 2-D feature space representation of hypothetical data. (The projection of the data onto the X-axis is equivalent to a 1-D histogram.)
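The effect can be reproduced with a few invented points (not the data in Figure 4): two clusters whose 1-D histograms overlap on both axes, yet which a straight line separates in the 2-D feature space. All names and values below are hypothetical.

```python
from collections import Counter

# Hypothetical (band 1, band 2) values for two clusters.
cluster_a = [(1, 4), (2, 5), (3, 6), (4, 7)]
cluster_b = [(3, 1), (4, 2), (5, 3), (6, 4)]
points = cluster_a + cluster_b

# 1-D histograms: projections of the data onto each axis.
hist_x = Counter(x for x, _ in points)
hist_y = Counter(y for _, y in points)
# 2-D histogram: joint counts over (x, y) cells.
hist_2d = Counter(points)

# Values shared by both clusters along each single axis.
x_overlap = {x for x, _ in cluster_a} & {x for x, _ in cluster_b}
y_overlap = {y for _, y in cluster_a} & {y for _, y in cluster_b}

# In 2-D the line y = x + 0.5 puts the clusters on opposite sides.
separable_2d = (all(y - x > 0.5 for x, y in cluster_a)
                and all(y - x < 0.5 for x, y in cluster_b))
```

Here the clusters share the values 3 and 4 on the first axis and the value 4 on the second, so no single threshold on either axis separates them, while no cell of the 2-D histogram holds points from both clusters.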

Since the images over America are registered, we can combine them into a single two-band image and find the decision surface(s) which divide the data into distinct classification regions in this higher dimensional representation. To this end, we use a k-means algorithm to find the training classes of the 2-D spectral images. (This algorithm converts an input image into vectors of equal size, where the size of each vector is determined by the number of spectral bands in the input image, and then determines the k prototype mean vectors by minimizing the sum of the squared distances from all points in a class to the class center $\mathbf{m}$.)
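A minimal k-means sketch (Lloyd's iteration) is given below. It is not necessarily the implementation used to produce the results on this page: the initialisation from the first k points is an arbitrary assumption, the pixel values are invented, and the iteration only finds a local minimum of the sum of squared distances.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; the first k points are used as initial centres (assumption)."""
    centres = [tuple(float(c) for c in p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            groups[nearest].append(p)
        # Update step: each centre moves to the mean of its group.
        new_centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[j]
            for j, g in enumerate(groups)
        ]
        if new_centres == centres:  # converged: assignments no longer change
            break
        centres = new_centres
    return centres

# Hypothetical (visible, infra-red) pixel vectors, invented for illustration.
pixels = [(10, 20), (12, 22), (11, 18), (200, 210), (205, 190), (195, 200)]
centres = kmeans(pixels, k=2)
```

For this toy data the two centres converge to (11.0, 20.0) and (200.0, 200.0) even though both initial centres start inside the dark cluster.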

If we choose k=2 as a starting point, the algorithm finds two prototype mean vectors, shown with a * symbol in the 2-D histogram (image bvi1tdh1). This figure also shows the linear decision surface which separates out our training classes.

Using two training classes, such as those found for the image over America, we can classify a similar multi-spectral image of Africa, consisting of a visible band (image avs1) and an infra-red band (image air1), to yield the result shown in image avi2cls1. (Note that the image size has been scaled by a factor of two to speed up computation, and a border has been placed around the image to mask out any background pixels.) We can see that one of the classes created during the training process contains pixels corresponding to land masses over north and south Africa, whereas the pixels in the other class represent water or clouds.
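Applying a set of prototypes to a registered two-band image amounts to labelling every pixel with its nearest prototype. The sketch below does this for a tiny invented 2x2 image; the band values, prototype vectors and the function name `classify_image` are all hypothetical.

```python
import math

def classify_image(band_a, band_b, prototypes):
    """Return a label image: each pixel holds the index of its nearest prototype."""
    labels = []
    for row_a, row_b in zip(band_a, band_b):
        labels.append([
            # Form the 2-D spectral vector (a, b) and find the closest prototype.
            min(range(len(prototypes)),
                key=lambda j: math.dist((a, b), prototypes[j]))
            for a, b in zip(row_a, row_b)
        ])
    return labels

# Hypothetical registered bands of a 2x2 image, invented for illustration.
visible = [[12, 200],
           [198, 10]]
infrared = [[20, 205],
            [190, 25]]
# Hypothetical prototype mean vectors, e.g. as found by a clustering step.
prototypes = [(11.0, 20.0), (200.0, 200.0)]

label_image = classify_image(visible, infrared, prototypes)
```

The resulting `label_image` is `[[0, 1], [1, 0]]`; displaying each label as a grey level or colour gives a classified image of the kind described above.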

Classification accuracy using the minimum (mean) distance classifier improves as we increase the number of training classes. The images avi2cls4 and avi2cls5 show the results of the classification procedure using k=4 and k=6 training classes. The equivalent with a color assigned to each class is shown in images avi2cls2 and avi2cls3 for k=4 and k=6, respectively. Here we begin to see the classification segmenting out regions which correspond to distinct physical phenomena.

Common Variants

Classification is such a broad-ranging field that a description of all the algorithms could fill several volumes of text. We have already discussed a common supervised algorithm, so in this section we will briefly consider a representative unsupervised algorithm. In general, unsupervised clustering techniques are used less frequently, as the computation time required for the algorithm to learn a set of training classes is usually prohibitive. However, in applications where the features (and relationships between features) are not well understood, clustering algorithms can provide a viable means for partitioning a sample space.

A general clustering algorithm is based on a split and merge technique, as shown in Figure 5. Using a similarity measure (e.g. the dot product of two vectors, the weighted Euclidean distance, etc.), the input vectors can be partitioned into subsets, each of which should be sufficiently distinct. Subsets which do not meet this criterion are merged. This procedure is repeated on all of the subsets until no further splitting of subsets occurs or until some stopping criterion is met.




Figure 5 General clustering algorithm
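One possible reading of the split and merge scheme is sketched below. The specific split rule (cut a subset at the mean of its widest axis when its spread is too large), the merge rule (join the closest pair of subsets when their centres are too near), the thresholds and the data are all assumptions made for illustration, not details given in Figure 5.

```python
import math
from statistics import mean, pstdev

def centre(subset):
    """Mean vector of a subset of points."""
    return tuple(mean(col) for col in zip(*subset))

def split_and_merge(points, split_spread=1.0, merge_distance=1.0, max_rounds=20):
    subsets = [list(points)]
    for _ in range(max_rounds):
        changed = False
        # Split step: divide any subset that is too spread out along its widest axis.
        after_split = []
        for s in subsets:
            spreads = [pstdev(col) for col in zip(*s)]
            axis = spreads.index(max(spreads))
            if spreads[axis] > split_spread:
                cut = mean(p[axis] for p in s)
                after_split.append([p for p in s if p[axis] <= cut])
                after_split.append([p for p in s if p[axis] > cut])
                changed = True
            else:
                after_split.append(s)
        subsets = after_split
        # Merge step: join the closest pair of subsets if they are not distinct enough.
        pairs = [(math.dist(centre(a), centre(b)), i, j)
                 for i, a in enumerate(subsets)
                 for j, b in enumerate(subsets) if i < j]
        if pairs:
            d, i, j = min(pairs)
            if d < merge_distance:
                subsets[i] = subsets[i] + subsets[j]
                del subsets[j]
                changed = True
        if not changed:  # stopping criterion: nothing was split or merged
            break
    return subsets

# Hypothetical 2-D feature vectors forming two well-separated groups.
data = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2),
        (10.0, 10.0), (10.2, 9.9), (9.8, 10.1), (10.1, 10.3)]
clusters = split_and_merge(data)
```

On this toy data the procedure stops after one split, leaving two subsets of four points each. The `max_rounds` cap is there because a split followed by a merge of the same points could otherwise repeat indefinitely.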

Exercises

  1. In the classification of natural scenes, there is often the problem that features we want to classify occur at different scales. For example, in constructing a system to classify trees, we have to take into account that trees close to the camera will appear large and sharp, while those at some distance away may be small and fuzzy. Describe how one might overcome this problem.

  2. The following table gives some training data to be used in the classification of flower types. Petal length and width are given for two different flowers. Plot this information on a graph (utilizing the same scale for the petal length and petal width axes) and then answer the questions below.

     ------------------------------------
    | Petal Length | Petal Width | Class |
    |--------------+-------------+-------|
    | 4            | 3           | 1     |
    |--------------+-------------+-------|
    | 4.5          | 4           | 1     |
    |--------------+-------------+-------|
    | 3            | 4           | 1     |
    |--------------+-------------+-------|
    | 6            | 1           | 2     |
    |--------------+-------------+-------|
    | 7            | 1.5         | 2     |
    |--------------+-------------+-------|
    | 6.5          | 2           | 2     |
     ------------------------------------

    a) Calculate the mean, or prototype, vectors $\mathbf{m}_i$ for the two flower types described above.
    b) Determine the decision functions $d_i(\mathbf{x})$ for each class.
    c) Determine the equation of the boundary (i.e. $d_{12}(\mathbf{x}) = d_1(\mathbf{x}) - d_2(\mathbf{x}) = 0$) and plot the decision surface on your graph.
    d) Notice that substitution of a pattern from class $\omega_1$ into your answer from part (c) yields a positive valued $d_{12}(\mathbf{x})$, while a pattern belonging to the class $\omega_2$ yields a negative value. How would you use this information to determine a new pattern's class membership?

  3. Experiment with classifying some remotely sensed images: e.g. evs1 and eir1 are the visible and infra-red images of Europe, uvs1 and uir1 are those of the United Kingdom, and svs1 and sir1 are those of Scandinavia. Begin by combining the two single-band spectral images of Europe into a single multi-band image. (You may want to scale the image so as to cut down the processing time.) Then, create a set of training classes, where k equals 6, 8, 10... (Remember that although the accuracy of the classification improves with greater numbers of training classes, the computational requirements increase as well.) Then try classifying all three images using these training sets.

References

T. Avery and G. Berlin Fundamentals of Remote Sensing and Airphoto Interpretation, Maxwell Macmillan International, 1985, Chap. 15.

D. Ballard and C. Brown Computer Vision, Prentice-Hall, Inc., 1982, Chap. 6.

E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 18.

A. Jain Fundamentals of Digital Image Processing, Prentice-Hall, 1986, Chap. 9.

D. Vernon Machine Vision, Prentice-Hall, 1991, Chap. 6.
