Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition

Latest recommended article published on 2021-11-23 22:48:16

w36680130

Tags: c++ algorithm image-processing opencv

Original link: https://oldbug.net/q/gfL4/Image-Processing-Algorithm-Improvement-for-Coca-Cola-Can-Recognition

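This article describes the author's image processing project, done in C++ with OpenCV, to recognize Coca-Cola cans. The algorithm had to cope with background noise, varying scale and rotation, blurry images, and interference from Coca-Cola bottles. The author used HSV color-space filtering, median filtering, and Canny edge detection as pre-processing steps, followed by recognition based on the Generalized Hough Transform. However, the algorithm was slow, was confused by bottles, handled blurry images poorly, and was sensitive to the can's orientation. Suggestions raised in the discussion include using a range sensor, using a neural network, and detecting bottles first.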

Summary generated automatically by CSDN.

This article is translated from: Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition

One of the most interesting projects I've worked on in the past couple of years was a project about image processing. The goal was to develop a system to be able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans', you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation.

[Image: template matching]

Some constraints on the project:

The background could be very noisy.
The can could have any scale or rotation or even orientation (within reasonable limits).
The image could have some degree of fuzziness (contours might not be entirely straight).
There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
The can could be partly hidden on the sides or the middle and possibly partly hidden behind a bottle.
There could be no can at all in the image, in which case you had to find nothing and write a message saying so.

So you could end up with tricky things like this (which in this case had my algorithm totally fail):

[Image: total fail]

I did this project a while ago, and had a lot of fun doing it, and I had a decent implementation. Here are some details about my implementation:

Language: Done in C++ using the OpenCV library.

Pre-processing: For the image pre-processing, i.e. transforming the image into a more raw form to give to the algorithm, I used 2 methods:

Changing color domain from RGB to HSV and filtering based on "red" hue, saturation above a certain threshold to avoid orange-like colors, and filtering of low value to avoid dark tones. The end result was a binary black and white image, where all white pixels would represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with.
Noise filtering using median filtering (taking the median pixel value of all neighbors and replacing the pixel by this value) to reduce noise.
Using the Canny Edge Detection Filter to get the contours of all items after the 2 preceding steps.
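
The first two pre-processing steps can be sketched in plain C++ to show what the thresholds do. In OpenCV the same pipeline would normally go through `cv::cvtColor`, `cv::inRange`, `cv::medianBlur` and `cv::Canny`. The helper names (`rgbToHsv`, `isCanRed`, `median3x3`) and all threshold values below are illustrative assumptions, not the ones used in the project.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// HSV triple with h in degrees [0, 360), s and v in [0, 1].
struct Hsv { double h, s, v; };

// Standard RGB -> HSV conversion for 8-bit channels.
Hsv rgbToHsv(std::uint8_t r8, std::uint8_t g8, std::uint8_t b8) {
    const double r = r8 / 255.0, g = g8 / 255.0, b = b8 / 255.0;
    const double maxc = std::max({r, g, b});
    const double minc = std::min({r, g, b});
    const double delta = maxc - minc;
    Hsv out{0.0, 0.0, maxc};
    if (maxc > 0.0) out.s = delta / maxc;
    if (delta > 0.0) {
        if (maxc == r)      out.h = 60.0 * ((g - b) / delta);
        else if (maxc == g) out.h = 60.0 * ((b - r) / delta + 2.0);
        else                out.h = 60.0 * ((r - g) / delta + 4.0);
        if (out.h < 0.0) out.h += 360.0;
    }
    return out;
}

// Assumed thresholds: red hue wraps around 0 degrees, the saturation floor
// rejects washed-out colors, and the value floor rejects dark tones.
bool isCanRed(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const Hsv p = rgbToHsv(r, g, b);
    const bool redHue = p.h <= 10.0 || p.h >= 350.0;
    return redHue && p.s >= 0.6 && p.v >= 0.3;
}

// 3x3 median filter on a row-major 8-bit image.
// Border pixels are copied through unchanged.
std::vector<std::uint8_t> median3x3(const std::vector<std::uint8_t>& img,
                                    int width, int height) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> out = img;
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            std::array<std::uint8_t, 9> win{};
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    win[k++] = img[(y + dy) * width + (x + dx)];
            // The 5th smallest of 9 values is the median.
            std::nth_element(win.begin(), win.begin() + 4, win.end());
            out[y * width + x] = win[4];
        }
    }
    return out;
}
```

Running `isCanRed` over every pixel gives the binary image described above, and `median3x3` then removes isolated white pixels before edge detection.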

Algorithm: The algorithm I chose for this task was taken from this awesome book on feature extraction and is called the Generalized Hough Transform (pretty different from the regular Hough Transform). It basically says a few things:

You can describe an object in space without knowing its analytical equation (which is the case here).
It is resistant to image deformations such as scaling and rotation, as it will basically test your image for every combination of scale factor and rotation factor.
It uses a base model (a template) that the algorithm will "learn".
Each pixel remaining in the contour image will vote for another pixel which will supposedly be the center (in terms of gravity) of your object, based on what it learned from the model.

In the end, you end up with a heat map of the votes. For example, here all the pixels of the contour of the can will vote for its gravitational center, so you'll have a lot of votes in the same pixel corresponding to the center, and will see a peak in the heat map as below:

[Image: GHT]

Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (final scale and rotation factor will obviously be relative to your original template). In theory at least...
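
The voting and peak-picking steps can be sketched as follows for a single, fixed scale and rotation. This is a simplified illustration of the R-table idea behind the Generalized Hough Transform, not the implementation used in the project. The gradient direction is assumed to be already quantised into `kBins` bins, and the names `EdgePoint`, `buildRTable`, `vote` and `findPeak` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kBins = 8;  // assumed number of gradient-direction bins

struct EdgePoint { int x, y, bin; };  // bin = quantised gradient direction
struct Offset { int dx, dy; };
using RTable = std::vector<std::vector<Offset>>;  // one offset list per bin

// "Learning" the template: for each gradient bin, store the offsets from
// the template's edge points to its reference point (cx, cy).
RTable buildRTable(const std::vector<EdgePoint>& templ, int cx, int cy) {
    RTable table(kBins);
    for (const EdgePoint& p : templ) {
        assert(p.bin >= 0 && p.bin < kBins);
        table[p.bin].push_back({cx - p.x, cy - p.y});
    }
    return table;
}

// Every edge pixel votes for each center position its gradient bin allows.
// The returned accumulator is the row-major "heat map" of votes.
std::vector<int> vote(const std::vector<EdgePoint>& edges, const RTable& table,
                      int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<int> acc(static_cast<std::size_t>(width) * height, 0);
    for (const EdgePoint& p : edges) {
        assert(p.bin >= 0 && p.bin < kBins);
        for (const Offset& o : table[p.bin]) {
            const int vx = p.x + o.dx, vy = p.y + o.dy;
            if (vx >= 0 && vx < width && vy >= 0 && vy < height)
                ++acc[static_cast<std::size_t>(vy) * width + vx];
        }
    }
    return acc;
}

// Threshold-based heuristic: report the strongest cell only if it collected
// at least minVotes votes; otherwise report that nothing was found.
bool findPeak(const std::vector<int>& acc, int width, int minVotes,
              int& px, int& py) {
    assert(width > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < acc.size(); ++i)
        if (acc[i] > acc[best]) best = i;
    if (acc.empty() || acc[best] < minVotes) return false;
    px = static_cast<int>(best % width);
    py = static_cast<int>(best / width);
    return true;
}
```

Handling scale and rotation means repeating `vote` with transformed offsets for every candidate combination, which is where the running time grows so quickly.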

Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:

It is extremely slow! I'm not stressing this enough. Almost a full day was needed to process the 30 test images, obviously because I had a very high scaling factor for rotation and translation, since some of the cans were very small.
It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, thus had more pixels, thus more votes).
Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, thus ending with a very noisy heat map.
Invariance in translation and rotation was achieved, but not in orientation, meaning that a can that was not directly facing the camera objective wasn't recognized.

Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?

I hope some people will also learn something out of it; after all, I think not only people who ask questions should learn. :)

Reply #1

Reference: https://stackoom.com/question/gfL4/图像处理-可口可乐-识别的算法改进

Reply #2

If you are not limited to just a camera, which wasn't in one of your constraints, perhaps you can move to using a range sensor like the Xbox Kinect. With this you can perform depth- and colour-based matched segmentation of the image. This allows for faster separation of objects in the image. You can then use ICP matching or similar techniques to match the shape of the can rather than just its outline or colour, and given that it is cylindrical, this may be a valid option for any orientation if you have a previous 3D scan of the target. These techniques are often quite quick, especially when used for such a specific purpose, which should solve your speed problem.

Also I could suggest, not necessarily for accuracy or speed but for fun, that you use a trained neural network on your hue-segmented image to identify the shape of the can. These are very fast and can often be up to 80/90% accurate. Training would be a bit of a long process, though, as you would have to manually identify the can in each image.

Reply #3

Fun problem: when I glanced at your bottle image I thought it was a can too. But, as a human, what I did to tell the difference is that I then noticed it was also a bottle...

So, to tell cans and bottles apart, how about simply scanning for bottles first? If you find one, mask out the label before looking for cans.

Not too hard to implement if you're already doing cans. The real downside is it doubles your processing time. (But thinking ahead to real-world applications, you're going to end up wanting to do bottles anyway ;-)

Reply #4

This may be a very naive idea (or may not work at all), but the dimensions of all the coke cans are fixed. So maybe, if the same image contains both a can and a bottle, you can tell them apart by size considerations (bottles are going to be larger). Now, because of missing depth (i.e. 3D mapping to 2D mapping), it's possible that a bottle may appear shrunk and there isn't a size difference. You may recover some depth information using stereo-imaging and then recover the original size.
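
As a rough sketch of that size check, assuming a pinhole camera and a rectified stereo pair: depth follows from disparity as Z = f * B / d, and an object's real height is its pixel height times Z / f. The function names, the expected can height and the tolerance are all hypothetical parameters chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

// Depth in metres from stereo disparity for a rectified pair: Z = f * B / d.
// focalPx is the focal length in pixels, baselineM the camera separation.
double depthFromDisparity(double focalPx, double baselineM, double disparityPx) {
    assert(focalPx > 0.0 && baselineM > 0.0 && disparityPx > 0.0);
    return focalPx * baselineM / disparityPx;
}

// Real-world height in metres of an object that spans heightPx pixels
// at depth depthM (pinhole model).
double realHeight(double heightPx, double depthM, double focalPx) {
    assert(focalPx > 0.0);
    return heightPx * depthM / focalPx;
}

// True if the recovered height is within toleranceM of the expected can height.
bool looksLikeCan(double heightM, double expectedCanHeightM, double toleranceM) {
    return std::fabs(heightM - expectedCanHeightM) <= toleranceM;
}
```

For example, with an 800 px focal length, a 0.1 m baseline and a 40 px disparity, the object is about 2 m away, so a 48 px tall detection corresponds to roughly 0.12 m while a 100 px tall one corresponds to roughly 0.25 m.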

Reply #5

An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).

It is implemented in OpenCV 2.3.1.

You can find a nice code example using features in Features2D + Homography to find a known object.

Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).
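
Once keypoints have descriptors, deciding whether "enough keypoints are visible" comes down to a nearest-neighbour search between template and scene descriptors. The sketch below shows only that matching stage in plain C++; extracting the SIFT or SURF descriptors is left to the library. The ratio test used to reject ambiguous matches is a common filter that is assumed here rather than taken from the answer, and the names `countGoodMatches` and `objectVisible` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Counts template descriptors whose nearest scene descriptor is clearly
// closer than the second nearest. The ratio is squared because the
// distances being compared are squared.
int countGoodMatches(const std::vector<Descriptor>& templ,
                     const std::vector<Descriptor>& scene, float ratio) {
    if (scene.size() < 2) return 0;  // the ratio test needs two candidates
    int good = 0;
    for (const Descriptor& t : templ) {
        float best = std::numeric_limits<float>::max();
        float second = best;
        for (const Descriptor& s : scene) {
            const float d = squaredDistance(t, s);
            if (d < best) { second = best; best = d; }
            else if (d < second) { second = d; }
        }
        if (best < ratio * ratio * second) ++good;
    }
    return good;
}

// The object counts as found if enough template keypoints were matched,
// which tolerates partial occlusion.
bool objectVisible(int goodMatches, int minMatches) {
    return goodMatches >= minMatches;
}
```

A partly hidden can still passes as long as the visible part contributes at least `minMatches` unambiguous matches.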

[Image]

Image source: tutorial example

The processing takes a few hundred ms for SIFT; SURF is a bit faster, but it is not suitable for real-time applications. ORB uses FAST, which is weaker regarding rotation invariance.

The original papers

Reply #6

I would detect red rectangles: RGB -> HSV, filter red -> binary image, close (dilate then erode, known as imclose in MATLAB).
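
The closing step (dilate, then erode) can be sketched in plain C++ on a binary mask. In OpenCV it corresponds to `cv::morphologyEx` with `cv::MORPH_CLOSE`. The 3x3 square structuring element and the names `morph3x3` and `closeMask` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::vector<std::uint8_t>;  // row-major, 0 = background, 1 = foreground

// One 3x3 morphological pass. Dilation sets a pixel if any neighbour is set;
// erosion keeps a pixel only if all neighbours are set.
// Neighbours that fall outside the image are ignored.
Mask morph3x3(const Mask& in, int width, int height, bool dilate) {
    assert(width > 0 && height > 0);
    assert(in.size() == static_cast<std::size_t>(width) * height);
    Mask out(in.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            bool any = false, all = true;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                    const bool set = in[static_cast<std::size_t>(ny) * width + nx] != 0;
                    any = any || set;
                    all = all && set;
                }
            }
            out[static_cast<std::size_t>(y) * width + x] = (dilate ? any : all) ? 1 : 0;
        }
    }
    return out;
}

// Closing = dilation followed by erosion; fills small holes and gaps
// without growing the overall shape.
Mask closeMask(const Mask& in, int width, int height) {
    return morph3x3(morph3x3(in, width, height, true), width, height, false);
}
```

On the red mask this fills the small gaps left by white lettering or highlights, so each can or label becomes one solid blob before rectangles are extracted.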

Then look through rectangles from largest to smallest. Rectangles that have smaller rectangles in a known position/scale can both be removed (assuming bottle proportions are constant, the smaller rectangle would be a bottle cap).

This would leave you with red rectangles, then you'll need to somehow detect the logos to tell if they're a red rectangle or a coke can. Like OCR, but with a known logo?