MATLAB image seam
Introduction
In this article, we will take a deep dive into an interesting algorithm known as “Seam Carving”. It performs the seemingly impossible task of resizing an image without cropping it or distorting its contents. We will work our way up to implementing the seam carving algorithm from scratch, taking a peek at some of the interesting maths behind it along the way.
A little knowledge of calculus will help you follow along, but it is not required. So let’s begin. (This article is inspired by a lecture from Grant Sanderson at MIT.)
The Problem:
Let’s take a look at this image.
The painting, by Salvador Dali, is named “The Persistence of Memory”. Rather than its artistic value, we are more interested in the contents of the painting. We want to resize the picture by decreasing its width. Two obvious approaches are cropping the picture or squeezing its width.
But as we can see, cropping removes many of the objects, and squeezing distorts the picture. We want the best of both, i.e. to decrease the width without cropping out or distorting any object.
As we can see, along with the objects, there is also a lot of empty space in the picture. What we want to accomplish here is to somehow remove those empty areas between the objects, so that the interesting parts of the picture remain while the unnecessary space is thrown away.
This is indeed a tough problem, and it’s easy to get lost. So it is always a good idea to split the problem into smaller, more manageable parts. We can split this problem into two parts:
- Identifying interesting parts (i.e. objects) in the picture.
- Identifying pixel paths which can be removed without distorting the picture.
Identifying the Objects:
Before moving forward, we want to convert our picture to greyscale, which will help with the operations we do later. Here is a simple formula to convert an RGB pixel to a greyscale value.
import numpy as np

def rgbToGrey(arr):
    # Weighted sum of the R, G and B channels; any alpha channel is ignored.
    greyVal = np.dot(arr[...,:3], [0.2989, 0.5870, 0.1140])
    return np.round(greyVal).astype(np.int32)
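As a quick sanity check of the formula, here is a small sketch that applies the same weights to four hand-picked pixels. The name rgb_to_grey and the sample pixels are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def rgb_to_grey(arr):
    # Same weights as rgbToGrey: a weighted sum of the R, G and B channels.
    grey = np.dot(arr[..., :3], [0.2989, 0.5870, 0.1140])
    return np.round(grey).astype(np.int32)

# One row of four pixels: pure red, pure green, pure blue, white.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]])
grey = rgb_to_grey(pixels)
print(grey.tolist())  # [[76, 150, 29, 255]]
```

Green contributes the most to the grey value and blue the least, which roughly mirrors how bright each colour appears to the eye.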
To identify the objects, we need a strategy. What if we could somehow identify all the edges in the picture? Then we can ask the seam carving algorithm to take pixel paths that don’t travel through edges so that, by extension, any area enclosed by edges won’t be touched.
But then, how are we going to identify the edges? One observation we can make is that whenever there is a sharp change in color between two adjacent pixels, it is most likely the edge of an object. We can interpret this abrupt change in color as the start of a new object at that pixel.
The next problem we have to tackle is how to identify sharp changes in pixel value. For now, let’s think of a simple case: a single line of pixels. Say we denote this array of values as x.
We may take the difference x[i+1] - x[i], which shows how much the value changes on the right-hand side of the current pixel. Or we can take the difference x[i] - x[i-1], which gives the change on the left-hand side. To denote the total change we may take the average of both, which yields
(x[i+1] - x[i-1]) / 2
Anyone familiar with calculus can quickly identify this expression as a finite-difference form of the derivative. That’s right: we need to measure sharp changes in x, so we are calculating its derivative. One more keen observation is that if we define a filter, say [-0.5, 0, 0.5], multiply it element-wise with the array [x[i-1], x[i], x[i+1]] and take the sum, we get the derivative at x[i]. As our picture is 2D, we need a 2D filter. I won’t go into the details, but the 2D version of our filter looks like this,
As our filter calculates the derivative for each pixel along the x-axis, it gives the vertical edges. Similarly, if we calculate the derivatives along the y-axis, we get the horizontal edges. The filter for it is the following. (It is the transpose of the filter for the x-axis.)
These filters are also known as Sobel filters.
So, we have two filters that need to travel across the picture. For each pixel, we do an element-wise multiplication of the filter with the (3x3) submatrix surrounding it and then take the sum. This operation is known as convolution.
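To make the one-dimensional case concrete, here is a minimal sketch that slides the filter [-0.5, 0, 0.5] over a single row of pixels. The sample row and the variable names are assumptions chosen for illustration; border pixels are simply skipped.

```python
import numpy as np

# A single row of pixel values with one sharp jump between index 3 and 4.
x = np.array([10, 10, 10, 10, 200, 200, 200], dtype=float)
filt = np.array([-0.5, 0.0, 0.5])

# Slide the filter along the row: multiply element-wise with
# [x[i-1], x[i], x[i+1]] and sum. The two border pixels are skipped.
deriv = np.array([np.sum(filt * x[i - 1:i + 2]) for i in range(1, len(x) - 1)])

# The same values from the averaged difference (x[i+1] - x[i-1]) / 2.
central = (x[2:] - x[:-2]) / 2

print(deriv.tolist())  # [0.0, 0.0, 95.0, 95.0, 0.0]
```

The response is zero in the flat regions and large only at the two pixels next to the jump, which is the behaviour we want from an edge detector.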
Convolution:
Mathematically, the convolution operation is defined as
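For reference, the standard textbook definition can be written as follows. The symbol f for the signal and the image/kernel names I and K in the discrete form are assumptions made here; k follows the article's notation for the filter.

```latex
% Continuous convolution of a signal f with a filter k
(f * k)(t) = \int_{-\infty}^{\infty} f(\tau)\, k(t - \tau)\, d\tau

% Discrete 2D analogue for an image I and a kernel K
(I * K)[i, j] = \sum_{a} \sum_{b} I[i - a,\, j - b]\; K[a, b]
```

In the discrete form, the integral becomes a sum over the kernel entries, and the minus signs in the image indices are what flip the kernel.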
See how we do a pointwise multiplication of both functions and then integrate. Numerically, this corresponds to what we did earlier, i.e. an element-wise multiplication of the filter and the image followed by a sum.
Notice how the k function is written as k(t-τ): for the convolution operation, one of the signals needs to be flipped. You can picture it like this. Imagine two trains on a straight horizontal track, running towards each other for an inevitable collision (don’t worry, nothing will happen to the trains because, superposition). The heads of the trains face each other. Now imagine that you are scanning the track from left to right. Then you would scan the left train from rear to head, and the right train from head to rear, i.e. in reverse.
Similarly, the computer needs to read our filters from the bottom-right (2,2) corner to the top-left (0,0) corner instead of from the top-left to the bottom-right. So the actual Sobel filters are the following,
upon which we do the 180-degree rotation before the convolution operation.
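As a sketch of what the rotation does, the snippet below takes the kernel values used in the getEdge function later in this article and rotates one of them by 180 degrees with np.rot90. The variable name flipped is an assumption for illustration.

```python
import numpy as np

# Kernel values as they appear in getEdge.
sX = np.array([[0.25, 0.5, 0.25], [0, 0, 0], [-0.25, -0.5, -0.25]])
sY = np.array([[0.25, 0, -0.25], [0.5, 0, -0.5], [0.25, 0, -0.25]])

# A 180-degree rotation reverses both the rows and the columns.
flipped = np.rot90(sY, 2)
print(flipped)

# After the flip, the middle row is the 1D derivative filter [-0.5, 0, 0.5].
print(flipped[1].tolist())

# The two kernels are transposes of each other.
print(np.array_equal(sX, sY.T))
```

So the stored kernel is the mirror image of the filter we actually want to slide over the picture, and the flip inside the convolution restores it.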
We can go on and write a simple, naive implementation of the convolution operation. It would be something like this,
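def naiveConvolve(img,ker):
    res = np.zeros(img.shape)
    r,c = img.shape
    rK,cK = ker.shape
    halfHeight,halfWidth = rK//2,cK//2
    # Flip the kernel by 180 degrees, as convolution requires.
    ker = np.rot90(ker,2)
    # Zero-pad by half the kernel size so the output matches the input size.
    img = np.pad(img,((halfHeight,halfHeight),(halfWidth,halfWidth)),mode='constant')
    for i in range(r):
        for j in range(c):
            # The window starting at (i,j) in the padded image is centred on pixel (i,j).
            res[i,j] = np.sum(np.multiply(ker,img[i:i+rK,j:j+cK]))
    return res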
This works just fine but takes an excruciatingly long time to execute, as it performs nearly 9*r*c multiplications and additions in a Python loop to arrive at the result. But we can be smart and use more concepts from math to cut the running time drastically.
Fast Convolution:
Convolutions have an interesting property: convolution in the time domain corresponds to multiplication in the frequency domain. That is, the Fourier transform of the convolution of a signal f with a filter k is the product F(w)·K(w), where F(w) and K(w) denote the two functions in the frequency domain.
We know that the Fourier transform converts a signal from the time domain to the frequency domain. So what we can do is calculate the Fourier transform of the image and of the filter, multiply them, then take the inverse Fourier transform to get the convolution result. We can use the NumPy library for this.
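def fastConvolve(img,ker):
    imgF = np.fft.rfft2(img)
    # Zero-pad the kernel to the image size so both spectra have the same shape.
    kerF = np.fft.rfft2(ker,img.shape)
    # Pass the output shape explicitly; otherwise an odd image width loses a column.
    return np.fft.irfft2(imgF*kerF,img.shape)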
(Note: Values may differ from the naive method, mainly near the image borders, as the fastConvolve function calculates a circular convolution in which the image wraps around at its edges. Its output is also offset by one pixel, because the filter is anchored at its top-left corner rather than at its centre. But in practice we can comfortably use the fast convolution without worrying about these differences.)
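The relationship between the two methods can be checked numerically. The sketch below uses compact stand-ins for the two functions (the names naive_convolve and fft_convolve, the random test image and the 3x3 kernel size are assumptions) and compares their outputs after undoing the one-pixel offset.

```python
import numpy as np

def naive_convolve(img, ker):
    # Direct same-size convolution with zero padding (3x3 kernel assumed).
    ker = np.rot90(ker, 2)
    padded = np.pad(img, 1, mode="constant")
    out = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(ker * padded[i:i + 3, j:j + 3])
    return out

def fft_convolve(img, ker):
    # Circular convolution via the FFT, kernel anchored at its top-left corner.
    spec = np.fft.rfft2(img) * np.fft.rfft2(ker, img.shape)
    return np.fft.irfft2(spec, img.shape)

rng = np.random.default_rng(0)
img = rng.random((6, 7))  # odd width on purpose
ker = np.array([[0.25, 0, -0.25], [0.5, 0, -0.5], [0.25, 0, -0.25]])

slow = naive_convolve(img, ker)
fast = fft_convolve(img, ker)

# Undo the one-pixel offset, then compare away from the borders and overall.
aligned = np.roll(fast, shift=(-1, -1), axis=(0, 1))
interior_match = np.allclose(slow[1:-1, 1:-1], aligned[1:-1, 1:-1])
border_match = np.allclose(slow, aligned)
print(interior_match, border_match)  # True False
```

Away from the borders the two results agree; along the borders they differ because the naive version pads with zeros while the FFT version wraps around to the opposite side of the image.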
Cool! Now we have an efficient way to calculate the horizontal edges and the vertical edges, i.e. the x and the y components. We can thus calculate the edges in the image using,
def getEdge(greyImg):
    # Sobel-style kernels: sX differentiates across rows, sY across columns.
    sX = np.array([[0.25,0.5,0.25], [0,0,0], [-0.25,-0.5,-0.25]])
    sY = np.array([[0.25,0,-0.25], [0.5,0,-0.5], [0.25,0,-0.25]])
    #edgeH = naiveConvolve(greyImg,sX)
    #edgeV = naiveConvolve(greyImg,sY)
    edgeH = fastConvolve(greyImg,sX)  # horizontal edges
    edgeV = fastConvolve(greyImg,sY)  # vertical edges
    # The gradient magnitude combines both components.
    return np.sqrt(np.square(edgeH) + np.square(edgeV))
Awesome, we have completed the first part. The edges are the interesting parts of the picture, and the black parts are what we can remove without worrying.
Identifying Pixel Paths:
For a continuous path, we can define a rule that each pixel connects only to the 3 nearest pixels in the row below it. This gives a continuous path of pixels from top to bottom. So our subproblem becomes a basic pathfinding problem in which we have to minimize the cost. As edges have a higher magnitude, if we keep removing the pixel paths with the lowest cost, we avoid the edges.
Let’s define a function “cost” which takes a pixel and calculates the cost of the minimum-cost pixel path from there to the bottom of the picture. We have the following observations,
1. In the bottom-most row (i.e. i = r-1), cost(i, j) = edge(i, j)
2. For any intermediate pixel, cost(i, j) = edge(i, j) + min(cost(i+1, j-1), cost(i+1, j), cost(i+1, j+1)), where neighbours that fall outside the image are ignored.
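def findCostArr(edgeImg):
    r,c = edgeImg.shape
    cost = np.zeros(edgeImg.shape)
    # Bottom row: the cost is just the edge value.
    cost[r-1,:] = edgeImg[r-1,:]
    # Fill the table from the second-to-last row upwards.
    for i in range(r-2,-1,-1):
        for j in range(c):
            # Clamp the 3-pixel window below (i,j) to the image bounds.
            c1,c2 = max(j-1,0),min(c,j+2)
            cost[i][j] = edgeImg[i][j] + cost[i+1,c1:c2].min()
    return cost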
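The inner loop over columns can be replaced by whole-row operations. The following is a sketch of one possible variant, not the article's implementation; the names find_cost_arr_loop and find_cost_arr_rows are hypothetical, and the first function only restates the recurrence above so the two can be compared.

```python
import numpy as np

def find_cost_arr_loop(edge_img):
    # Reference version: the same recurrence, one pixel at a time.
    r, c = edge_img.shape
    cost = np.zeros(edge_img.shape)
    cost[r - 1] = edge_img[r - 1]
    for i in range(r - 2, -1, -1):
        for j in range(c):
            cost[i, j] = edge_img[i, j] + cost[i + 1, max(j - 1, 0):min(c, j + 2)].min()
    return cost

def find_cost_arr_rows(edge_img):
    # Hypothetical variant: handle a whole row per step instead of one pixel.
    r, c = edge_img.shape
    cost = np.zeros(edge_img.shape)
    cost[r - 1] = edge_img[r - 1]
    for i in range(r - 2, -1, -1):
        below = np.pad(cost[i + 1], 1, mode="constant", constant_values=np.inf)
        # below[:-2], below[1:-1] and below[2:] are the lower-left, lower and
        # lower-right neighbours; the inf padding excludes out-of-image ones.
        best = np.minimum(np.minimum(below[:-2], below[1:-1]), below[2:])
        cost[i] = edge_img[i] + best
    return cost

rng = np.random.default_rng(1)
edge = rng.random((8, 10))
same = np.allclose(find_cost_arr_loop(edge), find_cost_arr_rows(edge))
print(same)
```

Rows still have to be processed one after another, since each row depends on the one below it, but the work within a row is handed to NumPy.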
We can see triangular shapes in the plot. They denote points of no return, i.e. if you reach that pixel, there is no path to the bottom that doesn’t pass through an edge. And that is what we are trying to avoid.
From the cost matrix, finding a pixel path is easy with a greedy algorithm: find the min-cost pixel in the top row, then move downwards, at each step selecting the pixel with the least cost among the pixels connected to it.
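def findSeam(cost):
    r,c = cost.shape
    path = []
    # Start from the cheapest pixel in the top row.
    j = cost[0].argmin()
    path.append(j)
    for i in range(r-1):
        # Among the (up to) 3 connected pixels in the next row, pick the cheapest.
        c1,c2 = max(j-1,0),min(c,j+2)
        j = c1+cost[i+1,c1:c2].argmin()
        path.append(j)
    return path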
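A small worked example may help to see the cost table and the seam together. The 3x4 edge matrix below is made up for illustration, and the snake_case function names are assumptions; the logic follows findCostArr and findSeam.

```python
import numpy as np

def find_cost_arr(edge_img):
    r, c = edge_img.shape
    cost = np.zeros(edge_img.shape)
    cost[r - 1] = edge_img[r - 1]
    for i in range(r - 2, -1, -1):
        for j in range(c):
            cost[i, j] = edge_img[i, j] + cost[i + 1, max(j - 1, 0):min(c, j + 2)].min()
    return cost

def find_seam(cost):
    r, c = cost.shape
    j = int(cost[0].argmin())
    path = [j]
    for i in range(r - 1):
        c1, c2 = max(j - 1, 0), min(c, j + 2)
        j = c1 + int(cost[i + 1, c1:c2].argmin())
        path.append(j)
    return path

edge = np.array([[3, 1, 4, 2],
                 [5, 9, 2, 6],
                 [5, 3, 1, 8]])

cost = find_cost_arr(edge)
path = find_seam(cost)
print(cost.tolist())  # [[11.0, 4.0, 7.0, 5.0], [8.0, 10.0, 3.0, 7.0], [5.0, 3.0, 1.0, 8.0]]
print(path)           # [1, 2, 2]
```

The seam visits the edge values 1, 2 and 1, whose sum of 4 equals the smallest entry in the top row of the cost table.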
To remove the seam defined by the path, we just have to go through each row and drop the column given by the path array.
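def removeSeam(img,path):
    r,c,_ = img.shape
    newImg = np.zeros((r,c,3))
    for i,j in enumerate(path):
        # Copy everything left of the seam, then shift the rest one column left.
        newImg[i,0:j,:] = img[i,0:j,:]
        newImg[i,j:c-1,:] = img[i,j+1:c,:]
    # Drop the now-unused last column.
    return newImg[:,:-1,:].astype(np.int32)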
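The same removal can also be expressed with a boolean mask instead of a row-by-row copy. This is a sketch of an alternative, with the hypothetical name remove_seam_mask; unlike removeSeam, it keeps the input dtype rather than casting to int32.

```python
import numpy as np

def remove_seam_mask(img, path):
    # Hypothetical alternative to removeSeam: mark the seam pixel in each row
    # as False, keep everything else, and reshape to one column fewer.
    r, c, ch = img.shape
    keep = np.ones((r, c), dtype=bool)
    keep[np.arange(r), path] = False
    return img[keep].reshape(r, c - 1, ch)

# A tiny 2x4 "image" whose values make the removed columns easy to spot.
img = np.arange(2 * 4 * 3).reshape(2, 4, 3)
out = remove_seam_mask(img, [1, 3])
print(out.shape)            # (2, 3, 3)
print(out[:, :, 0].tolist())  # [[0, 6, 9], [12, 15, 18]]
```

Column 1 is dropped from the first row and column 3 from the second, matching the path [1, 3].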
And that is it. Here, I have pre-computed 100 seam carving operations.
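Repeating the operation could look roughly like the sketch below. It is a compact, self-contained stand-in rather than the notebook's code: all function names are hypothetical, plain central differences replace the Sobel convolution to keep it short, and it assumes the edge map is recomputed after every removed seam.

```python
import numpy as np

def to_grey(img):
    return img[..., :3] @ np.array([0.2989, 0.5870, 0.1140])

def edge_magnitude(grey):
    # Central differences stand in for the Sobel step (an assumption made to
    # keep this sketch short); borders use replicated edge pixels.
    p = np.pad(grey, 1, mode="edge")
    dx = (p[1:-1, 2:] - p[1:-1, :-2]) / 2
    dy = (p[2:, 1:-1] - p[:-2, 1:-1]) / 2
    return np.sqrt(dx ** 2 + dy ** 2)

def cost_table(edge):
    cost = edge.astype(float).copy()
    for i in range(edge.shape[0] - 2, -1, -1):
        below = np.pad(cost[i + 1], 1, mode="constant", constant_values=np.inf)
        cost[i] += np.minimum(np.minimum(below[:-2], below[1:-1]), below[2:])
    return cost

def find_seam(cost):
    r, c = cost.shape
    j = int(cost[0].argmin())
    path = [j]
    for i in range(r - 1):
        c1 = max(j - 1, 0)
        j = c1 + int(cost[i + 1, c1:min(c, j + 2)].argmin())
        path.append(j)
    return path

def remove_seam(img, path):
    r, c, ch = img.shape
    keep = np.ones((r, c), dtype=bool)
    keep[np.arange(r), path] = False
    return img[keep].reshape(r, c - 1, ch)

def carve(img, n_seams):
    # Recompute the edge map each time, since every removal changes the image.
    for _ in range(n_seams):
        cost = cost_table(edge_magnitude(to_grey(img)))
        img = remove_seam(img, find_seam(cost))
    return img

rng = np.random.default_rng(0)
picture = rng.integers(0, 256, size=(20, 30, 3))
carved = carve(picture, 5)
print(carved.shape)  # (20, 25, 3)
```

Each pass removes exactly one pixel per row, so carving n seams reduces the width by n while the height stays the same.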
We can see how the objects in the picture have come really close to each other. We have successfully decreased the size of the image using the seam carving algorithm without causing any distortion to the objects. I have attached the link to the notebook with the full code. Interested readers can take a look here.
Overall, seam carving is a fun algorithm to play with. It does have its caveats, as it will fail if the image provided has too many details or too many edges in it. It is always amusing to try the algorithm on different pictures to see what the final result is. If you have any doubts or suggestions, kindly leave them in the responses. Thank you for reading until the end.
Originally published at https://www.analyticsvidhya.com on September 15, 2020.