Stereo Vision Project in MATLAB

3D Vision Simulation (Stereo Vision)

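Since starting work, I have felt much of my earlier work slowly fading from memory, so I wanted to take some time to go over a complete computer vision project from my master's studies online, as a way to review and record the effort I put into it back then. The project uses MATLAB to simulate human three-dimensional vision. It contains a good deal of original, copyrighted material, and reproduction without permission is strictly prohibited. I hope to learn and exchange ideas with everyone in the computing field; corrections are welcome. Thank you!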

Project Overview

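Most animals have two eyes, which gives them a wider field of view. Humans have a maximum horizontal field of view of about 190 degrees, of which roughly 120 degrees is seen by both eyes. The more important benefit is therefore this binocular overlap, which lets us perceive a single three-dimensional image of the surroundings, known as a stereo image.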

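In computer stereo vision, cameras work like human eyes, capturing two different views of the same scene. Relative depth information can be computed in the form of a disparity map, which encodes the difference in horizontal coordinates of corresponding image points. The values in the disparity map are inversely proportional to the scene depth at the corresponding pixel location. Comparing images, however, is a challenge for computers. For example, "smoothness" is a measure that can be used to compare images: assuming that objects are more likely to be coloured with a small number of colours, two pixels with the same colour most likely belong to the same object. Because we live in a world with this property, the human visual system seems to have evolved to use smoothness to interpret the world. This project involves understanding and implementing cross-correlation and template matching (Gaussian fitting) in a noisy environment, calibrating images, and exploring how to apply the technique in practice.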

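The project covers understanding and implementing the cross-correlation function, pattern matching in noise (Gaussian fitting), and calibrating images to build a model that simulates 3D localisation in a real environment. MATLAB is the implementation tool, with good built-in cross-correlation functions and good plotting capabilities. The first two chapters discuss cross-correlation; the algorithms are then extended to build a calibration model that mimics human depth perception. The system is tested on its ability to accurately analyse and represent depth information from stereo image pairs in a resource library provided by the University of Melbourne, then on real-world stereo photos taken by people, and finally optimised to improve accuracy as far as possible.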

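I have not yet had time to prepare a Chinese version, so for now the original report is shared below for study and reference; a translation will follow later.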

Introduction

Most animals have two eyes, which give them a much wider field of view. Humans have a maximum horizontal field of view of approximately 190 degrees. About 120 of these degrees are seen by both eyes, and this overlap is the more important benefit: binocular vision lets us perceive a single three-dimensional image of the surroundings, which is referred to as stereopsis [4].

In computer stereo vision, cameras work like human eyes, capturing two different views of the same scene. By comparing the two images, relative depth information can be calculated in the form of a disparity map, which encodes the difference in horizontal coordinates of corresponding image points. The values in the disparity map are inversely proportional to the scene depth at the corresponding pixel location. Comparison, however, is a challenge for computers, and several techniques and assumptions are used to address the computational difficulties. For example, “smoothness” is a measure that can be used to compare images: the assumption is that objects are more likely to be coloured with a small number of colours, so if we detect two pixels with the same colour, they most likely belong to the same object. As we live in a world with this property, the human visual system seems to have evolved to use smoothness to interpret the world as well.

This project involves understanding and implementing cross-correlation and template matching (Gaussian fitting) in a noisy environment, calibrating images, and exploring how to bring the technique into practice. MATLAB is a good tool for the implementation, with good built-in functions for cross-correlation and good plotting ability. We discuss cross-correlation in the first two chapters, and then utilise and extend the algorithms to build a calibration model that mimics human depth perception. The system is first tested on its ability to accurately analyse and represent depth information from stereo image pairs in the resource library provided by the University of Melbourne, then on real-world stereo photos taken by people, and is finally optimised to improve accuracy as far as possible. More details come in the following chapters.

Cross-Correlations in one dimension

A good way to begin is to get familiar with cross-correlation. In signal processing, cross-correlation is a measure of the similarity of two series as a function of the displacement of one relative to the other. It is also known as a sliding dot product or sliding inner product, and is commonly used to search for a shorter template in a longer signal. In our project, it is used to find useful information in visual images.

Cross-Correlation and Normalisation in 1D

The equation of one-dimensional spatial cross-correlation is shown below, where f and g are the signals being compared.
[Equation: one-dimensional cross-correlation of f and g]

In practice, the more general approach is to implement a normalised cross-correlation:
[Equation: normalised one-dimensional cross-correlation]
Suppose we have two vectors. The idea is to slide one along the other from its beginning to its end. The peak marks where the two vectors are most similar, which is the location of the template we are looking for. Although it would be fast and convenient to use the built-in cross-correlation function *xcorr*, implementing the function manually is very helpful for understanding. The benefit of normalisation is that the result no longer depends on the amplitude of the signals: the unnormalised and normalised curves have the same shape, but the latter is scaled so that its magnitude is at most 1 (between 0 and 1 in the figures below).
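The equation images are not reproduced in this copy. For reference, a common textbook form for real-valued signals is sketched below; it is an assumption about the notation, not a transcription of the original figures. Here $k$ is the lag, and the normalised form divides by the Euclidean norms of the two vectors.

```latex
r_{fg}[k] = \sum_{n} f[n]\, g[n+k],
\qquad
\hat{r}_{fg}[k] = \frac{\sum_{n} f[n]\, g[n+k]}
                       {\sqrt{\sum_{n} f[n]^{2}}\;\sqrt{\sum_{n} g[n]^{2}}}
```

By the Cauchy-Schwarz inequality, $|\hat{r}_{fg}[k]| \le 1$, with equality only when the overlapping parts of the two signals are proportional.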

It can be implemented in MATLAB by taking two vectors of the same size and passing one over the other. The steps are:

  1. Generate a random vector.
  2. Generate a second random vector of the same size.
  3. Pad the second vector with zero vectors of the same size at the beginning and the end, so that there is no correlation where the two vectors do not overlap at either end.
  4. Initialise the cross-correlation vector, whose size is the length difference of the two vectors plus 1.
  5. Use a for-loop to pass the shorter vector over the longer one, which gives the cross-correlation vector.
  6. For the normalised version, first divide each vector by its own Euclidean norm, and then repeat the same process as for the cross-correlation.
  7. For the code, see appendix A.1.
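The steps above can be sketched as follows. This is a minimal illustration rather than the code from appendix A.1; the variable names are hypothetical, and the random values differ from run to run.

```matlab
% Sketch of the sliding dot product; not the appendix code.
n = 3;
f = rand(1, n);                          % step 1: first random vector
g = rand(1, n);                          % step 2: second random vector

gPad    = [zeros(1, n), g, zeros(1, n)]; % step 3: zero-pad both ends
numLags = numel(gPad) - numel(f) + 1;    % step 4: length difference + 1
r       = zeros(1, numLags);

for k = 1:numLags                        % step 5: slide f over gPad
    r(k) = sum(f .* gPad(k:k + n - 1));
end

% step 6: divide each vector by its Euclidean norm, then slide again
fUnit    = f / norm(f);
gUnitPad = [zeros(1, n), g / norm(g), zeros(1, n)];
rNorm    = zeros(1, numLags);
for k = 1:numLags
    rNorm(k) = sum(fUnit .* gUnitPad(k:k + n - 1));
end
```

With vectors of size 3 the result has 7 entries. The first and last entries are zero because f then overlaps only padding, and entry 4 is the zero-lag value, equal to the dot product of f and g.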

For example, we can generate vectors of size 3 and plot the cross-correlation vector (figure 2.1) and the normalised cross-correlation vector (figure 2.2).

[Figures 2.1 and 2.2: cross-correlation vector and normalised cross-correlation vector]

In the figures, the x-axis is the index and the y-axis is the cross-correlation value. The curve in figure 2.1 peaks at about 1.4, meaning that the cross-correlation of the two vectors is about 1.4 when one signal is slid to index 4 of the other. Similarly, the curve in figure 2.2 peaks near 1.0, as the cross-correlation values are normalised to between zero and one.

Finding signal offset

Cross-correlation can be used to solve more complicated real-world problems. Suppose there are two signal files that come from the same source and are offset by some time. Given that the signal propagates at 333 m/s and the sample rate is 44100 Hz, we can accurately find the offset time and hence the distance x between the two sensors (figure 2.3).
[Figure 2.3: the two sensors and the distance x between them]

After loading the signals with the sample rate, it is straightforward to build the time vector and plot the signals (figure 2.4). We can observe that the signals have the same shape with a time offset.
[Figure 2.4: the two signals plotted against time]

As figure 2.5 shows, the x-axis (index) carries no useful meaning here, so it would be difficult to solve the problem with the process from section 2.1 as it stands.
[Figure 2.5: cross-correlation plotted against index]
The tricky part is that the cross-correlation function from section 2.1 becomes unwieldy for the large amount of data in a real signal, so it needs a simpler structure built around a for-loop. In addition, the correlation returned by the *xcorr* function does not contain the leading and trailing zeros that our implementation produces, since they carry no information. Most importantly, *xcorr* can also return a lag vector for finding the offset. The lag vector is easy to construct: it has the same length as the correlation vector, with zero at the centre and equally many negative and positive values on either side. The improved steps are:

  1. Load the two files as vectors.
  2. Create zero vectors of the same size.
  3. Implement the cross-correlation function as in section 2.1 and compute the cross-correlation vector.
  4. Trim the first and last elements (0) of the cross-correlation vector.
  5. Generate the lag vector. The cross-correlation of the two measurements is maximal at the lag equal to the delay.
  6. Find the index of the maximum of the cross-correlation vector and read the lag vector at that index; the offset time is calculated from this lag.
  7. For the code, see appendix A.2.

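The new figure of the cross-correlation against the lag vector becomes:
[Figure 2.6: cross-correlation plotted against lag]
The x-axis is now the lag. The lag with the maximum cross-correlation is -50082, so the results can be computed as follows:
offset time = lag / sample rate = -50082 / 44100 Hz ≈ -1.1356 s
distance = speed × offset time = 333 m/s × (-1.1356 s) ≈ -378.1702 m
So the offset time is -1.1356 s and the distance is -378.1702 m. The negative sign reflects which of the two signals arrives first; the magnitude gives the separation between the sensors.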
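The lag-to-distance calculation can be illustrated on synthetic data. The sketch below is not the code from appendix A.2: the source signal and the 150-sample delay are hypothetical stand-ins for the two signal files, while the sample rate and propagation speed are taken from the text. It pads with N - 1 zeros so that no trimming step is needed.

```matlab
% Sketch: recover a known delay from the peak of a hand-rolled cross-correlation.
fs = 44100;                      % sample rate in Hz (from the text)
v  = 333;                        % propagation speed in m/s (from the text)

src   = randn(1, 2000);          % hypothetical source signal
delay = 150;                     % hypothetical delay in samples
s1 = [src, zeros(1, delay)];     % sensor 1 receives the source first
s2 = [zeros(1, delay), src];     % sensor 2 receives it 'delay' samples later

N     = numel(s1);
lags  = -(N - 1):(N - 1);        % lag vector with zero at the centre
s2Pad = [zeros(1, N - 1), s2, zeros(1, N - 1)];
r     = zeros(1, numel(lags));
for k = 1:numel(lags)
    % r at lags(k) is the sum over i of s1(i) * s2(i + lags(k))
    r(k) = sum(s1 .* s2Pad(k:k + N - 1));
end

[~, iMax]  = max(r);
lagAtPeak  = lags(iMax);         % lag (in samples) of the best alignment
offsetTime = lagAtPeak / fs;     % seconds
distance   = v * offsetTime;     % metres
```

With this convention the recovered lag is positive when the second signal is the delayed one. Swapping the roles of the two signals flips the sign, which is why a negative lag such as the one reported above is equally valid.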

Correlation using FFTs

A fast Fourier transform (FFT) is an algorithm that takes a signal sampled over a period of time (or space) and divides it into its frequency components [2]. These components are single sinusoidal oscillations at distinct frequencies, each with its own amplitude and phase. Fast Fourier transforms are widely used in engineering, science, and mathematics.

In MATLAB, *fft* is well suited to spectral analysis because it is fast and convenient, and it can also be used to implement cross-correlation in place of the summation of products used in the previous sections. Cross-correlation can be achieved by taking the Fourier transforms of the signals, multiplying them, and taking the inverse Fourier transform, as below:
r = IFFT( FFT(f) · FFT(g)* )
where * denotes the complex conjugate. The process involves:

  1. Use *fft* to get the n-point discrete Fourier transform (DFT) of each vector, where n is the sum of the lengths minus 1.
  2. Use *ifft* to compute the inverse fast Fourier transform of the product of the first DFT and the complex conjugate of the second DFT.
  3. Use *fftshift* to shift the zero-lag element of the result to the centre, giving the cross-correlation, r.
  4. The other steps are the same as in section 2.2.
  5. For the code, see appendix A.3.
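A minimal sketch of these steps is shown below. It is not the code from appendix A.3; the vector names and sizes are hypothetical, and it assumes two vectors of equal length, which is what makes *fftshift* line the result up with a symmetric lag vector.

```matlab
% Sketch: cross-correlation through the FFT for two equal-length vectors.
f = randn(1, 64);                       % hypothetical first signal
g = randn(1, 64);                       % hypothetical second signal

n = numel(f) + numel(g) - 1;            % n-point DFT: sum of lengths minus 1
F = fft(f, n);                          % fft zero-pads each vector to length n
G = fft(g, n);

% Inverse transform of F times conj(G); real() removes rounding residue,
% and fftshift moves the zero-lag element to the centre.
r    = fftshift(real(ifft(F .* conj(G))));
lags = -(numel(g) - 1):(numel(f) - 1);  % matching lag vector

[~, iMax] = max(r);
lagAtPeak = lags(iMax);                 % lag of the strongest match
```

Each entry of r equals the sum over m of f(m + lag) * g(m), the same quantity as the sliding sum of products, but computed in O(n log n) operations instead of O(n^2).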

The algorithm can be tested on the same problem as in section 2.2. We load and re-analyse the signals, plot the cross-correlation, and compute the offset. The cross-correlation computed by FFT has the same maximum value as in figure 2.6, at the same index (with positive sign).
[Figure: cross-correlation computed using the FFT]

MATLAB's tic and toc functions can be used to measure the performance of the algorithm. The cross-correlation in section 2.2 took about 265.79 seconds to run, whereas the version in section 2.3 took just 0.15 seconds. Correlation using the FFT is clearly much faster than the summation of products.

Pattern Finder

Cross-correlation is used not only to find signal offsets but, more commonly, for pattern matching in many kinds of data, such as signals and pictures, as mentioned at the beginning. We first apply it to finding all occurrences of a particular element in a signal, and then turn to two-dimensional pictures in the next chapter.

Because the FFT implementation from section 2.3 is fast and accurate, we use it for a pattern search that finds all the drum sounds in a song file named ‘imperial_march.wav’, downloaded from the Internet [1]. The process involves:

  1. Load the song using *audioread* to get the signal vector and the sample rate.
  2. Listen for and capture a drum sound as the template. Append zeros to the end of the shorter vector (the drum template) so that it has the same size as the original vector (the song).
  3. Compute the cross-correlation, r, using the FFT, and generate the lag vector.
  4. The tricky part is choosing a threshold by inspecting the values of r. In figure 2.8, the relatively high points lie above 15, so we pick 16 as the threshold for finding occurrences. If the absolute value of the cross-correlation is greater than 16, add its index to the index vector.
  5. Find each offset time by iterating over the index vector and dividing by the sample rate. At each of these time points, the frequency (occurrence count) is set to 1.
  6. Plot the time points with frequency 1 as red asterisks on the time series of the song.
  7. For the code, see appendix A.4.
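The thresholding idea can be sketched on synthetic data. Everything below is an illustrative assumption rather than the code from appendix A.4: a 13-sample Barker code stands in for the drum template, low-level noise stands in for the song, and the threshold is taken as half of the peak instead of the fixed value 16 read off figure 2.8.

```matlab
% Sketch: find every occurrence of a template in a longer signal.
fs       = 1000;                                  % hypothetical sample rate (Hz)
template = [1 1 1 1 1 -1 -1 1 1 -1 1 -1 1];       % Barker-13, stand-in for the drum
song     = 0.05 * randn(1, 3000);                 % stand-in for the song
starts   = [401, 1501, 2601];                     % hypothetical true onsets (samples)
for s = starts
    idx = s:s + numel(template) - 1;
    song(idx) = song(idx) + template;             % bury a copy of the template
end

N    = numel(song);
tPad = [template, zeros(1, N - numel(template))]; % pad template to song length
n    = 2 * N - 1;
r    = fftshift(real(ifft(fft(song, n) .* conj(fft(tPad, n)))));
lags = -(N - 1):(N - 1);

threshold = 0.5 * max(abs(r));                    % assumption: half of the peak
hits      = lags(abs(r) > threshold);             % lags where the template matches
hitTimes  = hits / fs;                            % onset times in seconds
```

Each hit is a zero-based sample offset, so an onset at sample 401 is reported as lag 400, or 0.4 s at this sample rate. On real audio the peaks are broader than in this idealised case, so neighbouring samples above the threshold may need to be merged into a single detection.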

[Figure 2.8: cross-correlation r of the song and the drum template]
In figure 2.9, the x-axis is time and the y-axis is the frequency of the drum. The results fit the drum sounds in the song well, which shows that cross-correlation is accurate at pattern finding in a one-dimensional signal.
[Figure 2.9: detected drum occurrences over the time series of the song]
Next, we expand the approach to two-dimensional matrices.

Cross Correlation in two dimensions

To achieve the stereo vision system, normalised cross-correlation in 2D is the most important tool for finding information in pictures. MATLAB has the built-in functions *xcorr2* and *normxcorr2*, but we first implement the function ourselves for better understanding.

Normalised Cross-Correlation in 2D

Suppose there are two matrices, t (template) and S (search region), where S is always larger than t. The idea is to use two nested for-loops to “lag” t over S, computing the cross-correlation, r, for each “lag”. Assume that matrix t has dimensions (Mt, Nt) and matrix S has dimensions (MS, NS). When the block calculates the full output size, the equation for the two-dimensional discrete cross-correlation is:
[Equation: two-dimensional discrete cross-correlation]
A few points need attention. First, using images is more effective than creating matrices manually. Second, each pixel in an image has three values representing R, G and B; these are averaged into a single greyscale value, reducing three channels to one. Finally, to avoid out-of-bounds errors, we first create a bigger search region: a zero matrix that extends the original search region by the size of the template on each side, with the original search region placed at its centre. In the example of matrix 3.1.1, m equals the original width of the search region plus 2 times the template’s width, and n is defined in the same way for the length.
[Matrix 3.1.1: the zero-padded search region of size m × n]
The process employed to complete the analysis was:

  1. Images are just matrices of pixel values, so we pick two simple images instead of generating matrices.
  2. Load the two images using *imread* and convert the RGB images to greyscale.
  3. Subtract the mean value so that there are roughly equal numbers of negative and positive values.
  4. Make a zero matrix and put the search region in its centre, giving a new search region matrix.
  5. Initialise the normalised cross-correlation matrix with a size equal to the sum of the sizes of the two matrices.
  6. Use nested for-loops to "lag" the template over the search region, each time taking the corresponding region from the new search region of step 4.
  7. Compute the cross-correlation, r, for each "lag". The equation uses the complex conjugate of the template.
  8. Compute the normalised cross-correlation by dividing each r by the square root of the product of two sums of squares: the dot product of the region with itself and of the template with itself.
  9. Trim the useless values (NaN) surrounding the matrix.
  10. For the code, see appendix A.6.
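The loop structure described above might look like the sketch below. It is not the code from appendix A.6: a random matrix stands in for the greyscale image, the template is cut from it at a hypothetical known position, and only the first row and column are trimmed because those are the only all-zero regions this particular loop range produces.

```matlab
% Sketch: normalised 2D cross-correlation with nested loops and zero padding.
S = rand(20, 24);                  % stand-in for the greyscale search image
t = S(6:11, 9:16);                 % template cut from rows 6-11, columns 9-16

S0 = S - mean(S(:));               % subtract the mean from each matrix
t0 = t - mean(t(:));
[Mt, Nt] = size(t0);
[Ms, Ns] = size(S0);

% Zero matrix with the search region placed in the centre.
P = zeros(Ms + 2 * Mt, Ns + 2 * Nt);
P(Mt + 1:Mt + Ms, Nt + 1:Nt + Ns) = S0;

R = zeros(Ms + Mt, Ns + Nt);       % one value per lag
tEnergy = sum(t0(:) .^ 2);
for i = 1:Ms + Mt
    for j = 1:Ns + Nt
        region  = P(i:i + Mt - 1, j:j + Nt - 1);
        num     = sum(sum(region .* conj(t0)));
        den     = sqrt(sum(region(:) .^ 2) * tEnergy);
        R(i, j) = num / den;       % 0/0 gives NaN where the region is all zeros
    end
end
R = R(2:end, 2:end);               % trim the NaN first row and column

[~, idx]     = max(R(:));
[rowP, colP] = ind2sub(size(R), idx);   % peak = lower-right corner of the match
```

After trimming, R has the full cross-correlation size of (Ms + Mt - 1) by (Ns + Nt - 1), and the peak lands at row 11, column 16, the lower-right corner of where the template was cut. Note that this follows the steps in the text and subtracts global means only; it is not identical to what *normxcorr2* computes.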

Finding the Rocket Man

Now we can use the two-dimensional cross-correlation function to find an object in a picture, which is very similar to finding the drum sounds in the song. Here the template is a rocket man (figure 3.1) captured from an image of a maze (figure 3.2).
[Figure 3.1: the rocket man template; figure 3.2: the maze image]
The algorithm involves:

  1. Load the two images using *imread* and convert the RGB images to greyscale.
  2. Use the same process as in section 3.1.
  3. Flatten the 2D cross-correlation to 1D, which makes it easy to get the coordinate of the maximum value. Draw a red circle at the maximum value, see figure 3.3.
  4. The maximum of the cross-correlation corresponds to the estimated location of the lower-right corner of the matched section. Use *ind2sub* to convert the one-dimensional location of the maximum to a two-dimensional coordinate.
  5. Draw the original image and put a red star at the centre of the found section, see figure 3.4.
  6. The run time is about 191.10 seconds, compared with just 6.08 seconds for *xcorr2*.
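The conversion from the correlation peak to the position of the matched section can be sketched as follows. This is an illustration under stated assumptions, not the author's code: a random matrix stands in for the maze image, the template is cut from a hypothetical known position, and base-MATLAB *conv2* with a 180-degree rotated template is used to obtain the full 2D cross-correlation.

```matlab
% Sketch: locate a template from the peak of the full 2D cross-correlation.
S = rand(40, 50);                      % stand-in for the greyscale maze image
t = S(11:20, 21:32);                   % stand-in for the rocket-man template
S0 = S - mean(S(:));                   % subtract the means, as in section 3.1
t0 = t - mean(t(:));

% Convolving with the template rotated by 180 degrees gives the full
% 2D cross-correlation of S0 and t0.
R = conv2(S0, rot90(conj(t0), 2));

[~, idx]           = max(R(:));        % flatten to 1D and take the peak
[rowPeak, colPeak] = ind2sub(size(R), idx);

% The peak marks the lower-right corner of the matched section in S.
[Mt, Nt]  = size(t);
rowTop    = rowPeak - Mt + 1;          % top-left corner of the match
colLeft   = colPeak - Nt + 1;
rowCentre = rowTop + (Mt - 1) / 2;     % centre, where a marker could be drawn
colCentre = colLeft + (Nt - 1) / 2;
```

Here the match is recovered at top-left corner (11, 21), the position the template was cut from. The correlation in this sketch is unnormalised, which works for this zero-mean synthetic image; on real photographs a bright region can outscore the true match, which is the reason for the normalisation step in section 3.1.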