We use a Nikon D70 to capture the images. The Nikon D70 is a high-end professional camera whose intrinsic parameters are all controllable. A tripod is used during capture. Nikon also provides a remote control program, Nikon Capture 4.0 [Nikon]. The original captured images are at least 1504x1000 pixels, which is too large for our application, so we scale them down using Batch Thumbs in our experiments.
Our implementation is based on [Lowe04]. Some algorithm details differ from those mentioned in class and in the lecture slides; they are described below.
Gaussian Map and Difference of Gaussian
In the SIFT algorithm, the Gaussian images in neighboring layers differ in scale by a factor k. The value of k mentioned in class is
but the value of k used in the reference code is
So the actual procedure for generating the DoGs is as follows,
The k value used to generate the Gaussian images has a significant impact on the number of features found. Using the k value of the reference MATLAB code yields almost 4 times as many features.
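To illustrate how the factor k shapes the Gaussian stack, the sketch below lists the blur levels of one octave and the extra blur needed between neighboring layers. The choice k = 2^(1/s), with s intervals per octave and a base sigma of 1.6, follows [Lowe04]. It is an assumption here and is not necessarily either of the two k values compared above. The function names are hypothetical.

```python
import math

def gaussian_sigmas(sigma0=1.6, intervals=3):
    """Blur levels for one octave, assuming k = 2**(1/intervals) as in [Lowe04]."""
    k = 2.0 ** (1.0 / intervals)
    # intervals + 3 Gaussian images give intervals + 2 DoG layers,
    # so extrema can be searched in the `intervals` middle DoG layers.
    return [sigma0 * k ** i for i in range(intervals + 3)]

def incremental_sigmas(sigmas):
    """Extra blur needed to go from one layer to the next.

    Gaussian blurs compose in quadrature:
    sigma_next**2 = sigma_prev**2 + step**2.
    """
    return [math.sqrt(b * b - a * a) for a, b in zip(sigmas, sigmas[1:])]

sigmas = gaussian_sigmas()
steps = incremental_sigmas(sigmas)
dog_layers = len(sigmas) - 1  # each DoG layer is G(k * sigma) - G(sigma)
```

A smaller k gives more closely spaced layers per octave, which is one plausible reason why the feature count reacts so strongly to this value.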
The local adjustment method in [Lowe04] uses a 3D vector (x, y, sigma) to adjust the feature position; however, we found that using a 2D vector (x, y) performs better (see Figure). In addition, we consider the contrast threshold suggested by [Lowe04] too high. This issue is also mentioned in the reference code [SIFTTutorial] and in [AutoPano].
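A minimal sketch of the 2D adjustment, assuming the extremum sits at the center of a 3x3 patch of DoG values and that derivatives are taken by central finite differences. The function name and the patch layout are hypothetical, not taken from our code.

```python
def refine_xy(patch):
    """Sub-pixel offset (dx, dy) of an extremum at the center of a 3x3 DoG patch.

    patch[r][c] holds DoG values; the row index is y, the column index is x.
    Solves H * offset = -g, where g is the gradient and H the 2x2 Hessian.
    """
    gx = (patch[1][2] - patch[1][0]) / 2.0
    gy = (patch[2][1] - patch[0][1]) / 2.0
    hxx = patch[1][2] - 2.0 * patch[1][1] + patch[1][0]
    hyy = patch[2][1] - 2.0 * patch[1][1] + patch[0][1]
    hxy = (patch[2][2] - patch[2][0] - patch[0][2] + patch[0][0]) / 4.0
    det = hxx * hyy - hxy * hxy
    if abs(det) < 1e-12:
        return 0.0, 0.0  # degenerate Hessian: keep the integer position
    # Closed-form inverse of the 2x2 Hessian applied to -g.
    dx = -(hyy * gx - hxy * gy) / det
    dy = -(hxx * gy - hxy * gx) / det
    return dx, dy

# A quadratic bump whose true peak is at (0.2, -0.1) relative to the center.
patch = [[-(x - 0.2) ** 2 - (y + 0.1) ** 2 for x in (-1, 0, 1)] for y in (-1, 0, 1)]
dx, dy = refine_xy(patch)
```

Dropping sigma from the unknowns reduces the linear system from 3x3 to 2x2 and avoids finite differences across scale layers.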
We think the details of the feature description are not clear in [Lowe04]. The following details, in our view, need clarification.
After finding the main orientation of a feature, we must compute the histogram of the gradients in the window as the local description. In our implementation, as in the reference code, we rotate the window and resample the gradient, and then compute the angle and the magnitude. Note that resampling the pre-computed dx and dy, instead of the angle and the magnitude, gives better performance.
The local description is a 4x4 sample array, and each sample is an 8-dimensional vector: the histogram of the gradients of the pixels in a small 4x4-pixel window. To make the histograms robust, each pixel in the full 16x16 window is multiplied by a bilinear weight and added to the histograms of the nearest 4 samples of the 4x4 sample array. See Fig2.
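The bilinear accumulation described above can be sketched as follows. This is an illustration under assumptions, not our actual code: the window is taken to be already rotated to the main orientation, sample centers are placed at the centers of the 4x4-pixel blocks, and the orientation bin is chosen without interpolation. All names are hypothetical.

```python
import math

def accumulate(desc, px, py, angle, magnitude):
    """Add one pixel's gradient to the 4x4x8 descriptor `desc`.

    px, py are pixel indices in [0, 16) inside the rotated 16x16 window;
    angle is in radians, measured relative to the main orientation.
    """
    # Continuous sample coordinates: sample centers sit at 0, 1, 2, 3.
    cx = (px + 0.5) / 4.0 - 0.5
    cy = (py + 0.5) / 4.0 - 0.5
    x0, y0 = math.floor(cx), math.floor(cy)
    fx, fy = cx - x0, cy - y0
    # Bin that contains the angle (8 bins of 45 degrees each).
    b = int((angle % (2 * math.pi)) / (2 * math.pi) * 8) % 8
    for iy, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for ix, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= ix < 4 and 0 <= iy < 4:  # neighbors outside the grid are dropped
                desc[iy][ix][b] += wx * wy * magnitude

desc = [[[0.0] * 8 for _ in range(4)] for _ in range(4)]
accumulate(desc, px=7, py=7, angle=0.0, magnitude=1.0)
total = sum(v for row in desc for cell in row for v in cell)
```

For an interior pixel the four weights sum to one, so the gradient magnitude is spread over the neighboring histograms rather than lost or duplicated.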
We implement our own k-d tree following the descriptions in [Beis97] and [Brown03], with some modifications. In our implementation, the node-split criterion is max-spread instead of the max-variance criterion used in the paper. This modification greatly speeds up the process while the performance is almost the same. The original work does not give many details; here we build one tree for each image independently.
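As an illustration of the max-spread criterion, the sketch below builds a k-d tree that splits on the dimension with the largest range (max minus min) at the median point. The search shown is an exact nearest-neighbor search, which is a simplification: the approximate best-bin-first search of [Beis97] is not reproduced here. The dictionary-based node layout and the function names are hypothetical.

```python
def build(points, indices=None):
    """Build a k-d tree over `points` (a list of equal-length tuples)."""
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= 1:
        return {"leaf": indices}

    def spread(d):
        vals = [points[i][d] for i in indices]
        return max(vals) - min(vals)

    # Max-spread: one pass for min and max, no mean or variance needed.
    dim = max(range(len(points[0])), key=spread)
    indices = sorted(indices, key=lambda i: points[i][dim])
    mid = len(indices) // 2
    return {
        "dim": dim,
        "value": points[indices[mid]][dim],
        "left": build(points, indices[:mid]),
        "right": build(points, indices[mid:]),
    }

def nearest(node, points, q, best=None):
    """Exact nearest neighbor of q; returns (squared_distance, index)."""
    if "leaf" in node:
        for i in node["leaf"]:
            d = sum((a - b) ** 2 for a, b in zip(points[i], q))
            if best is None or d < best[0]:
                best = (d, i)
        return best
    diff = q[node["dim"]] - node["value"]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, points, q, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if best is None or diff * diff < best[0]:
        best = nearest(far, points, q, best)
    return best

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
tree = build(pts)
hit = nearest(tree, pts, (0.9, 0.2))
```

Real SIFT descriptors have 128 dimensions; the 2D points here only keep the example readable.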
In the current setting, each feature has at most one match in each image, and we determine its validity by examining the distance ratio between the two best matches. Each feature is matched to at most 4 features in 4 different images.
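The distance-ratio check can be sketched as below. The threshold of 0.8 is the value suggested in [Lowe04]; the value used in our program is not stated here, so it should be read as an assumption. The function name is hypothetical.

```python
def accept_match(best_dist, second_dist, max_ratio=0.8):
    """Accept the best match only if it is clearly closer than the runner-up.

    best_dist and second_dist are the descriptor distances to the two
    nearest candidates. max_ratio = 0.8 is assumed, following [Lowe04].
    """
    if second_dist <= 0.0:
        return False  # two identical candidates: the match is ambiguous
    return best_dist / second_dist < max_ratio

distinct = accept_match(0.3, 1.0)   # best match is much closer than the second
ambiguous = accept_match(0.9, 1.0)  # the two candidates are nearly equally close
```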
Below are some matching results where the image is rotated and scaled. We connect all matched features between the two images. Most of the connecting lines pass through a single point.
We implement a reduced version of the method in [Brown03]. We limit the variables in the image matching while preserving the validation of the matches; thus the input images can be out of order, and images that do not belong to the panorama are rejected.
After the feature matching pairs are determined, we apply RANSAC [Fischler81] to estimate the homography. We find that if RANSAC estimates only the translational motion, we can enlarge the threshold and also speed up the process. The full homography is then estimated from the inliers indicated by RANSAC.
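A minimal sketch of RANSAC restricted to translation, assuming matches are given as pairs of pixel coordinates. The threshold, the iteration count and the function name are illustrative assumptions rather than the settings of our program.

```python
import random

def ransac_translation(matches, threshold=3.0, iterations=200, seed=0):
    """matches: list of ((x1, y1), (x2, y2)) pairs.

    A single match is enough to hypothesize a translation, so each
    iteration samples one pair. Returns (translation, inlier_indices).
    """
    rng = random.Random(seed)
    best_t, best_inliers = (0.0, 0.0), []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if (ax + tx - bx) ** 2 + (ay + ty - by) ** 2 <= threshold ** 2
        ]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = (tx, ty), inliers
    return best_t, best_inliers

# Eight matches that agree on the translation (10, -5), plus two false matches.
pts = [(0, 0), (4, 1), (7, 3), (2, 8), (9, 9), (5, 6), (1, 4), (8, 2)]
matches = [((x, y), (x + 10, y - 5)) for x, y in pts]
matches += [((0, 0), (50, 50)), ((5, 5), (-30, 20))]
translation, inliers = ransac_translation(matches)
```

Because a translation hypothesis needs only one sample instead of the four required by a homography, far fewer iterations are needed to hit an all-inlier sample. The returned inlier indices are what a subsequent full homography fit would use.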
In [Brown03], a probabilistic measure is applied to determine whether an image belongs to the panorama. However, we find that SIFT is a very robust feature descriptor and false feature matches can be removed easily, as described in [Lowe04]. Furthermore, in our experiments, when two images do not actually match, the number of inliers after RANSAC is always fewer than 10. Therefore, we validate each image with a fixed threshold: if the number of inliers is below the threshold, we rule out the image.
We use the two-band blending method mentioned in [Lowe04] and [Burt83] to blend the images. According to [Lowe04], each image is separated into a high-band and a low-band image. The high band is blended by choosing the value with the maximum weight; the low band is blended by a weighted sum. Because the transform parameters are not integers, blending a warped image into the panorama would require resampling it. This makes the code more complicated, so we simply truncate the floating-point coordinates to integers. Thanks to the multi-band technique, the results are still good.
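The two-band rule can be sketched on 1D image rows as follows. For brevity the low band is obtained with a box blur, which is an assumption made for this example (a Gaussian low-pass filter is the more usual choice). The weights and function names are hypothetical.

```python
def box_blur(signal, radius):
    """Low-pass filter a 1D signal with a box kernel, clamping at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def two_band_blend(images, weights, radius=2):
    """Blend aligned 1D images; weights[k][i] is the weight of image k at pixel i."""
    lows = [box_blur(img, radius) for img in images]
    highs = [[v - l for v, l in zip(img, low)] for img, low in zip(images, lows)]
    result = []
    for i in range(len(images[0])):
        w = [wk[i] for wk in weights]
        total = sum(w)
        if total == 0:
            result.append(0.0)  # no image covers this pixel
            continue
        # Low band: weighted sum over all images.
        low = sum(wk * lk[i] for wk, lk in zip(w, lows)) / total
        # High band: taken from the image with the maximum weight.
        k_max = max(range(len(images)), key=lambda k: w[k])
        result.append(low + highs[k_max][i])
    return result

row = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
blended = two_band_blend([row, row], [[1.0] * 6, [0.5] * 6])
```

Averaging only the low band hides exposure differences smoothly, while taking the high band from a single image avoids the ghosting that averaging misaligned detail would cause.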
We shift every column of the panorama to compensate for the drift when blending [lec05_stitching]. This shift alone produces aliasing, so we interpolate the pixel values vertically. A result without drift compensation is shown below. We also find that this drift error is inevitable in the panorama mosaic because the coordinates are distorted by the cylindrical transformation. For example, two features lying on one horizontal line end up on a curve after the transformation.
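A sketch of the column shift with vertical linear interpolation. It assumes the accumulated drift is spread linearly over the width of the panorama, which is an assumption made for this example. The names and the sign convention (positive drift means the last column sits too low) are hypothetical.

```python
import math

def shift_column(column, shift):
    """Shift a column down by a fractional `shift`, interpolating linearly.

    Output pixel y reads from source position y - shift; positions outside
    the column contribute 0.0.
    """
    n = len(column)
    out = []
    for y in range(n):
        src = y - shift
        y0 = math.floor(src)
        f = src - y0
        a = column[y0] if 0 <= y0 < n else 0.0
        b = column[y0 + 1] if 0 <= y0 + 1 < n else 0.0
        out.append((1.0 - f) * a + f * b)
    return out

def compensate_drift(panorama, total_drift):
    """panorama[x] is one column; column x is moved up by total_drift * x / (width - 1)."""
    width = len(panorama)
    return [
        shift_column(col, -total_drift * x / max(width - 1, 1))
        for x, col in enumerate(panorama)
    ]

half_pixel = shift_column([0, 10, 20, 30], 0.5)
fixed = compensate_drift([[1, 2, 3], [1, 2, 3]], total_drift=1)
```

Rounding the shift to whole pixels would give a staircase along the panorama; the interpolation between the two nearest source rows is what removes that aliasing.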
We apply our implementation to both public test images and images we captured. Some of the data are screenshots taken in World of Warcraft. We notice that only with our own data do defects appear at the top or bottom of the panorama, and we think they come from several sources.
First, we did not apply radial distortion compensation, so the pictures are distorted around the corners. Second, we could not keep the pitch angle at zero during capture, so the captured images do not lie perfectly on the cylinder. We believe the second reason accounts for the major part of the distortion. In fact, this problem exists in most data sets, and many implementations avoid it by reducing the vertical field of view.
From our files
From test files
Click the images to download the zipped packages (including the config file for our program).
Note that we do not provide the original images of wow1 due to possible copyright issues.
Source code and executable files
To learn how to use them, please check readme.txt
Stitching. (.NET project) [here]
- Nikon official website. http://www.nikonusa.com
- David G. Lowe, Distinctive image features from scale-invariant keypoints, in International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
- Jeffrey S. Beis and David G. Lowe, Shape indexing using approximate nearest-neighbour search in high-dimensional spaces, in Conference on Computer Vision and Pattern Recognition, Puerto Rico, June, 1997, pp. 1000-1006.
- Martin A. Fischler and Robert C. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, in Communications of the ACM, 24, 6 (1981), pp. 381-395.
- M. Brown and D. G. Lowe. Recognising panoramas, In Proceedings of the 9th International Conference on Computer Vision (ICCV2003), Nice, France, October 2003, pp. 1218-1225.
- P. J. Burt and Edward H. Adelson, A multiresolution spline with application to image mosaics, in ACM Transactions on Graphics, no. 4, vol. 2, 1983, pp. 217-236.
- Matlab SIFT tutorial ftp://ftp.cs.utoronto.ca/pub/jepson/teaching/vision/2503/SIFTtutorial.zip
- AutoPano, http://autopano.kolor.com/
- IVR file format. http://www2h.biglobe.ne.jp/~hnakamur/technolab/livepic/livepic_spec/spec.htm
© 2006 Chia-Kai Liang and Chihyuan Chung, NTUEE