Salient Object Detection and Segmentation

最新推荐文章于 2024-03-12 09:48:25 发布

ametor

最新推荐文章于 2024-03-12 09:48:25 发布

阅读量4.9k

点赞数

分类专栏：图像显著性

图像显著性专栏收录该内容

7 篇文章 2 订阅

订阅专栏

Salient Object Detection and Segmentation

Ming-Ming Cheng^1,4 Niloy J. Mitra² Xiaolei Huang³ Philip H. S. Torr⁴ Shi-Min Hu¹

¹TNList, Tsinghua University ²UCL/KAUST ³Lehigh University ⁴Oxford Brookes University

Figure. Given input images (top), a global contrast analysis is used to compute high resolution saliency maps (middle), which can be used to produce masks (bottom) around regions of interest.

Abstract

Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object extraction algorithm, which simultaneously evaluates global contrast differences and spatial weighted coherence scores. The proposed algorithm is simple, efficient, naturally multi-scale, and produces full-resolution, high-quality saliency maps. These saliency maps are further used to initialize a novel iterative version of GrabCut for high quality salient object segmentation. We extensively evaluated our algorithm using traditional salient object detection datasets, as well as a more challenging Internet image dataset. Our experimental results demonstrate that our algorithm consistently outperforms existing salient object detection and segmentation methods, yielding higher precision and better recall rates. We also show that our algorithm can be used to efficiently extract salient object masks from Internet images, enabling effective sketch-based image retrieval (SBIR) via simple shape comparisons. Despite such noisy internet images, where the saliency regions are ambiguous, our saliency guided image retrieval achieves a superior retrieval rate compared with state-of-the-art SBIR methods, and additionally provides important target object region information.

Papers

Global Contrast based Salient Region Detection. Ming-Ming Cheng, Guo-Xin Zhang, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu. IEEE CVPR, 2011, p. 409-416. [Project page] [C++] [Bib]
Salient Object Detection and Segmentation. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip H. S. Torr, Shi-Min Hu. Submitted to IEEE TPAMI (TPAMI-2011-10-0753), 2011. [Paper] [Poster]

Comparisons with state of the art methods

Figure. Statistical comparison results of (a) different saliency region detection methods, (b) their variants, and (c) object of interest region segmentation methods, using largest public available dataset (i) and (ii) our THUS10000 dataset (to be made public available). We compare our HC method and RC method with 15 state of art methods, including FT [1], AIM [2], MSS [3], SEG [4], SeR [5], SUN [6], SWD [7], IM [8], IT [9], GB [10], SR [11], CA [12], LC [13], AC [14], and CB [15]. We also take simple variable-size Gaussian model 'Gau' and GrabCut method as a baseline. (Please see our paper for detailed explaintions)

Figure. Comparison of average Fβ for different saliency segmentation methods: FT [1], SEG [4], and ours, on THUR15000 dataset, which is composed by non-selected internet images.

Method	IT	AIM	IM	MSS	SEG	SeR	SUN	SWD	CB
Time (s)	0.611	4.288	0.991	0.106	4.921	1.019	1.116	0.100	5.568
Code Type	Matlab	Matlab	Matlab	Matlab	Matlab	Matlab	Matlab	Matlab	Matlab & C++
Method	GB	SR	FT	AC	CA	LC	HC	RC
Time (s)	1.614	0.064	0.102	0.109	53.1	0.018	0.019	0.254
Code Type	Matlab	Matlab	C++	Matlab	Matlab	C++	C++	C++

Table. Average time taken to compute a saliency map for images in the THUS10000 database. (Note that we use the authors original implementations for MSS and FT, which is not well optimized code.)

Method	FT	SEG	CB	Our
Time (s)	0.247	7.48	36.5	0.621
Code Type	Matlab	Matlab & C++	Matlab & C++	C++

Table. Comparison of average time for different saliency segmentation methods.

Figure. Saliency maps computed by different state-of-the-art methods~(b-p), and with our proposed HC~(q) and RC methods~(r). Most results highlight edges, or are of low resolution. See also the shared data for saliency detection results for the whole THUS10000 dataset.

Figure. Sketch based image comparison. In each group from left to right, first column shows images download from Flickr using the corresponding keyword; second column shows our retrieval results obtained by comparing user input sketch with SaliencyCut result using shape context measure [41]; third column shows corresponding sketch based retrieval results using SHoG [42].

Downloads

Some files are zip format with password. They will be available after corresponding paper be accepted. Red links are now avalible!

1. Data

The THUS10000 benchmark dataset comprises of 10, 000 images (181 MB), each of which has an unambiguous salient object and the object region is accurately annotated with pixel wise ground-truth labeling (13.1M). We provide saliency maps (5.3 GB containing 170, 000 image) for our methods as well as other 15 state of the art methods, including FT [1], AIM [2], MSS [3], SEG [4], SeR [5], SUN [6], SWD [7], IM [8], IT [9], GB [10], SR [11], CA [12], LC [13], AC [14], and CB [15]. Saliency segmentation (71.3MB) results for FT[1], SEG[4], and CB[10] are also avilable.

2. Windows executable

We supply an windows msi for install our prototype software, which includes our implementation for FT[2], SR[14], LC[28], our HC, RC and saliency cut method.

3. C++ source code

The C++ implementation of our paper as well as several other state of the art works.

4. Supplemental material

Supplemental materials (647 MB) including comparisons with other 15 state of the art algorithms are now available.

FAQs

Until now, more than 1000+ readers (according to email records) have request to get the source code for this project. Some of them have questions about using the code. Here are some frequently asked questions for new users to refer:

Q1: I’m confused with the sentence in the paper: “In our experiments, the threshold is chosen empirically to be the threshold that gives 95% recall rate in our fixed thresholding experiments”. But all most the case, people have not the ground truth, so cannot compute the call rate. When I use your Cut application, I need to guess threshold value to have good cut image.

A: The recall rate is just used to evaluate the algorithm. When you use it, you typically don't have to evaluate the algorithm itself very often. This sentence is used to explain what the fixed threshold we use typically means. Actually, when initialized using RC saliency maps, this threshold is 70 with saliency values normalized to [0,255]. It doesn’t mean that the saliency values corresponds to recall rate of 95% for every image, but empirically corresponds to recall rate of 95% for a large number of images. So, just use the suggested threshold of 70 is OK.

Q2: I use your code to get results for the same database you used. But the results seem to have some small difference from yours.

A: It seems that the cvtColor function in OpenCV 1.x is different from those in OpenCv 2.X. I suggest users to use those in recent versions. The segmentation method I used sometimes generates strange results, leading to strange results of saliency maps. This happens at low frequency. When this happens, I rerun the exe again and it becomes OK. I don't know why, but this really happens when I use the exe first time after compiling (Very strange, maybe because some default initializations). If someone find the bug, please report to me.

Q3: Does your algorithm only get good results for images with single salient object?

A: Mostly yes. As described in our paper, our method is suitable for images with an unambiguous saliency object. Since saliency detection methods typically have no prior knowledge about the target object, thus is very difficult. Much recent researches focus on images with single saliency object. Even for this simple case, state of the art algorithm may also fail. It's understandable since supervised object detection which uses a large number of training data and prior knowledge also fails in many cases.

However, the value of saliency detection methods lies on their applications in many fields. Because they don't need large human annotation for learning, and typically much faster than object detection methods, it’s possible to automatically process a large number of images with low cost. Although many of the saliency detection results may be wrong (up to 60% for noise internet image) because of the ambiguous or even missing of salient objects, we can still use efficient algorithms to select those good results and use them in many interesting applications like (Notes: all following projects use our saliency source code, with initial version of SaliencyCut used in our own Sketch2Photo project):

Image retrieval: Sketch2Photo: Internet Image Montage. Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu. ACM SIGGRAPH Asia. 28, 5, 124:1-10, 2009.
Image editing: Semantic Colorization with Internet Images, Yong Sang Chia, Shaojie Zhuo, Raj Kumar Gupta, Yu-Wing Tai, Siu-Yeung Cho, Ping Tan, Stephen Lin, ACM SIGGRAPH Asia. 2011.
View selection: Web-Image Driven Best Views of 3D Shapes. The Visual Computer, 2011. Accepted. H Liu, L Zhang, H Huang
Image Collage: Arcimboldo-like Collage Using Internet Images.ACM SIGGRAPH Asia, 30(6), 2011. H Huang, L Zhang, HC Zhang
Image manipulation: Data-Driven Object Manipulation in Images. Chen Goldberg, Eurographics 2012, T Chen, FL Zhang, A Shamir, SM Hu.
Saliency For Image Manipulation, R. Margolin, L. Zelnik-Manor, and A. Tal, Computer Graphics International (CGI) 2012.
Mobile Product Search with Bag of Hash Bits and Boundary Reranking, Junfeng He, Xianglong Liu, Tao Cheng, Jinyuan Feng, Tai-Hsu Lin, Hyunjin Chung and Shih-Fu Chang, IEEE CVPR, 2012
SalientShape: Group Saliency in Image Collections. Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu. The Visual Computer, 2013
Much more: http://scholar.google.com/scholar?cites=9026003219213417480

Q4: I'm confused about the definition of saliency. Why the annotation format (isolated points, binary mask regions, and bounding boxes) in different benchmarks for evaluating saliency detection methods are so different?

There are 3 different saliency detection directions: i) fixation prediction, ii) salient object detection, iii) objectness estimation. They have very different research target and very different applications. Personally, I’m mainly interested in the last two problems and will discuss them in a bit more detail.

Eye fixation models aims at predicting where human looks, i.e. a small set of fixation points. The most famous method in this area is Itti’s work in PAMI 1998. The MIT benchmark is designed for evaluating such methods.

Salient object detection, as what is done in this work, aim at finding most salient object in a scene and segment the whole extent of that object. The output is typically a single saliency map (or figure-ground segmentation). The advantages and disadvantages are described in detail in Q3. High precision is a major focus of our work, as we can use shape matching based technique to effectively select good segmentations and build robust applications on top. Most widely used benchmark for evaluating this problem is MSRA1000, which precisely segment 1000 salient objects in MSRA images. Our method achieves 93% precision and 90% recall on MSRA1000 (previous best reported results: 75% precision and 83% recall). Since our results on MSRA100 are mostly comparable to ground truth annotations, we need more challenging benchmark. THUS10000 and THUR15000 are built for this purpose.

Objectness estimation is another attractive direction. These methods aim at proposing a small set (typically 1000) of bounding boxes to improve efficiency of classical sliding window pipeline. High recall at a small set of bounding box proposals is a major target. PASCAL VOC is a standard dataset for evaluating this problem. Using purely bottom up data driven methods to produce a single saliency map, as what is done in most salient object detection model, is less likely to succeed in this very challenging dataset. State of the art objectness proposal methods (PAMI12, IJCV13) achieves 90+% recall on challenging PASCAL VOC dataset given a relatively small (e.g. 1000) number of bounding boxes, while been computational efficient (4 seconds per image). This is especially useful for speed up multi-class object detection problem, as each classifier only need to examine a much smaller number of image windows (e.g. 1,000,000 -> 1,000).

Links to source code of other methods

FT	[1] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk,“Frequency-tuned salient region detection,” in IEEE CVPR, 2009, pp. 1597–1604.
AIM	[2] N. Bruce and J. Tsotsos, “Saliency, attention, and visual search: An information theoretic approach,” Journal of Vision, vol. 9, no. 3, pp. 5:1–24, 2009.
MSS	[3] R. Achanta and S. S ¨ usstrunk, “Saliency detection using maximum symmetric surround,” in IEEE ICIP, 2010, pp. 2653–2656.
SEG	[4] E. Rahtu, J. Kannala, M. Salo, and J. Heikkila, “Segmenting salient objects from images and videos,” ECCV, pp. 366–379, 2010.
SeR	[5] H. Seo and P. Milanfar, “Static and space-time visual saliency detection by self-resemblance,” Journal of vision, vol. 9, no. 12, pp. 15:1–27, 2009.
SUN	[6] L. Zhang, M. Tong, T. Marks, H. Shan, and G. Cottrell, “SUN: A bayesian framework for saliency using natural statistics,” Journal of Vision, vol. 8, no. 7, pp. 32:1–20, 2008.
SWD	[7] L. Duan, C. Wu, J. Miao, L. Qing, and Y. Fu, “Visual saliency detection by spatially weighted dissimilarity,” in IEEE CVPR, 2011, pp. 473–480.
IM	[8] N. Murray, M. Vanrell, X. Otazu, and C. A. Parraga, “Saliency estimation using a non-parametric low-level vision model,” in IEEE CVPR, 2011, pp. 433–440.
IT	[9] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE TPAMI, vol. 20, no. 11, pp. 1254–1259, 1998.
GB	[10] J. Harel, C. Koch, and P. Perona, “Graph-based visual saliency,” in NIPS, 2007, pp. 545–552.
SR	[11] X. Hou and L. Zhang, “Saliency detection: A spectral residual approach,” in IEEE CVPR, 2007, pp. 1–8.
CA	[12] S. Goferman, L. Zelnik-Manor, and A. Tal, “Context-aware saliency detection,” in IEEE CVPR, 2010, pp. 2376–2383.
LC	[13] Y. Zhai and M. Shah, “Visual attention detection in video sequences using spatiotemporal cues,” in ACM Multimedia, 2006, pp. 815–824.
AC	[14] R. Achanta, F. Estrada, P. Wils, and S. S ¨ usstrunk, “Salient region detection and segmentation,” in IEEE ICVS, 2008, pp. 66–75.
CB	[15] H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, and S. Li,“Automatic salient object segmentation based on context and shape prior,” in British Machine Vision Conference, 2011, pp. 1–12.
LP	[16] T. Judd, K. Ehinger, F. Durand, A Torralba, Learning to predict where humans look, ICCV 2009.

ametor

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
Salient Object Detection and Segmentation

Salient Object Detection and SegmentationMing-Ming Cheng1,4 Niloy J. Mitra2 Xiaolei Huang3 Philip H. S. Torr4 Shi-Min Hu11TNList, Tsinghua University 2UCL/KAUST 3Lehigh University
复制链接

扫一扫