Saliency benchmark and related code

Many computational models of visual attention have been created from a wide variety of different approaches to predict where people look in images. Each model is usually introduced by demonstrating its performance on new images, which makes it hard to compare models directly. To alleviate this problem, we propose a benchmark data set containing 300 natural images with eye tracking data from 39 observers to compare model performances. This is the largest data set with so many viewers per image. We calculate the performance of many models at predicting ground truth fixations using three different metrics: a receiver operating characteristic, a similarity metric, and the Earth Mover's Distance. We post the results here and provide a way for people to submit new models for evaluation.
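For intuition about the ROC metric, the sketch below (Python, not the benchmark's official Matlab code) treats the saliency map as a classifier over pixels: at each threshold, the true positive rate is the fraction of fixated pixels above threshold and the false positive rate is the fraction of all pixels above threshold, and the score is the area under that curve. The function and variable names are illustrative.

import numpy as np

def roc_auc_saliency(saliency_map, fixation_points):
    """Area under the ROC curve of a saliency map at predicting fixations.

    saliency_map    : 2-D float array, higher values = more salient.
    fixation_points : list of (row, col) pixel coordinates of human fixations.
    Simplified sketch, not the benchmark's official Matlab scoring code.
    """
    sal = saliency_map.astype(float)
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)  # normalize to [0, 1]

    # Saliency values at fixated pixels (positives) and at all pixels.
    fix_vals = np.array([sal[r, c] for r, c in fixation_points])
    all_vals = sal.ravel()

    # Sweep thresholds; true positive rate vs. false positive rate.
    thresholds = np.linspace(0, 1, 101)
    tpr = [(fix_vals >= t).mean() for t in thresholds]
    fpr = [(all_vals >= t).mean() for t in thresholds]

    # Integrate the ROC curve with the trapezoidal rule (reverse so FPR increases).
    return float(np.trapz(tpr[::-1], fpr[::-1]))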

paper

This benchmark is released in conjunction with the paper "A Benchmark of Computational Models of Saliency to Predict Human Fixations" by Tilke Judd, Fredo Durand and Antonio Torralba, available as a Jan 2012 MIT tech report.

images

300 benchmark images (The fixations from 39 viewers per image are not public, so that no model can be trained on this data set.)

Comparison of images and saliency maps for several models.

model performances

Model Name | Link to code | Area under ROC curve* (higher is better) | Similarity* (higher is better) | Earth Mover's Distance* (lower is better)
Humans** | code | 0.922 | 1 | 0
Judd et al. | code | 0.811 | 0.506 | 3.13
CovSal | paper, website | 0.8056 | 0.5018 | 3.1092
Tavakoli et al. 2011 | paper and website | 0.8033 | 0.4952 | 3.3488
Region Contrast [Cheng et al. 2011] | website with paper and code | 0.7922 | 0.4705 | 3.4180
Graph Based Visual Saliency (GBVS) | code | 0.801 | 0.472 | 3.574
Multi-Resolution AIM | (coming soon) | 0.7719 | 0.4711 | 3.3635
Center** | code | 0.783 | 0.451 | 3.719
Saliency for Image Manipulation | website, paper, code | 0.774 | 0.439 | 4.137
RARE2012 | website | 0.7719 | 0.4363 | 4.1019
Random Center Surround Saliency (Narayan model) | code, paper | 0.753 | 0.42 | 3.465
Bruce and Tsotsos AIM | code (look for AIM.zip) | 0.751 | 0.39 | 4.236
Itti&Koch2 | code from the GBVS package | 0.75 | 0.405 | 4.560
Context-Aware saliency | code | 0.742 | 0.39 | 4.90
Torralba | code | 0.684 | 0.343 | 4.715
Hou & Zhang | code | 0.682 | 0.319 | 5.368
SUN saliency | code from Lingyun Zhang's site | 0.672 | 0.34 | 5.088
Itti&Koch | code from the Saliency Toolbox | 0.562 | 0.284 | 5.067
Achanta | code | 0.523 | 0.297 | 6.854
Chance** | code | 0.503 | 0.327 | 6.352

* Matlab code for the metrics we use (ROC, S, EMD)
** Baseline models that we compare against
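For the similarity metric (S), a common formulation treats the saliency map and the human fixation map as probability distributions over pixels and takes their histogram intersection, so identical maps score 1 and disjoint maps score 0. The Python sketch below follows that formulation as an assumption; it is not the benchmark's official Matlab implementation.

import numpy as np

def similarity(saliency_map, fixation_map):
    """Histogram-intersection similarity between two maps of the same size.

    Both maps are treated as probability distributions over pixels;
    the score is 1 for identical distributions and 0 for disjoint ones.
    Simplified sketch, not the benchmark's official Matlab code.
    """
    s = saliency_map.astype(float)
    f = fixation_map.astype(float)
    s /= s.sum() + 1e-12   # normalize each map to sum to 1
    f /= f.sum() + 1e-12
    return float(np.minimum(s, f).sum())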

submit a new model

Instructions:
1) Download our 300 images (IMAGES.zip).
2) Run your model to create saliency maps of each image. The saliency maps should be .jpg images with the same size and name as the original images (see the sketch after this list).
3) Submit your maps to tjudd@csail.mit.edu (as a zip or tar folder).
4) We run the scoring metrics to compare how well your saliency maps predict where 39 observers looked on the images. Because we do not make the fixations public (to prevent any model from being trained on the data), you cannot score the model on your own. For reference, you can see the Matlab code we use to score models.
5) We post your score and model details on this page.
6) Let us know if you have a publication, website, or publicly available code for your model so that we can link to it from your score in the chart above.
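As a rough illustration of step 2, the Python sketch below creates grayscale .jpg saliency maps with the same size and file name as the originals. It assumes a hypothetical my_saliency_model function and placeholder folder names, and uses NumPy and Pillow; it is not part of the benchmark itself.

import os
import numpy as np
from PIL import Image

def my_saliency_model(rgb_array):
    """Placeholder for your own model: returns a 2-D saliency map (any float range)."""
    raise NotImplementedError

IN_DIR, OUT_DIR = "benchmark_images", "saliency_maps"   # placeholder folder names
os.makedirs(OUT_DIR, exist_ok=True)

for name in os.listdir(IN_DIR):
    if not name.lower().endswith((".jpg", ".jpeg")):
        continue
    img = Image.open(os.path.join(IN_DIR, name)).convert("RGB")
    sal = my_saliency_model(np.asarray(img))

    # The benchmark expects a .jpg map with the same size and name as the original image.
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)          # scale to [0, 1]
    sal_img = Image.fromarray((sal * 255).astype(np.uint8), mode="L")  # grayscale map
    sal_img = sal_img.resize(img.size)                                 # match original size
    sal_img.save(os.path.join(OUT_DIR, name))                          # keep original name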

If you would also like to know your model's score with optimized blur and center weight, submit saliency maps of these 100 images from our ICCV MIT data set (listed below), which we will use to determine the optimal blur parameter and center weight for your model.
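For context on what this optimization does: a common post-process is to smooth the saliency map with a Gaussian and blend it with a center prior, and the benchmark searches for the blur width and blending weight that maximize your score. The sketch below is one plausible form of that post-process, assuming SciPy; the parameter names and the specific blending scheme are illustrative, not the benchmark's exact procedure.

import numpy as np
from scipy.ndimage import gaussian_filter

def blur_and_center_weight(saliency_map, sigma=10.0, center_weight=0.3):
    """Blur a saliency map and blend it with a centered Gaussian prior.

    sigma         : standard deviation (in pixels) of the Gaussian blur.
    center_weight : blending weight of the center prior, in [0, 1].
    Illustrative sketch; the benchmark tunes these two parameters for you.
    """
    sal = gaussian_filter(saliency_map.astype(float), sigma=sigma)
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)

    # Isotropic Gaussian centered on the image, same shape as the map.
    h, w = sal.shape
    yy, xx = np.mgrid[0:h, 0:w]
    center = np.exp(-(((yy - h / 2) ** 2) + ((xx - w / 2) ** 2)) / (2 * (min(h, w) / 3) ** 2))

    return (1 - center_weight) * sal + center_weight * center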

other data sets

[If you have another fixation data set that you would like to list here, email tjudd@csail.mit.edu with a link and description.]
Other fixation data sets
MIT data set [Judd et al. 2009] has fixations from 15 viewers who free-viewed 1003 natural indoor and outdoor images. It was created under conditions similar to the saliency benchmark data set above and can be used to train new models of saliency.

NUS data set [Subramanian et al. 2010] has fixations from ~25 observers free-viewing 758 images containing semantically affective objects/scenes such as expressive faces, nudes, unpleasant concepts, and interactive actions.

Toronto data set [Bruce and Tsotsos, 2009] contains data from 11 subjects free-viewing 120 color images of outdoor and indoor scenes. A large portion of the images do not contain particular regions of interest.

Ehinger data set [Ehinger et al. 2009] has fixations from 14 observers as they performed a search task (person detection) on 912 outdoor scenes.

FIFA [Cerf et al. 2009] has fixation data collected from 8 subjects doing a free-viewing task on 180 color outdoor and indoor images. Observers were asked to rate how interesting each image was. Images include salient objects and many different types of faces. This data set was originally used to establish that human faces are very attractive to observers and to test models of saliency that included face detectors.

DOVES (a Database Of Visual Eye movementS) [Linde et al. 2008] is a collection of eye movements from 29 human observers as they viewed 101 natural calibrated images. The images are black and white and show natural outdoor scenes with no strongly salient objects.

Le Meur [Le Meur et al. 2006] has fixations of up to 40 observers who free-viewed 27 color images with strongly salient objects for 15 seconds.

IVC Data sets The Images and Video Communications team (IVC) of IRCCyN lab provides several image and video databases, including eye movement recordings. Some of the databases are based on a free-viewing task, others on a quality evaluation task.


Other saliency-related data sets
MIT Low-resolution data set [Judd et al. 2011] has 168 natural images and 25 pink noise images at 8 different resolutions. 64 observers were distributed across the different resolutions of the images such that there are 8 viewers per image. Useful for studying fixations on low-resolution images.

Regional Saliency Dataset (RSD) [Li, Tian, Huang, Gao 2009] (paper) is a data set for evaluating visual saliency in video.

MSRA Salient Object Database [Liu et al. 2007] is a database of 20,000 images with hand-labeled rectangles marking the principal salient object, annotated by 3 users.