Saliency benchmark and related code

Many computational models of visual attention have been created from a wide variety of different approaches to predict where people look in images. Each model is usually introduced by demonstrating its performance on new images, which makes it hard to compare models directly. To alleviate this problem, we propose a benchmark data set containing 300 natural images with eye tracking data from 39 observers to compare model performances. This is the largest data set with so many viewers per image. We calculate the performance of many models at predicting ground truth fixations using three different metrics: a receiver operating characteristic, a similarity metric, and the Earth Mover's Distance. We post the results here and provide a way for people to submit new models for evaluation.
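For intuition about the ROC metric, the sketch below (Python, not the benchmark's official Matlab code) treats the saliency map as a classifier over pixels: at each threshold, the true positive rate is the fraction of fixated pixels above threshold and the false positive rate is the fraction of all pixels above threshold, and the score is the area under that curve. The function and variable names are illustrative.

import numpy as np

def roc_auc_saliency(saliency_map, fixation_points):
    """Area under the ROC curve of a saliency map at predicting fixations.

    saliency_map    : 2-D float array, higher values = more salient.
    fixation_points : list of (row, col) pixel coordinates of human fixations.
    Simplified sketch, not the benchmark's official Matlab scoring code.
    """
    sal = saliency_map.astype(float)
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)  # normalize to [0, 1]

    # Saliency values at fixated pixels (positives) and at all pixels.
    fix_vals = np.array([sal[r, c] for r, c in fixation_points])
    all_vals = sal.ravel()

    # Sweep thresholds; true positive rate vs. false positive rate.
    thresholds = np.linspace(0, 1, 101)
    tpr = [(fix_vals >= t).mean() for t in thresholds]
    fpr = [(all_vals >= t).mean() for t in thresholds]

    # Integrate the ROC curve with the trapezoidal rule (reverse so FPR increases).
    return float(np.trapz(tpr[::-1], fpr[::-1]))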

paper

This benchmark is released in conjunction with the paper "A Benchmark of Computational Models of Saliency to Predict Human Fixations" by Tilke Judd, Fredo Durand and Antonio Torralba, available as a Jan 2012 MIT tech report.

images

300 benchmark images (The fixations from 39 viewers per image are not public, so that no model can be trained on this data set.)

Comparison of images and saliency maps for several models.

model performances

Model Name | Link to code | Area under ROC curve* (higher is better) | Similarity* (higher is better) | Earth Mover's Distance* (lower is better)
Humans** | code | 0.922 | 1 | 0
Judd et al. | code | 0.811 | 0.506 | 3.13
CovSal | paper, website | 0.8056 | 0.5018 | 3.1092
Tavakoli et al. 2011 | paper and website | 0.8033 | 0.4952 | 3.3488
Region Contrast [Cheng et al. 2011] | website with paper and code | 0.7922 | 0.4705 | 3.4180
Graph Based Visual Saliency (GBVS) | code | 0.801 | 0.472 | 3.574
Multi-Resolution AIM | (coming soon) | 0.7719 | 0.4711 | 3.3635
Center** | code | 0.783 | 0.451 | 3.719
Saliency for Image Manipulation | website, paper, code | 0.774 | 0.439 | 4.137
RARE2012 | website | 0.7719 | 0.4363 | 4.1019
Random Center Surround Saliency (Narayan model) | code, paper | 0.753 | 0.42 | 3.465
Bruce and Tsotsos AIM | code (look for AIM.zip) | 0.751 | 0.39 | 4.236
Itti&Koch2 | code from the GBVS package | 0.75 | 0.405 | 4.560
Context-Aware saliency | code | 0.742 | 0.39 | 4.90
Torralba | code | 0.684 | 0.343 | 4.715
Hou & Zhang | code | 0.682 | 0.319 | 5.368
SUN saliency | code from Lingyun Zhang's site | 0.672 | 0.34 | 5.088
Itti&Koch | code from the Saliency Toolbox | 0.562 | 0.284 | 5.067
Achanta | code | 0.523 | 0.297 | 6.854
Chance** | code | 0.503 | 0.327 | 6.352

* Matlab code for the metrics we use (ROC, S, EMD)
** Baseline models that we compare against
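For the similarity metric (S), a common formulation treats the saliency map and the human fixation map as probability distributions over pixels and takes their histogram intersection, so identical maps score 1 and disjoint maps score 0. The Python sketch below follows that formulation as an assumption; it is not the benchmark's official Matlab implementation.

import numpy as np

def similarity(saliency_map, fixation_map):
    """Histogram-intersection similarity between two maps of the same size.

    Both maps are treated as probability distributions over pixels;
    the score is 1 for identical distributions and 0 for disjoint ones.
    Simplified sketch, not the benchmark's official Matlab code.
    """
    s = saliency_map.astype(float)
    f = fixation_map.astype(float)
    s /= s.sum() + 1e-12   # normalize each map to sum to 1
    f /= f.sum() + 1e-12
    return float(np.minimum(s, f).sum())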

submit a new model

Instructions:
1) Download our 300 images (IMAGES.zip).
2) Run your model to create saliency maps of each image. The saliency maps should be .jpg images with the same size and name as the original images (see the sketch after this list).
3) Submit your maps to tjudd@csail.mit.edu (as a zip or tar folder).
4) We run the scoring metrics to compare how well your saliency maps predict where 39 observers looked on the images. Because we do not make the fixations public (to prevent any model from being trained on the data), you cannot score the model on your own. For reference, you can see the Matlab code we use to score models.
5) We post your score and model details on this page.
6) Let us know if you have a publication, website, or publicly available code for your model so that we can link to it from your score in the chart above.
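As a rough illustration of step 2, the Python sketch below creates grayscale .jpg saliency maps with the same size and file name as the originals. It assumes a hypothetical my_saliency_model function and placeholder folder names, and uses NumPy and Pillow; it is not part of the benchmark itself.

import os
import numpy as np
from PIL import Image

def my_saliency_model(rgb_array):
    """Placeholder for your own model: returns a 2-D saliency map (any float range)."""
    raise NotImplementedError

IN_DIR, OUT_DIR = "benchmark_images", "saliency_maps"   # placeholder folder names
os.makedirs(OUT_DIR, exist_ok=True)

for name in os.listdir(IN_DIR):
    if not name.lower().endswith((".jpg", ".jpeg")):
        continue
    img = Image.open(os.path.join(IN_DIR, name)).convert("RGB")
    sal = my_saliency_model(np.asarray(img))

    # The benchmark expects a .jpg map with the same size and name as the original image.
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)          # scale to [0, 1]
    sal_img = Image.fromarray((sal * 255).astype(np.uint8), mode="L")  # grayscale map
    sal_img = sal_img.resize(img.size)                                 # match original size
    sal_img.save(os.path.join(OUT_DIR, name))                          # keep original name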

If you would also like to know your model's score with optimized blur and center weight, submit saliency maps of these 100 images from our ICCV MIT data set (listed below), which we will use to determine the optimal blur parameter and center weight for your model.
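For context on what this optimization does: a common post-process is to smooth the saliency map with a Gaussian and blend it with a center prior, and the benchmark searches for the blur width and blending weight that maximize your score. The sketch below is one plausible form of that post-process, assuming SciPy; the parameter names and the specific blending scheme are illustrative, not the benchmark's exact procedure.

import numpy as np
from scipy.ndimage import gaussian_filter

def blur_and_center_weight(saliency_map, sigma=10.0, center_weight=0.3):
    """Blur a saliency map and blend it with a centered Gaussian prior.

    sigma         : standard deviation (in pixels) of the Gaussian blur.
    center_weight : blending weight of the center prior, in [0, 1].
    Illustrative sketch; the benchmark tunes these two parameters for you.
    """
    sal = gaussian_filter(saliency_map.astype(float), sigma=sigma)
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)

    # Isotropic Gaussian centered on the image, same shape as the map.
    h, w = sal.shape
    yy, xx = np.mgrid[0:h, 0:w]
    center = np.exp(-(((yy - h / 2) ** 2) + ((xx - w / 2) ** 2)) / (2 * (min(h, w) / 3) ** 2))

    return (1 - center_weight) * sal + center_weight * center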

other data sets

[If you have another fixation data set that you would like to list here, email tjudd@csail.mit.edu with a link and description.]
Other fixation data sets
MIT data set [Judd et al. 2009] has fixations from 15 viewers who free-viewed 1003 natural indoor and outdoor images. It was created under conditions similar to the saliency benchmark data set above and can be used to train new models of saliency.

NUS data set [Subramanian et al. 2010] has fixations from ~25 observers free-viewing 758 images containing semantically affective objects/scenes such as expressive faces, nudes, unpleasant concepts, and interactive actions.

Toronto data set [Bruce and Tsotsos, 2009] contains data from 11 subjects free-viewing 120 color images of outdoor and indoor scenes. A large portion of the images do not contain particular regions of interest.

Ehinger data set [Ehinger et al. 2009] has fixations from 14 observers as they performed a search task (person detection) on 912 outdoor scenes.

FIFA [Cerf et al. 2009] has fixation data collected from 8 subjects doing a free-viewing task on 180 color outdoor and indoor images. Observers were asked to rate how interesting each image was. Images include salient objects and many different types of faces. This data set was originally used to establish that human faces are very attractive to observers and to test models of saliency that included face detectors.

DOVES (a Database Of Visual Eye movementS) [Linde et al. 2008] is a collection of eye movements from 29 human observers as they viewed 101 natural calibrated images. The images are black and white and show natural outdoor scenes with no strongly salient objects.

Le Meur [Le Meur et al. 2006] has fixations of up to 40 observers who free-viewed 27 color images with strongly salient objects for 15 seconds.

IVC Data sets The Images and Video Communications team (IVC) of IRCCyN lab provides several image and video databases, including eye movement recordings. Some of the databases are based on a free-viewing task, others on a quality evaluation task.


Other saliency-related data sets
MIT Low-resolution data set [Judd et al. 2011] has 168 natural images and 25 pink noise images at 8 different resolutions. 64 observers were distributed across the different resolutions of the images such that there are 8 viewers per image. Useful for studying fixations on low-resolution images.

Regional Saliency Dataset (RSD) [Li, Tian, Huang, Gao 2009] (paper) is a data set for evaluating visual saliency in video.

MSRA Salient Object Database [Liu et al. 2007] is a database of 20,000 images with hand-labeled rectangles marking the principal salient object, annotated by 3 users.