Human-Eye-Fixation-Detection

SAM Model For Human Eye Fixation Detection
This article mainly reproduces the reference paper on human eye saliency detection (also called fixation or attention prediction). For more information, please refer to my GitHub repository.

https://github.com/huangcaohui/Eye-Fixation-Detection

For this dataset, there are 1600 images in the training set and 400 images in the test set: 20 categories of training data with 80 images each (20 × 80 = 1600), and 20 categories of test data with 20 images each (20 × 20 = 400). The original image size is 1080 × 1920.

The dataset can be downloaded from the following link:

Link: https://pan.baidu.com/s/141pFLmurCyD2XWFB6ddjSg (extraction code: xpb8)

The code running environment is as follows:

Server Configuration: Intel Xeon E5-2683 v3 CPU / 28 cores
RAM: 348 GB
Graphics Card: NVIDIA Tesla P100 / 16 GB video memory
Operating System: Windows Server 2012
Deep Learning Framework: TensorFlow-GPU 2.3

Considering the limited computing resources, the images are scaled down to 216 × 384.
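As a minimal sketch of this preprocessing step (the JPEG format and [0, 1] normalization are assumptions, not necessarily the exact pipeline used here):

```python
import tensorflow as tf

def load_image(path, size=(216, 384)):
    # Read, decode, and rescale one image; bilinear resizing is the
    # tf.image.resize default.
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, size)
    return img / 255.0  # scale pixel values to [0, 1]
```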

The complete SAM model combines a DCN-VGG network, a ConvLSTM network, and a Gaussian Prior network. The output of each network is used as the input of the next, and at the end the output of the Gaussian Prior network undergoes a single-channel convolution, upsampling, and normalization to produce the final result.

  • The DCN-VGG network is as follows:

  • The ConvLSTM network is as follows:

  • The Gaussian Prior network is as follows:

Cascading these three sub-networks forms the entire network, as follows:

When the program is run on the server, the total number of parameters of the entire network is as reported by the model summary.
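As an illustration, here is a minimal Keras sketch of how the three stages might compose (the layer widths, fixed Gaussian sigmas, and ConvLSTM sequence length are assumptions; the paper's attentive mechanism and learned prior parameters are simplified away). Calling `model.summary()` prints the parameter count:

```python
import tensorflow as tf
from tensorflow.keras import layers

H, W = 216, 384          # input size after rescaling
FH, FW = H // 8, W // 8  # feature-map size after the backbone (/8)

def build_sam():
    inp = layers.Input(shape=(H, W, 3))

    # 1) DCN-VGG: VGG-style blocks; pooling stops at /8 and the last
    #    convolutions are dilated, so spatial resolution is preserved.
    x = inp
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        x = layers.MaxPool2D(2)(x)
    for _ in range(2):
        x = layers.Conv2D(512, 3, padding='same', dilation_rate=2,
                          activation='relu')(x)

    # 2) ConvLSTM: feed the same feature map as a short sequence so the
    #    recurrent layer can iteratively refine it.
    seq = layers.Lambda(lambda t: tf.stack([t] * 4, axis=1))(x)
    x = layers.ConvLSTM2D(512, 3, padding='same', return_sequences=False)(seq)

    # 3) Gaussian Prior: concatenate center-biased Gaussian maps
    #    (learned in the paper; fixed sigmas here for simplicity).
    def gaussian_maps(t):
        yy, xx = tf.meshgrid(tf.linspace(-1.0, 1.0, FH),
                             tf.linspace(-1.0, 1.0, FW), indexing='ij')
        g = tf.stack([tf.exp(-(xx**2 + yy**2) / (2.0 * s**2))
                      for s in (0.2, 0.5, 1.0)], axis=-1)[tf.newaxis]
        return tf.tile(g, [tf.shape(t)[0], 1, 1, 1])

    prior = layers.Lambda(gaussian_maps)(x)
    x = layers.Concatenate()([x, prior])
    x = layers.Conv2D(512, 5, padding='same', activation='relu')(x)

    # 4) Single-channel convolution, upsampling back to the input size,
    #    and normalization to [0, 1].
    x = layers.Conv2D(1, 1, activation='relu')(x)
    x = layers.UpSampling2D(8, interpolation='bilinear')(x)
    out = layers.Lambda(lambda t: t / (tf.reduce_max(
        t, axis=[1, 2, 3], keepdims=True) + 1e-7))(x)
    return tf.keras.Model(inp, out)

model = build_sam()
model.summary()  # prints the total parameter count
```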

For the SAM model, the evaluation metric is no longer simply the KL divergence or the Pearson correlation coefficient (CC) alone; instead, the loss function of the SAM model is formed as a linear combination of KLD and CC.

  • CC (Correlation Coefficient Loss) evaluates the correlation between the predicted saliency map and the ground-truth saliency map through their covariance and variances. The specific formula is:
    $$L_1\left(\tilde y, y^{den}\right) = \frac{\sigma\left(\tilde y, y^{den}\right)}{\sigma(\tilde y) \cdot \sigma\left(y^{den}\right)}$$

  • KLD (Kullback–Leibler divergence) is mainly used to evaluate how closely two distributions approximate each other. Its specific formula is:
    $$L_2\left(\tilde y, y^{den}\right) = \sum_i y_i^{den} \log\left(\frac{y_i^{den}}{\tilde y_i + \varepsilon} + \varepsilon\right)$$

The final evaluation metric is:
$$L\left(\tilde y, y^{den}\right) = \beta L_1\left(\tilde y, y^{den}\right) + \gamma L_2\left(\tilde y, y^{den}\right)$$
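As a rough TensorFlow sketch of this combined loss (the weight values and the sign convention are assumptions; since CC should be maximized, β is taken negative here):

```python
import tensorflow as tf

EPS = 1e-7

def cc(y_true, y_pred):
    # L1: Pearson correlation between predicted and ground-truth maps.
    yt = y_true - tf.reduce_mean(y_true, axis=[1, 2, 3], keepdims=True)
    yp = y_pred - tf.reduce_mean(y_pred, axis=[1, 2, 3], keepdims=True)
    cov = tf.reduce_sum(yt * yp, axis=[1, 2, 3])
    denom = tf.sqrt(tf.reduce_sum(yt ** 2, axis=[1, 2, 3]) *
                    tf.reduce_sum(yp ** 2, axis=[1, 2, 3])) + EPS
    return cov / denom

def kld(y_true, y_pred):
    # L2: KL divergence between maps normalized to sum to 1.
    yt = y_true / (tf.reduce_sum(y_true, axis=[1, 2, 3], keepdims=True) + EPS)
    yp = y_pred / (tf.reduce_sum(y_pred, axis=[1, 2, 3], keepdims=True) + EPS)
    return tf.reduce_sum(yt * tf.math.log(yt / (yp + EPS) + EPS),
                         axis=[1, 2, 3])

def sam_loss(y_true, y_pred, beta=-1.0, gamma=10.0):
    # Linear combination beta*L1 + gamma*L2; beta < 0 so that a higher
    # correlation lowers the loss.
    return beta * cc(y_true, y_pred) + gamma * kld(y_true, y_pred)
```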
The Loss, CC, and KLD curves on the training set are as follows:

The Loss, CC, and KLD curves on the validation set are as follows:

According to the training results, the optimal model, saved at epoch 26, is loaded and evaluated on the test set; the final test results for each category are shown in the table below.

| Metric \ Category | Action | Affective | Art | BlackWhite | Cartoon |
| --- | --- | --- | --- | --- | --- |
| KLD | 0.4021 | 0.4289 | 0.3998 | 0.3936 | 0.3363 |
| CC | 0.7959 | 0.8061 | 0.7977 | 0.8157 | 0.8303 |

| Metric \ Category | Fractal | Indoor | Inverted | Jumbled | LineDrawing |
| --- | --- | --- | --- | --- | --- |
| KLD | 0.3975 | 0.3729 | 0.3704 | 0.3544 | 0.3168 |
| CC | 0.8218 | 0.8328 | 0.8361 | 0.8207 | 0.8679 |

| Metric \ Category | LowResolution | Noisy | Object | OutdoorManMade | OutdoorNatural |
| --- | --- | --- | --- | --- | --- |
| KLD | 0.3131 | 0.3869 | 0.3231 | 0.3979 | 0.3809 |
| CC | 0.8891 | 0.8509 | 0.8571 | 0.8083 | 0.8197 |

| Metric \ Category | Pattern | Random | Satelite | Sketch | Social |
| --- | --- | --- | --- | --- | --- |
| KLD | 0.3323 | 0.3656 | 0.3654 | 0.2591 | 0.4343 |
| CC | 0.8698 | 0.8414 | 0.8500 | 0.8916 | 0.7853 |
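As a sketch of how such a per-category evaluation might be computed (the directory layout is an assumption; `kld()` and `cc()` are reused from the loss sketch above):

```python
import os
import numpy as np
import tensorflow as tf

def load_map(path, size=(216, 384), channels=1):
    # Decode an image or ground-truth density map and rescale it.
    img = tf.image.decode_jpeg(tf.io.read_file(path), channels=channels)
    return tf.image.resize(img, size) / 255.0

def evaluate_by_category(model, test_root, gt_root):
    # Average KLD and CC over each category's test images.
    results = {}
    for category in sorted(os.listdir(test_root)):
        klds, ccs = [], []
        for name in sorted(os.listdir(os.path.join(test_root, category))):
            img = load_map(os.path.join(test_root, category, name), channels=3)
            gt = load_map(os.path.join(gt_root, category, name))
            pred = model(img[tf.newaxis])
            klds.append(float(kld(gt[tf.newaxis], pred)))
            ccs.append(float(cc(gt[tf.newaxis], pred)))
        results[category] = {'KLD': np.mean(klds), 'CC': np.mean(ccs)}
    return results
```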

The output images from the 26th training epoch are shown in the figure below. Each row shows, in turn: the original RGB image, the ground-truth saliency map, the single-channel-convolved and normalized DCN-VGG output, the single-channel-convolved and normalized ConvLSTM output, the single-channel-convolved and normalized Gaussian Prior output, and the final prediction. Each column represents one set of image data.
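A minimal matplotlib sketch of this figure layout (the function and argument names are hypothetical):

```python
import matplotlib.pyplot as plt

ROW_TITLES = ['RGB', 'Ground truth', 'DCN-VGG', 'ConvLSTM',
              'Gaussian Prior', 'Prediction']

def show_grid(columns):
    # columns: list of 6-tuples of arrays, one tuple per test image,
    # in the row order of ROW_TITLES.
    fig, axes = plt.subplots(len(ROW_TITLES), len(columns),
                             figsize=(3 * len(columns), 12), squeeze=False)
    for c, images in enumerate(columns):
        for r, img in enumerate(images):
            ax = axes[r][c]
            ax.imshow(img.squeeze(), cmap=None if r == 0 else 'gray')
            ax.axis('off')
            if c == 0:
                ax.set_title(ROW_TITLES[r], loc='left', fontsize=9)
    plt.tight_layout()
    plt.show()
```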

The above is the main content of the SAM model for human eye fixation detection, mainly a reproduction of the reference [1]; further details can be found in that paper. More content is available in the accompanying PDF, "Test Report.pdf".

References:

[1]. M. Cornia, L. Baraldi, G. Serra and R. Cucchiara, “Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model,” in IEEE Transactions on Image Processing, vol. 27, no. 10, pp. 5142-5154, Oct. 2018, doi: 10.1109/TIP.2018.2851672.

