2018 NIMA: Neural Image Assessment Reading Notes


haixwang


Category: Deep/Machine Learning. Tags: NIMA, image aesthetic assessment

Article link: https://blog.csdn.net/HaixWang/article/details/84028134


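Preface

These notes were originally written in Youdao Cloud Notes, not in Markdown. Because of time constraints, the first part is posted as screenshots of those notes, which does not affect reading. I will add a text version when I have time.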


While technical quality assessment deals with measuring low-level degradations such as noise, blur, compression artifacts, etc., aesthetic assessment quantifies semantic level characteristics associated with emotions and beauty in images.
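In short: technical quality covers low-level degradations (noise, blur, compression artifacts); aesthetic quality covers semantic-level characteristics tied to emotion and beauty.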
weights are initialized by training on classification related datasets (e.g. ImageNet [15]), and then fine tuned on annotated data for perceptual quality assessment tasks.
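In short: the weights come from a model trained for image classification (e.g., on ImageNet) and are then fine-tuned on data annotated for perceptual quality assessment.
blind quality assessment
(i.e., no-reference quality assessment: there is no pristine reference image to compare against)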
semantic level qualities are directly related to image content
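(i.e., semantic-level quality depends directly on what the image shows.)
Dataset 1: AVA Dataset
I. What is the AVA dataset?
An aesthetic quality assessment database (currently 33.14 GB) with about 255,000 photos. Every photo has scores from many raters, as well as semantic-level labels:
semantic labels covering more than 60 categories, plus photographic-style labels related to image quality.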

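II. What annotations does the AVA dataset provide?
Aesthetic annotations (human aesthetic judgments)

Each image is voted on by multiple raters, about 200 per image on average. Scores range from 1 to 10, and a higher score means higher quality.
The raters include professional image workers and photographers as well as photography enthusiasts, which makes the ratings more representative.
Mean score: about 5.5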
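This is a minimal sketch of how one image's votes could become a training target. It assumes a 1-to-10 scale with one bucket per integer score. The helpers `ratings_to_distribution` and `mean_score` and the vote counts are hypothetical illustrations, not part of the paper or the AVA release.

```python
from typing import List

# Assumed AVA rating scale: one bucket per integer score s_i = 1..10.
SCORES = list(range(1, 11))

def ratings_to_distribution(counts: List[int]) -> List[float]:
    """Normalize per-bucket vote counts into a probability distribution."""
    if len(counts) != len(SCORES):
        raise ValueError("expected one count per score bucket (10)")
    total = sum(counts)
    if total == 0:
        raise ValueError("image has no votes")
    return [c / total for c in counts]

def mean_score(dist: List[float]) -> float:
    """Expected score: sum over buckets of s_i * p_i."""
    return sum(s * p for s, p in zip(SCORES, dist))

if __name__ == "__main__":
    # Made-up vote counts for one image (200 votes in total).
    counts = [0, 2, 10, 30, 60, 55, 28, 10, 4, 1]
    dist = ratings_to_distribution(counts)
    print(round(sum(dist), 6), round(mean_score(dist), 3))  # 1.0 5.53
```

Keeping the full distribution, not just the mean, preserves how strongly the raters agree. The next part of the notes discusses this.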
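Semantic annotations
That is, what the image actually contains. Specifically, the dataset uses 66 textual tags. About 200,000 images have at least one tag, and about 150,000 images have two tags.
Some tags describe the image content, e.g., fruit, natural landscapes, people, architecture;
others describe the image style, e.g., black and white.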

Photographic style (a professional photography perspective)
All 14 style attributes are listed below, with the number of images that carry each one.
Complementary Colors (949), Duotones (1,301), High Dynamic Range (396), Image Grain (840), Light on White (1,199), Long Exposure (845), Macro (1,698), Motion Blur (609), Negative Image (959), Rule of Thirds (1,031), Shallow DOF (710), Silhouettes (1,389), Soft Focus (1,479), Vanishing Point (674).
For example, Complementary Colors are pairs of colors that cancel each other out: when combined, they produce a grayscale color such as white or black.

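For more on the AVA dataset, see the Jianshu post by 言有三: https://www.jianshu.com/p/50da0dd4bf19
III. Variance of the ratings
Small variance: the raters largely agree on the image.
Large variance: some raters think the image is very good while others think it is poor.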
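This is a minimal sketch of measuring rater agreement as the standard deviation of an image's rating histogram. It assumes a 1-to-10 scale. The `score_std` helper and both vote histograms are hypothetical.

```python
import math
from typing import List

SCORES = list(range(1, 11))  # assumed 1-10 rating scale

def score_std(counts: List[int]) -> float:
    """Standard deviation of the rating histogram (spread of opinions)."""
    total = sum(counts)
    probs = [c / total for c in counts]
    mu = sum(s * p for s, p in zip(SCORES, probs))
    var = sum((s - mu) ** 2 * p for s, p in zip(SCORES, probs))
    return math.sqrt(var)

if __name__ == "__main__":
    consensus = [0, 0, 0, 5, 90, 100, 5, 0, 0, 0]     # votes cluster at 5-6
    polarized = [40, 30, 20, 5, 5, 5, 5, 20, 30, 40]  # votes at both extremes
    print(score_std(consensus) < score_std(polarized))  # True
```

Both histograms have a mean close to 5.5, so the mean alone cannot tell them apart. The spread can.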

a novel approach to predict both technical and aesthetic qualities of images.
i.e., one approach that predicts both technical and aesthetic quality. [The two models are trained separately.]
we aim for predictions with higher correlation with human ratings, instead of classifying images to low/high score or regressing to the mean score, the distribution of ratings are predicted as a histogram
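In short: to correlate better with human ratings, the model does not just classify images as low/high quality or regress to the mean score. Instead, it predicts the whole rating distribution as a histogram.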
Given the distribution of AVA scores, typically, training a model on AVA data results in predictions with small deviations around the overall mean (5.5).
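i.e., because AVA scores are concentrated around the overall mean (5.5), a model trained on AVA tends to output predictions that deviate only slightly from that mean.
Dataset 2: Tampere Image Database 2013 (TID2013)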

Dataset 3: LIVE dataset
LIVE dataset contains 1162 photos captured by mobile devices. Each image is rated by an average of 175 unique subjects.
(Here "unique subjects" means distinct human raters, 175 per image on average, not image topics.)

Our proposed quality and aesthetic predictor stands on image classifier architectures. More explicitly, w
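i.e., the proposed quality and aesthetic predictor is built on top of image classification architectures.
The baseline network's weights are initialized by training on ImageNet, and the final fully connected layer (10 neurons) is randomly initialized.
The authors experimented with VGG16, Inception V2 (which performed better), and MobileNet.
During training, input images are rescaled to 256×256, and then a 224×224 crop is randomly extracted.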
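This is a minimal, framework-free sketch of the new head: a randomly initialized fully connected layer with 10 outputs, followed by a softmax so the outputs form a score distribution. The pooled backbone features are assumed to come from the ImageNet-initialized network. The names `init_head` and `head_forward`, the Gaussian init scale, and the feature size are hypothetical choices. A real implementation would use a deep learning framework.

```python
import math
import random
from typing import List, Sequence, Tuple

NUM_BUCKETS = 10  # one output neuron per score bucket

def init_head(feature_dim: int, seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Randomly initialize a feature_dim -> 10 fully connected layer."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.01) for _ in range(feature_dim)]
               for _ in range(NUM_BUCKETS)]
    biases = [0.0] * NUM_BUCKETS
    return weights, biases

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head_forward(features: Sequence[float], weights, biases) -> List[float]:
    """Map pooled backbone features to a 10-bucket score distribution."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

if __name__ == "__main__":
    features = [0.5] * 16  # stand-in for pooled CNN features (hypothetical size)
    dist = head_forward(features, *init_head(len(features)))
    print(len(dist), round(sum(dist), 6))
```

Only this head starts from random weights. The backbone keeps its ImageNet weights and is fine-tuned together with the head.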
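This is a minimal sketch of the random-crop step. It assumes the image has already been resized to 256×256 elsewhere (e.g., by an image library) and is stored as a list of pixel rows. The `random_crop` helper is hypothetical.

```python
import random
from typing import List, Optional

def random_crop(image: List[List[int]], crop: int = 224,
                rng: Optional[random.Random] = None) -> List[List[int]]:
    """Cut a random crop x crop patch from an H x W image (list of rows)."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    if crop > h or crop > w:
        raise ValueError("crop larger than image")
    top = rng.randint(0, h - crop)   # randint bounds are inclusive
    left = rng.randint(0, w - crop)
    return [row[left:left + crop] for row in image[top:top + crop]]

if __name__ == "__main__":
    # Stand-in for an image already resized to 256 x 256 (single channel).
    image = [[r * 256 + c for c in range(256)] for r in range(256)]
    patch = random_crop(image, rng=random.Random(42))
    print(len(patch), len(patch[0]))  # 224 224
```

With a 256 input and a 224 crop, the top-left corner can land anywhere in a 33×33 grid of offsets. This gives cheap translation augmentation.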
Algorithm:

i indexes the i-th score bucket and N is the number of score buckets. The probabilities of falling into the N score buckets sum to 1.
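The formula image here did not survive. Based on the definitions above and the NIMA paper, the predicted distribution and the mean and standard deviation derived from it can be written as follows. Here s_i is the score of bucket i; for AVA, s_i = i and N = 10.

```latex
\hat{p} = \left[\hat{p}_{s_1}, \ldots, \hat{p}_{s_N}\right], \qquad \sum_{i=1}^{N} \hat{p}_{s_i} = 1

\mu = \sum_{i=1}^{N} s_i \, \hat{p}_{s_i}, \qquad
\sigma = \left( \sum_{i=1}^{N} \left(s_i - \mu\right)^2 \hat{p}_{s_i} \right)^{1/2}
```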

it has been shown that for ordered classes, the classification frameworks can outperform regression models [21], [31]. Hou et al. [21] show that training on datasets with intrinsic ordering between classes can benefit from EMD-based losses. These loss functions penalize misclassifications according to class distances.
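In short: [21] and [31] show that for ordered classes, classification frameworks can outperform regression models. Hou et al. [21] show that training on datasets whose classes have an intrinsic ordering benefits from EMD-based losses, which penalize misclassifications according to the distance between classes.
Key points of the loss function: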
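This is a minimal sketch of a normalized EMD loss between two distributions over ordered buckets. It compares cumulative distributions, so mass placed far from the true bucket costs more than mass placed nearby. The form used here, ((1/N) Σ |CDF_p(k) − CDF_p̂(k)|^r)^(1/r) with r = 2, is the variant reported in the NIMA paper. The helper name `emd_loss` is hypothetical.

```python
from itertools import accumulate
from typing import Sequence

def emd_loss(p: Sequence[float], p_hat: Sequence[float], r: int = 2) -> float:
    """Normalized EMD between two distributions over the same ordered buckets."""
    if len(p) != len(p_hat):
        raise ValueError("distributions must have the same number of buckets")
    n = len(p)
    cdf_p = list(accumulate(p))       # CDF_p(k) for k = 1..N
    cdf_q = list(accumulate(p_hat))   # CDF_p_hat(k) for k = 1..N
    total = sum(abs(a - b) ** r for a, b in zip(cdf_p, cdf_q))
    return (total / n) ** (1.0 / r)

if __name__ == "__main__":
    truth = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # all mass on bucket 5
    near = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # off by one bucket
    far = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]    # off by four buckets
    print(emd_loss(truth, near) < emd_loss(truth, far))  # True
```

Cross-entropy would score both wrong predictions the same, because each puts zero probability on the true bucket. EMD grows with the distance between buckets, and this is the property the cited works rely on for ordered classes.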