NIMA: Neural Image Assessment
H. Talebi, P. Milanfar, NIMA: Neural Image Assessment, TIP (2018)
Abstract
Predict the distribution of human opinion scores using a convolutional neural network.
The model not only scores images reliably and with high correlation to human perception, but can also assist with the adaptation and optimization of photo editing and enhancement algorithms in a photographic pipeline.
No “golden” reference image is needed.
1 Introduction
Technical quality assessment: measures low-level degradations such as noise, blur, and compression artifacts.
Aesthetic assessment: quantifies semantic-level characteristics of an image that are associated with emotions and beauty.
Image quality assessment is divided into full-reference and no-reference approaches: when a reference image is available, metrics such as PSNR and SSIM can be used; blind (no-reference) approaches rely on a statistical model of distortions to predict image quality.
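To make the full-reference case concrete, here is a minimal sketch of PSNR, computed from the mean squared error between a reference and a distorted image. The flat pixel lists, the 8-bit peak value of 255, and the function name `psnr` are illustrative assumptions, not something taken from the paper.

```python
import math

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 4-pixel "images" (assumed 8-bit grayscale values).
reference = [100, 100, 100, 100]
distorted = [110, 90, 100, 100]
score = psnr(reference, distorted)
```

A higher PSNR means the distorted image is closer to the reference. A metric of this kind cannot be computed at all without the reference, which is the situation no-reference methods address.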
A Related Work
B Contributions
Rather than classifying images into low/high score classes or regressing to the mean score, this paper predicts the distribution of ratings as a histogram; the predictions correlate closely with human ratings.
C AVA Dataset (A Large-Scale Database for Aesthetic Visual Analysis)
D TID2013 Dataset (Tampere Image Database 2013)
2 Method
The proposed quality and aesthetic predictor is built on image classifier architectures such as VGG16, Inception-v2, and MobileNet.
The last layer of the baseline CNN is replaced with a fully-connected layer of 10 neurons followed by softmax activation.
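The replacement head can be sketched in plain Python as a linear layer with 10 outputs followed by a softmax. The feature vector, the weight initialization, and the names `make_head` and `head_forward` are hypothetical and only illustrate the shape of the computation; in practice the head would sit on top of a real backbone in a deep learning framework.

```python
import math
import random

NUM_BUCKETS = 10  # one output neuron per score bucket

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_head(feature_dim, seed=0):
    """Create small random weights and zero biases (assumed initialization)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.01) for _ in range(feature_dim)]
               for _ in range(NUM_BUCKETS)]
    biases = [0.0] * NUM_BUCKETS
    return weights, biases

def head_forward(features, weights, biases):
    """Fully-connected layer (10 neurons) followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

features = [0.5, -1.2, 0.3, 2.0]  # hypothetical pooled backbone features
weights, biases = make_head(len(features))
p_hat = head_forward(features, weights, biases)
```

The softmax guarantees that the 10 outputs are positive and sum to 1, so they can be read directly as a predicted score histogram.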
Training: input images are rescaled to $256 \times 256$, a $224 \times 224$ region is cropped at random, and the crop is randomly flipped horizontally.
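The crop-and-flip augmentation can be sketched as follows. The sketch assumes the image has already been rescaled to 256 × 256 and is stored as a list of pixel rows; the flip probability of 0.5 and the function name `random_crop_and_flip` are assumptions, since the notes do not state them.

```python
import random

CROP = 224

def random_crop_and_flip(image, rng, crop=CROP):
    """Randomly crop a crop x crop patch, then flip it horizontally half the time.

    image: list of rows, each row a list of pixels (assumed already 256 x 256).
    """
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)   # randint is inclusive at both ends
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:                # assumed flip probability
        patch = [row[::-1] for row in patch]
    return patch

rng = random.Random(0)
# Toy image whose "pixels" record their own (y, x) coordinates.
image = [[(y, x) for x in range(256)] for y in range(256)]
patch = random_crop_and_flip(image, rng)
```

Storing coordinates as pixel values makes it easy to check that the patch is a contiguous window of the original and whether it was mirrored.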
The goal is to predict the distribution of ratings $\hat{\mathbf{p}}$ for a given image. The ground truth distribution of human ratings of a given image can be expressed as an empirical probability mass function
$$\mathbf{p} = \left[ p_{s_1}, \cdots, p_{s_N} \right], \quad \sum_{i = 1}^{N} p_{s_i} = 1, \quad s_1 \leq s_i \leq s_N$$
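As an illustration of the empirical probability mass function, the sketch below normalizes per-bucket vote counts into $\mathbf{p}$ and derives a mean score from it. The vote counts are made up, the score buckets 1 through 10 follow the AVA rating scale, and the mean score is shown only as one example of a quantity that can be read off the distribution.

```python
def empirical_pmf(vote_counts):
    """Normalize per-bucket vote counts into probabilities that sum to 1."""
    total = sum(vote_counts)
    if total <= 0:
        raise ValueError("at least one vote is required")
    return [count / total for count in vote_counts]

scores = list(range(1, 11))                     # s_1 = 1, ..., s_N = 10 (N = 10)
vote_counts = [0, 1, 2, 5, 10, 20, 8, 3, 1, 0]  # hypothetical votes, 50 in total
p = empirical_pmf(vote_counts)

# Mean score implied by the distribution: sum over i of s_i * p_{s_i}.
mean_score = sum(s * ps for s, ps in zip(scores, p))
```

With these counts, 20 of the 50 votes fall in bucket 6, so $p_{s_6} = 0.4$, and the mean score works out to 5.74.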