Understanding deep features with computer-generated imagery
Mathieu Aubry
École des Ponts ParisTech / UC Berkeley
mathieu.aubry@imagine.enpc.fr

Bryan C. Russell
Adobe Research
brussell@
Abstract

We introduce an approach for analyzing the variation of features generated by convolutional neural networks (CNNs) with respect to scene factors that occur in natural images. Such factors may include object style, 3D viewpoint, color, and scene lighting configuration. Our approach analyzes CNN feature responses corresponding to different scene factors by controlling for them via rendering using a large database of 3D CAD models. The rendered images are presented to a trained CNN and responses for different layers are studied with respect to the input scene factors. We per-
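As a rough illustration of the measurement pipeline the abstract describes, the sketch below feeds rendered images to a pretrained CNN and records intermediate-layer responses so they can later be compared across scene factors. It is a minimal sketch under stated assumptions: the network choice (AlexNet from torchvision), the hooked layer indices, and the image paths are illustrative and not taken from the paper.

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load a pretrained network; AlexNet is an illustrative choice, not necessarily
# the network studied in the paper.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Capture responses of a few intermediate layers with forward hooks.
responses = {}

def make_hook(name):
    def hook(module, inputs, output):
        responses[name] = output.detach().flatten(1)
    return hook

for idx in (2, 5, 12):  # assumed indices: pooling layers after conv1, conv2, conv5
    model.features[idx].register_forward_hook(make_hook(f"features.{idx}"))

def layer_responses(image_path):
    """Return {layer name: flattened feature vector} for one rendered image."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        model(img)
    return {name: feat.clone() for name, feat in responses.items()}

# Usage: render the same CAD model while varying one factor (e.g. viewpoint),
# call layer_responses() on each rendering, and compare how each layer's
# features vary with that factor.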
tors by controlling for them via rendering using a large of an object [38] (e.g. “eye” of a cat) or scene [39] (e.g.
database of 3D CAD models. The rendered images are pre- “toilet” in bathroom). While these visualizations reveal the
sented to a trained CNN and responses for different layers nature of learned filters, they largely ignore the question of
are studied with respect to the input scene factors. We per- the dependence of the CNN representation o