Authors:
Zhizhong Wang* , Lei Zhao* , Wei Xing , Dongming Lu
College of Computer Science and Technology, Zhejiang University
{endywon, cszhl, wxing, ldm}@zju.edu.cn
Abstract:
Recent studies using deep neural networks have shown remarkable success in style transfer, especially for artistic and photo-realistic images. However, approaches based on global feature correlations fail to capture small, intricate textures and to maintain the correct texture scales of artworks, while approaches based on local patches are deficient in preserving global effects. In this paper, we present a novel feature pyramid fusion neural network, dubbed GLStyleNet, which takes full account of multi-scale and multi-level pyramid features by aggregating layers across a VGG network, and performs style transfer hierarchically with multiple losses at different scales. Our proposed method retains both the high-frequency pixel information and the low-frequency structural information of images from two aspects: loss-function constraints and feature fusion. Our approach is not only flexible in adjusting the trade-off between content and style, but also controllable between global and local. Compared to state-of-the-art methods, our method transfers not just large-scale, obvious style cues but also subtle, exquisite ones, and dramatically improves the quality of style transfer. We demonstrate the effectiveness of our approach on portrait style transfer, artistic style transfer, photo-realistic style transfer, and Chinese ancient painting style transfer tasks. Experimental results indicate that our unified approach improves style transfer quality over previous state-of-the-art methods, while also accelerating the whole process to a certain extent. Our code is available at https://github.com/EndyWon/GLStyleNet.
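As a concrete illustration of the feature-pyramid fusion idea described above, the following is a minimal sketch in PyTorch, assuming torchvision's VGG-19. The chosen layer indices and the extract_features / fuse_features helpers are illustrative assumptions, not the released GLStyleNet code.

# A minimal sketch of multi-level feature fusion: activations from several
# VGG-19 layers are resized to a common resolution and concatenated into
# one multi-scale pyramid feature. Layer choices are assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

vgg = vgg19(pretrained=True).features.eval()

# Hypothetical layer picks: one shallow layer (local, high-frequency
# detail) and two deeper layers (global, low-frequency structure).
LAYER_IDS = [3, 13, 22]  # relu1_2, relu3_2, relu4_2 in torchvision indexing

def extract_features(x):
    """Run x through VGG-19 and collect the chosen intermediate activations."""
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in LAYER_IDS:
            feats.append(x)
    return feats

def fuse_features(feats):
    """Bilinearly resize all maps to the first map's spatial size and
    concatenate along the channel axis."""
    h, w = feats[0].shape[-2:]
    resized = [F.interpolate(f, size=(h, w), mode='bilinear',
                             align_corners=False) for f in feats]
    return torch.cat(resized, dim=1)

image = torch.rand(1, 3, 256, 256)  # placeholder input
pyramid = fuse_features(extract_features(image))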
Conclusion:
In this work, two means are utilized to integrate the global and local statistical properties of images. First, an integrated neural network, dubbed GLStyleNet, is designed to fuse pyramid features of the content and style images. The lower-layer features of a VGG network retain far more high-frequency detail of an image, whereas the higher-layer features retain far more low-frequency information. In other words, the lower-layer features capture local details (local information, corresponding to a larger feature scale), while the higher-layer features capture structural information (global information, corresponding to a smaller feature scale). Our GLStyleNet fuses these multi-scale and multi-level pyramid features from the VGG network. Second, our loss-function constraint is based on both global and local statistical attributes of images: the global statistics are based on Gram matrices and the local statistics on feature patches. It is worth mentioning that the parameters could be learned automatically by extending and training GLStyleNet as in [16], replacing the complex process of manual parameter tuning. Our method dramatically improves the quality of style transfer, and the idea of layer aggregation accelerates the whole process with minimal texture loss. Our approach is not only flexible in adjusting the trade-off between content and style, but also controllable between global and local, so it can be applied to numerous style transfer tasks, especially Chinese ancient painting style transfer, which has never been demonstrated before. Experimental results demonstrate that our proposed approach outperforms many state-of-the-art methods and can assist in creating more elaborate works. In future work, we will explore high-fidelity image style transfer, generating exquisite details in the transferred image while keeping the global structure consistent with the constraints of the content image.
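For concreteness, below is a minimal sketch, assuming PyTorch, of the two loss ingredients named above: a global term computed from Gram matrices and a local term from nearest-neighbor feature-patch matching. All helper names and the weighting values are illustrative assumptions, not the authors' released implementation.

# Global style statistic: channel-wise feature correlations (Gram matrix),
# plus a local statistic: each patch of the generated features is matched
# to its nearest style patch and penalized by their distance.
import torch
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a (b, c, h, w) feature map, normalized by its size."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def global_style_loss(feat, style_feat):
    return F.mse_loss(gram_matrix(feat), gram_matrix(style_feat))

def local_patch_loss(feat, style_feat, k=3):
    """Match every k x k patch of `feat` to its most similar style patch
    (by normalized cross-correlation) and penalize the distance."""
    patches = F.unfold(feat, k).transpose(1, 2)        # (b, n, c*k*k)
    style_patches = F.unfold(style_feat, k).transpose(1, 2)
    sim = (F.normalize(patches, dim=2)
           @ F.normalize(style_patches, dim=2).transpose(1, 2))
    nearest = sim.argmax(dim=2)                        # (b, n)
    matched = torch.gather(
        style_patches, 1,
        nearest.unsqueeze(-1).expand(-1, -1, style_patches.size(2)))
    return F.mse_loss(patches, matched)

# Usage: a weighted sum, where the two weights realize the global-vs-local
# control described in the conclusion (values here are placeholders).
f, fs = torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32)
loss = 1.0 * global_style_loss(f, fs) + 0.5 * local_patch_loss(f, fs)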