基于深度学习的图像风格迁移算法研究
Research on Deep Learning-Based Image Style Transfer Algorithms
【Author】 王志鹏 (Wang Zhipeng);
【Supervisor】 全红艳 (Quan Hongyan);
【Author Information】 East China Normal University, Software Engineering (Professional Degree), 2023, Master's
【Abstract (translated from Chinese)】 Image style transfer extracts style features from an input style image, fuses them into the content features extracted from an input content image, and generates a new image combining the characteristics of both. In recent years, image style transfer algorithms have mostly employed convolutional neural networks and generative adversarial networks, but when models trained by existing algorithms perform style transfer, the feature fusion process suffers from slow convergence, too few network parameters, insufficient feature extraction, and long execution times; some regions of the transferred image also show obvious local color unevenness. This thesis implements the image style transfer task with deep learning methods and proposes style transfer solutions that achieve real-time performance and good visual quality for any specified pair of content and style images. First, it proposes a CNN-based style transfer method with multi-level feature fusion and a color transform, built on two key points: a convolutional neural network extracts multi-layer feature representations of the input images, and a color transform generates the output image. Multi-level fusion combines features from different layers, helping to capture image features at multiple scales, while the color transform effectively preserves the content information of the input image. Experiments show the algorithm can transfer style between any pair of content and style images, reaching 40 FPS on 1080p images, i.e., real-time style transfer. Second, it proposes a style transfer method based on a multi-head attention mechanism, emphasizing the application of multi-head attention. The input image is split into small patches, which are converted into vector representations usable by the model. By feeding the embedded patches into a multi-head attention network, the model learns both global and local context, which is very useful for understanding image structure and semantics. A color transform keeps the content of the generated image consistent with the original, and appropriate loss functions guide the consistency between the generated and target style images during training. Experiments show the algorithm can transfer style between any pair of content and style images and can relate long-range features, transferring style features, from a semantic perspective, to the parts of the content image most similar to the style image. Finally, it proposes a prompt-free controlled diffusion model, which removes the diffusion model's dependence on prompt words and lets the network take style image features as input. Under the control network, a line-art constraint derived from the content image makes the final output combine content and style characteristics. Comparative experiments show that the proposed control network has fewer parameters, lower requirements on dataset size, and effectively reduced training time; the results show that the prompt-free controlled diffusion model preserves image feature details more comprehensively. By studying the key steps of style transfer, namely how to effectively learn the style information of an image and how to integrate it into image semantic information, the network models in this thesis provide a new approach and development direction.
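The first method's two key ideas, fusing features from multiple levels and generating the output with a color transform, can be sketched in miniature. The following is a minimal NumPy illustration, not the thesis's actual network: average pooling stands in for CNN feature extraction, and a per-channel statistics-matching affine stands in for the learned color transform; all sizes are illustrative.

```python
import numpy as np

def feature_pyramid(img, levels=3):
    # Build multi-level features by repeated 2x average pooling; this
    # stands in for the multi-layer CNN features described in the abstract.
    feats = [img]
    cur = img
    for _ in range(levels - 1):
        h, w = cur.shape[0] // 2 * 2, cur.shape[1] // 2 * 2
        c = cur[:h, :w]
        cur = (c[0::2, 0::2] + c[1::2, 0::2] + c[0::2, 1::2] + c[1::2, 1::2]) / 4.0
        feats.append(cur)
    return feats

def upsample_nearest(feat, shape):
    # Nearest-neighbour upsampling back to the finest resolution.
    ys = np.arange(shape[0]) * feat.shape[0] // shape[0]
    xs = np.arange(shape[1]) * feat.shape[1] // shape[1]
    return feat[np.ix_(ys, xs)]

def fuse_levels(feats):
    # Fuse features from all levels at the finest resolution by
    # channel concatenation (multi-level feature fusion).
    target = feats[0].shape[:2]
    up = [upsample_nearest(f, target) for f in feats]
    return np.concatenate([f.reshape(target[0], target[1], -1) for f in up], axis=-1)

def color_transform(content, style):
    # Per-channel affine colour transform: shift the content image's colour
    # statistics toward the style image's. Because only colours change,
    # the content structure is preserved, mirroring the abstract's claim.
    c_mean, c_std = content.mean((0, 1)), content.std((0, 1)) + 1e-8
    s_mean, s_std = style.mean((0, 1)), style.std((0, 1))
    return (content - c_mean) / c_std * s_std + s_mean

content = np.random.rand(32, 32, 3)
style = np.random.rand(32, 32, 3)
fused = fuse_levels(feature_pyramid(content))  # (32, 32, 9): three 3-channel levels
out = color_transform(content, style)
```

Because every step is a per-pixel or per-channel operation, this kind of pipeline is cheap enough to run in real time, which is consistent with the 40 FPS figure reported for the actual method.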
【Abstract】 Image style transfer refers to extracting style features from an input style image, integrating them into the content features extracted from an input content image, and generating a new image that combines the characteristics of both. In recent years, image style transfer algorithms have mostly used convolutional neural networks and generative adversarial networks. However, when models trained by existing algorithms perform style transfer, the feature fusion process suffers from slow convergence, too few network parameters, insufficient feature extraction, long execution time, and other problems. Some areas of the transferred image also show obvious local color unevenness. This thesis implements the image style transfer task using deep learning methods and proposes style transfer solutions that achieve real-time performance and good visual quality for any specified pair of content and style images. First, it proposes an image style transfer method based on convolutional neural networks, whose key features are multi-level feature fusion and a bilateral affine transformation. Specifically, a CNN model extracts multi-layer feature representations of the input image, and features from different levels are fused to retain both the content and style information of the image; multi-level fusion helps capture image features at different scales. When generating images, a bilateral affine transformation preserves the structure and shape of the input image, helping to produce more natural images while retaining style information. Experiments demonstrate that the algorithm can perform style transfer on any given pair of content and style images; the transfer speed for 1080p images reaches 40 frames per second (FPS), enabling real-time style transfer. Second, it proposes an image style transfer method based on a patch-embedding Transformer, emphasizing the application of the Transformer model. The input image is broken into small patches, and the Transformer's embedding layer converts these patches into vector representations usable by the model. By feeding the embedded patches into the Transformer encoder, the model learns global and local context information, which is very useful for understanding the structure and semantics of images. As in the previous chapter, a bilateral affine transformation ensures that the structure of the generated image remains consistent with the original, and appropriate loss functions, such as content loss and style loss, guide the consistency between the generated and target style images during training. Experiments demonstrate that the algorithm can perform style transfer on arbitrary pairs of content and style images; the results confirm that it effectively captures long-range features and transfers style features to the parts of the content image that are most semantically similar to the style image. Finally, it proposes an image style transfer method based on a diffusion model: a no-prompt ControlNet that takes style image features as input instead of prompt words. Comparative experiments show that the control network has fewer parameters, lower requirements on dataset size, and effectively reduced training time; the results show that the no-prompt ControlNet diffusion model preserves image feature details more comprehensively. By studying the key steps of style transfer, the network models in this thesis provide a new approach to effectively learning the style information of images and integrating it into image semantic information.
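The patch-embedding and multi-head attention pipeline described for the second method can likewise be sketched. This is a toy NumPy illustration under stated assumptions, not the thesis's model: the projection and attention weight matrices are random rather than learned, and the patch size and embedding dimension are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_embed(img, patch=4, dim=16):
    # Cut the H x W x C image into non-overlapping patch x patch blocks,
    # flatten each block, and project it to a dim-dimensional token.
    # The random projection stands in for a learned linear embedding.
    h, w, c = img.shape
    p = patch
    blocks = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    blocks = blocks.reshape(-1, p * p * c)               # (num_patches, p*p*c)
    proj = rng.standard_normal((p * p * c, dim)) / np.sqrt(p * p * c)
    return blocks @ proj                                 # (num_patches, dim)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, heads=4):
    # Scaled dot-product self-attention: every token attends to every
    # other token, which is how the model can link distant image regions.
    n, d = x.shape
    dh = d // heads
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = ((x @ w).reshape(n, heads, dh).transpose(1, 0, 2)
               for w in (wq, wk, wv))                    # (heads, n, dh)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, n, n)
    out = attn @ v                                       # (heads, n, dh)
    return out.transpose(1, 0, 2).reshape(n, d)          # merge heads

img = rng.random((16, 16, 3))     # toy "image"
tokens = patch_embed(img)         # 4x4 grid of patches -> 16 tokens of dim 16
y = multi_head_attention(tokens)  # context-mixed tokens, same shape
```

The (heads, n, n) attention map is what lets each patch weight every other patch, including distant ones, which is the mechanism behind the abstract's claim that the method captures long-range features.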
【Key words】 Image-to-Image Translation; Style Transfer; Real-Time Transfer; Photographic Style Transfer
【Online Publication Contributor】 East China Normal University 【Online Publication Year/Issue】 2025, Issue 01
【CLC Number】 TP391.41; TP18