基于弱监督与域适应的图像语义分割方法研究

Research on Image Semantic Segmentation based on Weak Supervision and Domain Adaptation

【Author】 Chen Tao (陈涛)

【Advisor】 Tang Zhenmin (唐振民)

【Author information】 Nanjing University of Science and Technology, Control Science and Engineering, 2022, PhD

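【Abstract (translated from the Chinese)】 As a key technology for scene understanding, image semantic segmentation has long been a very active research direction in computer vision, with wide applications in autonomous driving, object recognition, human-computer interaction, gait recognition, and video surveillance. With the development of deep learning, semantic segmentation based on fully convolutional networks has achieved great success. However, training deep neural networks usually requires large-scale annotated training data, and obtaining precise pixel-level annotations for semantic segmentation is especially time-consuming and laborious. Consequently, in recent years a growing number of researchers have turned to the question of how to dispense with large-scale pixel-level annotation and thereby reduce the annotation burden of semantic segmentation. Image semantic segmentation based on weak supervision and domain adaptation has become a new research hotspot. Focusing on image semantic segmentation, this dissertation explores methods based on weak supervision and domain adaptation, and studies algorithm design from three perspectives: weak supervision, domain adaptation, and one-shot learning. For the problems of insufficient object mining and incomplete localization by class activation maps in weakly supervised tasks, it proposes a non-salient region object mining algorithm and a saliency-guided inter-class and intra-class relation constraint algorithm, respectively. For the problems of feature distortion and overfitting to source-domain features in adversarial learning methods for domain adaptation, it proposes an enhanced feature space adversarial learning algorithm. For the problem in one-shot learning that the features extracted by the encoder lack semantic information, which easily leads to network overfitting, it proposes an algorithm based on semantically rich class prototype learning. The specific research content and contributions are as follows.

(1) To address the insufficient mining of objects outside salient regions in existing weakly supervised semantic segmentation methods, a non-salient region object mining algorithm is proposed. The algorithm first introduces a graph-based global reasoning unit to help the classification network capture global relations between disjoint and distant regions, strengthening the network's ability to activate objects scattered in corners or near image borders. Next, to further mine objects within non-salient regions, the algorithm uses the original class activation maps to locate potential objects outside the salient regions, and further exploits the self-correction ability of the segmentation network by lowering the false-negative rate of the pseudo labels. Finally, the algorithm generates masked pseudo labels for complex images through a non-salient region masking operation, helping the network discover more objects in non-salient regions. Experiments on the PASCAL VOC 2012 and MS COCO datasets demonstrate the superiority of the algorithm.

(2) To address the limitation that class activation maps in existing weakly supervised semantic segmentation methods locate only the most discriminative parts of an object, a saliency-guided inter-class and intra-class relation constraint algorithm is proposed. The algorithm first uses a saliency-guided class-agnostic distance loss that minimizes intra-class feature variance by aligning features with their class prototypes. It then uses a class-specific distance loss to separate inter-class features and to encourage object regions to have higher activation than the background. Besides strengthening the classification network's ability to activate more complete object regions in class activation maps, the algorithm uses an object-guided label refinement sub-method that combines the segmentation predictions with the initial labels to generate high-quality pseudo labels. Experiments on the PASCAL VOC 2012 and MS COCO datasets demonstrate the effectiveness of the algorithm.

(3) To address feature distortion and overfitting to source-domain features in current adversarial-learning-based domain adaptation methods, an enhanced feature space adversarial learning algorithm is proposed. The algorithm first balances feature space adversarial training by removing all pooling layers and strided convolutions from the discriminator. It then designs a classification-constrained discriminator, which gives the discriminator insight into feature structure, to alleviate feature distortion in adversarial learning; this discriminator pushes the feature generator to extract domain-invariant features while preserving structural information useful for semantic segmentation. Because target-domain features are invisible to the classifier during feature space adversarial training, the classifier tends to overfit source-domain features; the algorithm therefore further proposes a hybrid collaborative framework that combines feature space adversarial learning with pseudo-label self-training. To make full use of the pseudo labels, the algorithm also aligns the inter-domain feature distributions by aligning the centroids of same-class features from different domains. Experiments on the GTA5→Cityscapes and SYNTHIA→Cityscapes tasks demonstrate the superiority of the algorithm.

(4) To address the shortcoming that, in existing one-shot image segmentation algorithms, the features extracted by the encoder lack semantic information and easily cause network overfitting, an algorithm based on semantically rich class prototype learning is proposed. During episodic training, the algorithm uses multi-class label information to constrain the network, helping it extract class-aware semantic feature representations and thus generate more semantically meaningful class prototypes for the target classes. It further uses a pyramid feature fusion sub-method to mine the fused features of the target cues at multiple scales. To make better use of the support image and its mask, it proposes a self-prototype guidance branch for segmenting the support image, which yields a more robust target class prototype. At test time, the algorithm combines the pseudo-prototype of the query image with the prototype of the support image to better guide the final segmentation of the query image. Experiments on the PASCAL-5^i and MS-COCO-20^i datasets demonstrate the effectiveness of the algorithm.

In summary, this dissertation studies image semantic segmentation from the three perspectives of weak supervision, domain adaptation, and one-shot learning, and proposes corresponding algorithms. Extensive comparative experiments verify the effectiveness of each proposed algorithm.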
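The class-centroid alignment described in contribution (3) can be illustrated with a small sketch. This is not the dissertation's implementation: the function names, the squared Euclidean distance, and the ignore label 255 are assumptions chosen here for illustration, since the abstract does not specify them. Source-domain labels would come from ground truth and target-domain labels from selected pseudo labels.

```python
def class_centroids(features, labels, ignore_label=255):
    """Return {class_id: mean feature vector} over pixels of that class.

    features: list of per-pixel feature vectors (lists of floats).
    labels: list of per-pixel class ids; ignore_label marks pixels whose
            (pseudo) label is unavailable. The value 255 is an assumption.
    """
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label == ignore_label:
            continue
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        for c, value in enumerate(vec):
            sums[label][c] += value
        counts[label] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


def centroid_alignment_loss(src_centroids, tgt_centroids):
    """Mean squared Euclidean distance between same-class centroids.

    Only classes present in both domains contribute. The distance choice
    is an illustrative assumption, not taken from the dissertation.
    """
    shared = sorted(set(src_centroids) & set(tgt_centroids))
    if not shared:
        return 0.0
    total = 0.0
    for k in shared:
        total += sum((s - t) ** 2
                     for s, t in zip(src_centroids[k], tgt_centroids[k]))
    return total / len(shared)


# Toy example: three source pixels with ground-truth labels and three
# target pixels with pseudo labels (one of them ignored).
src = class_centroids([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]], [0, 0, 1])
tgt = class_centroids([[1.0, 1.0], [0.0, 2.0], [5.0, 5.0]], [0, 1, 255])
loss = centroid_alignment_loss(src, tgt)  # 2.5 for this toy data
```

Minimizing such a loss pulls the same-class centroids of the two domains together, which is one simple way to read "aligning the inter-domain feature distributions".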

【Abstract】 As a key technology for scene understanding, image semantic segmentation has always been a very active research direction in the field of computer vision. It is widely used in autonomous driving, object detection, human-computer interaction, gait recognition, and video surveillance. With the development of deep learning, semantic segmentation built on fully convolutional networks has achieved great success. However, training deep neural networks usually requires a large amount of annotated training data, and obtaining pixel-level annotations for semantic segmentation is prohibitively expensive and time-consuming. Therefore, in recent years, an increasing number of researchers have turned their attention to eliminating the need for large-scale pixel-level annotations, so as to reduce the annotation burden of the semantic segmentation task. Image semantic segmentation based on weak supervision and domain adaptation has become a new research hotspot. This dissertation explores image semantic segmentation from the perspectives of weakly supervised learning, domain adaptation, and one-shot learning. To address insufficient target object mining and incomplete localization by class activation maps in weakly supervised tasks, a non-salient region object mining algorithm and a saliency guided inter- and intra-class relation constraint algorithm are proposed, respectively. To address feature distortion and classifier overfitting in domain adaptation tasks, an enhanced feature space adversarial learning method is proposed. To address the network overfitting that results from features lacking semantic information in the one-shot semantic segmentation task, a semantically meaningful class prototype learning method is proposed. The main research content of this dissertation is as follows.

(1) A non-salient region object mining approach is proposed for weakly supervised semantic segmentation to discover more objects outside the salient areas. A graph-based global reasoning unit is introduced to strengthen the classification network's ability to capture global relations among disjoint and distant regions. This helps the network activate object features outside the salient area. The self-correction ability of the segmentation network is then further exploited to mine non-salient region objects. Specifically, a potential object mining module is proposed to reduce the false-negative rate in pseudo labels. Moreover, a non-salient region masking module is proposed to generate masked pseudo labels for complex images, which helps further discover objects in the non-salient region. Experiments on the PASCAL VOC 2012 and MS COCO datasets demonstrate the superiority of the proposed approach.

(2) A saliency guided inter- and intra-class relation constrained method is proposed for weakly supervised semantic segmentation to help expand the activated object regions in class activation maps. Specifically, a saliency guided class-agnostic distance module is proposed to pull intra-class features closer by aligning features to their class prototypes. Further, a class-specific distance module is proposed to push inter-class features apart and encourage the object region to have higher activation than the background. After strengthening the capability of the classification network to activate more complete object regions in class activation maps, an object guided label refinement module is introduced to make full use of both the segmentation prediction and the initial labels to obtain superior pseudo labels. Experiments on the PASCAL VOC 2012 and MS COCO datasets verify the effectiveness of the proposed approach.

(3) An enhanced feature space adversarial learning method is proposed for domain adaptation of semantic segmentation to alleviate the feature distortion and classifier overfitting problems. First, a classification constrained discriminator is proposed to balance the adversarial training and alleviate the feature distortion problem. The classification constrained discriminator helps the feature generator extract domain-invariant features that are useful for segmentation, rather than merely ambiguous features that fool the domain discriminator. Next, to alleviate the classifier overfitting problem, self-training is used collaboratively to learn a domain-robust classifier with pseudo labels selected from noisy target-domain predictions. Moreover, an efficient class centroid calculation module is proposed, and the domain discrepancy is further reduced by aligning the feature centroids of the same class from different domains. Experiments on the GTA5→Cityscapes and SYNTHIA→Cityscapes tasks demonstrate the superiority of the proposed approach.

(4) A semantically meaningful class prototype learning method is proposed for one-shot image segmentation to alleviate the network overfitting that results from features lacking semantic information. Multi-class label information is leveraged during episodic training to encourage the network to generate more semantically meaningful features for each category. After integrating the target class cues into the query features, a pyramid feature fusion module is proposed to mine the fused features for the final classifier. Furthermore, to take more advantage of the support image-mask pair, a self-prototype guidance branch is proposed for the segmentation of the support image. It constrains the network to generate more compact features and a robust prototype for each semantic class. For inference, a fused prototype guidance branch is proposed for the segmentation of the query image. Specifically, the prediction for the query image is used to extract a pseudo-prototype, which is combined with the initial prototype. The fused prototype is then used to guide the final segmentation of the query image. Experiments on the PASCAL-5^i and MS-COCO-20^i datasets demonstrate the effectiveness of this approach.

In summary, this dissertation conducts research on image semantic segmentation based on weak supervision and domain adaptation. Three different tasks (i.e., weakly supervised learning, domain adaptation, and one-shot learning) are explored, and methods are proposed accordingly. Comprehensive experiments verify the effectiveness of each algorithm.
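The inference step in contribution (4), where a pseudo-prototype from the query prediction is combined with the support prototype, can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's implementation: masked average pooling for prototype extraction, cosine-similarity thresholding as the segmentation step, the equal fusion weight, and all function names are hypothetical choices that the abstract does not specify.

```python
import math


def masked_average_prototype(features, mask):
    """Average the feature vectors at positions where mask is 1.

    features: H x W grid of C-dimensional vectors (nested lists).
    mask: H x W grid of 0/1 values.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        return total  # no foreground pixels: zero vector
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_by_prototype(features, prototype, threshold=0.5):
    """Label a pixel 1 when its feature is similar enough to the prototype."""
    return [[1 if cosine(vec, prototype) > threshold else 0 for vec in row]
            for row in features]


def fuse_prototypes(support_proto, pseudo_proto, weight=0.5):
    """Convex combination of the two prototypes (weight is an assumption)."""
    return [weight * s + (1.0 - weight) * q
            for s, q in zip(support_proto, pseudo_proto)]


# Toy one-shot episode with 2-channel features.
support_feats = [[[1.0, 0.0], [0.0, 1.0]],
                 [[1.0, 0.2], [0.0, 1.0]]]
support_mask = [[1, 0],
                [1, 0]]
query_feats = [[[0.9, 0.1], [0.1, 0.9]]]

support_proto = masked_average_prototype(support_feats, support_mask)
initial_pred = segment_by_prototype(query_feats, support_proto)
pseudo_proto = masked_average_prototype(query_feats, initial_pred)
fused_proto = fuse_prototypes(support_proto, pseudo_proto)
final_pred = segment_by_prototype(query_feats, fused_proto)
```

In this toy episode the first query pixel resembles the support foreground and the second does not, so both the initial and the final prediction mark only the first pixel as foreground.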

  • 【Classification code】TP391.41