基于多特征融合的人脸表情识别研究

Facial Expression Recognition Methods Based on Multiple Feature Fusion

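【Author】 Zhang Haifeng (张海峰)

【Advisor】 Wang Zengfu (汪增福)

【Author Information】 University of Science and Technology of China, Control Science and Engineering, 2020, PhD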

【Abstract】 Facial expressions are the most powerful, natural, and universal nonverbal means by which humans convey their emotional states and intentions during communication. Facial expression recognition has broad application prospects in many settings that involve human-computer interaction, such as gaming and entertainment, medical rehabilitation, traffic safety, teaching evaluation, and product marketing. Because facial expressions are complex and variable, facial expression recognition is a highly challenging research topic. The main open problems are the following:

(1) Most existing facial expression recognition algorithms are easily affected by individual identity and lack the ability to cope with identity variation.
(2) Most algorithms based on global features tend to overlook the local detail features that are crucial to facial expression recognition.
(3) Algorithms based on local features place high demands on data labels, have high computational complexity, and are difficult to model.
(4) Any single approach, whether based on local features or on global features, suffers from large performance fluctuations and poor robustness in complex scenes.

To address these problems, this thesis aims to improve the accuracy and robustness of facial expression recognition algorithms. It studies in depth the problem of multi-feature fusion learning on static facial expression images and proposes a series of targeted facial expression recognition algorithms. The main contributions and innovations are as follows:

(1) To address the large intra-class variation and small inter-class variation of expression features, and the limited ability of ordinary convolutional neural networks to represent local features, a facial expression recognition algorithm based on a facial expression bilinear encoding model is proposed. The algorithm uses two feature extractors based on convolutional neural networks and combines their outputs with a bilinear encoding model to produce a feature representation that highlights local details. Experimental results show that the algorithm extracts local features effectively in an end-to-end manner without any local labels and significantly improves recognition accuracy.

(2) To address the difficulty existing algorithms have in coping with identity variation and their failure to fully exploit identity features, a facial expression recognition algorithm based on an identity-expression dual-branch network is proposed. Unlike previous algorithms that suppress identity features, this algorithm strengthens their influence and uses a bilinear model to fuse expression and identity features, adaptively learning the relationship between identity and expression. Experimental results show that the fused features are more discriminative, make the algorithm more robust to identity variation, and improve recognition accuracy.

(3) To address the overly complicated local feature extraction in existing methods and their failure to exploit the complementarity of local and global features, a facial expression recognition algorithm that fuses global features with local features of regions of interest is proposed. First, an attention map generator is designed to obtain, under weak supervision, a set of attention maps that indicate regions of interest. Second, bilinear attention pooling is used to generate and refine local features, and a selective feature unit is designed that allows adaptive weighted fusion of global and local features before classification. In addition, contrastive losses on local and global features are defined and used to improve inter-class dispersion and intra-class compactness at different granularities. Experimental results show that the method accurately locates the facial regions of interest and, compared with using global or local features alone, significantly improves recognition accuracy.

(4) To address the need of most existing algorithms for auxiliary information when extracting local features, and the relatively single granularity of the local features they extract, a multi-granularity facial expression recognition algorithm based on a refined horizontal pyramid network is proposed. First, a horizontal pyramid network is designed that classifies using local features obtained by uniformly partitioning the feature map at different horizontal pyramid scales, which makes effective use of the discriminative power of each part of the face. Second, a refinement mechanism is added to the horizontal pyramid network to correct the outliers that uniform partitioning produces in local regions, so that the refined local regions have stronger feature consistency. Experimental results on multiple datasets show that the refined horizontal pyramid network greatly improves recognition accuracy without using any additional local supervision.
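The bilinear combination of two feature extractors mentioned in the abstract can be illustrated with a generic bilinear pooling step. This record does not give the thesis's exact formulation, so the following is only a minimal sketch under stated assumptions: the function name `bilinear_pool`, the averaging over spatial locations, and the signed square-root and L2 normalization (steps commonly paired with bilinear pooling) are illustrative choices, not the author's confirmed method.

```python
import numpy as np


def bilinear_pool(feat_a, feat_b, eps=1e-12):
    """Combine two feature maps into one bilinear descriptor.

    feat_a: array of shape (C_a, H, W) from the first extractor.
    feat_b: array of shape (C_b, H, W) from the second extractor.
    Returns a vector of length C_a * C_b.
    Hypothetical helper for illustration only.
    """
    c_a, h, w = feat_a.shape
    c_b = feat_b.shape[0]
    assert feat_b.shape[1:] == (h, w), "spatial sizes must match"

    # Flatten the spatial grid: each column is one location's feature vector.
    a = feat_a.reshape(c_a, h * w)
    b = feat_b.reshape(c_b, h * w)

    # Sum of outer products over all locations, averaged over the grid.
    pooled = (a @ b.T) / (h * w)
    vec = pooled.reshape(-1)

    # Assumed post-processing: signed square root, then L2 normalization.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    descriptor = bilinear_pool(rng.standard_normal((4, 3, 3)),
                               rng.standard_normal((5, 3, 3)))
    print(descriptor.shape)
```

Because every pair of channels from the two extractors interacts at each spatial location, the descriptor captures pairwise feature co-occurrences, which is one reason bilinear models are often used to emphasize fine local detail.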

  • 【Classification No.】 TP391.41
  • 【Citations】 13
  • 【Downloads】 1362