Research on Automatic Labanotation Generation Based on Time Series Analysis of Human Motion
【Author】 Li Min (李敏)
【Supervisor】 Miao Zhenjiang (苗振江)
【Author information】 Beijing Jiaotong University, Signal and Information Processing, 2022, Ph.D.
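【Abstract (Chinese version)】 Traditional folk dance is an important part of humanity's intangible cultural heritage, and recording and preserving traditional folk dances in Labanotation is an effective way to protect this heritage. Labanotation is currently one of the most widely used symbol systems for recording human dance movements and has important applications in the recording, reproduction, teaching, and communication of dance. However, most Labanotation scores are still drawn by hand, and professional notators must spend a great deal of time and effort observing, drawing, and proofreading them. Generating Labanotation scores automatically with computer technology therefore has significant practical and research value. Mainstream automatic Labanotation generation methods first acquire human motion data with motion capture technology and then generate scores by analyzing and recognizing the captured data. The input to this task is a time-varying sequence of human motion capture data, and the output is a sequence of notation symbols in temporal order. Accordingly, this dissertation applies time series analysis to several algorithms for generating Labanotation automatically from human motion data, proposes a series of targeted motion feature extraction methods and temporal analysis models, and effectively improves the accuracy of automatic Labanotation generation. The main research work falls into four parts.

(1) An automatic Labanotation generation algorithm based on data segmentation. For manually segmented data, to overcome the inflexibility of template matching, we propose a method based on hidden Markov model (HMM) time series analysis that generates Labanotation from motion data segments. First, to cope with the diversity of dance movements, differences in dancers' body shapes, and noise in motion capture data, we propose a new motion feature that is invariant to body measurements and body orientation. Then, we use HMMs to analyze the temporal dynamics of limb movements and map each lower-limb movement to the corresponding Laban symbol. Further, to save manpower, we propose a Labanotation generation framework with fully automatic data segmentation. First, following the center-of-gravity transfer theory of Labanotation, the continuous motion capture data are divided into segments, each containing only one movement. Then, a neural network with one-dimensional convolutional layers and gated recurrent unit (GRU) layers recognizes the segments and outputs the corresponding Laban symbols.

(2) An automatic Labanotation generation algorithm based on a convolutional recurrent attention sequence model. To avoid the effect of data segmentation on generation accuracy, we propose a convolutional recurrent attention sequence model with fused features that generates high-quality Labanotation directly from continuous motion time series. First, we fuse bone features with Lie group features, so that the fused features capture not only the bone information between adjacent joints but also the relative geometric relationships between connected bones. Then, in the sequence learning model, a convolutional recurrent neural network learns spatio-temporal representations of the motion capture data, and an attention mechanism learns the correspondence between the input motion feature sequence and the output symbol sequence, so that Laban symbol sequences are generated accurately.

(3) An automatic Labanotation generation algorithm based on a graph convolutional attention sequence model. To fully exploit the motion information in skeleton data, we propose a new graph-convolution-based attention sequence model that analyzes human motion time series and generates Labanotation reliably and efficiently. In the encoder, a new gesture-sensitive graph convolutional network learns temporal and spatial patterns in the motion capture data by learning adaptive joint weights and meaningful non-physical connections. In the decoder, we exploit the rhythm of the motion and propose a new rhythm-aware attention mechanism that learns the correspondence between the motion data sequence and the Laban symbol sequence. When predicting a target Laban symbol, this mechanism concentrates the network's attention weights on the relevant part of the input sequence instead of searching the entire input, which makes Labanotation generation both efficient and accurate.

(4) An automatic Labanotation generation algorithm based on a graph attention Transformer model. To capture flexible limb movements and handle the temporal progression of complex dance steps, we propose a graph attention Transformer model, LabanFormer. First, a multi-scale graph attention network learns the feature correlation between every pair of joints and aggregates the features of neighboring joints at multiple scales to capture flexible limb movements. Second, a new Transformer model with positional encoding based on a gated recurrent network learns global dependencies in the feature sequence output by the multi-scale graph attention network. The gated recurrent positional encoding module handles time series of different lengths and encodes sequence position with learnable parameters. As a result, LabanFormer can capture periodic, symmetric, or repetitive steps in dance. After training, the model accurately decodes input motion capture sequences into the corresponding Laban symbols.

We conducted extensive experiments with the proposed algorithms on real-world datasets. The results show that all of the proposed algorithms based on human motion time series analysis achieve good generation performance, with accuracy improving from each method to the next. Compared with current state-of-the-art algorithms, the proposed algorithms achieve similar or better results, contributing to the protection of ethnic and folk dances.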
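The abstract does not define the body-measurement- and orientation-invariant feature of work (1). As a minimal sketch of how those two invariances can be obtained, assuming right-handed coordinates with the y axis up, the Python below expresses a limb vector in a body frame rebuilt from the hip axis in every frame (removing the dancer's facing) and normalizes it to unit length (removing limb length), then quantizes it into a Laban-style direction and level. The helper names, the joint choice, and the 22.5° and 30° thresholds are hypothetical illustrations, not the author's feature.

```python
import math

# Hypothetical sketch: orientation- and size-invariant limb direction,
# quantized into Laban-style (direction, level) labels. Not the author's feature.
Vec = tuple[float, float, float]
UP: Vec = (0.0, 1.0, 0.0)  # assumption: right-handed coordinates, y axis up

# Eight horizontal sectors, clockwise from the dancer's front (seen from above).
SECTORS = ["forward", "right-forward", "right", "right-backward",
           "backward", "left-backward", "left", "left-forward"]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return (a[0] / n, a[1] / n, a[2] / n)


def body_frame(left_hip: Vec, right_hip: Vec) -> tuple[Vec, Vec]:
    """Horizontal (right, forward) unit axes that turn with the dancer."""
    lateral = sub(right_hip, left_hip)
    right = unit((lateral[0], 0.0, lateral[2]))  # drop the vertical part
    forward = cross(UP, right)                     # unit length, horizontal
    return right, forward


def laban_direction(proximal: Vec, distal: Vec,
                    left_hip: Vec, right_hip: Vec) -> tuple[str, str]:
    """Quantize the limb vector proximal -> distal into (direction, level)."""
    right, forward = body_frame(left_hip, right_hip)
    v = unit(sub(distal, proximal))  # unit length removes limb-size effects
    x, y, z = dot(v, right), dot(v, UP), dot(v, forward)
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, y))))
    level = "high" if elevation > 30 else "low" if elevation < -30 else "middle"
    if math.hypot(x, z) < math.sin(math.radians(22.5)):
        return "place", level  # nearly vertical limb
    azimuth = math.degrees(math.atan2(x, z)) % 360  # 0 = forward, 90 = right
    return SECTORS[int(((azimuth + 22.5) % 360) // 45)], level
```

Because the frame is rebuilt from the hips at every frame, rotating the whole skeleton about the vertical axis or lengthening the limb leaves the output unchanged.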
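Work (1) maps each limb movement to a Laban symbol with hidden Markov models. A common way to use HMMs for recognizing isolated segments, assumed here rather than taken from the dissertation, is to keep one trained HMM per symbol and choose the symbol whose model assigns the segment the highest likelihood. The sketch computes that likelihood with the forward algorithm in log space over discrete observation codes. The class and function names, the toy parameters, and the observation codes (0 = forward, 1 = place, 2 = backward) are hypothetical, and training (for example Baum-Welch) is not shown.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch: one discrete HMM per Laban symbol, classification by
# maximum forward-algorithm likelihood. Parameters are assumed pre-trained.


@dataclass
class DiscreteHMM:
    start: list[float]        # pi[i]   = P(first state is i)
    trans: list[list[float]]  # A[i][j] = P(state j at t+1 | state i at t)
    emit: list[list[float]]   # B[i][k] = P(observation code k | state i)


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def _logsumexp(values: list[float]) -> float:
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(hmm: DiscreteHMM, obs: list[int]) -> float:
    """Forward algorithm in log space: returns log P(obs | hmm)."""
    n = len(hmm.start)
    alpha = [_log(hmm.start[i]) + _log(hmm.emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            _logsumexp([alpha[i] + _log(hmm.trans[i][j]) for i in range(n)])
            + _log(hmm.emit[j][o])
            for j in range(n)
        ]
    return _logsumexp(alpha)


def classify(models: dict[str, DiscreteHMM], obs: list[int]) -> str:
    """Return the symbol whose HMM best explains the observation sequence."""
    return max(models, key=lambda name: log_likelihood(models[name], obs))
```

Working in log space keeps long motion segments from underflowing to zero probability, which would make all models tie.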
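For the fully automatic pipeline, the abstract cuts continuous data according to the center-of-gravity transfer theory but does not give the cutting rule. A simple heuristic, assumed here purely for illustration, tracks the horizontal projection of a mass-weighted body center and starts a new segment where its speed reaches a local minimum below a rest threshold, on the reasoning that support symbols in Labanotation roughly correspond to transfers of weight. The per-joint `masses`, `rest_speed`, `min_len`, and both function names are hypothetical.

```python
import math

# Hypothetical sketch: segment motion at pauses of the horizontal centre of
# mass. A stand-in heuristic, not the dissertation's segmentation rule.
Joint = tuple[float, float, float]


def com_track(frames: list[list[Joint]],
              masses: list[float]) -> list[tuple[float, float]]:
    """Horizontal (x, z) projection of a mass-weighted body centre per frame.
    `masses` are assumed per-joint mass fractions; y is assumed to be up."""
    total = sum(masses)
    track = []
    for joints in frames:
        x = sum(m * p[0] for m, p in zip(masses, joints)) / total
        z = sum(m * p[2] for m, p in zip(masses, joints)) / total
        track.append((x, z))
    return track


def segment_by_weight_transfer(track: list[tuple[float, float]], fps: float,
                               rest_speed: float,
                               min_len: int) -> list[tuple[int, int]]:
    """Split frames into [start, end) segments at pauses in COM motion."""
    # speed[i] is the horizontal COM speed between frames i and i + 1
    speed = [math.dist(track[i + 1], track[i]) * fps
             for i in range(len(track) - 1)]
    cuts = [0]
    for i in range(1, len(speed) - 1):
        local_min = speed[i] <= speed[i - 1] and speed[i] <= speed[i + 1]
        if speed[i] < rest_speed and local_min and i + 1 - cuts[-1] >= min_len:
            cuts.append(i + 1)  # next segment starts inside the pause
    cuts.append(len(track))
    return list(zip(cuts, cuts[1:]))
```

The `min_len` guard stops a long pause, whose speeds are all near zero, from producing a run of one-frame segments.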
【Abstract】 Traditional folk dances are an important part of local intangible cultural heritage, and recording them in Labanotation scores is an effective means of protecting that heritage. Labanotation is one of the most widely used notation systems for recording human dance movements and is applied in the preservation, reproduction, teaching, and communication of dance. However, most Labanotation scores are written by hand, which takes a huge amount of time and effort for observation, drawing, and proofreading, even for professionals. Using computer technology to generate Labanotation scores automatically is therefore of great practical and research value. Current mainstream methods first obtain human motion data with motion capture technology and then generate Labanotation scores by analyzing and recognizing the captured data. The task takes a time-varying sequence of human motion data as input and outputs a sequence of Laban symbols in temporal order. This dissertation therefore applies time series analysis methods to the automatic generation of Labanotation scores from human motion data, proposes a number of specialized motion feature extraction algorithms and temporal analysis models, and significantly improves the accuracy of Labanotation generation. The main research work can be summarized as follows.

(1) We propose a method that generates Labanotation scores automatically based on data segmentation. For manually segmented data, to overcome the inflexibility of template matching methods, we propose a method based on time series analysis with hidden Markov models (HMMs). First, to deal with challenges including varied dance forms, dancers' different body shapes, and noise in motion capture data, we propose a new feature that is invariant to body measurements and body orientation. Then, we apply HMMs to analyze the temporal dynamics of limb movements and map each lower-limb movement to the corresponding Laban symbol. Furthermore, to save manpower, we propose a Labanotation generation framework with fully automatic data segmentation. First, according to the center-of-gravity transfer theory of Labanotation, the continuous motion capture data are divided into segments, each containing only one movement. Then, a neural network with one-dimensional convolutional layers and gated recurrent unit (GRU) layers recognizes the segments and outputs the corresponding Laban symbols.

(2) We propose a method that generates Labanotation scores automatically based on a convolutional recurrent attention sequence model. To eliminate the influence of segmentation errors on automatic Labanotation generation, we propose an attention sequence learning model based on convolutional recurrent networks with fused features, which generates reliable Labanotation scores directly from continuous motion sequences. First, we fuse bone features and Lie group features, so that the fused features capture not only the bone information between adjacent joints but also the relative geometric relationships between connected bones. Then, in the sequence learning model, convolutional recurrent networks learn spatio-temporal representations of the motion capture data, and an attention mechanism learns the alignment between the input motion feature sequence and the output symbol sequence. Finally, the correct Laban symbol sequences and Labanotation scores are generated.

(3) We propose a method that generates Labanotation scores automatically based on graph convolutional networks and an attention sequence learning model. To fully exploit the motion information in skeleton data, we propose a new attention sequence learning model based on graph convolution that analyzes human motion time series for reliable and efficient Labanotation generation. In the encoder, a new gesture-sensitive graph convolutional network with learnable adaptive joint weights and non-physical connections learns both spatial and temporal patterns from motion data sequences. In the decoder, we exploit motion rhythm information and propose a novel rhythm-aware attention mechanism that learns the alignment between motion sequences and Laban symbol sequences, so that, when predicting a target Laban symbol, the model attends to the relevant part of the input motion sequence instead of searching the whole sequence. Labanotation scores can therefore be generated efficiently and accurately.

(4) We propose LabanFormer, a model based on a graph attention network and the Transformer that generates Labanotation scores automatically. Its goal is to capture flexible limb movements and handle the temporal structure of complex dance steps. First, a multi-scale graph attention network (MS-GAT) captures flexible limb movements by learning feature correlations between every pair of joints and aggregating the features of neighboring joints at multiple scales. Second, a new Transformer model with a gated recurrent positional encoding (GRPE) module learns the global temporal dependencies in the feature sequences output by MS-GAT. The GRPE module encodes position information with learnable parameters while handling time series of various lengths. As a result, periodic, symmetric, or repeated steps in dances can be captured effectively. After training, the model accurately decodes motion capture sequences into the corresponding Laban symbols.

We carried out extensive experiments on two real-world datasets. The results show that the proposed algorithms based on human motion time series analysis achieve favorable generation performance, that accuracy improves with each successive method, and that the proposed algorithms perform comparably to or better than state-of-the-art methods. The work thus contributes to the protection of folk dance.
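Work (2) fuses bone features with Lie group features describing the relative geometry of connected bones. Lie group skeleton representations typically encode the 3D rotation between body parts as an element of SO(3). A compact sketch, assuming only bone directions are used, computes the smallest rotation that turns a parent bone onto its child and returns its axis-angle vector (the logarithm map into so(3)), then concatenates these vectors with the raw bone vectors. The function names and the bone/pair layout are hypothetical, and this is a simplification rather than the dissertation's exact feature.

```python
import math

# Hypothetical sketch: bone vectors fused with parent-to-child relative
# rotations in axis-angle form. A simplified stand-in for the Lie group feature.
Vec = tuple[float, float, float]


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def relative_rotation(parent: Vec, child: Vec) -> Vec:
    """Axis-angle vector (log map into so(3)) of the smallest rotation
    that turns the parent bone direction onto the child bone direction."""
    a, b = _unit(parent), _unit(child)
    angle = math.acos(max(-1.0, min(1.0, _dot(a, b))))
    axis = _cross(a, b)
    s = math.sqrt(_dot(axis, axis))
    if s < 1e-9:                    # bones (anti)parallel: axis undefined
        if angle < math.pi / 2:
            return (0.0, 0.0, 0.0)  # identity rotation
        helper: Vec = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = _cross(a, helper)    # any axis perpendicular to the bone
        s = math.sqrt(_dot(axis, axis))
    return (angle * axis[0] / s, angle * axis[1] / s, angle * axis[2] / s)


def fused_features(joints: list[Vec], bones: list[tuple[int, int]],
                   pairs: list[tuple[int, int]]) -> list[float]:
    """Concatenate bone vectors with parent-child relative rotations.
    bones: (start_joint, end_joint); pairs: (parent_bone, child_bone)."""
    vecs = [(joints[j][0] - joints[i][0], joints[j][1] - joints[i][1],
             joints[j][2] - joints[i][2]) for i, j in bones]
    feats = [comp for v in vecs for comp in v]                # bone features
    for p, c in pairs:
        feats.extend(relative_rotation(vecs[p], vecs[c]))     # rotation features
    return feats
```

The bone vectors keep absolute limb geometry, while the rotation vectors depend only on how connected bones are bent relative to each other, which is the complementary information the abstract attributes to the fusion.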
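The gesture-sensitive graph convolution in work (3) learns adaptive joint weights and non-physical connections. A minimal sketch of that idea, written in the spirit of adaptive graph convolutions for skeleton data and assuming a single spatial layer with no temporal convolution, scales the normalized skeleton adjacency elementwise by a mask `joint_mask` and adds a matrix `extra_links` that can connect joints with no bone between them. The names and the combination rule ((A_hat * M) + B) X W are assumptions; in a trained network M, B, and W would be learned rather than set by hand.

```python
import math

# Hypothetical sketch of one adaptive spatial graph convolution layer:
# out = ((A_hat * M) + B) X W, with M and B normally learned.
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(num_joints: int,
                         edges: list[tuple[int, int]]) -> Matrix:
    """Symmetrically normalized skeleton adjacency D^-1/2 (A + I) D^-1/2."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def adaptive_graph_conv(x: Matrix, a_hat: Matrix, joint_mask: Matrix,
                        extra_links: Matrix, weight: Matrix) -> Matrix:
    """x: J x C_in joint features; joint_mask (M) and extra_links (B): J x J;
    weight (W): C_in x C_out. Returns J x C_out features."""
    n = len(a_hat)
    adj = [[a_hat[i][j] * joint_mask[i][j] + extra_links[i][j]
            for j in range(n)] for i in range(n)]
    return matmul(matmul(adj, x), weight)
```

With `extra_links` all zero, information only flows along bones; a nonzero entry such as `extra_links[2][0]` lets joint 2 aggregate joint 0's features directly, which is what a non-physical connection means here.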
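The rhythm-aware attention in work (3) confines the decoder's attention to the relevant part of the input rather than the whole sequence. One way to sketch this, assuming beat positions (as frame indices) are already available, is to split the sequence into beat windows and mask scaled dot-product attention scores outside the current window, so those frames receive exactly zero weight. `beat_windows` and `windowed_attention` are hypothetical helpers, and the masking rule is an assumption, not the dissertation's formulation.

```python
import math

# Hypothetical sketch: attention restricted to one beat window.
# Frames outside the window get a score of -inf and hence zero weight.


def beat_windows(beat_frames: list[int],
                 num_frames: int) -> list[tuple[int, int]]:
    """Turn beat positions (frame indices) into consecutive [start, end) windows."""
    inner = sorted({b for b in beat_frames if 0 < b < num_frames})
    bounds = [0] + inner + [num_frames]
    return list(zip(bounds, bounds[1:]))


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0 for masked frames
    total = sum(exps)
    return [e / total for e in exps]


def windowed_attention(query: list[float], keys: list[list[float]],
                       values: list[list[float]], window: tuple[int, int]):
    """Scaled dot-product attention over frames in [start, end) only."""
    start, end = window
    if not 0 <= start < end <= len(keys):
        raise ValueError("window must be a non-empty range inside the sequence")
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale
              if start <= t < end else float("-inf")
              for t, key in enumerate(keys)]
    weights = softmax(scores)
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(len(values[0]))]
    return weights, context
```

A decoder step predicting the n-th symbol could then call `windowed_attention(state, keys, values, beat_windows(beats, T)[n])`, so each step scores only one beat's frames instead of all T frames.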
【Key words】 Automatic Labanotation generation; Motion capture; Time series analysis; Graph convolutional network; Graph attention network