
Research on Deep Learning-based Voice Control Method for an Industrial Manipulator

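【Author】 Li Yingying (李莹莹)

【Supervisors】 Xiao Nanfeng (肖南峰); Lin Qiaobin (林巧彬)

【Author Information】 South China University of Technology (华南理工大学), Master of Engineering (professional degree), 2018, Master's thesis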

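【Chinese Abstract】 China's intelligent manufacturing industry is developing rapidly, and the traditional manual operations of manufacturing enterprises can no longer meet current production demands. Industrial robots are therefore replacing production workers and their manual operations at an accelerating pace. To make industrial robots intelligent, they must first stop depending on complex equipment operation and be able to communicate naturally with workers. Natural communication in turn requires robots that can hear and understand the workers' speech, so voice control of industrial robots is extremely important. Existing speech recognition systems have shortcomings such as weak robustness. To address them, this thesis proposes a speech recognition method that uses deep learning object detection algorithms to recognize spectrograms, and applies the recognition results to the voice control of industrial robots. The specific work is as follows:
1. Based on the research goals and content, this thesis defines voice command sets of isolated words and multi-character words and records audio samples. The samples are recorded in a quiet environment, and testing is carried out in both quiet and noisy environments. The sound of operating production machinery lies at frequencies far above the normal voice frequencies of workers. However, common speech recognition methods consider only time-domain information and ignore the frequency domain. This thesis therefore applies the short-time Fourier transform (STFT) to the audio samples to obtain clean spectrograms that capture both time and frequency information. It then uses normalization to enhance how the spectrograms are displayed.
2. Existing spectrogram recognition relies on deep convolutional neural networks and fully convolutional networks. This thesis instead proposes recognizing spectrograms with deep learning object detection. The computation then focuses only on the effective time-frequency regions and ignores silent gaps in the time domain and high-frequency noise in the frequency domain. As a result, recognition works better in environments where the noise frequencies are higher than the workers' voice frequencies. Several detection models were trained and compared on the spectrogram dataset. The word error rate was generally below 10% in the quiet environment and generally below 15% in the noisy environment, which indicates that the method is effective for spectrogram recognition and fairly robust. In addition, the object detection algorithm Faster RCNN is computationally expensive, so this thesis modifies its network structure to reduce the computational cost.
3. A speech recognition system that recognizes spectrograms with an object detection algorithm must choose prior boxes for the scales of the detected objects before training begins, which helps improve prediction accuracy. The scale ratios of effective spectrogram regions differ greatly from those of everyday objects. This thesis therefore clusters the candidate boxes with the machine learning clustering algorithm k-means to obtain the best-performing prior boxes for each detection model.
4. This thesis designs a voice control system for an industrial robot and runs simulation experiments in the ROS simulation software. Text produced by the object-detection-based speech recognizer is sent to the ROS system, which makes the industrial robot perform the corresponding production actions. The experimental results show that the proposed method is practical.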
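Item 1 above converts each audio sample into a normalized spectrogram with the short-time Fourier transform. The abstract gives neither the STFT parameters nor the normalization scheme, so what follows is only a minimal NumPy sketch under assumed settings:

* a Hann window;
* 512-sample frames with a 128-sample hop;
* log magnitude in dB followed by min-max scaling to [0, 1], standing in for "normalization".

The function names `stft_spectrogram` and `normalize_spectrogram` are hypothetical.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=512, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Assumed parameters (not given in the thesis abstract): frame_len=512, hop=128.
    Returns shape (frame_len // 2 + 1, n_frames): rows are frequency bins,
    columns are time frames.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def normalize_spectrogram(spec, eps=1e-10):
    """Assumed normalization: convert to dB, then min-max scale to [0, 1]."""
    log_spec = 20.0 * np.log10(spec + eps)  # magnitude in decibels
    lo, hi = log_spec.min(), log_spec.max()
    if hi - lo < eps:
        # A constant image has no contrast to stretch.
        return np.zeros_like(log_spec)
    return (log_spec - lo) / (hi - lo)
```

Take 1 s of a 440 Hz tone sampled at 16 kHz. The sketch yields 257 frequency bins by 122 frames, and the strongest bin is index 14, since the bin spacing is 16000 / 512 = 31.25 Hz. In this representation, low voice frequencies and high-frequency machine noise occupy different rows of the image.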

【Abstract】 Intelligent manufacturing techniques are developing rapidly, and traditional manual operations can no longer meet current production demands, so industrial robots are gradually replacing workers. For industrial robots to become truly intelligent, they must be able to communicate with workers, which requires that they understand human language. Controlling an industrial robot's behavior by voice is therefore the first step. To address the shortcomings of existing speech recognition systems, this thesis proposes an object detection method that recognizes spectrograms for speech recognition. It applies the recognition results to the intelligent control of industrial robots. The work is as follows:
1. According to the purpose and content of this thesis, the isolated-word instruction set is determined and audio samples are recorded in a quiet environment. Human voices differ greatly from noise, so speech recognition can be achieved by recognizing the spectrogram.
2. Traditional speech recognition systems have poor robustness and mainly analyze the time dimension. This thesis therefore proposes an object detection method to recognize the spectrogram. The method focuses only on the local region of interest, which filters out noise that strongly degrades recognition performance. Recognition accuracy is generally above 90% in the quiet environment and generally above 85% in the noisy environment.
3. A speech recognition system that recognizes spectrograms with an object detection algorithm needs prior boxes for the detected objects to be selected before training, which helps improve prediction accuracy. The scale of the effective regions in a spectrogram differs greatly from the scale of everyday objects. The machine learning clustering algorithm k-means is therefore used to cluster the candidate boxes.
4. An industrial robot voice control system is designed and simulated in the ROS simulation software. The words produced by the detection-based speech recognizer are transmitted to the ROS system, which makes the industrial robot perform the corresponding actions. This demonstrates the practicality of the methods in this thesis.
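Item 3 clusters candidate boxes with k-means to obtain prior (anchor) boxes, but the abstract does not state the distance measure or the initialization. The sketch below makes three assumptions:

* boxes are compared by width and height only;
* the distance is d = 1 - IoU, the choice popularized by YOLOv2;
* initial centroids are boxes sampled at random.

`iou_wh` and `kmeans_anchors` are hypothetical names, and this is not presented as the thesis's exact procedure.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors and return them sorted by area.

    Assumption: distance = 1 - IoU (YOLOv2-style); the thesis does not specify it.
    """
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # assumed: random boxes as initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Assignment step: the highest IoU is the smallest 1 - IoU distance.
            best = max(range(k), key=lambda j: iou_wh(b, anchors[j]))
            clusters[best].append(b)
        new_anchors = []
        for j, members in enumerate(clusters):
            if not members:
                new_anchors.append(anchors[j])  # leave an empty cluster's anchor as is
                continue
            # Update step: mean width and mean height of the cluster's boxes.
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_anchors.append((w, h))
        if new_anchors == anchors:
            break  # converged: no centroid moved
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])
```

Consider three wide, flat boxes near 200×20 and three tall, narrow boxes near 20×150. For these, `kmeans_anchors(boxes, k=2)` returns `[(20.0, 150.0), (200.0, 20.0)]`. A 1 - IoU distance is used instead of Euclidean distance so that large boxes do not dominate the clustering merely because their coordinates are larger.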

  • 【CLC Number】 TN912.3; TP241
  • 【Times Cited】 1
  • 【Downloads】 270