说话人确认中的录音回放攻击检测研究

Research on Replay Spoofing Attack Detection in Automatic Speaker Verification

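【Author】 Chen Min (陈敏)

【Supervisor】 Yu Yibiao (俞一彪)

【Author Information】 Soochow University (苏州大学), Information and Communication Engineering, 2021, Master's thesis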

【Abstract】 As a biometric technology, Automatic Speaker Verification (ASV) is widely used in financial systems, intelligent access control, security systems, and other fields that require identity verification, owing to its convenience and contactless operation. However, ASV systems are highly vulnerable to spoofing attacks. Among these, the replay spoofing attack, which is realistic, simple to carry out, and cheap, is the biggest challenge ASV systems face in practical deployment. Studying replay attack detection is therefore essential to the security of ASV systems. This thesis analyzes the composition and frequency response characteristics of the input channels for genuine speech and replayed speech, and explores three feature extraction methods for replay attack detection. The first method is based on the linear prediction residual and comprises the Residual IMel Frequency Cepstral Coefficient (RIMFC) and Residual Phase Coefficient (RPC) features, which amplify the differences between genuine and replayed speech as seen in the linear prediction residual. The second is the Diagonal BiSpectrum Coefficient (DBSC) feature, which draws on the ability of bispectrum analysis to express high-dimensional frequency-domain information and combines diagonal-slice bispectrum computation with IMel (inverse-Mel) filtering to emphasize the high-frequency differences between genuine and replayed speech. The third is the Slope Coefficient (SC) feature, which reflects the dynamic variation of filter-bank energies and, by emphasizing all formants equally, captures the differences between genuine and replayed speech in the high-frequency band. The detection experiments are speaker-independent and text-independent and use the ASVspoof2017 2.0 corpus. The results show that, of the two features based on the linear prediction residual, RPC performs better. The feature obtained by fusing RIMFC with RPC, the DBSC feature, and the SC feature achieve Equal Error Rates (EER) of 23.90%, 24.45%, and 22.29%, respectively, which are relative reductions of 16.14%, 14.21%, and 21.79% compared with the Constant Q Cepstral Coefficient (CQCC) feature used in the baseline system. These results indicate that the proposed features effectively capture the differences between genuine and replayed speech and substantially improve replay attack detection performance.
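
The abstract reports the EERs and the relative reductions but not the baseline CQCC EER itself. As a quick consistency check, the sketch below assumes the usual definition of relative reduction, (baseline − EER) / baseline, and recovers the baseline implied by each reported pair. The function and variable names are illustrative and do not come from the thesis, and the implied baseline is an inference from the reported figures, not a number stated in the abstract.

```python
# EER (%) and relative EER reduction (%) for each feature, copied from the abstract.
reported = {
    "RIMFC+RPC": (23.90, 16.14),
    "DBSC": (24.45, 14.21),
    "SC": (22.29, 21.79),
}


def implied_baseline(eer, rel_reduction_pct):
    """Baseline EER implied by: rel_reduction = (baseline - eer) / baseline.

    Assumption: the thesis uses this standard definition of relative reduction.
    """
    return eer / (1.0 - rel_reduction_pct / 100.0)


def relative_reduction(baseline, eer):
    """Relative EER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline - eer) / baseline


baselines = {name: implied_baseline(e, r) for name, (e, r) in reported.items()}

for name, b in baselines.items():
    print(f"{name}: implied CQCC baseline EER = {b:.2f}%")
```

Under that assumption, all three pairs point to the same baseline of roughly 28.5%, so the reported EERs and relative reductions are mutually consistent.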

  • 【Online Publication Contributor】 Soochow University (苏州大学)
  • 【Online Publication Year/Issue】 2023, Issue 04