
THUYG-20: A Free Uyghur Speech Database


【Abstract】 Speech data is fundamental to speech recognition research. At present, almost no open speech databases are freely available to researchers in China, and resources are especially scarce for minority languages such as Uyghur. This paper releases a free and open Uyghur continuous speech database comprising about 20 hours of training speech and 1 hour of test speech. It also releases the resources required to build a Uyghur speech recognition system, including the phone set, lexicon, and text data, together with the recipe (scripts) used to construct a baseline system. Results of the baseline system are reported on two test sets, one of clean speech and one of noisy speech.

【Key words】 Uyghur language; corpus; speech recognition; DNN

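【Proceedings】 Proceedings of the 13th National Conference on Man-Machine Speech Communication (NCMMSC2015)

【Conference】 The 13th National Conference on Man-Machine Speech Communication (NCMMSC2015)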

【Conference date】 2015-10-25
【Conference location】 Tianjin, China
【Classification codes】 TN912.34; TP311.13

【Organizer】 Speech Information Technical Committee, Chinese Information Processing Society of China
