Automatic Segmentation for TTS Units

【Author】 WANG Li-juan, CAO Zhi-gang (State Key Laboratory of Microwave and Digital Communications, Department of Electronic Engineering, Tsinghua University, Beijing 100084)

【Institution】 State Key Laboratory of Microwave and Digital Communications, Department of Electronic Engineering, Tsinghua University, Beijing 100084

【Chinese abstract】 Accurate segmentation of speech unit boundaries is crucial for speech synthesis systems based on waveform concatenation. This paper adopts a two-step segmentation method. In the first step, initial boundaries are obtained by HMM-based forced alignment. In the second step, a boundary model conditioned on the preceding and following phonemes is proposed to correct the initial boundaries. To address the shortage of training data, a classification and regression tree is used to cluster boundary models whose neighboring phonemes are pronounced similarly. The number of boundary models can thus be adjusted dynamically according to the amount of training data, which keeps model training reliable. In experiments on a Mandarin speech corpus, automatic segmentation accuracy rose from 78.7% to 91.5%.

【Abstract】 Correct unit segmentation, though laborious, is crucial to the performance of a concatenation-based TTS system. This paper suggests a two-step procedure for automatic unit segmentation, which coarsely segments the speech data in the first step and refines the segment boundaries in the second step. A new Context-Dependent Boundary Model (CDBM) is proposed to describe the evolution across the segment boundary. To reduce manual segmentation, a Classification and Regression Tree (CART) is used to make more efficient use of the available data: acoustically similar boundaries are clustered together, and the corresponding tied CDBMs are trained and used for boundary refinement in the second step. After a series of experiments, the optimal CDBM parameters and training conditions are found. Segmentation accuracy rises from 78.7% to 91.5% in Mandarin syllable segmentation with about 1,000 manually segmented sentences as CDBM training data.
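The second step described in the abstract (refining an initial boundary with a boundary model) can be illustrated with a minimal sketch. The abstract does not specify the form of the CDBM, so this example assumes a single diagonal-covariance Gaussian over the feature frames straddling a candidate boundary. The function names, the search window, and the toy data are all hypothetical and are not taken from the paper. CART-based selection of the tied model for a given phonetic context is omitted.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def boundary_vector(frames, t, half_width):
    """Concatenate frames [t - half_width, t + half_width) into one vector."""
    vec = []
    for frame in frames[t - half_width : t + half_width]:
        vec.extend(frame)
    return vec


def refine_boundary(frames, initial, model, search=3, half_width=1):
    """Return the frame index near `initial` that the boundary model scores highest.

    `initial` stands in for the boundary from step one (e.g. forced alignment).
    `model` is an assumed (mean, variance) pair, not the paper's actual CDBM.
    """
    mean, var = model
    lo = max(half_width, initial - search)
    hi = min(len(frames) - half_width, initial + search)
    best_t, best_score = initial, -math.inf
    for t in range(lo, hi + 1):
        score = log_gaussian(boundary_vector(frames, t, half_width), mean, var)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Toy 1-D features: the signal steps from 0 to 1 at frame 4.
frames = [[0.0]] * 4 + [[1.0]] * 4
# Hypothetical boundary model: expects a 0 -> 1 transition across the boundary.
model = ([0.0, 1.0], [0.1, 0.1])
print(refine_boundary(frames, initial=6, model=model))  # prints 4
```

In this toy case the coarse boundary at frame 6 is moved to frame 4, where the two frames around the boundary best match the assumed model. A real system would use multi-dimensional acoustic features and one tied model per cluster of boundary contexts.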

【Key words】 Context-dependent boundary model; CART (classification and regression tree); Automatic segmentation; TTS

【Source】 Microelectronics & Computer (微电子学与计算机), 2005, No. 12

【Classification number】 TN912.3
【Times cited】 6
【Downloads】 184
