基于条件随机场的自动分词技术的研究

Study of Automatic Segmentation Technique Based on Conditional Random Fields

【Author】 Chen Qing

【Author information】 Northeastern University, Computer System Architecture, 2005, Master's degree

【Abstract】 With the development of technology and the emergence of massive amounts of information, information processing has become an indispensable part of technological development in today's world. To extract useful knowledge from this mass of information, machines must be able to "understand" information expressed in human language. The word is the smallest meaningful language unit that can be used independently, so identifying words is the first step toward understanding natural language; only once this step has been taken can information be processed in greater depth and, eventually, machines be made to understand human language. The research on machine translation and natural language processing in our lab depends heavily on sequence labeling and segmentation techniques such as word segmentation, both to limit the propagation of errors and to support deeper research.

Conditional Random Fields (CRFs), a recently introduced conditional probabilistic model for labeling and segmenting sequential data, are an undirected graphical model that computes the conditional probability of the output nodes given the input nodes. CRFs relax the strong independence assumptions required by generative models such as the Hidden Markov Model (HMM), and overcome the label-bias problem exhibited by the Maximum Entropy Markov Model and other non-generative models. The model can easily incorporate arbitrary features of the input sequence as well as features inherent to the language itself: not only the transition and emission features of traditional HMM modeling, but also other information such as word-formation rules, domain features, and lexicon information.

This thesis systematically introduces the definition of CRFs, the structure of the CRF model, feature functions, parameter estimation, and training methods. Applying CRFs to automatic Chinese word segmentation, we obtained better performance than the models previously used for sequence labeling and segmentation, verifying experimentally the advantages of CRFs for these tasks. We also ran a large number of experiments in which features were added incrementally, and CRFs performed very well in these experiments.
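
The model described in the abstract can be illustrated with a minimal sketch of a linear-chain CRF that scores a tag sequence using transition and emission feature weights and normalizes over all tag sequences for the same input. Everything specific here is an assumption made for illustration and is not taken from the thesis: the two-tag scheme (`B` begins a word, `I` continues one), the hand-picked weights, and the example sentence. Brute-force enumeration stands in for the dynamic-programming inference a real implementation would use, and no training is shown.

```python
import itertools
import math

# Hypothetical tag set: "B" starts a new word, "I" continues the current word.
TAGS = ("B", "I")


def sequence_score(chars, tags, emit, trans):
    """Unnormalized log-score: sum of emission and transition feature weights."""
    score = 0.0
    for t, (ch, tag) in enumerate(zip(chars, tags)):
        score += emit.get((ch, tag), 0.0)               # emission-style feature
        if t > 0:
            score += trans.get((tags[t - 1], tag), 0.0)  # transition feature
    return score


def conditional_probability(chars, tags, emit, trans):
    """p(tags | chars): exp(score) divided by the sum over all tag sequences."""
    z = sum(
        math.exp(sequence_score(chars, cand, emit, trans))
        for cand in itertools.product(TAGS, repeat=len(chars))
    )
    return math.exp(sequence_score(chars, tags, emit, trans)) / z


def best_tagging(chars, emit, trans):
    """Highest-scoring tag sequence, found by exhaustive search."""
    return max(
        itertools.product(TAGS, repeat=len(chars)),
        key=lambda cand: sequence_score(chars, cand, emit, trans),
    )


def tags_to_words(chars, tags):
    """Turn a character-level tagging into a list of words."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


# Illustrative, hand-picked weights (assumptions, not learned parameters).
CHARS = "我爱北京"
EMIT = {("我", "B"): 1.0, ("爱", "B"): 1.0, ("北", "B"): 1.0, ("京", "I"): 2.0}
TRANS = {("B", "I"): 0.5}

BEST = best_tagging(CHARS, EMIT, TRANS)
WORDS = tags_to_words(CHARS, BEST)
```

Because the normalizer sums over whole tag sequences rather than over the next tag alone, the probability is defined globally per sentence, which is the property the abstract credits for avoiding the label-bias problem. Extra information such as lexicon matches would enter as additional weighted terms in `sequence_score`.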

【Key words】 Conditional Random Fields; Automatic Segmentation; Natural Language Understanding; Directed Graph; Undirected Graph; Hidden Markov Model; Maximum Entropy Markov Model; Parameter Estimation

【Online publication contributor】 Northeastern University

【Classification number】 TP391.1
【Citation count】 55
【Download count】 1319
