
Research on Sequence Data Mining Based on Margin Theory

Time Series Data Mining Based on Large Margin Theory

【Author】 Yu Xiao (于霄)

【Supervisor】 Yu Daren (于达仁)

【Author Information】 Harbin Institute of Technology, Control Science and Engineering, 2012, Doctoral dissertation

【Abstract (Chinese)】 Time series data are ubiquitous across many fields, and the analysis and mining of sequence data have become a sustained focus of scientific research. Knowledge discovery from sequence data is hampered by high dimensionality and the non-independence of information along the time dimension, which make the information difficult to exploit effectively; many traditional machine learning algorithms therefore struggle to achieve satisfactory results. Targeting these peculiarities of time series data, this dissertation applies the large margin theory from machine learning to several problems in time series data mining, as follows:

A margin-based similarity measure for time series is designed. Similarity measurement, a core problem in machine learning, directly determines how well an algorithm performs in time series data mining. Phase shifts of various forms are common across time series problems, so this dissertation designs a constraint-learning method for dynamic time warping similarity based on margin theory. Compared with existing similarity measures such as the Euclidean or dynamic time warping distances, it improves the matching strategy for sequence warping. To counter the distance-concentration problem, margin-based norm learning is used to strengthen the effectiveness of the metric in high-dimensional spaces.

A method for extracting characteristic fragments from time series and a fragment-based classification algorithm are designed. One difficulty of time series data mining is that the discriminative information is often hidden in local fragments rather than in the whole sequence, a phenomenon common in sequences such as image-edge contours and motion trajectories. This dissertation designs a feature-fragment extraction method that compares the useful information in the individual fragments and selects the most discriminative fragments to represent the whole sequence. Compared with traditional methods, this fragment-based feature extraction / data re-expression approach is especially suited to sequence data derived from image edges or motion-trajectory curves, and it improves classification accuracy, efficiency, and interpretability. It is also compared with the well-known shapelet algorithm, and experiments verify its classification performance.

A margin-based coarse-graining representation algorithm for sequences is proposed. The trade-off between useful and useless information in the conversion of sequence data from numerical values to symbols is studied. Although the transformation loses some useful information, it also reduces useless data. A margin-based supervised coarse-graining method for sequence data is proposed, which improves classification accuracy and efficiency, as verified by experiments.

A time series classification model based on large-margin weighting of critical cases is designed. Giving lower weights to outliers and redundant samples improves the generalization ability of the classification model, and reducing redundant training samples also improves its computational efficiency. When constructing the critical sample set, large margin theory is used to evaluate the utility of each sample: the weights of samples that yield the largest hypothesis margin are increased, while the weights of outliers and redundant samples are decreased, improving the model's generalization ability. Experiments confirm the effectiveness of this approach.
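The dynamic time warping distance that the first contribution builds on can be sketched in a few lines. This is a minimal, generic DTW implementation with an optional Sakoe-Chiba-style warping window; the margin-based constraint learning and norm learning described in the abstract are not shown, and the function name and `window` parameter are illustrative, not taken from the dissertation.

```python
def dtw_distance(a, b, window=None):
    """Dynamic time warping distance between two numeric sequences.

    `window` optionally restricts how far the warping path may stray
    from the diagonal (a common Sakoe-Chiba-style constraint).
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    INF = float("inf")
    # dp[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            dp[i][j] = cost + min(dp[i - 1][j],      # stretch a
                                  dp[i][j - 1],      # stretch b
                                  dp[i - 1][j - 1])  # one-to-one match
    return dp[n][m] ** 0.5
```

Unlike the Euclidean distance, DTW aligns phase-shifted sequences: `dtw_distance([0, 0, 1, 2], [0, 1, 2])` is 0 because the repeated leading sample can be warped onto a single point.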

【Abstract】 Time series data are widely used in many fields, and the analysis and mining of sequence data have become hot spots that receive continuous attention in the scientific community. High dimensionality and features such as the non-independence of information along the time dimension make it difficult to use information effectively in knowledge discovery from sequence data, so many traditional machine learning algorithms cannot readily obtain satisfactory results. Aiming at the particularity of time series data, the large margin theory from machine learning is adopted in this dissertation to study time series data mining. The main problems addressed are as follows:

A sequential similarity measure is designed based on large margin theory. As a core problem in machine learning, the similarity measure directly determines the effectiveness of an algorithm in time series data mining. To handle the various phase-shift phenomena commonly found in sequential samples, a dynamic time warping similarity measure is designed based on large margin theory. Compared with the Euclidean or dynamic time warping distances, the matching strategy for sequence distortion is improved. As for the distance-instability phenomenon of high-dimensional distance measures, the effectiveness of the distance measurement is optimized through norm learning.

A supervised feature extraction / data re-expression algorithm is designed based on characteristic sequence fragments. One of the difficulties in time series data mining is that the effective discriminative information is often hidden in local sequence fragments rather than in the entire sequence, a phenomenon common in sequence problems such as trajectories obtained from image edges. By contrasting the useful information in the various fragments, several fragments with the largest discriminative capacity are selected to represent the entire sequence.
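The fragment-selection idea can be illustrated with a toy shapelet-style search: score every candidate subsequence by how well its nearest-window distance separates the two classes, and keep the best one. This is a deliberately crude sketch, not the dissertation's algorithm; the separation criterion (gap between class-mean distances) and all names here are illustrative assumptions.

```python
def subseq_dist(series, frag):
    """Minimum Euclidean distance from `frag` to any same-length window of `series`."""
    L = len(frag)
    return min(
        sum((series[i + k] - frag[k]) ** 2 for k in range(L)) ** 0.5
        for i in range(len(series) - L + 1)
    )

def best_fragment(dataset, labels, frag_len):
    """Exhaustively score every length-`frag_len` subsequence in `dataset`.

    A fragment's score is the gap between the mean nearest-window
    distances of the two classes (labels 0 and 1): a crude stand-in for
    the discriminative-power criterion described in the text.
    """
    best, best_score = None, -1.0
    for s in dataset:
        for i in range(len(s) - frag_len + 1):
            frag = s[i:i + frag_len]
            d = [subseq_dist(x, frag) for x in dataset]
            d0 = [di for di, y in zip(d, labels) if y == 0]
            d1 = [di for di, y in zip(d, labels) if y == 1]
            score = abs(sum(d0) / len(d0) - sum(d1) / len(d1))
            if score > best_score:
                best, best_score = frag, score
    return best, best_score
```

On a toy set where class 0 contains a spike and class 1 is flat, e.g. `best_fragment([[0, 0, 5, 0], [0, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], [0, 0, 1, 1], 1)`, the search returns the spike `[5]` as the most discriminative fragment.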
Compared with traditional methods, this fragment-based feature extraction / data re-expression method is especially suited to sequence data obtained from image-edge or motion-trajectory curves, and it improves classification accuracy, efficiency, and interpretability. Besides that, the method is compared with the well-known shapelet algorithm, and its classification performance is verified experimentally.

A sequence coarse-graining algorithm is proposed based on large margin theory. The changing relationship between useful and useless information during the transformation of sequence data from values to symbols is studied. Although some useful information is lost during the transformation, useless information is also reduced significantly. A supervised discretization method for sequence data is proposed to improve classification accuracy and efficiency, which is also verified experimentally.

A sequential classification model is designed based on critical cases. During the construction of the critical sample set, the utility of each sample is evaluated using large margin theory. The weights of samples that produce the largest hypothesis margin are increased, while the weights of outliers and redundant samples are decreased. This improves the generalization ability of the classification model; in addition, the computational efficiency of the model is improved by reducing redundant training samples. The validity of this method is confirmed experimentally.
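The hypothesis margin used to weight critical cases can be sketched with the Relief-style definition: half the distance from a sample to its nearest neighbour of a different class (near-miss) minus the distance to its nearest neighbour of the same class (near-hit). This is a generic illustration under that standard definition, not the dissertation's weighting scheme; the clamping-and-normalising step is an assumption for the sketch.

```python
def hypothesis_margin(i, X, y, dist):
    """Relief-style hypothesis margin of sample i:
    0.5 * (distance to nearest miss - distance to nearest hit)."""
    hits = [dist(X[i], X[j]) for j in range(len(X)) if j != i and y[j] == y[i]]
    misses = [dist(X[i], X[j]) for j in range(len(X)) if y[j] != y[i]]
    return 0.5 * (min(misses) - min(hits))

def sample_weights(X, y, dist):
    """Weight each sample by its (clamped) hypothesis margin.

    Outliers, whose nearest miss is closer than their nearest hit, get
    a negative margin and hence zero weight; samples that enlarge the
    hypothesis margin get large weights. Weights are normalised to sum to 1.
    """
    m = [max(0.0, hypothesis_margin(i, X, y, dist)) for i in range(len(X))]
    total = sum(m) or 1.0
    return [w / total for w in m]
```

For instance, with one-dimensional samples `X = [0.0, 0.1, 5.0, 5.1, 2.6]`, labels `y = [0, 0, 1, 1, 0]`, and `dist = lambda a, b: abs(a - b)`, the mislabelled-looking point `2.6` sits closer to the opposite class than to its own, receives a negative margin, and is weighted zero.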
