èŠ‚ç‚¹æ–‡çŒ®

åŸºäºŽæœ‰æ•ˆè·ç¦»çš„ç‰¹å¾æå–å’Œç‰¹å¾é€‰æ‹©ç®—æ³•ç ”ç©¶

Research on Feature Extraction and Feature Selection Algorithms Based on Effective Distance

åˆ†é¡µä¸‹è½½
åˆ†ç« ä¸‹è½½
æ•´æœ¬ä¸‹è½½
åœ¨çº¿é˜…è¯»
ä¸æ”¯æŒè¿…é›·ç‰ä¸‹è½½å·¥å…·ï¼Œè¯·å–æ¶ˆåŠ é€Ÿå·¥å…·åŽä¸‹è½½ã€‚

ã€ä½œè€…ã€‘ å¼ ä¸¹ï¼›

ã€ä½œè€…åŸºæœ¬ä¿¡æ¯ã€‘ å—äº¬èˆªç©ºèˆªå¤©å¤§å¦ ï¼Œ è®¡ç®—æœºç§‘å¦ä¸ŽæŠ€æœ¯ï¼Œ 2017ï¼Œ ç¡•å£«

ã€æ‘˜è¦ã€‘ åœ¨æœºå™¨å¦ä¹ å’Œæ¨¡å¼è¯†åˆ«é¢†åŸŸ,ç‰¹å¾æå–å’Œç‰¹å¾é€‰æ‹©æŠ€æœ¯å·²ç»æˆä¸ºäº†è§£å†³é«˜ç»´æ•°æ®çš„é‡è¦é€”å¾„,å¹¶ä¸”åœ¨ä¿¡æ¯æ£€ç´¢ã€æ–‡æœ¬åˆ†ç±»å’Œç–¾ç—…è¯Šæ–ç‰é¢†åŸŸéƒ½å¾—åˆ°äº†å¹¿æ³›çš„åº”ç”¨ã€‚ç ”ç©¶è¡¨æ˜Žå¤šæ•°çš„ç‰¹å¾æå–å’Œç‰¹å¾é€‰æ‹©ç®—æ³•éƒ½åˆ©ç”¨ç›¸ä¼¼æ€§æ¥è¡¡é‡æ ·æœ¬ä¹‹é—´çš„å…³ç³»,è€Œæ ·æœ¬ä¹‹é—´çš„ç›¸ä¼¼æ€§å¾€å¾€éƒ½æ˜¯ä½¿ç”¨ä¼ ç»Ÿçš„æ¬§æ°è·ç¦»è®¡ç®—ã€‚ç”±äºŽæ¬§æ°è·ç¦»çš„é™æ€æœ¬è´¨,å› æ¤å®ƒå¾€å¾€å¿½ç•¥äº†å‘¨å›´å…¶ä»–æ ·æœ¬å¯¹ç›®æ ‡æ ·æœ¬çš„å½±å“ä»¥åŠæ ·æœ¬ä¸Žæ ·æœ¬ä¹‹é—´æ½œåœ¨çš„åŠ¨æ€ç»“æž„ã€‚ä¸ºäº†å¯ä»¥å……åˆ†åæ˜ å‡ºæ ·æœ¬ä¹‹é—´è¿™ç§æ½œåœ¨çš„åŠ¨æ€ç»“æž„,æœ¬æ–‡æå‡ºåœ¨å…¨å±€æ‹“æ‰‘ç»“æž„å…³ç³»çš„åŸºç¡€ä¸Š,è€ƒè™‘åˆ°å…¶ä»–æ ·æœ¬ä¸Žç›®æ ‡æ ·æœ¬ä¹‹é—´çš„å…³ç³»,ç„¶åŽè®¡ç®—æ ·æœ¬ä¹‹é—´çš„è·ç¦»,å³æœ‰æ•ˆè·ç¦»ã€‚æŽ¥ç€æˆ‘ä»¬åˆ©ç”¨äº†æœ‰æ•ˆè·ç¦»è®¡ç®—æ ·æœ¬ä¹‹é—´çš„ç›¸ä¼¼æ€§,æå‡ºäº†åŸºäºŽæœ‰æ•ˆè·ç¦»æ”¹è¿›çš„ç‰¹å¾æå–å’Œç‰¹å¾é€‰æ‹©ç®—æ³•ã€‚æœ¬æ–‡çš„ä¸»è¦åˆ›æ–°ç‚¹å’Œç ”ç©¶å·¥ä½œæ€»ç»“ä¸»è¦å¦‚ä¸‹:ä¸€æ–¹é¢,æˆ‘ä»¬æå‡ºäº†ä¸¤ç§æ–¹å¼è®¡ç®—æ ·æœ¬ä¹‹é—´çš„æœ‰æ•ˆè·ç¦»,åˆ†åˆ«ä¸ºåŸºäºŽKNN (k Nearest Neighborhood)çš„æœ‰æ•ˆè·ç¦»å’ŒåŸºäºŽç¨€ç–è¡¨ç¤ºçš„æœ‰æ•ˆè·ç¦»ã€‚è¿™ä¸¤ç§æœ‰æ•ˆè·ç¦»çš„è®¡ç®—éƒ½è¦ä¾èµ–äºŽæ ·æœ¬ä¹‹é—´çš„æ‹“æ‰‘ç»“æž„å…³ç³»,å› æ¤æˆ‘ä»¬é¦–å…ˆåˆ©ç”¨æ ·æœ¬ä¹‹é—´çš„ç¨€ç–é‡æž„å…³ç³»æˆ–æ ·æœ¬ä¹‹é—´çš„è¿‘é‚»å…³ç³»æž„é€ å‡ºä¸€ä¸ªåŒå‘çš„æ‹“æ‰‘ç½‘ç»œ,ç„¶åŽä¾èµ–äºŽè¿™ä¸ªåŒå‘ç½‘ç»œè®¡ç®—äº†ä¸¤ä¸ªæ ·æœ¬ä¹‹é—´çš„æœ‰æ•ˆè·ç¦»ã€‚æŽ¥ç€,æˆ‘ä»¬æŠŠåŸºäºŽæœ‰æ•ˆè·ç¦»å¾—åˆ°çš„ç›¸ä¼¼æ€§çŸ©é˜µå¼•å…¥åˆ°ç‰¹å¾æå–ç®—æ³•ä¸,å¾—åˆ°äº†åŸºäºŽæœ‰æ•ˆè·ç¦»çš„ç‰¹å¾æå–ç®—æ³•ã€‚å®žéªŒç»“æžœè¡¨æ˜Ž,åŸºäºŽæœ‰æ•ˆè·ç¦»æ”¹è¿›çš„ç‰¹å¾æå–ç®—æ³•,èƒ½å¤Ÿæœ‰æ•ˆåœ°èŽ·å–æ ·æœ¬çš„å…¨å±€å’Œå±€éƒ¨ç»“æž„ä¿¡æ¯,ä»Žè€Œå¾—åˆ°æ›´åŠ ä¼˜è¶Šçš„åˆ†ç±»æ€§èƒ½ã€‚å¦ä¸€æ–¹é¢,æˆ‘ä»¬é¦–å…ˆé€šè¿‡ç¨€ç–è¡¨ç¤ºå¾—åˆ°æ ·æœ¬ä¹‹é—´çš„ç¨€ç–é‡æž„å…³ç³»,ç„¶åŽåŸºäºŽè¿™ç§ç¨€ç–é‡æž„å…³ç³»æž„å»ºäº†å…¨å±€çš„æ‹“æ‰‘ç»“æž„,ä»Žè€Œå¯ä»¥è®¡ç®—æ ·æœ¬ä¹‹é—´çš„æœ‰æ•ˆè·ç¦»ã€‚é€šè¿‡æœ‰æ•ˆè·ç¦»,æˆ‘ä»¬å¯ä»¥è®¡ç®—ä¸åŒæ ·æœ¬ä¹‹é—´åŸºäºŽæœ‰æ•ˆè·ç¦»çš„ç›¸ä¼¼æ€§,åœ¨ç‰¹å¾é€‰æ‹©è¿‡ç¨‹ä¸ç”¨äºŽè¡¡é‡ç‰¹å¾çš„é‡è¦æ€§ã€‚æ¤å¤–,æˆ‘ä»¬åœ¨ç‰¹å¾é€‰æ‹©è¿‡ç¨‹ä¸åŠ å…¥äº†è¿ä»£çš„æ€æƒ³,é€æ¸åœ°åŽ»é€‰æ‹©æœ€ä¼˜çš„ç‰¹å¾åé›†ã€‚å› æ¤,æˆ‘ä»¬æå‡ºäº†åŸºäºŽæœ‰æ•ˆè·ç¦»çš„è¿ä»£ç‰¹å¾é€‰æ‹©ç®—æ³•ã€‚æˆ‘ä»¬åœ¨ä¸€ç³»åˆ—çš„UCIæ•°æ®é›†ä¸Šè¿›è¡Œäº†éªŒè¯,å®žéªŒç»“æžœè¡¨æ˜Ž,ç›¸æ¯”äºŽä½¿ç”¨æ¬§æ°è·ç¦»çš„ç‰¹å¾é€‰æ‹©ç®—æ³•,æœ¬æ–‡æå‡ºçš„åŸºäºŽæœ‰æ•ˆè·ç¦»çš„ç‰¹å¾é€‰æ‹©ç®—æ³•å¯ä»¥é€‰æ‹©å‡ºæ›´ä¼˜çš„ç‰¹å¾,è¿›è€Œå¯ä»¥æå‡åˆ†ç±»æ€§èƒ½ã€‚æ›´å¤š è¿˜åŽŸ

ã€Abstractã€‘ In machine learning and pattern recognition domain, feature extraction and feature selection are important approaches to deal with high-dimensional data, which have been widely used in information retrieval, text classification and disease diagnosis. Researches showed that many feature extraction and feature selection algorithms focus on using Euclidean distance to measure the similarity of samples However, Euclidean distance usually ignores the influence of other samples and fails to capture the dynamic structure due to its static characteristics. To reflect the underlying dynamic structure of data,in this thesis, we measure the effective distance of samples by considering the relationship between the target sample and other samples, which based on the global topological structure. Then we propose a set of effective distance-based feature extraction and feature selection algorithms by using the effective distance-based similarity. The main innovation and research of this thesis are as followsOn the one hand, we develop two ways to compute the effective distance of samples, including k Nearest Neighborhood-based effective distance and sparse representation-based effective distance The computation of effective distance is depended on the topological structure of samples. First, we construct one bilateral network using sparse reconstruction relationship of samples or neighborhood relationship of samples. Based on this bilateral network, we can compute the effective distance of two samples. Then we propose effective distance-based feature extraction methods by using the effective distance-based similarity matrix. Experimental results show that effective distance-based feature ex traction algorithms can effectively preserve the structure of samples and achieve better classification performance than conventional methods using Euclidean distance.On the other hand, we firstly obtain the sparse reconstruction relationship of samples through sparse representation, which is used to construct the global topological structure. Then the effective distance of different samples could be measured using the topological structure. In the process of feature selection,the similarity based on effective distance is used to evaluate the importance of features. Besides, we take advantage of the idea of iteration to achieve the optimal feature subset gradually. As a result, we develop the modified iterative feature selection algorithms based on effective distance. Experiments are conducted on a series of UCI data sets and the results indicate that our effective distance-based feature selection methods can select much better features and boost the classification performance.æ›´å¤š è¿˜åŽŸ

ã€å…³é”®è¯ã€‘ ç‰¹å¾æå–ï¼› ç‰¹å¾é€‰æ‹©ï¼› åŠ¨æ€ç»“æž„ï¼› æ‹“æ‰‘ç»“æž„ï¼› æœ‰æ•ˆè·ç¦»ï¼› ç›¸ä¼¼æ€§ï¼› åˆ†ç±»ï¼›
ã€Key wordsã€‘ Feature extractionï¼› feature selectionï¼› dynamic structureï¼› topological structureï¼› effective distanceï¼› similarityï¼› classificationï¼›

ã€ç½‘ç»œå‡ºç‰ˆæŠ•ç¨¿äººã€‘ å—äº¬èˆªç©ºèˆªå¤©å¤§å¦

ã€åˆ†ç±»å·ã€‘TP391.4;TP181
ã€è¢«å¼•é¢‘æ¬¡ã€‘4
ã€ä¸‹è½½é¢‘æ¬¡ã€‘178
æ”»è¯»æœŸæˆæžœ

çŸ¥ç½‘èŠ‚ä¸‹è½½

èŠ‚ç‚¹æ–‡çŒ®ä¸ï¼š

æœ¬æ–‡é“¾æŽ¥çš„æ–‡çŒ®ç½‘ç»œå›¾ç¤º:

æœ¬æ–‡çš„å¼•æ–‡ç½‘ç»œ

èŠ‚ç‚¹æ–‡çŒ®

èŠ‚ç‚¹æ–‡çŒ®

åŸºäºŽæœ‰æ•ˆè·ç¦»çš„ç‰¹å¾æå–å’Œç‰¹å¾é€‰æ‹©ç®—æ³•ç ”ç©¶

Research on Feature Extraction and Feature Selection Algorithms Based on Effective Distance

æœ¬æ–‡é“¾æŽ¥çš„æ–‡çŒ®ç½‘ç»œå›¾ç¤º:

åŸºäºŽæœ‰æ•ˆè·ç¦»çš„ç‰¹å¾æå–å’Œç‰¹å¾é€‰æ‹©ç®—æ³•ç ”ç©¶