
Research on an Electronic Document Information Mining System


【Author】 Cai Lijun (蔡立军)

【Author information】 Hunan University, Control Engineering, 2003, Master's degree

【Abstract】 With the explosive growth of the Internet and its information services, and following the successful application of data mining (DM) to traditional databases, researchers have begun to study network information mining, and Web data mining in particular. This thesis first introduces the definition, functions, models, and algorithms of data mining, and reviews its background, technological evolution, and current state. It then describes a prototype framework for a data mining system and analyzes the three most commonly used Web data mining techniques. The models used in Web log mining have significant shortcomings (low accuracy, high model cost, and low efficiency) that make them unsuitable for mining electronic documents. The vector space model (VSM) method and document filtering based on learning from examples are both, in essence, document comparison and filtering models; their main drawback is that the vector dimensionality and computational overhead are very large, so mining efficiency is low. They also perform poorly on items with fuzzy characteristics, and applying fuzzy measures to key terms can introduce considerable deviation.

Finally, the thesis presents a practical solution for an electronic document information mining system. Documents on the Internet come in many types and languages, so building a database with a uniform format for them would be very complex. The thesis therefore mirrors the file resources of Internet servers and reverses the traditional data mining process: electronic documents are mined first, and only the documents useful to users are then stored in a database, which improves the users' capacity and speed in handling information. The system uses the I₂DEF method to build structural, dynamic, and functional models; designs a non-backtracking search algorithm with dual scan buffers and a dual-stack structure for the search process; designs a Bayes classifier, used with an enhancement method, based on the characteristics of e-mail monitoring systems and electronic document mining, and proposes an e-mail monitoring system that applies electronic document mining; and builds a dual C/S and B/S architecture. The thesis also gives part of the function call relationships of the mining process, the system's mining workflow, and some of the processing routines. The system supports mining, publishing, and managing electronic documents, e-mail monitoring, and system maintenance.

【Keywords】 data mining (DM); electronic documents; Web log mining; VSM; I₂DEF method; non-backtracking search algorithm; dual-stack structure; e-mail monitoring

【Online publication submitter】 Hunan University

【Classification No.】 TP311.13
【Download count】 208
