Design and Implementation of the Winter Olympics News Text Collection and Classification Analysis System

【Author】 Liu Na

【Author information】 Hebei University of Engineering, Computer Technology (professional degree), 2020, Master's

【Abstract】 With the development of Internet technology, the volume of online information continues to grow. Most online data takes the form of text, but this text is widely scattered, complex in content, and coarsely categorized, which makes online information difficult to collect and analyze. To address the difficulty of data collection and the coarseness of text classification, this thesis designs and implements, in Python, a Winter Olympics news text collection and classification analysis system based on focused crawling and text classification techniques. The system comprises three functional modules: data collection, data classification, and data visualization. In the data collection module, a focused crawler is customized to collect news text related to the Winter Olympics. The collected data supports the classification and analysis of Winter Olympics information and provides a preliminary integration of Winter Olympics online information. The data classification module has two parts: data filtering and text classification. To filter out irrelevant information, this thesis introduces local density and similarity into the SNN nearest-neighbor algorithm and proposes an adaptive SNN algorithm based on local density and similarity (AK-SNN). To verify the performance of AK-SNN, comparative experiments were carried out on the UCI dataset and on the Winter Olympics news text dataset. The experimental results show that AK-SNN has better robustness and prediction accuracy. To further subdivide the online text data into categories, an extreme learning machine (ELM) is used as the text classifier to perform multi-class classification. The results show that ELM achieves good accuracy in multi-category text classification. In the data visualization module, a web interface is built with the Django framework to display the collection and classification results. To explore the potential value of the information, the classification results, news sources, news release dates, and other aspects were analyzed, and the analysis results were visualized. The design and implementation presented in this thesis provide a degree of data and technical support for collecting and analyzing online information about the 2022 Winter Olympics, and offer a reference approach for mining potentially valuable information from online news text about large-scale sports events.
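
As a rough illustration of the neighbor-based filtering idea mentioned in the abstract, the sketch below computes a plain shared-nearest-neighbor similarity between points. It assumes that "SNN" refers to shared-nearest-neighbor similarity, which the abstract does not spell out, and it is not the AK-SNN algorithm: the adaptive choice of k, the local density term, and the thesis's similarity measure are not described here and are not reproduced. The function names `knn_sets` and `snn_similarity`, the mutual-neighbor rule, and the toy points are all hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_sets(points, k):
    """For each point index, return the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        # Rank every other point by its distance to point i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbours.append(set(others[:k]))
    return neighbours


def snn_similarity(neighbours, i, j):
    """Count the neighbours shared by i and j.

    Assumption: the count is only kept when i and j appear in each other's
    neighbour lists (a mutual-neighbour rule); otherwise the similarity is 0.
    """
    if i not in neighbours[j] or j not in neighbours[i]:
        return 0
    return len(neighbours[i] & neighbours[j])


# Toy data: four points in a tight square plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
neighbours = knn_sets(points, k=3)

print(snn_similarity(neighbours, 0, 1))  # two points inside the square
print(snn_similarity(neighbours, 0, 4))  # a square point and the distant point
```

Under these assumptions, a point whose similarity to every other point is zero could be treated as a candidate for filtering, which is one simple way to read "screening of irrelevant information"; the thesis may well do this differently.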

【Key words】 2022 Winter Olympics; text data; focused crawlers; text classification; data analysis; data visualization

【Online publication contributor】 Hebei University of Engineering

【Classification number】 TP391.1; G811.212
【Times cited】 2
【Downloads】 471