Design and Implementation of a Resume Information Extraction System Based on a Domain Knowledge Base

(Original title: 基于领域知识库的简历信息抽取系统的设计与实现)

【Author】 Zhang Bo (张博)

【Author information】 Beijing University of Posts and Telecommunications, Computer Technology, 2018, Master's degree

【Abstract】 A resume is a job seeker's written description of their own background. Although resumes share certain structural characteristics and follow certain content conventions, their forms vary widely. For recruiters, manually reading, recording, and screening resumes is therefore extremely labor-intensive. Information extraction technology can extract structured, valuable information from free-form resume text, which greatly simplifies resume analysis and makes it possible to build an effective talent pool around the entity and event information in resumes, supporting resume screening, retrieval, and talent matching. Based on practical needs, this thesis defines the functional and non-functional requirements of resume extraction, designs the system architecture and functional modules, studies technical solutions for resume information extraction in depth, and implements a resume information extraction system based on a domain knowledge base. The main work is as follows:

(1) Information is collected and organized from Internet resources such as Wikipedia and recruitment websites to build the domain knowledge bases needed for resume information extraction, including an enterprise name feature base and an equivalent name base.
(2) A trigger word matching algorithm, with the trigger word lexicon expanded using Word2vec word vectors, segments resumes into blocks according to structural features. For resumes that contain no trigger words, resume sentences are represented as feature vectors and an SVM classification algorithm segments the resume according to content features.
(3) The principles and performance of the conditional random field (CRF), hidden Markov model (HMM), and maximum entropy (ME) models, each incorporating domain knowledge, are compared for named entity recognition in resumes, and the best-performing statistical model is used to extract entity information from each type of resume block.
(4) A backtracking strategy for resume information extraction is proposed: a rule matching method based on the domain knowledge base performs a second extraction pass over the results of the statistical entity recognition, and event information is identified within some of the recognized entity sequences.
(5) The Elasticsearch distributed search engine provides fast filtering and querying of the resume extraction results. In addition, the Zend framework, Echarts, and other web technologies integrate these functions into the system and provide visual operation of resume information extraction.

On this basis, a series of functional and performance tests were carried out on the resume information extraction system. The results show that the system can automatically extract structured information from resume text and build a job seeker database, and that it achieves the expected extraction quality for most entity types, which indicates that the resume segmentation scheme and entity extraction scheme proposed in this thesis are effective. The resume management, filtering, and retrieval functions that the system provides also significantly improve the efficiency of resume processing and give the system greater practical value.
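The abstract does not give implementation details, so the following is only a minimal sketch of two of the steps it describes: trigger word block segmentation from item (2) and a knowledge-base second pass in the spirit of item (4). The trigger lexicon, the equivalent-name table, and all function names are hypothetical and chosen for illustration. In the thesis the lexicon is expanded with Word2vec and the first pass is a statistical model, neither of which is reproduced here.

```python
from typing import Dict, List, Optional

# Hypothetical trigger lexicon: block label -> heading words that open the block.
# (The thesis expands such a lexicon with Word2vec; here it is hard-coded.)
TRIGGERS: Dict[str, List[str]] = {
    "education": ["education", "academic background"],
    "work": ["work experience", "employment history"],
    "skills": ["skills", "technical skills"],
}

# Hypothetical equivalent-name knowledge base: alias -> canonical name.
EQUIVALENT_NAMES: Dict[str, str] = {
    "BUPT": "Beijing University of Posts and Telecommunications",
}


def match_trigger(line: str) -> Optional[str]:
    """Return the block label if the whole line is a trigger heading."""
    heading = line.strip().rstrip(":").lower()
    for label, words in TRIGGERS.items():
        if heading in words:
            return label
    return None


def segment_resume(text: str) -> Dict[str, List[str]]:
    """Split resume text into labeled blocks using trigger headings."""
    blocks: Dict[str, List[str]] = {"header": []}
    current = "header"  # lines before the first trigger heading
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        label = match_trigger(line)
        if label is not None:
            current = label
            blocks.setdefault(current, [])
        else:
            blocks[current].append(line)
    return blocks


def backfill_entities(lines: List[str], found: List[str]) -> List[str]:
    """Second pass: add knowledge-base entities missing from the first-pass result."""
    result = list(found)
    for line in lines:
        for alias, canonical in EQUIVALENT_NAMES.items():
            if (alias in line or canonical in line) and canonical not in result:
                result.append(canonical)
    return result


if __name__ == "__main__":
    sample = "Jane Doe\nEducation:\nM.Eng., BUPT, 2018\nWork Experience:\nEngineer, Example Corp\n"
    blocks = segment_resume(sample)
    print(blocks)
    # Pretend the statistical model found nothing in the education block.
    print(backfill_entities(blocks["education"], found=[]))
```

Exact heading matching keeps the sketch predictable. A resume with no recognizable headings would fall entirely into the `header` block, which is the case the abstract handles with SVM classification on sentence features.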

【Key words】 resume information extraction; domain knowledge base; text categorization; named entity recognition; backtracking strategy

【Online publication submitter】 Beijing University of Posts and Telecommunications

【Classification number】 TP391.1
【Citation count】 7
【Download count】 469
