Automatic Parsing of News Video Using Multimodal Analysis (in English)

【Author】 WANG Wei qiang 1, GAO Wen 1,2; 1 (Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100080, China); 2 (Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, China)

【Institution】 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang 150001

【Abstract】 This paper presents an approach that exploits multimodal information (video, audio, and text) to parse news video automatically. Audio feature extraction and the scheme for integrating the multimodal information are discussed in detail. Integrating multiple information sources compensates for the weakness of approaches that segment news items using image analysis alone, so the approach adapts to a wider range of ways in which news items appear. On a test set of 184,100 frames, the system detects boundaries between news items with a recall of 95.1% and an accuracy of 93.3%. The experimental results show that the approach is effective and robust.