Caption text routinely provides rich semantic information. Compared with other video features, caption text is highly compact and structured, and is therefore better suited to efficient video indexing. For this reason, caption-based methods have attracted particular attention.
This paper addresses how to make full use of spatio-temporal information to extract captions quickly and efficiently. In order to further video analysis, an algorithm of abrupt shot bou...