Caption extraction and recognition are important for video analysis and retrieval. This article analyzes a caption extraction algorithm based on spatio-temporal information and proposes a caption segmentation algorithm. Finally, the captions are recognized using a support vector machine (SVM).
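
The abstract does not spell out the extraction step, so the following is only a minimal sketch of one common spatio-temporal cue, not the article's algorithm. It assumes that caption pixels show strong horizontal edges in every frame and that their intensity stays nearly constant over a short run of frames. The function name `caption_mask` and the thresholds `edge_thresh` and `motion_thresh` are hypothetical and chosen for illustration.

```python
def caption_mask(frames, edge_thresh=50, motion_thresh=10):
    """Mark candidate caption pixels in a short run of grayscale frames.

    frames: list of equally sized frames, each a list of rows of ints (0-255).
    Both thresholds are illustrative assumptions, not values from the article.
    """
    height, width = len(frames[0]), len(frames[0][0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        # The last column has no right-hand neighbour, so it stays False.
        for x in range(width - 1):
            # Spatial cue: a strong horizontal gradient in every frame.
            strong_edge = all(
                abs(f[y][x + 1] - f[y][x]) >= edge_thresh for f in frames
            )
            # Temporal cue: the pixel intensity barely changes across frames.
            values = [f[y][x] for f in frames]
            stable = max(values) - min(values) <= motion_thresh
            mask[y][x] = strong_edge and stable
    return mask
```

In a fuller pipeline, the resulting mask would presumably be grouped into text regions, segmented into characters, and passed to the SVM classifier. Those stages are not sketched here because the abstract gives no detail about them.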