To extract captions from digital video, this paper presents a method that exploits several caption features: temporal and spatial characteristics, edges, and color. Caption regions are first located by edge detection, so the colors of the text pixels can be obtained. A universal Gaussian Mixture Model (GMM) is then trained on these text colors, and the color layer of the text is extracted using the trained GMM. Whether the caption content has changed is judged by overlaying a mask bitmap on the frame. Experiments show that this method performs well even if the bac...
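The last two steps (GMM-based color-layer extraction and mask-bitmap change detection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes the GMM uses diagonal covariances over RGB and has already been trained (for example by EM) on pixels from the edge-located caption regions, so training is not shown. The names `DiagonalGMM`, `text_mask`, `caption_changed`, `threshold` and `min_keep` are hypothetical. "Adding the mask bitmap to the frame" is interpreted here as overlaying the previous frame's text mask on the new frame and checking what fraction of masked pixels still matches the text-color model.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), values in 0..255
Frame = Sequence[Sequence[Pixel]]
Mask = List[List[int]]


class DiagonalGMM:
    """Text-color GMM with diagonal covariances (hypothetical sketch).

    Parameters are assumed to be pre-trained; no training code here.
    """

    def __init__(self, weights, means, variances):
        self.weights = weights      # mixture weights w_k, summing to 1
        self.means = means          # per-component RGB means mu_k
        self.variances = variances  # per-component RGB variances

    def log_likelihood(self, x: Pixel) -> float:
        # log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)),
        # evaluated with log-sum-exp for numerical stability.
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            lp = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                lp += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            terms.append(lp)
        m = max(terms)
        return m + math.log(sum(math.exp(t - m) for t in terms))


def text_mask(frame: Frame, gmm: DiagonalGMM, threshold: float) -> Mask:
    """Color layer of the text as a bitmap: 1 where the pixel is likely text."""
    return [[1 if gmm.log_likelihood(px) > threshold else 0 for px in row]
            for row in frame]


def caption_changed(prev_mask: Mask, frame: Frame, gmm: DiagonalGMM,
                    threshold: float, min_keep: float = 0.8) -> bool:
    """Overlay the previous text mask on a new frame (assumed interpretation).

    Reports a change when fewer than `min_keep` of the previously masked
    pixels still look like text in the new frame.
    """
    total = kept = 0
    for mrow, frow in zip(prev_mask, frame):
        for m, px in zip(mrow, frow):
            if m:
                total += 1
                if gmm.log_likelihood(px) > threshold:
                    kept += 1
    if total == 0:
        # No previous caption pixels: call it a change if text has appeared.
        return any(any(row) for row in text_mask(frame, gmm, threshold))
    return kept / total < min_keep
```

For example, with a single-component model for near-white text, `gmm = DiagonalGMM([1.0], [(240.0, 240.0, 240.0)], [(100.0, 100.0, 100.0)])` and `threshold = -20.0`, a white pixel scores about -9.66 and a black pixel about -873.7, so the mask cleanly separates them. Overlaying that mask on an identical frame reports no change, while overlaying it on an all-black frame reports a change. A real system would choose the threshold from data rather than by hand.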