To address the problem of full-body human pose estimation in video, a coarse-to-fine cascade of spatio-temporal models was developed in which the tracklet of a body part was considered the basic unit. The notion of a tracklet ranges from a trajectory covering the whole video to a body part in a single frame. In this cascade, coarse models filtered the state space for the next level via their max-marginals. Because loops in the graphical models made inference intractable, the models were decomposed into Markov random fields and hi...