Advancements in generative models for text-to-image (T2I) have been dramatic. Recently, text-to-video (T2V) systems have made significant strides, enabling the automatic generation of videos based on textual prompt descriptions. One primary challenge in video synthesis is the extensive memory and training data required. Methods based on the pre-trained Stable Diffusion (SD) model have been proposed…
