{"ID":2824506,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.23851","arxiv_id":"2512.23851","title":"Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding","abstract":"Autoregressive video generation relies on history context for content consistency and storytelling. As video histories grow longer, efficiently encoding them remains an open problem - particularly for personal users and local workflows where compute and memory budgets are limited. We present a lightweight history encoder that maps long video histories into short-length embeddings, pretrained with a frame query objective that learns to attend to content features at arbitrary temporal positions. The pretraining stage provides the encoder with dense history coverage on large-scale video data; the subsequent finetuning stage adapts the pretrained encoder under an autoregressive video generation objective to establish content-level consistency. In this way, the lightweight embeddings achieve comparable performance to heavier alternatives. We evaluate the framework with ablative settings and discuss the architecture designs.","short_abstract":"Autoregressive video generation relies on history context for content consistency and storytelling. As video histories grow longer, efficiently encoding them remains an open problem - particularly for personal users and local workflows where compute and memory budgets are limited. We present a lightweight history encod...","url_abs":"https://arxiv.org/abs/2512.23851","url_pdf":"https://arxiv.org/pdf/2512.23851v5","authors":"[\"Lvmin Zhang\",\"Shengqu Cai\",\"Muyang Li\",\"Chong Zeng\",\"Beijia Lu\",\"Anyi Rao\",\"Song Han\",\"Gordon Wetzstein\",\"Maneesh Agrawala\"]","published":"2025-12-29T20:29:21Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
