{"ID":2838838,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.17844","arxiv_id":"2511.17844","title":"Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation","abstract":"Fine-tuning large-scale text-to-video diffusion models to add new generative controls, such as those over physical camera parameters (e.g., shutter speed or aperture), typically requires vast, high-fidelity datasets that are difficult to acquire. In this work, we propose a data-efficient fine-tuning strategy that learns these controls from sparse, low-quality synthetic data. We show that not only does fine-tuning on such simple data enable the desired controls, it actually yields superior results to models fine-tuned on photorealistic \"real\" data. Beyond demonstrating these results, we provide a framework that justifies this phenomenon both intuitively and quantitatively.","short_abstract":"Fine-tuning large-scale text-to-video diffusion models to add new generative controls, such as those over physical camera parameters (e.g., shutter speed or aperture), typically requires vast, high-fidelity datasets that are difficult to acquire. In this work, we propose a data-efficient fine-tuning strategy that learn...","url_abs":"https://arxiv.org/abs/2511.17844","url_pdf":"https://arxiv.org/pdf/2511.17844v4","authors":"[\"Shihan Cheng\",\"Nilesh Kulkarni\",\"David Hyde\",\"Dmitriy Smirnov\"]","published":"2025-11-21T23:41:19Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Diffusion Model\"]","has_code":false}
