{"ID":2923539,"CreatedAt":"2026-06-02T04:05:25.881865328Z","UpdatedAt":"2026-06-04T13:12:39.622923895Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02479","arxiv_id":"2606.02479","title":"Retrieve What's Missing: Coverage-Maximizing Retrieval for Consistent Long Video Generation","abstract":"Maintaining long-term geometric consistency remains challenging for long-horizon autoregressive video generation. Memory-augmented generative models address this by retrieving historical frames, but their effectiveness depends on two key design choices: what 3D-geometric evidence should represent past observations, and how memory frames should be selected from this evidence. Existing methods often rely on camera poses or field-of-view overlap, which are lightweight but too coarse to reason about pixel-wise visibility, or use explicit 3D reconstruction, which provides fine-grained evidence but is costly to maintain over long rollouts. We propose Coverage-Maximizing Retrieval-Augmented Generation (COVRAG), a depth-based memory retrieval framework that uses pretrained 3D priors to construct a target-view coverage map as lightweight 3D memory evidence. For frame selection, COVRAG maximizes residual coverage gain, iteratively retrieving frames that explain target-view regions not covered by the current context or previously selected memories. To improve scalability in long-video generation, we introduce sliding-window depth caching for efficient geometry estimation. Experiments on RealEstate10K and DL3DV10K show that COVRAG improves long-horizon geometric consistency while maintaining low latency compared to baselines.","short_abstract":"Maintaining long-term geometric consistency remains challenging for long-horizon autoregressive video generation. Memory-augmented generative models address this by retrieving historical frames, but their effectiveness depends on two key design choices: what 3D-geometric evidence should represent past observations, and...","url_abs":"https://arxiv.org/abs/2606.02479","url_pdf":"https://arxiv.org/pdf/2606.02479v1","authors":"[\"Minseok Joo\",\"Dogyun Park\",\"Taehoon Lee\",\"Kyujin Lee\",\"Hyunwoo J. Kim\"]","published":"2026-06-01T16:49:58Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"RAG\"]","has_code":false}
