{"ID":2850958,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.20178","arxiv_id":"2510.20178","title":"PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching","abstract":"Temporally consistent depth estimation from stereo video is critical for real-world applications such as augmented reality, where inconsistent depth estimates disrupt user immersion. Despite its importance, this task remains challenging because long-term temporal consistency is difficult to model in a computationally efficient manner. Previous methods attempt to address this by aggregating spatio-temporal information but face a fundamental trade-off: limited temporal modeling provides only modest gains, whereas capturing long-range dependencies significantly increases computational cost. To address this limitation, we introduce a memory buffer for modeling long-range spatio-temporal consistency while achieving efficient dynamic stereo matching. Inspired by the two-stage decision-making process in humans, we propose a Pick-and-Play Memory (PPM) construction module for dynamic stereo matching, dubbed PPMStereo. PPM consists of a 'pick' process that identifies the most relevant frames and a 'play' process that weights the selected frames adaptively for spatio-temporal aggregation. This two-stage collaborative process maintains a compact yet highly informative memory buffer while achieving temporally consistent information aggregation. Extensive experiments validate the effectiveness of PPMStereo, demonstrating state-of-the-art performance in both accuracy and temporal consistency. Notably, PPMStereo achieves 0.62/1.11 TEPE on the Sintel clean/final passes (17.3% and 9.02% improvements over BiDAStereo) at lower computational cost.
Code is available at https://github.com/cocowy1/PPMStereo.","short_abstract":"Temporally consistent depth estimation from stereo video is critical for real-world applications such as augmented reality, where inconsistent depth estimation disrupts the immersion of users. Despite its importance, this task remains challenging due to the difficulty in modeling long-term temporal consistency in a com...","url_abs":"https://arxiv.org/abs/2510.20178","url_pdf":"https://arxiv.org/pdf/2510.20178v1","authors":"[\"Yun Wang\",\"Junjie Hu\",\"Qiaole Dong\",\"Yongjian Zhang\",\"Yanwei Fu\",\"Tin Lun Lam\",\"Dapeng Wu\"]","published":"2025-10-23T03:52:39Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":607851,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2850958,"paper_url":"https://arxiv.org/abs/2510.20178","paper_title":"PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching","repo_url":"https://github.com/cocowy1/PPMStereo","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
