{"ID":2829184,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.13904","arxiv_id":"2512.13904","title":"Generative AI for Video Translation: A Scalable Architecture for Multilingual Video Conferencing","abstract":"The real-time deployment of cascaded generative AI pipelines for applications like video translation is constrained by significant system-level challenges. These include the cumulative latency of sequential model inference and the quadratic ($\\mathcal{O}(N^2)$) computational complexity that renders multi-user video conferencing applications unscalable. This paper proposes and evaluates a practical system-level framework designed to mitigate these critical bottlenecks. The proposed architecture incorporates a turn-taking mechanism to reduce computational complexity from quadratic to linear in multi-user scenarios, and a segmented processing protocol to manage inference latency for a perceptually real-time experience. We implement a proof-of-concept pipeline and conduct a rigorous performance analysis across a multi-tiered hardware setup, including commodity (NVIDIA RTX 4060), cloud (NVIDIA T4), and enterprise (NVIDIA A100) GPUs. Our objective evaluation demonstrates that the system achieves real-time throughput ($τ\u003c 1.0$) on modern hardware. A subjective user study further validates the approach, showing that a predictable, initial processing delay is highly acceptable to users in exchange for a smooth, uninterrupted playback experience. The work presents a validated, end-to-end system design that offers a practical roadmap for deploying scalable, real-time generative AI applications in multilingual communication platforms.","short_abstract":"The real-time deployment of cascaded generative AI pipelines for applications like video translation is constrained by significant system-level challenges. These include the cumulative latency of sequential model inference and the quadratic ($\\mathcal{O}(N^2)$) computational complexity that renders multi-user video con...","url_abs":"https://arxiv.org/abs/2512.13904","url_pdf":"https://arxiv.org/pdf/2512.13904v1","authors":"[\"Amirkia Rafiei Oskooei\",\"Eren Caglar\",\"Ibrahim Sahin\",\"Ayse Kayabay\",\"Mehmet S. Aktas\"]","published":"2025-12-15T21:21:09Z","proceeding":"cs.MM","tasks":"[\"cs.MM\",\"cs.AI\",\"cs.CL\",\"cs.CV\"]","methods":"[]","has_code":false}