{"ID":2836732,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.19919","arxiv_id":"2511.19919","title":"HybriDLA: Hybrid Generation for Document Layout Analysis","abstract":"Conventional document layout analysis (DLA) traditionally depends on empirical priors or a fixed set of learnable queries executed in a single forward pass. While sufficient for early-generation documents with a small, predetermined number of regions, this paradigm struggles with contemporary documents, which exhibit diverse element counts and increasingly complex layouts. To address challenges posed by modern documents, we present HybriDLA, a novel generative framework that unifies diffusion and autoregressive decoding within a single layer. The diffusion component iteratively refines bounding-box hypotheses, whereas the autoregressive component injects semantic and contextual awareness, enabling precise region prediction even in highly varied layouts. To further enhance detection quality, we design a multi-scale feature-fusion encoder that captures both fine-grained and high-level visual cues. This architecture elevates performance to 83.5% mean Average Precision (mAP). Extensive experiments on the DocLayNet and M$^6$Doc benchmarks demonstrate that HybriDLA sets a state-of-the-art performance, outperforming previous approaches. All data and models will be made publicly available at https://yufanchen96.github.io/projects/HybriDLA.","short_abstract":"Conventional document layout analysis (DLA) traditionally depends on empirical priors or a fixed set of learnable queries executed in a single forward pass. While sufficient for early-generation documents with a small, predetermined number of regions, this paradigm struggles with contemporary documents, which exhibit d...","url_abs":"https://arxiv.org/abs/2511.19919","url_pdf":"https://arxiv.org/pdf/2511.19919v1","authors":"[\"Yufan Chen\",\"Omar Moured\",\"Ruiping Liu\",\"Junwei Zheng\",\"Kunyu Peng\",\"Jiaming Zhang\",\"Rainer Stiefelhagen\"]","published":"2025-11-25T04:53:47Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}
