{"ID":2825447,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.21058","arxiv_id":"2512.21058","title":"Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control","abstract":"In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image-text corpora; the lack of precise, fine-grained semantic control, which forces reliance on non-semantic cues; and terminological heterogeneity, where diverse phrasings for the same diagnostic concept impede reliable text conditioning. We introduce UniPath, a semantics-driven pathology image generation framework that leverages mature diagnostic understanding to enable controllable generation. UniPath implements Multi-Stream Control: a Raw-Text stream; a High-Level Semantics stream that uses learnable queries to a frozen pathology MLLM to distill paraphrase-robust Diagnostic Semantic Tokens and to expand prompts into diagnosis-aware attribute bundles; and a Prototype stream that affords component-level morphological control via a prototype bank. On the data front, we curate a 2.65M image-text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. For a comprehensive assessment, we establish a four-tier evaluation hierarchy tailored to pathology. Extensive experiments demonstrate UniPath's SOTA performance, including a Patho-FID of 80.9 (51% better than the second-best) and fine-grained semantic control achieving 98.7% of the real-image.\nThe dataset and code can be obtained from https://github.com/Hanminghao/UniPath.","short_abstract":"In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image-text co...","url_abs":"https://arxiv.org/abs/2512.21058","url_pdf":"https://arxiv.org/pdf/2512.21058v2","authors":"[\"Minghao Han\",\"Yichen Liu\",\"Yizhou Liu\",\"Zizhi Chen\",\"Jingqun Tang\",\"Xuecheng Wu\",\"Dingkang Yang\",\"Lihua Zhang\"]","published":"2025-12-24T08:52:08Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":605665,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2825447,"paper_url":"https://arxiv.org/abs/2512.21058","paper_title":"Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control","repo_url":"https://github.com/Hanminghao/UniPath","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
