{"ID":2834982,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.01039","arxiv_id":"2512.01039","title":"Joint Partitioning and Placement of Foundation Models for Real-Time Edge AI","abstract":"Inference over large-scale foundation models within heterogeneous edge environments necessitates a fundamentally reconfigurable orchestration substrate. Static partitioning of model layers presumes temporal stability across compute and network resources, which is misaligned with the volatility of real-world deployments. We introduce a framework in which both the spatial placement and internal segmentation of foundation models are elevated to runtime-resolved constructs. The orchestration problem is formalized as a constrained optimization over layer-wise assignments, subject to evolving latency, utilization, and privacy gradients. The framework implements reactive inference composition responsive to infrastructural fluctuations by integrating model-aware capacity profiling with dynamic graph re-partitioning and reallocation. We introduce architectural and algorithmic components, along with a representative use case in 6G multi-access edge computing.","short_abstract":"Inference over large-scale foundation models within heterogeneous edge environments necessitates a fundamentally reconfigurable orchestration substrate. Static partitioning of model layers presumes temporal stability across compute and network resources, which is misaligned with the volatility of real-world deployments...","url_abs":"https://arxiv.org/abs/2512.01039","url_pdf":"https://arxiv.org/pdf/2512.01039v1","authors":"[\"Aladin Djuhera\",\"Fernando Koch\",\"Alecio Binotto\"]","published":"2025-11-30T19:16:30Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.LG\",\"cs.NI\"]","methods":"[]","has_code":false}
