{"ID":2921903,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-04T00:38:48.101967128Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01520","arxiv_id":"2606.01520","title":"TERRA: Task-Embedded Reasoning and Representation Architecture for Cross-Domain Applications","abstract":"A single action-conditioned latent predictive architecture can in principle be trained on the structured state of a driving scene, a robot workspace, or a financial order book. The ingredients for doing so within any one domain already exist and are individually validated: masked-latent prediction, action-conditioned latent world models, discrete action tokenization, and joint-embedding prediction on voxelized state. What is not established, and what TERRA addresses, is the transfer question: when does a representation or predictor learned in one structured-state domain carry over to a structurally analogous but otherwise unrelated domain, and by how much. We give this question a formal treatment. We model each domain as a controlled Markov process on a graded latent grid, factor any instantiation into thin domain adapters and a shared domain-invariant core, and identify a cross-domain correspondence with an approximate Markov decision process homomorphism whose quality is measured by a lax bisimulation discrepancy and, for domains lacking a shared coordinate system, by a Gromov-Wasserstein distance between their action-conditioned transition operators. Under a Lipschitz predictor we derive a transfer bound that separates source-model error from structural mismatch, grows geometrically in the prediction horizon, and is certified from below by the Gromov-Wasserstein distance; we then connect latent error to decision regret through the Lipschitz value property of bisimulation metrics. The resulting Structured-State Transfer Hypothesis is stated as a falsifiable claim with a preregistered experimental program, centered on a transfer test from driving scenes to order books, including conditions under which it is refuted. We present no empirical results: this is a research proposal that converts a widely repeated intuition into testable theory.","short_abstract":"A single action-conditioned latent predictive architecture can in principle be trained on the structured state of a driving scene, a robot workspace, or a financial order book. The ingredients for doing so within any one domain already exist and are individually validated: masked-latent prediction, action-conditioned l...","url_abs":"https://arxiv.org/abs/2606.01520","url_pdf":"https://arxiv.org/pdf/2606.01520v1","authors":"[\"Shayan Shokri\"]","published":"2026-06-01T00:50:59Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[]","has_code":false}
