{"ID":2840744,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.13655","arxiv_id":"2511.13655","title":"OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation","abstract":"Earth observation data presents a unique challenge: it is spatial like images, sequential like video or text, and highly multimodal. We present OlmoEarth: a multimodal, spatio-temporal foundation model that employs a novel self-supervised learning formulation, masking strategy, and loss all designed for the Earth observation domain. OlmoEarth achieves state-of-the-art performance compared to 12 other foundation models across a variety of research benchmarks and real-world tasks from external partners. When evaluating embeddings OlmoEarth achieves the best performance on 15 out of 24 tasks, and with full fine-tuning it is the best on 19 of 29 tasks. We deploy OlmoEarth as the backbone of an end-to-end platform for data collection, labeling, training, and inference of Earth observation models. The OlmoEarth Platform puts frontier foundation models and powerful data management tools into the hands of non-profits and NGOs working to solve the world's biggest problems. OlmoEarth source code, training data, and pre-trained weights are available at $\\href{https://github.com/allenai/olmoearth_pretrain}{\\text{https://github.com/allenai/olmoearth_pretrain}}$.","short_abstract":"Earth observation data presents a unique challenge: it is spatial like images, sequential like video or text, and highly multimodal. We present OlmoEarth: a multimodal, spatio-temporal foundation model that employs a novel self-supervised learning formulation, masking strategy, and loss all designed for the Earth obser...","url_abs":"https://arxiv.org/abs/2511.13655","url_pdf":"https://arxiv.org/pdf/2511.13655v1","authors":"[\"Henry Herzog\",\"Favyen Bastani\",\"Yawen Zhang\",\"Gabriel Tseng\",\"Joseph Redmon\",\"Hadrien Sablon\",\"Ryan Park\",\"Jacob Morrison\",\"Alexandra Buraczynski\",\"Karen Farley\",\"Joshua Hansen\",\"Andrew Howe\",\"Patrick Alan Johnson\",\"Mark Otterlee\",\"Ted Schmitt\",\"Hunter Pitelka\",\"Stephen Daspit\",\"Rachel Ratner\",\"Christopher Wilhelm\",\"Sebastian Wood\",\"Mike Jacobi\",\"Hannah Kerner\",\"Evan Shelhamer\",\"Ali Farhadi\",\"Ranjay Krishna\",\"Patrick Beukema\"]","published":"2025-11-17T18:06:26Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.LG\"]","methods":"[]","has_code":false,"code_links":[{"ID":606995,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2840744,"paper_url":"https://arxiv.org/abs/2511.13655","paper_title":"OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation","repo_url":"https://github.com/allenai/olmoearth_pretrain","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}