{"ID":2872029,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14252","arxiv_id":"2509.14252","title":"LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures","abstract":"Large Language Model (LLM) pretraining, finetuning, and evaluation rely on input-space reconstruction and generative capabilities. Yet, it has been observed in vision that embedding-space training objectives, e.g., with Joint Embedding Predictive Architectures (JEPAs), are far superior to their input-space counterpart. That mismatch in how training is achieved between language and vision opens up a natural question: {\\em can language training methods learn a few tricks from the vision ones?} The lack of JEPA-style LLM is a testimony of the challenge in designing such objectives for language. In this work, we propose a first step in that direction where we develop LLM-JEPA, a JEPA based solution for LLMs applicable both to finetuning and pretraining. Thus far, LLM-JEPA is able to outperform the standard LLM training objectives by a significant margin across models, all while being robust to overfiting. Those findings are observed across numerous datasets (NL-RX, GSM8K, Spider, RottenTomatoes) and various models from the Llama3, OpenELM, Gemma2 and Olmo families. Code: https://github.com/rbalestr-lab/llm-jepa.","short_abstract":"Large Language Model (LLM) pretraining, finetuning, and evaluation rely on input-space reconstruction and generative capabilities. Yet, it has been observed in vision that embedding-space training objectives, e.g., with Joint Embedding Predictive Architectures (JEPAs), are far superior to their input-space counterpart....","url_abs":"https://arxiv.org/abs/2509.14252","url_pdf":"https://arxiv.org/pdf/2509.14252v2","authors":"[\"Hai Huang\",\"Yann LeCun\",\"Randall Balestriero\"]","published":"2025-09-11T03:03:57Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609919,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2872029,"paper_url":"https://arxiv.org/abs/2509.14252","paper_title":"LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures","repo_url":"https://github.com/rbalestr-lab/llm-jepa","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
