{"ID":2871062,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12179","arxiv_id":"2509.12179","title":"Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation","abstract":"Current AI alignment through RLHF follows a single directional paradigm that AI conforms to human preferences while treating human cognition as fixed. We propose a shift to co-alignment through Bidirectional Cognitive Alignment (BiCA), where humans and AI mutually adapt. BiCA uses learnable protocols, representation mapping, and KL-budget constraints for controlled co-evolution. In collaborative navigation, BiCA achieved 85.5% success versus 70.3% baseline, with 230% better mutual adaptation and 332% better protocol convergence. Emergent protocols outperformed handcrafted ones by 84%, while bidirectional adaptation unexpectedly improved safety (+23% out-of-distribution robustness). The 46% synergy improvement demonstrates optimal collaboration exists at the intersection, not union, of human and AI capabilities, validating the shift from single-directional to co-alignment paradigms.","short_abstract":"Current AI alignment through RLHF follows a single directional paradigm that AI conforms to human preferences while treating human cognition as fixed. We propose a shift to co-alignment through Bidirectional Cognitive Alignment (BiCA), where humans and AI mutually adapt. BiCA uses learnable protocols, representation ma...","url_abs":"https://arxiv.org/abs/2509.12179","url_pdf":"https://arxiv.org/pdf/2509.12179v5","authors":"[\"Yubo Li\",\"Weiyi Song\"]","published":"2025-09-15T17:41:16Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.MA\"]","methods":"[\"RLHF\"]","has_code":false}