{"ID":2833266,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.03381","arxiv_id":"2512.03381","title":"Characterizing Language Use in a Collaborative Situated Game","abstract":"Cooperative video games, where multiple participants must coordinate by communicating and reasoning under uncertainty in complex environments, yield a rich source of language data. We collect the Portal Dialogue Corpus: a corpus of 11.5 hours of spoken human dialogue in the co-op mode of the popular Portal 2 virtual puzzle game, comprising 24.5K total utterances. We analyze player language and behavior, identifying a number of linguistic phenomena that rarely appear in most existing chitchat or task-oriented dialogue corpora, including complex spatial reference, clarification and repair, and ad-hoc convention formation. To support future analyses of language use in complex, situated, collaborative problem-solving scenarios, we publicly release the corpus, which comprises player videos, audio, transcripts, game state data, and both manual and automatic annotations of language data.","short_abstract":"Cooperative video games, where multiple participants must coordinate by communicating and reasoning under uncertainty in complex environments, yield a rich source of language data. We collect the Portal Dialogue Corpus: a corpus of 11.5 hours of spoken human dialogue in the co-op mode of the popular Portal 2 virtual pu...","url_abs":"https://arxiv.org/abs/2512.03381","url_pdf":"https://arxiv.org/pdf/2512.03381v2","authors":"[\"Nicholas Tomlin\",\"Naitian Zhou\",\"Eve Fleisig\",\"Liangyuan Chen\",\"Téa Wright\",\"Lauren Vinh\",\"Laura X. Ma\",\"Seun Eisape\",\"Ellie French\",\"Tingting Du\",\"Tianjiao Zhang\",\"Alexander Koller\",\"Alane Suhr\"]","published":"2025-12-03T02:29:53Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
