{"ID":2899683,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.00828","arxiv_id":"2507.00828","title":"ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering","abstract":"Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations. Package, web interface, and data are at https://github.com/ahoho/proxann","short_abstract":"Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models....","url_abs":"https://arxiv.org/abs/2507.00828","url_pdf":"https://arxiv.org/pdf/2507.00828v1","authors":"[\"Alexander Hoyle\",\"Lorena Calvo-Bartolomé\",\"Jordan Boyd-Graber\",\"Philip Resnik\"]","published":"2025-07-01T15:00:55Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":612508,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2899683,"paper_url":"https://arxiv.org/abs/2507.00828","paper_title":"ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering","repo_url":"https://github.com/ahoho/proxann","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
