{"ID":2923685,"CreatedAt":"2026-06-02T04:05:25.881865328Z","UpdatedAt":"2026-06-04T13:12:39.622923895Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02214","arxiv_id":"2606.02214","title":"Do Gender Cues Affect LLM Value Trade-offs? Evidence from a Controlled Decision Benchmark","abstract":"Large language models are increasingly used in value-sensitive decision settings, where irrelevant demographic cues should not alter judgments. We construct the Realistic Value Decision Benchmark (RVDB), a controlled benchmark that varies only the role-gender configuration while holding the scenario, ordered value pair, roles, candidate decisions, Value Distance, and Decision Severity fixed. Using a position-balanced evaluation across seven models, we test whether models preserve decision invariance under gender perturbations and whether their self-attributions reflect observed behavioral changes. We find that explicit gender cues induce bounded but systematic decision flips, including under an explicit gender-attribution prompt that asks models to report whether gender influenced their choice. Cross-gender role swaps reveal a consistent female-proposed-decision asymmetry, while models often attribute flipped decisions to No Influence or other non-gender factors. Further analysis shows that gender effects concentrate near less determinate value boundaries and under more severe decision contexts, suggesting that gender cues act as local boundary-shifting factors rather than global overrides of value reasoning. Value rankings remain largely stable, but ordered value-pair trade-offs shift unevenly across role-gender configurations. These results show that gender can enter LLM value trade-offs behaviorally while remaining obscured in self-attribution, motivating controlled behavioral audits beyond explanation-based evaluation.","short_abstract":"Large language models are increasingly used in value-sensitive decision settings, where irrelevant demographic cues should not alter judgments. We construct the Realistic Value Decision Benchmark (RVDB), a controlled benchmark that varies only the role-gender configuration while holding the scenario, ordered value pair...","url_abs":"https://arxiv.org/abs/2606.02214","url_pdf":"https://arxiv.org/pdf/2606.02214v1","authors":"[\"Yangyang Liu\",\"Dong Yu\",\"Pengyuan Liu\"]","published":"2026-06-01T13:14:10Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
