{"ID":2824021,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.00869","arxiv_id":"2601.00869","title":"Cultural Encoding in Large Language Models: The Existence Gap in AI-Mediated Brand Discovery","abstract":"As artificial intelligence systems increasingly mediate consumer information discovery, brands face algorithmic invisibility. This study investigates Cultural Encoding in Large Language Models (LLMs) -- systematic differences in brand recommendations arising from training data composition. Analyzing 1,909 pure-English queries across 6 LLMs (GPT-4o, Claude, Gemini, Qwen3, DeepSeek, Doubao) and 30 brands, we find Chinese LLMs exhibit 30.6 percentage points higher brand mention rates than International LLMs (88.9% vs. 58.3%, p\u003c.001). This disparity persists in identical English queries, indicating training data geography -- not language -- drives the effect. We introduce the Existence Gap: brands absent from LLM training corpora lack \"existence\" in AI responses regardless of quality. Through a case study of Zhizibianjie (OmniEdge), a collaboration platform with 65.6% mention rate in Chinese LLMs but 0% in International models (p\u003c.001), we demonstrate how Linguistic Boundary Barriers create invisible market entry obstacles. Theoretically, we contribute the Data Moat Framework, conceptualizing AI-visible content as a VRIN strategic resource. We operationalize Algorithmic Omnipresence -- comprehensive brand visibility across LLM knowledge bases -- as the strategic objective for Generative Engine Optimization (GEO). Managerially, we provide an 18-month roadmap for brands to build Data Moats through semantic coverage, technical depth, and cultural localization. Our findings reveal that in AI-mediated markets, the limits of a brand's \"Data Boundaries\" define the limits of its \"Market Frontiers.\"","short_abstract":"As artificial intelligence systems increasingly mediate consumer information discovery, brands face algorithmic invisibility. This study investigates Cultural Encoding in Large Language Models (LLMs) -- systematic differences in brand recommendations arising from training data composition. Analyzing 1,909 pure-English...","url_abs":"https://arxiv.org/abs/2601.00869","url_pdf":"https://arxiv.org/pdf/2601.00869v1","authors":"[\"Huang Junyao\",\"Situ Ruimin\",\"Ye Renqin\"]","published":"2025-12-30T13:50:14Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
