{"ID":2836236,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.21060","arxiv_id":"2511.21060","title":"Zipf Distributions from Two-Stage Symbolic Processes: Stability Under Stochastic Lexical Filtering","abstract":"Zipf's law in language lacks a definitive origin, debated across fields. This study explains Zipf-like behavior using geometric mechanisms without linguistic elements. The Full Combinatorial Word Model (FCWM) forms words from a finite alphabet, generating a geometric distribution of word lengths. Interacting exponential forces yield a power-law rank-frequency curve, determined by alphabet size and blank symbol probability. Simulations support predictions, matching English, Russian, and mixed-genre data. The symbolic model suggests Zipf-type laws arise from geometric constraints, not communicative efficiency.","short_abstract":"Zipf's law in language lacks a definitive origin, debated across fields. This study explains Zipf-like behavior using geometric mechanisms without linguistic elements. The Full Combinatorial Word Model (FCWM) forms words from a finite alphabet, generating a geometric distribution of word lengths. Interacting exponentia...","url_abs":"https://arxiv.org/abs/2511.21060","url_pdf":"https://arxiv.org/pdf/2511.21060v1","authors":"[\"Vladimir Berman\"]","published":"2025-11-26T04:59:40Z","proceeding":"stat.ME","tasks":"[\"stat.ME\",\"cs.CL\",\"stat.ML\"]","methods":"[]","has_code":false}
