{"ID":2836789,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.19998","arxiv_id":"2511.19998","title":"REWA: A General Theory of Witness-Based Similarity","abstract":"We present a universal framework for similarity-preserving encodings that subsumes all discrete, continuous, algebraic, and learned similarity methods under a single theoretical umbrella. By formulating similarity as functional witness projection over monoids, we prove that \\[ O\\!\\left(\\frac{1}{Δ^{2}}\\log N\\right) \\] encoding complexity with ranking preservation holds for arbitrary algebraic structures. This unification reveals that Bloom filters, Locality Sensitive Hashing (LSH), Count-Min sketches, Random Fourier Features, and Transformer attention kernels are instances of the same underlying mechanism. We provide complete proofs with explicit constants under 4-wise independent hashing, handle heavy-tailed witnesses via normalization and clipping, and prove \\[ O(\\log N) \\] complexity for all major similarity methods from 1970-2024. We give explicit constructions for Boolean, Natural, Real, Tropical, and Product monoids, prove tight concentration bounds, and demonstrate compositional properties enabling multi-primitive similarity systems.","short_abstract":"We present a universal framework for similarity-preserving encodings that subsumes all discrete, continuous, algebraic, and learned similarity methods under a single theoretical umbrella. By formulating similarity as functional witness projection over monoids, we prove that \\[ O\\!\\left(\\frac{1}{Δ^{2}}\\log N\\right) \\] e...","url_abs":"https://arxiv.org/abs/2511.19998","url_pdf":"https://arxiv.org/pdf/2511.19998v2","authors":"[\"Nikit Phadke\"]","published":"2025-11-25T07:04:44Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.DS\",\"cs.IR\"]","methods":"[\"Transformer\"]","has_code":false}