{"ID":2880435,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.15109","arxiv_id":"2508.15109","title":"Homomorphism Calculus for User-Defined Aggregations","abstract":"Data processing frameworks like Apache Spark and Flink provide built-in support for user-defined aggregation functions (UDAFs), enabling the integration of domain-specific logic. However, for these frameworks to support efficient UDAF execution, the function needs to satisfy a homomorphism property, which ensures that partial results from independent computations can be merged correctly. Motivated by this problem, this paper introduces a novel homomorphism calculus that can both verify and refute whether a UDAF is a dataframe homomorphism. If so, our calculus also enables the construction of a corresponding merge operator which can be used for incremental computation and parallel execution. We have implemented an algorithm based on our proposed calculus and evaluate it on real-world UDAFs, demonstrating that our approach significantly outperforms two leading synthesizers.","short_abstract":"Data processing frameworks like Apache Spark and Flink provide built-in support for user-defined aggregation functions (UDAFs), enabling the integration of domain-specific logic. However, for these frameworks to support efficient UDAF execution, the function needs to satisfy a homomorphism property, which...","url_abs":"https://arxiv.org/abs/2508.15109","url_pdf":"https://arxiv.org/pdf/2508.15109v1","authors":"[\"Ziteng Wang\",\"Ruijie Fang\",\"Linus Zheng\",\"Dixin Tang\",\"Isil Dillig\"]","published":"2025-08-20T22:56:38Z","proceeding":"cs.PL","tasks":"[\"cs.PL\"]","methods":"[]","has_code":false}
