A near-linear time approximation scheme for $(k,\ell)$-median clustering under discrete Fréchet distance

cs.DS arXiv:2508.07008
View PDF arXiv JSON

Abstract

A time series of complexity $m$ is a sequence of $m$ real valued measurements. The discrete Fréchet distance $d_{dF}(x,y)$ is a distance measure between two time series $x$ and $y$ of possibly different complexity. Given a set of $n$ time series represented as $m$-dimensional vectors over the reals, the $(k,\ell)$-median problem under discrete Fréchet distance aims to find a set $C$ of $k$ time series of complexity $\ell$ such that $$\sum_{x\in P} \min_{c\in C} d_{dF}(x,c)$$ is minimized. In this paper, we give the first near-linear time $(1+\varepsilon)$-approximation algorithm for this problem when $\ell$ and $\varepsilon$ are constants but $k$ can be as large as $Ω(n)$. We obtain our result by introducing a new dimension reduction technique for discrete Fréchet distance and then adapt an algorithm of Cohen-Addad et al. (J. ACM 2021) to work on the dimension-reduced input. As a byproduct we also improve the best coreset construction for $(k,\ell)$-median under discrete Fréchet distance (Cohen-Addad et al., SODA 2025) and show that its size can be independent of the number of input time series \emph{ and } their complexity.

PDF Viewer