{"ID":2872292,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.09630","arxiv_id":"2509.09630","title":"I Know Who Clones Your Code: Interpretable Smart Contract Similarity Detection","abstract":"Widespread reuse of open-source code in smart contract development boosts programming efficiency but significantly amplifies bug propagation across contracts, while dedicated methods for detecting similar smart contract functions remain very limited. Conventional abstract-syntax-tree (AST) based methods for smart contract similarity detection face challenges in handling intricate tree structures, which impedes detailed semantic comparison of code. Recent deep-learning based approaches tend to overlook code syntax and detection interpretability, resulting in suboptimal performance. To fill this research gap, we introduce SmartDetector, a novel approach for computing similarity between smart contract functions, explainable at the fine-grained statement level. Technically, SmartDetector decomposes the AST of a smart contract function into a series of smaller statement trees, each reflecting a structural element of the source code. Then, SmartDetector uses a classifier to compute the similarity score of two functions by comparing each pair of their statement trees. To address the infinite hyperparameter space of the classifier, we mathematically derive a cosine-wise diffusion process to efficiently search optimal hyperparameters. \nExtensive experiments conducted on three large real-world datasets demonstrate that SmartDetector outperforms current state-of-the-art methods by an average improvement of 14.01% in F1-score, achieving an overall average F1-score of 95.88%.","short_abstract":"Widespread reuse of open-source code in smart contract development boosts programming efficiency but significantly amplifies bug propagation across contracts, while dedicated methods for detecting similar smart contract functions remain very limited. Conventional abstract-syntax-tree (AST) based methods for smart contr...","url_abs":"https://arxiv.org/abs/2509.09630","url_pdf":"https://arxiv.org/pdf/2509.09630v1","authors":"[\"Zhenguang Liu\",\"Lixun Ma\",\"Zhongzheng Mu\",\"Chengkun Wei\",\"Xiaojun Xu\",\"Yingying Jiao\",\"Kui Ren\"]","published":"2025-09-11T17:15:51Z","proceeding":"cs.SE","tasks":"[\"cs.SE\",\"cs.CR\"]","methods":"[\"Diffusion Model\"]","has_code":false}
