{"ID":3004963,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-04T19:14:31.964469513Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03138","arxiv_id":"2606.03138","title":"Section-Weighted Hybrid Approach for Legal Case Retrieval","abstract":"Finding truly analogous precedents requires capturing legal reasoning beyond surface word overlap. We present a two-stage, section-aware framework for legal case retrieval that first segments raw judgments into facts, issues, decision, and reasoning using a deterministic large language model (LLM) offline. In Stage 1, we combine parallel lexical (BM25) and semantic (dense ANN) whole-document searches via Reciprocal Rank Fusion (RRF) to form a high-recall candidate pool. In Stage 2, we perform fine-grained, like-for-like comparisons (e.g., query reasoning vs. candidate reasoning). To address the scale mismatch between unbounded lexical scores and cosine similarities, we apply query-wise Z-score normalization before aggregating signals with learned section weights. For the top results, the system returns the relevant section text with a concise, grounded rationale and party-stance labels. We evaluate on a jurisdiction-scale benchmark, demonstrating consistent gains over strong lexical and neural baselines while maintaining high candidate coverage.","short_abstract":"Finding truly analogous precedents requires capturing legal reasoning beyond surface word overlap. We present a two-stage, section-aware framework for legal case retrieval that first segments raw judgments into facts, issues, decision, and reasoning using a deterministic large language model (LLM) offline. In Stage 1,...","url_abs":"https://arxiv.org/abs/2606.03138","url_pdf":"https://arxiv.org/pdf/2606.03138v1","authors":"[\"Rajith Arulanandam\",\"Nisansa de Silva\"]","published":"2026-06-02T04:27:53Z","proceeding":"cs.IR","tasks":"[\"cs.IR\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}