{"ID":2846723,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.01953","arxiv_id":"2511.01953","title":"Reliability Assessment Framework Based on Feature Separability for Pathological Cell Image Classification under Prior Bias","abstract":"Background and objective: Prior probability shift between training and deployment datasets challenges deep learning-based medical image classification. Standard correction methods reweight posterior probabilities to adjust prior bias, yet their benefit is inconsistent. We developed a reliability framework identifying when prior correction helps or harms performance in pathological cell image analysis. Methods: We analyzed 303 colorectal cancer specimens with CD103/CD8 immunostaining, yielding 185,432 annotated cell images across 16 cell types. ResNet models were trained under varying bias ratios (1.1-20$\\times$). Feature separability was quantified using cosine similarity-based likelihood quality scores, reflecting intra- versus inter-class distinctions in learned feature spaces. Multiple linear regression, ANOVA, and generalized additive models (GAMs) evaluated associations among feature separability, prior bias, sample adequacy, and F1 performance. Results: Feature separability dominated performance ($β= 1.650$, $p \u003c 0.001$), showing 412-fold stronger impact than prior bias ($β= 0.004$, $p = 0.018$). GAM analysis showed strong predictive power ($R^2 = 0.876$) with mostly linear trends. A quality threshold of 0.294 effectively identified cases requiring correction (AUC = 0.610). Cell types scoring $\u003e0.5$ were robust without correction, whereas those $\u003c0.3$ consistently required adjustment. Conclusion: Feature extraction quality, not bias magnitude, governs correction benefit. The proposed framework provides quantitative guidance for selective correction, enabling efficient deployment and reliable diagnostic AI.","short_abstract":"Background and objective: Prior probability shift between training and deployment datasets challenges deep learning-based medical image classification. Standard correction methods reweight posterior probabilities to adjust prior bias, yet their benefit is inconsistent. We developed a reliability framework identifying w...","url_abs":"https://arxiv.org/abs/2511.01953","url_pdf":"https://arxiv.org/pdf/2511.01953v2","authors":"[\"Takaaki Tachibana\",\"Toru Nagasaka\",\"Yukari Adachi\",\"Hiroki Kagiyama\",\"Ryota Ito\",\"Mitsugu Fujita\",\"Kimihiro Yamashita\",\"Yoshihiro Kakeji\"]","published":"2025-11-03T14:33:11Z","proceeding":"q-bio.QM","tasks":"[\"q-bio.QM\",\"eess.IV\"]","methods":"[]","has_code":false}