{"ID":2921177,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-04T04:58:08.453578371Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01702","arxiv_id":"2606.01702","title":"KDH-CAD: Knowledge-data hybrid CAD learning under data scarcity","abstract":"Deep learning in computer-aided design (CAD) remains fundamentally constrained by the data scarcity challenge: authentic CAD data is difficult to collect at scale, while synthetic data may not faithfully reflect real design practice. Rather than pursuing ever-larger CAD datasets, this paper alternatively treats CAD learning as a knowledge completion and calibration problem. It introduces KDH-CAD, a knowledge-data hybrid framework that integrates pretrained knowledge in foundation models, structured domain knowledge from textbooks/tutorials, and a very small amount of labeled CAD data. Domain knowledge is used to elicit and complete CAD-relevant concepts that are weakly expressed or under-represented in pretrained foundation models, while labeled CAD data calibrates these concepts in the latent space to account for task-specific geometric variability, without fine-tuning the foundation model. Experiments on real-world mechanical part classification show that KDH-CAD achieves strong performance in low-data regimes, reaching 92.6\\% accuracy with only 250 training samples, 95.8\\% with 1,000 samples, and continuing to improve with additional data. This matches or exceeds state-of-the-art performance that typically requires an order of magnitude more data. These results suggest that combining pretrained foundation models with structured domain knowledge can substantially reduce reliance on large-scale CAD datasets, providing a principled and practical direction for data-efficient CAD learning.","short_abstract":"Deep learning in computer-aided design (CAD) remains fundamentally constrained by the data scarcity challenge: authentic CAD data is difficult to collect at scale, while synthetic data may not faithfully reflect real design practice. Rather than pursuing ever-larger CAD datasets, this paper alternatively treats CAD lea...","url_abs":"https://arxiv.org/abs/2606.01702","url_pdf":"https://arxiv.org/pdf/2606.01702v1","authors":"[\"Ziqin Gao\",\"Zhijie Yang\",\"Qiang Zou\"]","published":"2026-06-01T05:11:54Z","proceeding":"cs.GR","tasks":"[\"cs.GR\",\"cs.LG\"]","methods":"[]","has_code":false}
