{"ID":2853364,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.16394","arxiv_id":"2510.16394","title":"FSAR-Cap: A Fine-Grained Two-Stage Annotated Dataset for SAR Image Captioning","abstract":"Synthetic Aperture Radar (SAR) image captioning enables scene-level semantic understanding and plays a crucial role in applications such as military intelligence and urban planning, but its development is limited by the scarcity of high-quality datasets. To address this, we present FSAR-Cap, a large-scale SAR captioning dataset with 14,480 images and 72,400 image-text pairs. FSAR-Cap is built on the FAIR-CSAR detection dataset and constructed through a two-stage annotation strategy that combines hierarchical template-based representation, manual verification and supplementation, and prompt standardization. Compared with existing resources, FSAR-Cap provides richer fine-grained annotations, broader category coverage, and higher annotation quality. Benchmarking with multiple encoder-decoder architectures verifies its effectiveness, establishing a foundation for future research in SAR captioning and intelligent image interpretation.","short_abstract":"Synthetic Aperture Radar (SAR) image captioning enables scene-level semantic understanding and plays a crucial role in applications such as military intelligence and urban planning, but its development is limited by the scarcity of high-quality datasets. To address this, we present FSAR-Cap, a large-scale SAR captionin...","url_abs":"https://arxiv.org/abs/2510.16394","url_pdf":"https://arxiv.org/pdf/2510.16394v1","authors":"[\"Jinqi Zhang\",\"Lamei Zhang\",\"Bin Zou\"]","published":"2025-10-18T08:17:21Z","proceeding":"eess.IV","tasks":"[\"eess.IV\"]","methods":"[]","has_code":false}