FSAR-Cap: A Fine-Grained Two-Stage Annotated Dataset for SAR Image Captioning

eess.IV arXiv:2510.16394

Abstract

Synthetic Aperture Radar (SAR) image captioning enables scene-level semantic understanding and plays a crucial role in applications such as military intelligence and urban planning, but its development is limited by the scarcity of high-quality datasets. To address this, we present FSAR-Cap, a large-scale SAR captioning dataset with 14,480 images and 72,400 image-text pairs. FSAR-Cap is built on the FAIR-CSAR detection dataset and is constructed through a two-stage annotation strategy that combines hierarchical template-based representation, manual verification and supplementation, and prompt standardization. Compared with existing resources, FSAR-Cap provides richer fine-grained annotations, broader category coverage, and higher annotation quality. Benchmarking with multiple encoder-decoder architectures verifies its effectiveness, establishing a foundation for future research in SAR captioning and intelligent image interpretation.
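To make the idea of a hierarchical template-based representation concrete, here is a minimal sketch of how detection labels could be turned into a draft caption at two levels of granularity (coarse category, then fine-grained type). It is not the authors' pipeline: the `HIERARCHY` mapping, its label names, the `template_caption` function, and the caption wording are all hypothetical and do not reflect the actual FAIR-CSAR label set or the FSAR-Cap templates.

```python
from collections import Counter

# Hypothetical two-level hierarchy: fine-grained label -> coarse category.
# Illustrative names only; NOT the FAIR-CSAR label set.
HIERARCHY = {
    "cargo ship": "ship",
    "tanker": "ship",
    "passenger aircraft": "aircraft",
}


def template_caption(labels: list[str]) -> str:
    """Build a hypothetical coarse-then-fine draft caption from object labels."""
    if not labels:
        return "A SAR image with no annotated targets."
    # Count fine-grained labels as given.
    fine = Counter(labels)
    # Map each label to its coarse category; unknown labels map to themselves.
    coarse = Counter(HIERARCHY.get(label, label) for label in labels)
    # Coarse level: number of targets per top-level category, sorted by name.
    coarse_part = ", ".join(f"{c} ({n})" for c, n in sorted(coarse.items()))
    # Fine level: number of targets per fine-grained type, sorted by name.
    fine_part = ", ".join(f"{f} ({n})" for f, n in sorted(fine.items()))
    return f"A SAR image containing {coarse_part}; fine-grained types: {fine_part}."


print(template_caption(["tanker", "cargo ship", "passenger aircraft"]))
```

Under the two-stage strategy described in the abstract, a template draft like this would only be a starting point; the fine-grained detail and fluency come from the subsequent manual verification, supplementation, and prompt standardization steps.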
