{"ID":2864029,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25564","arxiv_id":"2509.25564","title":"FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology","abstract":"Multimodal large language models (MLLMs) have demonstrated impressive cross-domain capabilities, yet their proficiency in specialized scientific fields like marine biology remains underexplored. In this work, we systematically evaluate state-of-the-art MLLMs and reveal significant limitations in their ability to perform fine-grained recognition of fish species, with the best open-source models achieving less than 10\\% accuracy. This task is critical for monitoring marine ecosystems under anthropogenic pressure. To address this gap and investigate whether these failures stem from a lack of domain knowledge, we introduce FishNet++, a large-scale, multimodal benchmark. FishNet++ significantly extends existing resources with 35,133 textual descriptions for multimodal learning, 706,426 key-point annotations for morphological studies, and 119,399 bounding boxes for detection. By providing this comprehensive suite of annotations, our work facilitates the development and evaluation of specialized vision-language models capable of advancing aquatic science.","short_abstract":"Multimodal large language models (MLLMs) have demonstrated impressive cross-domain capabilities, yet their proficiency in specialized scientific fields like marine biology remains underexplored. In this work, we systematically evaluate state-of-the-art MLLMs and reveal significant limitations in their ability to perfor...","url_abs":"https://arxiv.org/abs/2509.25564","url_pdf":"https://arxiv.org/pdf/2509.25564v1","authors":"[\"Faizan Farooq Khan\",\"Yousef Radwan\",\"Eslam Abdelrahman\",\"Abdulwahab Felemban\",\"Aymen Mir\",\"Nico K. Michiels\",\"Andrew J. Temple\",\"Michael L. Berumen\",\"Mohamed Elhoseiny\"]","published":"2025-09-29T22:39:58Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
