{"ID":2873444,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.06660","arxiv_id":"2509.06660","title":"Investigating Location-Regularised Self-Supervised Feature Learning for Seafloor Visual Imagery","abstract":"High-throughput interpretation of robotically gathered seafloor visual imagery can increase the efficiency of marine monitoring and exploration. Although recent research has suggested that location metadata can enhance self-supervised feature learning (SSL), its benefits across different SSL strategies, models and seafloor image datasets are underexplored. This study evaluates the impact of location-based regularisation on six state-of-the-art SSL frameworks, which include Convolutional Neural Network (CNN) and Vision Transformer (ViT) models with varying latent-space dimensionality. Evaluation across three diverse seafloor image datasets finds that location-regularisation consistently improves downstream classification performance over standard SSL, with average F1-score gains of $4.9 \\pm 4.0\\%$ for CNNs and $6.3 \\pm 8.9\\%$ for ViTs, respectively. While CNNs pretrained on generic datasets benefit from high-dimensional latent representations, dataset-optimised SSL achieves similar performance across the high (512) and low (128) dimensional latent representations. Location-regularised SSL improves CNN performance over pre-trained models by $2.7 \\pm 2.7\\%$ and $10.1 \\pm 9.4\\%$ for high and low-dimensional latent representations, respectively. For ViTs, high-dimensionality benefits both pre-trained and dataset-optimised SSL. Although location-regularisation improves SSL performance compared to standard SSL methods, pre-trained ViTs show strong generalisation, matching the best-performing location-regularised SSL with F1-scores of $0.795 \\pm 0.075$ and $0.795 \\pm 0.077$, respectively. The findings highlight the value of location metadata for SSL regularisation, particularly when using low-dimensional latent representations, and demonstrate strong generalisation of high-dimensional ViTs for seafloor image analysis.","short_abstract":"High-throughput interpretation of robotically gathered seafloor visual imagery can increase the efficiency of marine monitoring and exploration. Although recent research has suggested that location metadata can enhance self-supervised feature learning (SSL), its benefits across different SSL strategies, models and seaf...","url_abs":"https://arxiv.org/abs/2509.06660","url_pdf":"https://arxiv.org/pdf/2509.06660v1","authors":"[\"Cailei Liang\",\"Adrian Bodenmann\",\"Emma J Curtis\",\"Samuel Simmons\",\"Kazunori Nagano\",\"Stan Brown\",\"Adam Riese\",\"Blair Thornton\"]","published":"2025-09-08T13:19:04Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.RO\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"LoRA\",\"Convolutional Neural Network\"]","has_code":false}