{"ID":2888303,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.23398","arxiv_id":"2507.23398","title":"Smart Video Capsule Endoscopy: Raw Image-Based Localization for Enhanced GI Tract Investigation","abstract":"For many real-world applications involving low-power sensor edge devices deep neural networks used for image classification might not be suitable. This is due to their typically large model size and requirement of operations often exceeding the capabilities of such resource limited devices. Furthermore, camera sensors usually capture images with a Bayer color filter applied, which are subsequently converted to RGB images that are commonly used for neural network training. However, on resource-constrained devices, such conversions demand their share of energy and optimally should be skipped if possible. This work addresses the need for hardware-suitable AI targeting sensor edge devices by means of the Video Capsule Endoscopy, an important medical procedure for the investigation of the small intestine, which is strongly limited by its battery lifetime. Accurate organ classification is performed with a final accuracy of 93.06% evaluated directly on Bayer images involving a CNN with only 63,000 parameters and time-series analysis in the form of Viterbi decoding. Finally, the process of capturing images with a camera and raw image processing is demonstrated with a customized PULPissimo System-on-Chip with a RISC-V core and an ultra-low power hardware accelerator providing an energy-efficient AI-based image classification approach requiring just 5.31 μJ per image. 
As a result, it is possible to save an average of 89.9% of energy before entering the small intestine compared to classic video capsules.","short_abstract":"For many real-world applications involving low-power sensor edge devices deep neural networks used for image classification might not be suitable. This is due to their typically large model size and requirement of operations often exceeding the capabilities of such resource limited devices. Furthermore, camera sens...","url_abs":"https://arxiv.org/abs/2507.23398","url_pdf":"https://arxiv.org/pdf/2507.23398v1","authors":"[\"Oliver Bause\",\"Julia Werner\",\"Paul Palomero Bernardo\",\"Oliver Bringmann\"]","published":"2025-07-31T10:13:39Z","proceeding":"eess.IV","tasks":"[\"eess.IV\",\"cs.AR\",\"cs.CV\"]","methods":"[\"Generative Adversarial Network\",\"Convolutional Neural Network\"]","has_code":false}