{"ID":2871996,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.09067","arxiv_id":"2509.09067","title":"Improvement of Human-Object Interaction Action Recognition Using Scene Information and Multi-Task Learning Approach","abstract":"Recent graph convolutional neural networks (GCNs) have shown high performance in the field of human action recognition by using human skeleton poses. However, it fails to detect human-object interaction cases successfully due to the lack of effective representation of the scene information and appropriate learning architectures. In this context, we propose a methodology to utilize human action recognition performance by considering fixed object information in the environment and following a multi-task learning approach. In order to evaluate the proposed method, we collected real data from public environments and prepared our data set, which includes interaction classes of hands-on fixed objects (e.g., ATM ticketing machines, check-in/out machines, etc.) and non-interaction classes of walking and standing. The multi-task learning approach, along with interaction area information, succeeds in recognizing the studied interaction and non-interaction actions with an accuracy of 99.25%, outperforming the accuracy of the base model using only human skeleton poses by 2.75%.","short_abstract":"Recent graph convolutional neural networks (GCNs) have shown high performance in the field of human action recognition by using human skeleton poses. However, it fails to detect human-object interaction cases successfully due to the lack of effective representation of the scene information and appropriate learning arch...","url_abs":"https://arxiv.org/abs/2509.09067","url_pdf":"https://arxiv.org/pdf/2509.09067v3","authors":"[\"Hesham M. Shehata\",\"Mohammad Abdolrahmani\"]","published":"2025-09-11T00:14:56Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
