Source
IEEE Access
Publication date
February 27, 2025
Authors
Maksim Aleshin, Svetlana Illarionova, Ilya Novikov, Yulia Vybornova, Dmitry Shadrin, Artem Nikonorov, Evgeny Burnaev

Self-supervised learning for temporal action segmentation in industrial and manufacturing videos

Abstract

Reliable process-control methods are required in maintenance operations to prevent emergencies, maintain high quality of work, and minimize potential risks to workers. Strict adherence to technological processes is also an essential requirement for compliance with production standards. The development of computer vision technologies opens new prospects in video analytics for manufacturing and technological processes. However, effective deployment of AI-based solutions in monitoring systems requires overcoming a number of challenges, among them the difficulty of collecting the annotated datasets needed to train neural network models for specific tasks. In this work, we propose a methodology for creating a foundation industrial model (FIM) based on deep learning that allows easy adaptation of algorithms to specific tasks with minimal annotated-data requirements. The solution for temporal action segmentation builds on self-supervised learning and vision transformers. To validate the proposed approach, a dataset covering several technological processes was prepared, and part of the data was annotated with up to 14 event categories for model training. The tested architectures include X3D-M and ViT-L, with pre-training performed using the V-JEPA approach. On the test subset, the proposed approach achieves a Mean over Frames (MoF) of 0.817 and 0.653 for oil change and tire replacement process recognition, respectively. The experiments confirm the feasibility of the proposed approach.
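For context, Mean over Frames (MoF) is the standard frame-wise accuracy metric in temporal action segmentation: the fraction of video frames whose predicted action label matches the ground truth. Below is a minimal sketch of this computation in Python; the function name and the toy labels are illustrative, and the abstract does not specify the paper's exact evaluation protocol (e.g., whether frames are pooled across all test videos or accuracy is averaged per video).

```python
import numpy as np

def mean_over_frames(pred: np.ndarray, gt: np.ndarray) -> float:
    """Frame-wise accuracy (Mean over Frames, MoF): the fraction of
    frames whose predicted action label equals the ground-truth label."""
    assert pred.shape == gt.shape, "prediction and ground truth must align frame by frame"
    return float((pred == gt).mean())

# Toy usage: a 10-frame clip with labels drawn from up to 14 event categories.
gt   = np.array([0, 0, 0, 3, 3, 3, 3, 7, 7, 7])
pred = np.array([0, 0, 3, 3, 3, 3, 3, 7, 7, 2])
print(mean_over_frames(pred, gt))  # 0.8 (8 of 10 frames correct)
```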
