Efficient Transformer for Video Summarization

Video Summarization, Deep Learning, Transformers

Abstract

The amount of user-generated content is increasing daily. This is especially true for video content, which has become popular through social media platforms such as TikTok. Other internet platforms are keeping pace and making video sharing easier. Automatic tools that extract the core information of a video while reducing its volume are therefore essential, and video summarization aims to address this need. In this work, we propose a transformer-based approach to supervised video summarization. Previous applications of attention architectures either used lighter versions of attention or burdened the model with RNN modules, which slow down computation. Our proposed framework takes full advantage of the transformer architecture. Extensive evaluation on two benchmark datasets shows that the introduced model outperforms existing approaches on the SumMe dataset by 3% and achieves comparable results on the TVSum dataset.
