Abstract:
This paper proposes a keyframe extraction method based on action semantics to address the redundancy, loss of action semantics, uncontrollable frame counts, and poor real-time performance of existing keyframe extraction methods for production line monitoring videos. First, ORB local features are extracted from the video and used to train an action semantic dictionary; the local features are then mapped into the action semantic space to enrich the semantic information in the feature representation. Second, VLAD encoding is applied to generate global video features. Finally, K-means clustering is applied so that the extraction results are precise and controllable. Experimental results show that the proposed method retains complete action semantics at a compression ratio as low as 3.33% and improves the F1 score by 52.16% over the baseline method. Compared with several other methods, the proposed approach delivers better overall performance: it extracts keyframes quickly, uniformly, and stably, adapts robustly to varying compression ratios, and yields low redundancy with high restoration accuracy.
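
The three stages summarized above (local features, VLAD encoding, K-means selection) can be illustrated with a minimal sketch. It is not the paper's implementation. Several things here are assumptions made for illustration: per-frame descriptors are taken as already extracted (for example with OpenCV's ORB detector) and treated as plain float vectors, a k-means codebook stands in for the action semantic dictionary, and the names `kmeans`, `vlad`, `select_keyframes`, and `n_words` are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(p, centers) for p in points]

def vlad(descriptors, codebook):
    """Sum residuals to the nearest codeword per word, flatten, L2-normalize."""
    k, d = len(codebook), len(codebook[0])
    acc = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        j = nearest(desc, codebook)
        for i in range(d):
            acc[j][i] += desc[i] - codebook[j][i]
    flat = [v for row in acc for v in row]
    norm = sum(v * v for v in flat) ** 0.5
    return [v / norm for v in flat] if norm > 0 else flat

def select_keyframes(frame_descriptors, n_keyframes, n_words=4):
    """Return sorted indices of at most n_keyframes representative frames."""
    # Assumption: a k-means codebook stands in for the semantic dictionary.
    all_desc = [d for frame in frame_descriptors for d in frame]
    codebook, _ = kmeans(all_desc, n_words)
    feats = [vlad(f, codebook) for f in frame_descriptors]
    centers, labels = kmeans(feats, n_keyframes)
    keys = []
    for c in range(n_keyframes):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:  # pick the frame closest to each cluster center
            keys.append(min(members, key=lambda i: sq_dist(feats[i], centers[c])))
    return sorted(keys)

# Synthetic stand-in data: 30 frames, 5 descriptors each, 8 dimensions.
rng = random.Random(1)
frames = [[[rng.random() for _ in range(8)] for _ in range(5)] for _ in range(30)]
keys = select_keyframes(frames, n_keyframes=3)
print(keys)
```

In this sketch the cluster count passed to `select_keyframes` caps the number of keyframes returned, which is one way to read the "controllable" property: keeping 1 frame out of 30 would correspond to a compression ratio of about 3.33%.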