Journal of Software Engineering and Applications

Volume 14, Issue 5 (May 2021)

ISSN Print: 1945-3116   ISSN Online: 1945-3124

Google-based Impact Factor: 1.22  Citations  h5-index & Ranking

Action Recognition Using Multi-Scale Temporal Shift Module and Temporal Feature Difference Extraction Based on 2D CNN

HTML  XML Download Download as PDF (Size: 2212KB)  PP. 172-188  
DOI: 10.4236/jsea.2021.145011    5 Downloads   17 Views  Citations

ABSTRACT

Convolutional neural networks, which have achieved outstanding performance in image recognition, have been extensively applied to action recognition. The mainstream approaches to video understanding can be categorized into two-dimensional and three-dimensional convolutional neural networks. Although three-dimensional convolutional filters can learn the temporal correlation between different frames by extracting the features of multiple frames simultaneously, it results in an explosive number of parameters and calculation cost. Methods based on two-dimensional convolutional neural networks use fewer parameters; they often incorporate optical flow to compensate for their inability to learn temporal relationships. However, calculating the corresponding optical flow results in additional calculation cost; further, it necessitates the use of another model to learn the features of optical flow. We proposed an action recognition framework based on the two-dimensional convolutional neural network; therefore, it was necessary to resolve the lack of temporal relationships. To expand the temporal receptive field, we proposed a multi-scale temporal shift module, which was then combined with a temporal feature difference extraction module to extract the difference between the features of different frames. Finally, the model was compressed to make it more compact. We evaluated our method on two major action recognition benchmarks: the HMDB51 and UCF-101 datasets. Before compression, the proposed method achieved an accuracy of 72.83% on the HMDB51 dataset and 96.25% on the UCF-101 dataset. Following compression, the accuracy was still impressive, at 95.57% and 72.19% on each dataset. The final model was more compact than most related works.

Share and Cite:

Wu, K. and Chiu, C. (2021) Action Recognition Using Multi-Scale Temporal Shift Module and Temporal Feature Difference Extraction Based on 2D CNN. Journal of Software Engineering and Applications, 14, 172-188. doi: 10.4236/jsea.2021.145011.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.