Optimized Inflated 3D Convolutional Neural Networks for Robust Human Action Recognition in Surveillance Videos
Published: 13 August 2019 | Article Type: Research Article
Abstract
Human action recognition in video surveillance remains a challenging task in computer vision, particularly when dealing with long-duration activities, viewpoint variations, and crowded scenes. This paper presents an enhanced Optimized Inflated 3D Convolutional Neural Network (Opt-3D-Inflated-CNN) architecture designed for accurate and efficient temporal-spatial feature extraction from surveillance video sequences. The proposed approach combines 2D-to-3D filter inflation with a parallel-branch architecture and temporal fusion mechanisms to capture both local motion patterns and global spatio-temporal dynamics. Comprehensive evaluation on two benchmark datasets, UCF101 (101 action categories) and HAR (6 action classes), demonstrates state-of-the-art performance, with 97.8% accuracy on UCF101 and 94.75% accuracy on the HAR dataset, representing improvements of 8.2% and 10.89%, respectively, over baseline 3D-CNN models. The system achieves real-time processing, with computational efficiency suitable for edge deployment in surveillance systems.
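As an illustration of the 2D-to-3D filter inflation mentioned in the abstract, the sketch below shows one commonly used form of inflation: a 2D kernel is repeated along a new temporal axis and rescaled by 1/t, so that a clip of identical frames produces the same response as the original 2D filter on a single frame. This is a minimal sketch under stated assumptions, not the paper's implementation; the function name inflate_2d_kernel, the (out, in, kh, kw) weight layout, and the temporal depth t = 5 are hypothetical choices made for the example.

```python
import numpy as np


def inflate_2d_kernel(w2d: np.ndarray, t: int) -> np.ndarray:
    """Inflate 2D conv weights (out, in, kh, kw) to 3D weights (out, in, t, kh, kw).

    Hypothetical helper: repeats the 2D kernel t times along a new temporal
    axis and divides by t, so the temporal sum of the 3D kernel equals the
    original 2D kernel.
    """
    if w2d.ndim != 4:
        raise ValueError("expected weights shaped (out, in, kh, kw)")
    if t < 1:
        raise ValueError("temporal depth t must be at least 1")
    return np.repeat(w2d[:, :, np.newaxis, :, :], t, axis=2) / t


# Assumed toy sizes: 4 output channels, 3 input channels, 3x3 spatial kernel.
rng = np.random.default_rng(0)
w2d = rng.standard_normal((4, 3, 3, 3))
w3d = inflate_2d_kernel(w2d, t=5)

# A static "clip": the same image patch repeated over 5 frames.
patch = rng.standard_normal((3, 3, 3))              # (in, kh, kw)
clip = np.repeat(patch[:, np.newaxis], 5, axis=1)   # (in, t, kh, kw)

# Filter responses at one spatial location, one value per output channel.
resp2d = np.einsum("oihw,ihw->o", w2d, patch)
resp3d = np.einsum("oithw,ithw->o", w3d, clip)

print(np.allclose(resp2d, resp3d))
```

The 1/t rescaling is what allows 2D weights pretrained on images to initialize a 3D network without changing its activations on static input; whether the Opt-3D-Inflated-CNN uses exactly this scheme is not stated in the abstract.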
Keywords: 3D Convolutional Neural Networks, Action Recognition, Temporal-Spatial Feature Learning, Video Surveillance, Deep Learning, Inflated Convolutions, Motion Feature Extraction, Multi-branch Architecture.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright © Author(s) retain the copyright of this article.
How to Cite
Citation:
Naga Charan Nandigama. (2019-08-13). "Optimized Inflated 3D Convolutional Neural Networks for Robust Human Action Recognition in Surveillance Videos." Volume 3, Issue 2, 48-57.