MovieSMobileNet Patched

The proliferation of streaming services creates a need for robust automatic movie genre classification. While 3D Convolutional Neural Networks (3D CNNs) and Video Transformers achieve high accuracy, they are computationally prohibitive for real-time or edge applications. This paper introduces MovieSMobileNet, a novel architecture that combines a patched frame-sampling strategy with a modified MobileNetV3 backbone. By dividing each frame into spatial patches and applying a temporal attention mechanism across patch sequences, MovieSMobileNet captures both local textures and short-term motion cues without 3D convolutions. Experiments on the MMAct dataset and a subset of MovieNet show that the patched approach improves F1-score by 4.2% over standard frame aggregation and reaches 89.1% accuracy with only 5.2M parameters and 1.8 GFLOPs, making it suitable for mobile deployment.
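The abstract does not specify the patch size, embedding width, attention formulation, or the MobileNetV3 modifications, so the following is only a minimal NumPy sketch of the general idea, not the paper's implementation. It assumes 16×16 non-overlapping patches, a random linear projection as a hypothetical stand-in for the MobileNetV3 backbone, single-head scaled dot-product attention across time at each patch position, mean pooling, and a sigmoid multi-label genre head. All function and variable names are hypothetical.

```python
import numpy as np


def extract_patches(frames: np.ndarray, patch: int) -> np.ndarray:
    """Split frames of shape (T, H, W, C) into non-overlapping patches.

    Returns shape (T, N, patch*patch*C) with N = (H // patch) * (W // patch).
    """
    t, h, w, c = frames.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by patch")
    gh, gw = h // patch, w // patch
    x = frames.reshape(t, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, gh, gw, patch, patch, C)
    return x.reshape(t, gh * gw, patch * patch * c)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention(tokens, wq, wk, wv):
    """Self-attention over time, run independently for each patch position.

    tokens: (T, N, D). Returns (T, N, D_v).
    """
    seq = tokens.transpose(1, 0, 2)  # (N, T, D): one time sequence per patch
    q, k, v = seq @ wq, seq @ wk, seq @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (N, T, T)
    weights = softmax(scores, axis=-1)  # each row sums to 1 over time
    return (weights @ v).transpose(1, 0, 2)


# Toy clip: 8 frames of 64x64 RGB; all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
T, H, W, C, P, D, NUM_GENRES = 8, 64, 64, 3, 16, 32, 5
clip = rng.random((T, H, W, C))

patches = extract_patches(clip, P)  # (8, 16, 768)

# Hypothetical stand-in for the MobileNetV3 backbone: a linear patch embedding.
embed = rng.standard_normal((P * P * C, D)) * 0.02
tokens = patches @ embed  # (8, 16, 32)

wq, wk, wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
attended = temporal_attention(tokens, wq, wk, wv)  # (8, 16, 32)

clip_feat = attended.mean(axis=(0, 1))  # pool over time and patches -> (32,)
head = rng.standard_normal((D, NUM_GENRES))
genre_probs = 1.0 / (1.0 + np.exp(-(clip_feat @ head)))  # multi-label sigmoid
```

Because the temporal step attends across the T frames at each of the N patch positions, its score tensor has shape (N, T, T), and motion cues are mixed without any 3D convolution kernel. A real model would replace the linear embedding with the convolutional backbone and learn all weights.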
