Deep Learning Based Dynamic Patterns Classification of Videos Using Spatio-Temporal Virtual Communication
S. P. Kasthuri Arachchi, Yi-Cheng Chen, Timothy K. Shih
National Central University, Taiwan (R. O. C.)
Comparison of Control Experiments
In the control experiments, to identify the best single-stream baseline model for implementing our proposed two-stream architecture, we considered pre-trained models of several state-of-the-art CNN architectures, such as 3DCNN, AlexNet, and ResNet. However, since we work on dynamic pattern classification rather than static image patterns, the AlexNet and ResNet architectures are built on top of LSTM networks as Single-AlexNet+LSTM, Single-ResNet34+LSTM, and Single-ResNet50+LSTM, in addition to the Single-3DCNN+LSTM model. As proposed baseline models, we implemented a Single-LSTM architecture, which consists of a sequence of LSTMs, and the proposed Single-CNNLSTM, whose layers are inspired by AlexNet.
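The following is a minimal sketch of how such a single-stream CNN+LSTM baseline (e.g., Single-ResNet50+LSTM) could be assembled. The paper does not specify a framework, so tf.keras is assumed here, and the clip length, frame size, number of classes, and LSTM width are illustrative placeholders rather than values from the study.

```python
# Minimal sketch of a single-stream CNN+LSTM baseline (assumed tf.keras setup).
# A pre-trained CNN extracts per-frame spatial features; an LSTM models the
# temporal dynamics across the clip.
import tensorflow as tf
from tensorflow.keras import layers, models

FRAMES, HEIGHT, WIDTH, CHANNELS = 16, 224, 224, 3  # assumed clip shape
NUM_CLASSES = 10                                    # assumed number of classes

# Pre-trained ResNet50 used as a per-frame spatial feature extractor.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(HEIGHT, WIDTH, CHANNELS))
backbone.trainable = False  # keep the pre-trained weights frozen for the baseline

clip_input = layers.Input(shape=(FRAMES, HEIGHT, WIDTH, CHANNELS))
# Apply the CNN to every frame, producing one feature vector per time step.
frame_features = layers.TimeDistributed(backbone)(clip_input)
temporal = layers.LSTM(256)(frame_features)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(temporal)

single_stream = models.Model(clip_input, outputs, name="single_resnet50_lstm")
single_stream.compile(optimizer="sgd", loss="categorical_crossentropy",
                      metrics=["accuracy"])
```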
Performance Evaluation of Single and Two Stream Networks
In addition to the single-stream baseline models discussed above, we implemented three two-stream models to select a suitable two-stream architecture for our proposed VC-LSTM models. The Dual-LSTM, Dual-CNNLSTM, and Dual-3DCNN+LSTM models are designed by pipelining the Single-LSTM, Single-CNNLSTM, and Single-3DCNN+LSTM models, respectively, in parallel. Because the Single-3DCNN+LSTM model performed significantly better than the other single-stream models based on state-of-the-art architectures, it is the only such model for which we implemented a two-stream counterpart.
Table: Performance (%) comparison of single-stream and two-stream network architectures
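To illustrate the parallel pipelining described above, the sketch below builds a dual-stream model from two identical single-stream branches (here a small 3DCNN+LSTM branch, in the spirit of Dual-3DCNN+LSTM). The branch layout, layer widths, and what each stream carries are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of a dual-stream model obtained by running two single-stream
# branches in parallel and merging their clip-level features (assumed tf.keras).
import tensorflow as tf
from tensorflow.keras import layers, models

FRAMES, HEIGHT, WIDTH, CHANNELS = 16, 112, 112, 3  # assumed clip shape
NUM_CLASSES = 10                                    # assumed number of classes

def build_branch(name):
    """One 3DCNN+LSTM branch: 3D convolutions over the clip, then an LSTM."""
    inp = layers.Input(shape=(FRAMES, HEIGHT, WIDTH, CHANNELS), name=f"{name}_input")
    x = layers.Conv3D(32, (3, 3, 3), padding="same", activation="relu")(inp)
    x = layers.MaxPooling3D((1, 2, 2))(x)
    x = layers.Conv3D(64, (3, 3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling3D((2, 2, 2))(x)
    # Collapse the spatial dimensions so each time step becomes a feature vector.
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)
    x = layers.LSTM(128)(x)
    return inp, x

stream_a_in, stream_a_feat = build_branch("stream_a")
stream_b_in, stream_b_feat = build_branch("stream_b")

# Merge the two branches before the final classifier (fusion variants below).
merged = layers.Concatenate()([stream_a_feat, stream_b_feat])
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(merged)
dual_stream = models.Model([stream_a_in, stream_b_in], outputs,
                           name="dual_3dcnn_lstm")
```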
Model Skills with Multiple Fusion Benchmarks
We evaluate and compare the concatenation, mean, and multiplication fusion methods. Both models, Dual-VCLSTM and Dual-CNNVCLSTM, were tested on the 1000-video dataset. Both late fusion methods, which combine the class probabilities of the two streams for the final class prediction, achieved better results with the tested models than concatenation fusion.
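The sketch below shows how the three compared fusion variants could be expressed, assuming each stream exposes either a clip-level feature vector (for concatenation fusion) or a per-class probability vector (for the two late-fusion variants); the tensor shapes and helper names are illustrative.

```python
# Minimal sketch of concatenation fusion versus mean/multiplication late fusion
# (assumed tf.keras); shapes and names are placeholders.
from tensorflow.keras import layers

NUM_CLASSES = 10  # assumed number of classes

def concatenation_fusion(feat_a, feat_b):
    # Feature-level fusion: join the stream features, then classify once.
    merged = layers.Concatenate()([feat_a, feat_b])
    return layers.Dense(NUM_CLASSES, activation="softmax")(merged)

def mean_late_fusion(probs_a, probs_b):
    # Late fusion: average the per-class probabilities of the two streams.
    return layers.Average()([probs_a, probs_b])

def multiplication_late_fusion(probs_a, probs_b):
    # Late fusion: multiply the per-class probabilities element-wise;
    # the argmax (predicted class) is unaffected by the missing renormalization.
    return layers.Multiply()([probs_a, probs_b])

# Usage with symbolic tensors from the two streams:
feat_a = layers.Input(shape=(128,))
feat_b = layers.Input(shape=(128,))
fused_prediction = concatenation_fusion(feat_a, feat_b)
```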
Effect of Learning Rate Schedules
The most popular learning rate schedules include step decay, time-based decay, and exponential decay. To clarify their effect, we apply the SGD optimization algorithm with these learning rate schedules as well as with a constant learning rate. As an alternative to plain SGD, we also trained our proposed model using three adaptive gradient descent algorithms: Adagrad, RMSprop, and Adam. Finally, we compare the model accuracies of all the learning rate schedules and adaptive learning rate methods tested in this study.
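For concreteness, the sketch below shows one way the compared settings could be set up, assuming Keras-style training; the initial learning rate and decay constants are illustrative, not the values tuned in this study.

```python
# Minimal sketch of the compared learning rate settings (assumed tf.keras):
# constant, step decay, time-based decay, and exponential decay applied to SGD
# via a scheduler callback, plus the adaptive optimizers Adagrad/RMSprop/Adam.
import math
import tensorflow as tf

INITIAL_LR = 0.01                                   # assumed initial rate
DROP, EPOCHS_DROP, DECAY, K = 0.5, 10, 0.001, 0.1   # assumed decay constants

def step_decay(epoch):
    # Halve the learning rate every EPOCHS_DROP epochs.
    return INITIAL_LR * (DROP ** math.floor(epoch / EPOCHS_DROP))

def time_based_decay(epoch):
    # Shrink the rate in proportion to the epoch index.
    return INITIAL_LR / (1.0 + DECAY * epoch)

def exponential_decay(epoch):
    # Decay the rate exponentially with the epoch index.
    return INITIAL_LR * math.exp(-K * epoch)

# Schedules are applied through a callback on top of plain SGD;
# swapping the function swaps the schedule.
scheduler = tf.keras.callbacks.LearningRateScheduler(step_decay)
sgd = tf.keras.optimizers.SGD(learning_rate=INITIAL_LR, momentum=0.9)

# Adaptive alternatives compared against the scheduled SGD runs.
adagrad = tf.keras.optimizers.Adagrad(learning_rate=INITIAL_LR)
rmsprop = tf.keras.optimizers.RMSprop(learning_rate=INITIAL_LR)
adam = tf.keras.optimizers.Adam(learning_rate=0.001)

# model.compile(optimizer=sgd, loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_data, epochs=50, callbacks=[scheduler])
```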