Deep Learning Based Dynamic Patterns Classification of Videos Using Spatio-Temporal Virtual Communication
S. P. Kasthuri Arachchi, Yi-Cheng Chen, Timothy K. Shih
National Central University, Taiwan (R. O. C.)
Introduction
In this study, we propose a two-stream pipeline model that classifies the dynamic patterns of videos using both static frames and motion optical flows. Traditional hand-crafted features are known to be insufficient for classifying complex video information. Inspired by the remarkable success of deep learning methods, we propose a two-stream neural network architecture that classifies dynamic video patterns using appearance and motion features. The goal of this study is to investigate the performance of single-stream and two-stream networks with the proposed virtual communication architecture. We use our firework dataset to validate the proposed models. Prior successful studies of video classification focused only on the relationships within the standalone streams themselves. In contrast, the proposed virtual communication long short-term memory (VC-LSTM) architecture interconnects the two streams at each time step so that newly learned information is generated and fed to the following time steps. The VC-LSTM extends the general-purpose LSTM by virtually communicating with the previous cell states of both the appearance and optical-flow streams. The experimental results demonstrate that the proposed two-stream Dual-CNNVCLSTM architecture significantly outperforms the single-stream and two-stream baseline architectures, reaching a training accuracy of 81.76%.
Models
Virtual Communication Long Short-Term Memory (VC-LSTM)
We focus on updating and resetting cell-state information rather than following the standard cell-state update process. The first step extracts the previous time step's cell states from both streams' LSTM units. The extracted cell states are then averaged, and this newly learned information is provided as an additional input when updating the cell states at the next time step.
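To make the communication step concrete, the sketch below shows one possible reading of the description above. The paper publishes no reference code, so the class name VCStep, the use of per-stream nn.LSTMCell units, and the plain averaging of the two previous cell states are all assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of one virtual-communication time step between the
# appearance (RGB) and motion (optical-flow) streams. Illustrative only.
class VCStep(nn.Module):
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.rgb_cell = nn.LSTMCell(feat_dim, hidden_dim)
        self.flow_cell = nn.LSTMCell(feat_dim, hidden_dim)

    def forward(self, x_rgb, x_flow, state_rgb, state_flow):
        h_r, c_r = state_rgb
        h_f, c_f = state_flow
        # Virtual communication: average the previous cell states
        # extracted from both streams' LSTM units ...
        c_shared = (c_r + c_f) / 2.0
        # ... and feed this newly learned information into the standard
        # LSTM update of the next time step in each stream.
        h_r, c_r = self.rgb_cell(x_rgb, (h_r, c_shared))
        h_f, c_f = self.flow_cell(x_flow, (h_f, c_shared))
        return (h_r, c_r), (h_f, c_f)
```

Unrolling this step over a clip's frames gives the full VC-LSTM sequence, with the shared cell state recomputed at every step.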
Figure: How the defined virtual package interacts with the LSTM sequence.
Figure: The proposed two-stream, RGB and optical-flow based Virtual Communication Long Short-Term Memory (VC-LSTM).
Baseline Models
In this study, we examine the performance of single-stream and two-stream networks with the proposed virtual communication LSTM architecture. As single-stream baselines, we designed three models: a sequence of LSTMs (Single-LSTM), a combination of CNN and LSTM in which the LSTM is placed after the fully connected layer of the CNN (Single-CNNLSTM), and a 3D-CNN followed by an LSTM (Single-3DCNNLSTM). The two-stream networks Dual-LSTM, Dual-CNNLSTM and Dual-3DCNNLSTM are designed by pipelining these single-stream networks in parallel to classify firework patterns.
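As an illustration of the two-stream construction, a Dual-CNNLSTM baseline might be assembled as in the following sketch. The class names (StreamCNNLSTM, DualCNNLSTM), layer sizes, and the concatenation fusion are our own assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

# One stream: a per-frame CNN feeding an LSTM over the frame sequence.
class StreamCNNLSTM(nn.Module):
    def __init__(self, in_channels, feat_dim=256, hidden_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, clips):                  # clips: (B, T, C, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)           # final hidden state per clip
        return h[-1]                           # (B, hidden_dim)

# Two streams pipelined in parallel, fused before the classifier.
class DualCNNLSTM(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        self.rgb = StreamCNNLSTM(in_channels=3)   # appearance stream
        self.flow = StreamCNNLSTM(in_channels=2)  # optical-flow stream
        self.fc = nn.Linear(2 * 128, num_classes)

    def forward(self, rgb_clips, flow_clips):
        fused = torch.cat([self.rgb(rgb_clips), self.flow(flow_clips)], dim=1)
        return self.fc(fused)
```

Replacing the independent nn.LSTM modules here with the communicating step sketched earlier turns this baseline into the Dual-CNNVCLSTM variant.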
Dataset
We manually categorized the dataset into eight classes by inspecting each video clip: Chrysanthemum, Crosette, Desi, Dot, Drop, Fish, Palm and WaterFlower.
Figure: Sample frames of the eight firework classes: Chrysanthemum, Crosette, Desi, Dot, Drop, Fish, Palm and WaterFlower.
Experiments
In this section, we discuss the results of the proposed virtual-communication models compared with their counterparts without communication. Further experiments are designed to study the effectiveness of both single-stream and two-stream networks, and to evaluate model skill with different dataset sizes and fusion methods.
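The paper does not specify which fusion operators were compared; as an assumption, two common late-fusion variants could be contrasted as follows, where average_fusion and concat_fusion are hypothetical helpers.

```python
import torch
import torch.nn as nn

# Hypothetical late-fusion variants for the two-stream outputs.
def average_fusion(logits_rgb, logits_flow):
    # Average the per-class scores from the appearance and motion streams.
    return (logits_rgb + logits_flow) / 2.0

def concat_fusion(feat_rgb, feat_flow, classifier: nn.Module):
    # Concatenate stream features and classify the joint representation.
    return classifier(torch.cat([feat_rgb, feat_flow], dim=1))
```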
VC-LSTM Performance Over Two-stream Baseline Models
We report the performance of the two-stream networks tested with two dataset sizes, 500 and 1,000 video clips, for a better comparison.