Compared with still image-based processes, video recordings provide rich information on dynamic events that occur over a period of time, such as human actions, crowd behaviours, and other changes in subject patterns. Although substantial progress has been made in the last decade on 2D image processing and its applications, such as face matching and object recognition, video event detection remains one of the most difficult challenges in computer vision research because of the wide range of continuous or discrete input signals and the potential analytical features involved. In this project, 3D shape-matching approaches based on the spatio-temporal volume (STV) and the 3D region intersection (RI) method have been employed to facilitate the definition and detection of video events. To maintain the run-time performance of this innovative technique in real-world applications, this research has also developed an efficient pre-filtering mechanism to reduce the number of voxels (volumetric pixels) that need to be processed in each operational cycle. Substantial improvements in both the event “Recall” rate and the processing efficiency have been observed in the experiments designed in the project.
|STV-based video feature processing for action recognition|
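The idea of matching spatio-temporal volumes by region intersection, with a pre-filter that discards voxels before matching, can be illustrated with a minimal sketch. It is not the implementation used in the project. The function names (`build_stv`, `prefilter`, `region_intersection_score`), the bounding-box pre-filter, and the overlap ratio used as a score are all assumptions made for illustration. An STV is represented here simply as a set of occupied voxels `(t, y, x)` taken from binary silhouette frames.

```python
def build_stv(frames):
    """Collect occupied voxels (t, y, x) from a sequence of binary frames."""
    return {
        (t, y, x)
        for t, frame in enumerate(frames)
        for y, row in enumerate(frame)
        for x, value in enumerate(row)
        if value
    }


def bounding_box(voxels):
    """Return inclusive (min, max) corners of a non-empty voxel set."""
    mins = tuple(min(v[i] for v in voxels) for i in range(3))
    maxs = tuple(max(v[i] for v in voxels) for i in range(3))
    return mins, maxs


def prefilter(candidate, template):
    """Assumed pre-filter: keep only candidate voxels that fall inside the
    template's bounding box, so fewer voxels reach the matching step."""
    if not template:
        return set()
    lo, hi = bounding_box(template)
    return {
        v for v in candidate
        if all(lo[i] <= v[i] <= hi[i] for i in range(3))
    }


def region_intersection_score(template, candidate):
    """Assumed score: fraction of template voxels also present in candidate."""
    if not template:
        return 0.0
    return len(template & candidate) / len(template)


def block_frames(x_offset, n_frames=3, size=6):
    """Hypothetical test data: a 2x2 silhouette at rows 1-2 in every frame."""
    frames = []
    for _ in range(n_frames):
        frame = [[0] * size for _ in range(size)]
        for y in (1, 2):
            for x in (x_offset, x_offset + 1):
                frame[y][x] = 1
        frames.append(frame)
    return frames


template = build_stv(block_frames(1))
# Candidate: the same shape shifted one pixel right, plus one stray voxel.
candidate = build_stv(block_frames(2)) | {(0, 5, 5)}

filtered = prefilter(candidate, template)
score = region_intersection_score(template, filtered)
print(len(candidate), len(filtered), score)
```

In this toy case the pre-filter reduces the number of candidate voxels passed to the matcher while leaving the overlap score unchanged, because the removed voxels lie outside the template's extent and could not have contributed to the intersection.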