Human action detection in videos can be used in areas such as video surveillance, human-computer interaction, and device control. The task takes an image sequence with a three-dimensional shape as input and detects actions such as running or catching a ball.
Usually, convolutional neural networks (CNNs) are used for this task. However, they consider only spatiotemporal features, whereas using frequency features would facilitate learning. A recent paper on arXiv.org proposes an end-to-end single-stage network in the time-frequency domain.
A 3D-CNN and a 2D-CNN are used to extract time and frequency features, respectively. These features are then fused with an attention mechanism to obtain action patterns. The experiments demonstrate the superiority of the suggested approach over other state-of-the-art models and show that action detection using frequency features is feasible.
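To make the idea of attention-based fusion more concrete, here is a minimal sketch in plain Python. It is not the fusion module from the paper: the function names `softmax` and `attention_fuse` are hypothetical, and the attention scores are simply feature magnitudes, whereas a real network would compute them with learned layers. The sketch only shows the general pattern of weighting two branches per channel and summing them.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(time_feat, freq_feat):
    """Fuse two equal-length feature vectors channel by channel.

    Assumption for illustration only: the score of each branch is the
    absolute value of its feature. A trained model would learn the scores.
    """
    if len(time_feat) != len(freq_feat):
        raise ValueError("both branches must have the same number of channels")

    fused, weights = [], []
    for t, f in zip(time_feat, freq_feat):
        # Per-channel weights over the two branches; they sum to 1.
        w_t, w_f = softmax([abs(t), abs(f)])
        weights.append((w_t, w_f))
        fused.append(w_t * t + w_f * f)
    return fused, weights


# Toy features: channel 0 agrees across branches, channel 1 is
# dominated by the frequency branch.
fused, weights = attention_fuse([1.0, 0.0], [1.0, 2.0])
```

In the first channel both branches receive equal weight, while in the second the stronger frequency response receives the larger weight.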
Currently, spatiotemporal features are embraced by most deep learning approaches for human action detection in videos; however, they neglect the important features in the frequency domain. In this work, we propose an end-to-end network that considers the time and frequency features simultaneously, named TFNet. TFNet holds two branches: one is the time branch, formed of a three-dimensional convolutional neural network (3D-CNN), which takes the image sequence as input to extract time features; the other is the frequency branch, which extracts frequency features through a two-dimensional convolutional neural network (2D-CNN) from DCT coefficients. Finally, to obtain the action patterns, these two features are deeply fused under the attention mechanism. Experimental results on the JHMDB51-21 and UCF101-24 datasets demonstrate that our approach achieves remarkable performance for frame-mAP.
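The frequency branch works on DCT (discrete cosine transform) coefficients. As a rough illustration of what such coefficients are, the sketch below computes an orthonormal 2D DCT-II of a small image block in plain Python. The 8x8 block size, the function name `dct_2d`, and the toy inputs are assumptions made for this example; the paper's actual preprocessing may differ.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def scale(k):
        # Orthonormal scaling factor for frequency index k.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # frequency index along rows
        for v in range(n):      # frequency index along columns
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# A flat block has no variation, so all energy lands in the DC term.
flat_block = [[10.0] * 8 for _ in range(8)]
flat_coeffs = dct_2d(flat_block)   # flat_coeffs[0][0] is about 80.0

# A block that changes only from left to right produces nonzero
# coefficients only in the first row (u = 0).
ramp_block = [[float(y) for y in range(8)] for _ in range(8)]
ramp_coeffs = dct_2d(ramp_block)
```

Low-index coefficients describe smooth content and high-index ones describe fine detail, which is the kind of information a 2D-CNN in a frequency branch could take as input.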
Research paper: Li, C., Chen, H., Lu, J., Huang, Y., and Liu, Y., “Time and Frequency Network for Human Action Detection in Videos”, 2021. Link: https://arxiv.org/abs/2103.04680