Enhancing ball detection in football videos using attention mechanisms in FPN-based CNNs
Abstract
While deep learning models have significantly advanced player detection in sports analytics, accurately identifying the football remains a persistent challenge due to its small size, rapid movement, frequent occlusions, and visual similarity to other elements such as player socks, logos, and field markings. This limitation reduces the effectiveness of automated systems for comprehensive football match analysis, particularly in applications such as tactical event recognition, shot classification, and game state prediction. In this paper, we propose a method to improve ball detection accuracy in football videos by enhancing an existing architecture based on Feature Pyramid Networks (FPN). The original FPN-based model, although efficient at detecting large-scale objects such as players, performs poorly on small objects such as the ball. To address this, we integrate lightweight attention mechanisms that help the model focus on the most relevant spatial and semantic features. Specifically, we introduce Squeeze-and-Excitation (SE) layers into the backbone of the network to perform channel-wise feature recalibration, and we embed a Convolutional Block Attention Module (CBAM) into the ball detection head to refine both spatial and channel-level attention. These modifications are designed to enhance the network’s ability to distinguish the ball from cluttered backgrounds and visually similar objects. Our experiments, conducted on the ISSIA-CNR and Soccer Player Detection datasets, demonstrate that the proposed attention-augmented model achieves improved ball classification accuracy compared to the baseline, with no degradation in player detection performance. These results support the utility of lightweight attention mechanisms for small object detection and point to a promising direction for more robust, real-time football video analysis systems.
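To make the two attention mechanisms named in the abstract concrete, the following is a minimal, dependency-free sketch of an SE block and a simplified CBAM applied to a small feature map. It is an illustration only, not the authors' implementation: the helper names (`dense`, `se_block`, `cbam`), the tensor sizes, and the random weights are assumptions, and the spatial branch uses a per-pixel weighted mix of the pooled maps in place of the 7x7 convolution described in the original CBAM paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(weights, vec):
    """Bias-free fully connected layer; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def se_block(fmap, w_reduce, w_expand):
    """Squeeze-and-Excitation on a feature map indexed as fmap[C][H][W]."""
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, h) for h in dense(w_reduce, squeezed)]
    gates = [sigmoid(g) for g in dense(w_expand, hidden)]
    # Scale: every channel is multiplied by its own gate.
    scaled = [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]
    return scaled, gates

def cbam(fmap, w_reduce, w_expand, w_spatial=(1.0, 1.0)):
    """Simplified CBAM: channel attention followed by spatial attention."""
    # Channel attention: a shared MLP on average- and max-pooled descriptors.
    flat = [[v for row in ch for v in row] for ch in fmap]
    avg_desc = [sum(f) / len(f) for f in flat]
    max_desc = [max(f) for f in flat]

    def mlp(desc):
        return dense(w_expand, [max(0.0, h) for h in dense(w_reduce, desc)])

    ch_gates = [sigmoid(a + m) for a, m in zip(mlp(avg_desc), mlp(max_desc))]
    refined = [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, ch_gates)]

    # Spatial attention (assumption: per-pixel mix instead of a 7x7 conv).
    n_ch, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(n_ch)]
    for y in range(height):
        for x in range(width):
            column = [refined[c][y][x] for c in range(n_ch)]
            pooled = w_spatial[0] * sum(column) / n_ch + w_spatial[1] * max(column)
            gate = sigmoid(pooled)
            for c in range(n_ch):
                out[c][y][x] = refined[c][y][x] * gate
    return out

# Hypothetical sizes: 4 channels, 3x3 map, bottleneck width 2.
random.seed(0)
C, H, W, R = 4, 3, 3, 2
fmap = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w_reduce = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(R)]
w_expand = [[random.uniform(-1, 1) for _ in range(R)] for _ in range(C)]

se_out, gates = se_block(fmap, w_reduce, w_expand)
cbam_out = cbam(fmap, w_reduce, w_expand)
```

In a trained network the weights would be learned jointly with the detector; the sketch only shows how the gates are computed and applied. SE produces one gate per channel, whereas CBAM additionally produces one gate per pixel, which is a plausible reason for placing it in a head that must localize a very small object.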
Problems in programming 2025; 2: 54-62