Method of managing the execution of tasks of a multithreaded program according to a given dependency graph

R.V. Terentiev, P.А. Ivanenko

Abstract


This article examines the effectiveness of pre-training a generative model based on a vision transformer and subsequently fine-tuning it for image classification. The main problem addressed is the poor training efficiency of vision transformers on a limited amount of data. The accuracy of an image classification model can be improved through transfer learning, reusing the knowledge obtained by first training a generative model on the same data. Tiny ImageNet, a subset of the standard ImageNet dataset, was used to test this hypothesis. It contains 200 categories of around 500 images each, and each image is 64x64 pixels. For pre-training the generative model, image segments are masked patch by patch. Learning to restore the masked pixels forces the model to attend to the context around the removed region as well as to general visual patterns. This improves the model's overall understanding of visual information and helps the subsequent fine-tuning for classification. In a series of experiments, image classification accuracy improved from 40% to 44.7%. The paper also analyzes how patch size (2x2, 4x4, 8x8 pixels) and the overall masking percentage (20, 40, 60 percent) of the input image affect this accuracy.
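
The patch masking step described in the abstract can be illustrated with a minimal sketch. It assumes a single-channel image stored as nested lists, uniform random selection of non-overlapping patches, and zero as the mask value; the function name `mask_patches` and these choices are illustrative assumptions, not the authors' implementation.

```python
import random


def mask_patches(image, patch_size, mask_ratio, seed=0):
    """Zero out a random subset of non-overlapping square patches.

    image: list of rows, each a list of pixel values (H x W, one channel).
    Returns (masked_image, masked_patch_indices).
    Assumption: masked pixels are set to 0; the paper may use another value.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")

    cols = width // patch_size
    num_patches = (height // patch_size) * cols
    num_masked = round(num_patches * mask_ratio)

    # Pick which patches to hide, without replacement.
    rng = random.Random(seed)
    masked = sorted(rng.sample(range(num_patches), num_masked))

    out = [row[:] for row in image]  # copy so the input stays intact
    for idx in masked:
        top = (idx // cols) * patch_size
        left = (idx % cols) * patch_size
        for y in range(top, top + patch_size):
            for x in range(left, left + patch_size):
                out[y][x] = 0
    return out, masked


# A 64x64 image with 4x4 patches and 40% masking, as in one tested setting.
image = [[1] * 64 for _ in range(64)]
masked_image, masked_ids = mask_patches(image, patch_size=4, mask_ratio=0.4)
zeroed = sum(row.count(0) for row in masked_image)
print(len(masked_ids), zeroed)
```

With 4x4 patches a 64x64 image splits into 256 patches, so a 40% ratio hides 102 of them. During pre-training, the reconstruction loss would compare the model's output with the original pixels at the masked positions.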

Problems in programming 2024; 2-3: 247-252

 


Keywords


vision transformers; generative models; image classification; pre-training; transfer learning

References


Bao H., Dong L., Piao S. and Wei F. (2021) BEiT: BERT Pre-Training of Image Transformers, arXiv preprint arXiv:2106.08254.

Bachlechner T., Majumder B.P., Mao H.H., Cottrell G.W. and McAuley J. (2021) ReZero is All You Need: Fast Convergence at Large Depth, Uncertainty in Artificial Intelligence. PMLR.

Cubuk E.D., Zoph B., Mane D., Vasudevan V. and Le Q.V. (2019) AutoAugment: Learning Augmentation Strategies from Data, Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.

Dagli R. (2023) Astroformer: More Data Might Not be All You Need for Classification, arXiv preprint arXiv:2304.05350.

Dosovitskiy A., Beyer L., Kolesnikov A., Weissenborn D., Zhai X., Unterthiner T., Dehghani M., Minderer M., Heigold G., Gelly S., Uszkoreit J. and Houlsby N. (2020) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, arXiv preprint arXiv:2010.11929.

He K., Zhang X., Ren S. and Sun J. (2016) Deep Residual Learning for Image Recognition, Proceedings of the IEEE conference on computer vision and pattern recognition.

Kingma D.P. and Ba J. (2014) Adam: A Method for Stochastic Optimization, arXiv preprint arXiv:1412.6980.

Ronneberger O., Fischer P. and Brox T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation, Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18. Springer International Publishing.

Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V. and Rabinovich A. (2015) Going Deeper with Convolutions, Proceedings of the IEEE conference on computer vision and pattern recognition.

Tan M. and Le Q.V. (2019) EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, International conference on machine learning. PMLR.

Torbunov D., Huang Y., Tseng H., Yu H., Huang J., Yoo S., Lin M., Viren B. and Ren Y. (2023) UVCGAN v2: An Improved Cycle Consistent GAN for Unpaired Image-to-Image Translation, arXiv preprint arXiv:2303.16280.

Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L. and Polosukhin I. (2017) Attention Is All You Need, Advances in neural information processing systems 30.

Xiao T., Singh M., Mintun E., Darrell T., Dollár P. and Girshick R. (2021) Early Convolutions Help Transformers See Better, Advances in neural information processing systems 34.

Xie Z., Zhang Z., Cao Y., Lin Y., Bao J., Yao Z., Dai Q. and Hu H. (2022) SimMIM: A Simple Framework for Masked Image Modeling, International Conference on Computer Vision and Pattern Recognition (CVPR).

