Developing a semantic image model using machine learning based on convolutional neural networks
Abstract
This paper surveys the main areas of research on computer models for automating digital image recognition. It introduces the concept of a semantic image model and describes a machine learning model that constructs such a model automatically. The semantic model consists of a list of the objects represented in the image and the relationships between them. The developed model was compared with other solutions and showed better results in all but one case. Its performance is attributed to the use of recent advances in machine learning, including convolutional neural networks (CNNs), transfer learning (TL), Faster R-CNN, and VGG16. Many of the relationships represented in an image are spatial, and the model's design explicitly takes this into account.
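As an illustration only, the semantic model described in the abstract (a list of objects plus the relationships between them) can be sketched as a small data structure, together with simple geometric features that a relationship classifier might use to exploit spatial links. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `DetectedObject`, `Relationship`, `SemanticModel`, and `spatial_features`, the box format, and the choice of features are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class DetectedObject:
    label: str
    box: Box


@dataclass(frozen=True)
class Relationship:
    subject: int    # index into SemanticModel.objects
    predicate: str  # e.g. "on", "next to"
    obj: int        # index into SemanticModel.objects


@dataclass
class SemanticModel:
    objects: List[DetectedObject] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


def spatial_features(a: Box, b: Box) -> Dict[str, float]:
    """Geometric features of box b relative to box a.

    Assumes both boxes have non-zero width and height.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    a_w, a_h = ax1 - ax0, ay1 - ay0
    # Offset between box centres, normalised by the size of box a.
    dx = ((bx0 + bx1) - (ax0 + ax1)) / (2 * a_w)
    dy = ((by0 + by1) - (ay0 + ay1)) / (2 * a_h)
    # Intersection over union of the two boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = a_w * a_h + (bx1 - bx0) * (by1 - by0) - inter
    return {"dx": dx, "dy": dy, "iou": inter / union}


# Hypothetical example: a person sitting on a horse.
model = SemanticModel(
    objects=[
        DetectedObject("person", (10, 20, 50, 100)),
        DetectedObject("horse", (0, 60, 80, 140)),
    ],
    relationships=[Relationship(subject=0, predicate="on", obj=1)],
)
features = spatial_features(model.objects[0].box, model.objects[1].box)
```

In a full pipeline, the boxes would come from an object detector and features of this kind would be one input to whatever component predicts the predicate; both of those stages are outside the scope of this sketch.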
Problems in programming 2020; 2-3: 352-361
References
Karpathy A., Li Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions [Electronic resource]. Mode of access: https://cs.stanford.edu/people/karpathy/deepimagesent/
A visual proof that neural nets can compute any function [Electronic resource]. Mode of access: http://neuralnetworksanddeeplearning.com/chap4.html
Simonyan K., Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition [Electronic resource]. Mode of access: https://arxiv.org/pdf/1409.1556.pdf
Image Captioning [Electronic resource]. Mode of access: http://shikib.com/captioning.html
Dai J. R-FCN: Object Detection via Region-based Fully Convolutional Networks [Electronic resource]. Mode of access: https://arxiv.org/pdf/1605.06409.pdf
VGG16 – Convolutional Network for Classification and Detection [Electronic resource]. Mode of access: https://neurohive.io/en/popular-networks/vgg16/
Vinyals O. Show and Tell: A Neural Image Caption Generator [Electronic resource]. Mode of access: https://arxiv.org/pdf/1411.4555.pdf
Dai B. Detecting Visual Relationships with Deep Relational Networks [Electronic resource]. Mode of access: https://arxiv.org/pdf/1704.03114.pdf
Sadeghi M. Recognition Using Visual Phrases [Electronic resource]. Mode of access: http://vision.cs.uiuc.edu/phrasal/recognition_using_visual_phrases.pdf
Lu C. Visual Relationship Detection with Language Priors [Electronic resource]. Mode of access: https://arxiv.org/pdf/1608.00187.pdf
Krishna R. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations [Electronic resource]. Mode of access: https://arxiv.org/pdf/1602.07332.pdf
Visual Genome [Electronic resource]. Mode of access: https://visualgenome.org
Data Loading and Processing Tutorial [Electronic resource]. Mode of access: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
TorchVision Models [Electronic resource]. Mode of access: https://pytorch.org/docs/stable/torchvision/models.html
Ren S. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [Electronic resource]. Mode of access: https://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf
Chilamkurthy S. Transfer Learning Tutorial [Electronic resource]. Mode of access: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
DOI: https://doi.org/10.15407/pp2020.02-03.352