Developing a semantic image model using machine learning based on convolutional neural networks

P.I. Andon, A.M. Glybovets, V.V. Kuryliak

Abstract


This paper surveys the main areas of research on computer models for automating digital image recognition. It introduces the concept of a semantic image model and describes a machine learning model that constructs such a model automatically. The semantic model consists of a list of the objects represented in the image and the relationships between them. The developed model was compared with other solutions and showed better results in all but one case. Its performance is attributed to the use of recent advances in machine learning, including convolutional neural networks (CNN), transfer learning (TL), Faster R-CNN, and VGG16. Many of the links represented in an image are spatial, and the model was designed to take advantage of this fact.
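
To make the notion of a semantic image model concrete, the following minimal Python sketch stores a list of detected objects and the directed links between them. It is an illustration only, not the authors' implementation: the class names, the field names, and the center-comparison rule in spatial_predicate are all assumptions introduced here. In the paper the objects and links come from a trained network, not from a hand-written rule.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DetectedObject:
    """One object found in the image, with its bounding box in pixels."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        """Center point (x, y) of the bounding box."""
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


@dataclass(frozen=True)
class Link:
    """A directed relationship: subject --predicate--> target."""
    subject: DetectedObject
    predicate: str
    target: DetectedObject


@dataclass
class SemanticImageModel:
    """Semantic model of one image: its objects and the links between them."""
    objects: list = field(default_factory=list)
    links: list = field(default_factory=list)


def spatial_predicate(a, b):
    """Hypothetical rule: describe where a lies relative to b.

    Compares box centers and reports the dominant axis of displacement.
    Image coordinates are assumed, so y grows downward.
    """
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "right of" if dx > 0 else "left of"
    return "below" if dy > 0 else "above"


def build_model(objects):
    """Build a model with one spatial link for every ordered pair of objects."""
    model = SemanticImageModel(objects=list(objects))
    for a in objects:
        for b in objects:
            if a is not b:
                model.links.append(Link(a, spatial_predicate(a, b), b))
    return model


# Example with made-up bounding boxes.
person = DetectedObject("person", 40, 10, 80, 60)
horse = DetectedObject("horse", 30, 50, 100, 120)
model = build_model([person, horse])
for link in model.links:
    print(link.subject.label, link.predicate, link.target.label)
```

With these made-up boxes the sketch reports "person above horse" and "horse below person". A learned model would additionally predict non-spatial links such as "rides", which no geometric rule of this kind can recover.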

Problems in programming 2020; 2-3: 352-361


Keywords


semantic image model; machine learning; computer vision; convolutional neural networks; image links

References


Karpathy A., Li Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions [Electronic resource]. Mode of access: https://cs.stanford.edu/people/karpathy/deepimagesent/

A visual proof that neural nets can compute any function [Electronic resource]. Mode of access: http://neuralnetworksanddeeplearning.com/chap4.html

Simonyan K., Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition [Electronic resource]. Mode of access: https://arxiv.org/pdf/1409.1556.pdf

Image Captioning [Electronic resource]. Mode of access: http://shikib.com/captioning.html

Dai J. R-FCN: Object Detection via Region-based Fully Convolutional Networks [Electronic resource]. Mode of access: https://arxiv.org/pdf/1605.06409.pdf

VGG16 – Convolutional Network for Classification and Detection [Electronic resource]. Mode of access: https://neurohive.io/en/popular-networks/vgg16/

Vinyals O. Show and Tell: A Neural Image Caption Generator [Electronic resource]. Mode of access: https://arxiv.org/pdf/1411.4555.pdf

Dai B. Detecting Visual Relationships with Deep Relational Networks [Electronic resource]. Mode of access: https://arxiv.org/pdf/1704.03114.pdf

Sadeghi M. Recognition Using Visual Phrases [Electronic resource]. Mode of access: http://vision.cs.uiuc.edu/phrasal/recognition_using_visual_phrases.pdf

Lu C. Visual Relationship Detection with Language Priors [Electronic resource]. Mode of access: https://arxiv.org/pdf/1608.00187.pdf

Krishna R. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations [Electronic resource]. Mode of access: https://arxiv.org/pdf/1602.07332.pdf

Visual Genome [Electronic resource]. Mode of access: https://visualgenome.org

Data Loading and Processing Tutorial [Electronic resource]. Mode of access: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html

TorchVision Models [Electronic resource]. Mode of access: https://pytorch.org/docs/stable/torchvision/models.html

Ren S. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [Electronic resource]. Mode of access: https://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf

Chilamkurthy S. Transfer Learning Tutorial [Electronic resource]. Mode of access: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html




DOI: https://doi.org/10.15407/pp2020.02-03.352
