Machine learning methods analysis in the document classification problem

A.P. Zhyrkova, O.P. Ignatenko


The current state of official document handling worldwide, and especially in Ukraine, calls for tools for electronic processing. One of the main tasks in this field is seal (or stamp) detection, which allows documents to be classified based on that criterion. This article analyzes some existing methods for solving the problem, describes a new approach to document classification, and shows how model accuracy depends on the amount of input data. The result of this work is a convolutional neural network that correctly classifies 708 out of 804 images of official documents. The corresponding model accuracy is 88.03%, despite the presence of bias in the input data.

Problems in programming 2020; 4: 81-87


machine learning; classification; convolutional neural networks; stamp; seal

