Machine learning methods analysis in the document classification problem

A.P. Zhyrkova, O.P. Ignatenko

Abstract


The current state of official document handling worldwide, and especially in Ukraine, calls for tools for electronic processing. One of the main tasks in this field is seal (or stamp) detection, which allows documents to be classified by this criterion. This article analyzes several existing methods for solving the problem, describes a new approach to document classification, and shows how model accuracy depends on the amount of input data. The result of this work is a convolutional neural network that correctly classifies 708 out of 804 images of official documents. The corresponding model accuracy is 88.03%, despite the presence of bias in the input data.

Problems in programming 2020; 4: 81-87


Keywords


machine learning; classification; convolutional neural networks; stamp; seal





DOI: https://doi.org/10.15407/pp2020.04.081
