Automated methods of coherence evaluation of Ukrainian texts using machine learning techniques

A.A. Kramov, S.D. Pogorilyy

Abstract


The main methods of evaluating text coherence with different machine learning techniques have been analyzed. The principles of the methods based on recurrent and convolutional neural networks have been described in detail. The advantages of the semantic similarity graph method have been considered. Alternative approaches to the vector representation of sentences, used to estimate the semantic similarity between elements of a text, have been suggested. The methods have been examined experimentally on a set of Ukrainian scientific articles. The recurrent and convolutional networks have been trained with early stopping. The accuracy of solving the document discrimination and insertion tasks has been calculated, and a comparative analysis of the results obtained has been performed.
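The semantic similarity graph idea mentioned in the abstract can be illustrated with a small sketch. It assumes each sentence has already been mapped to a vector (how the vectors are obtained is left open). The names `cosine` and `graph_coherence`, and the choice to link each sentence to its single most similar sentence with edge weight "cosine similarity divided by sentence distance", are illustrative assumptions loosely following the graph-based approach. This is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def graph_coherence(vectors: List[Vector]) -> float:
    """Simplified semantic-similarity-graph coherence score (illustrative weighting).

    Every sentence is a vertex. Each vertex i gets one outgoing edge to the
    most similar other sentence j, weighted by cosine(i, j) / |i - j|, so that
    links between distant sentences contribute less. The text score is the
    mean outgoing edge weight over all vertices.
    """
    n = len(vectors)
    if n < 2:
        return 0.0  # a single sentence has no edges to score
    total = 0.0
    for i in range(n):
        others = [k for k in range(n) if k != i]
        # max() keeps the first maximum, so ties go to the earliest sentence.
        j = max(others, key=lambda k: cosine(vectors[i], vectors[k]))
        total += cosine(vectors[i], vectors[j]) / abs(i - j)
    return total / n


# Toy 2-D "sentence vectors": similar sentences are adjacent in `ordered`
# but separated in `shuffled`.
ordered = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
shuffled = [ordered[0], ordered[2], ordered[1], ordered[3]]
print(graph_coherence(ordered) > graph_coherence(shuffled))  # True
```

Dividing by the distance is what makes the score sensitive to sentence order: the same pairs of similar sentences earn less when they are placed far apart.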

Problems in programming 2020; 2-3: 295-303


Keywords


coherence of a text; recurrent neural network; convolutional neural network; semantic similarity graph; semantic representation of sentences; document discrimination task; insertion task
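The document discrimination and insertion tasks listed in the keywords are commonly set up as follows, sketched here under the assumption that a model is any function mapping an ordered list of sentences to a coherence score (higher means more coherent). In discrimination, the original order should outscore random permutations. In insertion, a removed sentence should be put back at its original position. The function names, the number of permutations, the fixed seed and the tie-breaking rule are illustrative assumptions, not the protocol used in the article.

```python
import random
from typing import Any, Callable, List, Sequence

# Hypothetical scorer type: maps an ordered list of sentences to a coherence
# score, where a higher value means a more coherent text.
Scorer = Callable[[List[Any]], float]


def discrimination_accuracy(doc: Sequence[Any], score: Scorer,
                            n_perm: int = 20, seed: int = 0) -> float:
    """Share of random sentence orders that score strictly below the original."""
    original = list(doc)
    rng = random.Random(seed)  # fixed seed keeps the permutations reproducible
    original_score = score(original)
    wins = trials = 0
    for _ in range(n_perm):
        perm = original[:]
        rng.shuffle(perm)
        if perm == original:
            continue  # an unchanged order is not a valid negative example
        trials += 1
        wins += original_score > score(perm)
    return wins / trials if trials else 0.0


def insertion_accuracy(doc: Sequence[Any], score: Scorer) -> float:
    """Share of sentences that, once removed, are reinserted at their original position."""
    original = list(doc)
    n = len(original)
    if n < 2:
        return 0.0
    correct = 0
    for i, sentence in enumerate(original):
        rest = original[:i] + original[i + 1:]
        # Score every candidate position; ties go to the earliest position.
        best = max(range(n), key=lambda p: score(rest[:p] + [sentence] + rest[p:]))
        correct += best == i
    return correct / n


# Toy scorer for demonstration only: counts adjacent pairs in increasing order.
def toy_score(sentences: List[Any]) -> float:
    return float(sum(1 for a, b in zip(sentences, sentences[1:]) if b == a + 1))


print(discrimination_accuracy(list(range(5)), toy_score))  # 1.0
print(insertion_accuracy(list(range(5)), toy_score))  # 1.0
```

A trained recurrent or convolutional network, or a graph-based score, would take the place of `toy_score`; averaging these accuracies over a corpus gives figures of the kind compared in the article.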

DOI: https://doi.org/10.15407/pp2020.02-03.295
