Coreference resolution algorithm for Ukrainian-language texts using decision trees

S. D. Pogorilyy, P. V. Biletskyi

Abstract


The paper examines coreference resolution in Ukrainian-language texts using decision trees. An application has been developed that uses ELMo word vector representations and other features to build a decision tree automatically. A set of prepared texts containing more than 360,000 words was used to build the decision tree and to evaluate the accuracy of the algorithm. The resulting decision tree, which determines whether a pair of objects is coreferent, was then used to form clusters of coreferent objects. Special metrics were used to compare the results with those obtained by other algorithms for the Ukrainian language.
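
The step from pairwise decisions to clusters can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how clusters are formed, so the sketch assumes that pairwise links are closed transitively with a union-find structure. The function name `cluster_mentions` and the predicate `is_coreferent` are hypothetical; the predicate stands in for the trained decision tree, which in the paper would be driven by ELMo vectors and other features.

```python
from itertools import combinations
from typing import Callable, Dict, List


def cluster_mentions(
    mentions: List[str],
    is_coreferent: Callable[[str, str], bool],
) -> List[List[str]]:
    """Group mentions into clusters from pairwise yes/no decisions.

    `is_coreferent` is a hypothetical stand-in for a trained pairwise
    classifier (for example, a decision tree). Links are merged
    transitively, which is an assumption of this sketch.
    """
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Ask the classifier about every unordered pair of mentions.
    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect mentions by root, keeping their original order.
    clusters: Dict[int, List[str]] = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


# Toy usage: a hand-written set of links replaces the classifier.
links = {("Olena", "she"), ("she", "her"), ("Kyiv", "the city")}
mentions = ["Olena", "she", "Kyiv", "the city", "her"]
print(cluster_mentions(mentions, lambda a, b: (a, b) in links or (b, a) in links))
# [['Olena', 'she', 'her'], ['Kyiv', 'the city']]
```

In the toy run, "Olena" and "her" end up in the same cluster although no direct link between them was given; this is the effect of the transitive merge, and a real system may prefer a stricter linking rule.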

Problems in programming 2022; 3-4: 85-91


Keywords


coreference resolution; natural language processing (NLP); decision trees; artificial intelligence (AI); word vector representations; neural networks


