Formal verification of the properties of coreferent resolution model based on decision trees
Abstract
The paper examines the problem of coreference resolution: identifying the objects (words or phrases in a text) that refer to the same real or imaginary entity. A solution to this task is explored for Ukrainian-language texts using decision trees, which structure themselves autonomously based on training data. Unlike other machine learning models such as neural networks, decision trees allow their internal structure to be analyzed through a graphical representation. This feature makes it easier to explain individual results produced by a tree and significantly eases formal verification of its properties. The decision trees are built from vector representations of words (such as ELMo) and other linguistic features. Once formed, a decision tree performs binary classification of pairs of input objects that potentially refer to the same entity. Based on the resulting binary classifier, coreferent objects are grouped into clusters, and the clustering accuracy is then evaluated using specialized metrics. The paper provides a detailed description of the implemented application and of the structure of the formed decision tree, which serves as the basis for further analysis. In addition, transition systems are proposed as a means of constructing a high-level specification model for coreference resolution. The transition-system-based model enables analysis of application behavior on infinite state sequences, making it possible to check that execution is error-free. The model is formalized, and automata models together with linear temporal logic (LTL) are used to verify a set of properties of the obtained specification. Büchi automata are constructed to accept words that confirm the properties, and both examples and counterexamples for the analyzed properties are found. The method defined in the paper serves as a foundation for creating automated analyzers for coreference resolution applications based on decision trees.
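The pipeline described in the abstract (pairwise binary classification by a decision tree, followed by grouping into clusters) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tree is written by hand rather than learned from training data, the feature names `embedding_similarity` and `same_gender` are hypothetical, the feature values are invented, and clusters are formed by a simple union-find merge of every positively classified pair.

```python
from itertools import combinations

# Hypothetical hand-built tree; in the paper the tree is learned from data.
# Internal node: (feature_name, threshold, left_subtree, right_subtree).
# Leaf: a bool, True meaning "the pair is coreferent".
TREE = ("embedding_similarity", 0.8,
        False,                              # low similarity: not coreferent
        ("same_gender", 0.5, False, True))  # similar: decide on gender agreement


def classify(tree, features):
    """Walk the tree, going left when the feature value is <= threshold."""
    while not isinstance(tree, bool):
        name, threshold, left, right = tree
        tree = left if features[name] <= threshold else right
    return tree


def cluster(mentions, pair_features):
    """Merge mentions linked by a positive pair decision (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(mentions, 2):
        if classify(TREE, pair_features[(a, b)]):
            parent[find(b)] = find(a)

    groups = {}
    for m in mentions:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


# Invented example data: four mentions and features for each unordered pair.
mentions = ["Olena", "she", "book", "it"]
pair_features = {
    ("Olena", "she"):  {"embedding_similarity": 0.91, "same_gender": 1},
    ("Olena", "book"): {"embedding_similarity": 0.12, "same_gender": 0},
    ("Olena", "it"):   {"embedding_similarity": 0.85, "same_gender": 0},
    ("she", "book"):   {"embedding_similarity": 0.10, "same_gender": 0},
    ("she", "it"):     {"embedding_similarity": 0.83, "same_gender": 0},
    ("book", "it"):    {"embedding_similarity": 0.88, "same_gender": 1},
}
clusters = cluster(mentions, pair_features)
```

With these invented values, `clusters` contains two groups, `["Olena", "she"]` and `["book", "it"]`. Because every decision is a path of explicit threshold tests, each positive or negative pair can be traced back to the branch that produced it, which is the property the abstract relies on for explanation and verification.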
Problems in programming 2024; 2-3: 319-325