Integration of large language models with semantic processing tools as an instrument for knowledge digitization
Abstract
The paper addresses the automation of analysis, generation, and management of complex natural language documents through the integration of generative artificial intelligence with semantic technologies, in particular Semantic MediaWiki. It analyzes how ontological models of subject domains and semantic markup help counter critical shortcomings of large language models, such as the tendency toward “hallucinations” (the generation of false statements) and the lack of transparency in explaining decisions. This integration is explored using the example of the instrumental system “LINZA,” which is being developed for automated intelligent processing of content from heterogeneous documents with complex, weakly formalized structure, with the aim of generating natural language reports according to specified requirements in domains such as public administration, jurisprudence, certification, and standardization. The system combines the flexibility and adaptability of large language models with formalized ontological knowledge and with semantic queries that retrieve pertinent facts from the Semantic MediaWiki environment or from external sources (Retrieval-Augmented Generation). The proposed approach is intended to significantly reduce the risk of errors typical of generative models and to ensure factual accuracy and transparency in the decision-making process. Special attention is paid to mechanisms for transparency, reliability, and human control, which increase trust in automatically created documents; this is especially important in areas with high information security requirements.
The multi-level architecture of the system defines the tasks of agents and services that perform specialized functions of data collection, analysis, transformation, and verification, and ensures flexibility, scalability, and adaptability of the system to changes in input data and requirements.
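The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the LINZA implementation: it assumes facts are fetched through the Semantic MediaWiki `action=ask` API module and that the JSON response has the usual `query.results.<page>.printouts` shape. The wiki URL, the property names (`Has status`, `Issued by`), and all function names are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode


def build_ask_url(api_base, conditions, printouts, limit=10):
    """Build a URL for the Semantic MediaWiki `ask` API module.

    `conditions` become [[...]] selectors and `printouts` become |?... requests.
    """
    query = "".join(f"[[{c}]]" for c in conditions)
    query += "".join(f"|?{p}" for p in printouts)
    query += f"|limit={limit}"
    params = {"action": "ask", "query": query, "format": "json"}
    return f"{api_base}?{urlencode(params)}"


def facts_from_ask_result(result):
    """Flatten an ask-API JSON result into (subject, property, value) triples.

    Assumes the response shape query -> results -> page -> printouts.
    """
    results = result.get("query", {}).get("results", {})
    if not isinstance(results, dict):
        # An empty result set may be serialized as a list rather than a dict.
        return []
    facts = []
    for subject, entry in results.items():
        for prop, values in entry.get("printouts", {}).items():
            for value in values:
                # Page-typed values arrive as dicts carrying a "fulltext" title.
                if isinstance(value, dict):
                    value = value.get("fulltext", str(value))
                facts.append((subject, prop, str(value)))
    return facts


def build_grounded_prompt(task, facts):
    """Assemble an LLM prompt restricted to numbered, traceable facts."""
    lines = [f"[{i}] {s} | {p} | {v}" for i, (s, p, v) in enumerate(facts, 1)]
    return (
        "Use only the numbered facts below and cite them by number. "
        "If the facts are insufficient, say so.\n"
        "Facts:\n" + "\n".join(lines) + f"\n\nTask: {task}"
    )
```

Numbering the retrieved facts lets a human reviewer trace each statement in the generated report back to a wiki page and property, which is one simple way to support the transparency and human control the abstract emphasizes. Sending the request and calling the language model are deliberately left out of the sketch.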
Problems in programming 2025; 2: 63-76