Ontology-based semantic similarity to metadata analysis in the information security domain

A.Y. Gladun, K.A. Khala


As cybersecurity threats grow more complex, it is becoming clear that processing large amounts of data in the cyber environment is one of the most important resources for combating cyberattacks. Processing such volumes of data and making decisions on them requires automating the search, selection and interpretation of Big Data to solve operational information security problems. Big Data analytics complemented by semantic technologies can improve cybersecurity and makes it possible to process and interpret large amounts of information in the cyber environment. Semantic modeling methods are needed in Big Data analytics to select and combine heterogeneous Big Data sources and to recognize the patterns of network attacks and other cyber threats, and this recognition must happen quickly enough for countermeasures to be implemented. Therefore, to analyze Big Data metadata, the authors propose pre-processing the metadata at the semantic level. As the analysis tool, they propose creating a thesaurus of the problem based on the domain ontology, which should provide a terminological basis for integrating ontologies of different levels. To build this thesaurus, they propose using the standards of open information resources, dictionaries and encyclopedias. Developing an ontology hierarchy formalizes the relationships between data elements; these relationships will later be used by machine learning and artificial intelligence algorithms to adapt to changes in the environment, which in turn will increase the efficiency of Big Data analytics in the cybersecurity domain.
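The abstract proposes pre-processing metadata at the semantic level with a thesaurus built on a domain ontology. The Python code below is a minimal sketch of that idea under stated assumptions: the thesaurus is a plain dict mapping variant terms to a preferred term, the ontology is a child-to-parent ("is-a") dict, and every term and the functions `normalize`, `broader_terms` and `annotate` are hypothetical illustrations, not the authors' thesaurus or implementation.

```python
"""Sketch: semantic-level pre-processing of metadata tags with a thesaurus.

Assumptions (illustrative only): the thesaurus maps lower-cased variant
terms to a preferred term, and the ontology fragment maps each concept to
its broader (is-a) concept. None of these entries come from the paper.
"""
import re
from typing import Optional

# Hypothetical thesaurus: variant or synonym -> preferred (canonical) term.
THESAURUS = {
    "ddos": "denial of service",
    "dos": "denial of service",
    "denial-of-service": "denial of service",
    "denial of service": "denial of service",
    "sqli": "sql injection",
    "sql injection": "sql injection",
    "phishing": "phishing",
    "spear phishing": "phishing",
}

# Hypothetical ontology fragment: concept -> broader concept.
IS_A = {
    "denial of service": "network attack",
    "sql injection": "injection attack",
    "injection attack": "application attack",
    "phishing": "social engineering",
    "network attack": "cyber attack",
    "application attack": "cyber attack",
    "social engineering": "cyber attack",
}


def normalize(term: str) -> Optional[str]:
    """Map a raw metadata term to its preferred thesaurus term, or None."""
    key = re.sub(r"\s+", " ", term.strip().lower())  # collapse whitespace
    return THESAURUS.get(key)


def broader_terms(concept: str) -> list:
    """Walk the is-a chain upward from a concept to the ontology root."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def annotate(tags: list) -> dict:
    """Attach canonical concepts and their ancestors to a record's tags.

    Tags the thesaurus does not know are kept under "unresolved" so they
    can be reviewed and, if relevant, added to the thesaurus later.
    """
    result = {"unresolved": []}
    for tag in tags:
        concept = normalize(tag)
        if concept is None:
            result["unresolved"].append(tag)
        else:
            result[concept] = broader_terms(concept)
    return result
```

For example, `annotate(["DDoS", "SQLi", "port 443"])` maps the first two tags to `denial of service` and `sql injection`, each with its chain of broader concepts, and leaves `port 443` under `unresolved`. Keeping unresolved tags separate makes gaps in the thesaurus visible instead of silently discarding them.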

Problems in Programming 2021; 2: 34-41


big data analytics; information security; cyber security; ontology; thesaurus; unstructured data; metadata; semantic similarity
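The title and keywords name ontology-based semantic similarity as the analysis tool, but the abstract does not say which measure the authors use. One widely used path-based measure over an is-a hierarchy is Wu-Palmer similarity, sim(c1, c2) = 2·depth(LCS) / (depth(c1) + depth(c2)), where LCS is the deepest concept subsuming both. The sketch below assumes a single-inheritance hierarchy with the root at depth 1; the ontology fragment and the functions `wu_palmer` and `most_similar` are hypothetical illustrations.

```python
"""Sketch: Wu-Palmer similarity over a small, hypothetical is-a hierarchy.

Assumptions: single inheritance (child -> parent dict), root depth = 1,
concepts absent from the hierarchy share no ancestor with anything else.
"""

# Hypothetical ontology fragment: concept -> broader concept.
IS_A = {
    "denial of service": "network attack",
    "sql injection": "injection attack",
    "injection attack": "application attack",
    "phishing": "social engineering",
    "network attack": "cyber attack",
    "application attack": "cyber attack",
    "social engineering": "cyber attack",
}


def path_to_root(concept: str) -> list:
    """Return [concept, parent, grandparent, ..., root]."""
    path = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        path.append(concept)
    return path


def depth(concept: str) -> int:
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(concept))


def lowest_common_subsumer(c1: str, c2: str):
    """Deepest concept that is an ancestor-or-self of both c1 and c2."""
    ancestors1 = set(path_to_root(c1))
    for node in path_to_root(c2):  # walks upward, so the first hit is deepest
        if node in ancestors1:
            return node
    return None


def wu_palmer(c1: str, c2: str) -> float:
    """2 * depth(LCS) / (depth(c1) + depth(c2)); 0.0 if no common ancestor."""
    lcs = lowest_common_subsumer(c1, c2)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(c1) + depth(c2))


def most_similar(concept: str, candidates: list) -> str:
    """Pick the candidate concept closest to `concept` by Wu-Palmer score."""
    return max(candidates, key=lambda c: wu_palmer(concept, c))
```

With this fragment, `wu_palmer("sql injection", "injection attack")` gives 6/7 (about 0.86), while `wu_palmer("denial of service", "sql injection")` gives 2/7 (about 0.29), because the two attacks meet only at the root `cyber attack`. A path-based measure like this needs nothing beyond the hierarchy itself; measures based on information content would additionally require term frequencies from a corpus.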
