Metadata as a tool of the semantic analysis of the complex contents of the big data. The images

O.V. Zakharova


The purpose of this research is to identify effective approaches for improving the semantic analysis of the graphic contents of big data. The article considers images and video scenes as examples of such complex contents. The proposed approach takes into account the special features of these contents and creates a hybrid annotation model that extends the text annotation model with more specific elements; for visual data, these are visualization characteristics. Determining the similarity of information contents is a critical problem in big data tasks. It is the basis for big data categorization and enables the composition of documents, the conversion of unstructured contents into relevant knowledge structures, and the visualization of information. Semantic analysis of information contents is usually based on their metadata, which form the basis of semantic annotations. Metadata are also elements of a structured semantic description of the content and the basis for its automated processing. The approach uses ontologies to define semantic annotations. Ontologies provide various sources of knowledge for measuring semantic similarity and contain a great deal of information about the interpretation of concepts and about other semantic relationships, organized in a hierarchical structure based on hyponymy relations. In recent years, however, the number of image and video resources has grown rapidly, and the available visual information has been significantly enriched. From a visual point of view, it is easier to judge whether two concepts are similar. Therefore, integrating the semantic and visual information of an image makes it possible to optimize ontological methods of similarity estimation and to obtain similarity metrics that are more consistent with human perception.
In effect, such assessments of the complex semantic similarity of concepts are defined by the composition of two functions: the first is an ontological measure of similarity, and the second is built on a complex feature vector. This vector is a concatenation of semantic and visual characteristics with an established weight balance between the two types of features. Combining visualization features with the semantic and ontological characteristics of the contents in the similarity metrics is the central idea of this study.
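
The combination described above can be illustrated with a minimal Python sketch. It is only one possible reading and not the method of the article: the abstract does not state how the two functions are combined, so a convex mix is assumed here. The names `combined_vector` and `hybrid_similarity`, the weights `alpha` and `beta`, the L2 normalization, and the use of cosine similarity are all assumptions made for illustration. The ontological similarity is taken as a precomputed number in [0, 1].

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_vector(semantic, visual, alpha=0.5):
    """Concatenate semantic and visual features into one vector.

    alpha is an assumed weight balance: the semantic part is scaled
    by alpha and the visual part by (1 - alpha) after normalization.
    """
    sem = l2_normalize(semantic)
    vis = l2_normalize(visual)
    return [alpha * x for x in sem] + [(1 - alpha) * x for x in vis]


def hybrid_similarity(onto_sim, sem_a, vis_a, sem_b, vis_b,
                      alpha=0.5, beta=0.5):
    """Mix an ontological similarity with a feature-vector similarity.

    Assumption: the two functions are combined as a convex mix,
    beta * onto_sim + (1 - beta) * cosine(combined vectors).
    """
    vec_sim = cosine(combined_vector(sem_a, vis_a, alpha),
                     combined_vector(sem_b, vis_b, alpha))
    return beta * onto_sim + (1 - beta) * vec_sim
```

For example, `hybrid_similarity(0.8, [1, 0], [1, 0], [1, 0], [0, 1])` scores two concepts that agree semantically but differ visually. Raising `alpha` gives the semantic features more influence inside the vector, and raising `beta` gives the ontological measure more influence in the final score.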

Problems in programming 2023; 1: 58-65


the big data; the complex content; the semantic similarity of the information; visual features; a descriptor; a descriptor space; key points for an image; a visual vector; ontological similarity metrics; textual models




