The concept and evaluation of big data quality in the semantic environment

A.V. Novitsky


Big data refers to large-volume, complex data sets drawn from multiple autonomous sources and characterized by continuous growth. Driven by the rapid development of networks, data collection and storage capabilities are expanding rapidly in all fields of science and technology. Evaluating data quality is a difficult task in the context of big data, and the speed of semantic reasoning over the data depends directly on its quality. Appropriate strategies are needed to assess data quality given the huge volume of data and the speed at which it is generated. Managing a large volume of heterogeneous, distributed data requires defining and continuously updating metadata that describe various aspects of the data's semantics and quality, such as conformance to a metadata schema, provenance, reliability, accuracy and other properties. This article examines the problem of evaluating the quality of big data in the semantic environment. It gives a definition of big data and its semantics, followed by a brief overview of the theory of quality assessment. A model whose components make it possible to form and specify quality metrics has already been developed; it comprises quality characteristics, a quality metric, a quality system and a quality policy. A quality model for big data that defines the main components and requirements for data evaluation has also already been proposed; it highlights evaluation components such as accessibility, relevance, popularity, compliance with the standard and consistency, among others. The article demonstrates the problem of inference complexity and considers approaches to faster semantic inference through materialization and through division of the knowledge base into two components expressed in different dialects of description logic. Materialization of big data makes it possible to significantly speed up the processing of information-extraction queries. The article also demonstrates how the quality of metadata affects materialization. The proposed knowledge base model improves reasoning speed.
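
As a rough illustration of why materialization speeds up queries, the following minimal Python sketch precomputes every class membership implied by a subclass hierarchy, so that a later query is a plain lookup and needs no reasoning at query time. It is an assumption-laden toy and not the knowledge base model proposed in the article: the function name `materialize`, the pair-based input format and the sample class and individual names are all hypothetical, and only subclass-style inference is covered.

```python
from collections import defaultdict


def materialize(subclass_of, instance_of):
    """Precompute all class memberships implied by a subclass hierarchy.

    subclass_of: iterable of (child_class, parent_class) pairs (hypothetical format)
    instance_of: iterable of (individual, asserted_class) pairs (hypothetical format)
    Returns a mapping from class name to the set of its asserted and inferred members.
    """
    parents = defaultdict(set)
    for child, parent in subclass_of:
        parents[child].add(parent)

    def ancestors(cls):
        # Depth-first walk up the hierarchy; `seen` guards against cycles.
        seen, stack = set(), [cls]
        while stack:
            for parent in parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    members = defaultdict(set)
    for individual, cls in instance_of:
        members[cls].add(individual)          # asserted fact
        for ancestor in ancestors(cls):
            members[ancestor].add(individual)  # inferred fact, stored up front
    return members


# Hypothetical sample data, for illustration only.
subclass_of = [("Dataset", "Resource"), ("LinkedDataset", "Dataset")]
instance_of = [("dbpedia", "LinkedDataset"), ("report.pdf", "Resource")]

index = materialize(subclass_of, instance_of)
print(sorted(index["Resource"]))  # ['dbpedia', 'report.pdf']
```

The sketch also hints at the dependence on metadata quality: if a subclass assertion is missing or wrong, the error is copied into every precomputed answer that relies on it.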

Problems in programming 2022; 3-4: 260-270


big data; complex data sets






