Using metadata to resolve big data problems

O.V. Zakharova


Today, the volume of data used by application systems is growing exponentially and has reached a size that traditional systems cannot process; hence the term "big data". The main problems of such data sets stem not only from their volume but also from the variety and complexity of the information they contain. Thus, as data volumes and the number of big data initiatives grow, metadata becomes a top priority for the success of big data projects. Enterprises understand that exploiting the full operational potential of machine learning, deep learning, and artificial intelligence requires raw data to be supplemented with metadata. The purpose of this work is therefore to analyze the role of metadata in solving big data problems and to determine the main categories of data to be annotated with metadata and the main types of metadata used for this. Today, metadata is a means of classifying, organizing, and characterizing data or its content. According to the role metadata plays in solving big data problems, NISO identifies four main types of metadata: administrative, descriptive, structural, and markup languages. Different types of metadata can be applied in specific ways to effectively solve problems of data management, search, integration, etc. A separate issue is how metadata is created or automatically generated, since manual creation of metadata is laborious and its volume is often several times larger than that of the data itself.




big data analytics; big data management; metadata; annotation; machine learning; Hadoop; metadata classification; structural metadata; descriptive metadata; administrative metadata; data integration; ontology; linked data; data semantics

