Using metadata to resolve big data problems

O.V. Zakharova

Abstract


Today, the volumes of data used by application systems are growing exponentially and have reached sizes that traditional systems cannot process; this gave rise to the term "big data". The main problems of such data sets stem not only from their volume but also from the variety and complexity of the information they contain. Thus, as data volumes and the number of big data initiatives grow, metadata becomes a top priority for the success of big data projects. Enterprises understand that exploiting the full operational potential of machine learning, deep learning, and artificial intelligence requires raw data to be supplemented with metadata. The purpose of this work is therefore to analyze the effect of metadata on solving big data problems and to determine the main categories of data to be annotated with metadata and the main types of metadata used for this. Today, metadata is a means of classifying, organizing, and characterizing data or its content. Depending on the role it plays in solving big data problems, NISO identifies four main types of metadata: administrative, descriptive, structural, and markup languages. Different types of metadata can be used in specific ways to effectively solve problems of data management, search, integration, and so on. A separate issue is how metadata is created or automatically generated, since manual creation of metadata is laborious and its volume is often several times larger than that of the data itself.
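As an illustration of how the metadata types named above might be attached to raw data and then used for search, the following is a minimal sketch in Python. It is not taken from the paper: the names `MetadataType`, `AnnotatedRecord`, `annotate`, `find`, and `search`, as well as the sample keys and values, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataType(Enum):
    """The four metadata types the abstract attributes to NISO."""
    ADMINISTRATIVE = "administrative"
    DESCRIPTIVE = "descriptive"
    STRUCTURAL = "structural"
    MARKUP = "markup languages"


@dataclass
class AnnotatedRecord:
    """A raw payload plus metadata grouped by type (hypothetical layout)."""
    payload: bytes
    metadata: dict = field(default_factory=dict)

    def annotate(self, kind, key, value):
        # Store the key/value pair under its metadata type.
        self.metadata.setdefault(kind, {})[key] = value

    def find(self, kind, key):
        # Return the stored value, or None if the record lacks it.
        return self.metadata.get(kind, {}).get(key)


def search(records, kind, key, value):
    """Select records by metadata alone, without reading the payloads."""
    return [r for r in records if r.find(kind, key) == value]


# Example data (made up for illustration only).
genome = AnnotatedRecord(payload=b"raw sequencing output")
genome.annotate(MetadataType.DESCRIPTIVE, "subject", "genomics")
genome.annotate(MetadataType.ADMINISTRATIVE, "format", "text/plain")

log = AnnotatedRecord(payload=b"raw server log")
log.annotate(MetadataType.DESCRIPTIVE, "subject", "operations")

matches = search([genome, log], MetadataType.DESCRIPTIVE, "subject", "genomics")
```

In this sketch the search never inspects `payload`, which is the point the abstract makes about metadata supporting management and search over data that is too large or too varied to process directly.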

 

 


Keywords


big data analytics; big data management; metadata; annotation; machine learning; Hadoop; metadata classification; structural metadata; descriptive metadata; administrative metadata; data integration; ontology; linked data; data semantics





DOI: https://doi.org/10.15407/pp2019.02.081
