Means and methods of unstructured data analysis

J.V. Rogushina

Abstract


An analysis of current trends in the widespread use of unstructured text data and in the development of software tools for processing it shows that this research direction is highly relevant and that intelligent information systems are needed for such processing. A significant part of Big Data consists of unstructured texts, which require further development of specialized Text Mining methods and machine learning algorithms. Unstructured data consisting of natural language text does not, in the general case, have a predetermined data model. Its ambiguity, heterogeneity and context dependence considerably complicate the classification of documents, the identification of their components and the automated extraction of user-oriented knowledge from their content, while the large volume and dynamism of such data make efficient manual processing infeasible. The means and methods of data structuring and their various software implementations are considered. The prospects of using background knowledge for such structuring are analyzed. The feasibility of applying W3C standards such as RDF and OWL is substantiated. The use of semantic Wiki technologies for the development of distributed information resources simplifies the structuring of natural language text by users and also provides a source of background knowledge for the analysis of arbitrary texts from the corresponding domains. The models and methods proposed in this work make it possible to improve this process.

Problems in programming 2019; 1: 57-77


Keywords


unstructured data; ontology; Text Mining; Semantic Web; Wiki





DOI: https://doi.org/10.15407/pp2019.01.057
