Technological solutions for intelligent analysis of Big Data. Programming languages

I.Y. Grishanova, J.V. Rogushina

Abstract


We consider the problems that arise when data analysis methods are applied to Big Data. Modern programming languages are analyzed in terms of how efficiently they can be used to develop machine learning (ML) tools focused on Big Data.

We analyze the main types of machine learning tasks associated with acquiring information from Big Data that are useful in practice. This analysis shows that these tasks are solved by statistical processing methods and by training neural networks. It is therefore advisable for software tools aimed at solving these problems to include appropriate libraries.

The large number of ML algorithms, which target different types of input information and different representations of the resulting knowledge, indicates the need for specialized machine learning libraries that implement these algorithms. Another important factor in choosing a tool environment for solving ML tasks on Big Data is processing speed: this requirement stems from the large volumes of data to be processed.

External services for ML and Big Data processing, offered by Google, Amazon, and others, greatly simplify the development of intelligent data analysis tools in those programming languages that support the use of such services.

Thus, Python is the most suitable programming language for creating experimental prototypes that combine modern approaches to machine learning with elements of artificial intelligence (AI). This conclusion is also confirmed by the results of worldwide surveys of developers in the field of data science. However, the other programming languages analyzed in this paper can be more useful under certain additional conditions: for example, C++ for projects oriented toward specific software and hardware, or Java and Scala for corporate applications.

Problems in programming 2018; 4: 45-58

 


 


Keywords


Big Data; intelligent data analysis; machine learning

DOI: https://doi.org/10.15407/pp2018.04.045
