Extension of the program synthesis system to analyze large data sets

O.M. Ovdii

Abstract


The analysis of large data sets is a challenging problem in the modern world. This paper presents an extension of ODSP, the online dialog designer of syntactically correct programs, that supports the design and synthesis of programs for analyzing large data sets with the Apache Hadoop software framework. The advantage of the proposed approach is that it uses a method which ensures the syntactic correctness of the designed algorithms and programs. Experiments have been carried out, and the operation of the system is illustrated by designing a program that analyzes a large meteorological data set. The approach is promising for conducting scientific research, in particular in the field of meteorology.
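
The abstract does not show the generated code. As a rough illustration of the kind of Hadoop MapReduce computation such a meteorological analysis involves, the following Python sketch computes the maximum temperature per year, simulating the map, shuffle, and reduce phases in memory. It is not the program produced by ODSP: the simplified "YYYY,temperature" record format, the missing-value marker 9999, and all function names are hypothetical assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical marker for a missing reading (an assumption for this sketch).
MISSING = "9999"


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Map: turn each hypothetical "YYYY,temperature" line into a (year, temperature) pair."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2 or parts[1] == MISSING:
            continue  # skip malformed lines and missing readings
        year, raw_temp = parts
        try:
            yield year, float(raw_temp)
        except ValueError:
            continue  # skip non-numeric temperatures


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle: group values by key and sort the keys, imitating Hadoop before the reduce step."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce: keep the maximum temperature observed for each year."""
    return {year: max(temps) for year, temps in grouped.items()}


def max_temperature_by_year(lines: Iterable[str]) -> Dict[str, float]:
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = [
        "1950,0.0",
        "1950,22.0",
        "1950,-1.1",
        "1949,11.1",
        "1949,7.8",
        "1949,9999",
        "malformed line",
    ]
    print(max_temperature_by_year(sample))  # {'1949': 11.1, '1950': 22.0}
```

On a Hadoop cluster, the map and reduce functions would instead run as separate tasks over distributed input splits, and the framework itself would perform the grouping and key sorting that `shuffle` imitates here.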

Problems in programming 2018; 2-3: 068-074


Keywords


design and synthesis of programs; distributed computing; Big Data; MapReduce; Hadoop





DOI: https://doi.org/10.15407/pp2018.02.068
