Optimization of large datasets processing in cluster systems

E.V. Nazarenko; V.G. Tulchinsky; P.G. Tulchinsky

Abstract

In massive dataset processing tasks, file storage usually proves to be the bottleneck. This paper examines the effect of data compression on computation speed. Based on a previously built model, estimates of the minimal execution time are obtained that account for the pack/unpack time and the data compression ratio. The estimates have been verified on the example of compressing time cubes to accelerate the seismic migration procedure.
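The trade-off named in the abstract can be illustrated with a toy calculation. This is only a sketch and not the authors' model, which is not reproduced on this page. It assumes that reading and unpacking do not overlap, that storage bandwidth is constant, and that the compression ratio is the original size divided by the compressed size. All function and parameter names are hypothetical.

```python
# Toy break-even estimate for reading compressed data from shared storage.
# Assumption: read and unpack happen one after the other (no overlap).
# This is NOT the model from the paper, only an illustration of the idea.

def read_time_plain(size_bytes, storage_bw):
    """Seconds to read uncompressed data at storage_bw bytes per second."""
    return size_bytes / storage_bw


def read_time_compressed(size_bytes, storage_bw, ratio, unpack_bw):
    """Seconds to read the compressed data and then unpack it.

    ratio     -- original size / compressed size (assumed > 0)
    unpack_bw -- unpack throughput in bytes of output per second
    """
    transfer = size_bytes / (ratio * storage_bw)
    unpack = size_bytes / unpack_bw
    return transfer + unpack


def compression_pays_off(storage_bw, ratio, unpack_bw):
    """True when the compressed path is faster per byte than the plain path."""
    return 1.0 / (ratio * storage_bw) + 1.0 / unpack_bw < 1.0 / storage_bw
```

Under these assumptions, compression helps only when the unpack throughput is high enough relative to the storage bandwidth; a slow decompressor can cancel the gain from transferring less data.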

Problems in programming 2010; 2-3: 149-154

Full Text:

PDF (Russian)
