I/O benchmarking of data intensive applications

Olga Mordvinova, Thomas Ludwig, Christian Bartholomä

Abstract


The increasing computerization of society over the last decade has led to ever larger volumes of data being stored around the world. The need to handle and store these massive amounts of data, which arise from sources as diverse as scientific records, web pages, and social networks, has created a new class of applications: data-intensive applications. Such applications are usually designed around specific requirements, so one of the most challenging questions in building them is the choice of an appropriate back-end. I/O benchmarking tools can ease this decision. However, despite the wide variety of such tools, there is a lack of portable and easily adaptable benchmarks that reflect real application behavior. The programmable I/O benchmark Parabench aims to close this gap. Its input is based on access patterns, which can be adapted to the application for which the system is to be used. This work concentrates on Parabench's ability to mimic real applications. We describe its support for MPI-I/O and POSIX I/O and present an example of modeling a data-intensive application from the field of business intelligence.

Problems in Programming 2010; 2-3: 107-114


Keywords


I/O Benchmarking; Data-Intensive Applications; Access Patterns; POSIX I/O; MPI-I/O



