Performance analysis of massively parallel programs for graphics processing units

D.V. Rahozin


Any modern graphics processing unit (GPU, or graphics card) is a good platform for running massively parallel programs. Still, tools for observing and measuring the performance characteristics of GPU-based software are lacking. We argue that, because of the complex memory hierarchy and the thousands of execution threads, performance issues all come down to efficient use of the graphics card memory hierarchy. We propose using the GPGPUSim simulator, previously used mostly to validate graphics card architectures, to validate the performance of CUDA-based programs. We provide examples that show how to use simulation for performance analysis of massively parallel programs.

Problems in programming 2022; 3-4: 51-58


graphics processing unit; software performance; massive parallelism; simulation; software performance model






