Loop Block Profiling with Performance Prediction
Mohsin Khan, Maaz Ahmed, Waseem Ahmed, Rashid Mehmood, Abdullah Algarni, Aiiad Albeshri, Iyad Katib "Loop Block Profiling with Performance Prediction". International Journal of Computer Trends and Technology (IJCTT) V47(4):199-204, May 2017. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract -
With the increase in the complexity of High Performance Computing (HPC) systems, the complexity of applications has increased as well. To achieve better performance by effectively exploiting the parallelism offered by HPC architectures, we need to identify and analyze various parameters of a program, such as the code hotspot (kernel) and its execution time. Statistics suggest that a program typically spends 90% of its execution time in less than 10% of its code. If we could optimize even a small portion of that 10% of the code which accounts for 90% of the execution time, we would have a high probability of achieving better performance. We must therefore find the bottleneck, that is, the part of the code that takes a long time to run, usually called the hotspot. Profiling answers the question of which portions of the code should be optimized or parallelized to achieve better performance. In this research work we develop a light-weight profiler that identifies which portion of the code is the hotspot and estimates the maximum speedup that could be achieved if the hotspot were parallelized.
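The speedup estimate described above can be illustrated with Amdahl's law, which bounds the overall speedup by the fraction of run time the hotspot consumes. The sketch below is purely illustrative; the function name and the example fractions are assumptions for demonstration, not the profiler's actual prediction model.

```python
# Illustrative sketch of speedup estimation via Amdahl's law; the
# function name and example values are assumptions, not the paper's
# actual profiler model.

def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when `parallel_fraction` of the run time
    is parallelized across `workers`, per Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# A hotspot consuming 90% of run time caps the overall speedup at 10x,
# no matter how many workers are available.
print(amdahl_speedup(0.9, 8))      # speedup with 8 workers
print(amdahl_speedup(0.9, 10**9))  # approaches the 10x ceiling
```

This is why the 90/10 observation matters: even with unlimited parallel resources, the remaining 10% serial portion limits the maximum achievable speedup to 10x.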
Keywords
Profiling, Loop Block Profile, Code Analysis, Performance Prediction, Speedup Estimation.