DRCCTPROF: A Fine-Grained Call Path Profiler for ARM-Based Clusters
QD Zhao and X Liu and M Chabbi, PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20) (2020).
DOI: 10.1109/SC41405.2020.00034
ARM is an attractive CPU architecture for exascale systems because of its energy efficiency. As a recent entry into the HPC paradigm, ARM lags in its software stack, especially in the performance tooling aspect. Notably, there is a lack of fine-grained measurement tools to analyze fully optimized HPC binary executables on ARM processors. In this paper, we introduce DRCCTPROF - a fine-grained call path profiling framework for binaries running on ARM architectures. The unique ability of DRCCTPROF is to obtain full calling context at any and every machine instruction that executes, which provides more detailed diagnostic feedback for performance optimization and correctness tools. Furthermore, DRCCTPROF not only associates any instruction with source code along the call path, but also associates memory access instructions back to the constituent data object. Finally, DRCCTPROF incurs moderate overhead and provides a compact view to visualize the profiles collected from parallel executions.
Return to Publications page