An automated and portable method for selecting an optimal GPU frequency

G Ali and M Side and S Bhalachandra and NJ Wright and Y Chen, FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 149, 71-88 (2023).

DOI: 10.1016/j.future.2023.07.011

Power consumption poses a significant challenge in current and emerging graphics processing unit (GPU) enabled high-performance computing systems. In modern GPUs, dynamic voltage and frequency scaling (DVFS) appears to be a reliable control for regulating power consumption and performance. However, the DVFS design space is large; hence, brute-force approaches to selecting the optimal frequency are infeasible. Furthermore, no single frequency can be universally optimal for applications with varying computational intensities. Thus, both the application's complexity and the wide range of available frequency settings make it challenging to select the optimal frequency configuration for a given GPU workload. To that end, this paper proposes a systematic approach that consists of three steps. First, a feature characterization study identifies the fine-grain GPU utilization metrics that influence the power consumption and execution time of a given workload. Second, to understand the performance, power, and energy consumption behaviors of a workload across the GPU's DVFS design space, we derived analytical power and performance models using the identified fine-grain features. It is shown that the same set of GPU utilization metrics can estimate both the power consumption and execution time while being agnostic of changes to frequency and input sizes. Applying a power control with the single objective of reducing power may cause performance degradation, leading to more energy consumption. Third, a multi-objective approach is therefore proposed to select the optimal GPU DVFS configuration for a workload, one that reduces power consumption with negligible degradation in performance. The evaluation was conducted using SPEC ACCEL benchmarks and three real applications (NAMD, LAMMPS, and LSTM) on NVIDIA GV100, GA100, and AMD MI210 GPUs. On average, real applications showed 29.6% energy savings with a performance loss of 5.2% on GA100 and 22.6% energy savings with a performance loss of 4.7% on GV100.
Moreover, the proposed models are portable across real applications, GPU architectures, and vendors, and require metric collection at only the default frequency rather than at all supported DVFS configurations. Additionally, we compared our models with static models based on GPU assembly instructions (PTX). The results revealed a significant reduction in the average error rates, from 19.7% to 3.1% for power models and from 29.4% to 5.2% for performance models. © 2023 Elsevier B.V. All rights reserved.
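
The frequency-selection idea in the abstract can be illustrated with a minimal Python sketch. It assumes that power and execution time have already been predicted for each candidate frequency, and it picks the lowest-energy frequency whose predicted slowdown stays within a tolerance. The names `Prediction` and `select_frequency`, the 5% tolerance, and all numbers are hypothetical; this constrained-minimum rule is only one simple way to combine the two objectives and is not necessarily the objective function used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Predicted behavior of one workload at one GPU core frequency."""
    freq_mhz: int
    power_w: float   # predicted average power (hypothetical value)
    time_s: float    # predicted execution time (hypothetical value)

    @property
    def energy_j(self) -> float:
        # Energy is average power multiplied by execution time.
        return self.power_w * self.time_s


def select_frequency(predictions, default_freq_mhz, max_slowdown=0.05):
    """Return the lowest-energy prediction within the slowdown tolerance.

    The slowdown is measured against the prediction at the default frequency,
    so the default itself is always a feasible candidate.
    """
    baseline = next(p for p in predictions if p.freq_mhz == default_freq_mhz)
    allowed_time = baseline.time_s * (1.0 + max_slowdown)
    feasible = [p for p in predictions if p.time_s <= allowed_time]
    return min(feasible, key=lambda p: p.energy_j)


# Made-up predictions for illustration only; these are not measured data.
predictions = [
    Prediction(freq_mhz=1410, power_w=300.0, time_s=10.0),
    Prediction(freq_mhz=1200, power_w=240.0, time_s=10.4),
    Prediction(freq_mhz=1000, power_w=200.0, time_s=11.5),
    Prediction(freq_mhz=800, power_w=170.0, time_s=14.0),
]

best = select_frequency(predictions, default_freq_mhz=1410)
print(best.freq_mhz, best.energy_j)
```

With these made-up inputs, the 1000 MHz setting has the lowest predicted energy but exceeds the 5% slowdown tolerance, so the sketch selects 1200 MHz instead. This mirrors the abstract's point that minimizing power alone can cost too much performance.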
