A generalized statistics-based model for predicting network-induced variability

S Chunduri and E Jennings and K Harms and C Knight and S Parker, PROCEEDINGS OF 2019 IEEE/ACM PERFORMANCE MODELING, BENCHMARKING AND SIMULATION OF HIGH PERFORMANCE COMPUTER SYSTEMS (PMBS 2019), 59-72 (2019).

DOI: 10.1109/PMBS49563.2019.00013

Shared network topologies, such as dragonfly, subject applications to unavoidable inter-job interference arising from congestion on shared network links. Quantifying the impact of congestion is essential for effectively assessing and comparing application runtimes. We use network performance counter-based metrics for this quantification. We claim and demonstrate that by using a local view of congestion captured through the counters monitored during a given application run, we can accurately determine the run conditions and thereby estimate the impact on the application's performance. We construct a predictive model trained on several applications with distinctive communication characteristics, run under production system conditions; the model predicts congestion effects with 91% accuracy.
