Resource allocation for task-level speculative scientific applications: A proof of concept using Parallel Trajectory Splicing

A Garmon and V Ramakrishnaiah and D Perez, PARALLEL COMPUTING, 112, 102936 (2022).

DOI: 10.1016/j.parco.2022.102936

The constant increase in parallelism available on large-scale distributed computers poses major scalability challenges to many scientific applications. A common strategy to improve scalability is to express algorithms in terms of independent tasks that can be executed concurrently on a runtime system. In this manuscript, we consider a generalization of this approach where task-level speculation is allowed. In this context, each task carries a probability that corresponds to the likelihood that the output of the speculative task will be consumed as part of the larger calculation. We consider the problem of optimally allocating resources to each of the possible tasks so as to maximize the total expected computational throughput. The power of this approach is demonstrated by analyzing its application to Parallel Trajectory Splicing, a massively parallel long-time-dynamics method for atomistic simulations.
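
To make the allocation problem concrete, the following is a minimal illustrative sketch, not the method from the paper. It assumes that the expected throughput can be written as the sum over tasks of p_i * f(n_i), where p_i is the probability that task i is consumed, n_i is the number of workers assigned to it, and f is a concave throughput curve with diminishing returns. The function names `allocate` and `expected_throughput`, and the square-root throughput curve, are hypothetical choices made only for this example. Under these assumptions, handing out workers one at a time to the task with the largest expected marginal gain maximizes the objective.

```python
import heapq
import math


def allocate(probabilities, total_workers, throughput=math.sqrt):
    """Greedily assign workers to speculative tasks.

    Assumption (not from the paper): the throughput of a task with n workers
    is throughput(n), a concave non-decreasing function, and the objective is
    sum(p_i * throughput(n_i)).
    """
    alloc = [0] * len(probabilities)
    # Max-heap (via negated keys) of the expected gain from giving
    # each task its next worker.
    heap = [(-p * (throughput(1) - throughput(0)), i)
            for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    for _ in range(total_workers):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        n = alloc[i]
        # Expected marginal gain of the following worker for task i.
        gain = probabilities[i] * (throughput(n + 1) - throughput(n))
        heapq.heappush(heap, (-gain, i))
    return alloc


def expected_throughput(probabilities, alloc, throughput=math.sqrt):
    """Expected throughput of an allocation under the assumed model."""
    return sum(p * throughput(n) for p, n in zip(probabilities, alloc))


if __name__ == "__main__":
    probs = [0.9, 0.5, 0.1]  # hypothetical consumption probabilities
    plan = allocate(probs, total_workers=10)
    print(plan, expected_throughput(probs, plan))
```

In this toy model, tasks that are more likely to be consumed receive more workers, but diminishing returns keep the allocation from collapsing onto the single most probable task.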
