The Viability of Using Compression to Decrease Message Log Sizes

KB Ferreira and R Riesen and D Arnold and D Ibtesham and R Brightwell, EURO-PAR 2012: PARALLEL PROCESSING WORKSHOPS, 7640, 484-493 (2013).

Fault tolerance and its associated overheads are of great concern for current and future extreme-scale systems. The dominant mechanism used today, coordinated checkpoint/restart, places great demands on the I/O system and requires frequent synchronization. Uncoordinated checkpointing with message logging addresses many of these limitations at the cost of increasing the storage needed to hold message logs. These storage requirements are critical to the scalability of extreme-scale systems. In this paper, we investigate the viability of using standard compression algorithms to reduce message log sizes for a number of key high-performance computing workloads. Using these workloads we show that, while not a universal solution for all applications, compression has the potential to significantly reduce message log sizes for a great number of important workloads.
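
As a rough illustration of the kind of measurement the abstract describes, the following Python sketch compares how much three standard-library compressors shrink a message log. It is not the authors' tooling or methodology: the log layout (a 16-byte header followed by a 1 KiB payload), the helper names make_synthetic_log and compression_ratios, and the choice of zlib, bz2, and lzma are all assumptions made for this example. The synthetic payload is highly repetitive, so the ratios it yields are far better than real application traffic should be expected to give.

```python
import bz2
import lzma
import struct
import zlib


def make_synthetic_log(n_messages: int = 1000) -> bytes:
    """Build a hypothetical message log (assumed layout, not from the paper).

    Each record is a 16-byte header (source, destination, sequence number,
    payload length) followed by a 1 KiB payload of identical doubles.
    """
    payload = struct.pack("<128d", *([1.0] * 128))  # 128 doubles = 1024 bytes
    records = []
    for seq in range(n_messages):
        header = struct.pack("<IIII", 0, 1, seq, len(payload))
        records.append(header + payload)
    return b"".join(records)


def compression_ratios(log: bytes) -> dict:
    """Return original_size / compressed_size for each compressor."""
    compressors = {
        "zlib": zlib.compress,
        "bz2": bz2.compress,
        "lzma": lzma.compress,
    }
    return {name: len(log) / len(fn(log)) for name, fn in compressors.items()}


log = make_synthetic_log()
ratios = compression_ratios(log)
for name, ratio in ratios.items():
    print(f"{name}: log is {ratio:.1f}x smaller after compression")
```

Replacing the synthetic log with bytes captured from a real application would give a more meaningful picture; payloads that are already dense or random-looking may compress very little, which is consistent with the abstract's caution that compression is not a universal solution.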
