An Efficient Lossless Compression Algorithm for Trajectories of Atom Positions and Volumetric Data

M Brehm and M Thomas, JOURNAL OF CHEMICAL INFORMATION AND MODELING, 58, 2092-2107 (2018).

DOI: 10.1021/acs.jcim.8b00501

We present a newly developed, highly efficient lossless compression algorithm for trajectories of atom positions and volumetric data. The algorithm works in two steps. In the first step, efficient polynomial extrapolation schemes reduce the information entropy of the data by exploiting both spatial and temporal continuity. In the second step, the data pass through a series of transformations (Burrows-Wheeler, move-to-front, run-length encoding), and the resulting stream is compressed with multitable canonical Huffman coding. Our approach reaches a compression ratio of around 15:1 for typical position trajectories in the XYZ format. For volumetric data trajectories in Gaussian Cube format (such as electron density), it reaches a compression ratio of around 35:1, which gives by far the smallest file size of all formats compared here. At the same time, compression and decompression remain reasonably fast for everyday use. The user can select the precision of the data. For storage of the compressed data, we introduce the BQB file format, which is very robust, flexible, and efficient. In contrast to most archiving formats, it allows fast random access to individual trajectory frames. Our method is implemented in C++ and provided as free software under the GNU LGPL. It has been included in the TRAVIS program package but is also available as a stand-alone tool and as a library ("libbqb") for use in other projects.
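The two steps described above can be illustrated with a minimal Python sketch. It is not the libbqb implementation or its API: the function names are hypothetical, the coordinates are assumed to be already quantized to integers at the user-selected precision, the predictor is a plain polynomial extrapolation along the time axis only, and the Burrows-Wheeler transform is a naive version that sorts all rotations. Run-length encoding and Huffman coding are omitted.

```python
from math import comb


def extrapolate(history, order):
    """Predict the next sample by polynomial extrapolation (assumed scheme).

    For equally spaced samples, the degree-n polynomial through the last
    n+1 points, evaluated one step ahead, equals
    sum_{k=1}^{n+1} (-1)^(k+1) * C(n+1, k) * x[t-k].
    The order is reduced while fewer than order+1 samples are available.
    """
    n = min(order, len(history) - 1)
    if n < 0:
        return 0  # no history yet: predict zero
    return sum((-1) ** (k + 1) * comb(n + 1, k) * history[-k]
               for k in range(1, n + 2))


def residuals(samples, order):
    """Step 1: replace each integer sample by its prediction error."""
    return [x - extrapolate(samples[:t], order) for t, x in enumerate(samples)]


def reconstruct(res, order):
    """Invert residuals(); exact because only integer arithmetic is used."""
    samples = []
    for r in res:
        samples.append(r + extrapolate(samples, order))
    return samples


def bwt(data):
    """Naive Burrows-Wheeler transform of a list of symbols.

    Returns the last column of the sorted rotation matrix and the row
    index of the original sequence (needed to invert the transform).
    """
    n = len(data)
    rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = [data[(i - 1) % n] for i in rotations]
    return last, rotations.index(0)


def move_to_front(symbols, alphabet):
    """Move-to-front: emit each symbol's current table index, then move it to the front."""
    table = list(alphabet)
    out = []
    for s in symbols:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


if __name__ == "__main__":
    # A smooth (quadratic) integer coordinate over eight frames.
    track = [t * t for t in range(8)]
    res = residuals(track, order=2)
    print(res)  # [0, 1, 2, 0, 0, 0, 0, 0]
    assert reconstruct(res, order=2) == track

    # Step 2 on a toy symbol stream.
    last, row = bwt(list("banana"))
    print("".join(last), row)
    print(move_to_front(last, "abn"))
```

In this toy case the smooth input collapses to residuals that are mostly zero, which is the kind of low-entropy stream that the subsequent transformations and an entropy coder can compress well.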
