Performance Analysis and Optimization of Molecular Dynamics Simulation on Godson-T Many-core Processor
L Peng and A Nakano and GM Tan and P Vashishta and DR Fan and H Zhang and RK Kalia and FL Song, PROCEEDINGS OF THE 2011 8TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF 2011), 32 (2011).
DOI: 10.1145/2016604.2016643
Molecular dynamics (MD) simulation has broad applications, but its irregular memory-access pattern makes performance optimization a challenge. This paper presents a joint application/architecture study to enhance on-chip parallelism of MD on a Godson-T-like many-core architecture. First, a preprocessing step leveraging an adaptive divide-and-conquer framework is designed to exploit locality through the memory hierarchy with software-controlled memory. Then we propose three incremental optimization strategies: (1) a novel data layout that reorganizes linked-list cell data structures to improve data locality; (2) an on-chip locality-aware parallel algorithm to enhance data reuse; and (3) a pipelining algorithm to hide latency to shared memory. Experiments on a Godson-T simulator exhibit a strong-scaling parallel efficiency of 0.99 on 64 cores, which is confirmed by an FPGA emulator. Detailed analysis shows that optimizations utilizing architectural features to maximize data locality and to enhance data reuse benefit scalability most. Furthermore, a simple performance model suggests that the optimization scheme is likely to scale well toward exascale. Certain architectural features are found essential for these optimizations, which could guide future hardware developments.
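
For readers unfamiliar with the linked-list cell structure mentioned in the abstract, the sketch below illustrates the general idea in one dimension: atoms are binned into cells through `head`/`nxt` index arrays, and a cell-sorted ordering then places each cell's atoms contiguously in memory. This is only an illustration of the generic technique, not the paper's actual data layout; the function names `build_cell_lists` and `cell_sorted_order`, the 1D box, and the sample values are all assumptions made for this example.

```python
def build_cell_lists(positions, box, cell_size):
    """Bin atoms into cells using the classic linked-list scheme.

    head[c] is the index of the first atom in cell c (-1 if empty);
    nxt[i] is the next atom in the same cell as atom i (-1 ends the chain).
    Illustrative 1D version; names and layout are assumptions.
    """
    ncell = max(1, int(box // cell_size))
    width = box / ncell
    head = [-1] * ncell
    nxt = [-1] * len(positions)
    for i, x in enumerate(positions):
        c = min(int(x / width), ncell - 1)  # clamp atoms sitting on the upper edge
        nxt[i] = head[c]                    # push atom i onto the front of cell c
        head[c] = i
    return head, nxt


def cell_sorted_order(head, nxt):
    """Walk every cell chain and return a contiguous, cell-by-cell ordering.

    order lists atom indices cell by cell; start[c]:start[c+1] is the
    slice of order that belongs to cell c.
    """
    order, start = [], []
    for first in head:
        start.append(len(order))
        i = first
        while i != -1:
            order.append(i)
            i = nxt[i]
    start.append(len(order))
    return order, start


positions = [0.1, 3.5, 0.9, 2.2, 3.9]
head, nxt = build_cell_lists(positions, box=4.0, cell_size=1.0)
order, start = cell_sorted_order(head, nxt)
sorted_positions = [positions[i] for i in order]  # atoms of one cell are now adjacent
```

Traversing `head`/`nxt` directly jumps around memory, whereas iterating over `sorted_positions` with the `start` offsets touches each cell's atoms in one contiguous run, which is the kind of locality improvement a reorganized layout aims for.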