Behavior of Linear and Nonlinear Dimensionality Reduction for Collective Variable Identification of Small Molecule Solution-Phase Reactions
HM Le and S Kumar and N May and E Martinez-Baez and R Sundararaman and B Krishnamoorthy and AE Clark, JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 18, 1286-1296 (2022).
DOI: 10.1021/acs.jctc.1c00983
Identifying collective variables (CVs) for chemical reactions is essential to reduce the 3N-dimensional energy landscape to lower dimensional basins and barriers of interest. However, in condensed phase processes, the nonmeaningful motions of bulk solvent often overwhelm the ability of dimensionality reduction methods to identify the correlated motions that underpin collective variables. Yet solvent can play important indirect or direct roles in reactivity, and much can be lost through treatments that remove or dampen solvent motion. This has been amply demonstrated within principal component analysis (PCA), although less is known about the behavior of nonlinear dimensionality reduction methods, e.g., uniform manifold approximation and projection (UMAP), that have recently come into use. The latter present an interesting alternative to linear methods, though often at the expense of interpretability. This work presents distance-attenuated projection methods of atomic coordinates that facilitate the application of both PCA and UMAP to identify collective variables in the presence of explicit solvent and, further, to identify the specific solvent molecules that participate in chemical reactions. The performance of both methods is examined in detail for two reactions where the explicit solvent plays very different roles within the collective variables. When applied to raw molecular dynamics data in solution, both PCA and UMAP representations are dominated by bulk solvent motions. On the other hand, when applied to data preprocessed by our attenuated projection methods, both PCA and UMAP identify the appropriate collective variables (though varying sensitivity is observed due to the presence of explicit solvent that results from the projection method). Importantly, this approach allows identification of specific solvent molecules that are relevant to the CVs and their importance.
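The general idea of attenuating atomic coordinates by distance before dimensionality reduction can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the attenuation function, so the sigmoidal switching function, the parameters `r0` and `width`, and all function names below are assumptions chosen only for illustration. PCA is done here with a plain NumPy SVD.

```python
import numpy as np

def attenuate_coordinates(coords, center, r0=4.0, width=1.0):
    """Scale each atom's displacement from a reference point by a weight
    that decays smoothly with distance.

    coords : (n_frames, n_atoms, 3) atomic positions
    center : (n_frames, 3) reference point, e.g. the reactive site
    r0, width : hypothetical cutoff and smoothness of the switching function
    """
    disp = coords - center[:, None, :]
    dist = np.linalg.norm(disp, axis=-1)
    # Assumed sigmoidal switching function: close to 1 well inside r0,
    # close to 0 well outside it.
    weights = 1.0 / (1.0 + np.exp((dist - r0) / width))
    return disp * weights[..., None], weights

def pca_project(features, n_components=2):
    """PCA via SVD of the mean-centered, flattened feature matrix."""
    X = features.reshape(len(features), -1)
    X = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    ratio = S**2 / np.sum(S**2)
    components = Vt[:n_components]
    return X @ components.T, components, ratio[:n_components]

def atom_importance(components, n_atoms):
    """Squared loading per atom for each component (rows sum to 1)."""
    loadings = components.reshape(len(components), n_atoms, 3)
    return np.sum(loadings**2, axis=-1)

if __name__ == "__main__":
    # Synthetic stand-in for a trajectory; real data would come from MD output.
    rng = np.random.default_rng(0)
    coords = rng.normal(scale=5.0, size=(200, 30, 3))
    center = np.zeros((200, 3))
    attenuated, _ = attenuate_coordinates(coords, center)
    projection, components, ratio = pca_project(attenuated)
    importance = atom_importance(components, n_atoms=30)
    print(projection.shape, importance.shape)
```

In this sketch, atoms far from the reference point contribute little variance, so they are less likely to dominate the leading components, and the per-atom squared loadings give one rough way to rank which atoms (including individual solvent molecules) matter for a component. The same attenuated feature matrix could instead be passed to a nonlinear method such as UMAP, for example through the `umap-learn` package.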