Modular Software for Generating and Modeling Diverse Polymer Databases

A Santana-Bonilla and R Lopez-Rios de Castro and PK Sun and RM Ziolek and CD Lorenz, JOURNAL OF CHEMICAL INFORMATION AND MODELING, 63, 3761-3771 (2023).

DOI: 10.1021/acs.jcim.3c00081

Machine learning methods offer the opportunity to design new functional materials on an unprecedented scale; however, building the large, diverse databases of molecules on which to train such methods remains a daunting task. Automated computational chemistry modeling workflows are therefore becoming essential tools in this data-driven hunt for new materials with novel properties, since they offer a means by which to create and curate molecular databases without requiring significant levels of user input. This ensures that well-founded concerns regarding data provenance, reproducibility, and replicability are mitigated. We have developed a versatile and flexible software package, PySoftK (Python Soft Matter at King's College London) that provides flexible, automated computational workflows to create, model, and curate libraries of polymers with minimal user intervention. PySoftK is available as an efficient, fully tested, and easily installable Python package. Key features of the software include the wide range of different polymer topologies that can be automatically generated and its fully parallelized library generation tools. It is anticipated that PySoftK will support the generation, modeling, and curation of large polymer libraries to support functional materials discovery in the nanotechnology and biotechnology arenas.
