Machine Learning

High-throughput Simulation and Machine Learning Aided Molecular Discovery

To enable computer-aided chemical discovery, one needs to explore the vast chemical space efficiently. Data-driven models, including emerging machine learning techniques, provide very fast tools for traversing chemical space. However, the accuracy of these models depends on the quality of the first-principles calculation datasets used for model training. Therefore, we need automated workflows that enable high-throughput first-principles simulation of numerous systems at an appropriate level of theory. These workflows will be applied to the discovery of molecules with catalytic or photophysical functionalities.

Automated workflow for explicit-solvent-model calculations

Despite the essential role of solvents in chemistry, the rapid generation of computational datasets of solution-phase molecular properties at the quantum mechanical level of theory has been hampered by the complicated simulation procedure. Open-source software toolkits that automate the setup of high-throughput explicit-solvent quantum chemistry (QC) calculations for arbitrary solutes and solvents have been lacking. We developed AutoSolvate, an open-source toolkit that streamlines the workflow for QC calculations of explicitly solvated molecules.

Related Publications

Eugen Hruska (PD), Ariel Gale, Xiao Huang (U), Fang Liu*, “AutoSolvate: A Toolkit for Automating Quantum Chemistry Design and Discovery of Solvated Molecules,” J. Chem. Phys. 156, 124801 (2022)

Automated workflow for reaction kinetics study in reaction networks

Transition state (TS) search is crucial for understanding reaction kinetics, but it is computationally challenging because the result depends strongly on the conformers of the reactants and products. I have developed an automated toolkit, AutoNEB, for studying reaction pathways in complex reaction networks involving multiple conformers. This workflow avoids the typical biases in TS search and automatically post-processes simulation results to discover the most favorable path. The toolkit has been applied to the reaction network study of the gas-phase Pomeranz-Fritsch synthesis of isoquinoline.
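As a rough illustration of what "discovering the most favorable path" can mean in post-processing, the sketch below picks, from a network of elementary steps, the route whose highest barrier is lowest. This is not AutoNEB's implementation: the function name, the species labels, the barrier values, and the lowest-bottleneck criterion itself are all assumptions made for the example.

```python
import heapq


def most_favorable_path(barriers, start, goal):
    """Return (highest_barrier, path) for the route with the lowest bottleneck.

    barriers: dict mapping (reactant, product) -> barrier of that elementary
    step (hypothetical values, e.g. in kcal/mol). Steps are treated as directed.
    Returns None if the goal cannot be reached.
    """
    # Build an adjacency list from the step dictionary.
    graph = {}
    for (src, dst), barrier in barriers.items():
        graph.setdefault(src, []).append((dst, barrier))

    # Dijkstra-like search where the path cost is the maximum barrier so far.
    best = {start: 0.0}
    heap = [(0.0, start, [start])]
    while heap:
        bottleneck, node, path = heapq.heappop(heap)
        if node == goal:
            return bottleneck, path
        if bottleneck > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, barrier in graph.get(node, []):
            candidate = max(bottleneck, barrier)
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt, path + [nxt]))
    return None


# Hypothetical network: two reactant conformers leading to the same product.
example_barriers = {
    ("R", "R_conf1"): 2.0,
    ("R", "R_conf2"): 3.0,
    ("R_conf1", "I1"): 25.0,
    ("R_conf2", "I1"): 18.0,
    ("R_conf1", "P"): 30.0,
    ("I1", "P"): 15.0,
}
result = most_favorable_path(example_barriers, "R", "P")
print(result)
```

In this toy network the route through the second conformer wins because its largest barrier is lower, which shows why considering only one conformer can bias a TS search. A real analysis would likely compare full energy profiles rather than individual step barriers.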

Related Publications

S. Banerjee,§ F. Liu,§ D.M. Sanchez, T. J. Martínez, and R. N. Zare, Pomeranz-Fritsch Synthesis of Isoquinoline: Gas-Phase Collisional Activation Opens Additional Reaction Pathways, J. Am. Chem. Soc. 139, 14352 (2017) [§These two authors contributed equally]

L.-P. Wang, A. Titov, R. McGibbon, F. Liu, V. S. Pande, and T. J. Martínez, “Discovering chemistry with an ab initio nanoreactor,” Nature Chemistry 6 (2014)

Machine learning aided workflow for method selection of transition metal complexes

A crucial step in accurately modeling transition metal chemistry is choosing between single-reference and multi-reference methods. As a MolSSI software fellow, I developed a Python API to calculate widely used multi-reference (MR) character diagnostics for any given molecule. With this tool, I curated a dataset of MR diagnostics for thousands of octahedral complexes. I have built machine-learning models based on this dataset to predict MR character without first-principles calculations. This will enable automated method selection for modeling transition-metal complexes.
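To make the idea of an MR diagnostic concrete, here is a minimal sketch of one widely used example, the coupled-cluster T1 diagnostic: the norm of the single-excitation amplitudes divided by the square root of the number of correlated electrons. It is not the API described above. The function names and amplitude values are hypothetical, the closed-shell spatial-orbital convention is assumed (conventions differ between programs), and the cutoff is left to the caller because recommended thresholds vary by chemical system.

```python
import math


def t1_diagnostic(t1_amplitudes, n_correlated_electrons):
    """T1 diagnostic: ||t1|| / sqrt(N_correlated_electrons).

    t1_amplitudes: nested list of single-excitation amplitudes,
    indexed [occupied][virtual] (assumed closed-shell convention).
    """
    if n_correlated_electrons <= 0:
        raise ValueError("number of correlated electrons must be positive")
    norm = math.sqrt(sum(t * t for row in t1_amplitudes for t in row))
    return norm / math.sqrt(n_correlated_electrons)


def suggests_multireference(t1_value, cutoff):
    """Flag a system whose T1 value exceeds a caller-chosen cutoff."""
    return t1_value > cutoff


# Hypothetical amplitudes for a system with 4 correlated electrons.
example_t1 = t1_diagnostic([[0.03, 0.0], [0.0, 0.04]], 4)
print(example_t1, suggests_multireference(example_t1, cutoff=0.02))
```

A diagnostic like this still requires a coupled-cluster calculation for each molecule, which is the cost a machine-learning model trained on such values is meant to avoid.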

Related Software (Website) (Documentation)