Potential energy surfaces (PESs) describe how the potential energy of a system changes as the positions of its constituent atoms or molecules vary. They are essential for understanding molecular behavior, chemical reactions, and material properties. These surfaces are often high-dimensional and complex, which makes them challenging to compute accurately, especially for large molecules or systems.
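To make the idea concrete, here is a minimal sketch of a one-dimensional PES. It is not taken from the research described here: it assumes a Morse curve for a single bond length, with illustrative parameters (`well_depth`, `width`, `r_eq`) in arbitrary units, and derives the force as the negative slope of the energy.

```python
import math

def morse_energy(r, well_depth=1.0, width=1.0, r_eq=1.0):
    """Potential energy of a toy diatomic as a function of bond length r.

    All parameters are illustrative values in arbitrary units.
    """
    return well_depth * (1.0 - math.exp(-width * (r - r_eq))) ** 2

def numerical_force(r, h=1e-6):
    """Force is the negative derivative of the energy along r (central difference)."""
    return -(morse_energy(r + h) - morse_energy(r - h)) / (2.0 * h)

# Scan the surface: energy is lowest at r_eq and rises on both sides.
for r in (0.6, 0.8, 1.0, 1.5, 3.0):
    print(f"r={r:.1f}  E={morse_energy(r):.4f}  F={numerical_force(r):+.4f}")
```

A real molecular PES has one such axis for every internal degree of freedom, which is why mapping it out with quantum chemistry becomes expensive so quickly.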
The reliability of a machine learning (ML) model of the PES still depends heavily on the diversity of its training data, especially for chemically reactive systems, which must visit high-energy states when undergoing chemical transformations. ML models, by their nature, interpolate between known training data. Their extrapolation capability is limited, however, and predictions can be unreliable when molecules or their configurations are dissimilar to those in the training set.
Formulating a balanced and diverse dataset for a given reactive system is challenging. It is common for an ML model to overfit, achieving good accuracy on its original test set yet proving error-prone when applied to molecular dynamics (MD) simulations, especially for gas-phase chemical reactivity, in which energy configurations are highly diverse.
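The gap between interpolation and extrapolation can be shown with a toy example. The sketch below is an illustration under stated assumptions, not the researchers' model: a Morse curve stands in for the reference energy, and a polynomial passed through five "training" bond lengths stands in for the ML model. The error is then measured inside the training range and on the high-energy repulsive wall outside it.

```python
import math

def reference_energy(r):
    """Toy stand-in for the true PES (Morse curve, arbitrary units)."""
    return (1.0 - math.exp(-(r - 1.0))) ** 2

def lagrange_fit(xs, ys):
    """Return the polynomial that passes exactly through the training points."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return model

# Training data only covers low-energy geometries near the minimum.
train_r = [0.8, 0.975, 1.15, 1.325, 1.5]
model = lagrange_fit(train_r, [reference_energy(r) for r in train_r])

def max_error(points):
    return max(abs(model(r) - reference_energy(r)) for r in points)

in_range = [0.8 + 0.007 * i for i in range(101)]      # inside the training range
out_of_range = [0.4 + 0.003 * i for i in range(101)]  # compressed, high-energy region

print("max error inside training range: ", max_error(in_range))
print("max error outside training range:", max_error(out_of_range))
```

A test set drawn from the same narrow range would report only the first number, which is how a model can look accurate on paper and still fail once a simulation wanders into unsampled territory.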
Researchers at the University of California, Lawrence Berkeley National Laboratory, and Penn State University have built an active learning (AL) workflow that expands the originally formulated hydrogen combustion dataset by preparing collective variables (CVs) for the first systematic sample. Their work shows that a negative design data acquisition strategy is necessary to create a more complete ML model of the PES.
Following this active learning strategy, they obtained a final hydrogen combustion ML model trained on more diverse and balanced data. The model recovers forces accurately enough to continue a trajectory without further retraining. It could also predict the change in the transition state and reaction mechanism at finite temperature and pressure for hydrogen combustion.
The team illustrated the active learning approach on Rxn18, an example in which the potential energy surface is projected onto two reaction coordinates, CN(O2-O5) and CN(O5-H4). They tracked the ML model's performance by analyzing the original data points derived from ab initio molecular dynamics (AIMD) and normal mode calculations. As the active learning rounds proceeded and errors decreased, they used longer metadynamics simulations for sampling.
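For readers unfamiliar with this kind of reaction coordinate, the sketch below shows one common way to turn an interatomic distance into a smooth contact value between 0 and 1. It assumes that CN denotes a coordination number and uses the rational switching function widely used in metadynamics software. The functional form, the exponents, the cutoffs `r0`, and the atomic coordinates are all illustrative assumptions; the article does not state what the researchers used.

```python
import math

def switching(r, r0, n=6, m=12):
    """Rational switching function: near 1 when bonded, near 0 when far apart."""
    x = r / r0
    if abs(x - 1.0) < 1e-12:
        return n / m  # analytic limit of the 0/0 expression at r == r0
    return (1.0 - x ** n) / (1.0 - x ** m)

def pair_cn(positions, a, b, r0):
    """Coordination-number-style CV for a single atom pair (a, b)."""
    return switching(math.dist(positions[a], positions[b]), r0)

# Hypothetical snapshot (angstroms); labels mirror the article's O2, O5, H4.
positions = {
    "O2": (0.0, 0.0, 0.0),
    "O5": (1.4, 0.0, 0.0),
    "H4": (2.3, 0.3, 0.0),
}

# Hypothetical cutoffs: 1.6 for O-O and 1.2 for O-H.
cv_oo = pair_cn(positions, "O2", "O5", r0=1.6)
cv_oh = pair_cn(positions, "O5", "H4", r0=1.2)
print(f"CN(O2-O5) = {cv_oo:.3f}, CN(O5-H4) = {cv_oh:.3f}")
```

Because the function is smooth and differentiable, a biasing method can push the system along it, which is what makes such variables convenient axes for projecting a PES.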
They found metadynamics to be an efficient sampling tool for unstable structures, which helps the AL workflow identify holes in the PES landscape and inform the ML model through retraining on such data. Because metadynamics is used only as a sampling tool, the tricky CV selection step can be avoided by starting with reasonable or intuitive CVs. Future work includes analyzing alternative approaches such as delta learning and working on more physical models like C-GeM.
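The overall loop of sampling, finding holes, labeling, and retraining can be sketched in a few lines. This is a deliberately simplified toy under loud assumptions, not the researchers' workflow: a Morse curve replaces the ab initio reference, linear interpolation replaces the ML model, a fixed grid of bond lengths replaces metadynamics sampling, and "distance to the nearest training point" replaces whatever criterion the real workflow uses to flag poorly covered regions. All function names are hypothetical.

```python
import bisect
import math

def reference_energy(r):
    """Toy stand-in for an expensive ab initio calculation (Morse curve)."""
    return (1.0 - math.exp(-(r - 1.0))) ** 2

def predict(train, r):
    """Toy surrogate model: linear interpolation, clamped outside the data."""
    xs = [x for x, _ in train]
    i = bisect.bisect_left(xs, r)
    if i == 0:
        return train[0][1]
    if i == len(train):
        return train[-1][1]
    (x0, y0), (x1, y1) = train[i - 1], train[i]
    return y0 + (y1 - y0) * (r - x0) / (x1 - x0)

def novelty(train, r):
    """Distance to the nearest training point, a crude detector of 'holes'."""
    return min(abs(r - x) for x, _ in train)

def max_error(train, candidates):
    return max(abs(predict(train, r) - reference_energy(r)) for r in candidates)

def active_learning(rounds=8):
    # Candidate geometries from 0.5 to 3.0; a stand-in for sampled structures.
    candidates = [0.5 + 0.025 * i for i in range(101)]
    # Initial data only covers geometries near the energy minimum.
    train = sorted((r, reference_energy(r)) for r in (0.9, 1.0, 1.1))
    errors = [max_error(train, candidates)]
    for _ in range(rounds):
        pick = max(candidates, key=lambda r: novelty(train, r))  # find the biggest hole
        train.append((pick, reference_energy(pick)))             # label it
        train.sort()                                             # "retrain"
        errors.append(max_error(train, candidates))
    return errors

errors = active_learning()
for round_index, err in enumerate(errors):
    print(f"round {round_index}: max error = {err:.4f}")
```

The point of the sketch is the shape of the loop: each round spends the expensive reference calculation only where the current data is thinnest, so coverage of high-energy regions grows without labeling everything.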
Arshad is an intern at MarktechPost. He is currently pursuing his Int. MSc in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn lead to advances in technology. He is passionate about understanding nature at a fundamental level with the help of tools like mathematical models, ML models, and AI.