Predicting Molecular Energy from QTAIM-derived Multi-granularity Descriptors using Graph Neural Networks and Transformers
Co-Supervised by: Dr. Linlin Jia
If you are interested in this topic or have further questions, do not hesitate to contact linlin.jia@unibe.ch.
Background / Context

Molecular energy plays a key role in numerous applications across various fields such as computational chemistry, biology, and materials science. With the rapid advancement of machine learning, data-driven methods have emerged as powerful tools for predicting molecular energy, offering significant potential in terms of time efficiency, resource optimization, and accuracy.
This project aims to leverage graph neural networks (GNNs) and transformers for molecular energy prediction. Multi-granularity energy values, derived from the Quantum Theory of Atoms in Molecules (QTAIM), along with additional information such as nuclide species and atomic geometric coordinates, will be adapted to create novel descriptors for graph structures. Benefiting from the flexibility and expressiveness of graph- and attention-based learning methods and the precision and informativeness of quantum properties, this project seeks to develop cutting-edge architectures tailored to molecular energy prediction, pushing the boundaries of prediction accuracy, time and resource efficiency, and explainability.
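As a concrete illustration of the graph structures mentioned above, the sketch below builds a molecular graph from nuclide species and atomic coordinates using a simple distance cutoff. The cutoff value, node-feature layout, and function name are illustrative assumptions, not part of the project specification.

```python
import numpy as np

def build_molecular_graph(atomic_numbers, coords, cutoff=4.0):
    """Sketch: nodes carry nuclide species (atomic numbers), edges connect
    atom pairs closer than a distance cutoff (in Angstrom). The cutoff of
    4.0 is an illustrative assumption."""
    coords = np.asarray(coords, dtype=float)
    n = len(atomic_numbers)
    # Pairwise Euclidean distances between all atoms.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Directed edge list: off-diagonal pairs within the cutoff.
    src, dst = np.where((dists < cutoff) & ~np.eye(n, dtype=bool))
    edge_index = np.stack([src, dst])       # shape (2, num_edges)
    edge_attr = dists[src, dst]             # interatomic distances per edge
    node_feat = np.asarray(atomic_numbers)  # nuclide species per node
    return node_feat, edge_index, edge_attr

# Example: a water molecule (O, H, H) with approximate geometry.
z = [8, 1, 1]
xyz = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
nodes, edges, edge_dists = build_molecular_graph(z, xyz)
```

In practice such a graph would be wrapped in the data structure of the chosen GNN library and augmented with the QTAIM-derived descriptors as node- and edge-level features.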
Research Question(s) / Goals
In this project, we would like to answer the following research questions:
- Q1: Can multi-granularity energy descriptors derived from QTAIM improve the performance of graph-based machine learning models in energy prediction?
- Q2: Can graph- and attention-based approaches improve time and resource efficiency given the aforementioned descriptors?
- Q3: How do the proposed approaches compare with state-of-the-art ones?
To answer these questions, we expect to achieve the following goals:
- G1: Design and extract QTAIM-based multi-granularity descriptors of high relevance to the questions.
- G2: Develop novel graph- and / or attention-based models for the designed descriptors.
- G3: Design loss functions able to fuse the multi-granularity information.
- G4: Evaluate and compare the performance of the proposed methods against the baselines, reporting metrics such as energy mean absolute error (MAE).
- G5: Communicate with chemists for further understanding and evaluation of the models from a chemistry point of view.
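For G3, one simple way to fuse multi-granularity information is a weighted sum of per-granularity error terms, e.g. an atom-level and a molecule-level MAE. The following sketch assumes this weighted-MAE form; the function name, the weights, and the choice of exactly two granularity levels are illustrative assumptions.

```python
import torch

def multi_granularity_loss(pred_atom_e, true_atom_e, pred_mol_e, true_mol_e,
                           w_atom=1.0, w_mol=1.0):
    """Sketch of a loss fusing atom-level and molecule-level energy targets.
    The weighted-MAE form and the default weights are assumptions."""
    loss_atom = torch.mean(torch.abs(pred_atom_e - true_atom_e))
    loss_mol = torch.mean(torch.abs(pred_mol_e - true_mol_e))
    return w_atom * loss_atom + w_mol * loss_mol

# Example: three atomic energy contributions and one molecular total.
pred_atoms = torch.tensor([0.1, -0.2, 0.05])
true_atoms = torch.tensor([0.0, -0.25, 0.0])
loss = multi_granularity_loss(pred_atoms, true_atoms,
                              pred_atoms.sum(), true_atoms.sum())
```

More elaborate variants could add terms for intermediate granularities (e.g. interatomic energy contributions) or learn the weights during training.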
Approach / Methods
- Analyze the data provided by chemists, extracting relevant information from the raw datasets to design and construct innovative descriptors.
- Survey state-of-the-art graph-based machine learning methods relevant to molecular energy prediction.
- Develop novel multi-scale GNN models and corresponding loss functions capable of handling multi-granularity energy descriptors, drawing inspiration from modern architectures such as AIMNet, PhysNet, and 3D-equivariant GNNs.
- Collaborate with chemists to evaluate the models, validating the outputs and ensuring scientific relevance.
- Benchmark the designed descriptors and models against baselines, such as Coulomb matrix and vector-based approaches, respectively.
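Among the baselines listed above, the Coulomb matrix is a standard vector-based molecular descriptor with a closed-form definition: the diagonal encodes a per-atom term 0.5 * Z_i^2.4 and the off-diagonal entries encode pairwise nuclear repulsion Z_i * Z_j / |R_i - R_j|. A minimal sketch (the function name is an assumption):

```python
import numpy as np

def coulomb_matrix(atomic_numbers, coords):
    """Standard Coulomb matrix descriptor:
    C_ii = 0.5 * Z_i**2.4,  C_ij = Z_i * Z_j / |R_i - R_j| for i != j."""
    z = np.asarray(atomic_numbers, dtype=float)
    r = np.asarray(coords, dtype=float)
    # Pairwise interatomic distances.
    dists = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        cm = np.outer(z, z) / dists  # diagonal becomes inf, overwritten below
    np.fill_diagonal(cm, 0.5 * z ** 2.4)
    return cm

# Example: water (O, H, H), same approximate geometry as elsewhere.
cm = coulomb_matrix([8, 1, 1],
                    [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
```

For fixed-length vector baselines, the matrix is typically flattened (e.g. its sorted eigenvalues or upper triangle) and zero-padded to the largest molecule in the dataset.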
Expected Contributions / Outcomes
- Novel QTAIM-based multi-granularity descriptors for molecular energy prediction.
- A novel graph- and attention-based framework for molecular energy prediction that fuses the multi-granularity QTAIM energy information.
- A thorough experimental evaluation to assess the proposed methods, covering prediction accuracy, computational and resource efficiency, ablation studies, and benchmarking against state-of-the-art approaches.
- Reproducible code, model checkpoints, experimental settings, and results.
- A thesis and potentially a research paper submission.
Required Skills / Prerequisites
- Solid knowledge of statistics and machine learning, with an understanding of deep learning and graph neural networks being a plus.
- Strong programming skills, preferably in Python and PyTorch.
- Basic relevant chemical knowledge (e.g., molecular structures, quantum energy decomposition) or willingness to learn.
- Proficiency in English for effective communication and presentations at the research level.
Possible Extensions
- Benchmark SOTA transformer and LLM-based methods against the proposed architecture from two aspects: first, examine if these methods are practical in terms of prediction performance, efficiency, and explainability for the structure-energy mapping; second, explore the possibility of integrating ChemLLMs, such as those based on SMILES strings, with the proposed QTAIM-derived descriptors.
- Explore more efficient approaches by using QTAIM-informed training to instill physical priors, without introducing costly QTAIM descriptors during inference.
Further Reading / Starting Literature
- Tognetti, V., & Joubert, L. (2024). Exchange-correlation effects in interatomic energies for pure density functionals and their application to the molecular energy prediction. Journal of Computational Chemistry, 45(27), 2270-2283.
- Zubatyuk, R., Smith, J. S., Leszczynski, J., & Isayev, O. (2019). Accurate and transferable multitask prediction of chemical properties with an atoms-in-molecules neural network. Science Advances, 5(8), eaav6490.
- Unke, O. T., & Meuwly, M. (2019). PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. Journal of Chemical Theory and Computation, 15(6), 3678-3693.
- Batzner, S., Musaelian, A., Sun, L., Geiger, M., Mailoa, J. P., Kornbluth, M., … & Kozinsky, B. (2022). E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications, 13(1), 2453.
