Integrating Equivariant Architectures and Charge Supervision for Data-Efficient Molecular Property Prediction
Abstract
Understanding and predicting molecular properties remains a central challenge in scientific machine learning, especially when training data are limited or task-specific supervision is scarce. We introduce the Molecular Equivariant Transformer (MET), a symmetry-aware pretraining framework that leverages quantum-derived atomic charge distributions to guide molecular representation learning. MET combines an Equivariant Graph Neural Network (EGNN) with a Transformer architecture to extract physically meaningful features from three-dimensional molecular geometries. Unlike previous models that rely purely on structural inputs or handcrafted descriptors, MET is pretrained to predict atomic partial charges, quantities grounded in quantum chemistry. This enables MET to capture essential electronic information without requiring downstream labels. We show that this pretraining scheme improves performance across diverse molecular property prediction tasks, particularly in low-data regimes. Analyses of the learned representations reveal chemically interpretable structure-property relationships, including the emergence of functional group patterns and smooth alignment with molecular dipoles. Ablation studies confirm that the EGNN encoder plays a crucial role in capturing transferable spatial features, while the Transformer layers adapt these features to specific prediction tasks. This architecture draws a direct analogy to quantum mechanical basis transformations: MET learns to transition from coordinate-based to electron-based representations in a symmetry-preserving manner. By integrating domain knowledge with modern deep learning techniques, MET offers a unified and interpretable framework for data-efficient molecular modeling, with broad applications in computational chemistry, drug discovery, and materials science.