Ensuring Structural Fidelity of Generated Polymers with PoGE
Abstract
The computational design of polymers with targeted macroscopic properties is frequently limited by the tendency of generative models to produce structures that deviate from experimental reality. Standard evaluation metrics, originally developed for small drug-like molecules, often fail to capture the physicochemical constraints unique to polymeric materials. To address this, we introduce PoGE (Polymer Generation and Evaluation), a transformer-based framework trained on a hybrid corpus of generated and experimentally validated linear homopolymer representations. We propose a physics-driven evaluation suite that uses the Wasserstein distance to quantify the alignment between generated and experimental descriptor distributions, replacing traditional fragment-based scoring. Compared with existing recurrent neural network and conditional generation baselines, PoGE demonstrates superior fidelity to the attributes of real-world linear homopolymers, specifically molecular weight, topological polar surface area, and chain flexibility. Notably, this alignment is achieved through unconditional sequence modeling, indicating that the architecture implicitly captures the underlying structural rules without requiring explicit descriptor conditioning. By providing a rigorously validated pre-training corpus and a physics-informed evaluation suite, this work establishes a foundational benchmarking framework for generating structurally consistent linear homopolymers, serving as a prerequisite for downstream inverse design pipelines.
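To make the distribution-level comparison concrete, the sketch below computes the one-dimensional Wasserstein-1 distance between two empirical samples of a single descriptor, as the area between their empirical cumulative distribution functions. It is an illustration under stated assumptions, not the PoGE evaluation code: the function names `wasserstein_1d` and `compare_descriptors` are hypothetical, the descriptor is assumed to be a scalar computed per polymer, and whether PoGE normalizes descriptors or aggregates distances across descriptors is not stated here.

```python
from bisect import bisect_right


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two empirical 1D samples.

    Computed as the integral of |F_x(t) - F_y(t)| over t, where F_x and F_y
    are the empirical CDFs of the two samples (hypothetical helper).
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    # The CDFs only change at observed values, so integrate piecewise
    # between consecutive distinct points of the pooled sample.
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def compare_descriptors(generated, experimental):
    """Per-descriptor distance for two dicts mapping descriptor name to a
    list of values (assumed layout; one entry per polymer)."""
    shared = generated.keys() & experimental.keys()
    return {name: wasserstein_1d(generated[name], experimental[name])
            for name in sorted(shared)}


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not data from the paper.
    generated = {"molecular_weight": [100.0, 150.0, 210.0]}
    experimental = {"molecular_weight": [110.0, 160.0, 220.0]}
    print(compare_descriptors(generated, experimental))
```

A smaller distance means the generated descriptor distribution sits closer to the experimental one; a value of zero means the two empirical distributions coincide. Because the distance carries the units of the descriptor, values for different descriptors are not directly comparable without some rescaling.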