SmartCIF: a context-aware multi-agent system for automated preprocessing and curation of MOF CIFs
Abstract
Computational screening of metal–organic frameworks (MOFs) relies on crystallographic inputs that are commonly treated as “computation-ready”. In practice, however, conventional CIF preprocessing often applies fixed-parameter treatments that overlook the structural details described in the original reports. To address this, we introduce SmartCIF, a context-aware, literature-integrated framework that redefines CIF preprocessing as an explicit, assumption-driven procedure. SmartCIF couples topology-based structural analysis with natural-language reasoning over the original publications to make chemically informed decisions about which components of a CIF to retain or remove according to the user's computational objectives. Benchmarking against reported BET surface areas for 65 MOFs and reported CO2/N2 adsorption data comprising 321 data points demonstrates that SmartCIF can reconcile geometric accessibility with both the original publications and the user's request, avoiding both pore-blocked and over-opened nonphysical results. These results establish that CIF preprocessing is inherently application-dependent and that treating preprocessing assumptions as explicit, controllable variables is essential for reproducible, interpretable high-throughput screening. The assumption-aware paradigm embodied by SmartCIF generalizes existing computation-ready resources and provides a flexible foundation for large-scale simulations beyond adsorption.
