Can large language models predict the hydrophobicity of metal–organic frameworks?
Abstract
Recent advances in large language models (LLMs) offer a transformative paradigm for data-driven materials discovery. Herein, we explore the potential of LLMs to predict the hydrophobicity of metal–organic frameworks (MOFs). By fine-tuning the state-of-the-art Gemini-1.5 model exclusively on the chemical language of MOFs, we show that it achieves weighted accuracies surpassing those of traditional machine learning approaches based on sophisticated descriptors. To further interpret the chemical “understanding” embedded within the Gemini model, we conduct systematic moiety masking experiments, in which the fine-tuned model consistently retains robust predictive performance despite partial information loss. Finally, we demonstrate the practical applicability of the Gemini model through a blind test on solvent- and ion-containing MOFs. The results illustrate that Gemini, combined with lightweight fine-tuning on chemically annotated text, can serve as a powerful tool for rapidly screening MOFs for hydrophobic candidates. More broadly, our work underscores the potential of LLMs to offer robust and data-efficient approaches for accelerating the discovery of functional materials.