POLARIS: perovskite optimization using LLM-assisted refinement and intelligent screening

Abstract

We present a comprehensive and reproducible pipeline that unites literature mining, molecular graph generation, and uncertainty-aware predictive modeling to accelerate the design of organic spacer cations for two-dimensional (2D) halide perovskites (HPs). Despite the critical influence of spacer chemistry on phase stability, excitonic behavior, transport properties, and environmental robustness, the chemical space of HPs remains underexplored because of inconsistent reporting and limited structured datasets. To overcome this, we curated a diverse set of 200 experimental papers from various publishers and research groups and loaded them into Google's NotebookLM, powered by Gemini, using its retrieval-augmented generation (RAG) framework to extract synthesis-relevant metadata with high accuracy and reproducibility. To ensure data quality and consistency, we limited our selection to papers published in peer-reviewed journals with an impact factor above 10, focusing on studies with well-documented experimental protocols. Benchmarking against five other large language models (LLMs) confirmed NotebookLM's superior stability and minimal hallucination rate, making it well suited to hypothesis-driven data curation. From the extracted IUPAC names, we constructed SMILES representations and augmented the dataset with over 10 000 ammonium-containing molecules from QM9. These molecules were converted into graph-based molecular embeddings and used to train a multitask graph neural network coupled with a Gaussian process (GNN–GP) backend to predict optoelectronic and structural properties with uncertainty quantification. Latent-space clustering of the learned embeddings revealed chemically interpretable families of spacer candidates, which we cross-validated against ChatGPT-generated design heuristics.
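The ammonium screening step mentioned above can be illustrated with a minimal sketch. It is not the authors' code: it assumes that protonated nitrogens are written as explicit SMILES bracket atoms such as [NH3+], it uses plain pattern matching rather than a cheminformatics toolkit, and the function names `filter_ammonium` and `count_ammonium_groups` are hypothetical.

```python
import re

# Assumption: protonated ammonium nitrogens appear as explicit bracket atoms
# with at least one hydrogen, e.g. [NH+], [NH2+], [NH3+], [NH4+].
# Quaternary [N+] is deliberately not matched, because the same token also
# occurs in nitro groups written as [N+](=O)[O-].
AMMONIUM_PATTERN = re.compile(r"\[NH[1-4]?\+\]")


def count_ammonium_groups(smiles: str) -> int:
    """Count protonated-nitrogen bracket atoms in one SMILES string."""
    return len(AMMONIUM_PATTERN.findall(smiles))


def filter_ammonium(smiles_list: list[str]) -> list[str]:
    """Keep only the SMILES strings that contain at least one ammonium group."""
    return [s for s in smiles_list if count_ammonium_groups(s) > 0]


if __name__ == "__main__":
    # Illustrative inputs only: butylammonium, phenethylammonium, ethanol,
    # and a sodium cation that must not be mistaken for nitrogen.
    candidates = ["CCCC[NH3+]", "[NH3+]CCc1ccccc1", "CCO", "[Na+]"]
    for smiles in filter_ammonium(candidates):
        print(smiles, count_ammonium_groups(smiles))
```

A string-level filter like this is only a first pass. A production pipeline would more likely parse each SMILES with a toolkit and match substructures, which also handles neutral amines that still need to be protonated.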
The convergence between unsupervised clustering and transformer-derived guidance highlights the power of combining LLMs with active learning to generate, test, and refine design hypotheses in underexplored chemical domains. This study demonstrates how fragmented literature can be transformed into actionable structure–property insights through a tightly integrated informatics pipeline that is available to a broad experimental community, and it underscores the value of open repositories that can be mined for information. Our approach lays the foundation for closed-loop, autonomous materials discovery and design and provides a scalable strategy for the targeted development of next-generation HP optoelectronics.
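How uncertainty quantification can drive an active-learning loop is sketched below with a small Gaussian process written in pure Python. This is a sketch under stated assumptions, not the GNN–GP model of the paper: the embeddings are toy vectors standing in for learned graph embeddings, the RBF kernel and its hyperparameters are assumed, and the helper names (`gp_posterior`, `most_uncertain`) are hypothetical.

```python
import math


def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel between two embedding vectors (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_lower(lower, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        s = sum(lower[i][k] * y[k] for k in range(i))
        y.append((b[i] - s) / lower[i][i])
    return y


def solve_upper(lower, y):
    """Back substitution: solve L^T x = y using the lower factor L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / lower[i][i]
    return x


def gp_posterior(train_x, train_y, query_x, noise=1e-6, length_scale=1.0):
    """Return a (mean, standard deviation) pair for each query embedding."""
    n = len(train_x)
    gram = [[rbf_kernel(train_x[i], train_x[j], length_scale)
             + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    lower = cholesky(gram)
    alpha = solve_upper(lower, solve_lower(lower, train_y))
    results = []
    for q in query_x:
        k_star = [rbf_kernel(q, x, length_scale) for x in train_x]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(lower, k_star)
        var = max(rbf_kernel(q, q, length_scale) - sum(vi * vi for vi in v), 0.0)
        results.append((mean, math.sqrt(var)))
    return results


def most_uncertain(train_x, train_y, pool_x, **kwargs):
    """Index of the pool candidate with the largest predictive uncertainty."""
    posterior = gp_posterior(train_x, train_y, pool_x, **kwargs)
    return max(range(len(pool_x)), key=lambda i: posterior[i][1])


if __name__ == "__main__":
    # Toy 1D "embeddings" with a made-up target property.
    train_x, train_y = [[0.0], [1.0]], [0.0, 1.0]
    pool_x = [[0.5], [5.0]]
    print(most_uncertain(train_x, train_y, pool_x))
```

Selecting the candidate with the largest predictive standard deviation is pure exploration. An acquisition function that also weighs the predicted mean would balance exploration against exploitation.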

Article information

Article type: Paper
Submitted: 20 Aug 2025
Accepted: 12 Mar 2026
First published: 30 Mar 2026
This article is Open Access under a Creative Commons BY-NC license.

Digital Discovery, 2026, Advance Article

J. Marshall, S. L. Sanchez, R. Desai, E. Foadian, U. Pratiush, A. Mannodi-Kanakkithodi, S. V. Kalinin and M. Ahmadi, Digital Discovery, 2026, Advance Article, DOI: 10.1039/D5DD00378D

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
