This Open Access Article is licensed under a
Creative Commons Attribution 3.0 Unported Licence

Tackling the reproducibility gap in perovskite research: a vision for FAIR data and standardised protocols

Eva Unger (a, b) and T. Jesper Jacobsson* (c)
(a) HySPRINT Innovation Lab: Hybrid Materials Formation and Scaling, Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Kekuléstraße 5, 12489, Berlin, Germany
(b) Division of Chemical Physics and Nano Lund, Lund University, Box 124, 22100 Lund, Sweden
(c) Department of Physics, Chemistry and Biology (IFM), Linköping University, Linköping, Sweden. E-mail: jacobsson.jesper.work@gmail.com; jesper.jacobsson@liu.se

Received 1st October 2025, Accepted 29th October 2025

First published on 29th October 2025


Abstract

Reproducing literature results for perovskite-based optoelectronic devices is often surprisingly difficult. We argue that a major reason for this problem is insufficient data dissemination, and that this could be mitigated by a collective effort to develop and adopt standardised data-sharing protocols for experimental procedures.



Broader context

Metal–halide perovskites have become leading candidates for next-generation optoelectronics. A persistent challenge in realising this potential has been reproducibility. Devices reported under seemingly identical conditions often yield widely varying results, raising concerns about reliability and slowing progress toward commercialisation. This problem is partly rooted in insufficient standards for data reporting, where critical fabrication and characterization details are often omitted or ambiguously described. Adopting FAIR (Findable, Accessible, Interoperable, Reusable) principles and machine-readable data protocols offers a practical path forward. Structured device descriptions would enable reproducibility across laboratories, support large-scale data aggregation, and unlock the potential of machine learning and autonomous experimentation. In this perspective, we present a vision and a path forward for how this could be implemented in the perovskite field. While we focus on perovskites, both the problem and the vision for how to solve it extend to other experimental materials science fields.

Over the past 15 years, hybrid perovskites have emerged as an exciting class of materials for a variety of optoelectronic applications, such as solar cells,1 light-emitting diodes,2 lasers,3 and X-ray detectors.4 The overarching technological goal of perovskite optoelectronics is to produce devices that combine high efficiency, long-term operational stability, and cost-effectiveness at production scale. A major and enduring challenge to achieving this is a large variability in device performance – even when seemingly identical production protocols are followed.5 Fig. 1a illustrates this challenge for one of the most common solar cell device architectures (SLG|FTO|TiO2-c|TiO2-mp|perovskite|spiro-MeOTAD|Au) using a MAPbI3 perovskite spin-coated from a DMF:DMSO solution with an antisolvent treatment. The data, extracted from the Perovskite Database and based on 164 papers published between 2015 and 2020,6,7 show device efficiencies ranging from near zero to over 20%, despite representing very similar devices. Additional available metadata – such as annealing temperatures and spiro-MeOTAD doping levels – can explain some of the variability but can hardly account for all of it. Even control devices manufactured under supposedly identical conditions in the same lab routinely exhibit substantial performance deviations (Fig. 1b).8 Given that a 0.5% absolute improvement often constitutes a notable result in the literature, this raises critical questions concerning the trustworthiness of reported results. Anyone who has tried to replicate published perovskite fabrication protocols will find these concerns all too familiar.
Fig. 1 Illustration of sample variance. (a) Solar cell efficiency for 625 standard devices from 164 papers extracted from the Perovskite Database. (b) Sample performance of 8000 standard devices made within the same lab. Reproduced with permission from Chem. Mater. 2018, 30, 4193–4201.
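
To make the scale of this concern concrete, the following minimal sketch estimates how many devices per group would be needed before a given efficiency difference stands out from the sample-to-sample scatter. It assumes two independent groups of equal size, a shared standard deviation, approximately normal scatter, and a rough threshold of two standard errors. The function name `devices_needed` and the standard deviations in the usage lines are illustrative assumptions, not values read from Fig. 1.

```python
import math


def devices_needed(sigma, delta, z=2.0):
    """Devices per group so that `delta` is at least `z` standard errors.

    sigma: assumed standard deviation of efficiency within a group (% absolute)
    delta: efficiency difference to resolve (% absolute)
    z:     how many standard errors the difference should span (assumed 2)
    """
    # Standard error of the difference between two group means with n
    # devices each is sigma * sqrt(2 / n). Requiring delta >= z * se and
    # solving for n gives n >= 2 * (z * sigma / delta) ** 2.
    return math.ceil(2.0 * (z * sigma / delta) ** 2)


# Hypothetical scatter levels, chosen only to illustrate the scaling.
for sigma in (0.5, 1.5, 3.0):
    n = devices_needed(sigma, delta=0.5)
    print(f"sigma = {sigma} % -> at least {n} devices per group")
```

Because the required number of devices grows with the square of the scatter, a claimed 0.5% absolute improvement becomes hard to substantiate with a handful of devices once the within-lab spread exceeds about one percentage point.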

Multiple factors underpin this variability. We argue that a core issue lies in inadequate standards for reporting device data. We further propose that by improving data management practices – through standardisation and accessible dissemination protocols – we could reduce performance variability, boost reproducibility, strengthen trust in published findings, and ultimately accelerate technological progress. While our discussion centers on perovskite solar cells, the same principles and practices are broadly applicable to other material systems and applications.

A closer look at the problem

If a device recipe is described in sufficient detail, one might expect that two researchers following it to the letter would obtain identical results. In practice, however, this is often not the case. This discrepancy can partly be attributed to hidden variables – some anticipated, others entirely unknown.5 Examples of such parameters known to be significant include the precise timing and dispense speed during spin-coating.9 However, as spin-coating is often done manually, such parameters are challenging to track, which is why they are often omitted from reports and thus become hidden variables. It is inevitable that inconsistencies introduced by manual steps will cause some cell-to-cell, batch-to-batch, operator-to-operator, and lab-to-lab variability. However, much of the observed variance, particularly the lab-to-lab variance, could likely be mitigated – or at least explained – through better standards and more comprehensive reporting of experimental protocols.

The standard medium for describing experiments is the methods section of journal articles. One of the insights from the Perovskite Database Project,6,7 where we extracted data on over 42 000 solar cell devices from 7370 publications, was that as a community, we struggle to write experimental sections with sufficient detail and clarity. Reproducibility is favoured by exhaustive technical detail, structure, rigor, repetition, and completeness. However, those are qualities often at odds with good narrative prose, and scientific journals are primarily vehicles for storytelling, which favours a concise, fluid, and engaging language. This tension between crafting an engaging narrative and reporting exhaustive technical detail frequently leads to gaps in the information needed to replicate a study. For example, effective storytelling tends to focus on novelty and highlights the differences between control devices and the improved devices that are central to the study, while omitting shared or routine details crucial for external reproducibility and cross-study comparisons. Even when experimental protocols are reasonably detailed and well described, challenges often arise when multiple device types are described. In such cases, compactness often takes precedence over clarity, leaving readers to piece together which conditions apply to which device in a way reminiscent of solving a detective mystery. Moreover, there are also many instances where parameters critical to the results have simply been omitted.10

These unsatisfactory standards in data dissemination are partly a matter of cultural norms and practices. Writing a clear, precise, and detailed experimental protocol is hard work. It requires time, effort, and careful thought – effort that may not be undertaken without proper incentives. Setting better standards will therefore require both proper tools and strong community norms, as well as explicit pressure from publishers, reviewers, and funders.

The vision

We believe that more comprehensive and standardised device descriptions are not only desirable, but entirely achievable. In an ideal future, device recipes would be so robust that, when followed by different laboratories, they consistently yield the same results, i.e. there would be a one-to-one mapping between procedures and results, accounting for experimental noise. Whenever discrepancies do arise, they could then be traced to identifiable variations in execution, enabling systematic improvements in device fabrication protocols. Ultimately, these device descriptions should be systematically structured and based on common ontologies that allow for seamless transfer not only between laboratories but also to automated lab equipment. Given the specified raw materials, a lab robot should then be able to reproduce the result with minimal human oversight.

A practical path toward this vision would involve assigning each device a detailed protocol in a structured file format. JSON (JavaScript Object Notation) is a strong candidate, being an open standard widely adopted for storing, transmitting, and disseminating data as name-value pairs.11 Each device would thus have a dedicated file containing all relevant parameters necessary for reproducibility, as well as data and metadata from any measurements performed on that device. We envision a modular, hierarchical approach to describing perovskite devices, relying on reusable data blocks that capture every aspect of their fabrication and characterization (Fig. 2).
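
As a minimal sketch of what such a modular, hierarchical device file could look like, the example below assembles the standard architecture discussed earlier from small reusable blocks and serialises it to JSON. All field names (`device_id`, `stack`, `process`, `delay_s`, and so on), the helper functions, the choice of chlorobenzene as antisolvent, and the numerical process values are placeholders invented for illustration; they do not represent a published schema or a recommended recipe.

```python
import json


def spin_coating_step(solvents, spin_speed_rpm, duration_s,
                      antisolvent=None, antisolvent_delay_s=None):
    """Reusable process block describing one spin-coating step."""
    step = {
        "method": "spin-coating",
        "solvents": solvents,
        "spin_speed_rpm": spin_speed_rpm,
        "duration_s": duration_s,
    }
    if antisolvent is not None:
        # Drip timing is the kind of detail that easily becomes a hidden
        # variable when it is left out of a prose methods section.
        step["antisolvent"] = {"name": antisolvent,
                               "delay_s": antisolvent_delay_s}
    return step


def layer(material, role, process=None):
    """Reusable block describing one layer of the device stack."""
    block = {"material": material, "role": role}
    if process is not None:
        block["process"] = process
    return block


def build_device(device_id):
    """Assemble a device record from layer and process blocks."""
    absorber_process = spin_coating_step(
        solvents=["DMF", "DMSO"],
        spin_speed_rpm=4000,          # placeholder value
        duration_s=30,                # placeholder value
        antisolvent="chlorobenzene",  # placeholder choice
        antisolvent_delay_s=20,       # placeholder value
    )
    return {
        "device_id": device_id,
        "stack": [
            layer("SLG", "substrate"),
            layer("FTO", "front contact"),
            layer("TiO2-c", "electron transport layer"),
            layer("TiO2-mp", "electron transport layer"),
            layer("MAPbI3", "absorber", process=absorber_process),
            layer("spiro-MeOTAD", "hole transport layer"),
            layer("Au", "back contact"),
        ],
        "measurements": [],  # measurement data and metadata would go here
    }


def stack_label(device):
    """Compact stack string in the 'A|B|C' notation used in the text."""
    return "|".join(block["material"] for block in device["stack"])


device = build_device("example-001")
device_json = json.dumps(device, indent=2)  # text that would be stored or shared
print(stack_label(device))
```

Keeping process steps and layers as separate blocks means the same block definition can be reused for every layer and every device, which is what makes the files comparable across studies.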


Fig. 2 Illustration of the vision. (a) Every device would get its own datafile, with everything worth knowing about the device. (b) We suggest that those datafiles have a hierarchical and modular structure. (c) Ideally every device datafile should be stored in public online repositories, from which either: (d) devices and device properties can be reproduced, or (e) statistical methods can be applied to gain new insights.

In this scenario, every published paper would include such a data file for each device discussed. Ideally, these files would also be deposited in a persistent online repository, making them citable, referable, and programmatically accessible (Fig. 2). This strategy aligns with the FAIR data principles (i.e. Findable, Accessible, Interoperable, Reusable),12,13 and would enhance the quality, value, and accountability of perovskite research. It would also create an aggregated resource that can be used to track research progress and for extracting additional insights using statistical methods such as machine learning.
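
The sketch below illustrates, under simplifying assumptions, how a collection of such files could be aggregated once it is programmatically accessible. A temporary local folder stands in for the repository, and the record fields (`device_id`, `stack`, `pce_percent`) and all efficiency values are invented for the example; no real repository interface is used.

```python
import json
import statistics
import tempfile
from pathlib import Path


def load_devices(folder):
    """Read every *.json device file in a folder into a list of dicts."""
    paths = sorted(Path(folder).glob("*.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


def efficiency_summary(devices):
    """Group devices by stack and summarise their reported efficiencies."""
    groups = {}
    for dev in devices:
        groups.setdefault(dev["stack"], []).append(dev["pce_percent"])
    return {
        stack: {
            "n": len(values),
            "mean": statistics.mean(values),
            # The sample standard deviation needs at least two devices.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for stack, values in groups.items()
    }


STANDARD = "SLG|FTO|TiO2-c|TiO2-mp|perovskite|spiro-MeOTAD|Au"
OTHER = "SLG|ITO|SnO2|perovskite|spiro-MeOTAD|Au"

# Made-up records; the efficiencies are not measured values.
records = [
    {"device_id": "dev-1", "stack": STANDARD, "pce_percent": 10.0},
    {"device_id": "dev-2", "stack": STANDARD, "pce_percent": 14.0},
    {"device_id": "dev-3", "stack": STANDARD, "pce_percent": 18.0},
    {"device_id": "dev-4", "stack": OTHER, "pce_percent": 12.0},
]

with tempfile.TemporaryDirectory() as folder:
    for rec in records:
        path = Path(folder, f"{rec['device_id']}.json")
        path.write_text(json.dumps(rec), encoding="utf-8")
    summary = efficiency_summary(load_devices(folder))

for stack, stats in summary.items():
    print(stack, stats)
```

With one file per device and shared field names, the same few lines would work unchanged on four devices or on tens of thousands.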

Towards the solution

Realising this vision requires several key elements. First and foremost, someone must take the initiative to design protocols and standards that enable structured descriptions of device data. These protocols must account for material and process parameters with a high level of detail and precision, while also being user-friendly, flexible, and easy to adapt to accommodate the changing needs of a rapidly advancing research field.

As a pilot project in this direction, we have developed a hierarchical, modular JSON schema for describing perovskite compositions which we have described in more detail in a recent publication.14 Complementary online tools, hosted on the NOMAD platform,13,15 enable simple generation of perovskite composition files and provide persistent, citable, and programmatically accessible online storage. This provides a clear example of how a standardised protocol can be designed and made available online together with functionality for generating and storing structured descriptions in an online repository.
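
To illustrate the kind of check that a structured composition description makes possible, the sketch below validates a simplified composition record. It is not the schema from ref. 14: the record layout (`ions`, `site`, `coefficient`), the function `check_composition`, and the restriction to ABX3 stoichiometry are assumptions made for this example only.

```python
import math

# Expected total coefficient per lattice site for an ABX3 perovskite.
EXPECTED_SITE_TOTALS = {"A": 1.0, "B": 1.0, "X": 3.0}


def check_composition(record, tol=1e-6):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    ions = record.get("ions", [])
    for site, expected in EXPECTED_SITE_TOTALS.items():
        total = sum(ion["coefficient"] for ion in ions if ion["site"] == site)
        if not math.isclose(total, expected, abs_tol=tol):
            problems.append(
                f"site {site}: coefficients sum to {total:g}, "
                f"expected {expected:g}"
            )
    return problems


# MAPbI3, the composition used in the standard device discussed above.
mapbi3 = {
    "ions": [
        {"name": "MA", "site": "A", "coefficient": 1.0},
        {"name": "Pb", "site": "B", "coefficient": 1.0},
        {"name": "I", "site": "X", "coefficient": 3.0},
    ]
}

# A deliberately inconsistent record: the halide coefficient is too low.
inconsistent = {
    "ions": [
        {"name": "MA", "site": "A", "coefficient": 1.0},
        {"name": "Pb", "site": "B", "coefficient": 1.0},
        {"name": "I", "site": "X", "coefficient": 2.5},
    ]
}

print(check_composition(mapbi3))
print(check_composition(inconsistent))
```

Checks of this kind can run automatically when a file is created or uploaded, catching inconsistencies that would pass unnoticed in a free-text description.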

Building on insights from this pilot, we are now developing protocols and utilities for a more complete description of both single-junction and tandem perovskite solar cells. Creating data schemas that are sufficiently precise and complete to allow for consistent reproducibility across laboratories is, by necessity, an iterative process. We have recruited numerous volunteers to assist with this effort. To identify critical process and material parameters essential for reproducible results, we are also conducting round-robin experiments in which device recipes are cross-validated across multiple labs.

A cultural shift is likewise needed to realise this vision. A critical mass of researchers must recognise the value of sharing structured device data in accordance with FAIR principles. Without widespread buy-in, even the best tools and protocols will fail to gain traction. Encouragingly, there are signs that such a shift is underway. Growing interest in artificial intelligence – spurred by tools like ChatGPT – has increased awareness of AI's potential, as well as the need for abundant, well-structured data to power it. This awareness may help catalyse a broader adoption of FAIR data-sharing practices.

Nevertheless, while the availability of standards, user-friendly online utilities, successful implementation examples, and demonstrated benefits may encourage many to adopt this vision, it will not be enough for everyone. If only a few groups adopt good standards, the quality of their papers will improve, but the broader field will not benefit very much. This is a classic network effect. The overall benefits rise dramatically as more groups comply with the same improved reporting standards. Some will see only the additional work involved and conclude that they will themselves not reap any immediate benefits. This is essentially the “tragedy of the commons”, where the collective good is undermined by individual reluctance to contribute.16 To achieve widespread compliance, appropriate incentives must therefore be established.

Journals, publishers, and editors are uniquely positioned to create and enforce such incentives, both because they are few in number and because they act as the gatekeepers of peer-reviewed research. By encouraging the use of proper standards and promoting online data repositories that serve the needs of their scientific communities, publishers can have a profound impact. Once proper standards and infrastructure have been established and accepted, they can go on and mandate that device data must be reported in accordance with best practices. Such policies would create the pressure needed to ensure compliance and help drive the cultural shift required for a more valuable representation of perovskite devices.

To convince a community to buy into this vision will likely require a combination of defined standards, reasonably user-friendly procedures for using them, and demonstration of utility, paired with pressures from publishers and funding agencies.

New and better reporting standards are now being developed. We hope that the community will adopt and continue to refine these standards, and that publishers will help to enforce them. A better, FAIRer, and more reproducible future is possible – at least if we are willing to invest the necessary effort, embrace the required changes, and work collectively to enforce them.

Author contributions

T. J. J. came up with the idea and has been responsible for writing. E. U. has contributed to discussions and writing.

Conflicts of interest

The authors declare no competing financial interest.

Data availability

The data in Fig. 1a were downloaded from the Perovskite Database. The data, as well as the code used for filtering the data and making the figure, are available at https://github.com/Jesperkemist/Perovskite_data_vision.

Acknowledgements

T. J. J. would like to acknowledge Åforsk (Grant No. 23-629), the Swedish Government Strategic Research Area in Materials Science on Advanced Functional Materials at Linköping University (Faculty Grant SFO-Mat-LiU No. 2009-00971), and Carl Tryggers Stiftelse (Grant No. CTS 24: 3375). ChatGPT 4o has been used as a writing assistant to improve grammar, suggest alternative phrasings, and to improve the flow of the text. However, the text, both initial and final, is the product of the authors' own work, mind, body, and soul.

References

  1. O. Almora, G. C. Bazan, C. I. Cabrera, L. A. Castriotta, S. Erten-Ela, K. Forberich, K. Fukuda, F. Guo, J. Hauch, A. W. Y. Ho-Baillie, T. J. Jacobsson, R. A. J. Janssen, T. Kirchartz, R. R. Lunt, X. Mathew, D. B. Mitzi, M. K. Nazeeruddin, J. Nelson, A. F. Nogueira, U. W. Paetzold, B. P. Rand, U. Rau, T. Someya, C. Sprau, L. Vaillant-Roca and C. J. Brabec, Adv. Energy Mater., 2024, 15(12), 2404386.
  2. A. Fakharuddin, M. K. Gangishetty, M. Abdi-Jalebi, S.-H. Chin, A. R. bin Mohd Yusoff, D. N. Congreve, W. Tress, F. Deschler, M. Vasilopoulou and H. J. Bolink, Nat. Electron., 2022, 5, 203–216.
  3. J. Moon, Y. Mehta, K. Gundogdu, F. So and Q. Gu, Adv. Mater., 2024, 36, 2211284.
  4. Y. Wu, J. Feng, Z. Yang, Y. Liu and S. Liu, Adv. Sci., 2023, 10, 2205536.
  5. K. P. Goetz and Y. Vaynzof, ACS Energy Lett., 2022, 7, 1750–1757.
  6. T. J. Jacobsson, A. Hultqvist, A. García-Fernández, A. Anand, A. Al-Ashouri, A. Hagfeldt, A. Crovetto, A. Abate, A. G. Ricciardulli, A. Vijayan, A. Kulkarni, A. Y. Anderson, B. P. Darwich, B. Yang, B. L. Coles, C. A. R. Perini, C. Rehermann, D. Ramirez, D. Fairen-Jimenez, D. Di Girolamo, D. Jia, E. Avila, E. J. Juarez-Perez, F. Baumann, F. Mathies, G. S. A. González, G. Boschloo, G. Nasti, G. Paramasivam, G. Martínez-Denegri, H. Näsström, H. Michaels, H. Köbler, H. Wu, I. Benesperi, M. I. Dar, I. Bayrak Pehlivan, I. E. Gould, J. N. Vagott, J. Dagar, J. Kettle, J. Yang, J. Li, J. A. Smith, J. Pascual, J. J. Jerónimo-Rendón, J. F. Montoya, J.-P. Correa-Baena, J. Qiu, J. Wang, K. Sveinbjörnsson, K. Hirselandt, K. Dey, K. Frohna, L. Mathies, L. A. Castriotta, M. H. Aldamasy, M. Vasquez-Montoya, M. A. Ruiz-Preciado, M. A. Flatken, M. V. Khenkin, M. Grischek, M. Kedia, M. Saliba, M. Anaya, M. Veldhoen, N. Arora, O. Shargaieva, O. Maus, O. S. Game, O. Yudilevich, P. Fassl, Q. Zhou, R. Betancur, R. Munir, R. Patidar, S. D. Stranks, S. Alam, S. Kar, T. Unold, T. Abzieher, T. Edvinsson, T. W. David, U. W. Paetzold, W. Zia, W. Fu, W. Zuo, V. R. F. Schröder, W. Tress, X. Zhang, Y.-H. Chiang, Z. Iqbal, Z. Xie and E. Unger, Nat. Energy, 2022, 7, 107–115.
  7. E. Unger and T. J. Jacobsson, ACS Energy Lett., 2022, 7, 1240–1245.
  8. M. Saliba, J.-P. Correa-Baena, C. M. Wolff, M. Stolterfoht, N. Phung, S. Albrecht, D. Neher and A. Abate, Chem. Mater., 2018, 30, 4193–4201.
  9. K. Wang, M. C. Tang, H. X. Dang, R. Munir, D. Barrit, M. De Bastiani, E. Aydin, D. M. Smilgies, S. De Wolf and A. Amassian, Adv. Mater., 2019, 31, 1808357.
  10. A. Tiihonen, K. Miettunen, J. Halme, S. Lepikko, A. Poskela and P. D. Lund, Energy Environ. Sci., 2018, 11, 730–738.
  11. F. Pezoa, J. L. Reutter, F. Suarez, M. Ugarte and D. Vrgoč, presented in part at the Proceedings of the 25th International Conference on World Wide Web, Montréal, Québec, Canada, 2016.
  12. M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos and P. E. Bourne, Sci. Data, 2016, 3, 1–9.
  13. C. Draxl and M. Scheffler, MRS Bull., 2018, 43, 676–682.
  14. A. Maqsood, H. Näsström, C. Chen, L. Qiutong, J. Luo, R. Chakraborty, V. Blum, E. Unger, C. Draxl, J. A. Márquez and T. J. Jacobsson, Nat. Commun., 2025, 16, 8725.
  15. M. Scheidgen, L. Himanen, A. N. Ladines, D. Sikter, M. Nakhaee, Á. Fekete, T. Chang, A. Golparvar, J. A. Márquez and S. Brockhauser, J. Open Source Softw., 2023, 8, 5388.
  16. G. Hardin, Science, 1968, 162, 1243–1248.

This journal is © The Royal Society of Chemistry 2025