Nik Reeves-McLaren *a and Sarah Moth-Lund Christensen b
aSchool of Chemical, Materials and Biological Engineering, University of Sheffield, Mappin Street, Sheffield, S1 3JD, UK. E-mail: nik.reeves-mclaren@sheffield.ac.uk
bCentre for Machine Intelligence, University of Sheffield, Mappin Street, Sheffield S1 3JD, UK
First published on 4th November 2025
Artificial intelligence promises to revolutionise materials discovery through accelerated prediction and optimisation, yet this transformation brings critical data integrity challenges that threaten the scientific record. Recent studies demonstrate that experts cannot reliably distinguish AI-generated microscopy images from authentic experimental data, while widespread errors plague 20–30% of materials characterisation analyses. Generative AI tools can now produce code for data manipulation at pace, creating plausible-looking results that violate fundamental physical principles yet evade traditional peer review. These risks are compounded by inherent biases in training datasets that systematically over-represent equilibrium-phase oxide systems, and by the “black box” opacity of AI models that challenges scientific accountability and epistemic agency. We propose a multifaceted framework for enhanced research integrity encompassing materials-specific ethical governance, professional standards for AI disclosure and data validation, and modular integrity checklists with technique-specific validation protocols. Critical enablers include mandatory deposition of structured raw instrument files, AI-powered fraud detection systems, and cultivation of critical AI literacy through interdisciplinary education. Without immediate action to address these challenges, the materials science community risks perpetuating errors and biases that will fundamentally undermine AI's transformative potential.
The severity of this threat was recently demonstrated in nanomaterials research, where a survey of 250 scientists found that experts could not reliably distinguish AI-generated microscopy images from authentic experimental data.1 These AI-generated images were created in under one hour using publicly available tools, requiring no specialised technical knowledge. The traditional peer review process, reliant on visual inspection by experts, is no longer sufficient to detect sophisticated image fraud.
These challenges appear at a time when artificial intelligence (AI) is set to reshape materials science, promising rapid discovery of advanced materials by predicting material properties, optimising compositions, and exploring vast chemical design spaces. Emerging examples include the development of alloys with superior mechanical properties,2 the generation of numerous MOF candidates, and advances in battery material discovery.3–5 Developments such as Google DeepMind's GNoME (Graph Networks for Materials Exploration) have demonstrated the potential for large-scale materials discovery, identifying 2.2 million stable crystal structures and representing an order-of-magnitude expansion in known stable materials.6 The reliability of such AI models, however, depends entirely on the integrity of their training data.7
High-quality, relevant, and representative data is essential for accurate and effective generalisation. The principle “garbage in, garbage out” is key: if training data are limited or flawed, AI models will be inaccurate.8 Intense debate followed a 2023 publication on an automated lab for rapid synthesis and characterisation of ‘new’ inorganic materials, with critiques of the work focused on issues of metadata and what constitutes a novel discovery, the quality of automated analyses, and the ability to model the complexities of real materials, such as disorder.9,10
Despite the clear promise of AI, widespread errors and inconsistencies in data, along with fraudulently manipulated or fabricated data, threaten research validity in materials characterisation. Generative AI (GenAI) is readily capable of, for example, producing code to manipulate data and then cover one's tracks, without posing any challenging ethical questions to researchers increasingly under ‘publish or perish’ pressure. A fraudulent claim of a new room-temperature superconductor would be discredited within hours of publication, but smaller, iterative materials developments are more likely to sneak past peer review.
There is an urgent need for research on, and new approaches to, data integrity. Given the broad uptake of generative AI in materials science, and across all disciplines in engineering and the physical sciences, small errors, biases in foundational training data, and outright unethical conduct risk widespread research misdirection.
Well-established physical consistency checks are widely underused in materials science data analyses, a problem compounded by frequent misunderstanding of the statistical measures used to judge the perceived quality of work. Rietveld refinement is a powerful tool for extracting structural information from powder diffraction data, but misinterpretations of its statistical measures are common; one example is the reduced chi-squared (χ²), the effective ‘goodness of fit’ between the diffraction pattern calculated from the refined structure and the experimental data. A critical misunderstanding is evident when a refinement is reported with a χ² value of less than 1.0, which is statistically problematic because it implies a fit that is “better than ideal”. This can indicate either that the standard uncertainties associated with the observed data are overestimated, or that too many parameters have been introduced, leading to overfitting of the model to noise rather than to true physical phenomena. The result is publication of structural parameters that are statistically unreliable or physically meaningless.12,13 Furthermore, many publications fail to report or justify crucial details of the refinement model itself, such as the mathematical functions used to model the peak profiles and background, the constraints applied to parameters, or the handling of atomic displacement parameters (ADPs). This frequently leads to the publication of physically nonsensical results, such as negative ADPs, and of structural models that are statistically unsound and ultimately irreproducible.14
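Many of these problems are mechanically detectable. The sketch below illustrates how a technique-specific integrity check for Rietveld refinements might be automated; the field names and thresholds are our own illustrative assumptions, not the output format of any particular refinement package.

```python
# A minimal sketch of automated sanity checks on Rietveld refinement results.
# Field names (chi2, adps, n_params, n_obs) are illustrative and not tied to
# the export format of any specific refinement package.

def check_rietveld_refinement(result: dict) -> list[str]:
    """Return warnings for statistically or physically suspect refinement values."""
    warnings = []

    chi2 = result.get("chi2")                # reduced chi-squared (goodness of fit)
    if chi2 is not None and chi2 < 1.0:
        warnings.append(
            f"Reduced chi-squared = {chi2:.3f} < 1: uncertainties may be "
            "overestimated, or the model may be overfitting noise."
        )

    # Isotropic atomic displacement parameters must be positive to be physical.
    for site, adp in result.get("adps", {}).items():
        if adp <= 0:
            warnings.append(f"Non-positive ADP ({adp:.4f} Å²) on site {site}.")

    # Crude observations-per-parameter ratio as an overfitting flag.
    n_params, n_obs = result.get("n_params"), result.get("n_obs")
    if n_params and n_obs and n_obs / n_params < 5:
        warnings.append(
            f"Only {n_obs / n_params:.1f} observations per refined parameter."
        )

    return warnings


if __name__ == "__main__":
    example = {"chi2": 0.82, "adps": {"O1": -0.002, "Ti1": 0.006},
               "n_params": 40, "n_obs": 150}
    for w in check_rietveld_refinement(example):
        print("WARNING:", w)
```

Such checks would flag refinements for closer human scrutiny as part of a modular integrity checklist, rather than rejecting them outright.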
A further example: despite proven utility for ensuring consistency in dielectric functions and accurate optical and electronic property measurements, methods such as F-sum rules and Kramers–Kronig (K–K) relations are reportedly often overlooked in research on optical materials.7 K–K relations are mathematical constraints linking the real and imaginary components of optical constants, derived from fundamental causality requirements. Violation of these relations – or of F-sum rules, which constrain integrated absorption based on electron density – indicates either measurement errors, incomplete spectral data, or data manipulation. The failure to apply such validation methods leaves optical property claims vulnerable to fabrication, particularly as GenAI tools could generate superficially plausible spectra that nevertheless violate basic physical constraints.
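To illustrate, the sketch below shows one way a causality check of this kind might be automated: it numerically reconstructs the real refractive index from the tabulated extinction coefficient via the K–K relation and reports the deviation from the values as submitted. The implementation is deliberately simple (uniform grid, crude principal-value handling, finite spectral range) and is an assumption-laden illustration rather than a validated tool.

```python
import numpy as np

def kk_real_from_imag(w: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Reconstruct n(w) via n(w) - 1 = (2/pi) P∫ w'k(w')/(w'^2 - w^2) dw'."""
    dw = w[1] - w[0]                          # assumes a uniform frequency grid
    n_rec = np.ones_like(w)
    for i, wi in enumerate(w):
        denom = w**2 - wi**2
        denom[i] = np.inf                     # crude principal-value handling at the pole
        integrand = w * k / denom
        # trapezoidal rule on the uniform grid
        n_rec[i] += (2.0 / np.pi) * dw * (
            integrand.sum() - 0.5 * (integrand[0] + integrand[-1])
        )
    return n_rec

def kk_relative_deviation(w, n_reported, k) -> float:
    """Median relative deviation between reported n and its K-K reconstruction."""
    w = np.asarray(w, dtype=float)
    n_rec = kk_real_from_imag(w, np.asarray(k, dtype=float))
    n_reported = np.asarray(n_reported, dtype=float)
    return float(np.median(np.abs(n_rec - n_reported) / np.abs(n_reported)))

if __name__ == "__main__":
    # Synthetic single-Lorentz-oscillator optical constants as a demonstration.
    w = np.linspace(0.5, 10.0, 2000)
    eps = 1.0 + 1.0 / (4.0 - w**2 - 1j * 0.3 * w)
    N = np.sqrt(eps)                          # complex refractive index n + ik
    print(f"Median relative K-K deviation: {kk_relative_deviation(w, N.real, N.imag):.3%}")
    # Clean, causal data should give a small deviation (limited by truncation and
    # grid errors); spectra with, e.g., a deleted absorption band typically do not.
```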
These types of widespread shortcomings highlight potentially severe issues around the reliability of reported materials and their properties, creating a substantial barrier to the development of high-performance advanced materials. Without improvement in data integrity, handling and reporting, we risk these shortcomings becoming fixtures of AI training and validation data sets – in turn undermining the promise of AI in materials science and leaving us instead with unreliable models, and misdirected research.
The arrival of GenAI brings new and complex ethical and scientific problems, at a time when research integrity in materials science is already under pressure. Highly realistic synthetic GenAI data and images can easily be misrepresented as experimental. Real data can be altered to better support scientific hypotheses. This capability poses serious risks to research integrity. Traditional methods for detecting fraud, such as identifying non-random digits, are now obsolete due to GenAI's sophistication, leading to an “arms race” between AI tools for detection and new methods designed to avoid them.15
The growing use of text produced with GenAI tools such as ChatGPT, Gemini, and Claude means that many journals now require authors to declare where these have been used. But what about manipulation or fabrication of raw data? There is far less awareness of this risk. One GenAI tool the present authors tested yielded reusable Python code for data manipulation (to remove secondary phase peaks in diffraction data and fill the resulting void with randomly generated, believable background, or to manipulate long-term battery testing data to remove noise and glitched cycles) in comfortably less than an hour. This is a challenge for the here and now.
Recent work in nanomaterials characterisation provides sobering empirical evidence of these capabilities. Researchers generated convincing fake atomic force microscopy (AFM), scanning transmission electron microscopy (STEM), and transmission electron microscopy (TEM) images in less than one hour using commercially available generative AI tools.1 When presented to 250 scientists in a blind survey, experts correctly identified real versus AI-generated images only 40–51% of the time for most image pairs – performance indistinguishable from random guessing. For four out of six image pairs tested, statistical analysis (chi-squared test, p > 0.05) showed no significant difference in scientists' ability to identify authentic versus fake images.
Energy materials research offers another example of a field vulnerable to AI-assisted data manipulation, in this case due to the complex, multi-parameter nature of electrochemical measurements. Photovoltaic current–voltage characteristics are readily susceptible to algorithmic enhancement: fill factors could be artificially improved from experimentally observed values of 0.83 to theoretically optimal values approaching 0.89 through subtle modification of series resistance contributions.19 Such manipulations remain within plausible ranges for peer review assessment whilst significantly inflating reported power conversion efficiencies. Electrocatalyst performance data present similar vulnerabilities through fabrication of Tafel slope values. Experimental studies demonstrate that Pt/C catalysts exhibit Tafel slopes varying from 30 mV dec⁻¹ in 0.5 M H₂SO₄ to 120 mV dec⁻¹ under fuel cell conditions, with additional dependence upon catalyst loading (63 to 211 mV dec⁻¹ across different overpotential ranges for identical materials).20 It is conceivable that GenAI tools could readily generate synthetic data presenting artificially consistent Tafel slopes of 30 mV dec⁻¹ across varied conditions, thereby suggesting superior kinetic performance. Electrochemical impedance spectroscopy measurements in battery and fuel cell research are similarly vulnerable: complex multi-semicircle Nyquist plots can be algorithmically simplified to eliminate inconvenient high-frequency resistances or low-frequency inductive features associated with side reactions or interface instabilities.21
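Simple plausibility screens could catch many such manipulations before they reach a reviewer. The sketch below illustrates two such screens, using Green's empirical expression for the maximum fill factor of an ideal cell and a check for suspiciously uniform Tafel slopes; the thresholds are our own illustrative assumptions, not community-agreed limits, and would need calibration for each material class.

```python
import math

def check_fill_factor(ff: float, voc: float, temp_k: float = 300.0) -> list[str]:
    """Flag fill factors exceeding Green's empirical ideal-cell limit."""
    if not 0.0 < ff < 1.0:
        return [f"Fill factor {ff:.3f} outside (0, 1)."]
    flags = []
    v = voc / (8.617e-5 * temp_k)                # Voc normalised by kT/q (ideality n = 1)
    ff0 = (v - math.log(v + 0.72)) / (v + 1.0)   # maximum FF for an ideal diode
    if ff > ff0:
        flags.append(
            f"FF = {ff:.3f} exceeds the ideal-diode limit FF0 = {ff0:.3f} for "
            f"Voc = {voc:.2f} V: check how series/shunt resistance was treated."
        )
    return flags

def check_tafel_slopes(slopes_mv_dec: list[float]) -> list[str]:
    """Flag Tafel slopes that are unphysically low, or suspiciously uniform
    across different electrolytes, loadings, or overpotential ranges."""
    flags = []
    if any(s < 25.0 for s in slopes_mv_dec):
        flags.append("Tafel slope below ~25 mV per decade at room temperature.")
    if len(slopes_mv_dec) >= 3:
        mean = sum(slopes_mv_dec) / len(slopes_mv_dec)
        spread = max(slopes_mv_dec) - min(slopes_mv_dec)
        if spread / mean < 0.02:                 # < 2% variation across all conditions
            flags.append("Near-identical Tafel slopes across all conditions; real "
                         "measurements typically vary with loading, pH and overpotential.")
    return flags

if __name__ == "__main__":
    print(check_fill_factor(ff=0.89, voc=0.75))
    print(check_tafel_slopes([30.1, 30.0, 30.2, 29.9]))
```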
If a research group were to fraudulently manipulate data and present the discovery of a room-temperature superconductor based on these types of subtle data hacks, they would likely be found out on the same day. For the more iterative work on less transformative materials that makes up much of the publication record, we propose that such manipulation is currently much more likely to slip through the net unnoticed.
The Retraction Watch database currently runs to almost 60 000 records, but none yet specifically focus on the use of AI for the purposes discussed here. One tangentially related case involved a paper claiming to track the productivity of “a thousand material scientists” at a large R&D company, reporting a “44% increase in materials discovery” and an “81% productivity increase for top-decile scientists” due to the introduction of a “machine learning material generation tool”. The data included were found to be “suspiciously clean and neat: nearly every sub measure of success gave a clear and statistically significant result”. Following an internal, confidential review, the sole author's institution concluded that the paper “should be withdrawn from public discourse” and requested the paper's withdrawal.22
It may seem unintuitive to think of bias in the context of materials science data, but our scientific track record – the AI's training data – fundamentally overrepresents stable, inorganic, equilibrium-phase systems, particularly oxide-based materials, especially relative to amorphous, disordered or highly entropic materials.24 AI models trained on such data struggle to generalise or extrapolate to new, unexplored chemistries or processing conditions, often leading to “hallucinations” or unreliable predictions for novel materials. Generative AI models can accidentally learn and then amplify any biases and shortcomings present in their training data.25
For instance, if AI models are predominantly trained on data from materials developed and characterised under specific, well-established (e.g., equilibrium) conditions or for certain applications, they may inadvertently learn to prioritise or ‘hallucinate’ properties that conform to these existing paradigms. This could lead to a biased prediction landscape, where novel materials with unusual or non-equilibrium properties, or those relevant to emerging applications, are systematically overlooked or inaccurately predicted. Such a bias could perpetuate existing research trajectories, effectively ‘stereotyping’ what constitutes a ‘good’ or ‘feasible’ material based on historical data, rather than enabling truly disruptive discoveries.
To reduce these risks and ensure responsible AI deployment, meaningful human control and oversight are essential. This involves actively monitoring AI behaviour and developing plans to prevent harmful effects on users, with human validation being crucial for high-risk decisions.25 Ethical guidelines for trustworthy AI, such as those from the European Commission, outline different levels of human involvement and oversight of AI system activity, including consideration of societal and ethical impacts, and ultimate decision-making.26
The impact of AI on human “epistemic agency” – the control individuals have over their beliefs, the questions they ask, and the reasons they entertain – is also a critical concern. There is an ongoing discussion about whether AI-based science poses a social epistemological problem, particularly concerning trust in opaque models and the responsibility of scientists for outputs based on AI models.27 Some argue that full transparency is not always needed for trust if systems follow established academic and institutional norms, but this applies to human and institutional actors, not to AI models, which cannot be held to norms in the same way. How, then, can materials scientists ensure research integrity and take full responsibility for AI tools when they cannot foresee, fully understand, or verify how these tools arrive at a given output? For example, if an AI model identifies a novel battery electrolyte composition but cannot explain why certain additives improve ionic conductivity, researchers cannot properly assess safety risks or optimise the formulation further. Similarly, if an AI predicts a ceramic will exhibit ferroelectric properties but provides no mechanistic insight, experimental validation becomes trial-and-error rather than hypothesis-driven science. This requires an evolving understanding of scientific responsibility and epistemic agency in the AI era. The traditional idea of scientific responsibility assumes a human agent's full understanding and control over their research tools.27 AI's opacity directly challenges this, raising basic questions about who is accountable when an AI system makes a flawed decision or generates inaccurate data, and curtailing the scientist's ability to articulate the reasons and evidence supporting AI-generated hypotheses. This implies a profound shift in the epistemology of science, and a new understanding of human agency in research integrity and accountability.25–27
Despite concerted efforts in AI governance, adopting these principles and governance frameworks in materials science is not, and will not be, straightforward. Broad principles need to be translated into specific guidelines and governance mechanisms relevant to the particular challenges of materials science data. Further, current generalised lists of ethical AI principles cannot and should not be expected to be exhaustive. Considerable work will be needed to identify the ethical challenges posed by the use and development of AI specifically in and for materials science research. These include challenges requiring both technical and non-technical responses, for example relating to sustainability, dual use, and how AI might change perceptions and assumptions about materials science research practices.
Frameworks for assessing “AI-ready” data are emerging to address some of these problems. The SciHorizon framework, for instance, suggests four main aspects: quality, FAIRness (findable, accessible, interoperable, reusable), explainability, and compliance.8 Key parts of the ‘quality’ component include completeness, accuracy, consistency (both internal coherence within a dataset and external alignment with related datasets), and timeliness (prompt publication and continuous updating). ‘Compliance’ stresses the importance of data provenance (clear documentation of data sources, authorship, and licensing), ethics and safety (adherence to scientific ethical standards), and trustworthiness (compliance with national regulations and sustainability of data services). This shifts the focus from sheer data volume to the quality, relevance, and representativeness of the data, but implicit in this is a fundamental re-evaluation of how scientific data are collected, curated, and prepared – with both good practice and AI in mind.
The Nature Portfolio also largely prohibits the use of generative AI for images due to unresolved legal copyright and research integrity issues. This stance, while understandable from a legal and ethical perspective, may be an implicit hurdle for AI-driven materials discovery workflows reliant on generative models to propose novel material structures or microstructures, where visual representation is key.
Recent calls from the microscopy community emphasise the need to reframe expectations around image quality, recognising that not every nanomaterial or assembly is perfect, and that pristine images may signal manipulation rather than excellence.1 Excessive demands for polished images create pressure on researchers that can inadvertently incentivise AI usage. Reviewers should not request “better looking” images unless visual improvements would change the authors' scientific conclusions. The purpose of images is to support conclusions and enable fair judgment, not to serve as polished content for dissemination. Editors must actively dismiss such reviewer comments when they are scientifically unjustified, recognising that this pressure contributes to the integrity crisis.
Standardised validation protocols are also becoming more prominent. There is a recognised need for new norms, standards, and best practices for conducting research with AI. Data provenance is vital for AI authentication, transparency, and traceability. The detailed requirements for documentation, including researcher responsibilities, workflow, input, output, metadata, origin/access point, and data management, go beyond a traditional citation. This implies that for AI-driven materials science, true reproducibility and trustworthiness depend not just on the final model or results, but on a carefully documented “data trail” from raw source to final output, especially given AI's potential to hallucinate and/or generate synthetic data.
The potential for automated quality assurance and better peer review processes specifically within materials science is significant. Some publishers and research institutions already use AI tools to scan submitted manuscripts for image integrity problems before peer review. Christmann's study, which showed the power of AI-powered data analysis in uncovering previously unknown systematic errors in chemical publications, highlights AI's capability for automated quality control in chemical and materials data.11 The future must bring better collaboration between AI and human reviewers to improve fraud detection. This will likely involve AI handling issues of scale and initial pattern detection, with human experts then providing critical judgement and contextual understanding, and dealing with nuanced cases AI might miss.
The Science family of journals has adopted Proofig, an AI-powered image-analysis tool, to screen for manipulation.29 However, ethical implementation requires that AI-flagged suspicions be reviewed by humans, with outcomes communicated to authors who must have opportunity to respond, in accordance with Committee on Publication Ethics (COPE) guidelines. Such tools should be deployed both during submission and retrospectively to audit previous publications, with the sophistication of anti-fraud measures potentially serving as an indicator of journal quality.
Fostering a reflective research culture by bringing AI ethics and sound data analysis skills into scientific education and training is an essential and proactive step. For instance, Freie Universität Berlin's Department of Biology, Chemistry, and Pharmacy plans to bring AI tools into its curriculum to help students develop strong data analysis skills and critical thinking, preparing them for their future research careers.30 Similarly, Cornell Engineering has started a graduate-level course, “AI for materials”, designed to give the next generation of researchers and engineers the knowledge to drive discovery where AI and materials science meet, highlighting both applications and the challenges involved. The challenges posed by AI can also be seen as an opportune moment to foster new interdisciplinary relations, and to benefit from the strengths of disciplines that specialise in ethics or meta-science. As an example, “Embedded EthiCS” at Harvard is a collaboration between computer scientists and philosophers fuelling both teaching and research on ethical concerns in AI development and adoption.
Encouraging a culture of critical reflection and healthy scepticism towards AI outputs is essential. It is vital to develop a mindset within materials science that sees AI models as tools requiring responsible use, while instilling strong data analysis and critical thinking skills in researchers at all levels, ensuring we are all equipped with the necessary knowledge and ethical grounding to navigate the complexities of AI-driven science responsibly.
To supplement individual diligence, the community should also consider adopting established structural approaches from other scientific disciplines designed to improve the reliability of research findings. Adversarial collaborations, for instance, unite researchers with conflicting viewpoints to jointly design and conduct a critical experiment, increasing the impartiality of the outcome. The Registered Reports publication format, where methods and analysis protocols are peer-reviewed before experiments are conducted, mitigates publication bias and questionable research practices. Finally, a ‘Red Team’ approach, where researchers actively solicit rigorous, structured criticism of a project from designated colleagues prior to submission, can identify weaknesses in argumentation and data interpretation that might otherwise be missed. The adoption of such practices would represent a systemic commitment to research integrity.
A critical enabler for this checklist is a policy mandating the deposition of raw experimental data. The distinction between raw and processed data is vital for integrity. It is insufficient to provide only processed data, such as a text file of a diffraction pattern (.xy); journals must require the structured, machine-readable raw data files generated by the instrument itself (e.g., .raw, .xrdml). Raw instrument files contain a rich set of metadata – including calibration parameters, detector settings, and collection times – that are essential for reproducing the analysis and verifying the data's origin. This embedded metadata makes the convincing fabrication of a raw data file substantially more difficult than creating a simple text file of processed numbers. To make verification of these files practical rather than overwhelming, recent proposals from the nanomaterials community suggest adopting standardised data storage structures.1 The minimal arrangement of instrument files (MAIF) framework proposes that each manuscript has its own folder, with each figure having a subfolder containing primary instrument files specific to that figure, and non-figure data stored in a separate ‘additional data’ folder.1 This structured approach – as opposed to idiosyncratic, researcher-specific filing systems – enables efficient checking of key instrument files for legitimacy without overwhelming reviewers or investigators. We recommend that journals require structured raw data files (following MAIF or similar principles) as a publication criterion, published as compressed directories in supplementary information or repositories such as Zenodo, Open Science Framework, or Figshare. This requirement should over time become mandatory rather than merely encouraged.
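A MAIF-style deposit also lends itself to automated auditing. The sketch below illustrates a structural check of a deposited directory, under the assumption of one subfolder per figure plus an 'additional data' folder; the folder naming convention and the list of accepted raw-format extensions are our own illustrative assumptions, not part of the MAIF proposal itself.

```python
# A minimal sketch of a structural audit of a MAIF-style data deposit, assuming
# one subfolder per figure plus an 'additional_data' folder. The folder naming
# convention and accepted raw-format extensions are illustrative assumptions.

from pathlib import Path

RAW_EXTENSIONS = {".raw", ".xrdml", ".dm4", ".emd", ".ibw", ".spm", ".mpr"}

def audit_maif_deposit(root: str, figure_labels: list[str]) -> list[str]:
    """Report figure folders that lack structured raw instrument files."""
    issues = []
    root_path = Path(root)
    for label in figure_labels:
        folder = root_path / label               # e.g. 'Figure_1', 'Figure_2', ...
        if not folder.is_dir():
            issues.append(f"Missing folder for {label}.")
            continue
        raw_files = [p for p in folder.rglob("*")
                     if p.suffix.lower() in RAW_EXTENSIONS]
        if not raw_files:
            issues.append(f"{label}: no raw instrument files found; "
                          "only processed or exported data?")
    if not (root_path / "additional_data").is_dir():
        issues.append("No 'additional_data' folder for non-figure data.")
    return issues

if __name__ == "__main__":
    for issue in audit_maif_deposit("manuscript_data", ["Figure_1", "Figure_2"]):
        print(issue)
```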
While the checklist approach we propose addresses the integrity of individual studies, a distinct but related challenge is the quality control of the large, aggregated datasets upon which foundational AI models are built. In fields such as clinical science, where meta-analyses of randomised controlled trials face similar issues with flawed or fraudulent data, researchers have developed formal tools to identify problematic studies before their inclusion in a wider analysis.31 A parallel approach is required in materials science to ensure that AI models are not trained on compromised data.
Accordingly, we propose the development of a complementary data-vetting framework specifically for the curation of AI training sets. Such a framework would consist of a series of checks to be applied to any dataset being considered for inclusion in a larger corpus, including: (i) verification of the publication status of the source data, checking for retractions, corrections, or expressions of concern; (ii) screening against public post-publication review platforms for credible criticisms; and (iii) programmatic checks for statistical anomalies or physically implausible results within the data itself, such as efficiencies exceeding 100% or unrealistic electrochemical parameters.
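Check (iii) is the most readily automated of the three. The sketch below illustrates how such programmatic screening might look for a candidate training record; the record schema and thresholds are illustrative assumptions, and checks (i) and (ii) would require queries to retraction databases and post-publication review platforms that are not shown here.

```python
# A minimal sketch of check (iii): programmatic screening of candidate training
# records for statistical anomalies and physically implausible values. The record
# schema and thresholds are illustrative assumptions; checks (i) and (ii) would
# query retraction and post-publication review services and are not shown.

from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class CandidateRecord:
    doi: str
    efficiency_percent: float | None = None       # e.g. power conversion efficiency
    coulombic_efficiency: float | None = None     # battery cycling, in percent
    tafel_slope_mv_dec: float | None = None
    flags: list[str] = field(default_factory=list)

def vet_record(rec: CandidateRecord) -> CandidateRecord:
    """Attach flags to a record; flagged records are withheld from the training corpus."""
    if rec.efficiency_percent is not None and not 0.0 < rec.efficiency_percent <= 100.0:
        rec.flags.append("Efficiency outside (0, 100]%.")
    if rec.coulombic_efficiency is not None and rec.coulombic_efficiency > 100.5:
        rec.flags.append("Coulombic efficiency above 100% beyond rounding error.")
    if rec.tafel_slope_mv_dec is not None and rec.tafel_slope_mv_dec < 20.0:
        rec.flags.append("Tafel slope implausibly low for room-temperature kinetics.")
    return rec

if __name__ == "__main__":
    # "10.xxxx/example" is a placeholder DOI for illustration only.
    record = vet_record(CandidateRecord(doi="10.xxxx/example", efficiency_percent=104.0))
    print(record.doi, record.flags)
```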
These proposed actions are not one-and-done fixes; as shown earlier, they connect with larger reflections on how AI interacts with our notions of trustworthy research and knowledge generation, and will therefore require continuous involvement and engagement from the materials science research community and beyond. We hope, however, that these initial suggestions open up a focused discussion on responsible AI adoption in materials science, and encourage a wider dialogue on how we can ensure trustworthy and responsible research practices in the AI age.
This journal is © The Royal Society of Chemistry 2025