Systematic Error Detection in the Database of Liquid Crystals (LiqCryst) Using Predictive Models
Abstract
Experimental data often contain anomalies, which can be errors or previously unrecognised knowledge gaps. While errors undermine the reliability of reported findings, knowledge gaps can sometimes point to opportunities for discovery. Machine learning (ML) techniques offer a promising means of identifying such anomalies. In this study, we propose a human-in-the-loop approach that integrates domain expertise with an ML model trained on a comprehensive database of phase transition behaviours of liquid crystalline (LC) materials (LiqCryst 5.2) to scrutinise data integrity. The ML model uncovered multiple anomalies in reported chemical data on LC phase transition behaviours, which were subsequently re-examined by human experts to determine whether they were due to errors. Our results demonstrate that the ML model can effectively detect inconsistencies even within a large-scale database widely regarded as an industry standard. At the same time, anomalies that do not originate from errors may highlight unexplored phenomena and thereby stimulate future discoveries. The proposed methodology for systematically reassessing reported chemical data has the potential to be applied broadly across different materials systems and scientific domains.