Black-box data: a new paradigm for biomedicine in the AI era
Abstract
As artificial intelligence cements its role as a cornerstone of scientific discovery, the field is undergoing a fundamental shift beyond the current transition from “white-box” first-principles models to “black-box” deep learning. We argue that a parallel, necessary transformation is emerging in data generation: the rise of “black-box data.” These data sources are intentionally optimized for machine consumption rather than human intuition, a trade-off we contend is essential to achieving the scale required for high-capacity biological foundation models. This article defines the “black-box data” paradigm, explores the necessity of this shift for the future of AI-driven science, and provides a unifying taxonomy illustrated by both historical precedents and contemporary breakthroughs.
- This article is part of the themed collection: 2026 Chemical Science Perspective & Review Collection
