How big is big data?

Abstract

Big data has ushered in a new wave of predictive power using machine-learning models. In this work, we assess what "big" means in the context of typical materials-science machine-learning problems. This concerns not only data volume, but also data quality and veracity, as well as infrastructure issues. With selected examples, we ask (i) how models generalize to similar datasets, (ii) how high-quality datasets can be gathered from heterogeneous sources, (iii) how the feature set and complexity of a model can affect expressivity, and (iv) what infrastructure is required to create larger datasets and train models on them. In sum, we find that big data presents unique challenges across very different aspects, which should serve to motivate further work.

Article information

Article type: Paper
Submitted: 14 May 2024
Accepted: 08 Jul 2024
First published: 11 Jul 2024
This article is Open Access
Creative Commons BY license

Faraday Discuss., 2024, Advance Article

D. Speckhard, T. Bechtel, L. M. Ghiringhelli, M. Kuban, S. Rigamonti and C. Draxl, Faraday Discuss., 2024, Advance Article, DOI: 10.1039/D4FD00102H

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
