Data storage architectures to accelerate chemical discovery: data accessibility for individual laboratories and the community

Rebekah Duke; Vinayak Bhat; Chad Risko

doi:10.1039/D2SC05142G

Rebekah Duke,^a Vinayak Bhat^a and Chad Risko*^a

Author affiliations

* Corresponding authors

^a Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington 40506, Kentucky, USA
E-mail: chad.risko@uky.edu

Abstract

As buzzwords like “big data,” “machine learning,” and “high-throughput” spread through chemistry, chemists need to consider more than ever their data storage, data management, and data accessibility, whether in their own laboratories or with the broader community. While it is commonplace for chemists to use spreadsheets for data storage and analysis, a move towards database architectures ensures that the data can be more readily findable, accessible, interoperable, and reusable (FAIR). However, making this move poses several challenges for those with limited-to-no knowledge of computer programming and databases. This Perspective presents the basics of data management using databases, with a focus on chemical data. We give an overview of database fundamentals by exploring the benefits of database use, introducing terminology, and establishing database design principles. We then detail the extract, transform, and load process for database construction, which includes an overview of data parsing and database architectures, spanning Structured Query Language (SQL) and No-SQL structures. We close by cataloging overarching challenges in database design. This Perspective is accompanied by an interactive demonstration available at https://github.com/D3TaLES/databases_demo. We do all of this within the context of chemical data, with the aim of equipping chemists with the knowledge and skills to store, manage, and share their data while abiding by FAIR principles.
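To make the extract, transform, and load process named above concrete, the following is a minimal sketch of moving spreadsheet-style records into a SQL table. It is not taken from the Perspective or its demonstration repository: the table name `molecules`, the column names, the helper functions, and the three illustrative rows are all assumptions, and Python's built-in `sqlite3` module stands in for whichever database a laboratory might actually use.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, as it might come out of a spreadsheet.
# The rows are illustrative only and are not data from the article.
RAW_CSV = """name,smiles,molar_mass
 Water ,O,18.015 g/mol
Methane,C,16.043 g/mol
Ethanol,CCO,46.069 g/mol
"""


def extract(text):
    """Extract: parse the raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, lowercase names, strip units from numbers."""
    cleaned = []
    for row in rows:
        cleaned.append((
            row["name"].strip().lower(),
            row["smiles"].strip(),
            float(row["molar_mass"].replace("g/mol", "").strip()),
        ))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a relational (SQL) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS molecules ("
        "name TEXT PRIMARY KEY, "
        "smiles TEXT NOT NULL, "
        "molar_mass_g_mol REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO molecules VALUES (?, ?, ?)", records
    )
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run all three steps against an in-memory database and return it."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


def heavier_than(conn, threshold):
    """Query: names of molecules above a molar-mass threshold, sorted."""
    cursor = conn.execute(
        "SELECT name FROM molecules WHERE molar_mass_g_mol > ? ORDER BY name",
        (threshold,),
    )
    return [row[0] for row in cursor.fetchall()]
```

Once the data sit in a table with typed columns, a question such as "which molecules are heavier than 17 g/mol" becomes a single query (`heavier_than(run_etl(), 17.0)`) rather than a manual filter over spreadsheet cells, which is one way a database can make data more findable and reusable.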

This article is part of the themed collection: 2022 Chemical Science Perspective & Review Collection

Article information

https://doi.org/10.1039/D2SC05142G

Article type: Perspective

Submitted: 14 Sep 2022

Accepted: 06 Nov 2022

First published: 08 Nov 2022

This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry

Chem. Sci., 2022, 13, 13646-13656

R. Duke, V. Bhat and C. Risko, Chem. Sci., 2022, 13, 13646 DOI: 10.1039/D2SC05142G

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
