A Python framework for assessing similarity in materials-science data

Abstract

Computational materials science produces large quantities of data, both from high-throughput calculations and from individual studies. Extracting knowledge from this large and heterogeneous pool of data is challenging because of the wide variety of computational methods and approximations, which raises significant veracity concerns given the sheer amount of available data. One way of dealing with this problem is to use similarity measures, both to group data and to understand where possible differences come from. Here, we present a Python framework for computing similarity relations between material properties. It can be used to automate the download of data from various sources, compute descriptors and similarities between materials, analyze the relationship between materials through their properties, and incorporate a variety of existing machine-learning methods. We explain the architecture of the package and demonstrate its power with representative examples.

Article information

Article type: Paper
Submitted: 13 Aug 2024
Accepted: 16 Sep 2024
First published: 19 Sep 2024
This article is Open Access
Creative Commons BY license

Digital Discovery, 2024, Advance Article

M. Kuban, S. Rigamonti and C. Draxl, Digital Discovery, 2024, Advance Article, DOI: 10.1039/D4DD00258J

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
