Emil I. Jaffal‡ab, Sangjoon Lee‡*c, Danila Shiryaeva, Alex Vtorova, Nikhil Kumar Baruad, Holger Kleinked and Anton O. Oliynyk*ab
aDepartment of Chemistry, Hunter College, City University of New York, New York, NY 10065, USA. E-mail: anton.oliynyk@hunter.cuny.edu
bPhD Program in Chemistry, The Graduate Center of the City University of New York, New York, NY 10016, USA
cDepartment of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, USA
dDepartment of Chemistry, University of Waterloo, 200 University Ave W, Waterloo, ON, Canada
First published on 17th January 2025
Traditional and non-classical machine learning models for solid-state structure prediction have predominantly relied on compositional features (derived from properties of constituent elements) to predict the existence of a structure and its properties. However, the lack of structural information can be a source of suboptimal property mapping and increased predictive uncertainty. To address this challenge, we have introduced a strategy that generates and combines both compositional and structural features with minimal programming expertise required. Our approach utilizes open-source, interactive Python programs named Composition Analyzer Featurizer (CAF) and Structure Analyzer Featurizer (SAF). CAF generates numerical compositional features from a list of formulae provided in an Excel file, while SAF extracts numerical structural features from a .cif file by generating a supercell. 133 features from CAF and 94 features from SAF are used either individually or in combination to cluster nine structure types in equiatomic AB intermetallics. The performance is comparable to those with features from JARVIS, MAGPIE, mat2vec, and OLED datasets in PLS-DA, SVM, and XGBoost models. Our SAF + CAF features provide a cost-efficient and reliable solution, even with the PLS-DA method, where a significant fraction of the most contributing features is the same as those identified in the more computationally intensive XGBoost models.
Table 1 summarizes the featurizers used to predict solid state structures that employ compositional and/or structural features. The table includes example applications; those marked with an asterisk are experimentally validated works. This is not an exhaustive list of available featurizers, as we focus primarily on those applied to solid state materials and specifically to structure prediction. Here, we also list open-source featurizers that, while widely used, are not appropriate for crystal structure prediction. RDKit39 is used to generate features for the development of structurally distinct activators of pregnane X receptors40 and protein domain-based prediction of drug/compound–target interactions.41 This featurizer addresses challenging topics, such as the prediction of conditions for organic reactions.42 However, RDKit primarily focuses on molecular structures, with unknown applicability to extended crystal structure prediction. Similarly, Mordred43 by Takagi is a widely used featurizer that produces close to 2000 features and has been used in experimentally validated medical studies, such as drug repurposing screening to identify clinical drugs targeting SARS-CoV-2 main proteases44 and an open drug discovery competition for novel antimalarials.45 Despite its applications in other fundamental chemistry studies, such as predicting the reactivity power of hypervalent iodine compounds,46 Mordred, like RDKit, does not focus on solid state materials. Additionally, MOFormer47 by Cao is software for metal–organic frameworks (MOFs) and is not intended as a general featurizer for inorganic solid-state materials.
| Featurizer | No. of features, including structural | Used in the following works (*experimentally validated) |
|---|---|---|
| MAGPIE17 | 115 | Accelerated discovery of perovskite materials18 |
| | | ML modeling of superconducting critical temperature19 |
| | | *Accelerated discovery of metallic glasses through iteration of ML and high-throughput experiments20 |
| JARVIS21 | 438 total | High-throughput identification and characterization of 2D materials using DFT22 |
| | | *Thermodynamic properties of the Nd–Bi system via EMF measurements, DFT, ML, and CALPHAD modeling23 |
| | | Screening Sn2M(III)Ch2X3 chalcohalides for photovoltaic applications24 |
| Atom2vec25 | N/A | Predicting the synthesizability of crystalline inorganic materials26 |
| | | ML-based prediction of crystal systems and space groups from inorganic material compositions27 |
| | | Evaluating the prediction power of ML algorithms for materials discovery using k-fold cross-validation28 |
| Mat2vec29 | 200 total | *Compositionally restricted attention-based network for materials property predictions9 |
| | | Using word embeddings in abstracts to accelerate metallocene catalysis polymerization research30 |
| | | Word embeddings for chemical patent natural language processing31 |
| Elemnet32 | 145 total | *Compositionally restricted attention-based network for materials property predictions9 |
| | | Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning33 |
| | | *Element selection for crystalline inorganic solid discovery guided by unsupervised ML of experimentally explored chemistry34 |
| CGCNN35 | N/A | Developing an improved crystal graph convolutional neural network framework for accelerated materials discovery36 |
| | | *Band gap prediction in crystalline borate materials37 |
| | | Machine learning-based feature engineering for thermoelectric materials by design38 |
Structural features have been used for solid state materials with ML frameworks. Numerical features generated by the DScribe package48 offer structural representations of molecules and materials.49 These features are used for determining transferable ML interatomic potentials, ranging from bond dissociation energy prediction of drug-like molecules50 to the reactivity of single-atom alloy nanoparticles.51 However, their vectorized representation and lack of human interpretability do not align with the current need for human-interpretable approaches. Additionally, lattice convolutional networks (LCNN) by Jung and Vlachos, which calculate surface graph features in two dimensions with six different permutations,52 have been used for predicting properties, including surface composition and surface reaction kinetics,53 ground states,54 catalyst properties,55 and phases.20 While these features are evidently optimized for deep neural networks, they do not address the requirements for interpretability and explainability in solid state materials studies. We also tested the smooth overlap of atomic positions (SOAP) featurizer, provided by the DScribe package.48 We generated a total of 6633 features and achieved F1 scores of 0.983 (XGBoost), 0.978 (SVM), and 0.94 (PLS-DA). The performance was highly comparable to that of other featurizers for SVM and XGBoost, but SOAP vastly outperformed the rest in PLS-DA. However, with 6633 features, it was very computationally expensive. Moreover, the features are not explainable: we cannot trace which physical quantity each feature corresponds to, which does not align with our goal of feature interpretability in this case.
Nevertheless, sorting chemical formulae is a crucial preprocessing step, especially when working solely with composition for modeling. Despite the limited information, for certain structure types (e.g., Heusler AB2C or perovskite ABO3 phases), the indices may serve as a proxy for structure, where a specific index is related to a particular structure site. However, this approach is prone to what is known as the coloring problem when indices duplicate; it might not be clear from the index which crystallographic site is occupied by which element.65,66 Commonly, when only compositional features are employed, we observe nothing more than elements grouping according to their elemental properties, which echoes the periodic table principle.67 The next level is structure maps, which depict more complex information in either two or three dimensions.64,68,69
Fig. 1 illustrates the most common approach for determining compositional features used in machine learning for chemistry and materials science. In some cases, no preprocessing at all is better than ill-chosen preprocessing (e.g., index normalization): chemical information such as structural complexity can be lost when atomic percentages replace indices to represent a chemical composition. Conversely, simple preprocessing such as meaningful sorting and rearranging of formulae can greatly enhance model performance.
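As an illustration of such index-preserving preprocessing, the following minimal sketch parses a formula and rewrites it with the elements sorted by a user-supplied ordering. The ordering table and function names are hypothetical, not CAF's actual code; a real workflow would use a full Mendeleev-number scale.

```python
import re

# Toy ordering scale; CAF supports sorting by full property scales.
MENDELEEV_ORDER = {"Na": 11, "Cl": 17, "Fe": 26, "B": 5, "Tl": 81, "I": 53}

def parse_formula(formula):
    """Split a formula into (element, stoichiometric index) pairs."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula)
    return [(el, float(n) if n else 1.0) for el, n in tokens if el]

def sort_formula(formula, order=MENDELEEV_ORDER):
    """Rewrite the formula with elements sorted by the given scale,
    keeping the original indices (no normalization to atomic percent)."""
    pairs = sorted(parse_formula(formula), key=lambda p: order.get(p[0], 999))
    return "".join(el + (str(int(n)) if n != 1.0 else "") for el, n in pairs)

print(sort_formula("ClNa"))  # → NaCl
```

Because the indices are kept as written, no structural information encoded in the stoichiometry is discarded during sorting.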
Prior to writing code for CAF, we considered the user experience with open-source software used for feature generation. Our goal was to develop easy-to-use software that does not require programming skills, including for those without formal programming training in the solid-state materials community. Utilizing the packages featured in Table 1, we documented the experiences of individuals with various levels of academic training: an undergraduate student with no prior programming experience, an undergraduate student with a semester's worth of programming experience, a postbaccalaureate user, and a master's level student majoring in software development. These subjective experiences from our group members are summarized in ESI Tables S1 and S2.†
CAF (Fig. 2) is available on GitHub at https://github.com/bobleesj/composition-analyzer-featurizer or via https://github.com/OliynykLab. As discussed, the sorting of formulae can significantly impact model quality. CAF supports Excel file formats and includes a filtering option that summarizes dataset content and filters data based on the number of elements in a formula, or removes non-elements. All 73 solid elements are accounted for to ensure maximum applicability. Additionally, a heatmap based on element occurrence can be generated, allowing users to visually analyze their dataset. If data are stored as CIFs in a folder, CAF can extract compositions from the CIFs and generate a table of formulae. Following filtering, the second option is sorting, which can be based on composition (indices or element fractions). Another sorting method is based on properties; if a file containing properties is provided, they are listed to give users the option to sort in ascending or descending order. Sorting can also be based on a manually modified list of element groups to meet specific user needs. Once the file is updated with sorted compositions, the third option, featurization, can be applied using a pre-prepared list of descriptors designed to avoid mathematical operations that could result in values of infinity or NaN. The descriptor list can also be tailored to the specific problem the user aims to solve. For instance, we include an option for users to hot-encode their data, converting categorical information into a binary vector format suitable for machine learning algorithms; the presence or absence of an element is indicated by 1 or 0, respectively. To maximize data utility, we have prepared binary and ternary featurizers, along with a universal featurizer that is agnostic to the number of elements in a compound.
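The hot-encoding option described above can be sketched as follows. The element list is truncated for illustration (CAF covers all 73 solid elements), and the function name is ours, not CAF's API.

```python
import re

ELEMENTS = ["Fe", "Co", "Ni", "Si", "B", "As"]  # truncated for illustration

def one_hot(formula, elements=ELEMENTS):
    """Mark element presence (1) or absence (0) as a binary vector."""
    present = set(re.findall(r"[A-Z][a-z]?", formula))
    return [1 if el in present else 0 for el in elements]

print(one_hot("FeSi"))  # → [1, 0, 0, 1, 0, 0]
```

Such binary vectors slot directly into the feature matrix alongside the numerical compositional descriptors.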
The final two options allow users to cross-reference the list of compounds against the folder containing CIFs and to enhance the file with features from other files (e.g., those generated by other featurizers).
CAF is also designed for extensibility. The list of properties used for calculating features can be further enhanced by incorporating novel size or electronegativity scales defined by the user. For instance, the size scale is sensitive to the class of materials and the presence of other elements, making it advisable for users to calculate their own scale for effective modeling. For example, to define a new size scale, one could use the shortest homoatomic distance from CIF reports, divided by 2, to determine the CIF radius. We recommend generating the output with the mean value, standard deviation, and a histogram for visual inspection. The CIF radius scale can then be used as a property for feature definition, comparable to other metrics such as covalent radius, ionic radius, and others.
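The CIF-radius recipe above (shortest homoatomic distance divided by 2) can be sketched in a few lines. The coordinates below are toy Cartesian positions in Å and the function is illustrative, not CAF's implementation; a real workflow would read positions from a CIF and expand the unit cell first.

```python
from itertools import combinations
from math import dist

def cif_radius(sites):
    """sites: list of (element, (x, y, z)) in Å.
    Returns {element: shortest homoatomic distance / 2}."""
    shortest = {}
    for (el1, p1), (el2, p2) in combinations(sites, 2):
        if el1 == el2:  # homoatomic pair only
            d = dist(p1, p2)
            if d < shortest.get(el1, float("inf")):
                shortest[el1] = d
    return {el: d / 2 for el, d in shortest.items()}

sites = [("Na", (0.0, 0.0, 0.0)), ("Na", (0.0, 4.0, 0.0)),
         ("Cl", (2.0, 0.0, 0.0)), ("Cl", (2.0, 4.0, 0.0))]
print(cif_radius(sites))  # → {'Na': 2.0, 'Cl': 2.0}
```

Averaging such radii over many CIF reports, with a standard deviation and histogram for inspection, yields the user-defined CIF radius scale described above.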
Proposed here is the Structure Analyzer Featurizer (SAF), available at https://github.com/bobleesj/structure-analyzer-featurizer or via https://github.com/OliynykLab/. At the time of writing, SAF supports binary and ternary compounds, generating 94 numerical features for binary compounds and 134 for ternary compounds, with the goal of supporting quaternary compounds and beyond in future studies. The complete lists of features are available in the GitHub repository and ESI Tables S4–S6,† with Table S4† providing comments that allow users to utilize the extracted data not only for ML modeling but also for structure analysis. INT_* features are calculated from interatomic distance analysis, WYK_* features are based on Wyckoff symbols/multiplicities, and ENV_* and CN_* features are derived from atomic environment data. Fig. 3 illustrates the process of procuring a single set of numerical features extracted from structural, compositional, and raw data as an input data source for ML models. Parts of the SAF code have been used to determine coordination geometry using various methods.71 Furthermore, although not implemented in this study, the features can be used for feature relationship analysis (e.g., SISSO) to reveal relationships between the measured structural features and properties.72
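The supercell step that underpins SAF's interatomic distance analysis can be sketched as follows: fractional coordinates are replicated across ±1 unit-cell translations so that neighbors across the cell boundary are counted. This mirrors the idea only, not SAF's actual code, and the CsCl-type coordinates are illustrative.

```python
from itertools import product

def supercell(frac_sites, shifts=(-1, 0, 1)):
    """frac_sites: list of (label, (x, y, z)) fractional coordinates.
    Returns the sites replicated across all 3x3x3 cell translations."""
    expanded = []
    for dx, dy, dz in product(shifts, repeat=3):
        for label, (x, y, z) in frac_sites:
            expanded.append((label, (x + dx, y + dy, z + dz)))
    return expanded

unit = [("Cs", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))]
print(len(supercell(unit)))  # → 54 (2 sites × 27 translations)
```

Distances computed within such an expanded cell feed the INT_* and ENV_* feature families.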
SAF supports .cif files from databases such as PCD, ICSD, COD, and Materials Studio. PCD provides detailed structural descriptions, including editor-entered crystal structure prototypes and fully standardized crystal structure data. Similarly, we have ensured that our code is compatible with the ICSD database,73 where most structures also have assigned structure types, which facilitates searches for specific structure classes. We recommend standardizing CIFs with trusted crystallographic software that writes CIFs in the correct format. Large CIF repositories do not guarantee consistent CIF formatting, and even in large online databases there are cases where, for example, the atomic label and atomic type are reversed, which can cause errors in file processing. Furthermore, CIFs, even from reputable databases, may require some editing due to typographic errors or missing entries that prevent them from being parsed. Extracting data from databases might seem a straightforward process, but preparing the files for processing tends to require adjustments. For instance, parsing errors might arise when CIF loops contain blanks where information is missing. These can be as simple as a missing publication title or author affiliation, but such problems can still affect file parsing. In materials science, especially where experimental data are scarce, it is crucial to ensure that all reports are included and that errors are automatically corrected. Another common CIF problem is inconsistent site labels, especially the numbering of labels or problematic labels in the case of atomic mixing, where the same site can be labeled differently, causing confusion and inconsistent results during high-throughput CIF processing.
Therefore, to filter ill-formatted CIF files, we have also developed a standalone and user-interactive Python application called CIF Cleaner available at https://github.com/bobleesj/cif-cleaner or https://github.com/OliynykLab/ (Fig. 4).
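A check in the spirit of CIF Cleaner's label validation can be sketched as follows; the function is illustrative only and not CIF Cleaner's actual code.

```python
def find_label_problems(site_labels):
    """Flag duplicated atom-site labels (e.g. two sites both named 'Fe1'),
    a common source of inconsistent results in high-throughput parsing."""
    problems = []
    seen = set()
    for label in site_labels:
        if label in seen:
            problems.append(f"duplicate label: {label}")
        seen.add(label)
    return problems

print(find_label_problems(["Fe1", "Si1", "Fe1"]))  # → ['duplicate label: Fe1']
```

Files that raise such flags can be routed to a folder for manual editing instead of silently corrupting the feature set.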
| Structure type | Search result | CIFs needed editing | Under ambient conditions |
|---|---|---|---|
| TlI | 411 | 10 | 401 |
| FeB | 279 | 1 | 197 |
| NaCl | 243 | 1 | 236 |
| FeSi | 190 | 1 | 164 |
| CsCl | 188 | 4 | 138 |
| ZnS | 89 | 3 | 89 |
| FeAs | 86 | 1 | 79 |
| NiAs | 85 | 1 | 83 |
| CuAu | 47 | 3 | 41 |
| Cu | 141 | 1 | 104 |
| Mg | 32 | 0 | 29 |
| W | 15 | 0 | 0 |
Fig. 5 (a) Elements used in the current study to illustrate the application of Structure Analyzer Featurizer (SAF) and (b) all solid elements included in Composition Analyzer Featurizer (CAF).
| Features | Generated | PLS-DA precision | PLS-DA recall | PLS-DA F1 | PLS-DA accuracy | SVM precision | SVM recall | SVM F1 | SVM accuracy | XGBoost precision | XGBoost recall | XGBoost F1 | XGBoost accuracy | Cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| JARVIS | 3066 | 0.372 | 0.366 | 0.330 | 0.391 | 0.979 | 0.965 | 0.977 | 0.972 | 0.989 | 0.985 | 0.987 | 0.989 | 23.7 |
| MAGPIE | 154 | 0.369 | 0.364 | 0.328 | 0.404 | 0.965 | 0.956 | 0.960 | 0.967 | 0.988 | 0.983 | 0.985 | 0.986 | 0.8 |
| mat2vec | 1400 | 0.611 | 0.658 | 0.582 | 0.609 | 0.990 | 0.985 | 0.987 | 0.989 | 0.981 | 0.978 | 0.979 | 0.983 | 9.3 |
| OLED | 308 | 0.449 | 0.448 | 0.399 | 0.457 | 0.974 | 0.955 | 0.963 | 0.973 | 0.987 | 0.984 | 0.985 | 0.987 | 1.4 |
| CAF | 133 | 0.419 | 0.384 | 0.363 | 0.404 | 0.967 | 0.950 | 0.957 | 0.959 | 0.988 | 0.984 | 0.986 | 0.987 | 1.03 |
| SAF | 94 | 0.526 | 0.567 | 0.511 | 0.603 | 0.993 | 0.989 | 0.991 | 0.990 | 0.997 | 0.993 | 0.995 | 0.994 | 0.6 |
| CAF + SAF | 227 | 0.569 | 0.589 | 0.533 | 0.579 | 0.994 | 0.987 | 0.991 | 0.993 | 0.997 | 0.996 | 0.996 | 0.996 | 1 |
Among the featurizers used via CBFV (Fig. 6a–d), none demonstrates clear class clustering in two dimensions, except for NaCl structures (yellow circles), especially with mat2vec. Clustering with CAF (Fig. 6e) is also no better than that of other composition-based featurizers, which is not surprising, given that it is based on the OLED set of properties for feature generation. The location of some data points with extreme values of LV1 and LV2 might indicate decreased confidence in their prediction, as they approach the limits of the compositional space. JARVIS had precious metal silicides at the edge of model confidence, which is typical for underrepresented cases such as OsSi (FeSi-type), IrSi (FeAs-type), and RhSi (in both FeAs- and CsCl-types). MAGPIE had some issues classifying rare cases where compounds are formed from two p-block elements, such as GaSb (ZnS-type), InSb (ZnS-type), and SnSb (ZnS- and NaCl-type). This is not surprising, as most of the dataset consists of transition metal-containing phases, and compounds with only main block elements are rare. Similar issues, involving the same compounds, appeared on the PLS-DA plot from the model based on OLED. One limitation of composition-based featurizers is their inability to handle polymorphs, where one stoichiometry can form multiple structures, as in the cases described above (SnSb and RhSi).
CAF was developed with output data consistency in mind and with a principle that treats integer values (measured exactly) and property values (measured over a range) differently to avoid Inf or NaN values in cells. The CAF, SAF (Fig. 6e and f), and SAF + CAF (Fig. 7) PLS-DA models, plotted with dimensionality reduced to two LVs, are shown to be complementary to each other. While CAF had NaCl points mixed with FeB and TlI, and SAF had difficulties segregating NaCl from FeAs, the combination of SAF + CAF (Fig. 7) resolved the individual CAF and SAF issues completely. Some structure types have a wide composition range, which results in only partial success of structure segregation. For instance, the segregation of the TlI-type dataset depends on the elements present in the TlI structure. The Fe-family representatives of the TlI class could not be efficiently separated from the rest of the structures; however, the rare-earth TlI representatives are well separated with mat2vec (Fig. 6c, TlI points at LV1 = 8–15), SAF (Fig. 6f, TlI points at LV1 = −5.0 to −1.8 and LV2 = 2.5–8.0), and SAF + CAF (Fig. 7, TlI points at LV2 = −3 to −7). SAF analyzes geometry and is agnostic to the composition of the samples, resulting in clear clustering of the structures, apart from one large cluster that mixes the TlI/FeB and FeSi/FeAs/NiAs types. This clustering arises from coordination geometry that is similar within the cluster but distinctly different from the rest of the structures. As in all previous plots, the ZnS type sits at the edge of confidence with extreme LV1 and LV2 values. The combination of CAF and SAF resolves the large unseparated blocks mentioned above, with only two structure types overlapping (TlI and FeB), as shown in Fig. 7. None of the featurizers provided data to fully separate these two structure types, although mat2vec came closest.
Fig. 8 Indirect learning with our model to solve problems of (a) compound class and (b) noncentrosymmetric phase prediction and (c) testing set of features from another AB classification model.
Aiming for explainability in models, especially with structural descriptors, reveals correlations that advance chemical knowledge. For instance, it helps to develop new size or electronegativity scales for a specific subclass of compounds, or to analyze polyhedra through the lens of electron configurations and orbital hybridization theory. Structural features play an important role in overall property prediction schemes in various situations. For instance, in a study analyzing ligand affinities, the authors ranked features with three different methods, namely RF, permutation importance, and AdaBoost, which consistently placed PEOE_VSA2 and NumHAcceptors as the two highest-ranked features.81 These features are two-dimensional topological and topochemical properties with versatile uses; here, the authors specifically needed them to provide information about the molecular surface and its potential interactions with binding species. NumHAcceptors is self-explanatory, while PEOE_VSA calculates atomic contributions to the van der Waals surface area using partial charges and molar refractivity. In another paper predicting band gaps of materials,82 the authors analyzed the features using two ranking criteria: one based on the Pearson correlation between each of the fifteen features and the target variable, and the other based on the weights of the Lasso coefficients obtained from Lasso regularization. They were able to reduce the original set of fifteen features to seven with no loss of information.
Another work had a similar scheme, using a small number of descriptors (nine in total, a combination of elemental and structural) and found only one structural descriptor (the octahedral factor) in the top five of the descriptor ranking.83 They ranked these descriptors using recursive feature elimination, which selects features by recursively removing those that exhibit the smallest weight assigned by an extra-trees classifier. Structural features were also used for bulk and shear modulus prediction (proxy properties for hardness),77 where they were among the most important features after iterative feature selection. As we can see, structural features are used in building explainable models and can easily be identified in datasets with a small number of features. This allows detailed feature correlation analysis and straightforward construction of decision trees, which are regarded as the most visual representation of model explainability.
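The recursive feature elimination idea mentioned above can be sketched as a toy loop: repeatedly drop the feature with the smallest importance weight until the desired count remains. The weights below are made up for illustration, and a real RFE would refit the model after every removal.

```python
def rfe(weights, keep):
    """weights: {feature: importance}; returns the surviving feature set
    after recursively removing the lowest-weight feature."""
    remaining = dict(weights)
    while len(remaining) > keep:
        worst = min(remaining, key=remaining.get)
        del remaining[worst]  # a real RFE refits the classifier here
    return set(remaining)

weights = {"octahedral_factor": 0.9, "tolerance_factor": 0.5,
           "avg_radius": 0.3, "group_number": 0.1}
print(sorted(rfe(weights, 2)))  # → ['octahedral_factor', 'tolerance_factor']
```

With only a handful of descriptors, the surviving set can be inspected directly, which is what makes such small-feature models easy to explain.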
In the current study, we deal with a larger set of features (a few hundred), which requires dimensionality reduction while preserving information on each feature's importance. We believe that even simple methods like PLS-DA can be effective in solving crystallographic structure classification problems, and with an effective set of features (SAF + CAF), PLS-DA identifies the same important features as more expensive methods (XGBoost), providing explainability in an affordable way. While PLS-DA (as well as PCA) allows us to explore the LV (or PC) space and extract the weights of the original features that contribute to the axes, it is important to keep in mind that for these methods the combination of features matters more than the individual features. In recent years, explainability has become an important topic: the age of black box methods is over, and users want to gain insight rather than just results with excellent modeling statistics. (Often, for experimentalists, explainability that results in new chemical knowledge and eventually translates into novel material discovery is more important than high model accuracy statistics.) There are a few methods that improve the explainability/interpretability of models, and here we summarize the top ten XGBoost gain scores for each feature set used in our test study (Fig. 9).7,84,85
Labeling features is the first step toward explainability, and despite being on a par with our featurizers in terms of model statistics, mat2vec fails to provide scientifically meaningful labels for its features (Fig. 9c). Among the top 10 features with the highest gain according to XGBoost, JARVIS identifies mass, volume, and electron properties (ionization energy, electron affinity, etc.) as the most important. MAGPIE identifies periodicity and systematization information (Mendeleev number, group number, and space group number), electron properties, and physical properties among the most important features. OLED shows a good balance of features in the top gain list, which consists of periodic table information (group number), various size scales (metallic and Miracle radii), various electronegativity scales (Gordy and Pauling), and electron count approaches (metallic valence and valence electron count), along with physical properties of different origins (polarizability, ionization energy, and specific heat). We continued the approach behind OLED featurization in our CAF development; therefore, the CAF top features also demonstrate an excellent balance: periodic (group number and Mendeleev number), size (radii difference, average radius, and Pauling radius), Pauling electronegativity, and physical properties (bulk modulus, ionization energy, and melting point). It is important to mention that user-introduced features, such as the CIF radius scale introduced in this work, and element preprocessing play a crucial role, since 8 of the 10 top features carry a specific A/B element sorting tag. The CAF feature set is the closest to the classical structure map works of Villars and Pettifor.64,69 SAF produces structural features, which are unique among the featurizers; they are related to the coordination environment, interatomic distances, and distortions of polyhedra. Interestingly, the combined SAF + CAF set (Fig. 9g) results in the most effective model, and the gain scores of its top two features overlap with the top two features from SAF (Fig. 9f) and CAF (Fig. 9e) separately, a great indication of balance. While the top 10 features of SAF + CAF are dominated by structural features, as we show next, PLS-DA LV contribution scores resolve this imbalance, and compositional features become on a par with the structural features.
PLS-DA is an affordable method for the analysis and modeling of large volumes of data. Combined with a properly constructed feature vector, it becomes an effective method to increase explainability. Ultimately, the application of PLS-DA in solid state chemistry originates from the structure maps traditionally used in crystal structure classification. While the model statistics of PLS-DA are not comparable to those of SVM and XGBoost (Table 3), PLS-DA statistics (which deviate more between feature sets) can indicate a suitable feature set when the more advanced methods produce comparable results regardless of the feature set. Another application of PLS-DA is feature analysis for explainability. The first indication is the percent variance captured by the latent variables (LVs). With modern computational power, we can utilize any number of LVs, and the cumulative variance increases with more LVs. Eventually, the accuracy converges at a certain number of LVs, but the most effective number is usually low, with the first three LVs being the most helpful, as they allow one to visualize data in plots, essentially creating structure maps. In our comparison, we looked at the two most accurate PLS-DA models: mat2vec and our development, SAF + CAF (Table 4). The cumulative variance of SAF + CAF is significantly higher than that of mat2vec, meaning that our features are used more effectively. While the first two LVs are dominated by SAF features, CAF features are also present, especially in the third LV. In bold, we have highlighted the features that were listed among the top 10 gain features by the XGBoost model. For mat2vec, only one feature overlapped, while half of the features found to be helpful with XGBoost were also found in the first three LVs with PLS-DA. This is significant considering the relative cost of the methods and highlights how effective the SAF and CAF features are.
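The cumulative-variance bookkeeping described above can be sketched as follows, using the per-LV variances reported for SAF + CAF in Table 4; the function is a simple illustration, not part of CAF or SAF.

```python
lv_variance = [15.26, 11.82, 4.56]  # LV1-LV3 variance (%) for SAF + CAF

def cumulative(variances):
    """Running total of explained variance across successive LVs."""
    total, out = 0.0, []
    for v in variances:
        total += v
        out.append(round(total, 2))
    return out

print(cumulative(lv_variance))  # → [15.26, 27.08, 31.64]
```

The first three LVs of SAF + CAF thus account for roughly 32% of the variance, versus about 19% for mat2vec (11.50 + 2.82 + 4.79), which is the sense in which the SAF + CAF features are used more effectively.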
| LV | mat2vec variance | mat2vec top contributors | SAF + CAF variance | SAF + CAF top contributors |
|---|---|---|---|---|
| LV1 | 11.50% | max_53, min_74, mode_74, min_178, mode_178, sum_46, sum_84, sum_129, avg_46, avg_84 | 15.26% | CN_MIN_packing_efficiency, CN_AVG_packing_efficiency, CN_MAX_packing_efficiency, WYK_A_lowest_wyckoff, WYK_B_lowest_wyckoff, WYK_A_multiplicity_total, WYK_B_multiplicity_total, CN_MIN_B_atom_count, bulk_modulus_avg, ENV_B_shortest_tol_dist_count |
| LV2 | 2.82% | dev_193, range_193, min_23, mode_23, max_193, dev_194, range_194, dev_91, range_91, dev_195 | 11.82% | INT_UNI_refined_packing_efficiency, ENV_B_count_at_A_shortest_dist, ENV_B_avg_count_at_A_shortest_dist, INT_Asize_ref, CN_AVG_central_atom_to_center_of_mass_dist, CN_MAX_central_atom_to_center_of_mass_dist, CN_MIN_central_atom_to_center_of_mass_dist, ENV_A_shortest_dist_count, ENV_A_avg_shortest_dist_count, CN_AVG_packing_efficiency |
| LV3 | 4.79% | max_75, dev_129, range_129, min_40, min_50, mode_40, mode_50, sum_40, avg_40, dev_191 | 4.56% | specific_heat_A–B, specific_heat_B, ENV_A_count_at_A_shortest_dist, ENV_A_avg_count_at_A_shortest_dist, period_B, CN_MAX_B_atom_count, specific_heat_A/B, Z_eff_B, ratio_closest_min, density_A/B |
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4dd00332b |
‡ These authors contributed equally to this work. |
This journal is © The Royal Society of Chemistry 2025 |