Structural characterisation of nanoalloys for (photo)catalytic applications with the Sapphire library

A non-trivial interplay rules the relationship between the structure and the chemophysical properties of a nanoparticle. In this context, characterization experiments, molecular dynamics simulations and electronic structure calculations may allow the variables that determine a given property to be pinpointed. Conversely, a rigorous computational characterization of the geometry and chemical ordering of metallic nanoparticles and nanoalloys enables discrimination of which descriptors could be linked with their stability and performance. To this end, we introduce a modular and open-source library, Sapphire , which may classify the structural characteristics of a given nanoparticle through several structural analysis techniques and order parameters. A special focus is geared towards using geometrical descriptors to make predictions on a given nanoparticle's catalytic activity.

The need for a detailed characterization of the MNP's morphology stems from the delicate interplay among size, geometrical features, chemical composition and ordering, and the chemo-physical (e.g., optical, catalytic, etc.) properties of the MNP itself.

Sapphire descriptors
In this section, we list and explain the analysis that Sapphire performs. We rst discuss the distance related properties as they are fundamental to introduce a cutoff and hence the concept of local environment for any further analysis.

Distance distribution functions
The distribution of pair-atomic distances (pair-distance distribution function, PDDF) is a crucial quantity to characterise the geometry and chemical ordering of an NA. Further, the PDDF is a directly measurable quantity via X-ray techniques. 30,31 To dene the PDDF, let d ij be the pair-distance between atoms i and j: We then calculate the PDDF via a Kernel density estimate (KDE), constructed from n observations: where K(d ij ,d;h) labels the kernel function over the d ij variable, and the parameter h is the bandwidth that denes the tightness of the kernel function. Note that the KDE assumes that each interatomic distance has been randomly drawn from a given distribution, which the user may dene via input. The Gaussian distribution is the default choice. Alternatives, such as the Epanechnikov, 32 and the uniform distribution, are also currently supported. Fig. 2 shows the effect of the K and h choice in the PDDF calculation, as a function of the independent parameter d, written in terms of the lattice bulk a 0 . Setting the bandwidth, h, is a delicate step. The h-choice should balance a too ne resolutionwhere each distance might be present a single time, and below any reasonable resolutionand a too large onewhere different neighbour shells are projected onto the same distance width. As a default, we have set the bandwidth to be 0.05 of the bulk lattice, which we have found to be sufficiently broad to smooth the sharp Dirac peaks from having a nite sample, and to sufficiently resolve key features, i.e., the rst peak, minimum, and the second peak (orange line in Fig. 2). In principle, one may consider the h parameter to be comparable to the resolution in various microscopy techniques. As such, deviating from the default value must be done whilst considering the physical meaning of this parameter. That is to say that measuring interatomic distances to arbitrary precision is not possible at nite temperatures due to lattice vibrations.
Given the analytical form of the KDE, derivatives can be easily calculated, and the approximate location of minima (and maxima) in the distribution can be identied. This is a key utility. The rst PDDF minimum is instrumental to a robust determination of nearest neighbour distance. The position of the second maximum may also allow identication of whether the NA loses its symmetry or undergoes a structural or phase change. The latter should be associated to an overall radial breathing of the NA.
The evolution of the pair-distance distribution function (PDDF), independently of the chemical label, enables the: Identication of a cut-off distance (R cut ) set as where the rst minimum in the PDDF falls. Such a denition of radial distance allows the subsequent denition of a local environment, and the construction of the adjacency matrix.
Detection of solid, liquid, and amorphous, depending on the position of the second peak of the PDDF. A robust denition of "amorphous/molten-like" shape is where there is not a maximum in correspondence with the bulk lattice, a 0 , and the maximum radial distance is comparable with other well ordered shapes. The same feature and a radial breathing refers to a solid to liquid transition. 33 To clarify further this point, Fig. 3 details the PDDFs of pure MNPs with 1415 atoms of Au, Pt, and AuPt with two different compositions. The reported simulation data are obtained from iterative molecular dynamics runs, as available in LoDiS, [34][35][36] where the temperature is increased by 50 K every 1 ns, from 300 K to 1000 K.
We show PDDFs sampled at both cold (solid lines in panels (a)-(d), Fig. 3) and high temperatures (dashed lines in the same panels). We note a broadening of the rst peak at higher T, for both pure and alloyed nanosystems. At the same time, it is evident that the position of the rst peak does not change signicantly with the temperature. We stress that a radial cut-off can be dened independently of the temperature, as highlighted by the full-circle dots marking the position of the rst minimum. 33,37 The position of the second peak instead clearly changes between cold and hot temperatures. This is particularly evident in the case of pure Au and Au-based nanoalloys. Pt-NAs just show a minor effect because they are still approaching their melting point. Indeed, the pure Pt-NP PDDF still presents Fig. 2 The three possible parameterisations for calculating the PDDF using eqn (2). The top panel demonstrates the agreement between available kernels. The bottom panel illustrates the dependence on the bandwidth, h, orange (0.05a 0Å ), blue (0.0001a 0Å ) and (0.25a 0Å ) green. We have selected a randomly alloyed AuPt NA of 1415 atoms arranged as an icosahedron with a ratio of Au : Pt of 4 : 1.
a clear peak in correspondence with its bulk lattice distance. For NAs with a large amount of Pt, the second peak also still presents a shoulder in correspondence with the bulk lattice, suggesting that the phase change starts mainly in the gold network (see the lines corresponding to Au-Au and Au-Pt bonding in panel (f) of Fig. 3).
Sapphire also supports the pair correlation function, g(r), which is used to describe long-range order in liquids 38 and is related to the structure factor.
where r is the bulk density of atoms, and dn(r) is a function which computes the density of atoms within a spherical shell thickness of dr. 3.1.1 Radial distribution function. The 3D chemical ordering of a nanoalloy can be qualitatively, if not quantitatively, extracted from the radial distribution function (RDF) of the chemical species in the NA. Similarly to the PDDF, this density quantity can be readily extracted from both numerical simulations and (line-scan) experiments. 30 To this goal, Sapphire provides three complementary radial distributions as we can identify possible atomic interaction environments: the whole system, labelled with a w, and the sub-regionsor sub-clusters -A and B which gather only atoms A and B, respectively. Sapphire calculates (i) the density of atoms, independently of their chemical species, from the centre of mass of the whole NA, CoM w ; (ii) the distribution of A-atoms from their centre of mass, CoM A ; and (iii) similarly for B-atoms, their distribution from the CoM B . The RDFs count the number of atoms falling in concentric shells from the centre of mass of the nanoparticle: where the coordinatesx a ,ŷ a , andẑ a of the atom-i and chemical species a ¼ A, B are re-scaled with respect to the centre of mass chosen, namely the whole NA, or the A and B sub-regions. As was the case for computing the PDDF, we may also apply the KDE approach to compute the RDFs for each of the three types of radial distribution described above: In principle, eqn (5) can be extended to more than two chemical species, where we consider all the independent sub-regions occupied by each element.
For binary nanoalloys, aside from how the atoms are radially distributed, a handy and easy quantity to characterise the chemical ordering is the relative distance between the centres of mass of the two chemical sub-regions, D(CoM). This quantity is a easy and robust descriptor to monitor the formation and the evolution to/from Janus and quasi-Janus chemical orderings. 39 where r i,A˛A is the radial distance of the atom-i of species A taken from the centre of mass of the sub-region A, and similarly for the B sub-cluster. N A(B) is the number of A(B) atoms in the nanoparticle. D(CoM) is expected to be close to zero for alloyed, core-shell, and multi-shell ordering, but not zero when the phasesegregation breaks the radial symmetry. We provide in Fig. 9 an example of this measure, averaged over independent ensembles.

Adjacency matrix based descriptors
The denition and characterization of nearest neighbour networks has been largely adopted to classify MNPs' and NAs' morphology and draw structureproperty relationships. [40][41][42] To evaluate nearest neighbour networks and quantities deriving from the latter, solely according to a distance criterion, the adjacency matrix A is dened as: 3.2.1 Coordination numbers. One of the rst quantities mentioned in any solid state books is the number of neighbours for the various Bravais lattices. With a focus on catalysis, the coordination of an adsorption site has been oen used as a descriptor to rationalize its activity. Still related to the ability of a system to bind other molecules, in biophysics and so matter, several algorithms have been developed to estimate the number of nearest neighbours of a macromolecule. 43 A simple chemical intuition suggests that low-coordinated atoms are more likely to form chemical bonds than highly-coordinated ones. The coordination number of an atom j, CN j , follows from the adjacency matrix: Nonetheless, this denition might suffer in nding a suitable cut-off that might not always be easy to select, especially for nanoalloys with a large mismatch.
As the identication of the local environment of each atom is of paramount importance, Sapphire also contains a solid-angle nearest-neighbour criterion, as in van Meel's paper 43 (VMCN), to estimate, count, and list the nearest-neighbours of each atom i. The VMCN algorithm lists the nearest-neighbours of the atom i as the atoms j such that the solid-angle q i,j -dened by the cone with atom i at the apex and j at the center of the cone's baseequals 4p. The solid angle can be replaced by the ratio r i;j =R ðmÞ i , where the inter-particle distances r i;j are ordered such that r i;j \r i;jþ1 . R (m) i is the neighbour-shell radius, The minimum value of m, the number of neighbours, is 3. In a FCC bulk, the maximum is expected to be 12. The smallest m,m that satises R ðmÞ i \r i;mþ1 identies the number of neighbours of the atom i, and hence its coordination. The corresponding R ðmÞ i is the shell-radius that identies the neighbourhood of atom i. The second step is to calculate the solid angle as Both algorithms, the one with a xed cut-off from the PDDF and the adjacency matrix, and the VMCN, can be used simultaneously to dene the number of neighbours and hence to perform the following analysis.
The nominal CN denition and the one from van Meel's algorithm generally agree well. The latter tends to show higher values of the coordination than the former; see the lighter blue atoms in Fig. 4. In both cases, it is possible to have atoms that are more than 12-coordinated, even for metals that are FCC in the bulk. This is due to internal distortion and not to a faulty algorithm.
3.2.2 Coordination number based descriptors. The denition of more complex local atomic environment descriptors may be written as a function of the coordination number. Sapphire calculates two descriptors, namely the atop generalised coordination number (aGCN) and the mixing parameter m. . Atoms are coloured with respect to their CN (or VMCN), ranging from 16 (full red) to 3 (dark blue). Au atoms are represented as smaller than Pt, to improve visualization. Minor discrepancies between the two algorithms occur. Using the solid angle condition, we count 561 atoms with a CN less than 10, 170 with CN ¼ 11, and 684 with a CN above or equal to 12; using van Meel's algorithm, their occurrence is 477, 216, and 722, respectively.
The atop generalised coordination number (aGCN) 44 is dened as: with CN max set to 12, as this is the coordination of an FCC atom in the bulk. We point out that for metallic systems, it seems reasonable to consider the total coordination of each atom j regardless of the difference in electronegativity of the chemical species. The reason why we suggest to look at the aGCN is three-fold. First, it has been shown to provide a robust linear relationship with the adsorption energy of small molecules (e.g., O, CO, OH). 45 Second, the aGCN is able to characterise the MNP surface sitesavoiding the basic classication into faces, edges, and verticesand to classify a nanoparticle geometry on the basis of its aGCN-genome. 46 Finally, the use of the aGCN offers a route to estimate the surface area, beyond, e.g., spherical approximations. The MNP's surface can be well approximated by writing it as a sum over atomic contributions, which are a function of the atomic radius r at weighted with their aGCN. The latter is a measure of how much they are "exposed": 47 A full comparison of the surface area calculated for closed shell geometries using cluster spherical approximation, using geometrical consideration, and based on the aGCN has been reported in ref. 47.
In the case of nanoalloys, we can easily separate the local environment of each atom between homo-pairs and hetero-pairs. Counting the A-A, NB AA , B-B, NB BB , and A-B pairs, NB AB enables evaluation of the mixing parameter m. The latter is a useful parameter for a fast characterization of the NA's chemical ordering: where m tends to À1 when the NA is fully alloyed, and to +1 when there is a complete phase separation.

Concertedness and collectivity of a morphology rearrangement
Monitoring time-dependent changes in the adjacency matrix, over successive time-steps, allows characterization of whether structural rearrangements took place, and if these involve concerted and/or collective rearrangements. In this context, we adopt the denitions rst put forward in ref. 35. Let AA ij (Dt) be a matrix counting the absolute number of bonds formed or lost between each ij pair of atoms, within a characteristic time length Dt: From this quantity, the system mobility, R(t, t + Dt), is measured by summing over single atom mobility: To measure the collectivity of a mechanism, H, one then calculates the ratio of atoms which change at least one neighbour within a Dt interval, and the total number of atoms in the MNP: where Q labels a Heaviside step function and N the number of atoms in the MNP. By denition, the magnitude of H can vary between 0 and 1. The level of concertedness, C, is instead dened as the change in the number of atoms involved in the process between (t À Dt, t), and between (t, t + Dt) When all the atoms in the MNP change their local connectivity at time t, but the connectivity remains stable at the successive time point t + Dt, C(t À Dt, t, t + Dt) reaches its maximum value, 1.
Note that all the descriptors of mobility, concertedness, and collectivity discussed in this section display a dependence on the magnitude of D(t). A too long Dt may affect the H and C estimate by coarsening many atomic rearrangements into a single one. A too short Dt may unfaithfully describe a single step rearrangement as a multi-step one. The suggested (and default) value for the choice of this quantity is D(t) ¼ 10 ps, which is consistent with the time-scale of adatom diffusion on low Miller index surfaces. We consider the jumping of adatoms as one of the fastest mechanisms that can take place during MNPs' structural rearrangements.
All the quantities discussed above can be readily modied to account for the presence of multiple chemical species, chem ¼ AA, BB, AB, in the NA: where R chem i is given by Single-step mechanisms involving only a section of the cluster, or multi-step processes are characterized by lower values of C(t À Dt,t,t + Dt), while continuous atomic rearrangements display C $ 0. This is precisely exemplied in Fig. 6, which characterizes the concertedness and collectivity of rearrangements in the AuPd NA undergoing a solid-liquid phase change when heated above 700 K.
The C statistic is consistently near 0 whilst the H statistic monotonically increases as a function of temperature. This is indicative of a highly dynamical object whose constituent atoms are regularly rearranging with respect to one another. At each subsequent 10 ps window beyond 600 K, half of the atoms have a minimum of one change to their nearest-neighbour network. By the same token, we also draw attention to the sharp peak in the C statistic at the beginning of the dynamics. In this instance, this is representative of the collective motion of the Pt atoms within the cluster, which quickly rearrange into a more energetically favourable geometry shortly aer thermal equilibration of the NA.

Common neighbour analysis: signatures and patterns
To classify the geometrical environments of core and surface atoms, we evaluate all of the common neighbour analysis (CNA) signatures attributed to each pair of nearest-neighbour atoms. CNA signatures are of the form (rst) such that r is the number of nearest-neighbours common to both atoms in the pair; s is the number of bonds between shared neighbours and t is the longest chain which can be made from bonding s atoms if they are nearest neighbours. While an individual CNA signature describes the local environment of a pair of neighbours, the estimate of the percentage of how many pairs contribute to selected CNA signatures provides information on the overall nanoparticle's structure, allowing a fast classication into the expected geometrical family, as icosahedral, decahedral, or FCC. 48 The main CNA signatures are (555), (422), and (421) for chemical species with a FCC bulk lattice. Although varying with the NPs' size, a high concentration of (555) stands for Ih, while FCC morphologies are expected to have 0% of that signature. As seen for structural rearrangements, the nearest-neighbour pair can be labelled as homo (A-A and B-B type), and hetero (A-B) when needed. However, to describe the overall geometry of a NA, we weight A and B atoms in the same manner. While proven to be successful in many cases, the CNA is not a property of an atom i. Some commercial soware, i.e. Ovito, 49 offers a basic characterisation of each atom i looking at its relative contribution to a few known CNA signatures. Usually this criterion is able to classify only atoms in the core and when there is a clear character of each atomic environment. 42 Sapphire extends the idea behind the assignment of a local crystal structure to an atom using the common neighbour analysis.
First of all, we list all the possible CNA signatures and we limit them to only a few values important in the bulk. For each atom, we list all the CNA signatures it contributes to, and count how many times the same signature occurs (frequency chart). The local environment of each atom-i is, therefore, the collection of those signatures, and their frequency, into a CNA-pattern (CNAP). On top of the mapping following the CNAP, as highlighted in the snapshots reported in Table 1, we note that we can have insights on the NP-morphology on the basis of the number of different CNAPs. In principle, each atom has its own CNAP, leading to N independent CNA-patterns for a nanoparticle with N atoms. However, atoms in a similar environment will display the same CNAP. Therefore, the number of different CNA-patterns will decrease when the symmetry of the shape increases. Furthermore, with the CNAP being sensitive to the surface, and not only to the core, nanoparticles limited by different facets will have more CNAPs. With this is mind, an octahedron presents ve different CNA patterns while a Marks decahedron has at least 11. Table 1 reports the CNAP distributions for common geometries classifying the most commonly observed CNAPs, and a short description of each CNAP. We do stress that no restriction to any specic signature occurs here. However, as the geometries shown are highly symmetric, overall we provide a description for a limited number of CNAPs. The CNAP is given as a string [(% i , (r i s i t i ))], where % i is the number of times the atom participates in the (rst) signature. The sum of % i is expected to be 12 if the atom i is in the core, and less for surface atoms as this sum is simply the coordination number of the atom with that given CNAP.
Given the evident sensitivity to local geometry, the next question to ask is how sensitive CNA-patterns are to imperfections in the crystalline structure and if they could be used to classify a MNP or NA into a certain family. While we discussed already how we expect the number of independent CNAPs to be dependent on the geometrical symmetry, here we would like to stress that similar edges such as the one between (100) and (111) can be identied by two patterns, namely CNAP2 and CNAP10, depending on whether they are from a Dh structure or from a FCC structure. Generally speaking, an automatic classication of NAs into the distinguished three main morphology families, namely FCC-like, decahedra (Dh), and icosahedra (Ih), is desirable. The CNA-patterns [ 421))] with different l, k frequency, such as l + k ¼ 12. Table 1 Definition of observed CNA patterns (CNAPs) and their descriptions in the common geometries in Fig. 7. We further label when a certain CNAP can refer to atom in the core, surface, or both

Surface identification
As many physical and chemical processes occur at the surface, 50-52 an automated classication of which atoms are at the surface, subsurface, and core for a given NA is welcome. The classication of core and subsurface atoms will be facile following a "peeling" of the outermost layer. Core atoms are designated as those atoms that do not belong to the surface or sub-surface. The reader should note that combining the identication of atoms in the surface, subsurface, and core shells will permit the calculation of other atomistic properties, i.e., those related to the energy when calculated as atomistic contributions. While on perfectly built geometries the identication of atoms at the surface follows their coordination number, i.e., imposing CN # 10 to dene atoms at the surface. For unknown, low-symmetry, or amorphous morphologies, this task is far from trivial. Any classication simply based on the coordination number fails, independently of how the coordination number is calculated, as both low and high coordinated atoms can be present at the surface; see the heatmap in Fig. 5.
A combination of a condition on CN and radial distance will work only for spherical MNPs, but will fail for prolate/oblate systems. Hence, there is a need to nd alternative approaches which require no a priori information on the target system. For these reasons, we provide three algorithms to identify, list, and classify atoms at the surface. (i) The rst is based on imposing a threshold on the solid angle U i , as calculated via eqn (10), combined with the radial distance of atoms i. This method reproduces data for closed shell geometries, and provides a good estimate for the surface sites of melted/low symmetry MNPs and NAs as   Table 1. illustrated in Fig. 4. However, it suffers from the introduction of a threshold value on the solid angle which depends inversely on the size of the MNP.

Faraday Discussions Paper
(ii) Sapphire offers the "Layers algorithm", usually adopted to calculate the accessible area in macromolecules. 53 This algorithm creates three cylinders for each atom along the x, y, z Cartesian components, and records the lowest and highest atoms in each cylinder. In this case, there is only a free parameter tuning the radius of the cylinder. As a rule of thumb, choosing 1.5 times the atomic radius (or times the average atomic radius for binary systems) provides a good denition. A comparison of the two algorithms is shown in Fig. 8 for Au and CuPt cases with 1415 atoms.
(iii) The nal algorithm is a clustering approach based on local atomic environment descriptors. 37 The clustering approach was used to classify atoms in Au NPs, and implemented in the RAFFY Python package, 54 and labels atoms in an unsupervised manner via hierarchical k-means clustering. The atoms from one or more MD snapshots are ngerprinted using local descriptors based on the atomic cluster expansion 55 (ACE) framework. The latter encodes information on local atomic environments by approximating the local atomic density via spherical harmonics and radial basis expansions, and then constructing rotation-invariant representations from the coefficients of such expansions. A cut-off radius, alongside other parameters, must be specied for the ACE descriptors; typical cut-off values are the distances between the second and third shells of neighbours. This approach unbiasedly distinguishes between lowcoordination surfaces, highly coordinated surfaces, and bulk atoms. 37 Moreover, it enables the discernment of locally melted and locally solid atomic environments, therefore providing a robust measure of melting and surface rearrangement temperatures.

Averaging routes
There is support for creating plots for a single analysed trajectory or to consider the average over multiple, independent simulations. One can toggle between these alternatives by setting the single_le ag to be true and iter_dir to be false, or by setting the single_le ag to be false and setting the iter_dir ag to be a list of relative paths from the dened base directory to each of the iterations to be considered. Fig. 8 Comparison between the solid-angle (left) and the Layers (middle and right) algorithms to identify the surface layer. Here we display only half of the external shell of low symmetry Au 1415 . We use the condition on the solid angle as U i # 0.05 combined with r i $ 15Å (left). On the right, the Layers method is applied to low symmetry CuPt 1415 with a chemical composition of 0.3 for Pt (silver-ish coloured and Cu is orange).
It should be noted that Sapphire provides the arithmetic average for each of the requested quantities. Caution is advised to users who take advantage of this feature as it may not always be meaningful to compute or present such averages. Indeed, these meta-analyses are only meaningful if the ensemble being sampled across is consistent over time, and if the dynamics are at equilibrium.
When reporting and presenting these ensemble averages, Sapphire computes uncertainties across samples as the standard error. That is to say that across the ensemble, we assume that all deviations are independently and identically distributed (i.i.d.). Indeed, the former is trivial to guarantee if multiple independent realisations of a given dynamical process have been computed. For the extensive phase space of dynamical systems, a Boltzmann distribution is an adequate candidate for systems close to equilibrium. However, if one is intending to probe non-equilibrium dynamics for which a steady state distribution cannot be uniquely dened at a given point in the dynamics, then the choice of distribution for describing the observables as random variables becomes more nuanced and additional care is necessary to ensure that the phase space has been adequately sampled. Nonetheless, when the dynamics of the system are not far from equilibrium, this scheme provides a fast and intuitive method to visualise variance and spread in the descriptors. We have elected to present a AuPt system to demonstrate the utility of the Sapphire averaging scheme.   9 reports four ensemble averaged quantities for coalesced Au Ih 923 Pt Ih 55 held at 450 K for 500 ns. Classical molecular dynamics simulations are done using the soware package LoDiS. 36 This temperature is too low to activate structural rearrangements within the timescale sampled by the molecular dynamics trajectory.
We dene the local heterogeneous atomic environment (LAE) as being the percentage of species B ¼ Pt with LAE neighbours of species A ¼ Au. LAE ¼ 0 indicates that these atoms create no heterogeneous bonds. 1 # LAE # 6 describes a mixed environment; while LAE > 9 suggests that these atoms (Pt) are almost totally encapsulated by Au atoms. Initially, we note that all descriptors are found to have only small inter-sample variations at approximately 10% of the mean value. All the descriptors considered, namely the radius of gyration (RoG), the number of Pt atoms at the surface (Pt surf ), local atomic environment (LAE) of Pt, and the distance between the sub-region centres of mass D(CoM), versus time, show that a structural transformation occurs at the beginning (below 100 ns), with Pt incorporating inside the Au core and decreasing the number of Pt atoms at the surface. At the same time, we recognise two other steps. At the small plateau between 100 and 200 ns where D(CoM) is almost at a small change in the LAE occurs. Aer 200 ns, we observe an increment in the Pt mobility and a change in the chemical distribution of Pt atoms in the NA.
We do not present these averaging techniques to serve as a replacement for an individual's choice of averaging routine, as these are unique and particular to the dynamics and quantities under consideration. Rather, we present this utility as a means to explore and present variations across i.i.d. samples.

Structure-property relationships
The Sapphire library focuses on the characterisation of MNP and NA morphology. Nonetheless, Sapphire can be used in synergy with available codes geared toward the prediction of chemophysical properties of MNPs and NAs. In the next subsections we briey discuss three examples, where: Sapphire leverages the ASE environment to move back to energy-related properties; Sapphire exploits a link with a microkinetic model 56 to estimate the activity over certain reduction reactions; Sapphire is connected with the pyGDM 57 library to qualitatively estimate the optical properties, and absorption and extinction spectra, of mono-and bimetallic nanoparticles.

From structure to energy
Oen it is desirable to check the energy stability of a certain shape or chemical ordering. Currently, ASE supports effective medium (EMT), the embedded atom method (EAM), and the kimpy calculators. EAM benets from the Interatomic Potentials Repository Project available at https://www.ctcms.nist.gov/potentials/, while OpenKIM is a recent NSF project devoted to providing access to classical interatomic potentials. Unfortunately, second moment approximations of the tight binding (SMATB) calculators are not currently available. They were part of the ASAP3 distribution, 58 which has since been deprecated and discontinued.
In the manual and tutorials we demonstrate how Sapphire may interface with ASE and its library of calculators to compute energetics. Given the broad support for potential calculators in the ASE distribution, one is at liberty to select their own theory of choice for their particular system.

Electrocatalytic properties via a microkinetic model
Let a be the value of a descriptor which encodes an explicit scaling between the structural properties of an adsorption site and the reaction free energy associated to the rate-limiting step of an electrochemical reaction occurring at that site. A simple micro-kinetic model for the current density j nanoparticle (t,T,U) produced by a nanoparticle, for such an electro-chemical process, can be written as: j nanoparticle ðt; T; UÞ ¼ where DG(U,T,a) labels the reaction free energy at the applied potential U and temperature T, which is also a function of the descriptor a, b is the Maxwell-Boltzmann factor, xðt; TÞ ¼ UðaÞ N site ðt; TÞ is the fraction of non-equivalent adsorption sites F nanoparticle to the total number of sites available, N site , and the sum runs over the collection of the non-equivalent sites appearing in the MNP under consideration. The latter are also categorized by their a values. Note that in eqn (21) the effect of the potential is treated as an a posteriori correction, in line with the computational hydrogen electrode model. 59 The aGCN, eqn (11), of a surface site is an accurate descriptor to predict the activity of a metallic adsorption site, as demonstrated for Au, Cu, Pt, and PtNi, catalysing the electrochemical reduction of oxygen or carbon dioxide. 44,45,60 Assuming a volcanoplot relationship, the reaction free energy DG and the aGCN are related by the following general expression: where the coefficients a 1 , b 1 , a 2 , and b 2 , and the value of aGCN volcano-peak should be derived from a small set of DFT calculations, if not available in the literature, 61 for example, the case of the ORR on Pt. 47 Once the Gibbs free energy is dened in terms of the aGCN, a quantity that can be monitored during the lifetime of a MNP, we can establish relationships between the ageing of a MNP, referring to its structural stability, and its activity. Examples have been reported previously by our group for the ORR on Pt, 47 and for the conversion of CO 2 into methane on Cu NPs. 56 Under the assumption of identifying robust structure-activity relationships, we do not see any limitation in using the scheme for NAs and for other chemical reactions.

Optical properties via semi-classical methods
For a fast evaluation and screening of the extinction spectra of MNPs and NAs, one can adopt a classical approach, via Green's Dyadic Method (GDM), 57,62 as implemented in the pyGDM code. 63,64 At its core, this library constructs a refractive environment via the construction of a 3D mesh of dipole oscillators whose physics is predicted within the quasi-static coupled dipole approximation. It then uses an efficient and generalised propagator to predict the extinction spectrum associated to the user-dened mesh. For example, this mesh may be dened from a series of coordinates consisting of multiple constitutive components.
To evaluate the extinction spectrum, we need only to consider the interaction of the dipole moment and total eld in each discretised volume cell: where n env , l 0 , and P j are respectively the dielectric constant of the embedding environment, the incident wavelength, and the dipole moment of cell j, and E 0,j ¼ E 0 (r j ) is the incident eld. 65 In the described method, the near eld may be found by self-consistently solving where G EE tot (r i ,r j ,u) is the Green dyadic to be solved, c is the metal's susceptibility, and V cell is the volume of a unit cell. From this eld, we then calculate the effective dipole moments, P j ¼ V cell c j $E j , from the internal eld distribution.
From a knowledge of the functional form of the incident eld, and a sufficiently converged internal eld distribution, one is able to compute the extinction spectrum for a given nite system held within a dispersive medium, as described by eqn (23). All metallic dielectric parameters used in these calculations are provided by either experimental work, [66][67][68] or ab initio investigations. 69 The interested reader is referred to previous literature for further detail. 57,63,64 Any behaviour that may arise from quantum many-body effects is neglected. Furthermore, when considering structures with size $ 1 nm illuminated in the ultraviolet-visible-near-infrared, the incident eld will only weakly couple to the structure. This results in non-trivial internal eld enhancement. Fig. 10 reports the dipole mesh reconstructed from atomic coordinates and species in the le set of panels, and the near-eld enhancement at 520 nm plane wave illumination on the right side. Intensity, as described by the colour bar, is reported as jE(r)j 2 /jE 0 j 2 , the strength of the eld relative to the amplitude of the illuminating wave. In the bottom panel, we present the extinction spectra computed via eqn (23) for each of the four different systems taken into consideration. We have elected to present these systems to demonstrate the utility in being able to directly map a set of atomic coordinates, as may be parsed from a structural le, to a mesh of coupled dipoles with pre-dened dielectric properties.

Conclusions
The characterization and classication of the morphology and chemical ordering in mono-, bi-, and multi-metallic NAs is a key ingredient towards rationalizing their chemo-physical properties.
The necessity of an open-source robust, reproducible, and FAIR compliant post-processing tool, which caters to the needs of the nanoalloy computational modelling community, is more and more evident. Sapphire, the soware we presented here, is tailored specically to address this need, by providing a library of standardised analysis tools for NA characterization. Sapphire is indeed an open-source, modular, user-friendly platform to characterise a nanoparticle's or NA's architecture observed during, e.g., atom coordinate reconstructions from experiment, molecular dynamics, or Monte Carlo based sampling.
Within Sapphire, several order-parameters and descriptors can be calculated, namely: pair distance and radial distribution functions, inertia tensors and quantities derived from the latter, coordination distributions and other topological descriptors (generalized coordination and common neighbour analysis) derived from adjacency matrices. Sapphire also offers tools for detailed analysis of the surface of MNPs and NAs. In the long term, it is our vision that Sapphire may serve to create a standardised, meaningful characterisation and classication tool for MNPs and NAs. This will be the rst critical stage in creating a comprehensive community database for such complex systems.

Conflicts of interest
There are no conicts to declare.