Open Access Article
Kiran Vaddi (*a), Karen Li (b) and Lilo D. Pozzo (c)
(a) Department of Chemical Engineering, eScience Institute, University of Washington, Seattle, WA 98195, USA. E-mail: kiranvad@uw.edu
(b) Department of Chemical Engineering, University of Washington, Seattle, WA 98195, USA. E-mail: kli625@uw.edu
(c) Department of Materials Science and Engineering, eScience Institute, University of Washington, Seattle, WA 98195, USA. E-mail: dpozzo@uw.edu
First published on 29th August 2023
Extracting a phase map that provides a hierarchical summary of high-throughput experiments is a long-standing bottleneck for the modern goal of achieving automation and acceleration in material discovery. A phase map that underpins the inherent properties of materials is typically denoted using a composition-structure map but can be extended to other relevant parameters such as synthesis. This paper describes a computational statistical tool to efficiently obtain a phase map from multi-scale experimental measurement profiles obtained from high-throughput measurements. We motivate the construction of a phase map as the problem of learning the underlying metric geometry defined by a set of templates in infinite-dimensional function spaces. We provide a statistical analysis tool to obtain a phase map as an asymptotic of the diffusion of resulting distance functions on the composition. Using examples from small-angle X-ray scattering experiments of polymer blend systems, we show that learned metric geometry can efficiently differentiate ordered phase regions with shifted, missing, and broad Bragg peaks along with features related to non-Bragg behavior of soft-matter systems. The metric geometry allows us to define a shape distance between scattering profiles invariant to phase-independent transformations thus valuable for obtaining a phase map. We also apply the methodology to benchmark experimental diffraction data to showcase potential utility and broad applicability.
The mathematical basis of this paper is the notion of ‘shape’ popularized by the mathematician David Kendall in the 1980s (ref. 16), who used triangles as an example to showcase that ‘shape’ is what remains after discounting invariant transformations. Kendall also showed that after discounting the triangle's position, scale and orientation, the remaining structure defines equivalence classes of triangles (isosceles, equilateral, scalene, etc.) that result in a spherical manifold for the space of triangles. In this work, we extend these ideas from statistical shape theory to experimentally measured one-dimensional profiles by considering them as points in an infinite-dimensional function space. In particular, we use the amplitude-phase distance defined in (ref. 17) to construct a ‘shape’ distance between profiles by considering only the aligned amplitude distance, which effectively quotients out the distance contribution from shape-independent features. The amplitude distance can be used as an alternative to the Euclidean distance to learn shape-based representations and thus identify a shape-invariant metric structure of the data. A perennial issue with existing signal-based statistical methods for constructing phase maps is the lack of continuity in the composition domain. Several approaches have been proposed to overcome this, such as imposing continuity constraints (ref. 18) when learning a representative ‘basis’, and using smooth kernels in clustering and segmentation (ref. 19). In this work, we consider continuity as the result of the local geometry, where the correlation between structures of compositionally varying materials is characterized by a continuous function. We then model the continuity using a stochastic model in which variations in the local statistical similarities are represented by the diffusion of the corresponding distance function. A phase map is then obtained by considering the asymptotic properties of the distance function, thus attaining local smoothness or continuity.
The goal of this paper is not to provide an algorithm that outperforms other methods used for the automatic generation of structure phase maps but rather to provide a principled approach to realize phase maps purely from analyzing them as functionals (i.e. functions of functions) and exploring the results from an empirical behavior point of view. We argue that the presented approach performs much better at alleviating problems in defining distance that is aligned with the physical intuition of analysis applied (primarily) to diffraction or scattering and demonstrate this with a few example case studies. We focus mainly on the application to SAXS data as they are much more challenging to phase map with the information pertaining to the structure encoded in higher-order features of the profile such as the curvature. The rest of the paper is arranged as follows: we first describe the overall workflow of the autophasemap algorithm in Section 3 and introduce concepts of metric geometry (in Section 4) and diffusion (in Section 5) as relevant to the computational tools presented in this study. We then apply it to an experimental SAXS data set of self-assembled block-copolymer blend materials synthesized and characterized using SAXS by us and ternary alloys dataset from (ref. 20) characterized using XRD. We analyze the results in Sections 6–8 and provide insights into the generation of a phase map and list our conclusion and contributions in Section 9.
SAXS profiles record the scattered intensity as a function of the scattering vector magnitude q, which is related to the scattering angle θ but is independent of the incident X-ray wavelength λ. All electrons in the sample are potential sources of secondary waves with spatially dependent phases. Thus an isolated nanostructure within the sample will contribute to the detected intensity as the square of the amplitude of the scattered secondary waves, referred to as the form factor. The form factor is a function of q because the interference pattern changes with the length scale and the resulting phase of the secondary waves. Real experimental samples consist of ensembles of nanostructures distributed across space. Particles and molecules interact via colloidal and molecular forces that, under concentrated conditions or strong interaction limits, result in the emergence of spatial correlations. The contributions to the scattering from these spatial correlations are generally referred to as the structure factor. The term ‘factor’ comes from the fact that for simple homogeneous systems, the average observed intensity can be expressed as the product of the form factor and the structure factor. The interplay between the form factor and structure factor complicates the analysis of SAXS profiles, because the two are difficult to separate from experimental curves with little to no prior understanding of the nanoscale features of the sample. For example, in the case of an ordered three-dimensional nanostructure, some of the peaks may be missing because either the structure factor or the form factor has local minima in its q-dependent intensity. This phenomenon is not unique to periodic structures, as interactions and particle aggregation can significantly change the observed intensity profile. As with powder X-ray diffraction (XRD) data, the finite size of the periodic structures and instrument limitations (e.g. smearing) can result in shifts and broadening of peaks. In the case of soft-matter systems, such as the micelles studied in this work, the shifts and widening of peaks can occur over much larger ranges of q than for inorganic crystals because of the wide range of lattice parameters that are possible. For a detailed explanation of the techniques and fundamentals of SAXS, readers are referred to (ref. 21 and 22). Frequently, practical SAXS analysis relies on solutions to analytical form and structure factors and on general scaling relations (e.g. Guinier or Kratky plots) to compare and analyze SAXS curves, or uses heuristics such as expected power-law scalings between intensity and q for certain nanostructures and shapes. In this work, we describe a mathematical framework that provides a robust pipeline for performing a comparative analysis of the shape of SAXS profiles to automatically generate phase maps, the foundations of which are detailed next. Such phase maps can then be used as a starting point for the automated application of detailed analyses to the samples for which these are applicable, and also to avoid applying model fits to data for which they would be inappropriate.
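As a compact reference for the quantities discussed above, the relations can be written out as follows. This is the standard textbook form under two assumptions that are not stated in the text: that θ denotes the full angle between the incident and scattered beams, and that the sample is a simple homogeneous system for which the factorization holds. The symbols P(q), S(q) and q* are notation chosen here for illustration.

```latex
% Scattering vector magnitude (theta = full scattering angle, an assumption):
q = \frac{4\pi}{\lambda}\,\sin\!\left(\frac{\theta}{2}\right)

% Factorization of the intensity for a simple homogeneous system:
I(q) \;\propto\; P(q)\, S(q)

% A Bragg peak expected at q^* can be missing when the form factor vanishes there:
P(q^*) \approx 0 \;\Longrightarrow\; I(q^*) \approx 0
\quad \text{even if } S(q^*) \text{ is large.}
```

Plotting intensity against q rather than against θ is what removes the dependence of peak positions on the wavelength used in a given instrument.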
    
    
      
      
Fig. 1 A pictorial representation of the autophasemap algorithm with the iteration between the identification and assignment steps depicted as curved arrows. Some of the steps are annotated with plots corresponding to a synthetic data set of Gaussian peak shapes generated using the procedure from (ref. 23) (Section 4). The plots are for the final converged results. The input data (a random sample for clarity) is shown in the lower left corner, which depicts three groups based on the number of peaks but arbitrarily shifted over the design space.
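We treat each measured profile as an element of the $L^2$ space of one-dimensional functions with the domain mapped to a unit interval [0,1]. The inner product for two functions $f_1, f_2 \in L^2$ is given by eqn (1):

$$\langle f_1, f_2 \rangle = \int_0^1 f_1(t)\, f_2(t)\, \mathrm{d}t \qquad (1)$$

The inner product induces a norm and a distance between functions:

$$\| f \| = \sqrt{\langle f, f \rangle} \qquad (2)$$

$$d(f_1, f_2) = \| f_1 - f_2 \| \qquad (3)$$

The norm of a function in the $L^2$ space can be used to normalize the data, for example, to have a unit norm, giving rise to interesting manifold structures in the data.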
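The inner product, norm and unit-norm normalization described above can be sketched numerically as follows. This is a minimal illustration, not the autophasemap implementation: the helper names (`trapezoid`, `inner`, `norm`, `to_unit_norm`), the trapezoidal quadrature and the two Gaussian test profiles are all assumptions made here.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal-rule integral of samples y on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def inner(f1, f2, t):
    """Discretized L2 inner product on the unit interval."""
    return trapezoid(f1 * f2, t)

def norm(f, t):
    """L2 norm induced by the inner product."""
    return np.sqrt(inner(f, f, t))

def to_unit_norm(f, t):
    """Rescale a profile so that its L2 norm is one."""
    return f / norm(f, t)

# Two hypothetical profiles sampled on a domain mapped to [0, 1].
t = np.linspace(0.0, 1.0, 501)
f1 = np.exp(-0.5 * ((t - 0.3) / 0.05) ** 2)
f2 = np.exp(-0.5 * ((t - 0.6) / 0.05) ** 2)

u1, u2 = to_unit_norm(f1, t), to_unit_norm(f2, t)
dist = norm(u1 - u2, t)  # plain L2 distance between the normalized profiles
print(f"<u1, u2> = {inner(u1, u2, t):.4f}, ||u1 - u2|| = {dist:.4f}")
```

After normalization every profile lies on the unit sphere of the function space, so the distance between two profiles is determined entirely by their inner product.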
One of the most fundamental notions required to perform statistical analysis on data belonging to a manifold is to compute distances between points. Since we are interested in measuring the ‘shape’ distance between two profiles represented as functions, we need a distance that is invariant to various warping actions. Warping functions play the role for functions that translations and rotations play in Kendall's original definition of shape (ref. 16). Warping actions are defined as a right composition of a function with a warping function that maps the domain to itself. The warping functions belong to a class of mathematical objects called diffeomorphisms, which are smooth functions with a smooth inverse. Consider a space of functions $\mathcal{F}$ with their domain mapped to a unit interval Ω = [0,1] and the set of boundary-preserving diffeomorphisms:

$$\mathrm{Diff}^+(\Omega) = \{\gamma : \Omega \to \Omega \mid \gamma(0) = 0,\ \gamma(1) = 1,\ \gamma \text{ is a diffeomorphism}\}$$

For any given function $f \in \mathcal{F}$, we can formalize the action of a warping function γ using function composition as follows:

$$(f, \gamma) = f \circ \gamma, \qquad (f \circ \gamma)(t) = f(\gamma(t))$$

A shape space for the functions can now be defined as the space of functions $\mathcal{F}$ that is left behind after quotienting out the set Diff+(Ω). Once again, going back to the original ideas of Kendall and applying the notion of a shape to a collection of triangles, the rotations play the role of diffeomorphisms that quotient out the orientation before comparing a pair of triangular shapes. Using the shape-preserving diffeomorphisms, we can define a ‘shape space’ to be $\mathcal{S} = \mathcal{F}/\mathrm{Diff}^+(\Omega)$ and obtain the following definition for a shape distance:

$$d_{\mathcal{S}}(f_1, f_2) = \inf_{\gamma \in \mathrm{Diff}^+(\Omega)} d_{\mathcal{F}}(f_1, f_2 \circ \gamma) \qquad (4)$$

Eqn (4) requires a distance $d_{\mathcal{F}}$ that is defined using functions and the warping function. One way to define a warping-invariant $d_{\mathcal{F}}$ is to exploit certain transformations between two spaces that allow a metric to be pulled back from one of the spaces for which there exists a known metric. One such transformation is the Square Root Slope Framework (SRSF) in eqn (5) introduced in (ref. 25) that results in a warping-invariant metric via pullback from $L^2$ (here $q$ denotes the SRSF of $f$, not the scattering vector):

$$q(t) = \mathrm{sign}\big(\dot{f}(t)\big)\sqrt{|\dot{f}(t)|} \qquad (5)$$

Under warping, the SRSF of $f \circ \gamma$ is $(q \circ \gamma)\sqrt{\dot{\gamma}}$. The invariance of the resulting pullback metric can be observed by considering the case where two functions are warped by the same γ function:

$$\big\| (q_1 \circ \gamma)\sqrt{\dot{\gamma}} - (q_2 \circ \gamma)\sqrt{\dot{\gamma}} \big\|^2 = \int_0^1 \big(q_1(\gamma(t)) - q_2(\gamma(t))\big)^2\, \dot{\gamma}(t)\, \mathrm{d}t$$

We can now use the change of variables $s = \gamma(t)$ to obtain:

$$\int_0^1 \big(q_1(s) - q_2(s)\big)^2\, \mathrm{d}s = \| q_1 - q_2 \|^2$$

Defining $d_{\mathcal{F}}$ using the SRSF and the pullback metric, we obtain a distance whose infimum over Diff+(Ω) is invariant to the warping function. This is because fixing f1 and solving for a γ to warp f2 is equivalent to finding the distance after quotienting out any distance contributions from domain warping alone. In practice, we solve for the optimal warping $\gamma^*$ by minimizing E(γ) given in eqn (6) using techniques such as dynamic programming (ref. 25) or Riemannian gradient descent (ref. 26).

$$E(\gamma) = \big\| q_1 - (q_2 \circ \gamma)\sqrt{\dot{\gamma}} \big\|^2 \qquad (6)$$
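The invariance argument above can be checked numerically. The sketch below computes the square-root slope transform of two profiles, warps both with the same boundary-preserving warping function, and compares the $L^2$ distance before and after. It is a hedged illustration only: the function names, the two Gaussian test profiles and the particular warp γ(t) = t + 0.3 t(1 − t) are assumptions chosen here, and finite differences with linear interpolation stand in for exact derivatives and composition.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal-rule integral of samples y on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def srsf(f, t):
    """Square-root slope transform: sign(f') * sqrt(|f'|)."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

def warp_srsf(q, gamma, t):
    """Action of a warping on an SRSF: (q o gamma) * sqrt(gamma')."""
    dgamma = np.gradient(gamma, t)
    return np.interp(gamma, t, q) * np.sqrt(dgamma)

def l2_distance(q1, q2, t):
    """L2 distance between two sampled functions."""
    return np.sqrt(trapezoid((q1 - q2) ** 2, t))

t = np.linspace(0.0, 1.0, 2001)
f1 = np.exp(-0.5 * ((t - 0.35) / 0.05) ** 2)
f2 = np.exp(-0.5 * ((t - 0.55) / 0.08) ** 2)
q1, q2 = srsf(f1, t), srsf(f2, t)

# Hypothetical warp: fixes both endpoints and has derivative in [0.7, 1.3].
gamma = t + 0.3 * t * (1.0 - t)

d_before = l2_distance(q1, q2, t)
d_after = l2_distance(warp_srsf(q1, gamma, t), warp_srsf(q2, gamma, t), t)
print(f"before warping: {d_before:.4f}, after common warping: {d_after:.4f}")
```

The two printed values agree up to discretization error, which is the property that makes the infimum in eqn (4) well defined on the quotient space.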
The shape distance in eqn (4) is invariant to various domain warpings denoted by γ. For scattering (or diffraction) profiles, the γ function can be used to quotient out the distance contribution from non-phase-specific changes (such as peak shifts and missing peaks) and also instrument-limited features (such as peak widths). We illustrate the computation using simulated scattering profiles of a face-centered cubic (FCC) and a body-centered cubic (BCC) phase (using the simulator from (ref. 27)) in Fig. 2. The two simulated SAXS profiles shown in the left-most panel of Fig. 2 are for the BCC phase (top panel, with peak position ratios $1 : \sqrt{2} : \sqrt{3}$) and for an FCC phase (bottom panel, with peak position ratios $1 : \sqrt{4/3} : \sqrt{8/3}$). Fig. 2 depicts the scenario in which we compute a distance to quantify how dissimilar the FCC phase curve is from a BCC phase based on the shape. As mentioned above, the first task in computing the distance is to (peak-)align the two functions, which are shown in the middle panel of Fig. 2. The amplitude distance, defined as the $L^2$ distance between the (peak-)aligned functions, (roughly) measures the area between the functions. The key component of this computation, the optimal warping function, is shown in the rightmost panel of Fig. 2 as a map from the domain (the q values) to itself. The action of the warping function can be understood by observing where it deviates from the identity (solid blue line). We observe that there are two regions where the orange curve deviates from the blue curve, each corresponding to the alignment of the peaks numbered 1 and 3 in the leftmost panel of Fig. 2. Furthermore, the alignment distorts the second peak of the FCC phase because the peak separation between 1 and 2 is not the same as in the reference BCC phase. The distortion contributes the most to the amplitude distance, as seen from the shaded region between the curves in the middle panel of Fig. 2. Similarly, we can show that the warping function introduces almost no distortion when the peak patterns match but are shifted uniformly, resulting in a minimal distance (see ESI†).
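To make the alignment step concrete, the sketch below minimizes the energy of eqn (6) for a single shifted peak. It is deliberately simplified and is not the dynamic programming or Riemannian gradient descent solver mentioned in the text: the search is restricted to a hypothetical one-parameter family of warps γ_a(t) = t + a·t(1 − t), so the minimum found is only an upper bound on the true shape distance. All names and test profiles are assumptions made for illustration.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal-rule integral of samples y on the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def srsf(f, t):
    """Square-root slope transform: sign(f') * sqrt(|f'|)."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

def alignment_energy(q1, q2, a, t):
    """E(gamma_a) for the restricted warp gamma_a(t) = t + a*t*(1 - t)."""
    gamma = t + a * t * (1.0 - t)
    dgamma = 1.0 + a * (1.0 - 2.0 * t)  # analytic derivative, positive for |a| < 1
    warped = np.interp(gamma, t, q2) * np.sqrt(dgamma)
    return trapezoid((q1 - warped) ** 2, t)

t = np.linspace(0.0, 1.0, 2001)
reference = np.exp(-0.5 * ((t - 0.40) / 0.05) ** 2)  # template peak
shifted = np.exp(-0.5 * ((t - 0.50) / 0.05) ** 2)    # same shape, shifted peak
q_ref, q_shift = srsf(reference, t), srsf(shifted, t)

# Grid search over the single warp parameter (a = 0 is the identity warp).
a_grid = np.linspace(-0.9, 0.9, 181)
energies = np.array([alignment_energy(q_ref, q_shift, a, t) for a in a_grid])

best_a = a_grid[np.argmin(energies)]
unaligned = alignment_energy(q_ref, q_shift, 0.0, t)
aligned = energies.min()
print(f"best a = {best_a:.2f}, E unaligned = {unaligned:.3f}, E aligned = {aligned:.3f}")
```

Even this crude family removes most of the distance caused by the peak shift, which is the behavior the amplitude distance relies on; a full solver searches over all of Diff+(Ω) rather than one parameter.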
In Fig. 3, we depict an example of using a distance measure to make phase assignments from scattering curves given a template as a reference. The top row (panels A and B) in Fig. 3 depicts a case where we use standard vector-based distances (such as Euclidean) to compute the similarity to a given reference profile (dotted line, corresponding to a BCC phase with a lattice parameter of 8 nm). The solid lines in panels A and B correspond to a BCC and an FCC phase, respectively, both with a lattice parameter 20% greater than that of the reference. Visually, we can observe that any distance measure that simply measures the overlap (i.e. the shaded region) would consider the reference BCC profile to be more similar to the FCC profile than to the shifted BCC profile. This is primarily because the distance emphasizes the high-intensity peak disproportionately and fails to account for the mismatched pattern of peaks that encodes the periodicity of the structure represented in scattering. The bottom row (panels C and D) of Fig. 3 depicts a similar exercise using the shape distance. With the shape distance, an assignment based on the overlap after alignment (blue-shaded regions) identifies the pair of BCC profiles as more similar to each other. This example illustrates that using the shape distance results in template-based phase assignments that are better aligned with an expert understanding of scattering curves.
In this work, we clearly distinguish between a metric and a distance function. Although both are defined as maps that take two points and produce a scalar output, a metric is only defined infinitesimally between tangent vectors and often changes from point to point. A distance function, however, is defined between any two points in the space and thus can be far less restricted in its structure such as not following the triangle inequality. Because we are interested in building phase maps, we need a distance function that measures the distance to any scattering curve from a template that serves as the representative curve for a particular phase. This distance function identifies each curve with a distance closer to zero with the same phase as the template. Changes to the distance function over the design space (such as the composition) are constrained by continuity such that phase transitions occur gradually within a transition width. For polymer materials that are of interest to this work, the transition width is finite thus we also need to encode the continuity into our definition of distance. One way to ensure this is to obtain the distance as a solution to a diffusion equation defined on the design space. In this work, we use the idea of diffusion maps to obtain one such solution as described next.
![]()  | (7) | 
![]()  | (8) | 
The minimization problem boils down to finding generalized eigenvalues of the form $Af = \lambda f$ with $A = Q_2^{-1} Q_1$, which defines an infinitesimal generator of the diffusion defined by $e^{-At}$. Following the terminology of diffusion maps in (ref. 28), we consider the number of hops between graph nodes as the time steps of diffusion. Thus, the diffusion of information on the set S can now be expressed in a lower-dimensional form using the eigenvalues of the infinitesimal generator A, effectively filtering out the higher modes of the function f and making it smooth over the domain. In the special case of a weighted graph of the set S, Q1 is the graph Laplacian and Q2 is the normalization factor, giving rise to the normalized graph Laplacian as the generator of the diffusion process on the graph. The resulting generator has a discrete sequence of eigenvalues bounded above by one. By truncating the higher eigenvalues of the generator, we obtain an asymptotic solution to the diffusion problem, resulting in a lower-dimensional approximation of the diffusion operator, Â. In this work, we apply the asymptotic diffusion operator Â to various distance functions to obtain an asymptotic distance defined using the shape distance (eqn (4)) from a set of template functions learned from the data. We can interpret the asymptotic distance as a (continuous) posterior probability of a measured profile being closest to the corresponding template function (ref. 28).
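One way to picture the truncation described above is as a spectral low-pass filter on a graph built over the design space. The sketch below is an assumption-laden illustration rather than the authors' implementation: it uses a hypothetical one-dimensional composition axis, a Gaussian-kernel weighted graph with an arbitrary bandwidth, the symmetric normalized graph Laplacian, and a projection onto its `k` lowest eigenmodes to smooth a noisy distance function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D design space (e.g. a composition axis) with a noisy distance.
x = np.linspace(0.0, 1.0, 60)
d_noisy = np.abs(x - 0.3) + 0.05 * rng.standard_normal(x.size)

# Weighted graph on the sampled compositions (Gaussian kernel, bandwidth assumed).
bandwidth = 0.05
W = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * bandwidth ** 2))
degree = W.sum(axis=1)

# Symmetric normalized graph Laplacian: I - D^{-1/2} W D^{-1/2}.
laplacian = np.eye(x.size) - W / np.sqrt(np.outer(degree, degree))
eigenvalues, eigenvectors = np.linalg.eigh(laplacian)  # ascending order

# Keep only the k smoothest modes and project the distance function onto them.
k = 8
basis = eigenvectors[:, :k]
d_smooth = basis @ (basis.T @ d_noisy)

def roughness(values):
    """Sum of squared differences between neighbouring samples."""
    return float(np.sum(np.diff(values) ** 2))

print(f"roughness before: {roughness(d_noisy):.4f}, after: {roughness(d_smooth):.4f}")
```

The number of retained modes `k` trades smoothness against fidelity to the measured distances; the value used here is arbitrary.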
We evaluate the proposed phase mapping algorithm qualitatively on two different data modalities (SAXS and XRD). For the first case study, we synthesized and collected SAXS data of a self-assembling block copolymer that has a previously reported phase map (ref. 29). We then applied the same methodology (with no additional data processing or customization) to generate a phase map from XRD data of ternary metal alloy systems to showcase the versatility and generalizability of the presented approach. Finally, we showcase the utility of the proposed approach in generating and analyzing phase maps of a novel system using SAXS data of self-assembling polymer blends.
Self-assembly of the P123 pluronic system has been previously studied using computational and one-at-a-time experimental approaches (ref. 29); thus, we had access to a set of expected phases with which to create a manual annotation of the phase diagram. We used this knowledge to create the phase diagram shown in Fig. 4.
Fig. 4 Manually annotated phase diagrams based on the SAXS patterns of P123 pluronic with varying temperature. (A) Expert labeled phase diagram: disordered phase – no self-assembly as evidenced by a lack of sharp peaks in their SAXS curve; spherical micelles – broad peaks that oscillate towards higher q values; ordered structures (FCC, HCP, HEX) are adjudged by matching peak spacing ratios obtained from the literature. (B) Observed phase transitions with an increase of temperature: SAXS patterns of pluronic P123 in a 35% weight fraction of water resembled that of correlated micelles at lower temperatures which self-assembled into a mixed phase of cubic and hexagonal lattices. Upon further increase of the temperature beyond 40 °C only the features corresponding to the hexagonal phase were observed that turned into a single broad peak at temperatures beyond 50 °C signifying a disordered phase of hexagonally self-assembled structures. (C) Phase diagram with only four reference sets similar to the one proposed in ref. 29.
Lower concentrations of P123 (≤25 wt%) at low temperatures exist as unimers (simple polymer strands) with no peaks in the SAXS spectra. SAXS of dilute P123 with increasing temperatures shows spherical micelles. At high temperatures, an unknown micellar structure appears, with insufficient information in the measured range of q to identify the structure. As the P123 concentration increases, diffraction peaks appear, indicating that the micelles have self-assembled into crystalline mesophases. The structures formed by P123 micelles were identified by matching the diffraction peaks to a sequence of peak position ratios. The scattering vector of the primary peak, $q_1$, was chosen such that the scattering vectors of subsequent peaks match those calculated with the position ratios. Based on this, we identified that P123 forms FCC $(q_1, \sqrt{4/3}\,q_1, \sqrt{8/3}\,q_1, \sqrt{11/3}\,q_1, \ldots)$, HCP $(q_1, 1.06\,q_1, 1.13\,q_1, 1.46\,q_1, 1.73\,q_1, 1.87\,q_1, 2.03\,q_1, \ldots)$, and HEX $(q_1, \sqrt{3}\,q_1, \sqrt{4}\,q_1, \sqrt{7}\,q_1, \ldots)$ phases. The diffraction peaks of some of the SAXS profiles could not be matched to a distinct phase and thus were fitted to multiple phases, to account for all possible phases. P123 spectra that show peaks at low temperatures indicate that the micelles are assembling but have not fully organized into FCC, HCP, or HEX, and thus were characterized as ‘correlated micelles’ with strong interactions. Concentrated P123 at high temperatures exhibits peaks but no definitive organized structure.
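The manual peak-ratio matching described above can be sketched as a small scoring routine. This is a hypothetical helper written for illustration, not the procedure used by the authors: it assumes that the first detected peak is the primary peak q1, uses standard FCC and HEX position ratios together with the HCP ratios quoted in the text, and applies an arbitrary 3% relative tolerance.

```python
import math

# Peak position ratios relative to the primary peak q1 (first few reflections).
REFERENCE_RATIOS = {
    "FCC": [1.0, math.sqrt(4 / 3), math.sqrt(8 / 3), math.sqrt(11 / 3)],
    "HCP": [1.0, 1.06, 1.13, 1.46, 1.73, 1.87, 2.03],
    "HEX": [1.0, math.sqrt(3), 2.0, math.sqrt(7)],
}

def match_score(peaks, ratios, tol=0.03):
    """Fraction of observed peaks lying within a relative tolerance of a reference ratio."""
    q1 = peaks[0]  # assumption: the first detected peak is the primary peak
    observed = [q / q1 for q in peaks]
    matched = sum(any(abs(o - r) <= tol * r for r in ratios) for o in observed)
    return matched / len(observed)

def rank_phases(peaks, tol=0.03):
    """Candidate phases sorted from best to worst match score."""
    scores = {name: match_score(peaks, ratios, tol) for name, ratios in REFERENCE_RATIOS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical peak positions consistent with a hexagonal lattice.
peaks = [0.0500, 0.0866, 0.1000, 0.1323]
for name, score in rank_phases(peaks):
    print(f"{name}: {score:.2f}")
```

In this example HEX matches every peak while HCP still matches three of the four, which mirrors the ambiguity noted in the text for profiles that had to be fitted to multiple phases.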
A set of reference phases may not be available for novel systems thus we should treat this as a variable in our algorithm. For example, if we had access to only four reference phases – micellar solutions, self-assembled mesoscopic order of a cubic and hexagonal lattice, and disordered particles of different lattices – we would have ended up with the phase diagram shown in Fig. 4C. In fact, this phase diagram resembles one of the earlier demonstrations of experimental phase mapping of P123 pluronic systems shown in (ref. 29).
One strategy would then be to start with a phase map that ‘broadly’ classifies the samples, such as Fig. 4C, and then further refine each observed region into specific subclasses to obtain a phase map that looks like Fig. 4A. This is akin to having a hierarchy in the phase map that is controlled by the number of reference sets available based on prior knowledge. In our autophasemap algorithm, this hierarchy is controlled by the number of template functions. As shown in Fig. 5 and 6, we indeed obtain this hierarchy: the phase map shown in panel (E) of Fig. 5 roughly corresponds to the phase diagram with four reference phases (Fig. 4C), while that in panel (H) of Fig. 6 roughly corresponds to Fig. 4A. This can be verified by observing that the shaded region of each learned template corresponds to a particular phase in the phase diagram obtained using the same number of reference sets; thus, the hierarchy observed in manual annotation is recovered by increasing the number of template functions. In Fig. 6, we show the set of templates (in a solid color) and the assigned experimental SAXS curves (overlaid in grey) along with their location in the design space. The partition of the design space into phase regions is highlighted in the inset plot in each panel, with the concentration of P123 on the x-axis (ranging from 0 to 40 wt%) and temperature on the y-axis (ranging from 0 to 85 °C). Once again, by examining peak spacing ratios, we identify the template functions in (A) as a mixture of HCP and HEX; (B) as HEX; (C, D) as disorganized correlated micelles; and (G) as FCC. The above analysis also suggests that the phase map can be used to assign phase labels by performing complex and laborious phase labeling techniques only on a small number of template functions, thus potentially accelerating the learning while performing the high-throughput measurements.
Fig. 6 Phase map learned with 7 template functions shows a hierarchical partition of Fig. 5: (A–G) SAXS curves (in the grey color) assigned to each learned template (in a solid color) with the corresponding region in the composition space identified in the inset; (H) a phase map obtained by considering regions of distance from the template up to a threshold of 0.35 units. The color of the templates in (A–G) matches the corresponding phase region in (H).
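The thresholding used to draw panel (H) can be sketched as follows. This is a minimal illustration under assumptions: the distance matrix is made-up data, the function names are hypothetical, and assigning each sample to its single nearest template is a choice made here, whereas the figure only states that regions within 0.35 units of a template are shown.

```python
import numpy as np

def phase_regions(distances, threshold):
    """Boolean membership: sample i lies within `threshold` of template j."""
    return distances <= threshold

def phase_labels(distances, threshold):
    """Index of the nearest template, or -1 if none is within the threshold."""
    nearest = np.argmin(distances, axis=1)
    nearest_distance = distances[np.arange(distances.shape[0]), nearest]
    return np.where(nearest_distance <= threshold, nearest, -1)

# Hypothetical distances: rows are samples, columns are learned templates.
distances = np.array([
    [0.05, 0.80, 0.90],
    [0.40, 0.30, 0.95],
    [0.60, 0.70, 0.50],
    [0.90, 0.10, 0.20],
])

regions = phase_regions(distances, threshold=0.35)
labels = phase_labels(distances, threshold=0.35)
print(labels)  # -1 marks samples that no template explains within the threshold
```

Keeping the membership mask alongside the labels preserves the information that a sample can sit within the threshold of more than one template, as the last row does here.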
Although we represented the phase diagram in Fig. 4 using sharp boundaries around each phase region, this is purely for visualization. In fact, as shown in the middle panel of Fig. 4, there is a smooth transition between different phases with an increase in temperature. Once again, we observe that the continuous nature of the distance functions shown in Fig. 7 allows us to extract this phase transition behavior along with the labels obtained from the phase ‘templates’ shown in Fig. 6. For example, we observe that the 35% weight fraction of P123 (highlighted using a solid white line in Fig. 6) passes through zero distance along panels B, C, D, E, and G, each corresponding to a different template function. SAXS curves at lower temperatures are the closest to the template of panel (E), corresponding to unimers, and they slowly diverge from it with an increase in temperature and become closer in shape to the template of panel (G) (as evidenced by the color gradient of the distance), which corresponds to an FCC structure. Upon further increase beyond 25 °C, the SAXS curves slowly converge towards the shape of the template in (B) (i.e. a hexagonal self-assembly of cylindrical micelles), as measured by the distance approaching zero. An increase in temperature beyond 50 °C results in a smooth divergence from the shape of the template in panel (B) towards that of panel (D), a disordered phase, signifying a smooth phase transition. This showcases the advantages of using the autophasemap algorithm for high-throughput experimental systems to extract phase mapping and transition information purely based on SAXS patterns.
Fig. 7 Learned distance functions of the phase map in Fig. 6 with 7 templates. Panels are arranged in the same sequence as that of Fig. 6.
Fig. 8 Fe–Ga–Pd phase map learned with 5 templates to compare with expert labeled phase diagrams from (ref. 33). (A–E) Learned template functions in solid color with the smoothed XRD spectra (Savitzky–Golay filtering with a window length of 1.0 radians and a third-order polynomial, as implemented in (ref. 34)). The inset plot shows the distance distribution from the template to all the XRD curves along with the points identified to be closer to the template in clustering. (F) A phase map obtained by selecting a distance threshold of 0.5 units. All the ternary plots represent the weight fraction of elements on the axis.
The phase diagram in Fig. 9 can be used for further planning targeted synthesis and measurements in an iterative fashion. For example, to achieve long-range order that facilitates electron transport, we can use the phase map in Fig. 9 as a starting point and down select regions of design space that form crystalline phases (i.e. regions (A), (B), and (D)).
In the case of scattering data from small-angle X-ray scattering of pluronic systems, we constructed a phase diagram from high-throughput data for two systems (one with temperature variation and the other with polymer blends). We have shown that the resulting phase diagram is (topologically) continuous, with each phase corresponding to a set of scattering profiles similar in shape. For regions of the phase map with potentially ordered crystal phases, the phase assignment is also invariant to peak shifts and instrument-limited features. Furthermore, the phase map is shown to be a hierarchical partition of the design space that reveals higher-order hierarchical relations as the number of template functions used in the algorithm increases. Finally, the broad applicability of the algorithm is shown using a known benchmark data set of X-ray diffraction studies. We have also shown the ability of the current approach to augment traditional techniques by rapidly mapping out interesting phase regions and down-selecting a small set of template curves. Using the case studies, we have shown the utility of learned templates in performing time-consuming traditional labeling approaches on only select curves rather than the entire dataset.
As a part of future work, the present phase mapping framework can be extended to run in a closed-loop manner for example using active learning. The diffusion of similarity functions can be used to determine an acquisition function that encourages sampling near the boundaries of the phase map (for example, by maximizing the gradient of the diffused similarity function). The learned template functions also serve as the low-throughput summary of the phases formed in a synthesis study thus allowing users to obtain a rapid analysis of the experiments. Furthermore, learned phase maps (either via online or offline mode) can be further used in property optimization (measured by various performance measures of interest) for rapid development and understanding of its relations to the underlying structure.
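The gradient-based acquisition idea mentioned above could look like the following sketch. It illustrates a proposed future direction and is not an implemented part of autophasemap: the two-dimensional design grid, the sigmoid-shaped similarity field, the location of its boundary and the choice of the gradient magnitude as the acquisition value are all assumptions made here.

```python
import numpy as np

# Hypothetical normalized design space (e.g. concentration x temperature).
conc = np.linspace(0.0, 1.0, 51)
temp = np.linspace(0.0, 1.0, 51)
C, T = np.meshgrid(conc, temp, indexing="ij")

def boundary_offset(c, t):
    """Signed offset from an assumed phase boundary c = 0.4 + 0.2 * t."""
    return c - (0.4 + 0.2 * t)

# Stand-in for a diffused similarity to one template: high on one side of the
# boundary, low on the other, with a finite transition width of about 0.05.
similarity = 1.0 / (1.0 + np.exp(boundary_offset(C, T) / 0.05))

# Acquisition value: magnitude of the gradient of the similarity field.
d_dc, d_dt = np.gradient(similarity, conc, temp)
acquisition = np.hypot(d_dc, d_dt)

i, j = np.unravel_index(np.argmax(acquisition), acquisition.shape)
next_point = (conc[i], temp[j])
print(f"suggested next sample: concentration={next_point[0]:.2f}, temperature={next_point[1]:.2f}")
```

Because the gradient is largest where the similarity changes fastest, the suggested sample falls on the assumed boundary; in a closed loop one would also need a rule to avoid repeatedly proposing the same location.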
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3dd00105a
This journal is © The Royal Society of Chemistry 2023