Beyond ManifoldEM: geometric relationships between manifold embeddings of a continuum of 3D molecular structures and their 2D projections

ManifoldEM is an established method of geometric machine learning developed to extract information on conformational motions of molecules from their projections obtained by cryogenic electron microscopy (cryo-EM). In previous work, in-depth analysis of the properties of manifolds obtained for simulated ground-truth data from molecules exhibiting domain motions led to improvements of this method, as demonstrated in selected applications of single-particle cryo-EM. In the present work, this analysis is extended to investigate the properties of manifolds constructed by embedding data from synthetic models represented by atomic coordinates in motion, or three-dimensional density maps from biophysical experiments other than single-particle cryo-EM, with extensions to cryo-electron tomography and single-particle imaging with an X-ray free-electron laser. Our theoretical analysis revealed interesting relationships between all these manifolds, which can be exploited in future work.

where $g = \det(g_{ij})$ and $g_{ij}$ are the components of the metric tensor. 5,6 Specifically, the eigenfunctions of the LBO, $\nabla^2 \psi = \lambda \psi$, form a complete basis in the function space $L^2(\Omega)$ of measurable and square-integrable functions on the manifold Ω. 7 For a bounded manifold, the eigenfunctions must further satisfy boundary conditions; for example, DM requires the Neumann boundary conditions, 1 such that the normal derivatives on the boundaries vanish. Therefore, the eigenfunctions depend also on the boundary of Ω.
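As a minimal sketch of how the Neumann condition shapes the eigenfunctions, consider the simplest bounded manifold, an interval sampled at n equally spaced points. This setup is an illustrative assumption and not a data set from this work; the function names and the grid size are hypothetical. With a vanishing normal derivative at both ends, the discrete second-difference operator has cosine eigenvectors, which the following check confirms numerically.

```python
import math

def neumann_laplacian_apply(v):
    """Apply the 1D second-difference operator with reflecting (Neumann) ends."""
    n = len(v)
    out = []
    for i in range(n):
        # Mirror ghost points enforce a zero normal derivative at each end.
        left = v[i - 1] if i > 0 else v[i]
        right = v[i + 1] if i < n - 1 else v[i]
        out.append(left - 2.0 * v[i] + right)
    return out

def cosine_mode(k, n):
    """Candidate eigenvector: cosine sampled at the midpoints of n cells."""
    return [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]

n = 50  # hypothetical number of sample points
residuals = []
for k in range(4):
    v = cosine_mode(k, n)
    lam = 2.0 * math.cos(math.pi * k / n) - 2.0  # discrete eigenvalue, <= 0
    lv = neumann_laplacian_apply(v)
    # Largest deviation from the eigenvalue equation L v = lam v.
    residuals.append(max(abs(a - lam * b) for a, b in zip(lv, v)))
```

The residuals stay at the level of floating-point round-off. Replacing the mirror rule at the end points with a different condition (for example, a Dirichlet condition) would change the eigenvectors, which is the sense in which the eigenfunctions depend on the boundary.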
It is well understood that the eigenfunctions of the LBO on Ω carry valuable information about the underlying intrinsic geometry and are thus important for understanding many systems. For compact manifolds with a boundary, as an example, the eigenfunctions are the modes of vibration of a 1D string or a 2D membrane. For compact manifolds without a boundary (i.e., closed manifolds), the well-known spherical harmonics are the eigenfunctions of the LBO on the spherical surface. In the field of structural biology, the eigenfunctions of the LBO on SO(3), which are the Wigner-D functions, have been used for retrieving the unknown orientations of single-particle X-ray and cryo-EM snapshots. 8,9 In general, the eigenfunctions of the LBO on different manifolds are fundamental to mathematics and the sciences, and describe a wide diversity of seemingly disparate phenomena, reflecting the so-called "underlying unity of nature", from quantum mechanics to gravitational fields. 10
Principal Component Analysis. For the PCA approach, 11 instead of defining the Gaussian kernel as previously used in DM for the Markov transition matrix, a matrix $X$ of dimension $N \times D$ is formed, where $N$ is the number of elements in the dataset and $D$ is the number of components (e.g., number of pixels when dealing with images) describing each element. Additionally, $X$ is normalized by removing the mean of all images from each image. Finally, an eigendecomposition of the $D \times D$ matrix $X^T X$ yields a set of orthogonal eigenvectors, the principal components (PC), together with corresponding eigenvalues.
To note an important commonality between PCA and DM, the matrix subjected to eigendecomposition in PCA is symmetric and positive semi-definite (i.e., all eigenvalues are non-negative), 12 which is also the case for the Markov transition matrix used in the DM method. A detailed comparison of results for PCA and DM for pristine PD datasets is provided in our companion article, where we further study the sensitivity of PCA and DM to experimental perturbations such as SNR and CTF. 13
Electronic Supplementary Material (ESI) for Digital Discovery. This journal is © The Royal Society of Chemistry 2023
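The PCA steps described above (mean removal, formation of a symmetric matrix, eigendecomposition) can be sketched on a toy example. Everything in the block is an assumption made for illustration: the four two-pixel "images", the helper names, and the use of power iteration in place of a full eigendecomposition. The toy matrix has eigenvalues 25 and 0, so it is positive semi-definite without being positive definite.

```python
import math

def center_rows(data):
    """Subtract the mean row (the 'mean image') from every row."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - mean[j] for j in range(d)] for row in data]

def gram_matrix(x):
    """Return the D x D matrix X^T X for an N x D list-of-lists X."""
    n, d = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) for j in range(d)]
            for i in range(d)]

def matvec(a, v):
    """Multiply matrix a by vector v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def leading_component(a, iterations=200):
    """Power iteration for the largest eigenvalue and its unit eigenvector.

    The start vector must not be orthogonal to the leading eigenvector.
    """
    v = [1.0] + [0.0] * (len(a) - 1)
    for _ in range(iterations):
        w = matvec(a, v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    value = sum(vi * wi for vi, wi in zip(v, matvec(a, v)))  # Rayleigh quotient
    return value, v

# Hypothetical toy data: 4 "images" of 2 "pixels" each, lying on one line.
images = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
x = center_rows(images)
c = gram_matrix(x)                       # symmetric by construction
leading_value, leading_vector = leading_component(c)
null_image = matvec(c, [2.0, -1.0])      # direction orthogonal to the line
```

Here the leading principal component points along the line on which the data lie, while the orthogonal direction is mapped to zero, i.e., it carries a zero eigenvalue.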

B. Additional properties of PD manifolds
Two orthographic views of 3D models in the directions of two PDs are shown in Figure 13-A and 13-B, each composed of 20 overlaid 3D volumes from $\mathrm{CM}_2$. The 2D distances (in units of pixels) were measured between the peripheral ends of the rotated subunit in each pair of consecutive states (as seen in the red and blue encircled regions). In Figure 13-C, the mean 2D distances between consecutive states in each region (i.e., the red and blue regions, respectively) are plotted with error bars representing standard deviation, along with a linear regression. Although the interval between successive 3D states is constant, the apparent distances in projection can vary strongly with the viewing direction. We note that the Euclidean distance matrix calculated in the DM method is less intuitive than the distances in the current demonstration, since it instead records the changes on a pixel-by-pixel basis for the entire image.
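The foreshortening effect can be illustrated with a minimal geometric sketch. It assumes a single point (the tip of a rotating subunit) moving on a circular arc in 20 equal angular steps; the radius, the 60° range, and the two viewing directions are hypothetical choices and are not taken from the models used here.

```python
import math

def tip_position(theta, radius=10.0):
    """3D position of the subunit tip, rotating about the z axis."""
    return (radius * math.cos(theta), radius * math.sin(theta), 0.0)

def project(point, drop_axis):
    """Orthographic projection: discard the coordinate along the view axis."""
    return tuple(c for i, c in enumerate(point) if i != drop_axis)

def consecutive_distances(points):
    """Euclidean distances between successive projected positions."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]

n_states = 20  # same number of states as overlaid in Figure 13
angles = [math.radians(60.0) * i / (n_states - 1) for i in range(n_states)]
tips = [tip_position(t) for t in angles]

# View along the rotation axis (z): the full motion is visible.
top_view = consecutive_distances([project(p, 2) for p in tips])
# View along y, within the plane of rotation: the motion is foreshortened.
side_view = consecutive_distances([project(p, 1) for p in tips])
```

In the view along the rotation axis, all consecutive distances are equal, whereas in the in-plane view they grow from nearly zero as the rotation proceeds and never exceed the unforeshortened value.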

Fig. 13
Example for emergence of PD disparity due to foreshortened distances when taking 2D projections of 3D EDMs. To note, pixels were only tracked for demonstration purposes in the current figure, which is not a prerequisite for our unsupervised machine learning approach.

Seitz, Frank & Schwander | Beyond ManifoldEM

In Figure 14, a presentation similar to Figure 11 is shown for the remaining four PDs. Here, 2D subspaces requiring eigenvector rotations (e.g., both parabolas in b) and presenting subtle boundary problems (e.g., the inward curling of the point-cloud trajectory in {Ψ ) × Ψ * } of d) can also be seen. Note that for the PD in a, due to PD disparity, the hierarchy of CM information is actually reversed from that seen in the other four PDs. Here, the $\mathrm{CM}_2$ Chebyshev polynomials are instead present along $\{\Psi_1 \times \Psi_i\}$ combinations (in the first row), while the $\mathrm{CM}_1$ Chebyshev polynomials are present along $\{\Psi_2 \times \Psi_j\}$ combinations (in the second row).

Fig. 14
Collection of 2D subspaces from leading eigenfunctions for the remaining four PDs, as was similarly presented for PD1 in Figure 11.

We further investigated the 10 × 10 × 10 = 1000 states making up $\mathrm{SS}_3$ (chosen for ease of computation to contain only half as many states along each degree of freedom as compared to $\mathrm{SS}_1$ and $\mathrm{SS}_2$). For each conformational motion present in a given PD data set (this time for $\mathrm{CM}_1$, $\mathrm{CM}_2$ and $\mathrm{CM}_3$), a set of unique Lissajous curves was again found spanning specific 2D subspaces of the embedded manifold, with the Chebyshev subset describing the corresponding CM along a trajectory in the 2D subspace explicitly. As an example, Figure 15 shows the set of 2D subspaces where these modes exist for PD + . To note, due to the increased complexity of $\mathrm{SS}_3$, these patterns were much more interspersed throughout the embedding, but still followed a similarly consistent ordering. In addition, due to the relatively small range of motion exhibited by the third conformational domain (as seen from these PDs and as designed in the ground-truth structures), all $\mathrm{CM}_3$ modes were found in higher-order eigenvectors; e.g., Ψ + and higher for these five PDs. As the patterns identified in $\mathrm{SS}_3$ were similar to those in the previous accounts, the remainder of our study focuses on mapping data sets generated for $\mathrm{SS}_2$.
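The link between cosine eigenfunctions and Chebyshev polynomials can be made concrete with a short sketch. It assumes idealized eigenfunctions of the form cos(kτ) for a single conformational coordinate τ on an interval (the Neumann eigenfunctions of an interval), which is an assumption for illustration and not the embedded data; the names and the number of states are hypothetical. Because cos(kτ) = T_k(cos τ), plotting the k-th mode against the first traces the Chebyshev polynomial T_k, which is a parabola for k = 2.

```python
import math

def chebyshev(k, x):
    """Evaluate the Chebyshev polynomial T_k(x) by the three-term recurrence."""
    t_prev, t_curr = 1.0, x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

n_states = 20  # hypothetical number of states along one conformational motion
tau = [math.pi * (i + 0.5) / n_states for i in range(n_states)]

# Idealized eigenfunctions sampled at each state: psi_k = cos(k * tau).
psi = {k: [math.cos(k * t) for t in tau] for k in (1, 2, 3)}

# Points (psi_1, psi_k) should lie on the curve y = T_k(x).
max_error = max(
    abs(psi[k][i] - chebyshev(k, psi[1][i]))
    for k in (2, 3)
    for i in range(n_states)
)
```

Pairs of modes that do not include the first one, such as (psi[2], psi[3]), trace more general Lissajous curves rather than single-valued polynomial graphs.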

C. Example of non-trivial boundary conditions due to steric hindrance
The initial 20 × 20 rectangular state space is displayed in Figure 16-A, where red boxes indicate states that were removed to form a grid with octagonal boundaries. The schematic in Figure 16-B provides some context for the possibility of a non-rectangular state space, which can be envisioned as a top-down view of (i) a large domain that opens and closes, and (ii) a small domain that translates left and right. Naturally, due to steric hindrance, while the larger domain is in a closed or half-closed state, the smaller domain is impeded from accessing a subset of its possible states, and vice versa. The eigenbasis obtained after application of a set of high-dimensional rotations 13 (of dimension 15) is shown in Figure 16