Vibrational spectroscopic profiling of biomolecular interactions between oak powdery mildew and oak leaves

Oak powdery mildew, caused by the biotrophic fungus Erysiphe alphitoides, is a prevalent disease affecting oak trees such as English oak (Quercus robur). While mature oak populations are generally less susceptible to this disease, it can endanger young oak seedlings and new leaves on mature trees. Although disruptions of photosynthate and carbohydrate translocation have been observed, methods for accurately detecting and understanding the specific biomolecular interactions between the fungus and the leaves of oak trees are currently lacking. Herein, via hybrid Raman spectroscopy combined with an advanced artificial neural network algorithm, the underpinning biomolecular interactions between biological soft matter, i.e., Quercus robur leaves and Erysiphe alphitoides, are investigated and profiled, generating a spectral library and shedding light on the changes induced by fungal infection and the tree's defence response. The adaxial surfaces of oak leaves are categorised by the presence or absence of Erysiphe alphitoides mildew, with infected leaf tissues further divided into those covered and those not covered by mildew, yielding three disease classes: healthy controls, non-mildew covered and mildew-covered. By analysing spectral changes between each disease category per tissue type, we identified important biomolecular interactions including disruption of chlorophyll in the non-vein and venule tissues and, in lateral vein and mid-vein tissues, pathogen-induced degradation of cellulose and pectin and tree-initiated lignification of cell walls in response, amongst others. Our computational algorithm identified the underlying biomolecular differences between classes and enabled rapid classification of disease with accuracies of 69.6% for non-vein, 73.5% for venule, 82.1% for lateral vein and 85.6% for mid-vein tissues.
Interfacial wetting differences between non-mildew covered and mildew-covered tissue were further analysed on the surfaces of non-vein and venule tissue. The overall results demonstrated the ability of Raman spectroscopy, combined with advanced AI, to act as a powerful and specific tool to probe foliar interactions between forest pathogens and host trees with the simultaneous potential to probe and catalogue molecular interactions between biological soft matter, paving the way for exploring similar relations in broader forest tree-pathogen systems.

Unlike conventional self-organising map (SOM) algorithms, SKiNET provides supervised learning combined with 10-fold cross validation, which helps to provide consistently accurate classification of Raman spectra into different disease classes. It simultaneously provides self-organising map discriminant indices (SOMDIs), which help the user identify the class-defining peaks, i.e., those most important for classification and, ultimately, for the separation between disease classes or the lack thereof.
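The 10-fold cross validation mentioned above can be illustrated with a short sketch. This is not the SKiNET code: the stratified round-robin assignment, the function name `stratified_k_fold` and the 24 spectra per class are assumptions made purely for illustration.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread over k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal indices round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Hypothetical data set: 24 spectra per disease class (assumed numbers).
labels = (["healthy"] * 24
          + ["non-mildew covered"] * 24
          + ["mildew-covered"] * 24)
splits = list(stratified_k_fold(labels, k=10))
for fold, (train, test) in enumerate(splits):
    print(f"fold {fold}: {len(train)} training spectra, {len(test)} test spectra")
```

Each spectrum appears in exactly one test fold, so every spectrum is classified once by a model that did not see it during training.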

Details of SKiNET
SOMs are inspired by the visual cortex of the brain and are designed to learn from data autonomously. Artificial neural networks (ANNs) operate by iteratively adjusting the weights and biases of neurons over successive epochs to achieve a desired objective; this process is known as training the ANN algorithm. In the case of SOMs, this training process is designed to allow inputs with similar characteristics to activate neighbouring neurons.
In this study, the inputs for the SKiNET algorithm are normalised Raman spectra. Neurons in the SKiNET algorithm possess weight vectors with a dimensionality equal to the number of variables present in the Raman spectrum. Throughout the training process, these weight vectors are varied by examining the differences between neuron weights and class weight vectors until they closely match the input training spectra, such that each neuron will only activate on a spectrum from a single class. This is shown visually as a cluster which can be identified as belonging to a given sample classification.
This training process makes it possible to detect which class-defining characteristics activate specific neurons, by inspecting the weights across all neurons and isolating those which belong to a specific class. When unseen testing data are input into the algorithm, any class-defining characteristics present will activate the neurons which match those characteristics, thereby sorting that data into a class (here, a disease classification).
For the SKiNET algorithm, the training parameters are the grid size, the learning rate and the number of epochs over which the network is trained. The outcome of this SOM process is a visual representation of the separability of the testing spectra into classes, ultimately showing biomolecular similarities or dissimilarities between the spectra.
Electronic Supplementary Material (ESI) for Soft Matter. This journal is © The Royal Society of Chemistry 2024

These class-defining characteristics are shown as peaks on a SOMDI plot against wavenumber (in cm⁻¹) and relate to those peaks in the testing or training spectra which contribute most to the activation of class label-specific neurons and, ultimately, to the SOM clustering.
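The role of the three training parameters (grid size, learning rate and number of epochs) can be seen in a minimal sketch of SOM training. This is a plain, unsupervised Kohonen-style update and not the SKiNET implementation: it assumes a rectangular rather than hexagonal grid, a linearly decaying learning rate and a Gaussian neighbourhood. The names `train_som` and `best_matching_unit` and the four-point toy "spectra" are hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position of the neuron whose weights are closest to x."""
    return min(weights,
               key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)))

def train_som(spectra, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(spectra[0])  # one weight per spectral variable
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(spectra, len(spectra)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # neighbours move less
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])
    return weights

# Toy normalised "spectra": class A peaks at index 1, class B at index 3.
class_a = [[0.1, 0.9, 0.1, 0.1], [0.2, 1.0, 0.1, 0.0], [0.1, 0.8, 0.2, 0.1]]
class_b = [[0.1, 0.1, 0.1, 0.9], [0.0, 0.2, 0.1, 1.0], [0.2, 0.1, 0.1, 0.8]]
weights = train_som(class_a + class_b)
print("class A neurons:", sorted({best_matching_unit(weights, x) for x in class_a}))
print("class B neurons:", sorted({best_matching_unit(weights, x) for x in class_b}))
```

With well-separated inputs such as these, the two classes would typically activate different regions of the grid, which is the behaviour the SOM plots visualise.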

Specifically, in the 'Vibrational Spectroscopic Profiling of Biomolecular Interactions Between Oak Powdery Mildew and Oak Leaves' study, SKiNET was applied to the Raman spectra (hybrid Raman spectroscopy) to investigate potential biomolecular similarity or dissimilarity between healthy, non-mildew covered and mildew-covered tissues of Erysiphe alphitoides-infected Quercus robur leaves, whilst simultaneously identifying which characteristic peaks, if any, were dominant in achieving this separation.
SKiNET enabled this by providing SOMs composed of a hexagonal grid, where each hexagon represents a given neuron in the model. Each neuron was coloured according to the disease classification it activates, as determined by the class-defining characteristics of the input training spectra. In this study, the training spectra comprised 80% of the total 70-75 spectra from tissues which were healthy, non-mildew covered or mildew-covered, as assigned using a visual and microscopic survey of the leaf surface for the presence or absence of mildew. White neurons in the hexagonal grid are neurons that did not have a majority class or did not activate with any of the class-defining characteristics. Mixed colour neurons are neurons which had multiple class-defining characteristics and so could not be given a single colour for a single disease classification. This results in three types of hexagon in the SOM: a coloured hexagon belonging to a single class, a mixed colour hexagon belonging to more than one class, or a white hexagon belonging to no class. Thus, by examining contiguous blocks of hexagons of a single colour, along with the number of white hexagons and the number and composition of any mixed colour hexagons, it is possible to rapidly ascertain the degree of biomolecular similarity or dissimilarity between spectra of different disease classifications.
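The three hexagon types can be expressed as a simple labelling rule. The sketch below is a simplification rather than SKiNET's own rule: it assumes a neuron is single-coloured only when every training spectrum that activated it shares one class, and it does not model the "no majority class" case separately. The function names and the hit lists are hypothetical.

```python
from collections import Counter

def label_neuron(hit_labels):
    """Label one neuron from the classes of the training spectra that activated it."""
    classes = set(hit_labels)
    if not classes:
        return "white"             # no spectrum activated this neuron
    if len(classes) == 1:
        return next(iter(classes)) # single colour: one disease class only
    return "mixed"                 # activated by more than one class

def summarise_map(hits_per_neuron):
    """Count how many neurons fall into each label across the whole grid."""
    return Counter(label_neuron(hits) for hits in hits_per_neuron.values())

# Hypothetical 2 x 2 grid with the class labels that activated each neuron.
hits = {
    (0, 0): ["healthy", "healthy"],
    (0, 1): [],
    (1, 0): ["mildew-covered"],
    (1, 1): ["healthy", "mildew-covered"],
}
summary = summarise_map(hits)
print(summary)
```

A map dominated by single-colour labels suggests well-separated classes, whereas many mixed labels suggest biomolecular similarity between them.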
Finally, the inherent SOMDI component of SKiNET, plotted as an index against wavenumber (cm⁻¹), allows assessment of the peaks that matter most for separating spectra of different classifications; a higher SOMDI value indicates a greater contribution to the class separation and classification in the SOM.

Figure S1. Illustration of the workflow of the data analysis pipeline via SKiNET. Raman spectra measured from non-vein, venule, lateral vein or mid-vein tissue of a Quercus robur leaf (a). These spectra are grouped according to disease

Figure S3. Box and whisker plot of the range of intensities from spectroscopic measurements of non-vein tissue from healthy, non-mildew covered and mildew-covered Quercus robur leaves. Brackets indicate significantly different pairs of ranges compared via a Mann-Whitney U test (***p < 0.005).
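For readers unfamiliar with the Mann-Whitney U test used for these comparisons, the following is a minimal sketch of how the statistic and a two-sided p-value can be computed. It assumes the normal approximation with tie and continuity corrections, which may be inaccurate for very small samples and is not necessarily the variant used in the study; the function name and the intensity values are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                     # size of this group of tied values
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, u2, 1.0            # all values tied: no evidence of a shift
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p = math.erfc(z / math.sqrt(2.0)) # two-sided tail of the standard normal
    return u1, u2, p

# Hypothetical peak intensities for two disease classes.
healthy = [0.82, 0.79, 0.91, 0.88, 0.85, 0.90]
mildew_covered = [0.61, 0.70, 0.66, 0.58, 0.73, 0.64]
u_h, u_m, p_value = mann_whitney_u(healthy, mildew_covered)
print(f"U = {min(u_h, u_m):.1f}, p = {p_value:.4f}")
```

The test compares ranks rather than raw values, so it makes no assumption that the intensities are normally distributed.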

Figure S4. Chlorophyll concentrations (µg/mL) from non-vein and venule containing tissues of Quercus robur leaves following extraction with 80% DMSO. Significantly different pairs of concentrations compared via a Mann-Whitney U test (*p < 0.05) are highlighted.

Figure S5.

Table S5.1: Summary of reproducibility coefficients (RCs) determined from Fig. S5, calculated via $\mathrm{RC} = \sigma \times 2.77 \times 100$, where $\sigma$ is the standard deviation and 2.77 is chosen for a 95% level of confidence. Confidence interval limits (CILs) were calculated as $\mathrm{CIL} = \bar{x} \pm 1.96\,\sigma$, where $\bar{x}$ is the mean ratio and $\sigma$ is the standard deviation.
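As a worked illustration of the RC definition in the caption above, the sketch below computes an RC and confidence interval limits for a set of ratios. Several points are assumptions rather than details from the source: the sample (n - 1) standard deviation is used, the interval multiplier is taken as 1.96 (the usual value for a 95% level), and the function names and ratio values are hypothetical.

```python
import statistics

def reproducibility_coefficient(ratios):
    """RC = standard deviation x 2.77 x 100 (sample standard deviation assumed)."""
    return statistics.stdev(ratios) * 2.77 * 100

def confidence_interval_limits(ratios, z=1.96):
    """Return (lower, upper) = mean -/+ z * standard deviation; z = 1.96 is assumed."""
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)
    return mean - z * sd, mean + z * sd

# Hypothetical peak-intensity ratios from repeated measurements.
ratios = [0.9, 1.0, 1.1]
rc = reproducibility_coefficient(ratios)
lower, upper = confidence_interval_limits(ratios)
print(f"RC = {rc:.1f}, CIL = [{lower:.3f}, {upper:.3f}]")
```

For these ratios the standard deviation is 0.1, so the RC evaluates to 0.1 x 2.77 x 100 = 27.7.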