Rebecca Betts and Ingo Dierking*
Department of Physics and Astronomy, University of Manchester, Oxford Road, Manchester M13 9PL, UK. E-mail: ingo.dierking@manchester.ac.uk
First published on 9th May 2024
Machine learning is becoming a valuable tool in the characterisation and property prediction of liquid crystals. It is thus worthwhile to be aware of the possibilities, but also the limitations, of current machine learning algorithms. In this study we investigated the phase sequence isotropic – fluid smectic A – hexatic smectic B – soft crystal CrE – crystalline. This is a sequence of transitions between orthogonal phases, which are expected to be difficult to distinguish because of only minute changes in order. As expected, strong first-order transitions such as the liquid to liquid crystal transition and crystallisation can be distinguished with high accuracy. It is shown that the hexatic SmB to soft crystal CrE transition, which represents the transition from short- to long-range order, is also clearly characterised. Limitations of convolutional neural networks are observed for the fluid SmA to hexatic SmB transition, where both phases exhibit short-range order.
In recent years, machine learning has been shown to be quite successful in many areas of science. In physics,5 it has been applied particularly in the fields of particle physics and cosmology,6 but also in astronomy7–9 and photonics.10 Another wide field of applicability of machine learning lies in various aspects of material science.11 In chemistry, machine learning algorithms are employed in the computer-aided planning of synthetic work12 and the discovery of novel drugs.13 In biology the techniques are used in the development of biosensors,14 and particular success of machine learning is found in various medical imaging techniques15–17 and in image interpretation in cancer research.18–20
With the successful implementation of machine learning in solid state physics and material science, it is not surprising that efforts have also been expanded into the fields of liquid crystals (LCs) and soft matter in general.21–23 Naturally, the prediction of liquid crystalline behaviour and phase transitions, particularly that from the isotropic liquid to the nematic liquid crystal phase, was of paramount importance at the beginning of the use of machine learning methodologies in liquid crystals.24–27 At this stage investigations were often carried out on thermotropic LCs with computer-generated Schlieren textures, but sometimes also with experimental textures. This work is to a large extent connected to the identification of topological defects in experimental28 and simulated nematic textures,29 and is thus related to object recognition.30 Closely related to this is an investigation of the machine learning detection of bubbles and islands in free-standing smectic films,31 and work on active nematics relating to hydrodynamics.32
Further machine learning studies were connected to theoretical predictions of the molecular ordering of binary mixtures of molecules with different lengths,33 the self-assembled nanostructures of lyotropic liquid crystals,34 and the local structure of liquid crystalline polymers.35 An aspect which is now gaining momentum is the use of machine learning in the prediction of physical properties. This has, for example, been demonstrated for the dielectric properties of a nematic LC through a comparison of experimental and predicted values.36 Another example is the prediction of elastic constants in relation to experimental and simulated curves.37 Melting temperatures have also been shown to be predictable,38 as has structural colour, i.e. selective reflection, in formulation space,39 and the minimisation of threshold voltages in ZnO-doped liquid crystals.40
In terms of applications, where machine learning is used as a methodology for readout, one needs to mention various sensors, which were first introduced by the group of Abbott.41,42 The readout mechanism is based on texture transitions that occur when a liquid crystal responds to molecules by changing its orientation from homeotropic to planar, or vice versa. The concept was applied to biochemical sensors detecting endotoxins from different bacterial species43 or SARS-CoV-2.44 Similarly, gases45 and gas mixtures46 can be detected. A recent review of liquid crystal based biochemical sensors can be found in ref. 47.
In recent years we have demonstrated that not only can binary classification tasks between two individual liquid crystal phases be carried out with very high accuracy close to 100%,48 but that more complicated multiphase tasks, such as distinguishing between isotropic, orientationally ordered, fluid smectic, hexatic smectic and soft crystal phases, can also be achieved.48 This includes the characterisation of phase transitions49 and also the distinction between different smectic subphases, namely ferroelectric, ferrielectric and antiferroelectric phases, by their textures.50
Generally, each convolutional layer is followed by a pooling layer which reduces the height and width (but not the number of channels) of the input. Again, a kernel is passed over all the data but, rather than having trainable parameters, it outputs either the average (average pooling) or maximum value (max pooling) of the area of the grid it covers.
Reducing the number of parameters in the model can prevent the network from having the capacity to overfit. However, if reduced too much, this can lead to underfitting (meaning the model is inaccurate when evaluated on both the training and unseen data). Dropout regularization involves randomly removing nodes from the network with a given probability (the dropout rate), to prevent the network from relying on any single input or feature;52 the model will then require more epochs to train. This method is generally applied to fully connected layers, but not to convolutional layers. Data augmentation is a method of artificially increasing the size of the training set. For images, this can involve rotations, translations, shears, flips, or changes to brightness or contrast. The final method used is batch normalization, which prevents overfitting53 as well as accelerating training.54
The sequential model consists of several convolutional layers, each implementing the convolution operation described above, with a kernel size of 3 × 3 and a max pooling operation with kernel size 2 × 2. These are followed by global average pooling (where the pool size is equal to the input size) followed by several dense (fully connected) layers. Each convolutional layer has batch normalization applied and the number of channels is doubled at each layer. Dropout is applied after each dense layer and the number of nodes is halved at each layer. L2 regularization with λ = 0.001 is applied to all convolutional and dense layers. An example sequential CNN architecture is shown in Fig. 1. In each case, the number of channels in the final convolutional layer is equal to the number of nodes in the first dense layer.
Fig. 1 Example sequential CNN architecture, including layer output dimensions. “CONV” represents a 3 × 3 convolutional layer and “MAX POOL” and “GLOBAL AVG POOL” maximum and global average pooling.
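As an illustration, a minimal Keras sketch of such a sequential architecture is given below. The input size, default layer counts and the ordering of activation and normalization are assumptions made for the sake of a concrete example, not the exact settings used for every classifier reported here.

```python
# Minimal sketch of the sequential CNN described above (TensorFlow/Keras).
# Input size, default layer counts and activation placement are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_sequential_cnn(n_conv=4, start_channels=8, n_dense=2,
                         n_classes=2, dropout_rate=0.5, l2=1e-3):
    reg = regularizers.l2(l2)
    model = models.Sequential()
    model.add(layers.Input(shape=(256, 256, 1)))      # greyscale 256 x 256 images
    channels = start_channels
    for _ in range(n_conv):
        # 3 x 3 convolution with batch normalization and 2 x 2 max pooling;
        # the number of channels doubles from one block to the next
        model.add(layers.Conv2D(channels, 3, padding='same', activation='relu',
                                kernel_regularizer=reg))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(2))
        channels *= 2
    # global average pooling: the pool size equals the feature-map size
    model.add(layers.GlobalAveragePooling2D())
    nodes = channels // 2   # equals the channel count of the last conv layer
    for _ in range(n_dense):
        model.add(layers.Dense(nodes, activation='relu', kernel_regularizer=reg))
        model.add(layers.Dropout(dropout_rate))        # dropout after each dense layer
        nodes //= 2                                    # node count halves per layer
    model.add(layers.Dense(n_classes, activation='softmax'))
    return model
```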
The inception model was introduced to decrease the computational cost of a CNN by running several convolutions in parallel. The InceptionV3 network has 2.39 × 10⁷ trainable parameters, making it too large for the dataset used here and hence likely to overfit.55 Therefore, a simplified version is used, consisting of the stem (the first section of the network), shown in Fig. 2(a), and a number of inception modules, shown in Fig. 2(b). Each inception module consists of an arrangement of parallel convolution and max pooling layers, using kernel sizes of 1 × 1, 3 × 3 and 5 × 5. These are followed by a global average pooling layer and then several dense layers, with the number of nodes halving at each one (as in the sequential model). Again, batch normalization is applied after each convolutional layer and dropout is applied after each dense layer. L2 regularization with λ = 0.001 is applied to all convolutional and dense layers.
Fig. 2 Schematics of an example (a) inception stem and (b) module showing the kernel sizes of each convolutional and pooling layer.
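A sketch of a single inception module, written with the Keras functional API, is shown below. The branch channel counts and the use of 1 × 1 bottlenecks in front of the larger kernels are assumptions made for illustration; the exact arrangement used in this work is that of Fig. 2(b).

```python
# Sketch of one inception module with parallel 1x1, 3x3 and 5x5 convolutions
# and a max-pooling branch (Keras functional API; channel counts are examples).
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def conv_bn(x, channels, kernel_size, l2=1e-3):
    """Convolution followed by batch normalization, as used throughout."""
    x = layers.Conv2D(channels, kernel_size, padding='same', activation='relu',
                      kernel_regularizer=regularizers.l2(l2))(x)
    return layers.BatchNormalization()(x)

def inception_module(x, channels=16):
    b1 = conv_bn(x, channels, 1)                          # 1x1 branch
    b3 = conv_bn(conv_bn(x, channels, 1), channels, 3)    # 1x1 bottleneck, then 3x3
    b5 = conv_bn(conv_bn(x, channels, 1), channels, 5)    # 1x1 bottleneck, then 5x5
    bp = layers.MaxPooling2D(3, strides=1, padding='same')(x)   # pooling branch
    bp = conv_bn(bp, channels, 1)
    # the parallel branches are concatenated along the channel axis
    return layers.Concatenate()([b1, b3, b5, bp])
```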
In order to find the optimum hyperparameter values, the number of layers and channels was first set to low values and then increased incrementally until overfitting was seen (diagnosed by a large gap between the training and validation curves). Batch size and learning rate were varied to find the combination giving the highest validation and test accuracy. Each model was trained and tested three times to find the mean and standard deviation of the test set accuracy. The uncertainty due to the finite size of the test dataset is negligible.
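This repeated training and evaluation can be summarised by a short sketch like the following; `build_model`, the dataset objects and the grid of trial values are hypothetical placeholders standing in for the models and data described in this work.

```python
# Sketch of the hyperparameter search with repeated training; build_model,
# train_ds, val_ds and test_ds are hypothetical placeholders.
import itertools
import numpy as np
import tensorflow as tf

def evaluate_hyperparameters(build_model, train_ds, val_ds, test_ds,
                             batch_sizes=(16, 32), learning_rates=(1e-4, 5e-5),
                             epochs=30, repeats=3):
    results = {}
    for batch_size, lr in itertools.product(batch_sizes, learning_rates):
        accuracies = []
        for _ in range(repeats):   # three runs give mean and standard deviation
            model = build_model()  # e.g. one of the CNNs sketched above
            model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])
            model.fit(train_ds.batch(batch_size),
                      validation_data=val_ds.batch(batch_size),
                      epochs=epochs, verbose=0)
            _, test_acc = model.evaluate(test_ds.batch(batch_size), verbose=0)
            accuracies.append(test_acc)
        results[(batch_size, lr)] = (np.mean(accuracies), np.std(accuracies))
    return results
```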
The material was observed in self-constructed sandwich cells of thickness d = 10 μm made from glass substrates which were cleaned with acetone but otherwise left untreated, i.e. without ITO or alignment layers applied. The cells were placed in a hot stage (Linkam LTSE350) with temperature control (Linkam TP94) providing relative temperatures to 0.1 K accuracy. The cell was filled by capillary action in the isotropic phase, and texture changes during the phase transitions were followed via video recording in a polarizing microscope (Leica Optipol) between crossed polarisers, at a frame rate of 10 fps and a resolution of 2048 × 1088 pixels (UI-3360CP-C-HQ, uEye Gigabit Ethernet camera).
The thermotropic achiral liquid crystal phases investigated can be placed broadly into three categories, the fluid smectic SmA phase, the hexatic smectic SmB phase and the soft crystal CrE (SmE) phase (Fig. 3(a)). These are framed at elevated temperatures by the disordered isotropic liquid and at low temperatures by the three dimensionally ordered crystal (Fig. 3(b)).
All of the liquid crystal phases belong to the orthogonal type, thus with the molecular long axis on average being parallel to the smectic layer normal. The fluid SmA as well as the hexatic SmB phase exhibit fan-shaped textures which appear very similar to each other and can hardly be distinguished in polarizing microscopy. The soft crystal SmE phase in contrast exhibits a typical striation across the fans, while in the crystalline phase cracks appear in the structure.
Individual images were frame-grabbed from the videos at each phase transition using the VLC media player. Each video was taken over a known temperature range, across a transition where the textures clearly changed, allowing the images to be labelled with their phase according to whether they occurred before or after the transition. Videos of 30 heating and 30 cooling cycles were taken, each over the same temperature range at the same rate of temperature change. Training, validation and testing datasets were created with an approximate ratio of 70:15:15, with the validation dataset being used to monitor underfitting and overfitting during training.
Data leakage is a problem whereby the accuracy of the model is overestimated due to overly similar data in the testing and training sets. In order to prevent this, the videos (rather than the individual images) were split between the datasets, so that images from the same video never appeared in more than one dataset. Each image initially had a size of 2048 × 1088 pixels. These were split into six images each, cropped and scaled to a resolution of 256 × 256 pixels, and converted to greyscale, such that each pixel had a value between 0 and 1. The dataset was augmented by flipping each image vertically and horizontally. Fig. 4 shows the number of images of each phase in each dataset.
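A minimal sketch of this video-level split and frame preprocessing is given below; the exact tiling of each 2048 × 1088 frame into six regions and all file-handling details are illustrative assumptions (the actual frame grabbing was performed with VLC).

```python
# Sketch of the video-level dataset split and frame preprocessing.
# The exact tiling of each frame into six images is an assumption.
import numpy as np
import tensorflow as tf

def split_videos(video_ids, train_frac=0.7, val_frac=0.15, seed=0):
    """Assign whole videos (not individual frames) to the three datasets,
    so that frames from one video never leak between sets."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(video_ids)
    n_train = int(train_frac * len(ids))
    n_val = int(val_frac * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

def preprocess_frame(frame):
    """Turn one 1088 x 2048 RGB frame into six 256 x 256 greyscale tiles
    with pixel values in [0, 1], plus their vertical and horizontal flips."""
    frame = tf.image.rgb_to_grayscale(frame)
    frame = tf.cast(frame, tf.float32) / 255.0
    tiles = []
    for i in range(2):          # two rows of height 544
        for j in range(3):      # three columns of width 682
            tile = tf.image.crop_to_bounding_box(frame, i * 544, j * 682, 544, 682)
            tiles.append(tf.image.resize(tile, (256, 256)))
    flipped = [tf.image.flip_up_down(t) for t in tiles] + \
              [tf.image.flip_left_right(t) for t in tiles]
    return tiles + flipped
```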
The brightness of each image was adjusted by a random value between −0.2 and +0.2, with this value being added to each pixel value. The contrast of each image was adjusted by a random contrast factor, γ, between −0.2 and 0.2. For a pixel value x in an image with a mean pixel value μ, this adjusts the pixel value to x → (x − μ)γ + μ. Finally, each image was rotated by a random angle between −0.2π and +0.2π rad. Areas outside the region filled by the input were filled by reflecting the image across the boundary. Each augmentation was tested on two models, both of which used a learning rate of 10⁻⁴, a batch size of 16, and a dropout rate of 0.5. The remaining hyperparameters are specified in Table 1.
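One possible way to express these augmentations, assumed here to use the Keras preprocessing layers, is sketched below; note that RandomRotation specifies the rotation range as a fraction of a full turn, so ±0.2π rad corresponds to a factor of 0.1.

```python
# Sketch of the brightness, contrast and rotation augmentations using Keras
# preprocessing layers (an assumed implementation; applied to training data only).
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    # add a random value in [-0.2, 0.2] to each pixel (inputs scaled to [0, 1])
    layers.RandomBrightness(factor=0.2, value_range=(0.0, 1.0)),
    # rescale pixel values about the image mean by a random contrast factor
    layers.RandomContrast(factor=0.2),
    # rotate by a random angle within +-0.2*pi rad (0.1 of a full turn),
    # filling uncovered regions by reflecting the image across its boundary
    layers.RandomRotation(factor=0.1, fill_mode='reflect'),
])

# applied on the fly to each training batch, e.g.
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```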
Hyperparameter | Model 1 | Model 2 |
---|---|---|
Convolutional layers | 4 | 5 |
Starting channels | 8 | 16 |
Dense layers | 2 | 3 |
The test accuracies of each of these models, with each augmentation applied, are displayed in Fig. 5. The rotation augmentation clearly decreased the accuracy. At the same time, the training accuracy reached 96.2% and 97.6% for the two models respectively, showing severe overfitting. This is likely due to the image being reflected into the unfilled regions: as the augmentations are only applied to the training set, the network may then have been unable to generalise the learned features to the unseen images, which contain no such reflections.
Fig. 5 Test accuracies with various augmentations applied to each model specified in Table 1. Error bars represent 95% confidence intervals.
Neither the brightness nor the contrast augmentation produced a significant change in the test accuracy when compared with using no augmentation. Both also resulted in higher uncertainties as well as requiring more epochs to train (25 rather than 20). Therefore, no augmentations (apart from the previously applied flips) were used in any subsequent models. Although these augmentations were only tested on two model architectures, the results are likely generalisable to other models as well, because the augmentations are applied before each batch is trained. It can also be assumed that these results are applicable to the other phase transitions used in this study, owing to the similarity in the structures of the textures (see Fig. 3).
I to SmA | Sequential model | Inception model |
---|---|---|
Convolutional layers | 1 | NA |
Inception modules | NA | 1 |
Starting channels | 16 | 4 |
Dense layers | 2 | 2 |
Batch size | 16 | 16 |
Learning rate | 1 × 10⁻⁴ | 5 × 10⁻⁵
Dropout rate | 0.5 | 0.5 |
Trainable parameters | 650 | 2402 |
Test accuracy | 1 ± 0 | 1 ± 0 |
Both models achieved (100 ± 0)% accuracy. As presented in Fig. 4, there are approximately twice as many images in the SmA dataset as in the Iso dataset, which could have introduced a bias in the network, with Iso images being incorrectly classified as SmA. However, this was clearly not the case, owing to the uniformity of the texture of the isotropic phase. This implies that there were no features of the Iso phase to be learned by the network, so convolutional layers may have been unnecessary. Although both models resulted in the same accuracy and uncertainty, the sequential model required only 650 parameters (in comparison to 2402 for the inception model), suggesting it is the more suitable model for a classification task of this simple type, requiring less time and computing power to train. For applications such as the (bio)sensors introduced above, a sequential CNN will thus be fully sufficient to obtain close to 100% accuracy in the readout.
SmA to SmB | Sequential model | Inception model |
---|---|---|
Convolutional layers | 4 | NA |
Inception modules | NA | 1 |
Starting channels | 8 | 16 |
Dense layers | 2 | 2 |
Batch size | 16 | 16 |
Learning rate | 1 × 10⁻⁴ | 1 × 10⁻⁴
Dropout rate | 0.5 | 0.5 |
Trainable parameters | 30930 | 31298 |
Test accuracy | 0.56 ± 0.07 | 0.6 ± 0.1 |
Neither model achieved an accuracy significantly above 50%, and hence neither was more accurate than randomly assigning each test image to a phase. Both models appear to show a bias towards the SmB phase, with (59 ± 9)% and (48 ± 17)% of SmA images being classified as SmB by the sequential and inception models, respectively (Fig. 7). The SmA dataset contains approximately 400 more images than the SmB dataset, so dataset imbalance is not the cause of the bias. The validation accuracy reached 71% and 68% for the sequential and inception models, respectively, suggesting that there are some meaningful differences between the validation and test datasets which are possibly responsible for the bias. However, the validation accuracy showed significant fluctuations during training, so taking the epoch with the highest validation accuracy would likely not generalise to high accuracy on new, unseen data. Neither model appears suitable for this classification task; however, it is possible that a larger dataset combined with a larger network capacity would be capable of identifying the features of this subtle transition. Alternatively, a different mesogen may produce more visible texture changes during this transition, displaying differing features that these networks are capable of learning.
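The per-phase misclassification rates quoted above are read off confusion matrices such as those in Fig. 7; a brief sketch of how such a matrix can be obtained from a trained model is given below (the scikit-learn helper and the test dataset object are assumptions for illustration, not part of the original workflow description).

```python
# Sketch of computing a row-normalised confusion matrix for the binary
# SmA/SmB classifier (scikit-learn assumed available; test_ds is a batched
# tf.data.Dataset of (images, labels)).
import numpy as np
from sklearn.metrics import confusion_matrix

def normalised_confusion(model, test_ds, class_names=('SmA', 'SmB')):
    y_true, y_pred = [], []
    for images, labels in test_ds:
        probs = model.predict(images, verbose=0)
        y_true.extend(labels.numpy())
        y_pred.extend(np.argmax(probs, axis=1))
    # each row then gives the fraction of images of a true phase assigned
    # to each predicted phase, as plotted in the confusion matrices
    return confusion_matrix(y_true, y_pred, normalize='true'), class_names
```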
At this point it is worthwhile to mention the recent results of Osiecka-Drewniak et al.,57 who studied a very similar transition between two orthogonal phases, fluid SmA to soft crystal SmB (also called CrB). In this case higher accuracies of 80–90% were reported, which can be attributed to the clear differences in texture between the smooth SmA fans and the striated CrB fans. Such striations can often be observed for soft crystal phases, as will be demonstrated below for the transition from the hexatic SmB to the soft crystal SmE (CrE) phase, both of which are orthogonal. Yet the two phases are clearly distinguishable due to the striations, which allow for high identification accuracies, in our case close to 100%.
SmB to SmE | Sequential model | Inception model |
---|---|---|
Convolutional layers | 4 | NA |
Inception modules | NA | 1 |
Starting channels | 16 | 16 |
Dense layers | 3 | 2 |
Batch size | 32 | 16 |
Learning rate | 1 × 10⁻⁴ | 5 × 10⁻⁵
Dropout rate | 0.6 | 0.6 |
Trainable parameters | 108530 | 31298 |
Test accuracy | 0.99 ± 0.01 | 0.99 ± 0.01 |
Both models achieved high accuracies of (99 ± 1)%, so either could appropriately be employed for the identification of transitions into soft crystal phases. However, the sequential model required around three times as many parameters to achieve this accuracy, so the inception model is faster to train (although it required 40 epochs rather than the 30 required for the sequential model). The slight improvement in accuracy achieved by the sequential model (shown in the confusion matrices of Fig. 8) is statistically insignificant, so the inception model is the quicker choice for this particular classification task, while both are very well suited to distinguishing the hexatic from the soft crystal phase.
SmE to Cr | Sequential model | Inception model |
---|---|---|
Convolutional layers | 3 | NA |
Inception modules | NA | 1 |
Starting channels | 16 | 16 |
Dense layers | 3 | 2 |
Batch size | 16 | 16 |
Learning rate | 1 × 10⁻⁴ | 5 × 10⁻⁵
Dropout rate | 0.5 | 0.5 |
Trainable parameters | 30322 | 31298 |
Test accuracy | 0.989 ± 0.001 | 0.99 ± 0.01 |
As anticipated, both models achieved similar accuracies, with the inception model resulting in a significantly higher standard deviation. The sequential model achieved 100% test accuracy on all three training instances for the crystalline phase, and the inception model achieved the same on the SmE phase (Fig. 9). Despite the SmE dataset containing approximately double the number of images compared to the Crystal dataset, there is no evidence of bias in either network. The sequential model required slightly fewer parameters as well as ten fewer epochs to train, yet both architectures are well suited to predict the phases involved.
Multiphase | Sequential model | Inception model |
---|---|---|
Convolutional layers | 6 | NA |
Inception modules | NA | 1 |
Starting channels | 16 | 16 |
Dense layers | 4 | 2 |
Batch size | 16 | 16 |
Learning rate | 1 × 10⁻⁴ | 5 × 10⁻⁵
Dropout rate | 0.5 | 0.5 |
Trainable parameters | 1648206 | 31298 |
Test accuracy | 0.99 ± 0.01 | 0.984 ± 0.006 |
There is no statistically significant difference between the test accuracies of the two models. As expected, both consistently identified the isotropic phase correctly. All other phases were also generally identified correctly, the exceptions being the sequential model mislabelling (5 ± 3)% of the crystal images as smectic and the inception model mislabelling (4 ± 2)% of the soft crystal E images as orthogonal smectic liquid crystal (SmAB). These three phases all show significant similarity, with some shared features, so some confusion is to be expected. Overall, both models achieved high accuracy; however, the sequential model required roughly fifty times as many parameters as the inception model. Therefore, the inception model appears to be better suited to the more complicated, high-capacity task of multiphase classification, also requiring only 50 rather than 100 epochs to train. The confusion matrices for both models are depicted in Fig. 10.
Other transitions, like nematic to SmA, or the fluid orthogonal SmA to the fluid tilted SmC phase, can also be verified with high accuracy.48–51 In general, orthogonal to tilted transitions are identified with high accuracy, as are transitions from liquid crystal to soft crystal, independent of whether orthogonal phases are involved (SmA–CrB,57 SmB–CrE (this work)) or not. In contrast, transitions between orthogonal liquid crystal phases (fluid SmA–hexatic SmB (this work)) or tilted liquid crystal phases (fluid SmC–hexatic SmI48) represent a limitation for conventional machine learning architectures such as sequential CNNs or inception models. This is in part shown in the present study, as can be seen in Fig. 11.
Fig. 11 Test accuracies for all of the presented binary and multiphase classification scenarios. Error bars represent 95% confidence intervals.
A result similar to that depicted in Fig. 11 was obtained for the machine learning test accuracies of a homologous series of materials with predominantly tilted mesophases.48 In summary, transitions between orthogonal and tilted phases, as well as those between liquid crystal and soft crystal phases, can very well be characterized by machine learning. Limitations are found for the characterization of transitions between orthogonal fluid and hexatic phases (SmA–SmB), and between tilted fluid and hexatic phases (SmC–SmI).
In general, the inception models required fewer trainable parameters than the sequential models to achieve the same accuracy, disregarding the isotropic to SmA transition, for which both models could likely have used even simpler, lower-capacity architectures with no convolutional layers at all. This general behaviour is expected, as the 1 × 1 convolutions utilized in the inception module reduce the number of channels, and hence the number of feature maps at each layer, reducing the number of trainable parameters required.
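A quick parameter count illustrates this channel-reduction effect; the channel numbers below are arbitrary examples, not values taken from the networks used in this work.

```python
# Weight count of a 5x5 convolution applied to a 256-channel input producing
# 64 output channels, with and without a 1x1 bottleneck to 32 channels
# (arbitrary example numbers; bias terms ignored).
c_in, c_mid, c_out = 256, 32, 64

direct = 5 * 5 * c_in * c_out                               # 409,600 weights
bottleneck = 1 * 1 * c_in * c_mid + 5 * 5 * c_mid * c_out   # 8,192 + 51,200 = 59,392

print(direct, bottleneck)  # the bottleneck version needs roughly 7x fewer weights
```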
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4sm00295d