Open Access Article
Alexander R. Quinn,a Rebecca Walker,b Naila Tufaha,b John MD Storey,b Corrie T. Imrie†b and Ingo Dierking*a
aDepartment of Physics and Astronomy, University of Manchester, Oxford Road, Manchester, M13 9PL, UK. E-mail: ingo.dierking@manchester.ac.uk
bDepartment of Chemistry, School of Natural and Computing Sciences, University of Aberdeen, Meston Walk, Aberdeen, AB24 3UE, UK
First published on 1st December 2025
Two different machine learning architectures, sequential convolutional neural networks (CNN) and parallel inception models, were evaluated with respect to their ability to identify nematic liquid crystal variants, including the ferroelectric and the twist-bend nematic phases. Model complexity was varied from 1- to 5-layer CNNs and from 1- to 3-block inception models. Flip, contrast and brightness augmentations were tested, together with dropout-layer regularisation. Flip was the only augmentation trialled that yielded positive results with an acceptable level of accuracy and error, while the inclusion of dropout regularisation almost exclusively led to lower accuracies. From this systematic investigation we conclude that different variants of the nematic phase can be distinguished with accuracies of 0.96–0.98 ± 0.01 or better using 3-layer CNNs or a model with a single inception block, provided flip augmentation is applied. Since inception models are computationally more demanding, a sequential CNN is sufficient to characterise phase sequences with four or fewer different phases. Higher accuracies, closer to 100%, may be achievable for extended and class-balanced datasets. In that case an inception approach could be beneficial, depending on the size of the dataset, but overfitting needs to be avoided.
In recent years, a fourth method of phase characterisation has been established in the form of machine learning via convolutional neural networks (CNN) and other algorithms.7 Naturally, this started with the distinction between the isotropic and the nematic phase, i.e. the simple case of dark vs. bright,8–11 which is used for example in the automatic readout of liquid crystal sensors.12 Work was mainly carried out on thermotropic nematics with their characteristic schlieren texture; while algorithms were mostly trained on simulated textures,13 some experimental studies14 have also been reported. Only recently15 was the characterisation of liquid crystal phases extended to various other phases, with the algorithms trained on experimentally obtained textures. It was demonstrated that nematic, fluid smectic, hexatic smectic, and soft crystal phases can be distinguished and characterised with good accuracies of approximately 95%,15 and even continuous second order transitions like SmA–SmC were surprisingly easy to distinguish.16 Further successful experiments were carried out on transitions involving the soft crystal B phase17 and glasses,18 while other transitions, like that from the fluid SmA to the hexatic SmB phase, remain somewhat elusive,19 due to the absence of any distinguishing features in the textures of both phases. Chiral phases, like the fluid sub-phases exhibiting paraelectric, ferroelectric, ferri- and antiferroelectric behaviour, could also be well distinguished and characterised.20
Despite the success of machine learning algorithms applied to liquid crystals in the last few years, it is also important to recognise the limitations of this approach. A good quality set of training data is of utmost importance to achieve decent results, which obviously implies the correct labelling of phases. One further criterion was already mentioned above: different phases need to exhibit distinguishing features, which is not always the case, as for example at the transition from SmA to hexatic SmB, where sometimes no change in texture is observed when the transition is passed.19 Another important point is that the individual textures of a particular phase need to show some variation; otherwise, the algorithm will show pronounced overfitting. A similar effect is observed for datasets that are too small. In our experience it is best to have at least 1000 images per phase, unless the phase is completely different from the others, like the isotropic or the crystalline phase, for which fewer images may be sufficient. For example, in the simple yes–no classification problem of an LC sensor, fewer images are acceptable. Further, it is important to rely on a balanced dataset with approximately equal numbers of images for each liquid crystal phase; otherwise, the analysis will be biased towards the phase with the larger number of images.15 Finally, the complexity of the machine learning algorithm employed should be matched to that of the problem to be investigated: for an over-complex model, overfitting and a reduced accuracy are observed.21 A detailed investigation of the factors influencing the performance of CNNs can be found in ref. 22.
The nematic is probably the best studied and most well-known of the liquid crystal phases, due to its broad range of applications. It is the least ordered of the liquid crystal phases, and the one with the highest symmetry. For a long time it was thought that the thermotropic nematic phase exhibits only uniaxial orientational order of the long axes of calamitic (rod-like) molecules. The first observation of a biaxial nematic phase was then suggested,23,24 a much-discussed question which does not seem to have been resolved to date.25 A nematic variant which has indeed been confirmed beyond doubt is the twist-bend nematic (NTB) phase,26,27 which has recently been reviewed,28 also with respect to chemistry,29 theory,30 and applications.31 A more recent variant is the long sought-after ferroelectric nematic phase (NF),32,33 with a very informative summary provided in ref. 34, and reviews published with respect to chemistry,35 theory,36 as well as properties and applications.37
Both the twist-bend nematic and the ferroelectric nematic phases are schematically illustrated in Fig. 1(a) and (b), respectively, in comparison with the standard thermotropic nematic phase composed of calamitic molecules. The standard nematic phase exhibits orientational order of the long axes of rod-like molecules along an average direction called the director n, while the centres of mass are isotropically distributed. The director is a pseudo-vector which shows head–tail symmetry, thus n = −n. For completeness, we note that the nematic phase of chiral molecules (chiral nematic or cholesteric phase) exhibits a helical superstructure with a pitch ranging from about 100 nm to many µm. In the twist-bend nematic phase the molecules spiral around a preferred direction with an extremely small pitch, corresponding to only approximately 10 molecules.
Fig. 1 (a) Schematic illustration of the standard nematic phase with orientational order, its chiral counterpart, the cholesteric phase, which exhibits a macroscopic helical superstructure, and the twist-bend nematic phase, which locally spirals around a preferred direction (figure reproduced by permission from ref. 38). (b) In the ferroelectric nematic phase, the head–tail symmetry n = −n of the standard nematic phase is broken, and the molecular electric dipoles align approximately parallel, leading to the formation of a spontaneous polarisation whose direction can be reversed between two stable states by reversal of an applied electric field (figure reproduced by permission from ref. 39).
For the ferroelectric nematic phase, the common head–tail symmetry of n = −n is broken and the molecular electric dipole moments do not compensate across small spatial dimensions. The structure therefore exhibits a spontaneous polarisation which can be switched between two polar states by reversal of an applied electric field.
In this study we demonstrate that the different nematic variants, as well as the isotropic and the crystalline phases, can be distinguished by machine learning via convolutional neural networks and inception models.
Fig. 2 Structural formulae and representative textures of the materials investigated: (a) the twist-bend nematic CB6O.7 (ref. 40) and (b) the ferroelectric nematic NT3.5 (ref. 41). The longer side of each texture image corresponds to 860 µm.
The molecular structure of the ferroelectric nematic compound NT3.5 is provided in Fig. 2(b), together with representative textures. The compound was reported by Tufaha et al. in ref. 41. Its nematic and ferroelectric nematic phases are also monotropic, with the phase sequence given by Cr. 102 NF (63) N (68) Iso. (temperatures in °C). We note that CB6O.7 exhibits a thread-like texture in the standard nematic phase, while NT3.5 shows a schlieren texture with topological defects, besides the NTB and NF textures.
The texture images used to create the datasets were frame-grabbed from a number of movies recorded at different positions of the sample, held between untreated glass plates, using polarising optical microscopy (POM, Leica DMLP). The microscope was equipped with a Linkam LTSE350 hot stage and a TP94 temperature controller, giving a relative temperature accuracy of ±0.1 K. Movies were recorded on cooling at rates between 0.1 and 0.5 K min−1 at 10 frames per second (fps) with an IDS uEye digital camera. Care was taken to generate images that differed from each other, to prevent the machine learning algorithms from learning textures “by heart”. Images with a resolution of 2048 × 1088 pixels were extracted using the video scene filter of the VLC media player.42 Depending on how rapidly the textures changed, approximately one frame every 1.5 seconds was grabbed from each recorded video. These images were then cropped to 256 × 256 pixels and converted to greyscale with pixel values between 0 and 1, in order to reduce computational cost and to avoid misidentification of phases based on colour rather than texture. The number of images generated for this study is shown in Fig. 3.
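The frame-grabbing arithmetic and the crop and greyscale steps can be sketched in a few lines of plain Python. This is an illustrative sketch only: the text does not state where the 256 × 256 window was taken or how colour was converted, so the centre crop and the ITU-R BT.601 luma weights used below are assumptions, as are all function names.

```python
# Sketch of the frame-grabbing and preprocessing arithmetic (pure Python).
# Assumptions not stated in the text: a fixed grabbing interval, a centre crop,
# and ITU-R BT.601 luma weights for the greyscale conversion.

FPS = 10            # camera frame rate quoted in the text
INTERVAL_S = 1.5    # approximate grabbing interval quoted in the text
CROP = 256          # side length of the cropped images in pixels


def sample_frame_indices(n_frames, fps=FPS, interval_s=INTERVAL_S):
    """Indices of the frames grabbed roughly every `interval_s` seconds."""
    step = max(1, round(fps * interval_s))  # 10 fps x 1.5 s -> every 15th frame
    return list(range(0, n_frames, step))


def centre_crop(image, size=CROP):
    """Cut a size x size window from the centre of a row-major image."""
    height, width = len(image), len(image[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def to_unit_greyscale(rgb_image):
    """Map 8-bit (r, g, b) pixels to greyscale values between 0 and 1."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
            for row in rgb_image]


# A 60 s clip at 10 fps has 600 frames; grabbing every 15th keeps 40 of them.
print(len(sample_frame_indices(600)))  # 40
```

In practice an image library would do the cropping and conversion; the nested-list version only makes each step explicit.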
Fig. 3 Number of images generated for the different phases of (a) the twist-bend nematic CB6O.7 and (b) the ferroelectric nematic NT3.5, before augmentation.
From Fig. 3 it can be seen that the dataset of the compound exhibiting the NTB phase is not ideally balanced, with the NTB phase represented by approximately 1000 more images than the standard nematic phase. However, a study43 in which class imbalances were investigated in detail pointed out that imbalances of the order of 2:1 are not of significant concern, and that imbalances only have noticeable effects on prediction accuracies when they reach ratios such as 20:1. The imbalance of our dataset should thus have only minimal impact on the accuracy, although a balanced set of class images would of course be better. This could be achieved by leaving out images from the over-represented classes, but the resulting loss of training images would affect the accuracy more than the class imbalance does.
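The trade-off described above can be made concrete with a short calculation: the largest-to-smallest class ratio, and the number of images that would be discarded by undersampling every class to the size of the smallest one. The class counts below are hypothetical, chosen only for illustration (they are not the values in Fig. 3), and the 20:1 threshold is simply the ratio quoted from ref. 43.

```python
# Hypothetical class-imbalance check; the counts are illustrative, not from Fig. 3.

CONCERN_RATIO = 20.0  # ratio at which ref. 43 reports noticeable accuracy effects


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size."""
    return max(counts.values()) / min(counts.values())


def undersampling_loss(counts):
    """Images discarded if every class is cut down to the smallest class."""
    smallest = min(counts.values())
    return sum(n - smallest for n in counts.values())


def is_concerning(counts, threshold=CONCERN_RATIO):
    """True if the imbalance reaches the threshold ratio."""
    return imbalance_ratio(counts) >= threshold


counts = {"Iso": 1500, "N": 1000, "NTB": 2000, "Cr": 1500}  # hypothetical
print(f"ratio {imbalance_ratio(counts):.1f}:1")              # ratio 2.0:1
print("discarded by undersampling:", undersampling_loss(counts))  # 2000
```

With these illustrative numbers, balancing by undersampling would throw away 2000 images to fix a 2:1 imbalance that, according to ref. 43, is not of significant concern.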
The collected images were separated into training, validation, and test subsets at an approximate ratio of 70:15:15. For this separation to provide meaningful results, the subsets must not overlap, to prevent data leakage, which would inflate the accuracy. Images of the same phase coming from the same video were therefore not divided between subsets, and were further shuffled to ensure randomness within each batch. Overall, and before augmentation, this procedure provided roughly balanced datasets of about 1500 images, which should yield reasonable accuracies, especially since the crystalline and the isotropic phases are very distinct, showing typical cracks and a simply black image, respectively, which are easy for the machine learning models to identify. During the investigations, images were further subjected to different augmentations, which are discussed in more detail below.
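A leakage-free split of this kind can be sketched as follows: frames are grouped by (phase, video), and each whole group is assigned to the subset that is furthest below its 70:15:15 target, so no video contributes to two subsets. The greedy assignment rule, the record layout `(image_id, phase, video_id)` and the function name are assumptions for illustration, not the procedure the authors necessarily used.

```python
import random
from collections import defaultdict


def grouped_split(samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split (image_id, phase, video_id) records into train/val/test lists.

    Every frame of a given (phase, video) pair lands in the same subset, so no
    video is shared between subsets; each subset is shuffled afterwards.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)                 # (phase, video) -> records
    for record in samples:
        _, phase, video = record
        groups[(phase, video)].append(record)

    by_phase = defaultdict(list)               # phase -> list of video groups
    for (phase, _), members in groups.items():
        by_phase[phase].append(members)

    subsets = ([], [], [])
    for videos in by_phase.values():
        rng.shuffle(videos)
        total = sum(len(v) for v in videos)
        filled = [0, 0, 0]
        for members in videos:
            # Greedy: give the whole video to the subset furthest below its target.
            deficits = [fractions[i] * total - filled[i] for i in range(3)]
            target = deficits.index(max(deficits))
            subsets[target].extend(members)
            filled[target] += len(members)

    for subset in subsets:
        rng.shuffle(subset)                    # randomise order within each subset
    return subsets
```

Because whole videos are the unit of assignment, the achieved ratio is only approximately 70:15:15, which matches the "approximate ratio" quoted in the text.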
Fig. 4 General representation of (a) the convolutional neural network (CNN) model and (b) the Inception model employed.
For the inception models, Google's prebuilt InceptionV3 model50 was used, as its architecture has been fine-tuned by experts specifically for image identification. InceptionV3 has been trained on ImageNet,51 a large database of approximately 14 million labelled images, and this iteration of Google's inception model is freely available. The model uses batch normalisation, factorised 7 × 7 convolutions, average and max pooling layers and, like the CNN models, SoftMax activation on the output. The pre-loaded version of InceptionV3 in Keras comes with the weights and biases obtained from training on the ImageNet database; these were not used when training with the datasets of this study, to avoid any unintentional bias in the predictions. The number of inception blocks was also greatly reduced, as the full architecture has approximately 25 million parameters. With only four classes and about 5000 images in each dataset, the complete InceptionV3 architecture would simply memorise each image, delivering high accuracies but making any predictions meaningless. After the chosen number of inception blocks, a global average pooling layer, a SoftMax activation layer and, when required, dropout layer(s) are used.
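The difference between the parallel inception architecture and a sequential CNN can be illustrated by counting parameters. In an inception-style block several convolution branches read the same input and their outputs are concatenated along the channel axis. The branch widths below are loosely modelled on the first mixed block of InceptionV3 and are an assumption; biases are included and batch normalisation is ignored, so the counts are illustrative and do not reproduce Keras' InceptionV3 exactly.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Trainable parameters of a square 2D convolution layer."""
    return (kernel * kernel * c_in + (1 if bias else 0)) * c_out


# A naive inception-style block: parallel branches read the same input, and
# their outputs are concatenated along the channel axis.
C_IN = 192  # assumed number of input channels
branches = {
    "1x1": [(1, C_IN, 64)],
    "1x1 -> 5x5": [(1, C_IN, 48), (5, 48, 64)],
    "1x1 -> 3x3 -> 3x3": [(1, C_IN, 64), (3, 64, 96), (3, 96, 96)],
    "pool -> 1x1": [(1, C_IN, 32)],  # the pooling step itself has no parameters
}

out_channels = sum(layers[-1][2] for layers in branches.values())
block_params = sum(conv_params(*layer) for layers in branches.values() for layer in layers)
print(out_channels, block_params)  # 256 255440
```

A single block of this kind already holds about a quarter of a million parameters; stacking the full set of InceptionV3 modules leads to the roughly 25 million parameters mentioned above, which is large compared with datasets of a few thousand images.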
Each model was trained for 50 epochs on the training dataset and evaluated on the validation dataset after every epoch. The accuracy (Fig. 5(a)) and loss (Fig. 5(b)) were monitored at each epoch, and once the initial training was complete, the learning curves were used to evaluate the model's performance. Successful training was characterised by accuracy and loss curves following a pattern similar to the exemplary data depicted in Fig. 5(a) and (b), with the training and validation accuracies converging at a similar value close to one and the loss curves converging at a low value close to zero. Overfitting is indicated by diverging training and validation curves, while underfitting is indicated by a training loss that starts high and drops only to an arbitrary minimum (Fig. 5(c)).52 The trained model was then applied to the test dataset of completely unseen images to evaluate its performance, and its predictions for each image were plotted as a confusion matrix to visualise the model's accuracy (Fig. 5(d)).
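A confusion matrix simply counts, for each true phase (row), how often each phase was predicted (column); the diagonal divided by the row sums gives the per-phase accuracy. The sketch below builds one from label lists. The phase labels and predictions are made up for illustration and are not results from this study.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows: true phase, columns: predicted phase."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix, classes):
    """Fraction of each true phase predicted correctly (diagonal / row sum)."""
    return {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(classes)}


def overall_accuracy(matrix):
    """Correct predictions divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


classes = ["Iso", "N", "NTB", "Cr"]
y_true = ["Iso", "N", "N", "NTB", "Cr", "Cr"]   # hypothetical labels
y_pred = ["Iso", "N", "NTB", "NTB", "Cr", "N"]  # hypothetical predictions
cm = confusion_matrix(y_true, y_pred, classes)
print(cm)  # [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 1, 0, 1]]
```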
The batch size has a direct impact on the accuracy of a model and on its efficiency during training. The ideal batch size depends on the size of the dataset in relation to the complexity of the model. As the datasets in this study are all of roughly equal size, the optimum batch size found for the first dataset was used for all subsequent datasets, and the model complexity was then varied. This was done by adding convolutional layers or inception blocks until the prediction accuracy started to decrease or until satisfactory test accuracies of 90–100% were achieved. If two models of differing complexity gave similar results, the less complex model was chosen as the sufficiently optimal solution. A similar approach was applied to dropout regularisation, adding a dropout layer to each successful iteration of a model and evaluating its effect on performance and accuracy.
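The "least complex model with similar results" rule can be written as a small selection function. Here "similar" is interpreted, as an assumption, to mean that a model's error bar overlaps that of the best-scoring model; the accuracies below are hypothetical and only illustrate the rule.

```python
def select_model(results):
    """Pick the least complex model whose accuracy overlaps the best one.

    `results` maps complexity (e.g. number of conv layers) -> (mean, error).
    "Similar" is taken to mean overlapping error bars (an assumption).
    """
    best_mean, best_err = max(results.values())
    for complexity in sorted(results):
        mean, err = results[complexity]
        if mean + err >= best_mean - best_err:
            return complexity


# Hypothetical test accuracies (mean, error) for 1- to 5-layer CNNs.
results = {1: (0.89, 0.02), 2: (0.93, 0.01), 3: (0.97, 0.01),
           4: (0.975, 0.01), 5: (0.98, 0.01)}
print(select_model(results))  # 3
```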
In previous studies we have shown that flip augmentations are particularly effective in generating larger datasets without loss in phase prediction accuracy. Here, augmentation was implemented either by manually editing images with batch editing software or by using the inbuilt augmentation layers of the Keras library during training. The different types of augmentation and their effect on the dataset and on model performance were investigated using the same approach as for testing the hyperparameters of the models. The most effective augmentation or combination of augmentations was then used on all datasets to improve the accuracy of the models.
Three augmentations were chosen for investigation: (i) brightness, (ii) contrast and (iii) flip augmentation. These were chosen to alter the appearance of the texture images significantly without distorting the features key to identification. The Keras library also offers zoom and translation augmentation layers, but these were not used, as they were found to distort or change the images in undesirable ways. The zoom layer resulted in significant pixelation, which could leave the model unable to identify certain key features. The translation layer applies random translations to each image during training, filling the empty space with the part of the image that has been displaced. This generates boundaries that could be misinterpreted as features characteristic of a texture, thus preventing identification of the actual phase.
Flip augmentation mirrors images about one or both axes. In this study, we used a Keras augmentation layer for the CNN models, which randomly selects images during each epoch and flips them according to the conditions specified by the user. Both horizontal and vertical flips were used to maximise the variation between the augmented images and the originals.
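Conceptually, random flipping draws new flips for every image each time a batch is seen, so the same texture appears in different orientations in different epochs. The pure-Python stand-in below flips horizontally and vertically, each independently with probability 0.5, which is how we understand Keras' `RandomFlip` layer in `"horizontal_and_vertical"` mode to behave; it is an illustrative sketch, not the Keras implementation, and the function names are hypothetical.

```python
import random


def hflip(image):
    """Mirror left-right: reverse the pixel order within each row."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror top-bottom: reverse the order of the rows."""
    return image[::-1]


def random_flip(image, rng, p=0.5):
    """Flip horizontally and vertically, each independently with probability p."""
    if rng.random() < p:
        image = hflip(image)
    if rng.random() < p:
        image = vflip(image)
    return image


def augment_batch(batch, rng):
    """Draw fresh flips for every image each time the batch is seen (every epoch)."""
    return [random_flip(img, rng) for img in batch]
```

Since flipping only reorders pixels, the local texture features (defects, threads, cracks) are preserved, which is consistent with flip being the one augmentation that helped here.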
Inception models, in contrast, are built as functional rather than sequential models, and in our implementation they could not use the augmentation layer during training. All augmentations for the functional models were therefore performed manually using the BeFunky batch editing software.53 Brightness and contrast augmentations were implemented using a Keras augmentation layer within a range of 0.2–0.8, avoiding the extremes of 1 or 0, at which images lose all features and become either a blank white or a blank black image. This protected against possible confusion during training, such as false positives in which darkened textures are misidentified as the isotropic phase.
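Why the extremes must be avoided can be seen with a toy greyscale patch. The exact parameterisation behind the quoted 0.2–0.8 range is not given in the text, so the sketch assumes brightness is an additive shift and contrast a scaling of deviations from the image mean, both clipped to [0, 1]; the functions are hypothetical illustrations, not the Keras layers.

```python
def adjust_brightness(image, delta):
    """Shift all pixels by delta (in units of the 0-1 range), clipping to [0, 1]."""
    return [[min(1.0, max(0.0, x + delta)) for x in row] for row in image]


def adjust_contrast(image, factor):
    """Scale deviations from the image mean by factor, clipping to [0, 1]."""
    pixels = [x for row in image for x in row]
    mean = sum(pixels) / len(pixels)
    return [[min(1.0, max(0.0, (x - mean) * factor + mean)) for x in row]
            for row in image]


def pixel_range(image):
    """Max minus min pixel value; 0 means all texture has been erased."""
    pixels = [x for row in image for x in row]
    return max(pixels) - min(pixels)


texture = [[0.2, 0.6], [0.4, 0.8]]                    # toy greyscale patch
print(pixel_range(adjust_brightness(texture, 1.0)))   # 0.0 (blank white)
print(pixel_range(adjust_brightness(texture, -1.0)))  # 0.0 (blank black)
print(pixel_range(adjust_contrast(texture, 0.0)))     # 0.0 (uniform grey)
```

Even moderate shifts clip the brightest or darkest pixels, compressing the range of grey levels that carry the texture information in greyscale images.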
Horizontal and vertical flipping was used for both the CNN and the Inception models. For the CNNs, an inbuilt Keras function was used that selected a portion of the images in each batch and applied the chosen flip augmentation. For the Inception models, manual vertical flipping was applied to all images in the training dataset, because the random function used for the CNNs was not compatible with the Inception models within Keras. Further, given the complexity of the Inception models, an increased dataset size is beneficial.
For improved computational performance, all training and testing was carried out in Google Colaboratory, a hosted Jupyter Notebook service that provides access to GPUs hosted on Google servers. The Nvidia T4 Tensor Core GPU was used for all models, as it has been specifically designed for machine learning and deep learning training.54
A basic CNN model with a single convolutional layer, one max pooling layer and a final dense output layer was used to test some of the model's hyperparameters, including batch size and learning rate. Subsequently, the effect of augmentations was investigated at different levels of complexity, followed by different regularisation techniques. The inception model was studied in a similar way.
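For the 256 × 256 greyscale inputs, the layer shapes and parameter counts of such a one-layer CNN follow from simple arithmetic. The filter number (32), kernel size (3 × 3), 'valid' padding and 2 × 2 pooling below are assumptions, since the text does not give them; the four output classes match the four phases per dataset mentioned above.

```python
def conv_output(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


IMG, CLASSES = 256, 4              # image side and number of phases
FILTERS, KERNEL, POOL = 32, 3, 2   # assumed hyperparameters

conv_side = conv_output(IMG, KERNEL)                   # 254 ('valid' padding)
pool_side = conv_output(conv_side, POOL, stride=POOL)  # 127
conv_params = (KERNEL * KERNEL * 1 + 1) * FILTERS      # greyscale: 1 input channel
flat = pool_side * pool_side * FILTERS                 # features fed to the dense layer
dense_params = (flat + 1) * CLASSES
print(conv_side, pool_side, conv_params, dense_params)  # 254 127 320 2064516
```

Under these assumptions almost all of the roughly two million parameters sit in the final dense layer; each additional convolution and pooling stage shrinks the flattened feature map.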
The batch size, i.e. the number of data points passed to a model at each iteration, can influence the learning of the model during training. The optimal batch size can depend on the size of the dataset, the optimisation algorithm used, or hardware constraints. As all datasets in this study are of similar size and the same hardware was used throughout, an initial test was carried out to determine the optimal batch size, which was then used for the remainder of the investigations. A batch size of 32 gave the lowest validation loss and the highest test accuracy, together with the least noise on the loss curves (Fig. 6). A batch size of 32 was therefore used for all further testing.
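The batch size fixes how many weight updates happen per epoch. The sketch shuffles the training set once per epoch and yields mini-batches of 32; the training-set size of 3500 is a hypothetical value (about 70% of a 5000-image dataset), used only to illustrate the arithmetic.

```python
import math
import random


def iterate_batches(items, batch_size=32, seed=0):
    """Shuffle once per epoch and yield consecutive mini-batches; the last may be smaller."""
    order = list(items)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


n_train = 3500                    # hypothetical training-set size
steps = math.ceil(n_train / 32)   # weight updates per epoch
print(steps)  # 110
```

Smaller batches give more, noisier updates per epoch; larger batches give fewer, smoother ones, which is one reason the batch size affects the noise on the loss curves.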
Augmentations were applied to the NT3.5 dataset to artificially increase the number of images used for model training and the variability between those images. This was implemented using a Keras augmentation layer that randomly selects and augments images in each batch, every epoch. Horizontal and vertical flip, brightness, and contrast augmentations were each tested individually and compared against a non-augmented dataset with increasing model complexity. Increasing the model complexity resulted in a slight increase in the test accuracy for the flip-augmented dataset. In contrast, the models trained on brightness- and contrast-augmented datasets performed well below those trained on the non-augmented dataset. The differences in test accuracy for each model are illustrated in Fig. 7. The graph clearly shows that brightness and contrast augmentations were unsuccessful, with large variations between individual runs during the test phase, as reflected in the large error bars.
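The text does not state how the quoted errors are computed. One common choice, assumed here, is the mean and sample standard deviation of the test accuracy over repeated training runs, as sketched below with hypothetical run results.

```python
from statistics import mean, stdev


def summarise_runs(accuracies):
    """Mean test accuracy and sample standard deviation over repeated runs."""
    return mean(accuracies), stdev(accuracies)


runs = [0.95, 0.96, 0.97]  # hypothetical test accuracies from three runs
m, s = summarise_runs(runs)
print(f"{m:.2f} ± {s:.2f}")  # 0.96 ± 0.01
```

Under this convention, an augmentation that makes training unstable shows up directly as a larger spread between runs, i.e. larger error bars.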
Even at higher levels of complexity, the models with brightness and contrast augmentations consistently displayed low validation and test accuracies with diverging losses. As the texture images are greyscale, it is possible that even small changes in brightness and/or contrast largely obscure the features of the textures, with contrast-augmented images tending to be identified as isotropic and brightness-augmented images tending to be categorised as crystalline. Horizontal and vertical flips were therefore the only augmentations used in subsequent investigations.
A method to reduce possible overfitting is regularisation, achieved by reducing the weights on connections between layers in the network or by removing connections entirely; the latter is known as dropout, in which units are randomly deactivated during training. To further increase the accuracy of the models used in this study, a dropout rate of 0.5 was applied to models trained on both the un-augmented and the flip-augmented datasets, i.e. 50% of the units feeding the following layer were randomly deactivated at each training step.
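The behaviour of a dropout layer with rate 0.5 can be sketched as "inverted dropout", which is how we understand Keras' `Dropout` layer to work: during training each unit is zeroed with probability `rate` and the survivors are scaled by 1/(1 − rate) so the expected activation is unchanged, while at inference the input passes through untouched. The function below is a pure-Python illustration with hypothetical names.

```python
import random


def dropout(activations, rate=0.5, rng=None, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors.

    Survivors are multiplied by 1 / (1 - rate) so the expected activation is
    unchanged; at inference (training=False) the input passes through untouched.
    """
    if not training or rate == 0:
        return list(activations)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * scale for a in activations]


out = dropout([1.0] * 10, rate=0.5, rng=random.Random(42))
# every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

A new random mask is drawn at every call, i.e. for every batch during training, which is why a network with dropout cannot rely on any single unit.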
Fig. 8 depicts the accuracies of each CNN model with flip augmentation, with dropout, and with both flip augmentation and dropout. The models with flip augmentation perform best and show an increase in accuracy with each added level of complexity. Adding a dropout layer for both the augmented and un-augmented datasets decreased the accuracies, which varied significantly between tests, resulting in clearly larger errors. The best-performing model in this group was the five-layer flip-augmented CNN, with a test accuracy of 0.96 ± 0.01. One might expect even more complex models to yield still higher accuracies, but this is generally not the case, due to overfitting. This can also be seen in the respective learning curves, where flip augmentation combined with dropout generally showed lower accuracies, higher loss and more noise than flip augmentation alone. We therefore terminated this investigation at the 5-layer model.
Using the dataset of the ferroelectric nematic material NT3.5, we finally also employed a different machine learning model. For testing with the InceptionV3 model, the same NT3.5 dataset was used as before, with manually implemented flip augmentation. The complexity of the model was varied by altering the number of inception blocks. Owing to its much larger number of trainable parameters, the inception model proved successful, with even the one-block model outperforming the five-layer CNN and reaching a test accuracy of 0.99 ± 0.01. Increasing the complexity still further improves the test accuracy, which is most likely the result of overfitting, as suggested by the accuracy and loss curves. With this in mind, and with high accuracies achieved by the two- and three-block models, dropout layers were only added to the lowest-performing one-block model, with the results depicted in Fig. 9 (note the change in scale compared with the previous CNN graphs).
As can be seen in Fig. 10, the test accuracies for the simple non-augmented models increase from about 89% to 96% as the model complexity is increased from one to three layers. Application of flip augmentations increases the average test accuracy slightly by another 1–2% to 97–98% prediction accuracy, while additional dropout not only decreases the overall accuracy considerably, but also increases the errors observed. Overall, the best performing model is the CNN with five layers and flip augmentation with an average of 98% accuracy.
For illustration, we also show the learning curves for model accuracy and loss of the 1-layer, 3-layer and 5-layer CNNs in Fig. 11(a)–(c), respectively. For model accuracy, the validation curves clearly approach the training curves as the CNN complexity increases. At about 40 epochs, the training accuracy had reached approximately 98%, 99% and 100% for the 1-, 3-, and 5-layer models, respectively, while the validation curves reached 93%, 97% and 99%.
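The narrowing distance between the training and validation curves can be summarised as a generalisation gap, i.e. training minus validation accuracy. The sketch below uses only the approximate epoch-40 values quoted above; the helper name is hypothetical.

```python
# Approximate accuracies (%) at about epoch 40, as quoted in the text.
train_acc = {1: 98, 3: 99, 5: 100}
val_acc = {1: 93, 3: 97, 5: 99}


def generalisation_gap(train, val):
    """Training minus validation accuracy, in percentage points, per model."""
    return {layers: train[layers] - val[layers] for layers in train}


print(generalisation_gap(train_acc, val_acc))  # {1: 5, 3: 2, 5: 1}
```

A gap that shrinks as complexity increases, rather than grows, indicates that the added layers are improving generalisation instead of causing overfitting.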
Fig. 11 Comparison of the learning curves for accuracy and loss for the (a) 1-layer, (b) 3-layer, and (c) 5-layer CNNs with flip augmentation.
Similarly, the training losses reach approximately 0.5%, 0% and 0% for the 1-, 3-, and 5-layer models, while the validation losses approach about 5%, 0.1%, and 0.05%. At the same time, the noise on the loss curves is strongly reduced from the 1-layer to the 3-layer CNN. This behaviour is further reflected in the confusion matrices of Fig. 12.
Fig. 12 Confusion matrices for the phase classification of CB6O.7 with a twist-bend nematic phase, for a (a) 1-layer, (b) 3-layer, and (c) 5-layer CNN with flip augmentation.
The confusion matrices obtained from these tests show that the increase in complexity enabled more accurate identification of all phases of CB6O.7. The accuracy for the twist-bend phase increased to almost 100%. The greatest improvement is seen for the crystalline phase, for which the test accuracy increased from approximately 88% to 98%.
Using the same CB6O.7 dataset, the InceptionV3 model proved immediately adept at identifying the phase sequence. With only one inception block, test accuracies of 0.9992 ± 0.0008 were achieved (Fig. 13). Manual flip augmentation was used in all cases.
Any number of inception blocks up to at least three appears to provide an accuracy very close to 100%, yet a closer look at the learning curves shows that after only a few epochs the training and validation curves have reached an accuracy of 100% at zero loss, even for the one-block model. This implies that the inception model is simply too complex for the task. The addition of dropout regularisation, which should help to avoid overfitting, even decreases the accuracy slightly. As for NT3.5 with its ferroelectric nematic phase, and even more so for CB6O.7 with its twist-bend nematic phase, the inception models are too complex for the task of phase sequence characterisation to exclude overfitting, which will always lead to accuracies of roughly 100%. We therefore suggest that the best models for tasks such as those discussed so far are 3-layer CNNs with flip augmentation.
Fig. 14(a) demonstrates that the model's test accuracy increases with the number of convolutional layers. In all cases, the inclusion of flip augmentation resulted in higher accuracies at each level of complexity. Adding a 0.5 dropout layer for the augmented datasets resulted in lower test accuracies (Fig. 14(b)). The exemplary confusion matrix in Fig. 14(c) shows that the CNN model clearly identifies both the ferroelectric nematic and the twist-bend nematic phases but is slightly less effective at identifying the standard nematic phases. Nevertheless, with the accuracies for both standard nematic phases clearly above 90%, this demonstrates that machine learning models can identify not only different phases but also different textures of the same phase.
Unlike for the previous datasets, training the 1-block inception model on the “all nematic” dataset did not result in 100% accuracy. The highest test accuracy (0.998 ± 0.000) was achieved by a 3-block model with manual flip augmentation. Fig. 15(a) shows that the inclusion of a 0.5 dropout layer resulted in a decrease in test accuracy within the limits of error.
Fig. 15(c) shows the learning curves for the one-block inception model with flip augmentation. Training and validation accuracies are high from the first epoch and reach a value close to 100% by the end of the 50 epochs. Applied to the test dataset, this model achieves accuracies of 0.987 ± 0.003. The associated confusion matrix (Fig. 15(b)) confirms this behaviour, with the lower accuracies found for the standard nematic phases. However, these two textures are rarely mistaken for one another, despite belonging to the same phase.
Comparing the highest accuracies achieved by the CNN and Inception models for the all nematic dataset (0.970 ± 0.003 vs. 0.987 ± 0.003, respectively), the inception models appear slightly more adept at identifying the LC phases from their texture images. It is clear, however, that the CNN models also offer sufficient accuracies to demonstrate their feasibility. Given that training an inception model consumes considerably more time and computing resources than training a CNN, the latter is certainly sufficient for characterisation, at least for unconventional nematic phases. Adding dropout to these models rarely had a positive impact on accuracy.
Overall, flip-augmented three- to four-layer CNNs were found to be of sufficient complexity to characterise all phase sequences with an accuracy better than 95%. The inception model was found to achieve higher accuracies of 99% with as little as one inception block; however, the learning curves during training and validation suggested that these high accuracies were most likely the result of overfitting. It is worth noting that for both models the inclusion of dropout regularisation resulted in the lowest test accuracies. Thus, the inception model is deemed far too complex for the datasets investigated, even with considerable regularisation. Inception models are also computationally much more expensive than CNNs. We therefore conclude that, for the present investigation, the use of Inception models is neither necessary nor justified.
The datasets used in this study are relatively small in machine learning terms and exhibit minor class imbalances. While these limitations do not necessarily undermine the findings, greater accuracies could possibly be achieved with larger and more balanced datasets. If higher accuracies than those achieved here are required, it will be necessary to collect larger and more balanced datasets. In that case, more complex machine learning models may also be needed, at the expense of higher computational cost.
Footnote: † Deceased.
This journal is © The Royal Society of Chemistry 2026