C. Roldán-Piñero,*ae M. Teresa González,b Pablo M. Olmos,c Linda A. Zotti,ade and Edmund Leary*c
aDepartamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain. E-mail: carlos.roldanp@uam.es
bFundación IMDEA Nanociencia, E-28049 Madrid, Spain. E-mail: edmund.leary@imdea.org
cDepartamento de Teoría de la Señal, Universidad Carlos III de Madrid, Madrid, Spain
dInstituto Nicolás Cabrera (INC), Universidad Autónoma de Madrid, E-28049 Madrid, Spain
eCondensed Matter Physics Center (IFIMAC), Universidad Autónoma de Madrid, E-28049 Madrid, Spain
First published on 15th September 2025
We present a new automated supervised procedure trained to classify, with a high degree of confidence, both conductance–voltage (G(V)) curves and conductance–distance (G(z)) traces generated in single-molecule junctions. Compared to unsupervised methods, our approach, involving a convolutional neural network (CNN), is vastly superior: it recognises core shapes while ignoring differences in scale and is relatively insensitive to conductance jumps. A key aspect is the transformation of curves into a spiral image map, which allows us to separate various fundamental G(V) and G(z) shapes from datasets containing tens of thousands of curves. Moreover, by using transfer learning, training requires little input data compared to other approaches. This is extremely advantageous as it reduces training time by many orders of magnitude and means the model can be trained on user-selected shapes, including rare types. This contrasts with arbitrary class assignment, as classification is instead based on a sound physical understanding of the system. Furthermore, as there is no minimum class population requirement, our method can be used to find rare events with a high degree of confidence. As an example, we used our procedure to find, with a minimum 66% confidence level, a class of G(V) curves which are parabolic at low bias but flat at high bias. Such curves make up just 4% of the total and would be very difficult to detect cleanly with unsupervised methods. This gives insights into the electron transport behaviour at high bias because we can now easily quantify the types of curves present. Thanks to its universality, this approach opens up new possibilities in general signal processing and the identification of rare and important events.
More recently, ML techniques that consider the overall shape of G(z) traces, rather than isolated segments, have been applied to their classification. These techniques can be divided into two broad categories: supervised and unsupervised machine learning (SML and UML, respectively). SML involves training a model on preclassified data, i.e. a control group, and then infers labels for new data based on similarities with the control group. Typically, this requires a large amount of training data. Alternatively, UML aims not to impose any preconceived notions about the data and does not require training. Instead, it looks for commonalities throughout a large-enough dataset, grouping the data based on different metrics.
UML approaches, like k-means, have been successfully applied to G(z) traces, identifying physically meaningful groups.13–17 This includes the presence of different conductance pathways through porphyrin molecular junctions,18 and different conformations of amino acid molecular junctions have also been distinguished this way.14 There are, however, significant limitations to this approach; in particular, it fails to group the smaller-population shapes due to the uniform effect.19,20 The problem of choosing the optimal number of cluster groups is especially tricky as, a priori, no physical relevance is assigned to the groups. This raises the possibility of forming groups whose physical significance is unclear and, moreover, rarer events will be ignored.
SML algorithms have begun to be used to identify G(z) traces, but only in a very limited way. Several groups have implemented models that can reliably identify molecule-free junctions, which display a well-defined shape in which the conductance decreases exponentially with electrode separation.21,22 As far as we are aware, however, no SML algorithms have been directed towards identifying specific shapes of G(z) traces, which is a more complex task. Furthermore, training typically requires thousands of individual traces, making manual labelling cumbersome. It also means that rare events cannot be analysed.
Apart from G(z) traces, the voltage dependence of the conductance of molecular junctions is an equally important characterisation tool.10,23,24 Despite this, G(V) spectroscopy has, unlike G(z) measurements, received no attention in terms of “hands-off” classification approaches. G(V) curves are obtained by fixing the inter-electrode distance and varying the applied voltage (examples of such curves are shown in Fig. 1d). This gives valuable information regarding the transport mechanism and the energetic position of the closest molecular energy level(s) with respect to the Fermi level. The technique is applied to both single-molecule and monolayer devices.25,26 The shape of G(V) curves depends on the structure of the compound, but is also influenced by the contacts and the temperature. Moreover, the shape can give clues as to the nature of the charge transport mechanism.27–29
Various shapes can appear, which have been related to redox events (gain or loss of electrons) for molecules with small HOMO–LUMO gaps (i.e. the energetic difference between the highest occupied and lowest unoccupied molecular orbitals).8,30,31 In the charged state, the shape of the G(V) curves differs significantly compared to the neutral state. For a neutral junction, G(V) curves are typically approximately parabolic. In the charged state, however, different shapes appear. When plotted as log(G(V)), typical shapes include: parabolic (P), exponential (E), parabolic with saturation at high bias (parabolic + flat, P + F) and flat/slightly decreasing conductance (DC).30 Examples are shown in Fig. 1e–h. Combinations of these shapes also appear, probably due to asymmetries at the molecule–electrode interface, as well as stochastic switching between different shapes. There are also curves with no clearly definable shape.
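Purely for illustration, the four archetypal log(G(V)) shapes can be mimicked with simple functional forms (these expressions and prefactors are our own illustrative choices, not fitted models):

```python
import numpy as np

V = np.linspace(-1.1, 1.1, 256)  # bias window (V)

# Illustrative log10(G/G0) profiles; offsets and prefactors are arbitrary
logG_P  = -4.0 + 0.8 * V**2                            # parabolic (P)
logG_E  = -4.0 + 0.15 * (np.exp(2.5 * np.abs(V)) - 1)  # exponential (E)
logG_PF = -4.0 + np.minimum(0.8 * V**2, 0.4)           # parabolic + flat (P + F)
logG_DC = -4.0 - 0.1 * np.abs(V)                       # flat/slightly decreasing (DC)
```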
These factors, along with the intrinsic conductance variability (due to different metal–molecule coupling strengths), make it difficult for UML algorithms to perform a meaningful classification of G(V) curves. Such algorithms classify curves based on their numerical similarity, which can work for G(z) traces; for G(V) curves, however, it is more meaningful to group curves based on their generic shape, which relates directly to the mechanism of conductance.
In this article, we address some of the major limitations of typical SML models applied to data analysis by bringing together two separate techniques: spiral image mapping and transfer learning (TL). The former converts 1D curves/traces into 2D images, which are more suitable for use with CNNs trained on images. The latter, TL, is based on the idea that pattern/image recognition tasks are transferable: previously deeply-trained networks can be applied to seemingly unrelated recognition tasks without significant retraining. The optimal use of previously deeply-trained CNNs is thus enabled by the 1D-to-2D conversion of data via spiralization. This is a major leap for two reasons. First, it hugely reduces training times, which opens the door to rare trace identification. Second, it makes it possible to focus on identifying the core shapes in 1D signals rather than looking for absolute similarities. This is very important as it is the shape which often carries mechanistic information regarding the transport process, not the absolute conductance values.
As discussed, the use of neural networks, more specifically convolutional neural networks (CNNs)32 and recursive neural networks,33 in G(z) trace classification has already been reported, yielding promising results. In this paper, we focus on CNNs, as they are known to be optimal for image identification,34 which better suits the task of classifying the shape (a visual property) of G(V)/G(z) curves/traces. However, they typically require a huge number of traces to be trained properly, which precludes manual sample labelling for training purposes. Being able to label samples manually is advantageous in that it allows a human to decide whether a particular shape has physical significance. Moreover, we wanted to reduce the computational demand of training so that a basic desktop computer can perform the training and subsequent classification as part of the standard data analysis workflow. As such, we sought an approach that could reduce the amount of training data required. All these issues can be resolved by using transfer learning (TL, see Section S2 in the SI)35 on very deep pretrained CNNs, trained for the identification of general objects in images.36 For the choice of the final classification step, we tested logistic regression, support vector machines, stochastic gradient descent, k-neighbours, decision trees, random forest and a single fully connected layer; for all but the last we performed hyperparameter optimization (see Section S6 of the SI). Although all displayed good performance, we found that a single fully connected layer offered the best overall accuracy. We used the Adam algorithm for optimization37 with a constant learning rate of 0.001. For further details on our approach, see Section S2 of the SI.
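For illustration, the comparison of candidate classification heads can be set up along the following lines with scikit-learn, operating on the feature vectors extracted by the pretrained network (a minimal sketch; the hyperparameter grids actually used are given in Section S6 of the SI):

```python
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# X: (n_curves, n_features) CNN feature vectors; y: manually assigned shapes
candidates = {
    "logistic":      LogisticRegression(max_iter=1000),
    "svm":           SVC(),
    "sgd":           SGDClassifier(),
    "k-neighbours":  KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(),
}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```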
To generate images from curves, we took inspiration from ref. 38, where U-nets39 for noise reduction were shown to work well with a greyscale spiral representation of the curves. To illustrate this, Fig. 2a shows an artificially generated G(V) curve (solid brown line) which has been divided into bins whose heights depend on the conductance. To generate an image, we assign a shade of grey to each data bin so that the minimum G value corresponds to a black pixel and the maximum to a white one. Starting from 0 V, these pixels are plotted in the image from the centre, moving outwards in an anticlockwise fashion. The central pixel thus corresponds to the lowest voltage, and each subsequent pixel corresponds to a higher voltage. Fig. 2b shows an example of spiral image generation for a real G(V) curve. In this case, the first pixel is generated from the most negative voltage, with subsequent pixels for progressively more positive voltages (see Section S2 of the SI for a pseudocode implementation of the spiral mapping). We will show that this mapping works well for both G(V) and G(z) curve/trace classification. It also helps reduce the impact of high-frequency noise in individual curves. Following this, we fed these images to our pretrained network and fitted the last layer.
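A minimal NumPy sketch of this spiral mapping is given below (an illustrative implementation; the image size and function names are arbitrary, and a pseudocode version is provided in Section S2 of the SI):

```python
import numpy as np

def spiral_image(curve, size=21):
    """Map a 1D curve onto a greyscale image along an anticlockwise
    spiral that starts at the centre pixel and moves outwards."""
    n_pix = size * size
    # Resample the curve so that it fills the image exactly
    values = np.interp(np.linspace(0, 1, n_pix),
                       np.linspace(0, 1, len(curve)), curve)
    # Normalise: minimum value -> black (0), maximum -> white (1)
    span = values.max() - values.min()
    grey = (values - values.min()) / span if span > 0 else np.zeros(n_pix)

    img = np.zeros((size, size))
    r = c = size // 2                            # centre pixel = first point
    moves = [(0, 1), (-1, 0), (0, -1), (1, 0)]   # right, up, left, down
    k, step = 0, 1
    img[r, c] = grey[k]
    k += 1
    while k < n_pix:
        for m, (dr, dc) in enumerate(moves):
            for _ in range(step):
                r += dr
                c += dc
                if 0 <= r < size and 0 <= c < size and k < n_pix:
                    img[r, c] = grey[k]
                    k += 1
            if m % 2 == 1:                       # arm grows every two turns
                step += 1
    return img
```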
We have tested various available pretrained CNNs in the PyTorch API.40 We include a comparison of these in Section S3 of the SI. All performed similarly. For our results, we selected ConvNext Tiny41 for its balance between high accuracy and low feature extraction time. It gives a total of 768 features.
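In code, the transfer-learning setup described above can be sketched in a few lines of PyTorch (a minimal sketch of the configuration we describe: frozen ConvNeXt Tiny backbone, a single fully connected layer retrained with Adam at a constant learning rate of 0.001; full details are in Section S2 of the SI):

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained ConvNeXt Tiny with a frozen feature extractor
model = models.convnext_tiny(weights=models.ConvNeXt_Tiny_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False

# Replace the final layer: 768 extracted features -> 4 curve shapes
model.classifier[2] = nn.Linear(768, 4)   # P, E, P + F, DC

optimizer = torch.optim.Adam(model.classifier[2].parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimisation step; `images` is a (batch, 3, 224, 224) tensor of
    spiral maps (greyscale replicated across the three input channels)."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```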
We note that CNN-based models operate largely as “black boxes”, making it difficult to explain their individual decisions in a physically meaningful way. Techniques such as Grad-CAM, saliency maps, and related methods may offer partial insight into which image regions or features influence a given prediction, but applying these approaches rigorously would require a dedicated study in its own right. For now, our priority is to ensure that the model produces accurate and consistent results overall. Future studies may be performed to interpret individual classifications at a deeper level.
In this work, we focus on data obtained using the STM break-junction technique on the previously reported compound P2 (shown in Fig. 1b).42 Other compounds used for training and further analysis are described in Section S1.1 of the SI.
To convey the quality of the model visually, we built confusion matrices (CM), defined so that the element in the ith row and jth column corresponds to the number of times an instance with a target variable of type i (the true shape) is classified as type j (the shape determined by the model). The closer the matrix is to diagonal, the higher the classifier accuracy. Note that, because we manually selected curves for the training data, these are likely fairly ideal examples, so we expect slightly lower accuracy on unseen data than estimated from this initial testing.
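For reference, such a matrix can be computed directly from the true and predicted labels, e.g. with scikit-learn (a minimal sketch):

```python
from sklearn.metrics import confusion_matrix, accuracy_score

shapes = ["P", "E", "P + F", "DC"]
# y_true: manually assigned shapes; y_pred: shapes assigned by the model
cm = confusion_matrix(y_true, y_pred, labels=shapes)  # rows: true, cols: predicted
print(cm)
print("accuracy:", accuracy_score(y_true, y_pred))
```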
Lastly, we used the whole set to train a final model, which was subsequently used to predict the shape of a large set of unclassified curves.
A useful property of using a dense layer as the classification model is that the outputs of the final layer may be normalised and interpreted as a confidence level (CL) between 0 and 1 for the classification. This gives an indication of how sure the model is that the analysed curve belongs to the predicted group. Each curve is assigned a CL for every shape being assessed. For example, in the case of the four shapes considered here, if the model assigns a 0.85 CL to a curve being P, then the sum of the CLs for the curve being E, P + F or DC comes to 0.15. Note that this is not an absolute measure of confidence; rather, it is a relative estimation within the bounds of the shapes considered. As such, care should be taken when a trace is a mixture of two or more general shapes, as similar CLs will be assigned for each relevant group, bringing the highest confidence down. This does not necessarily mean that the model does not recognise the curve's features, but rather that the curve may be classified in more than one group (vide infra). Nonetheless, a high enough CL should mean that the shape truly falls into the predicted category. In order to apply the CL, one must define a threshold. The choice of the threshold is, however, rather arbitrary. Here, we found that a threshold of 0.66 gives a good balance between increasing accuracy (from 0.90 to 0.97) and not rejecting too many curves (see Section S7 in the SI).
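As a sketch, and assuming a softmax normalisation of the final-layer outputs (our assumption for converting them to CLs), the thresholding reads:

```python
import torch.nn.functional as F

def classify_with_cl(logits, threshold=0.66):
    """Normalise final-layer outputs into per-shape confidence levels
    (summing to 1 across the shapes) and reject low-confidence curves."""
    cl = F.softmax(logits, dim=-1)
    conf, pred = cl.max(dim=-1)
    pred[conf < threshold] = -1     # -1 marks curves left unclassified
    return pred, conf
```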
In Fig. 3b we show the associated CM for the classification results on the training data with a threshold applied. The fractions along the diagonal give the number of correctly assigned curves (numerator) over the total number assigned to the particular shape (denominator). After applying this threshold, the accuracy increases to 100% for E curves and 95% for P + F (n.b. it was already 100% for the P shape before applying the CL). The number of classified curves decreases for all categories, from 146 before the threshold to 111, meaning that 35 curves were assigned CLs below 0.66. For the DC group, there is still some confusion with the P + F shape. Its accuracy increased to 67%, which is an improvement, but we nevertheless decided not to proceed further with the identification of DC curves, as we felt the model would require more data for optimal training. Instead, we focus on the classification of the three shapes with the most training data (P, E and P + F). Based on the training results, we can see that good training for a G(V) shape requires a minimum of roughly 150 curves, which is still extremely low; the DC shape, with just 80 curves, falls just short of this level.
After testing the performance of our model, we applied the final model (trained with the whole training set) to a much larger dataset containing 33498 curves of unknown shape. We still trained the model with the four shapes, but we ignored the classification output for the DC shape henceforth. Panels c–e of Fig. 3 show the two-dimensional histograms built from the resulting classified curves. With no CL applied, the model assigns each curve to one of the predefined groups, meaning that even curves with relatively low confidence are assigned to a group. Note that, for representational purposes, the curves were standard-scaled before plotting in the histograms, i.e. each curve was shifted and scaled in the y axis (after classification) to have zero mean and unit standard deviation. Despite the fact that we are “forcing” the model to place all curves within the predefined groups, all three histograms represent their respective expected shapes quite well. This shows that the dataset can, in fact, be fairly well represented by these three groups (with the DC type of curve most likely contributing a tiny percentage). The P curves have the best-defined histogram, closely followed by the E class. The P + F group is the only one in which the shape in the 2D histogram is not quite in line with the anticipated shape. This is not too surprising, however, given that this shape was harder for the model to classify during training than the P and E shapes.
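The standard scaling and histogram construction can be sketched as follows (illustrative; function names are ours):

```python
import numpy as np

def standard_scale(g):
    """Shift and scale a curve to zero mean and unit standard deviation."""
    return (g - g.mean()) / g.std()

def class_histogram(voltage, curves, bins=100):
    """Overlay all standard-scaled curves of one class in a 2D histogram."""
    v = np.tile(voltage, len(curves))
    g = np.concatenate([standard_scale(c) for c in curves])
    return np.histogram2d(v, g, bins=bins)
```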
After this, we applied a 0.66 CL threshold; the corresponding histograms are shown in panels f–h of Fig. 3. In total, 21586 curves passed the filter (76%). For the P group, only about 16% of the initially selected curves were eliminated, showing the ease with which the model recognises this shape. For the P + F group, the fraction of rejected curves increased to 58%, and the 2D histogram now much more clearly resembles the anticipated shape, with a central parabolic part and flatter regions towards higher bias. As we initially force the model to put the curves into one of several specified groups, it makes sense that many are rejected after applying the CL, because we do not expect all curves to have one of these shapes. For the E group, such a visual improvement in the 2D histogram is not as obvious as for the P + F group, despite the elimination of many curves (81%). Looking at the rejected curves (individual examples shown in SI Fig. S10), many have a shape somewhat intermediate between parabolic and exponential. We decided to run an unsupervised clustering algorithm on these rejected curves to look for specific trends. Fig. S11 shows the results of the clustering, where we found five groups (Groups 1–5) with varying degrees of asymmetry and parabolic/exponential character. Such curves can be described as intermediate between exponential and parabolic, either having an overall average between the two extremes, or containing both shapes on different sides of 0 V. As the CL values are relative, if a curve is intermediate between one shape and another, the CLs for the trace being of either type will be reasonably close. This means the model will assign a CL close to 0.5 for the curve being P or E, which is why, when we choose CL = 0.66, such curves are rejected. Furthermore, as the model was trained on symmetric curves, it again makes sense that the model assigns a lower CL to the partially-exponential curves compared to more symmetric ones. This is an interesting result as, despite the model not being trained on asymmetric curves, it can still effectively separate those with partial E character from purely P and P + F shapes. In the future, we envisage training the model with asymmetric curves, which would allow it to identify these curves more confidently.
This shows that applying a CL is extremely useful in the data classification process. For curves with E character, this ability allows us initially to separate curves with any level of E character, which can then be refined by applying a CL. In this sense, by applying a CL, we obtain only the curves with well-defined E character on both sides of 0 V, owing to the symmetry of the curves used in training. The rejected curves can therefore be said to have partial E character.
Previously, we have shown that applying a bias voltage of 1.2 V to a fused porphyrin trimer (fP3) resulted in most of the G(V) curves having non-parabolic shapes. This was straightforward to assess via a visual inspection of the data. Here, by using our CNN approach, we have shown that under the same conditions, a porphyrin dimer, with rings connected by butadiyne groups, displays around 30% non-parabolic G(V) curves (the percentage of P curves is about 70% directly out of the model). Of these non-parabolic curves, about 6% have some level of exponential character, and about 8% have parabolic + flat character. The remaining 16% are, as yet, unclassified. Applying a CL to the data, we can refine the E group further by focussing on the most-symmetric curves. This allows us to identify 377 symmetric E curves, which represents just 1% of the total. Such a low percentage would be extremely difficult to classify either manually, due to the size of the dataset, or by UML algorithms, due to the small population size. It would also be difficult without the application of CLs. The result makes physical sense considering that fP3 has a much smaller HOMO–LUMO gap (about 0.8 eV) compared to P2 (1.7 eV), making redox events more likely. For P2, if we assume that the Fermi level sits close to the middle of the HOMO–LUMO gap, the closest molecular level lies roughly 0.85 eV away; if the bias drops symmetrically across the junction, each electrode's chemical potential shifts by eV/2, so a bias voltage of roughly 1.7 V would be required to align the closest molecular level fully with the gold chemical potential. In the measurement, the bias voltage was ramped between V = ±1.1 V, which is significantly lower, explaining why the majority of curves are parabolic. On the other hand, the presence of about 30% non-parabolic curves implies several possibilities. One is that the Fermi level may sit closer to one of the frontier molecular-based levels than the other (meaning less than 1.7 V would be required to inject charge). It may also point to significant fluctuations in level alignment (on the order of 0.5–1.0 eV) so that, occasionally, a molecular level is brought into resonance. Further work is required, using a varying voltage window, to analyse these possibilities further.
To train the model for G(z) identification, we hand-selected a small set of 242 traces belonging to three categories, from which we then extracted 80% of each type for training: broken (49), continuous (74) and tunnelling (71). We stress that this is an extremely low number, which was possible thanks to our use of TL. For comparison, 100000 traces had to be used in a previous study.22 This allowed us to select the training data manually, as opposed to the clustering approach used in the study by van Veen et al.22 We trained the model as previously, using the train–test split approach. Fig. 4a shows the CM for the trained model. It is important to underline that even with such a low number of training traces we are able to obtain an almost perfect score (i.e. a near-perfectly diagonal CM). Only two broken traces were misclassified as continuous (see Section S8 of the SI for more details on the training of the model and applying CLs). Remarkably, compared to the training for G(V) curves, we were able to train the model well with even fewer G(z) traces.
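The stratified 80/20 split can be performed, for example, with scikit-learn (a sketch; variable names are ours):

```python
from sklearn.model_selection import train_test_split

# images: spiral images of the 242 hand-selected traces; labels: their categories
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, train_size=0.8, stratify=labels, random_state=0
)
```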
Next, we ran our model on a large dataset containing 10807 unclassified G(z) traces. Initially, we did not impose any CL threshold so that we could compare the raw performance of the model against a custom plateau-separation algorithm (PSA), a simple program that sorts G(z) traces into the same three groups based on a few parameters11 (details given in Section S10 of the SI). For this purpose, we plotted the histograms of the CNN method and the PSA in panels b–d and e–g of Fig. 4, respectively (the values in each panel are the number of traces in each histogram).
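To give a flavour of what such a parameter-based sorter involves, the toy plateau test below checks whether log(G/G0) stays within a narrow conductance window over a minimum length (purely illustrative; the actual PSA logic and parameters are given in Section S10 of the SI):

```python
def has_plateau(z, logG, width=0.5, min_len=0.1):
    """Toy plateau test: True if log(G/G0) stays within +/- width/2 of a
    reference value over a span of at least min_len (same units as z)."""
    start = 0
    for i in range(len(z)):
        if abs(logG[i] - logG[start]) > width / 2:
            start = i                      # restart the candidate plateau
        elif z[i] - z[start] >= min_len:
            return True
    return False
```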
Generally, there is good agreement between the two procedures. Both agree that the majority of traces correspond to tunnelling, with the next largest group being continuous plateaus and finally broken plateaus. A visual inspection, however, shows that although the tunnelling and broken groups are quite similar, the continuous-plateau histograms differ in the region below the main conductance cloud (i.e. below log(G/G0) = −5 down to the noise). The histogram from the CNN method contains few data points in this region, which is to be expected for continuous junctions when the molecule remains attached until final break-down. On the other hand, the PSA-generated histogram shows a clear extension of the background tunnelling slope below the molecular cloud. This suggests that some traces containing a tunnelling signal below the main plateau level are being counted as plateau traces by the PSA (either tunnelling-only traces or broken-plateau traces with an initial tunnelling part). This is consistent with the lower number of traces classified as tunnelling by the PSA. Such misassignment is possible because a relatively short length parameter must be used to identify plateaus, owing to the fluctuations along typical G(z) traces; in turn, traces with slight deviations from exponential decay often fit the criteria for a continuous plateau. We generally find this is unavoidable regardless of the parameters used. To demonstrate this, we passed the continuous-classified traces through a k-means clustering algorithm, which should be able to distinguish traces with small deviations from pure exponential behaviour from true molecular plateaus (see Section S10 of the SI for further details).
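This second clustering step can be sketched as follows (illustrative; the exact feature representation and cluster count used are described in Section S10 of the SI):

```python
from sklearn.cluster import KMeans

# traces: (n_traces, n_points) array of log(G/G0) values, resampled to a
# common length, for the traces the PSA classified as continuous plateaus
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(traces)

# One cluster should collect near-exponential (tunnelling-like) traces and
# the other true molecular plateaus; inspect the cluster centres to decide
labels = kmeans.labels_
centres = kmeans.cluster_centers_
```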
The updated tunnelling/continuous-plateau/broken-plateau histograms after performing this second step are shown in panels h–j of Fig. 4. Now, the visual agreement between all groups is very good. A slight discrepancy remains, however, in the number of traces in the broken/continuous groups. Specifically, the CNN classifies about 30% more traces as continuous plateaus than the PSA, and about 30% fewer as broken plateaus. We do not expect perfect agreement, particularly as there is no concrete dividing line between continuous and broken plateaus. For example, the conductance may drop close to the noise threshold without the molecule actually detaching. The PSA may consider such traces as broken because its minimum conductance threshold must sit above the instrument noise level; our CNN is less sensitive to this criterion. Part of the difference may also be ascribed to the greater sensitivity of the PSA towards very small “breaks” in plateaus, which would correspond to a tiny change in the contrast of a single pixel in the images generated in the CNN method.
Overall, the CNN and PSA perform similarly, showing that both are capable of confidently identifying different G(z) trace types. The CNN, however, outperforms the PSA in correctly identifying tunnelling traces, whereas the PSA required a second clustering step to clean up the traces misclassified as continuous plateaus. This is a clear advantage of the CNN model. In the future, the CNN model could be further trained to detect other generic shapes (such as plateaus with either a positive or negative slope), and we envisage that a library of shapes could be constructed over time and used to screen datasets. This task is ideally suited to a neural network because, as mentioned previously, UML algorithms struggle with relatively small subpopulations. Our current CNN may be limited in the resolution of detail it can identify, but this may be mitigated through further training and/or a finer pixel array in the 2D map. Further, we imagine that once a library of shapes has been built up, it could be used to search for unusual/exotic behaviours via a process of elimination (similar to what could also be done with G(V) curves). We also believe that combining SML models with UML algorithms may unlock new potential for discovering as-yet unknown behaviours. A symbiotic approach, combining the best of both, could be used to identify known shapes and then focus on much smaller subpopulations that are simultaneously unknown and rare. This is something which neither technique could do independently, and which may unlock new insights from large datasets that would otherwise be beyond reach.
The large difference in shapes we have been able to classify shows that this model is well suited to analysing multi-component datasets. Further still, our model is able to distinguish subtly different shapes, recognisable by eye, but difficult to separate purely numerically. Though focussed on molecular conductance traces, our approach should be transferable to other research areas where the mapping of traces into 2D images is possible, exploiting the rich technology developed for the treatment of images.
Supplementary information: examples of training data, details of the deep learning approach and a comparison with other classification models. See DOI: https://doi.org/10.1039/d5dd00207a.
This journal is © The Royal Society of Chemistry 2025