András Magyarkuti (a,b), Nóra Balogh (a), Zoltán Balogh (a,b), Latha Venkataraman (c,d) and András Halbritter* (a,b)
(a) Department of Physics, Budapest University of Technology and Economics, 1111 Budapest, Budafoki ut 8, Hungary. E-mail: halbritt@mail.bme.hu
(b) MTA-BME Condensed Matter Research Group, 1111 Budapest, Budafoki ut 8, Hungary
(c) Department of Applied Physics, Columbia University, New York, New York 10027, USA
(d) Department of Chemistry, Columbia University, New York, New York 10027, USA
First published on 25th March 2020
Single-molecule break junction measurements deliver a huge number of conductance vs. electrode separation traces. During such measurements, the target molecules may bind to the electrodes in different geometries, and the evolution and rupture of the single-molecule junction may also follow distinct trajectories. Unraveling the various typical trace classes is a prerequisite for the proper physical interpretation of the data. Here we exploit the efficient feature recognition properties of neural networks to automatically find the relevant trace classes. To eliminate the need for manually labeled training data, we apply a combined method that automatically selects training traces according to the extreme values of principal component projections or of auxiliary measured quantities. The network then captures the features of these characteristic traces and generalizes its inference to the entire dataset. The use of a simple neural network structure also provides direct insight into the decision-making mechanism. We demonstrate that this combined machine learning method is efficient in the unsupervised recognition of non-obvious, but highly relevant trace classes within low and room temperature gold–4,4′-bipyridine–gold single-molecule break junction data.
Artificial intelligence methods are now widely used in many fields of science and technology, providing a rapidly developing tool to recognize relevant features in data without guidance by human intuition. In molecular electronics, it has also been demonstrated that machine learning protocols, such as unsupervised vector-based classification,17 reference-free clustering,18 fast data sorting with principal component analysis,19 deep auto-encoder based clustering,20 and neural network based classification,21 may become useful tools for data analysis. In the latter work, we demonstrated the successful classification of single-atom and single-molecule break junction data relying on recurrent neural networks that were trained either on computer-simulated data or on manually selected and labeled experimental traces. Although this approach classified the traces successfully, it can clearly be further optimized in various respects. On the one hand, the rather complex recurrent neural networks were found to be sensitive to the choice of the network parameters: excellent classification was only achieved for some specific parameter sets. On the other hand, the training method also requires improvement: whereas training on computer-simulated data is expected to gain importance as the predictive power of simulations develops, for experimental training sets it would clearly be favorable to eliminate the need for manual labeling. In this paper, we step forward towards an unsupervised learning protocol. We apply a combined method using the simplest possible feed-forward neural network, with a single hidden layer, for feature identification. The training is based either on the principal component (PC) projections of the conductance traces or on an additional measured quantity, like force. In both cases, the two sides of the distributions (i.e. the extreme PC projections or rupture force values) are used for the training, such that the network first learns from traces with clear features and then generalizes to traces with less obvious character. We demonstrate that this approach exhibits excellent performance on break junction data: (i) the classification results are insensitive to the precise choice of the network parameters; (ii) thanks to the simple network structure, one can extract valuable information about the decision-making mechanism; (iii) this unsupervised training protocol provides classification results similar to those of human feature recognition.
The cryogenic temperature measurements differ from the room temperature data in three clear respects: (i) the 1D histogram exhibits a single peak around the LowG region (Fig. 1D); (ii) the pick-up rate decreases significantly (≈30–40%); (iii) the stability of the junction increases significantly. The latter two features are clearly reflected in the 2D histogram: due to the reduced pick-up rate, the traces with molecular plateaus are mixed with tunneling traces, in which the exponential decay of the tunnel current between the metallic apexes is clearly resolved thanks to the enhanced stability (see the 2D histogram and a sample molecular and tunneling trace in Fig. 1C).
In the following, we analyze these datasets using the neural network illustrated in Fig. 2A. The inputs I_i of the feed-forward neural network are simply the histograms of the individual conductance traces, I_i = N_i(r), the number of data points in bin i on trace r. Using logarithmic binning, this histogram is restricted to the G = 10⁻⁵–10¹ G0 conductance range for the room temperature measurement and to G = 10⁻⁶–10¹ G0 for the low-temperature measurement. The size of the network's input vector, i.e. the number of histogram bins M, is an adjustable parameter of the network, together with the number of neurons in the hidden layer (H). Each neuron in the hidden layer sums the incoming signals weighted by the synapses between the input and the hidden layer and adds a bias offset. These neurons then output the summed signal through a nonlinear (sigmoid) activation function. The outputs of the hidden layer are similarly fed to the output layer, which consists of a single neuron. The output value can be interpreted as the result of a binary classification, e.g. the trace is classified as a molecular/tunneling trace if the network output is larger/smaller than 0.5. The network is trained on a subset of the traces that are labeled according to a specific criterion (e.g. molecular trace vs. tunneling trace). During training, the optimized weight and bias values are found using the backpropagation algorithm implemented in the TensorFlow machine learning platform.26 Finally, the trained network can classify any conductance trace, including those that were not used for training.
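The network described above can be sketched in plain Python. This is a minimal, dependency-free illustration, not the authors' TensorFlow implementation: the names `trace_histogram` and `TinyNet`, the weight initialization, the learning rate, the per-sample gradient descent on a binary cross-entropy loss, and the optional normalization of each histogram by its number of in-range points (added to keep the sigmoids out of saturation) are all our own assumptions.

```python
import math
import random


def trace_histogram(conductance, g_min=1e-5, g_max=1e1, m=100, normalize=True):
    """Log-binned histogram N_i(r) of one trace r, with M = m bins in [g_min, g_max)."""
    lo, hi = math.log10(g_min), math.log10(g_max)
    width = (hi - lo) / m
    counts = [0] * m
    for g in conductance:
        if g <= 0:  # non-positive (noise) points cannot be log-binned
            continue
        x = math.log10(g)
        if lo <= x < hi:
            counts[min(int((x - lo) / width), m - 1)] += 1
    total = sum(counts)
    if normalize and total > 0:  # assumption: feed fractions instead of raw counts
        return [c / total for c in counts]
    return counts


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """M inputs -> H sigmoid hidden neurons -> 1 sigmoid output neuron."""

    def __init__(self, m, h, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(h)] for _ in range(m)]  # input -> hidden
        self.b1 = [0.0] * h
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(h)]  # hidden -> output
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(xi * row[j] for xi, row in zip(x, self.w1)) + self.b1[j])
                  for j in range(len(self.b1))]
        out = sigmoid(sum(hj * wj for hj, wj in zip(hidden, self.w2)) + self.b2)
        return hidden, out

    def train(self, xs, ys, lr=0.5, epochs=1000):
        """Backpropagation with per-sample gradient descent on binary cross-entropy."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                hidden, out = self.forward(x)
                d_out = out - y  # gradient w.r.t. the output pre-activation
                for j, hj in enumerate(hidden):
                    d_hid = d_out * self.w2[j] * hj * (1.0 - hj)  # uses w2 before its update
                    self.w2[j] -= lr * d_out * hj
                    self.b1[j] -= lr * d_hid
                    for i, xi in enumerate(x):
                        self.w1[i][j] -= lr * d_hid * xi
                self.b2 -= lr * d_out

    def classify(self, x):
        """1 = molecular, 0 = tunneling (network output above/below 0.5)."""
        return 1 if self.forward(x)[1] > 0.5 else 0
```

With the H/M = 1.5 ratio used in the text, a 100-bin input would correspond to `TinyNet(100, 150)`; in practice the hand-written loop would be replaced by a library such as TensorFlow, as in the original work.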
In this paper, we demonstrate the classification performance of these neural networks in the following scenario: (i) As an initial test, we separate molecular traces from tunneling traces in the cryogenic temperature data. Relying on the manual labeling of all traces, we first train the network on a part of the traces and then evaluate the classification accuracy on the entire dataset. We then study the decision-making mechanism by analyzing the weight products of the network. (ii) We perform the same classification, replacing the manually labeled data with a training set that is automatically generated by principal component projection. (iii) We further analyze the molecular traces by applying an additional principal component decomposition to the molecular traces classified in (ii). The results of this analysis explain the difference between the room and low-temperature conductance histograms. (iv) Finally, we analyze the room temperature traces, where the conductance data are supplemented with force measurements. The force data are only used to label the training traces, demonstrating that afterwards the network is able to identify the relevant trace classes using solely the conductance data. The results of these classification tasks demonstrate that our protocol performs well in automatically finding the relevant parts of the traces, reflecting distinct junction formation trajectories. This would be a challenging and time-consuming task for manual data analysis; therefore, our combined classification method provides valuable guidance in understanding the physical processes in single-molecule junctions.
It is important to note that, although the manual labeling of most traces is obvious, defining a custom filtering algorithm that reproduces it is a more delicate task. The most obvious filter would rely on the number of data points within the conductance region of the molecular plateau, assigning a molecular/tunneling label to traces with more/fewer points than a suitable threshold. This algorithm provides significantly worse classification accuracy (≈85–90%, depending on the chosen conductance range) than the neural network. Furthermore, this simplified method systematically misclassifies around 25–30% of the molecular traces, whereas the neural network algorithm yields a misclassification rate below 10%. This comparison makes it clear that such a simple criterion is insufficient for proper classification; instead, combined features should be evaluated, including e.g. the slope of the trace within a suitable region, or a comparison of the number of data points in multiple conductance regions.
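For comparison, the single-criterion filter discussed above can be sketched as follows. The conductance window and the point threshold are hypothetical parameters that would have to be tuned per dataset; the helper `misclassification_rate` computes the per-class error used in the comparison (e.g. the fraction of molecular traces that end up labeled as tunneling).

```python
def count_in_window(conductance, g_low, g_high):
    """Number of data points of one trace inside the conductance window [g_low, g_high)."""
    return sum(1 for g in conductance if g_low <= g < g_high)


def threshold_label(conductance, g_low, g_high, n_min):
    """Single-criterion filter: 1 = molecular if enough plateau points, else 0 = tunneling."""
    return 1 if count_in_window(conductance, g_low, g_high) >= n_min else 0


def misclassification_rate(true_labels, predicted, cls):
    """Fraction of traces with true label `cls` that received a different label."""
    pairs = [(t, p) for t, p in zip(true_labels, predicted) if t == cls]
    if not pairs:
        return 0.0
    return sum(1 for t, p in pairs if p != t) / len(pairs)
```

Because the filter looks at a single window only, it cannot express the combined "few plateau points and many points below the plateau" condition that the network learns.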
The neural network algorithm automatically finds a proper, combined classification criterion. Due to the simple structure of our neural network, it is also possible to gain quantitative insight into the network's decision-making algorithm. As a simple measure, we can calculate the summed weight products for all the routes between a certain input and the output, SWP_i = Σ_h w_ih w_h, where w_ih is the weight of the synapse between input i and hidden neuron h, w_h is the weight of the synapse between hidden neuron h and the output neuron, and the sum runs over all H hidden neurons. If SWP_i is a large positive/negative number for a certain input, a large input value (i.e. a large histogram count) pushes the decision towards the molecular/tunneling label, respectively. If SWP_i is close to zero for a certain input, this input is not relevant in the decision-making process. For our particular network, the SWP plot displays large positive values in the region of the molecular plateau (see the region with light red background in Fig. 2F), and a large negative region is observed at lower conductances (see the region with light blue background in Fig. 2F). In the latter region, the molecular traces display a jump, whereas the tunneling traces contain significant counts. The SWP plot makes it clear that the network checks combined criteria: to give a trace a tunneling label, it is not enough to have a small number of points in the region of the molecular peak; there should also be enough points at even lower conductance values, where a molecular trace would exhibit a jump. Checking these combined criteria clearly reduces the misclassification of molecular traces compared to the single-criterion filter discussed above.
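The summed weight products follow directly from the trained weights. A minimal sketch, assuming the input-to-hidden weights are stored as an M×H nested list `w1` (row i holds the w_ih values) and the hidden-to-output weights as a length-H list `w2`; like the SWP definition itself, it ignores the biases and the sigmoid nonlinearity, so it is a linearized relevance measure. The `most_relevant` ranking helper is our own addition.

```python
def summed_weight_products(w1, w2):
    """SWP_i = sum over hidden neurons h of w1[i][h] * w2[h], for every input i."""
    return [sum(w_ih * w_h for w_ih, w_h in zip(row, w2)) for row in w1]


def most_relevant(swp, k):
    """Indices of the k inputs with the largest |SWP_i| (positive or negative)."""
    return sorted(range(len(swp)), key=lambda i: abs(swp[i]), reverse=True)[:k]
```

Plotting `summed_weight_products(w1, w2)` against the bin centers gives the SWP plot; the sign of each entry tells which label a large count in that bin favors.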
For the above analysis, we used 100 input neurons, i.e. 100 histogram bins. Next, we check the stability of the neural network performance against the number of input bins (M). Fig. 2G demonstrates that reducing the number of bins even slightly increases the classification accuracy down to ≈10 bins, below which the accuracy drops. However, once the SWP plot has revealed the conductance regions relevant for the decision making, we can define customized bins to focus our analysis on the most relevant regions. In this particular case, we can reduce the number of input neurons to two by calculating the number of data points in the two relevant regions of the SWP plot (i.e. the regions with light red and light blue background). This highly simplified network also achieves 93% accuracy (see the red dot in Fig. 2G). We generally use the ratio H/M = 1.5, where H is the number of neurons in the hidden layer, but a broader range around this value also provides similar classification results.
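Replacing the 100 uniform bins by a few customized bins only changes how the input vector is built. A minimal sketch, in which the region boundaries in `REGIONS` are hypothetical placeholders (in units of G0) that would be read off the SWP plot, and the hidden-layer size follows the H/M = 1.5 rule of thumb quoted above.

```python
def region_counts(conductance, regions):
    """One network input per customized bin: points whose conductance lies in [lo, hi)."""
    return [sum(1 for g in conductance if lo <= g < hi) for lo, hi in regions]


def hidden_size(m, ratio=1.5):
    """Number of hidden neurons H for M inputs, using H/M = ratio (at least one neuron)."""
    return max(1, round(ratio * m))


# Hypothetical regions: a "molecular plateau" window and a lower-conductance window.
REGIONS = [(1e-4, 1e-3), (1e-6, 1e-5)]
```

Each trace then enters the network as a two-component vector, and the network needs only `hidden_size(2)` hidden neurons.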
Fig. 3 Classification of the low temperature tunneling/molecular traces using principal component projections. (A) The correlation plot of the entire dataset, C_i,j = 〈δN_i(r)δN_j(r)〉_r / √(〈δN_i(r)²〉_r〈δN_j(r)²〉_r), where i, j represent the conductance bin labels, δN_i(r) = N_i(r) − 〈N_i(r)〉_r, and the 〈〉_r averaging is performed along the r trace index. (B) Principal components of the correlation matrix corresponding to the three largest eigenvalues. The light red/blue regions are reproduced from the SWP plot in Fig. 2F as a reference. (C) Distribution of the PC2 projections for all measured traces (black), manually labeled molecular (red) and tunneling (blue) traces. Traces with positive/negative projection are classified as molecular/tunneling traces. Conductance histograms of the traces thus classified as molecular (D, E) and tunneling (F, G). The encircled region in panel (D) illustrates that a significant portion of the tunneling traces is misclassified. In the 2D histograms, the traces are aligned at Gref = 10⁻⁵ G0.
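The correlation matrix of panel (A) and the sign-based principal component classification of panel (C) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the eigenvectors are obtained by power iteration with deflation (our choice, adequate because a correlation matrix is symmetric positive semidefinite), and we assume a trace is projected by taking the dot product of its mean-subtracted histogram δN_i(r) with the principal component. The overall sign of an eigenvector is arbitrary, so which sign corresponds to "molecular" has to be fixed by inspection.

```python
import math


def correlation_matrix(hists):
    """C_ij = <dN_i dN_j>_r / sqrt(<dN_i^2>_r <dN_j^2>_r); rows of `hists` are traces."""
    n, m = len(hists), len(hists[0])
    means = [sum(h[i] for h in hists) / n for i in range(m)]
    d = [[h[i] - means[i] for i in range(m)] for h in hists]
    cov = [[sum(row[i] * row[j] for row in d) / n for j in range(m)] for i in range(m)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) if cov[i][i] * cov[j][j] > 0 else 0.0
             for j in range(m)] for i in range(m)]


def leading_components(c, k=3, iters=500):
    """Top-k eigenvectors (and eigenvalues) of a symmetric PSD matrix by power iteration."""
    m = len(c)
    a = [row[:] for row in c]
    comps, eigvals = [], []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(m)]  # arbitrary non-uniform start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        lam = 0.0
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
            lam = math.sqrt(sum(x * x for x in w))
            if lam < 1e-12:  # remaining spectrum is numerically zero
                return comps, eigvals
            v = [x / lam for x in w]
        comps.append(v)
        eigvals.append(lam)
        # Deflation: subtract the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return comps, eigvals


def projections(hists, pc):
    """Projection of each mean-subtracted trace histogram dN(r) onto the component pc."""
    n, m = len(hists), len(pc)
    means = [sum(h[i] for h in hists) / n for i in range(m)]
    return [sum((h[i] - means[i]) * pc[i] for i in range(m)) for h in hists]


def sign_labels(proj):
    """Plain PC classification: positive projection -> 1, otherwise -> 0."""
    return [1 if p > 0 else 0 for p in proj]
```

As the caption notes, a fixed zero threshold on a single projection is exactly what produces the misclassified tunneling traces in panel (D).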
To overcome this problem of the simple PC analysis, we apply a combined approach. We first take the traces from the two sides of the distribution of principal component projections (see the light red and light blue regions in Fig. 4E, each region including 20% of all traces). These traces clearly exhibit the features of the two classes, showing definite tunneling/molecular character; therefore, these two trace sets provide an ideal training set for the neural network illustrated in Fig. 2A. During training, the neural network learns the relevant features of these two trace classes and then generalizes these features to the rest of the traces with less clear character. This combined classification not only resolves the problem of the ill-defined threshold of the principal component projections, but the neural network may also recognize more sophisticated features that a simple principal component analysis cannot capture. Performing this combined analysis, we achieve 93% classification accuracy, and the ratio of misclassified tunneling traces is reduced below 3%. The 2D and 1D histograms of the traces labeled as molecular/tunneling curves are shown in Fig. 4A, B and C, D, respectively. Both the 1D and 2D histograms confirm that this analysis avoids the misclassification of a significant number of traces. The SWP plot (Fig. 4F) exhibits a structure similar to that of the SWP plot in our previous analysis using the manually labeled training set (Fig. 2F).
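Selecting the automatically labeled training set from the two tails of a distribution takes only a few lines. A minimal sketch, assuming one scalar score per trace (a PC projection here, but the same routine applies to the rupture force values used for the room temperature data); the 20% fraction follows the text, while the function name and the convention that the high tail receives label 1 are our own assumptions, since the sign of a principal component is arbitrary.

```python
def tail_training_set(features, scores, fraction=0.2):
    """Label the `fraction` highest-score traces 1 and the `fraction` lowest-score traces 0.

    features: per-trace input vectors (e.g. histograms); scores: one scalar per trace.
    Returns (xs, ys, used_indices); the traces in between are left for the network.
    """
    n = len(scores)
    k = max(1, round(fraction * n))
    if 2 * k > n:
        raise ValueError("fraction too large: the two tails would overlap")
    order = sorted(range(n), key=lambda r: scores[r])  # ascending scores
    low, high = order[:k], order[-k:]
    used = high + low
    xs = [features[r] for r in used]
    ys = [1] * len(high) + [0] * len(low)
    return xs, ys, used
```

After training on `xs`, `ys`, the network is applied to all traces, including the middle 60% that never entered the training set.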
Fig. 4 Combined principal component and neural network method for sorting the traces measured at low temperature. 2D and 1D conductance histograms of the traces classified as molecular (A, B) and tunneling (C, D) traces. In the 2D histograms, the traces are aligned at Gref = 10⁻⁵ G0. (E) Distribution of the PC2 projections for all measured traces (black), training traces labeled as molecular/tunneling traces (light red/blue area), and traces classified as molecular/tunneling traces (red/blue line). (F) SWP plot of the trained neural network (red and blue area). As a reference, PC2 of the correlation matrix is reproduced from Fig. 3B (black line).
This analysis demonstrates that efficient unsupervised feature recognition is a joint effort of principal component and neural network analysis: the PC projections are able to deliver a proper, automatically generated training set, but without the neural network supplement, the PC analysis would miss the proper classification thresholds.
Fig. 5 Unsupervised identification of the relevant trace classes among the molecular traces (i.e. using the traces with molecular label according to Fig. 4A and B). (A) Correlation matrix of the traces with molecular label. (B) PC1 (black): principal component corresponding to the largest eigenvalue; SWP plot of the trained neural network (red and blue area). (C–F) 2D and 1D conductance histograms of the resulting two trace classes. To highlight the initial part of the traces, Gref = 0.5 G0 alignment is applied in the 2D histograms. The initial configuration histograms of the two trace classes are shown by black lines in panels (C) and (E), respectively. For the first, LowGStart trace class (C, D), the molecular plateau starts in the low conductance region (light red region), whereas in the second, HighGStart trace class (E, F), the molecular plateau starts in the high conductance region (light blue region). (G) Distribution of the PC1 projections: all molecular traces (black line), traces used for training (light red and blue area), and the resulting trace classes: LowGStart (red line) and HighGStart (blue line). (H) Step length distribution of the 1G0 plateau: all molecular traces (black line), LowGStart traces (red line), HighGStart traces (blue line).
Both PC1 and the SWP plot exhibit a further remarkable feature: large negative weights in the region of the quantum conductance unit, 1G0 = 2e²/h (Fig. 5B). This means that a long single-atom plateau with ≈1G0 conductance pushes the classification towards the LowGStart label. To test this, we plot the length distribution of the 1G0 plateaus (i.e. the step length histogram shown by the black line in Fig. 5H), which displays two peaks. This is a clear indicator of monoatomic chain formation.27 After decomposing the step length histogram according to the two trace classes, it becomes clear that the second peak of the step length histogram is suppressed for the HighGStart traces, which means that these traces predominantly appear if the gold monoatomic contact breaks without chain formation. For the LowGStart traces, however, the first peak is suppressed and the second peak is enhanced. This means that these traces predominantly appear if a monoatomic chain was already pulled before the rupture of the gold wire. In the latter case, the chain atoms relax back to the electrodes after the rupture, leaving a significantly larger gap between the apexes than in the former case, when the gold junction breaks without chain formation. In the HighG molecular configuration, the aromatic ring also binds to the side of the electrodes,23 but such a configuration cannot accommodate larger gaps, i.e. after the rupture of monoatomic gold chains this configuration is typically missing. Due to the enhanced mechanical stability at cryogenic temperatures, a sufficiently large portion of the traces exhibits atomic chain formation, which also results in the clear dominance of the LowG molecular peak in the low-temperature 1D conductance histogram, in contrast to the dominance of the HighG peak at room temperature.
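The step length analysis behind Fig. 5H can be sketched as follows. The conductance window around 1G0 and the bin width are hypothetical choices, and we define the plateau length simply as the displacement between the first and last data point inside the window; the original analysis may use a different definition.

```python
def plateau_length(displacement, conductance, g_lo=0.7, g_hi=1.2):
    """Length of the ~1 G0 plateau: displacement span of points with g_lo <= G <= g_hi."""
    inside = [x for x, g in zip(displacement, conductance) if g_lo <= g <= g_hi]
    return inside[-1] - inside[0] if inside else 0.0


def step_length_histogram(lengths, bin_width, n_bins):
    """Histogram of plateau lengths; per the text, a double peak indicates chain formation."""
    counts = [0] * n_bins
    for length in lengths:
        b = int(length // bin_width)
        if 0 <= b < n_bins:
            counts[b] += 1
    return counts
```

Computing this histogram separately for the traces of each class is what reveals the suppressed second peak for HighGStart and the suppressed first peak for LowGStart traces.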
The SWP plot not only highlights the importance of the 1G0 region, but also clearly defines four distinct conductance regions that contribute significantly to the network's decision (see regions R1–R4 in Fig. 5B, where the blue/red areas push the decision towards the HighGStart/LowGStart label, respectively). Similarly to the customized bins in our previous analysis (see the red dot in Fig. 2G), here we can also reduce the input dimension of the original 100-input network by using regions R1–R4 as four customized bins. This simplified network with four inputs reproduces the classification of the original network with 87.5% accuracy. Alternatively, R1, R2, R3 or R4 alone can be used as the single input bin of the network. In this case, the original network's classification is reproduced with 74.9%, 81.5%, 65.4% and 61.7% accuracy, respectively. This means that R2 is the most important region, the 1G0 region (R1) is a rather good precursor of the molecular trace classes, whereas R3 and R4 are less significant.
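The percentages above measure how often a reduced network reproduces the labels of the original 100-input network. A minimal sketch of that agreement score and of ranking the single-region inputs by it; the helper names are hypothetical, and the example dictionary reuses the single-region values quoted in the text.

```python
def agreement(labels_ref, labels_reduced):
    """Fraction of traces on which two classifiers assign the same label."""
    if not labels_ref or len(labels_ref) != len(labels_reduced):
        raise ValueError("label lists must be non-empty and of equal length")
    same = sum(1 for a, b in zip(labels_ref, labels_reduced) if a == b)
    return same / len(labels_ref)


def rank_regions(scores):
    """Region names ordered from the most to the least informative single input."""
    return sorted(scores, key=scores.get, reverse=True)


single_region = {"R1": 0.749, "R2": 0.815, "R3": 0.654, "R4": 0.617}
```

Ranking `single_region` reproduces the ordering stated in the text: R2 first, then R1, with R3 and R4 trailing.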
This analysis demonstrated that our combined classification algorithm automatically found the two relevant trace classes of LowGStart and HighGStart traces. Furthermore, the structure of the principal component and SWP plots gave us a relevant hint that the 1G0 step length acts as a precursor of the molecular trace classes, and highlighted three further regions as the most relevant conductance intervals for the decision making. These conclusions did not require any prior knowledge about the dataset; the classification algorithm recognized the relevant motifs of the traces in an unsupervised way.
The SWP plot (Fig. 6B) shows that the most relevant part of the input data comes from the conductance range corresponding to the LowG configuration. Again, one can use the most relevant conductance intervals highlighted by the SWP plot (see regions R1 and R2 in Fig. 6B) as customized bins, and the reduced network with these two inputs reproduces the decision of the original 100-input network with 83% accuracy.
In contrast, our method relies on a simple, transparent and stable algorithm. Through the principal component approach it eliminates the need for interpreting complex correlation matrices,8,9 the neural network supplement significantly improves the classification accuracy compared to a plain principal component analysis,19 the simple network structure eliminates the sensitivity to the network parameters, and the SWP plot (along with the principal component plot) provides clear insight into the decision-making mechanism. On the other hand, our approach also has some restrictions, and some precautions should be taken in its application: (i) our method uses the histogram bins as inputs, so the displacement information is excluded from the analysis. We find that in most cases the classification is efficient with this simplified input due to the monotonic nature of the conductance traces. However, in some cases, the fine details of the displacement information may become crucial. (ii) Our method essentially performs binary classification; multiple trace classes can only be recognized through iterative binary classification steps (see section 1.3). (iii) In the above analysis, we relied on the most relevant principal components, but in general it is essential to analyze the classification according to all leading principal component projections (e.g. PC1, PC2 and PC3) and to screen the relevance of each projection. Furthermore, improper preparation of the data, like strong temporal inhomogeneities in the temporal histograms10 or the unnecessary inclusion of the background noise level in the data, may introduce dominant but fully irrelevant correlations, which may even dominate the first principal component. In a properly prepared, temporally homogeneous dataset, however, the leading principal components typically reflect highly relevant data classes.
According to the above aspects, we consider the choice between the present method and the more complex methods in ref. 18 and 20 as a trade-off between simplicity and transparency on the one hand, and complexity with optional access to further information (e.g. multiple trace classes and displacement information) on the other.
The room temperature measurements were performed with a scanning tunneling microscope break junction setup supplemented with an AFM cantilever to precisely measure the force as well.25 In this measurement, the molecules were evaporated onto a gold-coated mica substrate.
One-dimensional (1D) histograms were created for each trace by dividing the conductance axis into small ranges (bins) and counting the number of measurement points in each conductance bin. The single-trace histograms are then averaged to obtain the 1D histogram representing the entire dataset. 2D histograms are created by aligning each trace at the crossing of a given conductance level (Gref) and binning along both the conductance and displacement axes. The neural network algorithm was implemented using the TensorFlow machine learning platform.26
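The histogram construction described above can be sketched as follows. This is a minimal illustration under our own assumptions: traces are given as displacement and conductance lists (conductance in units of G0), the alignment point is the first sample whose conductance falls below Gref (no interpolation), and the 2D histogram uses uniform bins in displacement and in log10(G).

```python
import math


def average_histogram(single_trace_hists):
    """1D histogram of the dataset: bin-wise average of the single-trace histograms."""
    n = len(single_trace_hists)
    return [sum(col) / n for col in zip(*single_trace_hists)]


def align_at_gref(displacement, conductance, g_ref):
    """Shift displacement so that x = 0 at the first point where G drops below g_ref."""
    for x, g in zip(displacement, conductance):
        if g < g_ref:
            return [xi - x for xi in displacement]
    return None  # the trace never crosses g_ref and is left out of the 2D histogram


def histogram_2d(traces, x_range, nx, logg_range, ng):
    """Counts h[ig][ix] over aligned (displacement, conductance) traces."""
    (x_min, x_max), (lg_min, lg_max) = x_range, logg_range
    h = [[0] * nx for _ in range(ng)]
    for xs, gs in traces:
        for x, g in zip(xs, gs):
            if g <= 0:
                continue
            lg = math.log10(g)
            if x_min <= x < x_max and lg_min <= lg < lg_max:
                ix = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
                ig = min(int((lg - lg_min) / (lg_max - lg_min) * ng), ng - 1)
                h[ig][ix] += 1
    return h
```

For the low-temperature data discussed above, `g_ref` would be 1e-5 (Fig. 3 and 4) or 0.5 (Fig. 5).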
This journal is © The Royal Society of Chemistry 2020