From the journal Digital Discovery Peer review history

Semi-supervised machine learning workflow for analysis of nanowire morphologies from transmission electron microscopy images

Round 1

Manuscript submitted on 23 Jun 2022
 

18-Aug-2022

Dear Dr Jayaraman:

Manuscript ID: DD-ART-06-2022-000066
TITLE: Semi-supervised machine learning workflow for analysis of nanowire morphologies from transmission electron microscopy images

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines http://www.rsc.org/journals-books-databases/journal-authors-reviewers/author-responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

Training a self-supervised model ‘head’, which is then frozen and coupled with task-specific downstream models is standard practice in computer vision and natural language processing (and in my opinion, what makes CV and NLP so powerful). The application to morphological analysis is clever and will certainly advance the state of deep learning for microscopy. The authors did a good job demonstrating different potential downstream tasks using this workflow. I also appreciate the authors publicly sharing their code and datasets. This openness tells me that they have a desire to advance the community as a whole, which is commendable.

Comment 1: In the Introduction, when discussing past work on transfer learning, only three example citations are given for transfer learning from large-dataset models like ImageNet into a specific domain (citations 8, 9, and 10). This makes it seem like transfer learning is a new technique or not widely used, which isn’t the case. Transfer learning is a very common technique that has been applied in a large number of domains. I suggest the authors collect and cite review articles concerning transfer learning in certain domains, rather than choosing highly specific papers to represent an entire domain. For example, this very recent review (https://link.springer.com/article/10.1007/s12551-022-00949-3) on deep learning-based image processing in optical microscopy includes some discussion of transfer learning, and this review on deep-learning in materials science (https://www.sciencedirect.com/science/article/pii/S0927025622002804), which discusses transfer learning as well.
Here’s the section I’m referring to: “For example, a model that has been trained on ImageNet7, a large dataset of 1.2 million photographic images of macroscopic objects, can be transferred to learn how to analyze images in another more specific domain [e.g., medical image classification8 or morphology classification of soft materials9]. The success of transfer learning in the field of image analysis has paved the way for accessibility to pretrained image learning models for the general public without requiring large computational resources or big data to train from scratch.10”
In addition, a discussion of TEM-specific deep learning methods is missing. See for example the following citations. This list is by no means comprehensive and should be expanded by the authors.
https://www.nature.com/articles/s41524-021-00652-z
https://onlinelibrary.wiley.com/doi/full/10.1002/adts.201800037
https://arxiv.org/abs/2105.07485

Comment 2: In the Introduction, it would be helpful to include discussion of the general idea of training model ‘heads’ to be coupled with task-specific downstream tasks to provide more context to readers who may not be as familiar with deep learning research.

Comment 3: The introduction also has no mention of autoencoders, which are common deep learning architectures for image analysis alongside CNNs. For instance, see the application of an autoencoder for morphology detection in SEM images: https://www.sciencedirect.com/science/article/pii/S0022311521002063
As the workflow in this report relies on a pre-trained encoding, a discussion of autoencoders would be useful. One could also imagine using an autoencoder as the model ‘head’ in this workflow, as autoencoders are typically trained in an unsupervised fashion.

Comment 4: Please provide more information on the “generic microscopy images” from ref 20 used for self-supervised training. For instance, what materials are included in this dataset? What instrument modality is used for imaging? Is there any general information on the morphologies included in this dataset?

Comment 5: The labeled images seem to have all come from the same instrument with the same settings, and possibly the same person controlling the instrument. Do you have any thoughts on the usability of your model on images taken on a different instrument with different settings? There have only been a few studies examining variability in data quality/collection settings, so I don’t expect the authors to spend time collecting new images at different settings and testing them on the trained model, but if the authors already have such images, it would be interesting to test the model behavior.
For instance, research in the medical field involving training object detection models (using transfer learning) on radiographs of varying quality: https://doi.org/10.21037/jmai-20-2. Training on data of varying quality allows models to maintain accuracy across data collected by various methods. But we should not assume a model trained on data of a single quality would apply to data of varying quality.

Comment 6: How does the nanowire segmentation task compare to traditional image analysis tools such as edge detection or masking?

Reviewer 2

In this paper, Lu et al. utilize a semi-supervised machine learning approach with transfer learning to classify morphologies obtained from transmission electron microscopy images. After presenting the overall idea, they compare performance of different image encoders on different tasks and offer interpretation of the results. I think that the major takeaway from the paper is that this self-supervised+transfer learning workflow generally facilitates good classification accuracy from overall few labeled images. This is certainly a nice result, and the approach may be more broadly appealing/inspiring to materials scientists working on other problems. I am not sure I fully appreciate the ramifications of e.g. the one-shot learning exercise as it doesn’t seem practically that laborious to label 10’s of examples, but it nonetheless demonstrates limits/capabilities of the approach.

The paper is overall well written and complete. Authors are “ahead of the curve” and already have a nice github repository and have published their dataset online. I will note that my expertise is not so much in image processing or classification with respect to ML. As someone with more specific expertise in other ML areas but generally well versed in ML concepts, I found the manuscript to be clear and digestible. I am supportive of its publication in Digital Discovery, which is an appropriate venue. I applaud their efforts.

I offer some points for consideration as minor revisions prior to publication. I would summarize my more substantive criticisms as perhaps re-framing some conclusions (to avoid having them be overly strong in their interpretation) or providing some additional controls/articulating the advantages of the approach over my more simple-minded thoughts. These comments are offered to insulate the work against low-hanging criticism.

1. In the introduction, authors provide two recent applications of self-supervised transfer learning (ref 14,15) for classification. I find it interesting that in these cases, the classification accuracy is <80%, despite the fact that both applications have considerably more training data. I can appreciate the point/motivation that soft materials may present with smaller datasets, as the authors point out, but it seems to me that there is something else lurking here with regards to the nature of the classification task itself. Authors point out “researchers in the soft materials domain handle much smaller datasets and have a more diverse range of image analysis tasks,” which makes it sound as if the problem for soft materials is indeed more complex than medical imaging. That might be true, but the authors have less data and achieve superior classification accuracies for their varied tasks. I might suspect that the classification tasks presented here are simpler but necessary stepping stones. If that is not the case, can authors reconcile or contextualize my observation/confusion? This would be useful as others look to apply these techniques to their own problems with level expectations.


2. “The difference between the two methods – SimCLR 13 and Barlow-Twins 21- is in the loss function as described in the methods section.” I think it would be appropriate to at least provide a high-level summative statement about the differences here. In fact, I think the statements that are made in the methods are perfectly fine and would be fair to include in the main text. In the methods, I would then prefer to see the equations for the loss function.

3. Are the microscopy images in Fig 2 (and I suppose in general) all at the same length-scale? That would seem to be an important factor for consideration in a task like this one. A comment may be warranted about how the sizes are controlled, etc. I am curious what area a 224x224 image maps to in physical space. Maybe this can be indicated in the figure caption. Edit: I notice this information is present in the methods. However, because I think other readers will have similar thoughts, I suggest a short statement regarding the different magnification and what is of interest/not in the main text.

4. “We believe that SimCLR performs worse when trained on generic TEM images due to the reduced contrast in generic TEM images compared to that in ImageNet images.” Can authors expand on this/complete the thought? I do not think this will be self-evident to most readers. Why is the reduced contrast important in SimCLR vs. Barlow-Twins?

5. “We conclude that “opposite side anchors” are better approximates of the medians of the test set than “same side anchors”, thereby leading to good accuracies.” This is perhaps the most contentious point that I will raise. I am not sure I agree with this conclusion, or at least the way in which it is phrased. I think that the concluding statement is conflating two aspects: (i) how close are the anchor images to the median of the test and (ii) the relative positioning of the anchors versus the medians. Consider the position of the “network” anchor in panel 3E… its value would actually be closer or similarly close to the median of the network test set in fig 3d. It is also worth pointing out that the portrayal could be biased(?) since the distributions significantly differ between 3D vs. 3E? Do anchors get tested on multiple test sets? The caption might be best served to indicate that these are representative.

Anyway, the discussion is focusing on the relative positioning of the anchors whereas the conclusion is focusing on how close the anchors are to the median. It seems to me that one could control for this by picking anchor images at fixed differences of pixel density away from the test set median but in different directions of the median itself. If m1 and m2 are the two medians and a1 and a2 are the respective anchors, then there are four scenarios: m1a1a2m2, a1m1m2a2, m1a1m2a2, a1m1a2m2. Anyway, I do not think the authors necessarily need to test these scenarios in the way that I just described (they could if they think that makes sense), but I think it should be made clear what is seemingly important: the relative positioning or the distance from the median. The conclusion offered in the main text could be easily verified by reporting the average deviation from the median for same side vs opposite side anchors.

6. Sometimes authors accidentally use “dice score,” but Dice is a proper noun and should be capitalized.

7. With respect to the results and discussion surrounding fig 5: one “concern” I have is that this morphology classification is not so much actually distinguishing morphology but something like pixel density, which appears very different across the classes. This reminds me of training an intentionally bad classifier to distinguish dogs vs wolves when the classifier learned not anything related to the difference between dogs and wolves but rather the presence of grass versus snow (dogs most often being depicted on grass and wolves most often being depicted on snowy landscapes). Perhaps authors could show what the distribution of pixel densities are and the extent of overlap for these three classes. Also, in lieu of the encoder feature maps, authors could train a “control classifier” that is just based on these obvious characteristics (e.g., pixel density) and demonstrate to what extent the encoder feature map offers an improvement on the prediction task.

Reviewer 3

The authors developed machine learning methods based on the semi-supervised transfer learning approach to automate the analysis of transmission electron microscopy (TEM) images. They tested the transfer learning ability of the SimCLR and Barlow-Twins models on TEM images of protein nanowires. Those models were trained to classify the nanowire morphologies. I find that the authors’ work is interesting. However, I have some specific comments and suggestions as follows.

1. There are previous projects that used transfer learning for the classification of nanomaterials. In the introduction section, I recommend adding some statements to mention how those methods are applied to classify the nanomaterials. Moreover, I recommend comparing the method introduced in this paper against the transfer learning methods for nanomaterials used in those papers (advantages, disadvantages, and differences). Some papers are mentioned as follows.

a) Modarres, M.H., Aversa, R., Cozzini, S. et al. Neural Network for Nanoscience, Scanning Electron Microscope Image Recognition. Sci Rep 7, 13282 (2017). https://doi.org/10.1038/s41598-017-13565-z

b) Han, Y., Liu, Y., Wang, B. et al. A novel transfer learning for recognition of overlapping nano object. Neural Comput & Applic 34, 5729–5741 (2022). https://doi.org/10.1007/s00521-021-06731-y

2. On page 30, the authors mentioned that “We label the nanowires with blue masks……”. Is this blue mask a technical term or just a color used? Can the authors describe “blue masking” in the manuscript?

Machine Learning Model and Its Performance

3. In Fig. 4, the Dice scores are between 0.78 and 0.8, and the IoU scores are between 0.61 and 0.68. On page 19, it is mentioned that “With transferred encoder, our Unet model can achieve good performance (median Dice score > 0.70) with just 8 labeled images per class for training, less than half of the number of test images (20 per class).” How do authors justify that those accuracies are good enough for an accurate segmentation?

4. It is mentioned that 4-fold cross-validation was performed. I recommend adding the cross-validation results to the manuscript to see how the results of the classifiers will generalize to an independent data set.

5. Can the authors show the cross-validation results at the end of the following notebooks in their Github page?

1. Assessment of classification performance on mNP dataset.ipynb
2. Assessment of classification performance on TEM virus dataset
3. Assessment of classification label-efficient training of downstream classification task.ipynb

6. If a user wants to run the Colab notebooks created in this project, how do they download the trained models (weights) like 'barlow_resnet_batch64_project128_64_1024_seed%i.h5'?

7. I highly recommend adding a manual to show how to use the notebooks in the users’ Github accounts or their computers. That will help the users to use the codes conveniently.


 

To,
Dr. Alán Aspuru-Guzik,
Editor-in-chief, Digital Discovery
Dr. Linda Hung,
Associate Editor, Digital Discovery

Dear Dr. Aspuru-Guzik and Dr. Hung

Please find enclosed our revised manuscript “Semi-supervised machine learning workflow for analysis of nanowire morphologies from transmission electron microscopy images” by Shizhao Lu, Brian Montz, Todd Emrick, and Arthi Jayaraman. I, Arthi Jayaraman, am the corresponding author for this manuscript, and I am submitting this manuscript to Digital Discovery on behalf of all authors.

On behalf of all authors, I thank you both for recruiting three excellent reviewers for our manuscript. All three reviewers expressed significant enthusiasm about our work in this manuscript and had valuable suggestions/questions that have improved our manuscript. We are now submitting this revised manuscript and our detailed response to reviews document. We hope you will find this manuscript acceptable for publication in your journal.

As noted in our original submission, the workflow codes are presented as open access in https://github.com/arthijayaraman-lab/self-supervised_learning_microscopy_images. The image dataset of nanowire morphologies is deposited on the open-access data repository Zenodo with DOI: 10.5281/zenodo.6377141. A preprint of the initial draft was submitted at arXiv on March 25, 2022, with link attached here: https://arxiv.org/abs/2203.13875.

Thank you very much for your consideration.

Sincerely,

Arthi Jayaraman, Ph.D.
Centennial Term Professor for Excellence in Research and Education,
Department of Chemical and Biomolecular Engineering
Professor, Department of Materials Science and Engineering
Faculty Council, Data Science Institute
University of Delaware (UD)
Newark, DE, USA

Email: arthij@udel.edu and Phone: 302 831 8682
UD Faculty website: https://cbe.udel.edu/people/faculty/arthij/

Director, NSF-NRT Computing and Data Science Training for Materials Innovation, Discovery, and Analytics

Associate Editor, Macromolecules
Deputy Editor, ACS Polymers Au

The following text has been copied from the PDF response to reviewers and does not include any figures, images or special characters:

RESPONSE TO REVIEWS
RSC Digital Discovery
Manuscript ID: DD-ART-06-2022-000066
Title: " Semi-supervised machine learning workflow for analysis of nanowire morphologies from
transmission electron microscopy image"
Author(s): Lu, Shizhao; Montz, Brian; Emrick, Todd; Jayaraman, Arthi*

We thank the editor and all three reviewers for their time and effort in reviewing our manuscript. We found
the comments from all three reviewers valuable and have addressed all three reviewers’
comments/questions/suggestions in our revised manuscript. Our responses to the reviews are in blue font.
Changes in the main manuscript and supplementary information are highlighted in yellow. (Please note that
the citation numbers are updated separately in the response and are not in the same order as in the main
manuscript.)

Referee: 1

Training a self-supervised model ‘head’, which is then frozen and coupled with task-specific
downstream models is standard practice in computer vision and natural language
processing (and in my opinion, what makes CV and NLP so powerful). The application to
morphological analysis is clever and will certainly advance the state of deep learning for
microscopy. The authors did a good job demonstrating different potential downstream
tasks using this workflow. I also appreciate the authors publicly sharing their code and
datasets. This openness tells me that they have a desire to advance the community as a
whole, which is commendable.

Our Response: We thank this reviewer for their generous positive comments about our work.
By demonstrating the efficacy of this workflow on protein nanowire morphologies and extending it to
nanoparticle and virus morphologies, we show the nanoscience community how to use this novel and
accessible machine learning workflow for tasks critical to their research.

Comment 1: In the Introduction, when discussing past work on transfer learning, only
three example citations are given for transfer learning from large-dataset models like
ImageNet into a specific domain (citations 8, 9, and 10). This makes it seem like transfer
learning is a new technique or not widely used, which isn’t the case. Transfer learning is a
very common technique that has been applied in a large number of domains. I suggest the
authors collect and cite review articles concerning transfer learning in certain domains,
rather than choosing highly specific papers to represent an entire domain. For example,
this very recent review (https://link.springer.com/article/10.1007/s12551-022-00949-3) on
deep learning-based image processing in optical microscopy includes some discussion of
transfer learning, and this review on deep-learning in materials science
(https://www.sciencedirect.com/science/article/pii/S0927025622002804), which discusses
transfer learning as well.

Here’s the section I’m referring to: “For example, a model that has been trained on
ImageNet7, a large dataset of 1.2 million photographic images of macroscopic objects, can
be transferred to learn how to analyze images in another more specific domain [e.g.,
medical image classification8 or morphology classification of soft materials9]. The success
of transfer learning in the field of image analysis has paved the way for accessibility to
pretrained image learning models for the general public without requiring large
computational resources or big data to train from scratch.10”

Our Response: We thank the reviewer for these excellent suggestions worth including in the
introduction. We have revised our introduction following the reviewer’s comments to include citations of
a couple of reviews on deep learning applications in microscopy image learning, object detection in
material science, and microstructure characterization.

Development of modern ML models benefits in performance from the procurement of big datasets related
to a specific task. To bypass the need to collect large training data and reduce the time needed to train the
ML model from scratch on that large data, researchers use ‘transfer learning' techniques.1 Transfer
learning involves leveraging the knowledge of a model previously trained using large training datasets to
create a new model for another related task. For example, a model that has been trained on ImageNet2, a
large dataset of 1.2 million photographic images of macroscopic objects, can be transferred to learn how
to analyze images in another more specific domain [e.g., medical image analysis3 and electron microscopy
image analysis in material science4-8]. The success of transfer learning in the field of image analysis has
paved the way for accessibility to pretrained image learning models for the general public without requiring
large computational resources or big data to train from scratch.9

In addition, a discussion of TEM-specific deep learning methods is missing. See for
example the following citations. This list is by no means comprehensive and should be
expanded by the authors.
https://www.nature.com/articles/s41524-021-00652-z
https://onlinelibrary.wiley.com/doi/full/10.1002/adts.201800037
https://arxiv.org/abs/2105.07485

Our Response: We also include a discussion of the diverse electron microscopy image analysis tasks in the
materials science domain to which researchers in the field have recently applied computer vision deep
learning:
Recent advances in deep learning have led to a surge of applications in electron microscopy image analysis
for a diverse set of tasks in two main categories: discriminative and generative. Discriminative tasks are
tasks like morphology/phase classification10-13, particle/defect detection14-17, image quality assessment18-20,
and segmentation21-26 where the objective is quantified by how well the model can distinguish (1) between
images or (2) between objects and their background. Generative tasks include microstructure
reconstruction27-29, super resolution30-32, autofocus33 and denoising34,35 where the objective is generation of
images with certain desired traits.

Comment 2: In the Introduction, it would be helpful to include discussion of the general
idea of training model ‘heads’ to be coupled with task-specific downstream tasks to provide
more context to readers who may not be as familiar with deep learning research.

Our Response: We thank this reviewer for the suggestion to include a discussion on the usage of semi-supervised
learning in the introduction. We also realize that the manuscript was missing a gradual
introduction to semi-supervised learning, which could have confused some readers when we referred to
semi-supervised learning in the later sections. To address both these points, we have now revised our
introduction in the following way:

To overcome limitations of labeling, a semi-supervised training workflow is another option, which typically
consists of unsupervised training of a feature encoder requiring no manual labeling and supervised
training of a specific downstream task model requiring manual labels.36,37 Chowdhury et al. developed a
semi-supervised approach consisting of a feature extractor, a feature selector, and a classifier to classify
different dendritic microstructures.10 Peikari et al. developed a cluster-then-label semi-supervised
approach for classifying pathology images.38

Comment 3: The introduction also has no mention of autoencoders, which are common
deep learning architectures for image analysis alongside CNNs. For instance, see the
application of an autoencoder for morphology detection in SEM
images: https://www.sciencedirect.com/science/article/pii/S0022311521002063
As the workflow in this report relies on a pre-trained encoding, a discussion of
autoencoders would be useful. One could also imagine using an autoencoder as the model
‘head’ in this workflow, as autoencoders are typically trained in an unsupervised fashion.

Our Response: Although autoencoders can also be used for learning representations of images, they are
not as competitive as recent self-supervised learning methods for image representation learning. In a
recent review, Liu et al. attribute the more competitive status of self-supervised learning compared to
autoencoders in classification scenarios to the closer alignment of its learning goal with that of vision
tasks targeting high-level abstraction, such as classification and object identification.42 We appreciate the
reviewer’s suggestion and have included a brief discussion on autoencoders in the introduction as follows:

To overcome limitations of labeling, a semi-supervised training workflow is another option, which typically
consists of unsupervised training of a feature encoder requiring no manual labeling and supervised
training of a specific downstream task model requiring manual labels.36,37 Chowdhury et al. developed a
semi-supervised approach consisting of a feature extractor, a feature selector, and a classifier to classify
different dendritic microstructures.10 Peikari et al. developed a cluster-then-label semi-supervised
approach for classifying pathology images.38 A family of generative architectures called autoencoders has
also been used to obtain pretrained feature maps of images.39,40 An autoencoder architecture involves
training an encoder to condense the information from the original image into a low-dimensional feature
map, and a decoder that tries to reconstruct the original image from the feature map. More recently, self-supervised
learning of images has emerged as a new form of label-free, unsupervised training.41 In a recent
review, Liu et al. attribute the more competitive status of self-supervised learning compared to
autoencoders in classification scenarios to the closer alignment of its learning goal with that of vision
tasks targeting high-level abstraction, such as classification and object identification.42 Through self-supervised
training, the ML model learns a representation of an image by maximizing the similarity between two
different transformed versions of the same image.

Comment 4: Please provide more information on the “generic microscopy images” from ref
20 used for self-supervised training. For instance, what materials are included in this
dataset? What instrument modality is used for imaging? Is there any general information
on the morphologies included in this dataset?

Our Response: The CEM500k dataset that we used as the generic microscopy images is curated by
Ryan Conrad and Kedar Narayan45 and contains electron microscopy (EM) images of cellular and
biomaterial structures obtained with a variety of imaging modalities, from both publicly available sources
and their own experiments. We are greatly appreciative of their efforts to curate an open-access EM
dataset for community machine learning model development and training purposes. We have added this
information following the first mention of the generic microscopy images in the main manuscript as
follows:

We illustrate the conceptual workflow of semi-supervised transfer learning for microscopy images in Fig.
1. First, a generic image learning model, an encoder, undergoes self-supervised training (i.e., no labels
required during training) on a dataset of generic microscopy images called CEM500k45, an open-access
electron microscopy image dataset curated by Conrad and Narayan from images of cellular and
biomaterial structures acquired with various imaging modalities.
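
For readers less familiar with this two-stage pattern, a minimal Keras sketch of the downstream stage is shown below. It is a conceptual illustration rather than the code in our repository: the ResNet50 encoder here is built with random weights as a stand-in for the self-supervised pretrained encoder, which in the actual workflow is loaded from saved weights before being frozen.

import tensorflow as tf

# Stand-in for the pretrained encoder (in the actual workflow, the ResNet50 weights come
# from SimCLR/Barlow-Twins pretraining and are loaded from file before freezing).
encoder = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                          pooling="avg", input_shape=(224, 224, 1))
encoder.trainable = False                    # freeze: only the downstream head is trained

inputs = tf.keras.Input(shape=(224, 224, 1))
features = encoder(inputs, training=False)   # one feature vector per image
outputs = tf.keras.layers.Dense(4, activation="softmax")(features)   # 4 morphology classes
classifier = tf.keras.Model(inputs, outputs)

classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# classifier.fit(few_labeled_images, labels, ...)   # label-efficient supervised training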

Comment 5: The labeled images seem to have all come from the same instrument with the
same settings, and possibly the same person controlling the instrument. Do you have any
thoughts on the usability of your model on images taken on a different instrument with
different settings? There have only been a few studies examining variability in data
quality/collection settings, so I don’t expect the authors to spend time collecting new images
at different settings and testing them on the trained model, but if the authors already have
such images, it would be interesting to test the model behavior.

For instance, research in the medical field involving training object detection models (using
transfer learning) on radiographs of varying quality: https://doi.org/10.21037/jmai-20-2.
Training on data of varying quality allows models to maintain accuracy across data
collected by various methods. But we should not assume a model trained on data of a single
quality would apply to data of varying quality.

Our Response: We appreciate the reviewer’s question regarding the generalizability of our machine
learning workflow to images coming from different modalities. We first want to remind the reviewer that
we have tested and shown the generalizability of the machine learning workflow on the open-access
AutoDetect-mNP dataset and TEM virus dataset, which come from different instruments with very
different settings. Our machine learning workflow has been shown to work on multiple TEM image
datasets for classifying nanowire / nanoparticle morphologies and identifying types of viruses when the
number of labeled training images is limited. Secondly, we want to comment that in dealing with small-data
problems, the quality of the images and the quality of the labeling matter more than the quantity of the
images. With regards to labeling, we have curated a high-quality image dataset of nanowire morphologies
with accurate labeling. With regards to imaging quality, differences in object-background contrast
are always present between images even when taken with the same instrument. We also observe
different contrast in the images in our nanowire dataset. The TEM virus dataset also presents more
variance in terms of image quality (good or poor object-background contrast) and labeling quality
(due to the similarity between some types of viruses).

Comment 6: How does the nanowire segmentation task compare to traditional image
analysis tools such as edge detection or masking?

Our Response: Our nanowire segmentation task is more complex than traditional edge detection tasks
because edge detection models only need to detect places where the pixel intensity changes dramatically.
In contrast, segmentation tasks like our nanowire segmentation also need to consider the high-level
abstraction of nanowires as continuous objects. In terms of choosing a good model to tackle the
segmentation task, we have chosen Unet46, which has been shown in many cases to outperform traditional
segmentation methods.24,47,48
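
As a simple illustration of this difference (toy data only; not an analysis from the manuscript), the sketch below applies Canny edge detection to a placeholder image and scores the resulting edge map against a nanowire mask with the Dice coefficient. Edge detection marks thin boundaries where intensity changes, so its overlap with a filled object mask is inherently poor, whereas a segmentation model is trained to recover the whole object.

import numpy as np
from skimage.feature import canny

def dice_score(pred, truth):
    """Dice coefficient between two boolean masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    overlap = np.logical_and(pred, truth).sum()
    return 2.0 * overlap / (pred.sum() + truth.sum() + 1e-8)

image = np.random.rand(224, 224)        # placeholder for a grayscale TEM image in [0, 1]
mask = np.zeros((224, 224), dtype=bool)
mask[100:110, :] = True                 # toy ground-truth "nanowire" mask

edges = canny(image, sigma=2.0)         # classical edge detection: responds to intensity changes only
print("Dice of Canny edge map vs. nanowire mask:", dice_score(edges, mask))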

Referee: 2

In this paper, Lu et al. utilize a semi-supervised machine learning approach with transfer
learning to classify morphologies obtained from transmission electron microscopy images.
After presenting the overall idea, they compare performance of different image encoders
on different tasks and offer interpretation of the results. I think that the major takeaway
from the paper is that this self-supervised+transfer learning workflow generally facilitates
good classification accuracy from overall few labeled images. This is certainly a nice result,
and the approach may be more broadly appealing/inspiring to materials scientists working
on other problems. I am not sure I fully appreciate the ramifications of e.g. the one-shot
learning exercise as it doesn’t seem practically that laborious to label 10’s of examples, but
it nonetheless demonstrates limits/capabilities of the approach.

The paper is overall well written and complete. Authors are “ahead of the curve” and
already have a nice github repository and have published their dataset online. I will note
that my expertise is not so much in image processing or classification with respect to ML.
As someone with more specific expertise in other ML areas but generally well versed in ML
concepts, I found the manuscript to be clear and digestible. I am supportive of its
publication in Digital Discovery, which is an appropriate venue. I applaud their efforts.
I offer some points for consideration as minor revisions prior to publication. I would
summarize my more substantive criticisms as perhaps re-framing some conclusions (to
avoid having them be overly strong in their interpretation) or providing some additional
controls/articulating the advantages of the approach over my more simple-minded
thoughts. These comments are offered to insulate the work against low-hanging criticism.

Our Response: We thank this reviewer for their overall positive comments about our work. We also
greatly appreciate the constructive comments the reviewer has brought forward for us to improve the
paper.

1. In the introduction, authors provide two recent applications of self-supervised transfer
learning (ref 14,15) for classification. I find it interesting that in these cases, the
classification accuracy is <80%, despite the fact that both applications have considerably
more training data. I can appreciate the point/motivation that soft materials may present
with smaller datasets, as the authors point out, but it seems to me that there is something
else lurking here with regards to the nature of the classification task itself. Authors point
out “researchers in the soft materials domain handle much smaller datasets and have a
more diverse range of image analysis tasks,” which makes it sound as if the problem for
soft materials is indeed more complex than medical imaging. That might be true, but the
authors have less data and achieve superior classification accuracies for their varied
tasks. I might suspect that the classification tasks presented here are simpler but necessary
stepping stones. If that is not the case, can authors reconcile or contextualize my
observation/confusion? This would be useful as others look to apply these techniques to
their own problems with level expectations.

Our Response: We appreciate the reviewer’s observation on the classification accuracies obtained in this
study. First, we believe that one cannot compare the classification performances across different datasets
simply by the accuracy numbers because each dataset is different and presents unique problems.
Secondly, while more training data typically would benefit the deep learning model, another important
aspect is the quality of the image data. For example, Modarres et al. have used transfer learning to
classify SEM images belonging to different nanomaterial subcategories like particles, patterned surfaces,
nanowires, etc.11 Having an unbalanced dataset, they observed higher accuracies for categories that have
fewer images. They commented that the categories with fewer images performed better because the
features in those categories were distinct and sufficiently clear to be learned by the network, while other
categories with a larger number of images suffered from having indistinct features.
We attribute our models’ high accuracies on the nanowire morphology classification task to
(1) good curation of the images into balanced, distinguishable categories of real experimental
interest (additional pixel-level percolation analysis was used as a quantification to distinguish
between dispersed and percolating morphologies, the two morphologies that domain scientists
have trouble telling apart), and
(2) the effectiveness of the encoders trained via self-supervised learning.
We have added the following remark to the main manuscript about the importance of good image data
curation, especially for image learning tasks with limited data:
We also show the broader applicability of our machine learning workflow for classification and
identification tasks in other microscopy images (e.g., assembled nanoparticles of various shapes, viruses)
with limited available images for training. While there may exist actionable qualification criteria for
manually labeling an image for an object identification task, subtle morphological differences are
intrinsically harder for human experts to discern and classify into categories. Our machine learning
workflow precisely targets such morphology classification problems to mitigate human biases. In
addition, we also want to emphasize that thoughtful categorization and proper labeling of image data is
crucial regardless of dataset size11 and is especially important for data-limited image learning problems.

2. “The difference between the two methods – SimCLR 13 and Barlow-Twins 21- is in the
loss function as described in the methods section.” I think it would be appropriate to at least
provide a high-level summative statement about the differences here. In fact, I think the
statements that are made in the methods are perfectly fine and would be fair to include in
the main text. In the methods, I would then prefer to see the equations for the loss function.

Our Response: We thank this reviewer for their suggestion and have moved the description of the loss
function of SimCLR and Barlow-Twins methods to the main manuscript. We have also added the
equation form of the two loss functions in the methods section following the reviewer’s suggestion.

Changes made in the main manuscript:
We implement two self-supervised training methods: SimCLR41 and Barlow-Twins49. Both methods start by
taking a batch of images and generating two randomly augmented images for each image by performing
random color/hue/contrast changes and/or randomly cropping a portion of the image. The augmented images
are then turned into feature maps by an encoder with the ResNet5050 architecture. The feature maps are input
into a projector with three layers of fully connected neurons to generate projections of each image. The
projections are then used to calculate and minimize the loss function to train both the encoder and the
projector. Through maximizing the similarity between two augmented versions of the same image, the
encoder is trained to produce feature maps that represent the images more accurately. The difference
between the two methods – SimCLR41 and Barlow-Twins49 – is in the loss function. The loss function of the
SimCLR method aims to maximize the calculated cosine similarity of projections from “true” pairs of
augmented images (from the same image) and minimize that of “false” pairs of augmented images (from
different images). The loss function of the SimCLR method depends on the batch size and the contrast
between images; a larger batch size and higher contrast theoretically give a higher ability to discern “true”
from “false” pairs. The loss function of the Barlow-Twins method aims to minimize the redundancy in the
representation of the projection by tuning the cross-correlation matrix of projections from the same image
to be the identity matrix. The equations of the two loss functions are presented in the methods section with
a more detailed explanation.

Changes made in the methods section:
For each batch of images in training, two augmented images were generated from each of the original
images. The projections of the images were used as inputs in the loss function. Two projections undergo
row vector × column vector multiplication to obtain a cosine similarity coefficient (SimCLR method), or
undergo column vector × row vector multiplication to obtain a cross-correlation matrix (Barlow-Twins
method), as illustrated in Fig. 7. The loss function for the SimCLR method consists of two terms: a similarity
term measuring the L2-normalized cosine similarity coefficient of a “true” pair of projections (coming
from the same original image), and a contrast term measuring the L2-normalized cosine similarity
coefficient of a “false” pair of projections (coming from different images) as shown in Eq. 3. The loss
function for the Barlow-Twins method also consists of two terms: an invariance term in the form of an L2-
normalized sum-of-squares penalizing the diagonal values in the cross-correlation matrix for deviating
from unity, and a redundancy reduction term measuring the L2-normalized sum-of-squares of off-diagonal
values in the cross-correlation matrix as shown in Eq. 4.
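
Because this text-only copy of the response does not carry over the equations referenced above as Eq. 3 and Eq. 4, we reproduce the standard published forms of the two losses here (written in the notation of the original SimCLR and Barlow-Twins papers; the notation in the manuscript may differ slightly):

% SimCLR (NT-Xent) loss for a positive pair (i, j), with cosine similarity sim(u, v) and temperature tau (referenced above as Eq. 3)
\ell_{i,j} = -\log \frac{\exp\!\big(\mathrm{sim}(z_i, z_j)/\tau\big)}
                        {\sum_{k=1}^{2N} \mathbf{1}_{[k \neq i]}\, \exp\!\big(\mathrm{sim}(z_i, z_k)/\tau\big)},
\qquad
\mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert}

% Barlow-Twins loss: invariance term (diagonal) plus lambda-weighted redundancy-reduction term (off-diagonal) (referenced above as Eq. 4)
\mathcal{L}_{BT} = \sum_{i} \big(1 - \mathcal{C}_{ii}\big)^{2}
                   + \lambda \sum_{i} \sum_{j \neq i} \mathcal{C}_{ij}^{2},
\qquad
\mathcal{C}_{ij} = \frac{\sum_{b} z^{A}_{b,i}\, z^{B}_{b,j}}
                        {\sqrt{\sum_{b} \big(z^{A}_{b,i}\big)^{2}}\, \sqrt{\sum_{b} \big(z^{B}_{b,j}\big)^{2}}}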

Fig. 7. Differences in vector multiplication when calculating the loss functions for the SimCLR and Barlow-
Twins methods. A) With the SimCLR method, a cosine similarity coefficient is obtained from pairs of
projections. B) With the Barlow-Twins method, a cross-correlation matrix is obtained from pairs of projections.

where zA and zB are the projections, i indexes the sample in a batch, j indexes the vector component of the
projection, τ is the temperature parameter (analogous to that in statistical mechanics; we use the recommended
value of 0.10 in our trainings), and λ is the weighting factor for the redundancy reduction term (we use the
recommended value of 0.005 in our trainings).
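
A small numerical sketch of this difference in vector multiplication, using toy random projections (the batch size and projection dimension below are arbitrary, chosen only for illustration), is:

import numpy as np

rng = np.random.default_rng(0)
z_a = rng.normal(size=(64, 128))    # batch of 64 projections from view A, 128-dimensional
z_b = rng.normal(size=(64, 128))    # matching projections from view B

# SimCLR: row vector x column vector -> one cosine similarity coefficient per pair of projections
za_unit = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
zb_unit = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
cosine_true_pairs = np.sum(za_unit * zb_unit, axis=1)     # shape (64,): one scalar per image

# Barlow-Twins: column vector x row vector, accumulated over the batch -> cross-correlation matrix
za_std = (z_a - z_a.mean(axis=0)) / z_a.std(axis=0)       # standardize along the batch dimension
zb_std = (z_b - z_b.mean(axis=0)) / z_b.std(axis=0)
cross_corr = za_std.T @ zb_std / len(z_a)                 # shape (128, 128)
# The Barlow-Twins loss drives cross_corr toward the identity matrix.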

3. Are the microscopy images in Fig 2 (and I suppose in general) all at the same length-scale?
That would seem to be an important factor for consideration in a task like this one.
A comment may be warranted about how the sizes are controlled, etc. I am curious what
area a 224x224 image maps to in physical space. Maybe this can be indicated in the figure
caption. Edit: I notice this information is present in the methods. However, because I think
other readers will have similar thoughts, I suggest a short statement regarding the different
magnification and what is of interest/not in the main text.

Our Response: We thank this reviewer for their suggestion to provide more information in the main
manuscript on the different magnifications of the nanowire morphology images. Providing
information on magnification would indeed help readers understand that although the magnification
varies from image to image, the observed morphologies occur on length scales of the same order of
magnitude and are of interest to our experimental collaborators. We have added the following statements:
Protein / peptide nanowires exhibit one of four morphologies when dispersed in solvent – singular (i.e.,
isolated nanowire), dispersed (i.e., isolated collection of multiple nanowires), network (i.e., percolated
nanowires), and bundle morphologies. Materials with dispersed nanowires are desired for mechanical
reinforcement51, while materials with network morphologies are desired for improving conductivity52. The
singular, dispersed, and network morphologies in this work arise from assembly of synthetic oligopeptides
shown in Fig. 2A; the bundle morphologies represent aggregates of protein nanowires harvested from
Geobacter sulfurreducens. 100 images from each morphology are employed (Fig. 2B). The magnification
of the morphology images varies from image to image, but the length scales are on the same order of
magnitude, as indicated by the scale bars in Fig. 1B. Because the interest of our study is the type of
morphology rather than the length scale of the morphology, we do not include the scale bars in the images
used for training the machine learning models. Due to differences in the peptide / protein nanowire chemistry,
solvent conditions, and magnification, the object-background contrast in each morphology image is
different.

4. “We believe that SimCLR performs worse when trained on generic TEM images due to
the reduced contrast in generic TEM images compared to that in ImageNet images.” Can
authors expand on this/complete the thought? I do not think this will be self-evident to
most readers. Why is the reduced contrast important in SimCLR vs. Barlow-Twins?

Our Response: The SimCLR method, as shown in the equation of its loss function, works by increasing
the cosine similarity coefficient of the “true” pairs of projections (two projections coming from the same
image) and minimizing the cosine similarity coefficient of the “false” pairs of projections (two projections
coming from different images). Our interpretation is that when the batch of images has more contrast
between images, the SimCLR method learns better, and the generic TEM images lack such contrast because
they are in grayscale, whereas the ImageNet images are colored images. On the other hand, the
Barlow-Twins method focuses only on tuning the cross-correlation matrix of two projections from the same
image toward the identity matrix, and therefore may not be as sensitive to contrast between different images
in the batch.

5. “We conclude that “opposite side anchors” are better approximates of the medians of the
test set than “same side anchors”, thereby leading to good accuracies.” This is perhaps the
most contentious point that I will raise. I am not sure I agree with this conclusion, or at
least the way in which it is phrased. I think that the concluding statement is conflating two
aspects: (i) how close are the anchor images to the median of the test and (ii) the relative
positioning of the anchors versus the medians. Consider the position of the “network”
anchor in panel 3E… its value would actually be closer or similarly close to the median of
the network test set in fig 3d. It is also worth pointing out that the portrayal could be
biased(?) since the distributions significantly differ between 3D vs. 3E? Do anchors get
tested on multiple test sets? The caption might be best served to indicate that these are
representative.

Anyway, the discussion is focusing on the relative positioning of the anchors whereas
the conclusion is focusing on how close the anchors are to the median. It seems to me that
one could control for this by picking anchor images at fixed differences of pixel density
away from the test set median but in different directions of the median itself. If m1 and m2
are the two medians and a1 and a2 are the respective anchors, then there are four
scenarios: m1a1a2m2, a1m1m2a2, m1a1m2a2, a1m1a2m2. Anyway, I do not think the
authors necessarily need to test these scenarios in the way that I just described (they could
if they think that makes sense), but I think it should be made clear what is seemingly
important: the relative positioning or the distance from the median. The conclusion
offered in the main text could be easily verified by reporting the average deviation from the
median for same side vs opposite side anchors.

Our Response: We thank this reviewer for their observation and curiosity about the one-shot learning
findings. Following the reviewer’s suggestion, we have made a violin plot + box plot of the sum of the
absolute distances between the anchors and the medians of the two morphologies, and we find that the
median of this summed absolute distance is lower for anchors that are located on opposite sides of the two
medians in our study. Figure 3 and its description have been updated to include this new piece of information.

Nanowire morphology classification – one-shot learning
As we observe large fluctuations in the accuracy of linear classifiers trained with only one labeled image
per class (i.e., one-shot learning), we want to understand how to select the labeled images or the “anchor”
images for high accuracy. Using feature maps obtained from the Barlow-Twins-TEM encoder, we show
one example each of “good anchors” and “bad anchors”, chosen post hoc based on the resulting accuracies (Fig. 3A-
3B). We use t-distributed Stochastic Neighbor Embedding (t-SNE)53 to visualize the feature maps of the test
images projected in 2-dimensional space. From the t-SNE plots, we see that while there are few
misclassifications between the dispersed and network morphologies when the linear classifier is trained on
“good anchors” (Fig. 3C and 3G), most images in dispersed morphology are misclassified as network
morphology when the linear classifier is trained on “bad anchors” (Fig. 3F and 3J). To explain the visible
difference in the performance of linear classifiers trained on different “anchors”, we look at the distribution
of the nanowire pixel density, i.e., percentage of “nanowire pixels” over all pixels, of the ground truth (i.e.,
manually labeled images with nanowire pixels and background pixels) for test images in dispersed and
network morphologies. The nanowire pixel density of the two anchor images is on the opposite sides of that
of the respective median of the two morphologies for “good anchors” (Fig. 3D), but on the same sides for
“bad anchors” (Fig. 3E). We also show the statistics of all 100 sets of anchors and find that the accuracy
of linear classifiers trained on “opposite side anchors” is statistically higher than that trained on “same
side anchors” (Fig. 3H). We conclude that the “opposite side anchors” in our study are better
approximations of the medians of the test set than “same side anchors”, having a smaller absolute
(anchor-to-median) distance as shown in Fig. 3I, thereby leading to good accuracies.

Fig. 3. Knowledge of the underlying distribution of the images can help determine good “anchor” images
for high accuracy one-shot learning. (A and B) Good “anchor” images i.e., a set of labeled images from
each morphology, which when used for training give high classification accuracy. Bad “anchor” images
which when used for training give low classification accuracy. t-SNE53 representations of the test set
colored by their true labels and by their predicted labels with “good anchor” images as training set (C, G)
and “bad anchor” images as training sets (F, J). Images of the t-SNE plots (parts C, G, F, and J) in their
original size and resolution are provided in the supporting information as Fig. S1-S4. The image count
distribution with nanowire pixel density obtained from the manual nanowire labels for the dispersed and
network morphology images in the test set with “good anchors” (D) and “bad anchors” (E) images as
training sets, respectively. Solid lines are positions of the two “anchors”, and dashed lines are positions of
the two medians. Test set size is 20 for both dispersed and network morphology. (H) The prediction
accuracies with different relative positions of the two “anchors” to the median of the dispersed and network
images in the test set. For the 100 samples obtained in Fig. 2C for Barlow-Twins-TEM, 25 resulted on the
opposite side, 75 resulted on the same side. (I) The sum of the absolute distance between the anchor and
median of the two morphologies with different relative positions of two “anchors” to the median of the
dispersed and network images in the test set.
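
For concreteness, the quantity plotted in Fig. 3I can be computed as in the short sketch below (the masks here are synthetic placeholders, not our data):

import numpy as np

rng = np.random.default_rng(1)
# Placeholder ground-truth masks: 20 test masks per morphology; the first of each is used as the "anchor"
dispersed_masks = [rng.random((224, 224)) < 0.15 for _ in range(20)]
network_masks = [rng.random((224, 224)) < 0.35 for _ in range(20)]
anchor_dispersed, anchor_network = dispersed_masks[0], network_masks[0]

def pixel_density(mask):
    """Fraction of nanowire pixels in a boolean ground-truth mask."""
    return float(np.mean(mask))

median_d = np.median([pixel_density(m) for m in dispersed_masks])
median_n = np.median([pixel_density(m) for m in network_masks])

# Summed absolute anchor-to-median distance; smaller values mean the anchors better
# approximate the test-set medians.
total_distance = (abs(pixel_density(anchor_dispersed) - median_d)
                  + abs(pixel_density(anchor_network) - median_n))
print(total_distance)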

6. Sometimes authors accidentally use “dice score,” but Dice is a proper noun and should
be capitalized.

Our Response: We thank this reviewer for noting this typo. We have ensured that ‘Dice’ is capitalized
everywhere in the manuscript and supporting information.

7. With respect to the results and discussion surrounding fig 5: one “concern” I have is that
this morphology classification is not so much actually distinguishing morphology but
something like pixel density, which appears very different across the classes. This reminds
me of training an intentionally bad classifier to distinguish dogs vs wolves when the
classifier learned not anything related to the difference between dogs and wolves but rather
the presence of grass versus snow (dogs most often being depicted on grass and wolves most
often being depicted on snowy landscapes). Perhaps authors could show what the
distribution of pixel densities are and the extent of overlap for these three classes. Also, in
lieu of the encoder feature maps, authors could train a “control classifier” that is just based
on these obvious characteristics (e.g., pixel density) and demonstrate to what extent the
encoder feature map offers an improvement on the prediction task.

Our Response: We understand and thank the reviewer for voicing their concern regarding what the
model has learned. First, in our workflow the classifier we use is the simplest linear classifier; it
therefore relies heavily on having good feature maps as representations of the images. The feature maps
are obtained from the encoders trained via self-supervised learning. The similarity criteria used as the
loss function in self-supervised learning are akin to a classification objective, and the feature maps of
similar images often appear close together in dimensionality-reduction projections such as the t-SNE
projections shown in the main manuscript. Second, we do not believe that our encoder models rely on
learning background information instead of information about the objects. In the nanoparticle morphology
classification task, the particles in each morphology can be long or short nanorods or triangular
prisms. We believe that convolutional neural networks learn by receiving and encoding contrast in color
(pixel intensity), which captures the shapes of the objects against the background. In our nanoparticle
morphology classification task, we are not classifying an image based on what it contains (dog or wolf),
but on the extent and spatial distribution of the nanoparticles in the image (dispersed, separated clusters, or
percolated clusters). Finally, we do not think the nanoparticle pixel density (total number of
nanoparticle pixels / total pixels) is the sole determinant of, or a sufficient representation for, our
morphology classification, because it ignores the distribution and connectivity of the objects in the
images. Three images can have the same nanoparticle pixel density yet belong to three different
morphologies depending on how the nanoparticle pixels are distributed in the images. Therefore, we do
not see training a control classifier on pixel density alone as a useful additional investigation.
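
To illustrate this point, the following minimal Python sketch (for illustration only; the toy masks below are hypothetical and not from our dataset) constructs two binary masks that share the same nanoparticle pixel density, as defined above, yet have very different spatial arrangements of the nanoparticle pixels:

import numpy as np

# Two toy 8x8 binary masks (1 = nanoparticle pixel, 0 = background pixel)
# with identical pixel density but very different spatial arrangements.
dispersed_like = np.zeros((8, 8), dtype=int)
dispersed_like[::3, ::3] = 1          # nine isolated pixels scattered over the image

connected_like = np.zeros((8, 8), dtype=int)
connected_like[3, :] = 1              # one row spanning the full width
connected_like[4, 0] = 1              # one extra pixel to match the count (nine pixels)

def pixel_density(mask):
    # Nanoparticle pixel density = total nanoparticle pixels / total pixels.
    return mask.sum() / mask.size

print(pixel_density(dispersed_like), pixel_density(connected_like))  # both 9/64

A classifier built on pixel density alone cannot separate such cases, whereas the distribution and connectivity of the pixels clearly differ.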

Referee: 3

The authors developed machine learning methods based on the semi-supervised transfer
learning approach to automate the analysis of transmission electron microscopy (TEM)
images. They tested the transfer learning ability of the SimCLR and Barlow Twins models
on TEM images of protein nanowires. Those models were trained to classify the nanowire
morphologies. I find that the authors’ work is interesting.

Our Response: We thank this reviewer for their overall positive comments about our work.

However, I have some specific comments and suggestions as follows.

1. There are previous projects that used transfer learning for the classification of
nanomaterials. In the introduction section, I recommend adding some statements to
mention how those methods are applied to classify the nanomaterials. Moreover, I
recommend comparing the method introduced in this paper against the transfer learning
methods for nanomaterials used in those papers (advantages, disadvantages, and
differences). Some papers are mentioned as follows.
a) Modarres, M.H., Aversa, R., Cozzini, S. et al. Neural Network for Nanoscience,
Scanning Electron Microscope Image Recognition. Sci Rep 7, 13282
(2017). https://doi.org/10.1038/s41598-017-13565-z
b) Han, Y., Liu, Y., Wang, B. et al. A novel transfer learning for recognition of
overlapping nano object. Neural Comput & Applic 34, 5729–5741
(2022). https://doi.org/10.1007/s00521-021-06731-y

Our Response: We appreciate the reviewer’s recommendation to add some discussion of transfer
learning studies for nanomaterial classification. The largest difference between our work and the previous
work cited above is that our workflow utilizes self-supervised pretraining rather than transfer
learning from CNNs trained by supervised methods. We have included mention of several previous works
by researchers tackling morphology classification with transfer learning, as follows:

Transfer learning from CNNs trained with supervised methods has been utilized in nanomaterial
classification tasks in recent years. Modarres et al. used transfer learning to classify SEM images
belonging to different nanomaterial subcategories such as particles, patterned surfaces, and nanowires.11
Having an unbalanced dataset, they observed higher accuracies for categories that contained fewer images.
They commented that the categories with fewer images performed better because the features in those
categories were distinct and sufficiently clear to be learned by the network, while other categories with
higher numbers of images suffered from having indistinct features. Their dataset comprised ~18000 images
in total, with the smallest category containing ~150 images and the largest ~4000 images. Luo et al. used
transfer learning to classify carbon nanotube or fiber morphologies on an image dataset containing 600
images per morphology class.12 They were able to achieve 91% average accuracy on a four-class dataset and
85% average accuracy on an eight-class dataset. Matuszewski and Sintorn recently compared the accuracy of
different CNN architectures (with transferred weights or trained from scratch) on identifying various
viruses from TEM images.54 In this article, we present an automated, label-efficient transfer learning
workflow incorporating self-supervised pretraining that aims to classify nanomaterial morphologies in
microscopy images with high accuracy after training on only a handful of carefully labeled microscopy
images.

2. On page 30, the authors mentioned that “We label the nanowires with blue
masks……”. Is this blue mask a technical term or just a color used? Can the authors
describe “blue masking” in the manuscript?

Our Response: We understand the reviewer’s confusion about the term “blue mask” and would like to clarify.
We use the Microscopy Image Browser software to manually mark out the nanowires with colored masks,
in this case blue-colored masks, to distinguish the nanowire-containing pixels from the grayscale
background pixels before turning the masked image into a binary image. We have revised the text where we
mention the mask in the methods section to reflect our purpose for using a colored mask:
Due to the difficulty in distinguishing the network morphology from the dispersed morphology in some cases,
we manually labeled the nanowires in images with dispersed and network morphology to provide a quantitative
basis for the qualitative morphology class labels of these two easy-to-confuse morphology classes. We labeled
the nanowires with masks colored in blue (to distinguish them from the contents of the original grayscale
image) through Microscopy Image Browser (MIB)55, a MATLAB-based annotation software, and saved a binary
image with the manual labels. Nanowires that we manually masked out with “colored” masks were labeled
as “nanowire pixels”; all other pixels were labeled as “background pixels”. We then performed DBSCAN56, a
clustering algorithm implemented in the scikit-learn package57, on the manually labeled “nanowire pixels”.
For each image with clusters of nanowires found by DBSCAN, we quantified percolation by checking whether
there exists a cluster that spans both the horizontal and the vertical dimension, i.e., two-dimensional
percolation. To check the criterion of spanning both dimensions, for each cluster we check whether the
coordinate of the rightmost pixel minus that of the leftmost pixel is no less than the horizontal dimension
minus two, and likewise for the vertical dimension. We have confirmed that all the network images are
percolated and all the dispersed images are not percolated. We acknowledge that the definition of
percolation, in this case, is local to the image and not necessarily representative of the material as a whole.
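
For readers who wish to reproduce this check, the following is a minimal sketch of the percolation criterion using scikit-learn’s DBSCAN; the eps and min_samples values are illustrative assumptions rather than the exact settings used in our study:

import numpy as np
from sklearn.cluster import DBSCAN

def is_percolated(binary_mask, eps=1.5, min_samples=4):
    # binary_mask: 2D array with 1 = nanowire pixel, 0 = background pixel.
    # Returns True if any DBSCAN cluster spans both the horizontal and the
    # vertical dimension to within two pixels (two-dimensional percolation).
    h, w = binary_mask.shape
    coords = np.argwhere(binary_mask > 0)        # (row, col) of nanowire pixels
    if len(coords) == 0:
        return False
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(coords).labels_
    for lab in set(labels) - {-1}:               # label -1 marks DBSCAN noise points
        cluster = coords[labels == lab]
        spans_rows = cluster[:, 0].max() - cluster[:, 0].min() >= h - 2
        spans_cols = cluster[:, 1].max() - cluster[:, 1].min() >= w - 2
        if spans_rows and spans_cols:
            return True
    return False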

Machine Learning Model and Its Performance
3. In Fig. 4, the Dice scores are between 0.78 and 0.8, and the IoU scores are between 0.61
and 0.68. On page 19, it is mentioned that “With transferred encoder, our Unet model can
achieve good performance (median Dice score > 0.70) with just 8 labeled images per class
for training, less than half of the number of test images (20 per class).” How do authors
justify that those accuracies are good enough for an accurate segmentation?

Our Response: We appreciate the reviewer’s concern about the performance of the segmentation task.
However, we first note that there is no universal threshold on the Dice score that quantifies good
segmentation performance for any dataset; the best achievable Dice score depends on the difficulty,
contrast, and brightness of the images in the dataset. Secondly, our images do not have strong contrast
with the background, and some nanowires, or parts of nanowires, are more translucent due to poor staining.
The nanowires in our images also have non-uniform and/or tortuous shapes, giving rise to larger
nanowire-background intersection regions and thus potential mis-segmentation. The nanowires in the images
have varied distributions in grayscale intensity and shape, making the segmentation model difficult to
train. Thirdly, we also inspect the segmentation output images to manually assess the segmentation
performance, and our segmentation results are reasonable. Finally, we reiterate that the focus of our
paper is on the efficacy of the feature maps obtained through self-supervised learning. For classification
we use only the simplest linear classifier, and for segmentation we use only the Unet model without further
modifications. We are aware of newer Unet models such as Unet++58 or Unet 3+59, mask R-CNN models60, and the
newer class of vision transformer models61. An advantage of our workflow is the versatility of the
downstream models; we would like to add support for these newer segmentation models in future versions.
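
For reference, the two reported segmentation metrics follow their standard definitions; the sketch below is for illustration only and is not the evaluation code used in our workflow (it assumes non-empty binary masks):

import numpy as np

def dice_and_iou(pred, truth):
    # pred, truth: 2D binary masks with 1 = nanowire pixel, 0 = background pixel.
    # Dice = 2|A ∩ B| / (|A| + |B|); IoU = |A ∩ B| / |A ∪ B|.
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2.0 * intersection / (pred.sum() + truth.sum())
    iou = intersection / union
    return dice, iou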

4. It is mentioned that 4-fold cross-validation was performed. I recommend adding the
cross-validation results to the manuscript to see how the results of the classifiers will
generalize to an independent data set.

Our Response: We appreciate the reviewer’s concern about the generalizability of the classifiers. We
remind the reviewer that, in testing the classifiers’ performance, we not only perform 4-fold
cross-validation but also use 5 encoders (trained via self-supervised training on different random
selections of the dataset) and 5 random train-validation data splits. In terms of generalization, we have
in fact trained 5 * 5 * 4 = 100 classifiers for each type of encoder (SimCLR_SotA_ImageNet,
Barlow_Twins_TEM, SimCLR_TEM, Barlow_Twins_ImageNet, SimCLR_ImageNet), and at each size of labeled
training set for each type of encoder. We also believe that reporting only the average and standard
deviation of the classification accuracy would not adequately reflect the generalizability of the
classifiers. Therefore, in the original manuscript we used boxplots to show the breadth of the
classification accuracy. A boxplot shows the median, the 25th and 75th percentiles, and an indication of
outliers, without relying on a “normal distribution” assumption. We believe a boxplot provides a more
well-rounded presentation of the performance of the classifiers (and of the Dice scores / IoU scores for
the segmentation models) than reporting just the average and standard deviation. We also think it would
not be reader-friendly to report each individual classification accuracy for the 100 classifiers trained
on each type of encoder (SimCLR_SotA_ImageNet, Barlow_Twins_TEM, SimCLR_TEM, Barlow_Twins_ImageNet,
SimCLR_ImageNet) and at each size of labeled training set for each encoder.
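
As an illustration of how the 100 accuracies per encoder type are summarized, the sketch below builds a boxplot from placeholder values; the numbers are random and are not results from the manuscript:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical accuracies: 5 encoders x 5 train-validation splits x 4 CV folds
# = 100 classification accuracies per encoder type (random placeholders only).
rng = np.random.default_rng(0)
encoder_types = ["Barlow_Twins_TEM", "SimCLR_TEM",
                 "Barlow_Twins_ImageNet", "SimCLR_ImageNet"]
accuracies = {name: rng.uniform(0.7, 1.0, size=5 * 5 * 4) for name in encoder_types}

# A boxplot shows the median, quartiles, and outliers of the 100 runs per encoder.
plt.boxplot(list(accuracies.values()), labels=list(accuracies.keys()))
plt.ylabel("Classification accuracy")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()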

5. Can the authors show the cross-validation results at the end of the following notebooks
in their Github page?
1. Assessment of classification performance on mNP dataset.ipynb
2. Assessment of classification performance on TEM virus dataset
3. Assessment of classification label-efficient training of downstream classification
task.ipynb

Our Response: We thank the reviewer again, but we believe that the boxplots in the manuscript provide a
more well-rounded presentation of the performance of the classifiers (and of the Dice scores / IoU scores
for the segmentation models) than reporting just the average and standard deviation. We also think it
would not be reader-friendly to report each individual classification accuracy for the 100 classifiers
trained on each type of encoder (Barlow_Twins_TEM, SimCLR_TEM, Barlow_Twins_ImageNet, SimCLR_ImageNet) and
at each size of labeled training set for each encoder. Therefore, we do not see it as necessary to add
tabulated classification accuracies in addition to the boxplots presented in the main manuscript.

6. If a user wants to run the Colab notebooks created in this project, how do they
download the trained models (weights) like
'barlow_resnet_batch64_project128_64_1024_seed%i.h5'?

Our Response: We thank the reviewer for voicing their confusion about where the saved weights of the
trained encoders are located. The weights of the image encoders trained via self-supervised training
were uploaded to the Zenodo dataset in our first release. However, we realize that we did not mention
this in the data availability section of the original draft. To better guide readers, we have now added
the following to the data availability section of the manuscript:
Data and materials availability: The Python code for implementing these models with Keras and
TensorFlow is available at
https://github.com/arthijayaraman-lab/selfsupervised_learning_microscopy_images.
The image dataset of nanowire morphologies and the image encoders trained via self-supervised training are
deposited on the open-access data repository Zenodo with DOI: 10.5281/zenodo.6377140. All
intermediate models generated and/or analyzed during the current study are available from the
corresponding author upon reasonable request.
We have also added to our GitHub README a mention that the image encoders are available in the Zenodo
dataset.
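
As a hypothetical example of how a reader might fetch and load one of the saved encoders, the sketch below assumes the standard Zenodo download URL pattern and one of the file names from the record; the exact file names should be taken from the Zenodo page and the README:

import urllib.request
import tensorflow as tf

# Assumed file name and URL pattern; check the Zenodo record (DOI: 10.5281/zenodo.6377140).
fname = "barlow_resnet_batch64_project128_64_1024_seed1.h5"
url = "https://zenodo.org/record/6377140/files/" + fname + "?download=1"
urllib.request.urlretrieve(url, fname)

# If the .h5 file stores the full model, load_model is sufficient; if it stores
# weights only, rebuild the encoder architecture first and call load_weights.
# Custom layers, if any, would require the custom_objects argument.
encoder = tf.keras.models.load_model(fname, compile=False)
encoder.summary()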

7. I highly recommend adding a manual to show how to use the notebooks in the users’
Github accounts or their computers. That will help the users to use the codes conveniently.
Our Response: We thank the reviewer for the suggestion and have added to the GitHub README descriptions of
each notebook and instructions on how to use / adapt our Jupyter notebooks (deposited on GitHub) on Google
Colab. We hope that readers can get a flavor of self-supervised learning by trying out our notebooks, and
also learn “how to fish” by applying self-supervised learning to their own datasets in different domains.

1 Yosinski, J., Clune, J., Bengio, Y. & Lipson, H. How transferable are features in deep neural
networks? Advances in neural information processing systems 27 (2014).
2 Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional
neural networks. Commun. ACM 60, 84–90 (2017). https://doi.org:10.1145/3065386
3 Shen, D., Wu, G. & Suk, H.-I. Deep learning in medical image analysis. Annual review of
biomedical engineering 19, 221 (2017).
4 Ge, M., Su, F., Zhao, Z. & Su, D. Deep learning analysis on microscopic imaging in materials
science. Materials Today Nano 11 (2020). https://doi.org:10.1016/j.mtnano.2020.100087
5 Baskaran, A. et al. Adoption of Image-Driven Machine Learning for Microstructure
Characterization and Materials Design: A Perspective. JOM 73, 3639-3657 (2021).
6 Melanthota, S. K. et al. Deep learning-based image processing in optical microscopy. Biophysical
Reviews, 1-19 (2022).
7 Jacobs, R. Deep learning object detection in materials science: Current state and future directions.
Computational Materials Science 211, 111527 (2022).
8 Ede, J. M. Deep learning in electron microscopy. Machine Learning: Science and Technology 2,
011004 (2021).
9 von Chamier, L. et al. Democratising deep learning for microscopy with ZeroCostDL4Mic.
Nature communications 12, 1-18 (2021).
10 Chowdhury, A., Kautz, E., Yener, B. & Lewis, D. Image driven machine learning methods for
microstructure recognition. Computational Materials Science 123, 176-187 (2016).
11 Modarres, M. H. et al. Neural network for nanoscience scanning electron microscope image
recognition. Scientific reports 7, 1-12 (2017).
12 Luo, Q., Holm, E. A. & Wang, C. A transfer learning approach for improved classification of
carbon nanomaterials from TEM images. Nanoscale Advances 3, 206-213 (2021).
13 Akers, S. et al. Rapid and flexible segmentation of electron microscopy data using few-shot
machine learning. npj Computational Materials 7, 1-9 (2021).
14 Madsen, J. et al. A deep learning approach to identify local structures in atomic‐resolution
transmission electron microscopy images. Advanced Theory and Simulations 1, 1800037 (2018).
15 Li, W., Field, K. G. & Morgan, D. Automated defect analysis in electron microscopic images. npj
Computational Materials 4, 1-9 (2018).
16 Han, Y. et al. A novel transfer learning for recognition of overlapping nano object. Neural
Computing and Applications 34, 5729-5741 (2022).
17 Qu, E. Z., Jimenez, A. M., Kumar, S. K. & Zhang, K. Quantifying Nanoparticle Assembly States
in a Polymer Matrix through Deep Learning. Macromolecules 54, 3034-3040 (2021).
18 Yang, S. J. et al. Assessing microscope image focus quality with deep learning. BMC
bioinformatics 19, 1-9 (2018).
19 Senaras, C., Niazi, M. K. K., Lozanski, G. & Gurcan, M. N. DeepFocus: detection of out-of-focus
regions in whole slide digital images using deep learning. PloS one 13, e0205387 (2018).
20 Lee, W. et al. Robust autofocusing for scanning electron microscopy based on a dual deep
learning network. Scientific reports 11, 1-12 (2021).
21 Azimi, S. M., Britz, D., Engstler, M., Fritz, M. & Mücklich, F. Advanced steel microstructural
classification by deep learning methods. Scientific reports 8, 1-14 (2018).
22 Furat, O. et al. Machine learning techniques for the segmentation of tomographic image data of
functional materials. Frontiers in Materials 6, 145 (2019).
23 Tsopanidis, S., Moreno, R. H. & Osovski, S. Toward quantitative fractography using
convolutional neural networks. Engineering Fracture Mechanics 231, 106992 (2020).
24 Groschner, C. K., Choi, C. & Scott, M. C. Machine learning pipeline for segmentation and defect
identification from high-resolution transmission electron microscopy data. Microscopy and
Microanalysis 27, 549-556 (2021).
25 Jacobs, R. et al. Performance and limitations of deep learning semantic segmentation of multiple
defects in transmission electron micrographs. Cell Reports Physical Science 3, 100876 (2022).
26 Han, Y. et al. Center-environment feature models for materials image segmentation based on
machine learning. Scientific Reports 12, 1-9 (2022).
27 Li, X. et al. A transfer learning approach for microstructure reconstruction and structure-property
predictions. Scientific reports 8, 1-13 (2018).
28 Yang, Z. et al. Microstructural materials design via deep adversarial learning methodology.
Journal of Mechanical Design 140 (2018).
29 Kudyshev, Z. A., Kildishev, A. V., Shalaev, V. M. & Boltasseva, A. Machine-learning-assisted
metasurface design for high-efficiency thermal emitter optimization. Applied Physics Reviews 7,
021407 (2020).
30 Weigert, M. et al. Content-aware image restoration: pushing the limits of fluorescence
microscopy. Nature methods 15, 1090-1097 (2018).
31 Wang, H. et al. Deep learning enables cross-modality super-resolution in fluorescence
microscopy. Nature methods 16, 103-110 (2019).
32 Qiao, C. et al. Evaluation and development of deep neural networks for image super-resolution in
optical microscopy. Nature Methods 18, 194-202 (2021).
33 Luo, Y., Huang, L., Rivenson, Y. & Ozcan, A. Single-shot autofocusing of microscopy images
using deep learning. ACS Photonics 8, 625-638 (2021).
34 Manifold, B., Thomas, E., Francis, A. T., Hill, A. H. & Fu, D. Denoising of stimulated Raman
scattering microscopy images via deep learning. Biomedical optics express 10, 3860-3874 (2019).
35 Laine, R. F., Jacquemet, G. & Krull, A. Imaging in focus: an introduction to denoising bioimages
in the era of deep learning. The International Journal of Biochemistry & Cell Biology 140,
106077 (2021).
36 Cheplygina, V., de Bruijne, M. & Pluim, J. P. Not-so-supervised: a survey of semi-supervised,
multi-instance, and transfer learning in medical image analysis. Medical image analysis 54, 280-
296 (2019).
37 Yang, X., Song, Z., King, I. & Xu, Z. A survey on deep semi-supervised learning. arXiv preprint
arXiv:2103.00550 (2021).
38 Peikari, M., Salama, S., Nofech-Mozes, S. & Martel, A. L. A cluster-then-label semi-supervised
learning approach for pathology image classification. Scientific reports 8, 1-13 (2018).
39 Pu, Y. et al. Variational autoencoder for deep learning of images, labels and captions. Advances
in neural information processing systems 29 (2016).
40 Chen, M., Shi, X., Zhang, Y., Wu, D. & Guizani, M. Deep feature learning for medical image
analysis with convolutional autoencoder neural network. IEEE Transactions on Big Data 7, 750-
758 (2017).
41 Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. in International conference on machine
learning. 1597-1607 (PMLR).
42 Liu, X. et al. Self-supervised learning: Generative or contrastive. IEEE Transactions on
Knowledge and Data Engineering (2021).
43 Azizi, S. et al. in Proceedings of the IEEE/CVF International Conference on Computer Vision.
3478-3488.
44 Ciga, O., Xu, T. & Martel, A. L. Self supervised contrastive learning for digital histopathology.
Machine Learning with Applications 7, 100198 (2022).
45 Conrad, R. & Narayan, K. CEM500K, a large-scale heterogeneous unlabeled cellular electron
microscopy image dataset for deep learning. Elife 10, e65894 (2021).
46 Ronneberger, O., Fischer, P. & Brox, T. in International Conference on Medical image
computing and computer-assisted intervention. 234-241 (Springer).
47 Karabağ, C., Verhoeven, J., Miller, N. R. & Reyes-Aldasoro, C. C. Texture segmentation: An
objective comparison between five traditional algorithms and a deep-learning U-Net architecture.
Applied Sciences 9, 3900 (2019).
48 Yao, L., Ou, Z., Luo, B., Xu, C. & Chen, Q. Machine learning to reveal nanoparticle dynamics
from liquid-phase TEM videos. ACS central science 6, 1421-1430 (2020).
49 Zbontar, J., Jing, L., Misra, I., LeCun, Y. & Deny, S. in International Conference on Machine
Learning. 12310-12320 (PMLR).
50 He, K., Zhang, X., Ren, S. & Sun, J. in Proceedings of the IEEE conference on computer vision
and pattern recognition. 770-778.
51 Tadiello, L. et al. The filler–rubber interface in styrene butadiene nanocomposites with
anisotropic silica particles: morphology and dynamic properties. Soft Matter 11, 4022-4033
(2015).
52 Sun, Y. L. et al. Conductive Composite Materials Fabricated from Microbially Produced Protein
Nanowires. Small 14, 1-5 (2018). https://doi.org:10.1002/smll.201802624
53 Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. Journal of machine learning
research 9 (2008).
54 Matuszewski, D. J. & Sintorn, I.-M. TEM virus images: Benchmark dataset and deep learning
classification. Computer Methods and Programs in Biomedicine 209, 106318 (2021).
55 Belevich, I., Joensuu, M., Kumar, D., Vihinen, H. & Jokitalo, E. Microscopy image browser: a
platform for segmentation and analysis of multidimensional datasets. PLoS biology 14, e1002340
(2016).
56 Birant, D. & Kut, A. ST-DBSCAN: An algorithm for clustering spatial–temporal data. Data &
knowledge engineering 60, 208-221 (2007).
57 Pedregosa, F. et al. Scikit-learn: Machine learning in Python. the Journal of machine Learning
research 12, 2825-2830 (2011).
58 Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N. & Liang, J. in Deep learning in medical
image analysis and multimodal learning for clinical decision support 3-11 (Springer, 2018).
59 Huang, H. et al. in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP). 1055-1059 (IEEE).
60 He, K., Gkioxari, G., Dollár, P. & Girshick, R. in Proceedings of the IEEE international
conference on computer vision. 2961-2969.
61 Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at
scale. arXiv preprint arXiv:2010.11929 (2020).




Round 2

Revised manuscript submitted on 05 Sep 2022
 

17-Sep-2022

Dear Dr Jayaraman:

Manuscript ID: DD-ART-06-2022-000066.R1
TITLE: Semi-supervised machine learning workflow for analysis of nanowire morphologies from transmission electron microscopy images

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry


 
Reviewer 1

I thank the authors for their detailed response and careful revisions. The updated introduction provides a nice background for the work in this manuscript.

Reviewer 3

The authors reasonably answered my comments. Therefore, I recommend this paper for publication in the journal.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.