From the journal Digital Discovery Peer review history

An interpretable and transferrable vision transformer model for rapid materials spectra classification

Round 1

Manuscript submitted on 04 Oct 2023
 

17-Dec-2023

Dear Dr Lin:

Manuscript ID: DD-ART-10-2023-000198
TITLE: An Interpretable and Transferrable Vision Transformer Model for Rapid Materials Spectra Classification

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Professor Jason Hein
Associate Editor, Digital Discovery

************


 
Reviewer 1

This manuscript demonstrates an interpretable ViT model for spectroscopic data analysis and comprehensively compares its performance with other available methods. I have a few minor comments and recommend publishing this manuscript afterward.
1. It would be better to explain jargon and/or add references, such as Top-N.
2. The authors explicitly mentioned the advantage in training time, but the training time must be related to the hardware.
3. For a trained model, is it possible to directly extract knowledge from a certain layer? e.g., if the user's interest is the difference in the 002 diffraction peak among several samples, how can users use this model to extract such information?
4. The authors here focused on 1D spectroscopic data; it would be good if the authors could comment on the application of this method to high-dimensional data, such as microscopy images and hyperspectral data. I suspect that more layers may be needed for high-dimensional data, in which case the model training time and interpretability will suffer.

Reviewer 2

A very interesting and well-written work. Hopefully, this work will stimulate other researchers to utilize the ViT model in the future to combine data from NMR, IR, and MS for the characterization of novel materials.


 

Jian Lin, Ph.D.
Associate Professor
Department of Mechanical and Aerospace Engineering
University of Missouri
Columbia, MO 65211, USA
Office: E2404B Lafferre Hall
Email: LinJian@missouri.edu
Phone: 573-882-8427

Ref: DD-ART-10-2023-000198
Title: “An Interpretable and Transferrable Vision Transformer Model for Rapid Materials Spectra Classification”
Author(s): Zhenru Chen, Yunchao Xie, Yuchao Wu, Yuyi Lin, Shigetaka Tomiya, and Jian Lin

Dear Editor:
Thank you very much for handling the manuscript so promptly. We have responded to the reviewers' comments and revised the paper accordingly. The point-by-point responses to the reviewers' comments are attached below. For clarity, we have used italic font for the reviewers' comments, bold font for our replies, and blue font for the revisions quoted from the revised manuscript. The revised manuscript has been uploaded to the submission website. We believe that the revised version meets the high quality required for the journal.

Responses to the reviewers' comments
Reviewer #1:
This manuscript demonstrates an interpretable ViT model for spectroscopic data analysis and comprehensively compares its performance with other available methods. I have a few minor comments and recommend publishing this manuscript afterward.
Response: We are grateful for the reviewer's constructive feedback and positive assessment of our manuscript, and we appreciate the recognition of our work's contribution to spectroscopic data analysis using an interpretable Vision Transformer (ViT) model. We have revised the manuscript in line with your suggestions; please refer to the replies below and the revised manuscript.

1. It would be better to explain jargon and/or add references, such as Top-N.
Response: We thank the reviewer for the valuable suggestion to clarify the use of specific jargon. We have taken this advice into consideration and updated the Methods section accordingly.
Correspondingly, we added the following sentences on page 24: “The model performance was evaluated using Top-N accuracy on the test datasets. In detail, Top-1 accuracy refers to the ViT model’s capability to correctly rank an MOF sample at the first position. Meanwhile, Top-3 and Top-5 accuracies assess the model’s accuracy in ranking the sample within the top three and top five positions, respectively.10”
Reference
(10) H. Wang, Y. Xie, D. Li, H. Deng, Y. Zhao, M. Xin, J. Lin. Rapid Identification of X-ray Diffraction Patterns Based on Very Limited Data by Interpretable Convolutional Neural Networks. Journal of Chemical Information and Modeling 2020, 60, 2004-2011.
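
For readers less familiar with the metric, Top-N accuracy can be computed directly from the model's per-class scores. The sketch below is purely illustrative and is not the code used in this work; the array shapes and the top_n_accuracy helper are hypothetical.

import numpy as np

def top_n_accuracy(scores, labels, n):
    """Fraction of samples whose true label is among the n highest-scoring classes."""
    # Sort class indices by descending score and keep the first n per sample.
    top_n = np.argsort(scores, axis=1)[:, ::-1][:, :n]
    hits = (top_n == labels[:, None]).any(axis=1)
    return hits.mean()

# Example: 3 samples, 5 candidate classes.
scores = np.array([[0.10, 0.60, 0.10, 0.10, 0.10],
                   [0.30, 0.20, 0.40, 0.05, 0.05],
                   [0.25, 0.25, 0.20, 0.20, 0.10]])
labels = np.array([1, 0, 4])
print(top_n_accuracy(scores, labels, 1))  # Top-1: 1/3 (only sample 0 ranked first)
print(top_n_accuracy(scores, labels, 3))  # Top-3: 2/3 (samples 0 and 1 in the top three)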

2. The authors explicitly mentioned the advantage in training time, but the training time must be related to the hardware.
Response: We thank the reviewer for highlighting the importance of hardware to the training time. The training duration is directly influenced by the computing hardware; more advanced GPUs can significantly reduce the overall training time. To maintain a consistent hardware environment, all models were trained on the same desktop equipped with an NVIDIA GPU. In addition, each model underwent a minimum of 10 training repetitions, and the resulting average was reported to mitigate run-to-run variability and improve the precision of each model's reported training time. Hence, it is justifiable to highlight the advantage of the ViT model in terms of training time when compared to the other models. In the Methods section, on page 25, we have provided a comprehensive list of the hardware and software specifications of the computer used in our study.
“All computations were conducted on a desktop equipped with an Intel Core i7-12700K processor, an NVIDIA GeForce 2080 GPU, and 64 GB of RAM, running the Ubuntu 22.04.2 operating system. The code was implemented in Python 3.7.9. For data processing, we utilized NumPy 1.19.2 and Pandas 1.2.1. Data processing and analysis for the traditional ML models were performed using Scikit-learn 1.0.2. The CNN model was constructed using the TensorFlow 2.2.0 framework, while the ViT model was built using PyTorch 1.13.1+cu117.”

3. For a trained model, is it possible to directly extract knowledge from a certain layer? e.g., if the user's interest is the difference in the 002 diffraction peak among several samples, how can users use this model to extract such information?
Response: We thank the reviewer for the good question. In our work, the attention maps of the ViT model were extracted to investigate the attention allocated to different regions of the input XRD spectra. We found that the early layers tend to capture basic features, such as the positions of specific peaks, while the deeper layers are generally associated with more complex and abstract features, potentially beyond straightforward human visual interpretation. This suggests that the ViT model can identify less apparent but important peaks, which proves highly beneficial in distinguishing MOF samples that share closely similar XRD patterns.
During this process, we found that directly extracting specific, interpretable information from a particular layer, especially detailed information such as specific diffraction peaks, presents a significant challenge. This is an area we are keen to explore in future work, aiming to bridge the gap between the abstract representations learned by the model and the specific, interpretable knowledge desired by users. Such an endeavor often requires additional analysis and the integration of domain-specific knowledge, or a training dataset tailored to the desired output. For example, Oviedo and coworkers' work aimed to classify perovskite groups from XRD patterns rather than performing end-to-end identification (2019 npj Comput. Mater. 5, 60), and Enders and coworkers' model classified the functional groups within a chemical from FTIR data (2021 Anal. Chem. 93, 28, 9711-9718).
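
To make the attention-map readout more concrete, the snippet below shows one common way to obtain attention weights from a transformer layer in PyTorch. It is a minimal, self-contained sketch in which a single nn.MultiheadAttention stands in for one ViT encoder block; the sizes and variable names are assumptions, not taken from our implementation.

import torch
import torch.nn as nn

torch.manual_seed(0)
embed_dim, num_heads, num_patches = 64, 4, 50  # hypothetical sizes

# One attention layer standing in for a ViT encoder block.
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# A batch of one spectrum split into 50 patch embeddings.
tokens = torch.randn(1, num_patches, embed_dim)

# need_weights=True returns the attention map (averaged over heads)
# with shape (batch, num_patches, num_patches).
out, attn_map = attn(tokens, tokens, tokens, need_weights=True)

# Row i shows how strongly patch i attends to every other patch; these
# rows can be mapped back to 2-theta regions of the input XRD pattern.
print(attn_map.shape)           # torch.Size([1, 50, 50])
print(attn_map[0, 0].argmax())  # patch most attended to by the first token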

4. The authors here focused on 1D spectroscopic data; it would be good if the authors could comment on the application of this method to high-dimensional data, such as microscopy images and hyperspectral data. I suspect that more layers may be needed for high-dimensional data, in which case the model training time and interpretability will suffer.
Response: We thank the reviewer for this suggestion. The ViT model was primarily proposed for image classification and has been utilized to address a variety of vision problems, including object detection, image processing, and semantic segmentation. In the realm of microscopy imaging, ViT has made substantial advances. For example, Wang and coworkers developed global voxel transformer networks (GVTNets) for augmented microscopy by aggregating global information (2021 Nature Machine Intelligence 3, 161). Christensen and colleagues proposed a spatio-temporal Vision Transformer for super-resolution microscopy that increases temporal resolution by a factor of nine (arXiv:2203.00030). Xue et al. implemented a deep hierarchical Vision Transformer for the joint classification of hyperspectral and light detection and ranging (LiDAR) data (2022 IEEE Transactions on Image Processing 31, 3095). We are actively exploring the extension of our methodology to high-dimensional data, including multimodal data, which is by nature high-dimensional, as it integrates different types of data sources and formats. The goal is to develop robust models that can effectively handle the complexity of such data while maintaining accuracy and interpretability.
Transitioning from 1D to 2D data undoubtedly requires more effort, since 2D data involve more dimensions, more features, and crucial spatial information. Moreover, handling higher-dimensional data significantly escalates the computational demands of processing and analyzing the information.
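
As a concrete illustration of this dimensionality gap, the sketch below (an assumed architecture, not our model) contrasts 1D patch embedding for a spectrum with 2D patch embedding for an image; the spectrum length, image size, and patch sizes are hypothetical.

import torch
import torch.nn as nn

# 1D: a 4,500-point spectrum cut into 45 non-overlapping patches of 100 points.
spec_embed = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=100, stride=100)
spectrum = torch.randn(1, 1, 4500)
print(spec_embed(spectrum).shape)  # (1, 64, 45) -> 45 tokens

# 2D: a 224x224 RGB image cut into 16x16 patches.
img_embed = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=16, stride=16)
image = torch.randn(1, 3, 224, 224)
print(img_embed(image).shape)      # (1, 64, 14, 14) -> 196 tokens

# Self-attention cost scales with the square of the token count
# (196^2 vs. 45^2 entries per attention map), which is one reason
# training time grows quickly with input dimensionality.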

Reviewer #2:
A very interesting and well-written work. Hopefully, this work will stimulate other researchers to utilize the ViT model in the future to combine data from NMR, IR, and MS for the characterization of novel materials.
Response: We are deeply grateful for the reviewer's positive assessment and recommendation for publication. We are currently extending our research to explore the integration of multimodal spectroscopic data using advanced machine learning models. Our aim is to develop methodologies that can efficiently combine and analyze data from various spectroscopic techniques, enhancing the characterization and understanding of complex materials. This approach has the potential to transform how spectroscopic data are analyzed and interpreted in materials science. We hope our work can make a significant contribution to the field of automated data analysis.




Round 2

Revised manuscript submitted on 19 Dec 2023
 

28-Dec-2023

Dear Dr Lin:

Manuscript ID: DD-ART-10-2023-000198.R1
TITLE: An Interpretable and Transferrable Vision Transformer Model for Rapid Materials Spectra Classification

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Professor Jason Hein
Associate Editor, Digital Discovery


 
Reviewer 1

The authors have addressed most of my questions. I would only recommend that the authors add a brief discussion of the answer to my question 3 in the revised manuscript. Then, I recommend publishing it.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.