From the journal Digital Discovery Peer review history

Generative adversarial networks and diffusion models in material discovery

Round 1

Manuscript submitted on 24 Jul 2023
 

23-Aug-2023

Dear Mr Alverson:

Manuscript ID: DD-ART-07-2023-000137
TITLE: Generative adversarial networks and diffusion models in material discovery

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

In this paper, Alverson et al. propose a novel method of encoding the periodic chemical and structural information of crystals contained in Crystallographic Information File format for use in generative models. Alverson et al. apply their novel representation method to the task of training a diffusion model capable of generating new stable, symmetrical crystal structures based on the distribution of the training data, and also test their representation method using other well-established generative models such as GAN and WGAN, comparing the results with those of the diffusion model. With these results, Alverson et al. claim that their representation method, when used with diffusion models, generates more symmetrical and realistic crystals than GAN and WGAN when using the same dataset and method of representation, and they back up their claim with evidence throughout the manuscript and direct comparison of the crystals generated by diffusion, GAN and WGAN models.

Pending minor revisions, I’m in favor of the publication of this article.

Your model performs very well when compared to GAN type models found in the literature (and cited in your paper, i.e. iMatGen, CCDCGAN) which require much more training data than the diffusion model you trained using the PCD, and the training data used in these papers were much more reflective of the types of systems that the authors were investigating. In contrast, the PCD has a much more diverse set of training structures. It may be worth mentioning this fact on page 4 before talking about other examples of diffusion models. In my opinion, for the advancement of materials by inverse design, this is one of the most challenging hurdles for generative crystal models to overcome, and you’ve shown that the combination of your CrysTens representation and diffusion models can jump this hurdle.

Page 4, first paragraph: I would recommend using generated instead of “fake” when talking about GAN, as the discriminator is not deciding between real or fake, it’s deciding between real or generated.

Page 6, last paragraph: Instead of asking the questions yourself, please incorporate the reasons for making the choice elsewhere in your explanation of the CrysTens representation; asking the questions interrupts the flow of the paper.

Page 9, Fig. 5(e): Perhaps it’s my eyes, but the right plot does not look like the average of the positions from the instances in the left plot, please double check this and/or consider using a simpler example of positions.

Page 12: link to CGCNN formation energy model is broken.

Page 20-21: what criterion did you use to evaluate convergence of the plane-wave cutoff energy, smearing (for metals), and k-point grid of your VASP calculations? Please include these details in the “Further Validation” section.

The rest of the paper is very well written and the conclusion summarizes the results and discussion well.

Reviewer 2

This paper proposes a novel crystal structure representation called "CrysTens" and uses it to encode crystal structures of Pearson Crystal Database. The study compares the data generation performance of Vanilla GAN, WGAN, and diffusion model by training them with the proposed representation. The authors obtain 1000 random generated data from each model and compare the quality of the generated representations and the various property distributions of converted CIF data predicted by pre-trained CGCNN. The results indicate that the diffusion model generates the most realistic crystal structures. I believe this research can provide meaningful insights for researchers studying crystal structure generative models, and particularly, the “CrysTens” representation has the potential for widespread utility for general representation of crystal structure. However, I have some comments on certain aspects that need to be supplemented in this paper.

Q) In a crystal structure representation, it is crucial to satisfy invariance (permutation, rotational, and translation), but the "CrysTens" representation does not meet these invariances. It seems necessary to provide an explanation in the paper (or a strategy to address the invariance issue through data augmentation or similar methods).

Q) The authors used the state-of-the-art "Imagen" model for the diffusion model, so for a fair comparison, I believe that more advanced model architectures of GAN and WGAN should also be selected. This would allow for a proper comparison of data generation performance between adversarial learning and denoising techniques. The two GAN models currently used in this paper seem too primitive.

Q) I am uncertain whether using pre-trained property prediction models (CGCNN) to compare property distributions of generated crystal structures is an appropriate comparison method. Generated structures by the generative models may significantly differ from the chemical space previously trained by the property prediction model (typically from the Material Project dataset). Moreover, the CGCNN models may be biased toward stable structures only, leading to inaccurate predictions for generated unstable structures. I am unsure if it is valid to compare the uncertain predictions as the property distribution of the generated structures.

Q) The DFT calculation conditions are not described in the paper (there is no supporting information). And, the meaning of "consecutive structural relaxation" in the main text is somewhat ambiguous. If “after the 4th relaxation” means the number of self-consistent field (SCF) iterations in the DFT structural relaxation, it needs to perform not only four relaxation steps but also converge until the structure of the local minimum emerges before energy or pressure comparison.

Q) On page 23, providing clear criteria and rules for the post-processing step would be helpful for the readers. Referring to the "potentially erroneous atomic number assignment" could be confusing without specific post-processing guidelines.

Q) In the "Future works" section, it is mentioned to add text-based conditions (chemical composition and molecular formula) for conditional generation. However, in materials design using generation models, it might be more meaningful to have a model that performs inverse design, using material properties as conditions to design materials with desired properties. Nevertheless, considering that the diffusion model is based on “Imagen”, the transformation of the model seems challenging (only text2image), this limitation could also be addressed in limitation or future work section.

Q) There is no explanation provided for the similarity score mentioned in Fig 6, 7, and 8.

Q) The discussion about the predicted property distributions by CGCNN is also too brief. More detailed explanations and analysis that can quantitatively express the differences or similarities between each distribution seem necessary.

Reviewer 3

Recommendation: Publish after major revisions.
Comments:
This work reports a new crystal representation “CrysTens” to encode crystal features, which is then paired with generative models to discover new inorganic crystal materials. While the authors comprehensively demonstrated the performance of their crystal representations and diffusion models, this paper required more detailed explanation of crystal features and validation of generated materials. I would like to see several major modifications and justifications before recommending this paper for publication.

First, the motivation and background knowledge of this work need to be clarified. In the 2nd paragraph in the background: “New representations that capture chemistry (composition) and structure (periodicity) are important because they then allow us to utilize machine learning algorithms to identify and exploit patterns in data.” The authors try to prove the importance of new crystal representation by mentioning traditional empirical approaches and simulation-based approaches. However, it’s also important and better to stress the term “crystal representation” in the context of machine learning and generative models in the 4th paragraph. Also, because one of the main focuses of the work is the CrysTens representation, the authors should spend more words on other representations and on comparing its performance against other state-of-the-art crystal representations.

The key strength and potential drawbacks of the crystal representation need to be better explained. In section 3.1, 3rd paragraph “not all CIF entries include the basis.”, CIF files should encode position information of all atoms in the unit cell (can be primitive, conventional, or calculated/output), what exactly does the basis mean here for CIF. Give an example of CIF that does not have a “basis” and the reasoning of why it cannot be used in CrysTens representation. Also, in CrysTens representation, are the lattice parameter, angles, and SG repeated 52 times because each crystal material should only have one set of lattice parameters, angles, and one space group? Please explain the reason behind this and the possible outcome of encoding the same information multiple times.

In Page 8, does “K-Means Clustering with K = 3” mean the authors assume the generated crystal is a ternary crystal material? In the context of training with not only ternary crystals, if the model generated a material with 2, 4, or more elements, how does the algorithm identify it? Please state the limitation of generated crystal and the effect of using K-means Clustering to process data.
Furthermore, the background knowledge of CIF, VESTA, and other related packages can be moved to the Appendix to provide a better focus on crystal representations and generative models.
The authors calculated decomposition energy (energy above the convex hull, Ehull) in both the Result and Discussion sections. However, the authors did not provide information about how Ehull is calculated and what the reference materials in the same system are. Another major problem of formation energy and Ehull is the use of CGCNN. Because the original CGCNN model is trained on “DFT relaxed crystal structures”, the test material should also be at its ground state or near ground state (relaxed) structure to get a good prediction from CGCNN, but in this case, there is no reason/validation given to prove the viability of using CGCNN on these generated CIF.

The validation method and procedure of the generative model can be improved. The formation energy of inorganic crystal materials is not a good filter for comparing and choosing stable compounds, especially when studying materials from different systems. The procedure the authors used may exclude a large number of materials with high relative value but low in-system formation energy. On the other hand, the validation section lacks a clear metric to assess the performance of generative models. In previous sections, the authors demonstrate that diffusion models are more efficient at generating “symmetry” materials, but a metric for how well the model can predict “real” crystals is yet to be validated. I suggest that one possible metric is the difference between the free energy before and after relaxation. For example, quantify the fraction of structures for which the energy difference is below a preset threshold. Finally, the authors should explore more validation methods and prove the effectiveness of the model from other perspectives. For example, the authors can try to generate existing crystal materials using diffusion models and compare the crystal feature errors against these existing crystal structures.


 

Referee: 1

Comments to the Author
In this paper, Alverson et al. propose a novel method of encoding the periodic chemical and structural information of crystals contained in Crystallographic Information File format for use in generative models. Alverson et al. apply their novel representation method to the task of training a diffusion model capable of generating new stable, symmetrical crystal structures based on the distribution of the training data, and also test their representation method using other well-established generative models such as GAN and WGAN, comparing the results with those of the diffusion model. With these results, Alverson et al. claim that their representation method, when used with diffusion models, generates more symmetrical and realistic crystals than GAN and WGAN when using the same dataset and method of representation, and they back up their claim with evidence throughout the manuscript and direct comparison of the crystals generated by diffusion, GAN and WGAN models.

Pending minor revisions, I’m in favor of the publication of this article.

Your model performs very well when compared to GAN type models found in the literature (and cited in your paper, i.e. iMatGen, CCDCGAN) which require much more training data than the diffusion model you trained using the PCD, and the training data used in these papers were much more reflective of the types of systems that the authors were investigating. In contrast, the PCD has a much more diverse set of training structures. It may be worth mentioning this fact on page 4 before talking about other examples of diffusion models. In my opinion, for the advancement of materials by inverse design, this is one of the most challenging hurdles for generative crystal models to overcome, and you’ve shown that the combination of your CrysTens representation and diffusion models can jump this hurdle.
- This has been added to the beginning of the “CrysTens Representation” section on Page 4.

Page 4, first paragraph: I would recommend using generated instead of “fake” when talking about GAN, as the discriminator is not deciding between real or fake, it’s deciding between real or generated.
- This has been changed.

Page 6, last paragraph: Instead of asking the questions yourself, please incorporate the reasons for making the choice elsewhere in your explanation of the CrysTens representation; asking the questions interrupts the flow of the paper.
- The questions were removed and the wording was changed so as to not interrupt the flow of the paper.

Page 9, Fig. 5(e): Perhaps it’s my eyes, but the right plot does not look like the average of the positions from the instances in the left plot, please double check this and/or consider using a simpler example of positions.
- This figure was used only to illustrate the general process for the post-processing averaging; however, the lack of precision in the figure itself was misleading. The figure was recreated to be more accurate.

Page 12: link to CGCNN formation energy model is broken.
- This has been addressed.

Page 20-21: what criterion did you use to evaluate convergence of the plane-wave cutoff energy, smearing (for metals), and k-point grid of your VASP calculations? Please include these details in the “Further Validation” section.
- We selected a cutoff energy of 400 eV, a smearing setting (ISMEAR in VASP) of 1 (the default, Methfessel-Paxton of order 1), and a 3x3x3 k-point grid. We selected these values based on the VASP Wiki and the VASP forums, where they are quite common. Here is a paper that performs a similar energy calculation in another application: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.87.214102. These details have been added to the manuscript.

The rest of the paper is very well written and the conclusion summarizes the results and discussion well.


Referee: 2

Comments to the Author
This paper proposes a novel crystal structure representation called "CrysTens" and uses it to encode crystal structures of Pearson Crystal Database. The study compares the data generation performance of Vanilla GAN, WGAN, and diffusion model by training them with the proposed representation. The authors obtain 1000 random generated data from each model and compare the quality of the generated representations and the various property distributions of converted CIF data predicted by pre-trained CGCNN. The results indicate that the diffusion model generates the most realistic crystal structures. I believe this research can provide meaningful insights for researchers studying crystal structure generative models, and particularly, the “CrysTens” representation has the potential for widespread utility for general representation of crystal structure. However, I have some comments on certain aspects that need to be supplemented in this paper.

Q) In a crystal structure representation, it is crucial to satisfy invariance (permutation, rotational, and translation), but the "CrysTens" representation does not meet these invariances. It seems necessary to provide an explanation in the paper (or a strategy to address the invariance issue through data augmentation or similar methods).
- The authors agree with the reviewer that satisfying invariance would be ideal for the crystal structure representation. Despite this, we found that CrysTens provided a concise, structurally informative, and image-like representation that served our purposes of comparison between GAN and diffusion model architectures. Working towards a representation that satisfies invariance is within the scope of our future work. This sentiment has been added to the manuscript.

Q) The authors used the state-of-the-art "Imagen" model for the diffusion model, so for a fair comparison, I believe that more advanced model architectures of GAN and WGAN should also be selected. This would allow for a proper comparison of data generation performance between adversarial learning and denoising techniques. The two GAN models currently used in this paper seem too primitive.
- The diffusion model used was a GitHub repository that attempts to recreate the Imagen architecture from the paper “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding.” It was not the actual models (or architecture) created by Google, as they are not open-source. The Imagen-Pytorch repository produces far smaller and less complex models than the actual Imagen model and appropriately served our purposes when repurposed for crystal generation. We felt that the comparison between the GAN models and Imagen-Pytorch was appropriate, as it was a comparison between techniques more than anything else. Furthermore, the scale at which Imagen operates is what makes for its spectacular performance, and given our compute constraints we would not be able to replicate its generation abilities. Essentially, we pushed both our GANs and diffusion models to the limits of what our computational resources allowed. As evidence that the GAN models were not primitive, our WGANs were capable of performing similarly to our diffusion models on many metrics.


Q) I am uncertain whether using pre-trained property prediction models (CGCNN) to compare property distributions of generated crystal structures is an appropriate comparison method. Generated structures by the generative models may significantly differ from the chemical space previously trained by the property prediction model (typically from the Material Project dataset). Moreover, the CGCNN models may be biased toward stable structures only, leading to inaccurate predictions for generated unstable structures. I am unsure if it is valid to compare the uncertain predictions as the property distribution of the generated structures.
- Given that CGCNN models are certainly biased towards stable structures, it is true that the predictions for unstable structures may not be fully accurate. The choice to include CGCNN-predicted property comparisons for each of the different models stems from a goal to evaluate the models on their ability to capture the intricacies of the dataset they were trained on. We understand that the CGCNN-predicted properties may differ from the actual properties of the 1000 generated CIFs for each model. However, from a generalizability perspective, it is promising that our WGAN and Diffusion Models were able to replicate the CGCNN-predicted qualities that appear in the dataset whilst maintaining the ability to produce new crystals.

Q) The DFT calculation conditions are not described in the paper (there is no supporting information). And, the meaning of "consecutive structural relaxation" in the main text is somewhat ambiguous. If “after the 4th relaxation” means the number of self-consistent field (SCF) iterations in the DFT structural relaxation, it needs to perform not only four relaxation steps but also converge until the structure of the local minimum emerges before energy or pressure comparison.
- The DFT calculation conditions were expanded upon in the manuscript. Our goal was to relax the structure to its stable (low) energy state for the energy calculation. We could have performed only one relaxation, allowing many SCF iterations and a low cutoff energy value; however, that would have taken too long and would not have provided us a way to monitor progress. Thus, rather than performing many SCF iterations within one relaxation job, we chose to perform four consecutive relaxation jobs with our selected cutoff energy values, limiting the SCF iterations to 60 cycles (which is also the VASP default). Upon inspection of the calculation files, most of them took much less than that to reach the cutoff energy value.

Q) On page 23, providing clear criteria and rules for the post-processing step would be helpful for the readers. Referring to the "potentially erroneous atomic number assignment" could be confusing without specific post-processing guidelines.
- A post-processing summary was included to aid in understanding.

Q) In the "Future works" section, it is mentioned to add text-based conditions (chemical composition and molecular formula) for conditional generation. However, in materials design using generation models, it might be more meaningful to have a model that performs inverse design, using material properties as conditions to design materials with desired properties. Nevertheless, considering that the diffusion model is based on “Imagen”, the transformation of the model seems challenging (only text2image), this limitation could also be addressed in limitation or future work section.
- Although the example used within the “Future Works” section is simply a method for creating crystals with a specific chemical composition, the long-term goal of text2crystal would be to have the ability to relay far more specifics about the crystals that are wanted. An example of this could be something like “An oxide with offset layers of corner shared AlO6 octahedra with rare-earth filled interstitials.” The authors agree that inverse design is difficult to do with text2crystal alone; however, another aspect of diffusion models that could be included in future work would be classifier/regressor guidance. Guidance allows diffusion models to take advantage of the outputs of a classifier or regressor to guide the reverse-diffusion process to ideal regression/classification outputs. This would make inverse design possible if the guiding model predicts material properties. These cases have been added to the “Future Works” section.


Q) There is no explanation provided for the similarity score mentioned in Fig 6, 7, and 8.
- This has been replaced by the Kolmogorov-Smirnov statistic. The lower the statistic the more similar the distributions between the real and generated values are.
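For readers of this response, the two-sample Kolmogorov-Smirnov statistic can be sketched in a few lines. This is only an illustrative pure-Python version, not the code used for the manuscript figures; the function name `ks_statistic` and the sample values are made up for the example.

```python
from bisect import bisect_right


def ks_statistic(real, generated):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(generated)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every value present in either sample.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Hypothetical values standing in for one property of real and generated crystals.
real_values = [3.9, 4.1, 4.2, 5.0, 5.6]
generated_values = [4.0, 4.3, 4.9, 5.5, 6.1]
print(ks_statistic(real_values, generated_values))
```

The statistic is 0 when the two empirical distributions coincide and 1 when the samples do not overlap at all, which is why lower values indicate closer agreement.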

Q) The discussion about the predicted property distributions by CGCNN is also too brief. More detailed explanations and analysis that can quantitatively express the differences or similarities between each distribution seem necessary.
- The Kolmogorov-Smirnov statistic was also added in this case for quantitative comparison. The discussion surrounding CGCNN was extended.

Referee: 3

Comments to the Author
Recommendation: Publish after major revisions.
Comments:
This work reports a new crystal representation “CrysTens” to encode crystal features, which is then paired with generative models to discover new inorganic crystal materials. While the authors comprehensively demonstrated the performance of their crystal representations and diffusion models, this paper required more detailed explanation of crystal features and validation of generated materials. I would like to see several major modifications and justifications before recommending this paper for publication.

First, the motivation and background knowledge of this work need to be clarified. In the 2nd paragraph in the background: “New representations that capture chemistry (composition) and structure (periodicity) are important because they then allow us to utilize machine learning algorithms to identify and exploit patterns in data.” The authors try to prove the importance of new crystal representation by mentioning traditional empirical approaches and simulation-based approaches. However, it’s also important and better to stress the term “crystal representation” in the context of machine learning and generative models in the 4th paragraph.
- We added more clarifying points relating our use of the term “crystal representation” with machine learning and generative modelling. Anytime we mention a crystal representation we are referring to ways to encode the crystal information in a tensor that is consumable by a machine learning model.

Also, because one of the main focuses of the work is the CrysTens representations, the authors should spend more words on other representations and comparison of its performance against other state-of-the-art crystal representations.
- The authors agree with the reviewer that a comprehensive study of the space of crystal representations in generative modelling is necessary; however, we do not believe it would fit well in this work. We attempted to make CrysTens as image-like as possible so that it would easily fit into current image-generation models. Other crystal representations, such as FTCP and the representation used in CCDCGAN, do not follow the same shape constraints as CrysTens, which would make it very difficult to insert these representations into the diffusion model because it expects square images. We do not claim state-of-the-art in our work and just want to illustrate the ability of image-like crystal representations paired with image-diffusion models in this space. Comparisons with other representations may fit into future work as we explore the diffusion model’s capabilities.

The key strength and potential drawbacks of the crystal representation need to be better explained.
- These were expanded upon in the CrysTens Representation section.

In section 3.1, 3rd paragraph “not all CIF entries include the basis.”, CIF files should encode position information of all atoms in the unit cell (can be primitive, conventional, or calculated/output), what exactly does the basis mean here for CIF. Give an example of CIF that does not have a “basis” and the reasoning of why it cannot be used in CrysTens representation.
- This was misleading wording and has been removed. Some of the CIFs that were gathered from PCD were error-ridden and, for example, would not contain any atoms or would have lattice parameters that were several orders of magnitude larger or smaller than expected. We performed data cleaning on the CIFs that we had access to and were left with the 53,856 CIFs that were used to train our models.

Also, in CrysTens representation, are the lattice parameter, angles, and SG repeated 52 times because each crystal material should only have one set of lattice parameters, angles, and one space group? Please explain the reason behind this and the possible outcome of encoding the same information multiple times.
- The reasoning mentioned here is correct. CrysTens is not the first representation that was investigated throughout this research. Our first representation simply consisted of a list of the parameters, angles, and SG, mixed in with the list of atomic numbers and positions. We found that, although this representation was very simple and easy to use, the generated samples were oftentimes evidently noisy due to the stochastic nature of GANs and diffusion models. As we iterated on our representation, we wanted it to represent the crystal structure in several ways, such as through the redundant inclusion of a pairwise distance matrix and the unidimensional matrices. This would force the model to reconcile all of the different patterns within the crystal representation and make each one align appropriately. Additionally, it would give us several ways to reconstruct the crystal so that we could average all of them in an attempt to mitigate noise. The same goes for the repeated parameters, angles, and SG numbers. The model would be forced to learn the same parameter with respect to different positions within the representation (as both the GANs and the diffusion models are convolutional neural networks). The repeating of parameters is a result of our desire to create a noise-resistant representation and arises naturally from the representation’s image-like appearance. As is discussed in the manuscript, the hope is that there is a high level of agreement between these different parameter assignments (which can serve as an additional performance indicator). This was expanded upon in Section 3.1 CrysTens Representation.
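The averaging idea described above can be illustrated with a short sketch. It assumes the repeated copies of one scalar (for example, a lattice parameter read from each of its positions in a generated CrysTens) have already been extracted into a list; the helper name `reconcile_copies` and the numbers are hypothetical and are not taken from our code.

```python
from statistics import mean, pstdev


def reconcile_copies(copies):
    """Average redundant copies of one scalar and report how much they disagree."""
    if not copies:
        raise ValueError("need at least one copy")
    # The mean is the reconstructed value; the spread is a rough agreement score.
    return mean(copies), pstdev(copies)


# Hypothetical noisy copies of a single lattice parameter (in angstroms).
noisy_copies = [5.41, 5.46, 5.39, 5.44, 5.45]
value, spread = reconcile_copies(noisy_copies)
print(f"reconstructed value: {value:.3f}, spread: {spread:.3f}")
```

A small spread corresponds to the high level of agreement between copies that we hope to see, while a large spread flags a generated sample whose redundant entries are inconsistent.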

On Page 8, does “K-Means Clustering with K = 3” mean the authors assume the generated crystal is a ternary crystal material? In the context of training with not only ternary crystals, if the model generated a material with 2, 4, or more elements, how does the algorithm identify it? Please state the limitations of the generated crystals and the effect of using K-Means Clustering to process the data.
- Yes, when K = 3, the only crystals that are generated will be ternary. The vast majority of crystals within PCD were ternary, which is why we chose it for the generation of the 1000 CIFs for evaluation, as we were unable to tune K for each of the 1000 CIFs. Each model consequentially suffered from the same limitation during this step and so a comparison between the three models with this limiting factor is still valid. When the 6 CIFs were chosen for DFT simulation, a K was chosen for each one of them individually, depending on their qualities upon inspection. This is an imperfect aspect of the current post-processing procedure. A large part of our future work will focus on the ML-driven automation of our post-processing system so that we can rid ourselves of assumptions such as these.

Furthermore, the background knowledge of CIF, VESTA, and other related packages can be moved to the Appendix to provide a better focus on crystal representations and generative models.
- This section has been moved to the Appendix.

The authors calculated decomposition energy (energy above the convex hull, Ehull) in both the Results and Discussion sections. However, the authors did not provide information about how Ehull is calculated or what the reference materials in the same system are.
- The calculation of decomposition energy was done with another CGCNN model. However, we recognize the shortcomings of this method in isolation, so we added an additional calculation of Ehull, which is explained below.

Another major problem with formation energy and Ehull is the use of CGCNN. Because the original CGCNN model is trained on “DFT relaxed crystal structures”, the test material should also be at or near its ground state (relaxed) structure to get a good prediction from CGCNN, but in this case, no reason or validation is given to prove the viability of using CGCNN on these generated CIFs.
- Given that CGCNN models are certainly biased towards stable structures, it is true that the predictions for unstable structures may not be fully accurate. The choice to include CGCNN-predicted property comparisons for each of the different models stems from a goal to evaluate the models on their ability to capture the intricacies of the dataset they were trained on. We understand that the CGCNN-predicted properties may differ from the actual properties of the 1000 generated CIFs for each model. However, from a generalizability perspective, it is promising that our WGAN and Diffusion Models were able to replicate the CGCNN-predicted qualities that appear in the dataset whilst maintaining the ability to produce new crystals.

The validation method and procedure of the generative model can be improved. The formation energy of inorganic crystal materials is not a good filter for comparing and choosing stable compounds, especially when studying materials from different systems. The procedure the authors used may exclude a large number of materials with high relative value but low in-system formation energy.
- The authors agree with the reviewer on this point. To supplement the methods in place for verifying the crystal properties, we added an additional method for calculating Ehull using the formation energy predicted by M3GNet. The code for calculating Ehull can be found on our GitHub for this work. Reference materials are gathered from Materials Project and the convex hull is constructed using PyMatGen. The M3GNet-predicted formation energy is then used to find the distance to the hull (Ehull).
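For readers unfamiliar with the quantity, the following is a minimal, dependency-free sketch of an energy-above-hull calculation, restricted to a binary A-B system so that the hull is a simple curve. It is not the authors' code, which relies on Materials Project references and PyMatGen; the reference energies below are invented for illustration, and the function names are hypothetical. Energies are formation energies per atom, and x is the fraction of element B.

```python
def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (x, energy) points (Andrew's monotone chain)."""
    best = {}
    for x, e in points:
        # Only the lowest-energy phase at each composition can lie on the hull.
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):
        # Drop the previous vertex while it lies on or above the line to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x, e_form, references):
    """Vertical distance from (x, e_form) to the hull built from references."""
    hull = lower_hull(references)
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            e_hull = e0 + (e1 - e0) * (x - x0) / (x1 - x0)
            return e_form - e_hull
    raise ValueError("composition lies outside the range of the references")


# Invented reference phases: pure A, A3B, AB, pure B (eV/atom).
references = [(0.0, 0.0), (0.25, -0.3), (0.5, -1.0), (1.0, 0.0)]

# A hypothetical generated candidate at x = 0.25 with a predicted
# formation energy of -0.3 eV/atom.
print(energy_above_hull(0.25, -0.3, references))
```

In this invented example the A3B reference sits above the tie line between A and AB, so it is not on the hull, and a candidate with the same energy has a positive Ehull. Ternary and higher systems need a general convex hull routine, which is the role PyMatGen plays in the response above.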

On the other hand, the validation section lacks a clear metric to assess the performance of generative models. In previous sections, the authors demonstrate that diffusion models are more efficient at generating “symmetry” materials, but a metric for how well the model can predict “real” crystals is yet to be validated. I suggest one possible metric can be to calculate the difference between free energy before and after relaxation. For example, quantify the success rate at which the energy difference is below a preset threshold. Finally, the authors should explore more validation methods and prove the effectiveness of the model from other perspectives. For example, the authors can try to generate existing crystal materials using diffusion models and compare the crystal feature errors against these existing crystal structures.
- The authors agree with the reviewer that further validation could be applied to validate the crystals that are produced. The primary goal of this work, however, is to show the shortcomings of the GAN architectures when compared to that of the diffusion model architecture in the material discovery space. With this in mind, we focused on how well these models performed from a generative modelling/ML perspective, i.e., how well these models can recreate a distribution of material properties via CGCNN, a distribution of lattice parameters, etc. We do agree that absolute metrics evaluating crystal stability are needed as well, so we added an extra section evaluating the energy above hull for each model’s 1,000 CIFs, and our VASP DFT section illustrates this as well. In future works, we intend on pushing the diffusion models to their limits and attempting to reach state-of-the-art crystal generation, where more non-ML-based validation metrics would be more appropriate.




Round 2

Revised manuscript submitted on 01 Oct 2023
 

31-Oct-2023

Dear Mr Alverson:

Manuscript ID: DD-ART-07-2023-000137.R1
TITLE: Generative adversarial networks and diffusion models in material discovery

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 2

The authors have addressed all my comments, and the quality is improved.

Reviewer 3

Recommendation: Publish after Minor Revision

Comments:
The changes made to the original manuscript resolve some of our concerns. However, some of the major problems, although mentioned by the authors in the response, still need to be addressed. The authors need to clarify several concepts and give explanations for their results.

Questions answered:
1. Crystal representations were explained with ML models in the manuscript.
2. Background knowledge of CIF was moved to the Appendix.

Problems need to be addressed:
1. The authors addressed the previously mentioned ‘basis’ problem, but the specific reasons for filtering over half of the database need to be mentioned in the manuscript.
2. In the response and in the manuscript, the authors provided acceptable explanations for repeated crystal structure parameters. However, the reason provided in two places is different, in the response: “The repeating of parameters is a result of our desire to create a noise-resistant representation and arises naturally from the representation’s image-like appearance” and in the manuscript (Page 5): “we chose to move forward with this representation as we felt it was the most natural image-like representation”. Please provide the exact reason for using the repeating parameters and explain this reason in the manuscript.
3. In the response, the authors mentioned: “diffusion model because it expects square images”. We don’t think a square image is a constraint for some or most diffusion models. Please provide examples or reasons for this constraint in the response. If the “square image” is a constraint for diffusion models, please mention it in the manuscript.
4. The authors admit their drawback when using K-means clustering. The K values for the 6 chosen crystals are manually decided. This heavily damages the capability and efficiency of the generative model. In the response, the authors say: “Each model consequentially suffered from the same limitation during this step and so a comparison between the three models with this limiting factor is still valid”, no evidence for this statement is provided. Please provide the proof for this claim in the response. Furthermore, the authors deleted the sentence “K-Means Clustering with K = 3” without adding any explanation, which we think is unacceptable. Please state clearly the K-Means clustering method in the manuscript and explain the drawbacks of using the method.
5. The drawback of using CGCNN on unrelaxed structures should be discussed in the manuscript. Also, the authors mentioned “WGAN and Diffusion Models were able to replicate the CGCNN-predicted qualities”, please provide evidence to support this statement.
6. The authors used Figure 7 to validate the model performance of WGAN and Diffusion model. The authors should provide extra discussions of the plot in the discussion section. Also, the authors need to provide an explanation for why the generated CIFs have a different distribution from the real CIFs.
7. In Figure 7, the authors plotted “Decomposition Enthalpy”, which is the same as “Energy Above Convex Hull” in Figure 8. There should only be one Ehull prediction from either CGCNN or M3GNet. Please provide a specific reason for using two graphs for two models.


 

The authors appreciate your review.
1. The authors addressed the previously mentioned ‘basis’ problem, but the specific reasons for filtering over half of the database need to be mentioned in the manuscript.
The only CIFs that were filtered out were those that were erroneous, incomplete, or contained over 52 atoms in the basis, because of the size constraints in CrysTens. This is now stated more explicitly at the beginning of Section 3.1.

2. In the response and in the manuscript, the authors provided acceptable explanations for repeated crystal structure parameters. However, the reason provided in two places is different, in the response: “The repeating of parameters is a result of our desire to create a noise-resistant representation and arises naturally from the representation’s image-like appearance” and in the manuscript (Page 5): “we chose to move forward with this representation as we felt it was the most natural image-like representation”. Please provide the exact reason for using the repeating parameters and explain this reason in the manuscript.
The core reason for the selection of the CrysTens representation is that repeating the crystal structure parameters reduces noise. The image-like nature of the resulting representation is an additional benefit because it allowed us to easily drop this representation into generative models that expect images with minimal changes. The “image-like” description was dropped from Page 5 to minimize ambiguity.

3. In the response, the authors mentioned: “diffusion model because it expects square images”. We don’t think a square image is a constraint for some or most diffusion models. Please provide examples or reasons for this constraint in the response. If the “square image” is a constraint for diffusion models, please mention it in the manuscript.
The authors agree with the reviewer on this point. Diffusion models in general do not require square images. To clarify the past response: when working with the imagen-pytorch diffusion model, however, it would be difficult to fit asymmetric (non-square) crystal representations due to the nature of the cascading Unets. Each Unet should be scaling the sides of the representation equally.


4. The authors admit their drawback when using K-means clustering. The K values for the 6 chosen crystals are manually decided. This heavily damages the capability and efficiency of the generative model. In the response, the authors say: “Each model consequentially suffered from the same limitation during this step and so a comparison between the three models with this limiting factor is still valid”, no evidence for this statement is provided. Please provide the proof for this claim in the response. Furthermore, the authors deleted the sentence “K-Means Clustering with K = 3” without adding any explanation, which we think is unacceptable. Please state clearly the K-Means clustering method in the manuscript and explain the drawbacks of using the method.
Since all three models, when generating the 1000 CIFs for evaluation, are subjected to the same post-processing (K-Means Clustering with K=3 for atomic number and K=6 for atomic positions), the limitations imposed by statically selecting K values are uniform across the three models. We agree with the reviewer that the capabilities of the models, when generating large quantities of crystals for evaluation, are damaged by this method, and we are actively working towards more robust solutions in our future works. The K values selected for the evaluation were not removed from the manuscript; instead, they were moved to the end of the Results text section. This was done because those K values only refer to the 1000 CIFs generated by the models for evaluation and do not reflect the general post-processing usage. The drawbacks of K-Means post-processing were added to Limitations and at the end of Results.
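The effect of a statically chosen K can be made concrete with a small sketch of the atomic-number step. This is a plain one-dimensional Lloyd's algorithm written for illustration, not the authors' implementation: the seeding rule, the function names, and the noisy values are assumptions made here. Each noisy value is snapped to the rounded centroid of its cluster.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars with deterministic, evenly spaced seeds."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Seeding from evenly spaced ranks is an assumption made for
    # reproducibility; library implementations seed differently.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
    return centroids, labels


def snap_atomic_numbers(noisy, k=3):
    """Replace each noisy atomic number with its cluster's rounded centroid."""
    centroids, labels = kmeans_1d(noisy, k)
    return [round(centroids[j]) for j in labels]


# Hypothetical noisy atomic numbers read from one generated sample.
noisy = [7.8, 8.1, 8.3, 25.7, 26.2, 37.9, 38.4, 8.0]
print(snap_atomic_numbers(noisy, k=3))
```

Because every value is mapped to one of k centroids, a fixed k caps the number of distinct species in the output at k, whatever the generated sample actually contained. That is the limitation acknowledged in the response above.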


5. The drawback of using CGCNN on unrelaxed structures should be discussed in the manuscript. Also, the authors mentioned “WGAN and Diffusion Models were able to replicate the CGCNN-predicted qualities”, please provide evidence to support this statement.
The violin plots included in Figure 7 provide evidence for this statement. The Kolmogorov-Smirnov statistic is used to show similarity with CGCNN-predicted qualities of the real distribution. The drawbacks of using CGCNN were included in Results.


6. The authors used Figure 7 to validate the model performance of WGAN and Diffusion model. The authors should provide extra discussions of the plot in the discussion section. Also, the authors need to provide an explanation for why the generated CIFs have a different distribution from the real CIFs.
There is a noticeable increase in similarity between the shape of the real distribution and the generated distributions as we move from GANs to WGANs to Diffusion Models in the violin plots. The Kolmogorov-Smirnov statistic is calculated and shown to quantify this improvement. As modelling techniques improve, the generated distribution will become more similar to the real distribution. Matching real CIF distributions is one of the goals of this work and is one of the metrics we use to measure performance. The generated CIFs have a different distribution from the real CIFs because the models are imperfect. Additional clarification was added to the discussion section.
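As a reminder of what the statistic measures, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distribution functions of two samples, so smaller values mean more similar distributions. The sketch below is a minimal standard-library version with made-up numbers; it is not the code used for Figure 7, and the sample values are purely illustrative.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every data point from either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Made-up property values for a "real" set and two generated sets.
real = [1.0, 2.0, 3.0, 4.0]
close_match = [1.0, 2.0, 3.0, 4.0]
shifted = [3.0, 4.0, 5.0, 6.0]

print(ks_statistic(real, close_match))  # identical samples give 0.0
print(ks_statistic(real, shifted))      # shifted sample gives 0.5
```

Under this reading, a model whose generated property values yield a lower statistic against the real distribution is reproducing that distribution more closely.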


7. In Figure 7, the authors plotted “Decomposition Enthalpy”, which is the same as “Energy Above Convex Hull” in Figure 8. There should only be one Ehull prediction from either CGCNN or M3GNet. Please provide a specific reason for using two graphs for two models.
The authors agree with the reviewer on this point. The CGCNN Decomposition Energy plot and discussion have been removed to avoid presenting a redundant result.




Round 3

Revised manuscript submitted on 24 Nov 2023
 

30-Nov-2023

Dear Mr Alverson:

Manuscript ID: DD-ART-07-2023-000137.R2
TITLE: Generative adversarial networks and diffusion models in material discovery

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Kedar Hippalgaonkar
Associate Editor, Digital Discovery
Royal Society of Chemistry


 
Reviewer 3

The authors have addressed my concerns and made relevant edits. I recommend to accept.




Content on this page is licensed under a Creative Commons Attribution 4.0 International license.