Peer review history (from the journal Digital Discovery)

Multi-fidelity Bayesian optimization of covalent organic frameworks for xenon/krypton separations

Round 1

Manuscript submitted on 21 Jun 2023
 

14-Aug-2023

Dear Dr Simon:

Manuscript ID: DD-ART-06-2023-000117
TITLE: Multi-fidelity Bayesian Optimization of Covalent Organic Frameworks for Xenon/Krypton Separations

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log in to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

In accordance with Digital Discovery's scope, this work reports the results of a multi-fidelity Bayesian optimization approach to identify promising COF adsorbents with high Xe/Kr selectivity. The two main properties utilized by the approach were the uncertainty between simulation results at infinite dilution and under mixture conditions, and the Xe/Kr mixture selectivity of the COF adsorbents. Using a surrogate model and acquisition function, guidelines for materials with high selectivity obtained from previous works were exploited to discover the next promising materials, and the search space outside these guidelines was explored based on the uncertainty predictions as well as the score for identifying a promising COF. Overall, the most promising COF adsorbents among 609 candidates could be found with a total of 48 simulations using the "intelligent" decision-making scheme. The comparison of the search efficiency of multi-fidelity Bayesian optimization with other approaches such as two-stage screening, random search, and single-fidelity Bayesian optimization also highlighted the promise of this method for efficient computational investigation of COFs for different gas separation applications.
I have the following minor comments before publication:
1. The title of the paper suggests that it is COFs that are targeted to be optimized; however, is it not the discovery or the screening methodology that is optimized?
2. In Figure 4, the accumulated runtime between simulations 20 and 21 is much higher than the others. Is there a specific reason for this? Also, at the top of this figure, the COFs at the thin tail of the distribution are not visible at all; maybe using a logarithmic scale would help to show them.
3. Figs. S3-S5 are not referred to in the manuscript, but only referred to in the SI. The figure numbers might be arranged accordingly.
4. Ref. 86 and ref. 74 are the same paper.

Thank you,

Reviewer 2

First and foremost, I would like to congratulate the authors on this commendable piece of work, which was beautifully done scientifically and scholarly. I really love that this work expertly shows why and how high-throughput computational material screening may evolve into intelligent (high-throughput) screening. This work will, in my view, undoubtedly provide inspiration for future efforts in similar research areas and/or subjects and will help drive the computational materials research community away from brute-force, high-throughput screening. Second, it is commendable that the authors went to great lengths in documenting the technical details in an elaborate yet digestible manner. This is quite extraordinary and sets an excellent example for the field to take note and follow. (It made me feel both wanting AND able to try out their approach on some of the problems my group has been working on.)

I have to admit that I found it difficult to critique this excellent piece of work, but I would still like to offer the following points for the authors to consider implementing if (and only if) any of them will strengthen the manuscript (please do NOT make additions PURELY to address my comments).

1. How good/strong are the structural and compositional features (i.e., the 14 descriptors forming vector x) in correlating with the high-fidelity (HF) Xe/Kr selectivity, through a machine learning (ML) model, such as Gaussian processes (GP) or kernel ridge regression (KRR)? It could be helpful to have one more baseline comparison, something like: taking 47 COFs uniformly sampled from the 14-dimensional feature space -> training a GP or KRR (or any other model(s) that the authors consider appropriate) to predict HF Xe/Kr selectivity -> using the trained ML model to predict for the other ‘unseen’ COFs.

2. One further baseline comparison, based on (1), is to use the 14 structural and compositional features together with the Henry’s constants of Xe and Kr (i.e., 16 input features) in ML of the HF Xe/Kr selectivity.

3. The above two points could be helpful in gauging the level of dependence of the MFBO’s high predictive ability on the strength of the chosen features input into the surrogate models. More generally, I am curious about whether it might be possible to gain predictive performance through a multi-fidelity approach even with weak ML input features.

4. It might be worth commenting on possible generalisability of the rather elaborate acquisition function or that of its design and/or conceptualization. Similarly, it would be helpful to know what the authors would propose to do for searching (x[n+1], l[n+1]) solutions in cases where the search space is large.

Reviewer 3

I reviewed the data and code for this manuscript. I appreciate the authors’ open repository. The existing state of the repository is already great as is, so my suggestions are only to help the authors further.

1. In the readme of the base git repo, it appears to me that instruction 1 is already carried out (i.e. the data/crystals folder already exists in the repo). I appreciate the exact date and archive number for the materials cloud link. Perhaps a note that this was already done (so that users who clone the repo realize this) would be helpful.
2. This will likely be fairly obvious to most computational scientists, but I think summarizing the software that is necessary in the readme could be useful. For instance, one needs to install zeo++ in order for submit_zeo_calculations.sh to work readily, and the install needs to be in the home directory. I know the repo is likely a work in progress, but there are some lines that should be cleaned up in the readmes. For instance, inside of descriptors, there are some comments produced by the authors: “uh... where did the cof_descriptors.csv come from?”. It appears to me that cof_descriptors comes from chemical_properties and geometric_properties being stitched together.
3. In instruction 3, I’m not sure what the gcmc_mixtures branch in development implies. It appears that the first part of the instruction is not too relevant because I see the script for getting the cof_features with cof_features.jl
4. The shell scripts are well commented and reusable. I appreciate the readme files in each subfolder!
5. The notebook in step 6: “generate_initializing_cof_ids.ipynb” is missing from the repo.
6. Although I mentioned the missing notebook in my note above, it appears that a pickle file containing the cof_ids is present in search_results/normalized, thus making this reproducible. I’m not seeing the point of having the normalized/ folder and thus would suggest deleting this folder and moving everything to the base level to match the readme. There are also some small typos in the readme that lead to the files not matching exactly! Would be helpful to have this adjusted.
7. The notebooks in step 7 are very useful. However the current location is incorrect. They are in “run_BO” not in “search_results”! Adjusting this typo will help users find the notebooks.
8. The figs/ directory is very nice. It is clear that this information is reproducible. I would ask the authors to check the readme quickly, as the “figure_1” directory is not available! It appears that the name of the folder just changed.
9. Overall I think the repo is usable. I think the authors can consider adding the .ipynb_checkpoints to the gitignore and removing these to declutter the repo. Additionally, there are some extra files at the base level (gcmc_calculations.csv and henry_calculations.csv) that could potentially be added to the description on the front readme. Similarly, there are some unexplained folders such as “benchmarking_sims”. Perhaps these are not necessary, but an explanation in the readme would help.

These comments are fairly minor, as I was able to follow all of the details of the repo, but I hope they can help in improving the usability of the code. In particular, I appreciate the commenting of the BO notebooks, which may enable users to repurpose the notebooks for other BO tasks.

— Aditya Nandy


 

Dear Dr. Hung and reviewers,

We hereby submit a revision of our article, “Multi-fidelity Bayesian Optimization of Covalent Organic Frameworks for Xenon/Krypton Separations”, to Digital Discovery. We thank the reviewers for their genuinely helpful comments. We have implemented their suggestions, and our paper has improved as a result. A point-by-point response to the reviewers is below, and we submit a new version of our manuscript (one diff copy with the changes highlighted).

Post-review improvements include:
- Adding many unit tests to our BO code. Each function is now well-tested. We are very confident in our code.
- Fixing two [minor] bugs in the code where we erroneously (i) input the fidelity index of 1 instead of the fidelity parameter of ⅔ into the surrogate model in one function and (ii) shifted the # of iterations in Fig. 5 by one.
- Better code documentation in the README.md files.
- Rerunning with the newest versions of BoTorch and GPyTorch, which allowed us to use the default optimizer for maximizing the marginal likelihood with respect to the GP hyperparameters. (Specifically, we omitted optimizer=fit_gpytorch_torch from fit_gpytorch_model; see here for the numerical issue we were having without it in the old version.) A minimal sketch of the fitting call is given after this list.
- Explaining the different jumps in runtime of the simulations of the same fidelity.
- Using all of our simulated data, examining the predictivity of a GP for the high-fidelity selectivity when the COF features are used (a) alone and (b) augmented with the low-fidelity selectivity.
- Quantifying the value of the features for MFBO by rerunning with permuted features.
- Adding a discussion of acquisition function design.
- Adding a discussion of scaling MFBO to larger sets of materials.
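As a reference for readers, a minimal sketch of the GP fitting call with the default optimizer is below; the single-fidelity SingleTaskGP and the random placeholder tensors are illustrative assumptions, not our actual multi-fidelity surrogate or data.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood

# placeholder data: 20 COFs, 14 normalized features, one scalar target each
X = torch.rand(20, 14, dtype=torch.double)
y = torch.rand(20, 1, dtype=torch.double)

model = SingleTaskGP(X, y)                                 # GP surrogate
mll = ExactMarginalLogLikelihood(model.likelihood, model)  # marginal likelihood
fit_gpytorch_model(mll)  # default optimizer; no optimizer=fit_gpytorch_torch argument
```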

Reviewer 1
In accordance with Digital Discovery's scope, this work reports the results of a multi-fidelity Bayesian optimization approach to identify promising COF adsorbents with high Xe/Kr selectivity. The two main properties utilized by the approach were the uncertainty between simulation results at infinite dilution and under mixture conditions, and the Xe/Kr mixture selectivity of the COF adsorbents. Using a surrogate model and acquisition function, guidelines for materials with high selectivity obtained from previous works were exploited to discover the next promising materials, and the search space outside these guidelines was explored based on the uncertainty predictions as well as the score for identifying a promising COF. Overall, the most promising COF adsorbents among 609 candidates could be found with a total of 48 simulations using the "intelligent" decision-making scheme. The comparison of the search efficiency of multi-fidelity Bayesian optimization with other approaches such as two-stage screening, random search, and single-fidelity Bayesian optimization also highlighted the promise of this method for efficient computational investigation of COFs for different gas separation applications.

Response: This is a good summary of our article.

I have the following minor comments before publication:
1. The title of the paper suggests that it is COFs that are targeted to be optimized; however, is it not the discovery or the screening methodology that is optimized?

Response: Though arguably the screening methodology is optimized, this is indeed formally an optimization problem—we seek to optimize an objective function (high-fidelity Xe/Kr selectivity) over a finite set of COFs. We are just leveraging the lower-fidelity simulation to make this optimization more efficient. Hence, we believe that the title of the paper is appropriate. However, we think this is an important distinction to make, and we make this explicit in the text now.

Changes to text:
> Our task constitutes solving an optimization problem (objective function = high-fidelity Xe/Kr selectivity) over a finite set of materials [Coley] with access to bi-fidelity simulations to evaluate the material property.

[We cite Coley because he makes the distinction between optimization over finite and infinite chemical spaces.]

2. In Figure 4, the accumulated runtime between simulations 20 and 21 is much higher than the others. Is there a specific reason for this?

Response: Thank you for the great question. We should have explained this. We now add a clarifying remark.

Changes to text:
> Molecular simulations of the same fidelity vary in runtime among different COFs owing to different unit cell sizes, numbers of framework atoms, and, for high-fidelity simulations, average numbers of adsorbates hosted by the COF during the simulation. This explains why some jumps in accumulated runtime, within a given fidelity, are larger than others. For example, the first high-fidelity simulation was in COF 15081N2 with 1760 atoms in the simulation box, and took 85 min; the seventh was in COF 19440N2 with 4768 atoms and took 1002 min.

Also, at the top of this figure, the COFs at the thin tail of the distribution are not visible at all; maybe using a logarithmic scale would help to show them.

Response: Thanks! True, before we couldn’t even see from the histogram that there were COFs with high selectivities. To make the visualization more instructive/interpretable/faithful to the data, we took your suggestion and changed the x-axis to a log scale.

Changes to text: We made the change to Fig. 4, and called out to readers in the text to pay attention to the log scale.
> Note the log-scale.


3. Figs. S3-S5 are not referred to in the manuscript, but only referred to in the SI. The figure numbers might be arranged accordingly.

Response: You are right. We accidentally omitted any discussion of these. We added a new subsection to discuss these.

Changes to text: We added a short, new subsection “Post-MFBO analysis of our simulated adsorption data”.

> During the iterative, MFBO-guided COF search, especially in the early stages, the surrogate model lacks complete knowledge of how the high-fidelity simulated Xe/Kr selectivities are related to (i) the structural and chemical features of the COFs and (ii) the low-fidelity selectivities. Nonetheless, post-MFBO, we now examine these relationships using the exhaustive simulation data for all COFs to gain insights. Fig. S5 shows the [strong, R2=0.93, but diminishing at high Xe/Kr selectivities] correlation between the Xe/Kr selectivity of the COFs according to high- vs. low-fidelity simulations, and Fig. S6 shows the correlation between the high-fidelity Xe/Kr selectivity and the features of the COFs. To assess our ability to discriminate between the COFs with the highest and lowest simulated Xe/Kr selectivity based on their features, the radar plot in Fig. S7 visualizes the feature vectors of the top- and bottom-15 COFs. Consistent with previous computational studies of Xe/Kr adsorption [], for example, the COFs with the largest high-fidelity simulated Xe/Kr selectivity exhibit pore diameters that fall within a narrow interval situated a little to the right of the diameter of a Xe adsorbate.

4. Ref. 86 and ref. 74 are the same paper.

Response: Good catch!
Changes to text: We merged these two citations.

Reviewer 2
First and foremost, I would like to congratulate the authors on this commendable piece of work, which was beautifully done scientifically and scholarly. I really love that this work expertly shows why and how high-throughput computational material screening may evolve into intelligent (high-throughput) screening. This work will, in my view, undoubtedly provide inspiration for future efforts in similar research areas and/or subjects and will help drive the computational materials research community away from brute-force, high-throughput screening. Second, it is commendable that the authors went to great lengths in documenting the technical details in an elaborate yet digestible manner. This is quite extraordinary and sets an excellent example for the field to take note and follow. (It made me feel both wanting AND able to try out their approach on some of the problems my group has been working on.)

Response: Thank you very much for the praise!

I have to admit that I found it difficult to critique this excellent piece of work, but I would still like to offer the following points for the authors to consider implementing if (and only if) any of them will strengthen the manuscript (please do NOT make additions PURELY to address my comments).

Response: Thank you for the good ideas.

1. How good/strong are the structural and compositional features (i.e., the 14 descriptors forming vector x) in correlating with the high-fidelity (HF) Xe/Kr selectivity, through a machine learning (ML) model, such as Gaussian processes (GP) or kernel ridge regression (KRR)? It could be helpful to have one more baseline comparison, something like: taking 47 COFs uniformly sampled from the 14-dimensional feature space -> training a GP or KRR (or any other model(s) that the authors consider appropriate) to predict HF Xe/Kr selectivity -> using the trained ML model to predict for the other ‘unseen’ COFs.

(see next response)

2. One further baseline comparison, based on (1), is to use the 14 structural and compositional features together with the Henry’s constants of Xe and Kr (i.e., 16 input features) in ML of the HF Xe/Kr selectivity.

(This is a combined response to (1) and (2).)
Response: Interesting question. We added a new section in the SI, Sec. S4 with these analyses.

Changes to text:
Main text:
> Similar in spirit to multi-fidelity machine learning and two-stage search, the cheap, low-fidelity calculations of dilute adsorption properties could serve as features (inputs) to a supervised machine learning model to predict the high-fidelity adsorption property []. In Fig. S5, we show that augmenting the standard chemical and structural features of the COFs with the low-fidelity Xe/Kr selectivity, treated as an additional input, can dramatically improve the predictivity of a GP on the high-fidelity Xe/Kr selectivity.

SI Sec. S4:
> “GP predictivity of high-fidelity Xe/Kr selectivity”
> Here, we examine the effectiveness of the hand-engineered features of the COFs in x for predicting the high-fidelity Xe/Kr selectivity via a Gaussian process (GP)---including the case where we treat the low-fidelity Xe/Kr selectivity as a feature (so the input is [x y(1/3)]). We randomly partition the COFs into an 80%/20% train/test set. We fit a GP on the train set, then apply the GP to make predictions on the test set of COFs, i.e., predict their high-fidelity Xe/Kr selectivity based on their features. Fig. S8 shows a parity plot comparing the predictions of the GP on the test set with the true, held-out high-fidelity Xe/Kr selectivity. Note, here we use all simulated data for all 608 COFs. The mean square error (MSE) of the GP is 0.99 when using the standard features x and 80% of all data for training, and it decreases to 0.29 when the input is augmented with the low-fidelity Xe/Kr selectivity. This dramatic improvement is explained by the strong correlation between the low- and high-fidelity Xe/Kr selectivities (see Fig. S5).
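A minimal sketch of this train/test evaluation is below; it uses scikit-learn's GaussianProcessRegressor as a stand-in for our GPyTorch GP, and the array names (X, y_lo, y_hi) are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# X: (n_COFs, 14) standard features; y_lo, y_hi: low-/high-fidelity selectivities (hypothetical arrays)
def gp_test_mse(X, y_hi, seed=0):
    """Fit a GP on 80% of the COFs and report the MSE on the held-out 20%."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y_hi, test_size=0.2, random_state=seed)
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X_tr, y_tr)
    return mean_squared_error(y_te, gp.predict(X_te))

# compare the features alone vs. features augmented with the low-fidelity selectivity:
# mse_x  = gp_test_mse(X, y_hi)
# mse_xy = gp_test_mse(np.hstack([X, y_lo[:, None]]), y_hi)
```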



3. The above two points could be helpful in gauging the level of dependence of the MFBO’s high predictive ability on the strength of the chosen features input into the surrogate models. More generally, I am curious about whether it might be possible to gain predictive performance through a multi-fidelity approach even with weak ML input features.

Response: Very interesting question. This spurred us to conduct a feature permutation study to quantify how much the features are contributing to the MFBO performance.

Changes to text:
> Feature permutation baseline.
> The surrogate model in MFBO relies upon both (1) the chemical and structural features of the COFs and (2) the low-fidelity simulation data available, to make predictions of the high-fidelity Xe/Kr selectivity of COFs. We next aim to measure the cumulative value of the features for the search efficiency of MFBO. To do so, we (1) for each feature, randomly permute its values among the COFs---thus preserving the distribution of each feature, but decorrelating each feature from the high-fidelity Xe/Kr selectivity---then (2) run MFBO with all of the features jumbled. We repeat this process 15 times. The idea is that the deterioration in the search efficiency of MFBO with permuted features is indicative of the cumulative value of the features for MFBO. Note, a per-feature permutation could quantify the importance of each feature individually for MFBO (which we did not do).

> Fig. S4 shows that the search efficiency of MFBO is severely diminished when the features of the COFs are randomly permuted, incurring an average runtime of 254 hr. Thus, the features of the COFs are valuable for MFBO.
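For concreteness, the permutation step can be sketched as below; this is a simple NumPy illustration with a hypothetical feature matrix X, and the MFBO loop itself is unchanged.

```python
import numpy as np

def permute_features(X, seed=0):
    """Independently shuffle each column of the COF feature matrix X,
    preserving each feature's marginal distribution but destroying its
    association with the high-fidelity Xe/Kr selectivity."""
    rng = np.random.default_rng(seed)
    X_perm = X.copy()
    for j in range(X.shape[1]):
        X_perm[:, j] = rng.permutation(X_perm[:, j])
    return X_perm
```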



4. It might be worth commenting on possible generalisability of the rather elaborate acquisition function or that of its design and/or conceptualization.

Similarly, it would be helpful to know what the authors would propose to do for searching (x[n+1], l[n+1]) solutions in cases where the search space is large.

Response: Excellent questions. We added more discussion about the acquisition function and how to handle larger search spaces.

Changes to text: New subsections of the Discussion, “Remark on acquisition functions” and “Scaling MFBO to larger sets of materials”:

> Remark on acquisition functions
> MFBO constitutes an outer loop, visualized in Fig. 4, for the outer optimization problem of finding the material with the optimal property: (1) conduct an experiment/simulation, (2) update the surrogate model, then (3) pick the next material and fidelity for an experiment/simulation. Task (3) constitutes the inner optimization problem---finding the material and fidelity that optimize the acquisition function. The cost-performance of MFBO deteriorates when the cost of solving the inner optimization grows. []

> Herein, we solved the inner optimization problem via a brute-force loop over all COFs. The runtime for this was negligible compared to our molecular simulations because (i) we are optimizing over a finite and relatively small set of COFs and (ii) we possess an analytical expression for the acquisition function in eqn. 12. Other acquisition functions, grounded in different principles (e.g., information about the minimum [], knowledge gradient [], non-myopic look-ahead [], or portfolios []) than the improvement-based, myopic one in eqn. 12, may be more expensive to compute (involving intractable integrals that must be approximated through sampling [] and/or rollout). The choice/design of an acquisition function for MFBO may involve balancing (i) the cost to evaluate it and (ii) how well it scores the utility-per-cost of material-fidelity pairs.
> Future work involves benchmarking the performance of other multi-fidelity acquisition functions [] and their robustness across a variety of materials discovery tasks.
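As an illustration of the brute-force inner loop, the sketch below scores every (COF, fidelity) pair with a hypothetical acquisition callable and returns the argmax; it is a schematic stand-in for our BoTorch implementation, not a copy of it.

```python
import numpy as np

def next_query(acquisition, cof_features, fidelities):
    """Brute-force inner optimization: score every (COF, fidelity) pair with
    a (hypothetical) acquisition callable and return the best pair."""
    scores = np.array([[acquisition(x, l) for l in fidelities] for x in cof_features])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return i, fidelities[j]  # index of the next COF and the fidelity to simulate
```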

> Scaling MFBO to larger sets of materials
> Herein, we executed MFBO for optimization over a finite, small (~600) set of materials. For MFBO to scale to larger search spaces (i.e., larger sets of materials) and sample sizes, we can (1) employ surrogate models that are more scalable than GPs, such as Bayesian linear regression [] (perhaps using features learned from a neural network []), sparse GPs [], Bayesian neural networks [], or random forests [] (though random forests extrapolate uncertainty poorly []), and (2) to speed up finding the solution to the inner optimization problem, maximize the acquisition function over the continuous materials space with a generic continuous optimization algorithm (e.g., gradient descent), then decode to a viable material by, e.g., selecting the material in the candidate set that is closest to the maximizer. For materials with structured (non-vector) representations such as strings or graphs, one can learn a continuous representation of the materials via an autoencoder and execute MFBO in this continuous latent space []; then, in strategy (2), we use the decoder to map the continuous latent representation to a material structure.
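A sketch of strategy (2), assuming vector-valued material representations and a differentiable acquisition function; the function and array names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def continuous_then_snap(acquisition, X_candidates, x0):
    """Maximize the acquisition over the continuous feature space, then
    'decode' by snapping to the nearest material in the candidate set."""
    res = minimize(lambda x: -acquisition(x), x0, method="L-BFGS-B")
    idx = int(np.argmin(np.linalg.norm(X_candidates - res.x, axis=1)))
    return idx  # index of the candidate material closest to the maximizer
```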

Reviewer 3
I reviewed the data and code for this manuscript. I appreciate the authors’ open repository. The existing state of the repository is already great as is, so my suggestions are only to help the authors further.

1. In the readme of the base git repo, it appears to me that instruction 1 is already carried out (i.e. the data/crystals folder already exists in the repo). I appreciate the exact date and archive number for the materials cloud link. Perhaps a note that this was already done (so that users who clone the repo realize this) would be helpful.

Response: Thanks, great point.

Changes to code:
To make it clear that each step has already been carried out (its output is saved in the repo), we add to the README.md:
> we describe the sequence of steps we took to make our paper reproducible. the output of each step is saved as a file, so you can start at any step.

We also provide the Materials Cloud link specifying the version of the COF database we used.
> we obtained the dataset of the COF crystal structure files (.cif) from Materials Cloud and stored them in data/crystals.
(link goes to materialscloud:2021.100, the exact database we downloaded from.)

2. This will likely be fairly obvious to most computational scientists, but I think summarizing the software that is necessary in the readme could be useful. For instance, one needs to install zeo++ in order for `submit_zeo_calculations.sh` to work readily, and the install needs to be in the home directory.

Response: Thanks for the suggestion!

Changes to code: We added a section to our README.md, “required software”:
> required software
> required software/packages:
Python 3 version 3.8 or newer (for MFBO)
Julia version 1.7.3 or newer (for molecular simulations and assembling data)
Zeo++ (for computing structural features of the COFs)

I know the repo is likely a work in progress, but there are some lines that should be cleaned up in the readmes. For instance, inside of descriptors, there are some comments produced by the authors: “uh... where did the cof_descriptors.csv come from?”. It appears to me that `cof_descriptors` comes from `chemical_properties` and `geometric_properties` being stitched together.

Response: Thank you for catching this! Next time, we will have a higher-quality and polished README.md before we submit the paper!
(You are correct that the COF descriptors file is composed by joining the chemical and geometric descriptors.)

Changes to code:
First, we went through the README.md in detail and improved its organization dramatically, and cleaned it up.


We removed that extraneous comment you mentioned and clarified:
> joined structural and compositional features
> the COF descriptors are summarized in descriptors/cof_descriptors.csv, which joins descriptors/geometric_properties.csv and descriptors/chemical_properties.csv.
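For illustration, the join can be reproduced with pandas as below (we assembled the file in Julia; the name of the shared COF identifier column here is an assumption).

```python
import pandas as pd

# illustrative reconstruction of descriptors/cof_descriptors.csv;
# assumes both CSVs share a COF identifier column (called "cof" here)
geo = pd.read_csv("descriptors/geometric_properties.csv")
chem = pd.read_csv("descriptors/chemical_properties.csv")
geo.merge(chem, on="cof").to_csv("descriptors/cof_descriptors.csv", index=False)
```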

3. In instruction 3, I’m not sure what the “gcmc_mixtures branch in development” implies. It appears that the first part of the instruction is not too relevant because I see the script for getting the cof_features with cof_features.jl

Response: Ah, thank you for the clarifying question. This was an artifact of when we kept our simulation code in this repo. Now, this mixture simulation code is in PorousMaterials.jl v0.4.2, so this comment is no longer pertinent.

Changes to code: We removed this no-longer pertinent comment and indicated in the README.md:
> we employed PorousMaterials.jl v0.4.2 for the mixture GCMC and Henry coefficient calculations.

4. The shell scripts are well commented and reusable. I appreciate the readme files in each subfolder!

Response: Thank you for the kind feedback!

5. The notebook in step 6: “generate_initializing_cof_ids.ipynb” is missing from the repo.

Response: Thanks! Oops, we had moved the file then neglected to update the README.md.

Changes to code: We now point in the README.md to the correct location for this file:
> we generate the list of initializing COF IDs using the run_BO/generate_initializing_cof_ids.ipynb.

6. Although I mentioned the missing notebook in my note above, it appears that a pickle file containing the `cof_ids` is present in `search_results/normalized`, thus making this reproducible. I’m not seeing the point of having the normalized/ folder and thus would suggest deleting this folder and moving everything to the base level to match the readme. There are also some small typos in the readme that lead to the files not matching exactly! Would be helpful to have this adjusted.

Response: Thank you for the suggestion for improving the file structure of the project. We agree that the `normalized` folder is ultimately redundant.

Changes to code: We removed the normalized directory and simply placed the search result sub-directories directly in the search_results directory.

% ls search_results
> README.md
> initializing_cof_ids_normalized.pkl
> mfbo
> random_search_results.pkl
> sfbo

7. The notebooks in step 7 are very useful. However the current location is incorrect. They are in “run_BO” not in “search_results”! Adjusting this typo will help users find the notebooks.

Response: Ah, thank you for catching the typo.

Changes to code: We now indicate the correct location of the notebooks.
> single- and multi-fidelity Bayes Opt
> finally, the two notebooks:
run_BO/MultiFidelity_BO.ipynb
run_BO/SingleFidelity_BO.ipynb
contain the Python code for running Bayes Opt.

> the results from each run are stored in search_results to be read into our figs/viz.ipynb notebook next for analysis.

8. The figs/ directory is very nice. It is clear that this information is reproducible. I would ask the authors to check the readme quickly, as the “figure_1” directory is not available! It appears that the name of the folder just changed.

Response: Thanks; oops, we renamed the folder then did not update the README.md.

Changes to code: We updated figs/README.md to reflect the renaming as well as to describe more thoroughly the other files and folders within the directory.

9. Overall I think the repo is usable. I think the authors can consider adding the .ipynb_checkpoints to the `.gitignore` and removing these to declutter the repo. Additionally, there are some extra files at the base level (gcmc_calculations.csv and henry_calculations.csv) that could potentially be added to the description on the front readme. Similarly, there are some unexplained folders such as “benchmarking_sims”. Perhaps these are not necessary, but an explanation in the readme would help.

Response:
Thank you for the recommendations!
We agree that adding the `.ipynb_checkpoints` to the `.gitignore` to reduce the clutter is a good idea.

Thank you for catching the missing documentation for these files.

Thank you for pointing out the missing documentation. This directory was used in the initial stages of the study to determine how many Monte Carlo cycles are required in the Binary GCMC to recover reliable statistics for Xe/Kr selectivity calculations.

Changes to code:
We removed the `.ipynb_checkpoints` from the Github repository.

We moved gcmc_calculations.csv and henry_calculations.csv to a new subdirectory, named targets, which is accessible from the base directory. We updated the README.md:

> the simulation data is organized as .csv in targets/{gcmc_calculations.csv, henry_calculations.csv}
> targets: contains the high-fidelity GCMC simulation results and Henry coefficient calculation results for each material in the study as CSV files.

We added a description of the benchmarking_sims directory in the README.md.

> benchmarking_sims: contains code and analysis to determine the number of cycles required to reduce statistical error for GCMC simulations and Henry Coefficients below a given threshold.



These comments are fairly minor, as I was able to follow all of the details of the repo, but I hope they can help in improving the usability of the code. In particular, I appreciate the commenting of the BO notebooks, which may enable users to repurpose the notebooks for other BO tasks.

Response: We really appreciate your time giving us feedback on our repo organization. These comments spurred us to elevate the quality of the documentation for and organization of our code. For our next paper, we will be sure to polish our repo before submission (lesson learned)! Thank you, Aditya!

— Aditya Nandy




Round 2

Revised manuscript submitted on 11 Oct 2023
 

13-Oct-2023

Dear Dr Simon:

Manuscript ID: DD-ART-06-2023-000117.R1
TITLE: Multi-fidelity Bayesian Optimization of Covalent Organic Frameworks for Xenon/Krypton Separations

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.