From the journal Digital Discovery: Peer review history

An interpretable machine learning framework for modelling macromolecular interaction mechanisms with nuclear magnetic resonance

Round 1

Manuscript submitted on 22 Jan 2023
 

04-Apr-2023

Dear Dr Gu:

Manuscript ID: DD-ART-01-2023-000009
TITLE: An Interpretable Machine Learning Framework for Modelling Macromolecular Interaction Mechanisms with Nuclear Magnetic Resonance

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

This paper addresses the difficult problem of doing robust machine learning on a small dataset, and the even trickier task of providing meaningful yet informative interpretation.

The clarity of explanations seems to be somewhat lacking and indeed I find myself somewhat confused after reading through this manuscript.

I could find no mention of a training set. If there is a training set fully distinct from the 99 protons the model is tested on, then the authors need to say so very clearly. If not, then the only legitimate way to generate a model would be via some kind of cross-validation. Given that the ESI makes repeated mention of 5-fold cross-validation, I would assume that is how the real models were generated. This would imply making five separate models, each trained on 4 folds (79 or 80 protons) and tested on the other fold (19 or 20 protons). This would generate one prediction for each of the 99 protons, via one of the five different decision trees arising from the 5-fold CV.

However, what I see in Fig 2A is a single decision tree predicting all 99 protons. The authors need to explain very carefully precisely how it was possible to generate this tree, without the predicted outcome for any proton being dependent on that proton's target state (interactive or inert) in the experimental data. As it stands, I simply don't follow how that was achieved, despite plenty of lengthy descriptions of ML workflows. In summary, the authors probably know very well how they did the study, but it isn't at all clear to the reader.

The interpretations provided are problematic for two reasons. One is that the features are now Principal Components that are somewhat arbitrary mixtures of multiple different original features, including some PCs well down the rank order. The other is that, with a small dataset, there's little evidence that the splits in the tree are robust to small changes in the data. In fact, having 5 trees via CV would be helpful here, as their branch and leaf structures could be compared. Thus, I think this material would be better off in the ESI. This would also leave some space in the main text for a better and clearer overall summary of the workflow.

I'm also surprised at the paucity of polymer descriptors, in particular there's not much description of either the chemical monomers, or the size (molecular weight, number of monomers; even as ranges or averages) of the polymers. Other workers have made some very good sets of polymer features:

https://doi.org/10.1021/acs.jcim.2c00875

I also note that the 'baseline model' is not really any kind of baseline, more a zero. It is, as the authors acknowledge, entirely unpredictive, and its only illusions of predictivity come from the imbalance of classes. Thus, I think "null model" would be preferable to "baseline model".

Reviewer 2

My background is in machine learning and chemoinformatics, so I cannot judge the chemical side of this very much. I think the machine learning side is generally sound and shows some interesting aspects. My main concern is that, from a machine learning perspective, there is no verification or testing, and no independent data. In a standard machine learning setting, this would be a no-go. Now, I understand the approach of the paper is different: it wants to help in understanding the interaction. Here, my chemical knowledge is not good enough to judge how useful those heuristics are. To me, to be honest, it looks a bit like a case of being wise after the event. Wouldn't a chemist always find some sense in it if presented with these cases? Perhaps chemistry is different here, but humans are good at finding reasons. Furthermore, the small sample size reinforces this for me: Heuristic 1 filters out three samples as interactive; doesn't that indicate there is a good chance the distinction is either trivial or accidental? If this is a problem, then independent testing might help. Of course, if it is not an issue, all is fine.
Two minor things: On page 6, Table 1 has 12 polymers, while the text talks about 18. And on the same page, "students t test" is missing an apostrophe (Student was a pseudonym, I believe, but it is still treated as a name).

Reviewer 3

The authors did not comment on the possible effects that the small dataset they used could have on their model's performance on new data and on the machine-learned hypotheses generated in their work. How did they mitigate the small dataset's inherent bias in their work? The authors should address these issues.


 

Reviewer #1 (Comments to the Author):

This paper addresses the difficult problem of doing robust machine learning on a small dataset, and the even trickier task of providing meaningful yet informative interpretation. The clarity of explanations seems to be somewhat lacking and indeed I find myself somewhat confused after reading through this manuscript.

Reviewer Comment #1
I could find no mention of a training set. If there is a training set fully distinct from the 99 protons the model is tested on, then the authors need to say so very clearly. If not, then the only legitimate way to generate a model would be via some kind of cross-validation. Given that the ESI makes repeated mention of 5-fold cross-validation, I would assume that is how the real models were generated. This would imply making five separate models, each trained on 4 folds (79 or 80 protons) and tested on the other fold (19 or 20 protons). This would generate one prediction for each of the 99 protons, via one of the five different decision trees arising from the 5-fold CV.
However, what I see in Fig 2A is a single decision tree predicting all 99 protons. The authors need to explain very carefully precisely how it was possible to generate this tree, without the predicted outcome for any proton being dependent on that proton's target state (interactive or inert) in the experimental data. As it stands, I simply don't follow how that was achieved, despite plenty of lengthy descriptions of ML workflows. In summary, the authors probably know very well how they did the study, but it isn't at all clear to the reader.

Author Response
We thank the reviewer for their comments on improving the clarity of the manuscript, in particular the need to clarify the workflows underlying model construction and evaluations of test error. We agree, and have incorporated changes throughout the manuscript to clarify all modelling workflows applied.
A widespread challenge for materials science researchers is making informed selection decisions in the face of limited data. In this paradigm, it is common to adopt the outlook that each datapoint is ‘precious,’ (1) and it is with this outlook we set the objectives of the present work.
We set out to achieve two modelling goals. First, to develop and apply a reproducible modelling pipeline including experimental data collection, preparation, feature engineering, hyperparameter tuning, and modelling steps, using analytical data drawing on chemical, physical, and conformational information. Second, to assess the predictive performance metrics of the modelling pipeline using nested cross validation, and to establish a test score benchmark. We then applied the pipeline to create and interpret a final descriptive model to map and explain atomic-level inter-macromolecular structure-activity trends that had never previously been explored for popular mucoadhesive materials.
The steps taken to create the final descriptive model are as follows. We searched the decision tree hyperparameter space for the tree configuration with the best cross-validated AUC score, using scikit-learn's GridSearchCV with 5-fold stratified cross validation. The tree with the highest cross-validated AUC (0.635) in the hyperparameter space was deemed the best model (max depth 5, min samples per leaf 3, min samples per split unconstrained). This grid search cross validation was conducted using data from all 99 protons, where one fifth were "held out" as a validation set for each fold, and training was conducted on the remaining four fifths of the observations. We then fully trained and interpreted a decision tree as our final descriptive model, using the hyperparameters returned by grid-search cross validation (max depth 5, min. samples per leaf 3, and unconstrained minimum samples per split) to mitigate overfitting the data; this is the tree shown in Figure 2.
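For transparency of this workflow, the hyperparameter search and final fit can be sketched in code as below. This is an illustrative outline assuming scikit-learn's API; the feature matrix, labels, and parameter grid are placeholders standing in for the actual proton data and the grid reported in ESI Table 2.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Placeholder data standing in for the 99-proton feature matrix and labels
# (labels here are illustrative, with an inert majority class).
rng = np.random.default_rng(0)
X = rng.normal(size=(99, 10))
y = np.array([0] * 66 + [1] * 33)

# Illustrative hyperparameter grid (the real grid is given in ESI Table 2).
param_grid = {
    "max_depth": [3, 4, 5, 6],
    "min_samples_leaf": [1, 3, 5],
}

# 5-fold stratified grid search over all observations, scored by
# cross-validated AUC; each fold holds out one fifth as validation data.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

# The final descriptive model: a tree with the selected hyperparameters,
# fully trained on all observations.
final_tree = DecisionTreeClassifier(random_state=0, **search.best_params_).fit(X, y)
```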
In the nested cross validation workflow for test set evaluation, the inner loop applied the identical 5-Fold Grid Search Cross Validation steps explained above to construct a model with maximum cross validated AUC for the inner loop (training and validation sets, 98 protons). Then, the inner fold’s cross-validated model was tested on a holdout datapoint from the outer fold, which used leave-one-out cross validation (1 proton). As such, each proton was assigned a test score, which was used to compute test performance metrics for the modelling pipeline. Four alternative pipelines with different feature sets were compared against a null model (majority classifier) as a performance baseline (ESI Figure 1). Each pipeline was run at three different random seeds.
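This nested workflow can be outlined as follows; again a minimal sketch assuming scikit-learn's API, with placeholder data and an illustrative grid rather than the actual proton features, showing the inner 5-fold stratified grid search wrapped in an outer leave-one-out loop.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, LeaveOneOut
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(99, 10))
y = np.array([0] * 66 + [1] * 33)  # placeholder labels with an inert majority

param_grid = {"max_depth": [3, 4, 5], "min_samples_leaf": [1, 3]}

predictions = np.empty_like(y)
for train_idx, test_idx in LeaveOneOut().split(X):
    # Inner loop: 5-fold stratified grid search on the 98 training protons.
    inner = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid,
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    )
    inner.fit(X[train_idx], y[train_idx])
    # Outer loop: score the single held-out proton with the refit best model.
    predictions[test_idx] = inner.predict(X[test_idx])

# Each proton receives one holdout prediction, from which test metrics
# such as the holdout F1 score are computed.
holdout_f1 = f1_score(y, predictions)
```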
In addition to better framing, a new figure has been provided in the main manuscript showcasing the nested cross validation results (holdout F1 scores) for the pipeline used to generate the final model, and an alternative pipeline that did not include any DISCO Effect features.
We have made the following changes and additions to the manuscript to reflect the discussion above:
“We focused on curating and modelling a high quality experimentally derived dataset of macromolecular ligand-receptor interaction mechanisms. Specifically, we contrast how polymer ligands with a wide variety of chemical and physical properties interact with a target protein. This reflects the breadth of factors biomaterials researchers must consider in designing polymer delivery vehicles. Further, additional strategies are needed to navigate small, sparse, yet high quality datasets in materials science, as small datasets are expected to remain prevalent until automated experimentation is more widely adopted (24). Thus, we direct our focus in this work to creating a useful workflow and tool for researchers to descriptively navigate such problem spaces with limited information, one that is additionally capable of facilitating predictive modelling when scaled data collection processes, such as automation, become available.

Towards this aim, we investigated two objectives. First, we set out to develop a reproducible framework including data collection, preparation, feature engineering, hyperparameter tuning, and modelling steps from which a machine learning model can be trained to model inter-macromolecular structure activity. To provide actionable insights, we identified the best model of the full dataset using 5-fold stratified grid-search cross validation, and interpreted it descriptively to report structure-interaction trends we observed in the data collected for this work. These descriptive insights can be directly applied by researchers to inform design decisions across widely varying polymer chemical and physical species, in particular by shining a light on the normally unknown behaviors of non-bonded groups. The second investigation assessed the predictive performance metrics of the overall framework, using nested leave-one-out cross validation, to establish a benchmark in machine learning performance for this task.”

“[…] using descriptive analysis, and establish a new machine learning performance benchmark in the runway towards predictive design of biomaterials for targeted interactions.”

“To select hyperparameters for the descriptive model’s decision tree, we employed 5-fold stratified grid search cross validation, using the hyperparameter grid in ESI Table 2. The process returned a tree with a cross-validated AUC of 0.635, having a maximum depth of 5, minimum samples per leaf of 3, and no constraint on minimum samples per split. Choosing descriptive model hyperparameters based on a cross validated grid search served to mitigate overfitting.
Finally, we fully trained a decision tree having the architecture returned from grid search cross validation to create a descriptive tree to interpret for insights.”

“Decision Tree Descriptive Model Performance Assessment
The descriptive fully trained tree is depicted in Fig. 2A.”

“Predictive Assessment of Modelling Pipeline
Towards the second objective of establishing a predictive benchmark for this task, we report estimates of pipeline out of sample performance using nested grid-search cross validation. The inner loop comprised a 5-fold stratified grid search cross validation, and the outer loop leave-one-out cross validation, to provide a test set assessment of the modelling pipeline and compute holdout F1 score. We benchmarked the holdout F1 score of the modelling pipeline with a Cumulative DISCO Effect feature against the null model baseline, and a version of the modelling pipeline with the same feature set only excluding a feature from DISCO Effect (Figure 2). Two pipelines with alternative DISCO Effect feature representations were also benchmarked, which are described in further detail in the ESI (Supplementary Table 1, Figure 1). Each benchmark was conducted at three random seeds.

Holdout F1 for the Cumulative DISCO Effect feature set, i.e. the pipeline used to create the descriptive model, demonstrated a 20% improvement over the null model (Average Holdout F1=0.547, n=3), indicating that the modelling pipeline performed well in the classification task. In contrast, the assessment for the feature set using only the chemical shift, cohort fingerprint and molecular weight features failed to beat the null model baseline (Average Holdout F1=0.440, n=3). Thus, we learned that information at the intersection of proton chemical shift, polymer molecular weight, and physical conformation (DISCO Effect) was necessary to map an objective function of cross-polymer trends in interaction surpassing the null model. With these positive results, we next sought to interpret the descriptive model’s representation of the data for insights in polymer interaction design at the intersection of chemical, physical, and conformational behavior.”

“In this work we developed a knowledge framework for extracting and interpreting structure-interaction trends in macromolecular systems, applied it to extract descriptive insights. We additionally established a benchmark for the framework’s predictive capability[…]”

“[…] The predictive assessment of modelling pipelines demonstrated that incorporating a DISCO Effect feature alongside chemical shift and molecular weight was essential to beat a null model performance benchmark and convey trends. For proof of concept, we applied the framework to descriptively highlight differences in the mucoadhesive interaction mechanisms underlying a variety of popular biomedical polymer ligands with mucin protein. We interpreted the decision rules of a fully trained descriptive model created using 5-Fold Stratified Grid Search Cross Validation (F1=0.87), yielding several key insights in polymer design. Firstly, undervalued protons chemically suitable for interaction, yet in need of physical property tuning to unlock stable interaction, were identified by complex hierarchical patterns in proton cumulative DISCO effect […]”

In addition to these changes, a new figure showcasing the nested cross validation results has been added to the manuscript, as discussed above.
Reviewer Comment #2
The interpretations provided are problematic for two reasons. One is that the features are now Principal Components that are somewhat arbitrary mixtures of multiple different original features, including some PCs well down the rank order. The other is that, with a small dataset, there's little evidence that the splits in the tree are robust to small changes in the data. In fact, having 5 trees via CV would be helpful here, as their branch and leaf structures could be compared. Thus, I think this material would be better off in the ESI. This would also leave some space in the main text for a better and clearer overall summary of the workflow.

Author Response
We revised the manuscript accordingly. The Results & Discussion section was regrouped into three sub-sections that represent the core insights of the work from a bird’s eye view: sharing the identities of undervalued protons as targets for engineering towards interaction, discussion of explicit structure-activity trends, and suggestions for experimental investigation of undervalued protons. Detailed descriptions of the heuristics have been moved to Section 6 of the ESI, as suggested by the reviewer.
Regarding features derived from principal components, we applied an automated approach to component selection throughout the work (Minka’s MLE, scikit-learn implementation) to ensure all modelling pipelines were unbiased. A consequence of this decision was that principal components well down the rank order could be retained; because the process was automated, this retention was possible across all tested conditions and benchmarks. We have clarified this approach in the revised manuscript.
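In sketch form, this automated selection corresponds to scikit-learn's PCA with Minka's MLE criterion; the feature matrix below is a placeholder with low-rank structure plus noise, not the actual proton feature data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder feature matrix with low-rank structure plus noise,
# standing in for the engineered proton features (99 observations).
rng = np.random.default_rng(0)
base = rng.normal(size=(99, 3))
X = base @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(99, 10))

# Minka's MLE component selection, as implemented in scikit-learn.
# n_components="mle" requires the full SVD solver and n_samples >= n_features.
pca = PCA(n_components="mle", svd_solver="full")
X_pc = pca.fit_transform(X)

# The number of retained components is chosen by the data rather than by
# hand, so components well down the rank order may be retained.
n_retained = pca.n_components_
```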
Our amendments in response to comment #1 additionally serve to clarify our rationale for investigating one model in detail: we sought descriptive insights from this work. Applying cross validation and hyperparameter tuning together via grid search served to mitigate sensitivity to small changes in the data in this final descriptive model. Further predictive insights are the subject of ongoing study, but are beyond the scope of the present paper.
In addition to moving the detailed discussion of the heuristics to the ESI, the results and discussion section of the manuscript has been revised to reflect the ideas discussed above. The following passages have been added to the manuscript:
“We interpreted the model’s eight decision tree classification rules to study mucoadhesive interaction mechanisms across polymers in the dataset. In the context of true proton identities, we further examined the principal component factor loadings underlying the decision rules to ascertain the polymer attributes that correlated to each interaction classification, in the form of heuristics. Detailed description of each decision tree heuristic, including principal components, and factor loadings are provided in the ESI and in Supplementary Table 4.

Herein, from a bird’s eye view, we draw attention to several key insights from the model that teach us about the behavior of mucoadhesive materials.

Identifying inert proton candidates for tuning towards designed polymer interactions
In general, polymer interaction mechanisms having three or more strongly contributing protons (PAA, PDMAEMA, HPC) had sufficient interactive subset size to yield individual classification branches in the tree (Figure 4). Where polymer interactions were specific to one or two proton sites, or where the dataset contained multiple examples of the same polymer with altered physical properties and interaction outcomes (CMC, HPMC, PVA, HPC), the model was forced to draw more nuanced cross-polymer comparisons to achieve its optimization objective (Figure 5, 6). It is in these nuanced cross-polymer comparisons that we can elucidate the shared characteristics of interactive protons across polymer species, and the identities of the inert-labelled protons that closely border interactive decision regions. In other words, we can identify and enumerate “undervalued” inert protons that are worthy targets for engineering towards interaction.

An example of this phenomenon is demonstrated by the HPC proton decision boundary (Fig. 4D). In HPC, the model learned that tuning molecular weight, without additional chemical functionalization, enabled interaction. HPC 370kDa achieved stable mucoadhesive interactions at 4.07, 3.77, 3.46, and 1.13 ppm, and remained inert at 3.14ppm. No interactions resulted at any HPC 80kDa molecular weight protons. In addition to changes in molecular weight, we observed the average CDE of HPC protons below the decision boundary was lower than those above it, and ppm were shifted more downfield (avg. CDE PC11<=0.65 =-0.74, avg. CDE PC11>0.65 =-0.62), (avg. ppm PC11<=0.65 =3.77ppm, avg. ppm PC11>0.65 =2.64ppm). While in this example, CDE, ppm and molecular weight data exhibit clear directional trends across the decision rule, across different polymer species the nature of these relationships is increasingly complex. However, despite this complexity, by simple visual examination of the decision rule plots for inert-labelled protons from materials that border the interaction boundary, we can identify undervalued, inert labelled protons. In this instance, these are the three inert protons from HPC 80kDa that appeared in this decision region (4.07, 3.77, 1.13).

The ability to create such an objective function from datapoints that vary across diverse polymer species in a small dataset is granted by the CDE descriptor (Fig. 2), which provides orthogonal continuous numeric data contextualizing the coarser changes in chemical shift, molecular weight, and cohort fingerprint. The hierarchy of descriptors, combining atomic-level data with polymer-level property data, accounts for variance sources at multiple length scales.

The use of the model’s decision rules as an engine for identifying “undervalued” inert-labeled protons is best demonstrated in Fig. 5B. Chemically identical proton sites from CMC (4.58ppm) and HPMC (4.48ppm), at two molecular weights respectively, have opposite interaction outcomes in this region. At 131kDa molecular weight the 4.58ppm site in CMC interacts; however, this interaction is lost at 90kDa. In HPMC the direction of the trend is opposite: interaction occurred at 86kDa molecular weight, yet was lost at 120kDa. In spite of the conflicting directionality of the trend, the model correctly identified the true interactive protons across these species, and scored their chemically identical inert counterparts on the exterior of the decision boundaries in Fig. 5B. Here, we posit that other inert-labeled protons scoring within or near the decision boundaries of Rules 6 & 7 are similarly “undervalued,” and correspond to candidates for within-species physical property tuning to unlock dominant interactions. These protons are: HPC 80kDa (4.58ppm), PVA (1.58ppm), DEX150 (5.20ppm), PVP 55kDa (3.89ppm), PEOZ 50kDa (3.42ppm), CMC 90kDa (4.58ppm), and HPMC 120kDa (4.48ppm), annotated in Fig. 5B. As described previously, the latter two inert protons are experimentally verified to unlock interaction through within-species tuning of molecular weight (23).

Figure 6 shows the remaining unclassified protons in the dataset, which are bimodally distributed in two clusters along PC15 (Rule 8). Overall, the protons exhibited inert interactions, with the exception of one proton, a secondary interaction from CMC 131kDa at 3.76ppm present in the smaller cluster. The neighboring undervalued protons may therefore correlate to secondary interactions in their respective species. These undervalued protons are: CMC 131kDa (4.09ppm), CMC 90kDa (4.58ppm, 4.09ppm), DEX 150kDa (3.72ppm, 4.02ppm), HPMC 120kDa (3.71ppm, 4.05ppm), HPMC 86kDa (3.71ppm, 4.05ppm), PHPMA 40kDa (0.94ppm, 1.82ppm), PVP 55kDa (1.54ppm, 3.89ppm), PVP 1300kDa (1.54ppm), P407 13kDa (3.76ppm), PEOZ 50kDa (3.42ppm).



Identifying cross-polymer structure-activity trends
The data suggests a structure-activity relationship may exist at select proton sites across materials, in the molecular weight range of 80-150kDa. The relevant proton sites were identified by detailed examination of decision rules 7 and 8 in the ESI, alongside review of Figures 5 and 6.

DEX, CMC, HPC, and HPMC in molecular weight range 80-150kDa shared a cohort chemical shift interval of (4.0, 4.1] where downfield dominant interactions were either correctly identified, or were “undervalued” by the model in Fig. 5B. Specifically, we observed the (4.0, 4.1] cohort shift was present with: DEX 150kDa (5.20ppm, undervalued), CMC 131kDa (4.58ppm, interactive), HPMC 86kDa (4.48ppm, interactive), HPC 80kDa (4.58ppm, undervalued).

This trend is expanded to secondary interactions, with the observation that the (4.0, 4.1] and (3.7, 3.8] chemical shift intervals repeatedly appear together in the secondary interaction cluster apparent in Fig. 6B and the analysis of decision rule 8. These observations were: CMC 131kDa (4.09ppm, 3.76ppm), CMC 90kDa (4.09ppm, 3.76ppm), DEX 150kDa (4.02ppm, 3.72ppm), HPMC 86kDa (4.05ppm, 3.71ppm), and HPMC 120kDa (4.05ppm, 3.71ppm). P407 at 3.76ppm additionally clustered, without a (4.0, 4.1] shift.

Hypothesis generation and interpretation from undervalued proton candidates
There are many approaches to investigate the hypotheses generated in this work, namely that physical property adjustments without additional functionalization may enable inert-to-interactive polymer transitions. Approaches that constrain polymer mobility merit further investigation as a means of inducing changes to polymer orientation, and subsequently to interactions such as mucoadhesion. For example, given that neither molecular weight of PVP (55kDa, 1300kDa) incurred any mucoadhesive interactions, we expect physical property tuning approaches other than molecular weight may be beneficial for adjusting the interaction conformation of PVP protons towards mucoadhesion, particularly at the 3.89ppm and 1.54ppm sites.

In general, the dynamic, multivariate, and counterintuitive nature of the cross-species interaction mechanisms modelled in this work emphasizes that researchers will achieve the best designed polymer interaction outcomes by applying data-driven frameworks such as this one, which outsource the interrogation of problem spaces to a computational model informed by chemical, physical, and conformational data, while clearly informing human researchers of the most efficient path to proceed.”

Several clarifications were additionally made to the manuscript’s references to model decision rules and to the heuristic analyses in the ESI.
Reviewer Comment #3
I'm also surprised at the paucity of polymer descriptors, in particular there's not much description of either the chemical monomers, or the size (molecular weight, number of monomers; even as ranges or averages) of the polymers. Other workers have made some very good sets of polymer features:

https://doi.org/10.1021/acs.jcim.2c00875

Author Response
We used analytical DISCO NMR results as the primary source of modelling data to avoid pooling signals in our objective function with variance from an external feature representation framework.(2) DISCO NMR results provide direct observations of chemical, physical, and conformational information at the atomic level during interaction, and thus we saw merit in independently modelling this information.
We have introduced new material in the manuscript clarifying our rationale behind designing features using DISCO NMR results, without introducing an external representation framework.
Addition to the manuscript:
“We elected to derive new modelling features from raw analytical DISCO NMR results to avoid pooling the variance from DISCO NMR with external variance introduced by a feature representation framework. From DISCO NMR results, we obtain high precision, atomic-level descriptors of polymer chemical monomers in the form of proton δ 1H chemical shift, and polymer conformation information, as measured by saturation transfer buildup curves (23). Polymer molecular weight, as an indicator of polymer size, was used as reported by the manufacturer. Pooling this variance with an additional feature framework or third-party dataset introduces the risk of diluting the precise signals we observed from these analytical measurements during modelling. DISCO NMR results provide chemical, physical, and conformational information at the atomic level, and thus merit modelling by a standalone objective function without pooled variance (37).”

We additionally reviewed the suggested reference by Antoniuk et al. and have cited it in the introduction.

Reviewer Comment #4
I also note that the 'baseline model' is not really any kind of baseline, more of a zero. It is, as the authors acknowledge, entirely unpredictive, and its only illusions of predictivity come from the imbalance of classes. Thus, I think "null model" would be preferable to "baseline model."

Author Response
We revised accordingly, and all references to this model in the manuscript and ESI have been changed to ‘null model.’
Changes to the manuscript:
“Thus, for a performance baseline we use a null model, a majority “dummy classifier,” where all samples are reported as the majority class (all protons classified inert).”

“The model’s 0.87 F1 score represents an 89% improvement over the null model F1=0.46 (Table 4).”

“…demonstrated a 20% improvement over the null model…”

“…failed to beat the null model baseline…”

“The predictive assessment of modelling pipelines demonstrated that incorporating a DISCO Effect feature alongside chemical shift and molecular weight was essential to beat a null model performance benchmark and convey meaningful trends.”

Table 4 was additionally renamed to “Null Model Metrics.” Equivalent changes are additionally provided in the ESI.
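For readers wishing to reproduce this baseline, the behaviour of such a majority-class null model can be sketched in a few lines, assuming scikit-learn; the 80/19 label split and macro averaging below are illustrative assumptions, not the study's actual data:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

# Illustrative imbalanced labels: most protons inert (0), a minority interactive (1).
# The 80/19 split is an assumption for demonstration only.
y = np.array([0] * 80 + [1] * 19)
X = np.zeros((len(y), 1))  # features are ignored by the dummy classifier

# Majority "dummy classifier": every proton is reported as the majority class.
null_model = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = null_model.predict(X)

# Any non-zero F1 here comes purely from class imbalance, not predictivity.
print(round(f1_score(y, y_pred, average="macro"), 2))  # → 0.45
```

Under this illustrative split the macro F1 lands near the null model score reported above, underscoring that the apparent score reflects class imbalance alone.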

Reviewer #2 (Comments to the Author):

My background is in machine learning and chemoinformatics, so I cannot judge the chemical side of this that much. I think the machine learning side is generally sound and shows some interesting aspects.
Reviewer Comment #1
My main concern is that, from a machine learning perspective, there is no verification or testing, and no independent data. In a standard machine learning study, this would be a no-go. Now, I understand the approach of the paper is different: it wants to help understand the interaction. Here, my chemical knowledge is not good enough to judge how useful those heuristics are. To me, to be honest, it looks a bit like a case of being wise after the event. Wouldn't a chemist always find some sense in it if presented with these cases? Perhaps chemistry is different here, but humans are good at finding reasons. Furthermore, the small sample size reinforces this for me: Heuristic 1 filters out three samples as interactive - doesn't that indicate a good chance of the distinction being either trivial or accidental? If this is a problem, then independent testing might help. Of course, if it is not an issue, all fine.
Author Response
We agree that increasing the clarity of the workflows, testing, and insights reported in this work would benefit the manuscript. In this revision, we have made many adjustments to this effect. As the reviewer acknowledged, our approach was designed with a focus on improving our understanding of cross-polymer inter-macromolecular interactions, which had not been previously explored for popular mucoadhesive materials using DISCO NMR.
We clarified our primary objectives in the modelling workflows. These objectives were, first, to develop and apply a reproducible modelling pipeline, including experimental data collection, preparation, feature engineering, hyperparameter tuning, and modelling steps, using analytical data drawing on chemical, physical, and conformational information; and second, to assess the predictive performance metrics of the modelling pipeline using nested cross validation and establish a test score benchmark. We then applied the pipeline to create and interpret a descriptive model, designed using grid search cross validation, to map atomic-level inter-macromolecular structure-activity trends.
Validation and testing were conducted in accordance with the two objectives outlined above. The final descriptive model configuration was identified by searching the decision tree hyperparameter space for the tree configuration with the best 5-fold cross validated AUC score. We then fully trained and interpreted the final decision tree with these cross validated hyperparameters to mitigate overfitting the data. To provide a test error assessment for the modelling pipeline itself we utilized nested cross validation. The inner loop comprised 5-Fold Grid Search Cross Validation to construct a model with maximum cross validated AUC (training and validation sets, 98 protons). Then, the inner fold’s model was tested on a holdout datapoint from the outer fold, which used leave-one-out cross validation (1 proton). Each proton was assigned a test score used to compute holdout F1 scores for the modelling pipeline. The assessment was run at three different random seeds, and compared against a null model (majority dummy classifier) and a pipeline without modelling features from DISCO Effect. A new figure 2 has been added to the manuscript summarizing the modelling pipeline’s holdout F1 score assessment against these baselines (which is presented at the end of the response to Reviewer 1 Comment 1).
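The nested scheme described above (an inner 5-fold stratified grid search maximizing cross-validated AUC, an outer leave-one-out loop assigning each proton a holdout prediction) can be sketched as follows, assuming scikit-learn; the feature matrix, labels, and hyperparameter grid are placeholders rather than the study's data:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(99, 6))            # placeholder: 99 protons x 6 features
y = (rng.random(99) < 0.2).astype(int)  # placeholder imbalanced labels

param_grid = {"max_depth": [3, 5, 7], "min_samples_leaf": [1, 3, 5]}

# Outer loop: leave-one-out holdout of a single proton per iteration.
preds = np.empty_like(y)
for train_idx, test_idx in LeaveOneOut().split(X):
    # Inner loop: 5-fold stratified grid search maximizing cross-validated AUC.
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid,
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    )
    search.fit(X[train_idx], y[train_idx])
    preds[test_idx] = search.predict(X[test_idx])

# Holdout F1 pooled over every leave-one-out prediction.
print(round(f1_score(y, preds, average="macro"), 3))
```

Repeating this at several random seeds, as in the manuscript, yields the averaged holdout F1 used for benchmarking against the null model.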
Finally, we have revised the Results & Discussion section to focus on core insights from the descriptive model, and moved detailed enumeration of heuristics to the ESI. We can extract three core insights from the descriptive model by examining the protons present at decision rule boundaries: identifying “undervalued” protons as targets for engineering towards interaction, identifying cross-polymer structure activity trends, and finally strategies for experimental investigation of undervalued protons.
The additions to the manuscript incorporating these changes are as follows:
“We focused on curating and modelling a high quality experimentally derived dataset of macromolecular ligand-receptor interaction mechanisms. Specifically, contrasting how polymer ligand examples of a wide variety of chemical and physical properties interact with a target protein. This represents a common reflection of the breadth of factors biomaterials researchers must consider in designing polymer delivery vehicles. Further, additional strategies are needed to navigate small, sparse, yet high quality datasets in materials science, as small datasets are expected to remain prevalent until automated experimentation is more widely adopted (24). Thus, we direct our focus in this work to creating a useful workflow and tool for researchers to descriptively navigate such problem spaces with limited information, that is additionally capable of facilitating predictive modelling when scaled data collection processes, such as automation, become available.

Towards this aim, we investigated two objectives. First, we set out to develop a reproducible framework including data collection, preparation, feature engineering, hyperparameter tuning, and modelling steps from which a machine learning model can be trained to model inter-macromolecular structure activity. To provide actionable insights, we identified the best model of the full dataset using 5-fold stratified grid-search cross validation, and interpreted it descriptively to report structure-interaction trends we observed in the data collected for this work. These descriptive insights can be directly applied by researchers to inform design decisions across widely varying polymer chemical and physical species, in particular by shining a light on the normally unknown behaviors of non-bonded groups. The second investigation assessed the predictive performance metrics of the overall framework, using nested leave-one-out cross validation, to establish a benchmark in machine learning performance for this task.”

“[…] using descriptive analysis, and establish a new machine learning performance benchmark in the runway towards predictive design of biomaterials for targeted interactions.”

“To select hyperparameters for the descriptive model’s decision tree, we employed 5-fold stratified grid search cross validation, using the hyperparameter grid in ESI Table 2. The process returned a tree with a cross-validated AUC of 0.635, having a maximum depth of 5, minimum samples per leaf of 3, and no constraint on minimum samples per split. Choosing descriptive model hyperparameters based on a cross validated grid search served to mitigate overfitting.
Finally, we fully trained a decision tree having the architecture returned from grid search cross validation to create a descriptive tree to interpret for insights.”
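As a sketch of this hyperparameter selection step, assuming scikit-learn and a hypothetical stand-in for the ESI Table 2 grid (the real grid and dataset are not reproduced here):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(99, 6))            # placeholder feature matrix
y = (rng.random(99) < 0.2).astype(int)  # placeholder labels

# Hypothetical stand-in for the ESI Table 2 hyperparameter grid;
# min_samples_split=2 corresponds to "no constraint" on splits.
grid = {
    "max_depth": [3, 5, 7],
    "min_samples_leaf": [1, 3, 5],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    refit=True,  # refit the winning configuration on the full dataset
)
search.fit(X, y)

# The refit estimator plays the role of the fully trained descriptive tree;
# its decision rules can be printed and interpreted directly.
print(search.best_params_, round(search.best_score_, 3))
print(export_text(search.best_estimator_, feature_names=[f"PC{i}" for i in range(6)]))
```

The `refit=True` step mirrors the workflow above: the cross-validated grid search selects the architecture, and the final descriptive tree is then trained on the full dataset with that architecture.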

“Decision Tree Descriptive Model Performance Assessment
The descriptive, fully trained tree is depicted in Fig. 2A.”

“Predictive Assessment of Modelling Pipeline
Towards the second objective of establishing a predictive benchmark for this task, we report estimates of pipeline out of sample performance using nested grid-search cross validation. The inner loop comprised a 5-fold stratified grid search cross validation, and the outer loop leave-one-out cross validation, to provide a test set assessment of the modelling pipeline and compute holdout F1 score. We benchmarked the holdout F1 score of the modelling pipeline with a Cumulative DISCO Effect feature against the null model baseline, and a version of the modelling pipeline with the same feature set only excluding a feature from DISCO Effect (Figure 2). Two pipelines with alternative DISCO Effect feature representations were also benchmarked, which are described in further detail in the ESI (Supplementary Table 1, Figure 1). Each benchmark was conducted at three random seeds.

Holdout F1 for the Cumulative DISCO Effect feature set, i.e. the pipeline used to create the descriptive model, demonstrated a 20% improvement over the null model (Average Holdout F1=0.547, n=3), indicating that the modelling pipeline performed well in the classification task. In contrast, the assessment for the feature set using only the chemical shift, cohort fingerprint and molecular weight features failed to beat the null model baseline (Average Holdout F1=0.440, n=3). Thus, we learned that information at the intersection of proton chemical shift, polymer molecular weight, and physical conformation (DISCO Effect) was necessary to map an objective function of cross-polymer trends in interaction surpassing the null model. With these positive results, we next sought to interpret the descriptive model’s representation of the data for insights in polymer interaction design at the intersection of chemical, physical, and conformational behavior.”

“In this work we developed a knowledge framework for extracting and interpreting structure-interaction trends in macromolecular systems, and applied it to extract descriptive insights. We additionally established a benchmark for the framework’s predictive capability[…]”

“[…] The predictive assessment of modelling pipelines demonstrated that incorporating a DISCO Effect feature alongside chemical shift and molecular weight was essential to beat a null model performance benchmark and convey trends. For proof of concept, we applied the framework to descriptively highlight differences in the mucoadhesive interaction mechanisms underlying a variety of popular biomedical polymer ligands with mucin protein. We interpreted the decision rules of a fully trained descriptive model created using 5-Fold Stratified Grid Search Cross Validation (F1=0.87), yielding several key insights in polymer design. Firstly, undervalued protons chemically suitable for interaction, yet in need of physical property tuning to unlock stable interaction, were identified by complex hierarchical patterns in proton cumulative DISCO effect […]”


“We interpreted the model’s eight decision tree classification rules to study mucoadhesive interaction mechanisms across polymers in the dataset. In the context of true proton identities, we further examined the principal component factor loadings underlying the decision rules to ascertain the polymer attributes that correlated to each interaction classification, in the form of heuristics. Detailed descriptions of each decision tree heuristic, including principal components and factor loadings, are provided in the ESI and in Supplementary Table 4.

Herein, from a bird’s eye view, we draw attention to several key insights from the model that teach us about the behavior of mucoadhesive materials.

Identifying inert proton candidates for tuning towards designed polymer interactions
In general, polymer interaction mechanisms having three or more strongly contributing protons (PAA, PDMAEMA, HPC) had sufficient interactive subset size to yield individual classification branches in the tree (Figure 4). Where polymer interactions were specific to one or two proton sites, or where the dataset contained multiple examples of the same polymer with altered physical properties and interaction outcomes (CMC, HPMC, PVA, HPC), the model was forced to draw more nuanced cross-polymer comparisons to achieve its optimization objective (Figures 5, 6). It is in these nuanced cross-polymer comparisons that we can elucidate the shared characteristics of interactive protons across polymer species, and the identities of the inert-labelled protons that closely border interactive decision regions. In other words, we can identify and enumerate “undervalued” inert protons that are worthy targets for engineering towards interaction.

An example of this phenomenon is demonstrated by the HPC proton decision boundary (Fig. 4D). In HPC, the model learned that tuning molecular weight, without additional chemical functionalization, enabled interaction. HPC 370kDa achieved stable mucoadhesive interactions at 4.07, 3.77, 3.46, and 1.13ppm, and remained inert at 3.14ppm. No interactions occurred at any HPC 80kDa protons. In addition to changes in molecular weight, we observed that the average CDE of HPC protons below the decision boundary was lower than that of protons above it, and their chemical shifts were further downfield (avg. CDE: −0.74 for PC11 ≤ 0.65 vs. −0.62 for PC11 > 0.65; avg. shift: 3.77ppm for PC11 ≤ 0.65 vs. 2.64ppm for PC11 > 0.65). While in this example the CDE, chemical shift, and molecular weight data exhibit clear directional trends across the decision rule, across different polymer species the nature of these relationships is increasingly complex. Despite this complexity, by simple visual examination of the decision rule plots for inert-labelled protons from materials that border the interaction boundary, we can identify undervalued, inert-labelled protons. In this instance, these are the three inert protons from HPC 80kDa that appeared in this decision region (4.07, 3.77, and 1.13ppm).

The ability to create such an objective function from datapoints that vary across diverse polymer species in a small dataset is granted by the CDE descriptor (Fig. 2), which provides orthogonal continuous numeric data contextualizing the coarser changes in chemical shift, molecular weight, and cohort fingerprint. The hierarchy of descriptors, combining atomic-level data with polymer-level property data, accounts for variance sources at multiple length scales.

The power of the model’s decision rules as an engine for identifying “undervalued” inert-labelled protons is best demonstrated in Fig. 5B. Chemically identical proton sites from CMC (4.58ppm) and HPMC (4.48ppm), at two molecular weights respectively, have opposite interaction outcomes in this region. At 131kDa molecular weight the 4.58ppm site in CMC interacts; however, this interaction is lost at 90kDa. In HPMC the direction of the trend is opposite: interaction occurred at 86kDa molecular weight, yet was lost at 120kDa. In spite of the conflicting directionality of the trend, the model correctly identified the true interactive protons across these species, and scored their chemically identical inert counterparts on the exterior of the decision boundaries in Fig. 5B. Here, we posit that other inert-labelled protons scoring within or near the decision boundaries of Rules 6 & 7 are similarly “undervalued,” and correspond to candidates for within-species physical property tuning to unlock dominant interactions. These protons are: HPC 80kDa (4.58ppm), PVA (1.58ppm), DEX150 (5.20ppm), PVP 55kDa (3.89ppm), PEOZ 50kDa (3.42ppm), CMC 90kDa (4.58ppm), and HPMC 120kDa (4.48ppm), annotated in Fig. 5B. As described previously, the latter two inert protons are experimentally verified to unlock interaction through within-species tuning of molecular weight (23).

Figure 6 shows the remaining unclassified protons in the dataset, which are bimodally distributed in two clusters along PC15 (Rule 8). Overall, the protons exhibited inert interactions, with the exception of one proton, a secondary interaction from CMC 131kDa at 3.76ppm present in the smaller cluster. The neighboring undervalued protons may therefore correlate to secondary interactions in their respective species. These undervalued protons are: CMC 131kDa (4.09ppm), CMC 90kDa (4.58ppm, 4.09ppm), DEX 150kDa (3.72ppm, 4.02ppm), HPMC 120kDa (3.71ppm, 4.05ppm), HPMC 86kDa (3.71ppm, 4.05ppm), PHPMA 40kDa (0.94ppm, 1.82ppm), PVP 55kDa (1.54ppm, 3.89ppm), PVP 1300kDa (1.54ppm), P407 13kDa (3.76ppm), PEOZ 50kDa (3.42ppm).

Identifying cross-polymer structure-activity trends
The data suggests a structure-activity relationship may exist at select proton sites across materials, in the molecular weight range of 80-150kDa. The relevant proton sites were identified by detailed examination of decision rules 7 and 8 in the ESI, alongside review of Figures 5 and 6.

DEX, CMC, HPC, and HPMC in molecular weight range 80-150kDa shared a cohort chemical shift interval of (4.0, 4.1] where downfield dominant interactions were either correctly identified, or were “undervalued” by the model in Fig. 5B. Specifically, we observed the (4.0, 4.1] cohort shift was present with: DEX 150kDa (5.20ppm, undervalued), CMC 131kDa (4.58ppm, interactive), HPMC 86kDa (4.48ppm, interactive), HPC 80kDa (4.58ppm, undervalued).

This trend is expanded to secondary interactions, with the observation that the (4.0, 4.1] and (3.7, 3.8] chemical shift intervals repeatedly appear together in the secondary interaction cluster apparent in Fig. 6B and the analysis of decision rule 8. These observations were: CMC 131kDa (4.09ppm, 3.76ppm), CMC 90kDa (4.09ppm, 3.76ppm), DEX 150kDa (4.02ppm, 3.72ppm), HPMC 86kDa (4.05ppm, 3.71ppm), and HPMC 120kDa (4.05ppm, 3.71ppm). P407 at 3.76ppm additionally clustered, without a (4.0, 4.1] shift.

Hypothesis generation and interpretation from undervalued proton candidates
There are many approaches to investigating the hypotheses generated in this work, namely that physical property adjustments, without additional functionalization, may enable inert-to-interactive polymer transitions. Approaches that constrain polymer mobility merit further investigation as a means of inducing changes to polymer orientation, and subsequently to interactions such as mucoadhesion. For example, given that neither molecular weight of PVP (55kDa, 1300kDa) incurred any mucoadhesive interactions, we expect physical property tuning approaches other than molecular weight may be beneficial for adjusting the interaction conformation of PVP protons towards mucoadhesion, particularly at the 3.89ppm and 1.54ppm sites.

In general, the dynamic, multivariate, and counterintuitive nature of the cross-species interaction mechanisms modelled in this work emphasizes that researchers will achieve the best designed polymer interaction outcomes by applying data-driven frameworks such as this one, which outsource the interrogation of problem spaces to a computational model informed by chemical, physical, and conformational data, while clearly informing human researchers of the most efficient path forward.”
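The decision rules above are expressed in principal-component space (e.g. PC11, PC15). As a hedged illustration of how such rules can be traced back to the factor loadings of the original features, assuming a scikit-learn PCA-plus-decision-tree pipeline and hypothetical feature names (the study's actual descriptors and data are not reproduced here):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(99, 6))            # placeholder engineered features
y = (rng.random(99) < 0.2).astype(int)  # placeholder interaction labels
# Hypothetical feature names standing in for the study's descriptors.
names = ["ppm", "MW", "CDE", "cohort_a", "cohort_b", "cohort_c"]

pipe = Pipeline([
    ("pca", PCA(n_components=6)),
    ("tree", DecisionTreeClassifier(max_depth=5, min_samples_leaf=3, random_state=0)),
]).fit(X, y)

tree = pipe.named_steps["tree"].tree_
loadings = pipe.named_steps["pca"].components_  # rows: PCs, columns: features

# For each split (internal node), report the PC used and its dominant loading.
for node in range(tree.node_count):
    pc = tree.feature[node]
    if pc >= 0:  # leaves are marked with feature == -2
        top = int(np.argmax(np.abs(loadings[pc])))
        print(f"node {node}: PC{pc} <= {tree.threshold[node]:.2f} "
              f"(dominant loading: {names[top]})")
```

Inspecting the dominant loadings of each split's principal component is one way to connect a decision boundary such as "PC11 ≤ 0.65" back to interpretable quantities like chemical shift, molecular weight, or CDE.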

Several clarifications were additionally made in the manuscript’s references to the model decision rules and to the heuristic analyses in the ESI.



Reviewer Comment #2
Two minor things: On page 6, table 1 has 12 polymers, the text talks about 18.
Author Response
We have clarified that the reference to 18 polymers in the text includes combinations of both varying chemistry and molecular weight.
“[…] we experimentally characterized 18 chemically and structurally distinct biomedical polymers (i.e. varying chemistry and molecular weight) for their interactions with bovine submaxillary mucin in solution with DISCO NMR…”

Reviewer Comment #3
And on the same page, "students t test" misses an apostrophe (I believe student was a pseudonym, but it is still treated as a name, I believe).
Author Response
We have corrected this to include an apostrophe.

Reviewer #3 (Comments to the Author):
Reviewer Comment #1
The authors did not comment on the possible effects that the small dataset they used could have on their model’s performance on new data and on the machine-learned hypotheses generated in their work. How did they mitigate the small dataset’s inherent bias in their work? The authors should address these issues.
Author Response
We added additional discussion in the paper about the size of our dataset. Performance on untrained data was assessed and reported using nested cross validation, and bias in the final descriptive model was mitigated using grid search cross validation. We also emphasized our two modelling objectives for the work, which in turn clarify our approach to handling a small dataset. The first objective was to develop and apply a reproducible modelling pipeline, including experimental data collection, preparation, feature engineering, hyperparameter tuning, and modelling steps, using analytical data drawing on chemical, physical, and conformational information. The second was to assess the predictive performance metrics of the modelling pipeline using nested cross validation and establish a test score benchmark. We applied the pipeline to create and interpret a final descriptive model to map and explain atomic-level inter-macromolecular structure-activity trends.
To address the concern of bias from a small dataset in the first objective, we have expanded our discussion of grid search cross-validation, which we applied to construct the descriptive model. Grid search cross validation served to identify the decision tree hyperparameter set with the greatest cross validated AUC (0.635), and mitigated overfitting.
Additions to the manuscript:
(These passages, beginning “We focused on curating and modelling a high quality experimentally derived dataset…”, are quoted in full in our response to Reviewer 2, Comment 1 above.)

(This passage, beginning “To select hyperparameters for the descriptive model’s decision tree…”, is quoted in full in our response to Reviewer 2, Comment 1 above.)

Further, small materials datasets containing tightly controlled, high resolution variance can be better suited to creating an objective function for mapping quantitative relationships than larger quantities of materials data with pooled variance from multiple methods of data collection (2). Thus, to explore such quantitative relationships in this work we construct and interpret a final descriptive model from our small, curated dataset. Material on this subject has also been incorporated in the manuscript.
Additions to the manuscript:
(These passages, on deriving modelling features from raw DISCO NMR results and on the CDE descriptor, are quoted in full in our responses to Reviewer 1 and to Reviewer 2, Comment 1 above.)
Finally, in the second objective we assessed the test error of the overall modeling pipeline using nested cross-validation. This allowed us to evaluate the modelling pipeline’s generalization performance and establish a benchmark for this task using limited data. The inner fold applied stratified 5-fold grid search cross validation to construct a model, and the outer fold applied leave-one-out cross validation to test the model, which is commonly applied to assess small datasets.
Additions to the manuscript:
(The “Predictive Assessment of Modelling Pipeline” section and the revised conclusion passage are quoted in full in our response to Reviewer 2, Comment 1 above.)

In addition, a new Figure 2 has been introduced in this section of the manuscript illustrating the nested cross validation results, a copy of which is presented at the end of the response to comment #1 of reviewer 1.




References
1. Xu P, Ji X, Li M, Lu W. Small data machine learning in materials science. NPJ Comput Mater [Internet]. 2023 Mar 25;9(1):42. Available from: https://www.nature.com/articles/s41524-023-01000-z
2. Meyer TA, Ramirez C, Tamasi MJ, Gormley AJ. A User’s Guide to Machine Learning for Polymeric Biomaterials. ACS Polymers Au [Internet]. 2023 Apr 12;3(2):141–57. Available from: https://pubs.acs.org/doi/10.1021/acspolymersau.2c00037




Round 2

Revised manuscript submitted on 11 May 2023
 

07-Jun-2023

Dear Dr Gu:

Manuscript ID: DD-ART-01-2023-000009.R1
TITLE: An Interpretable Machine Learning Framework for Modelling Macromolecular Interaction Mechanisms with Nuclear Magnetic Resonance

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.


I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

The authors have made very significant improvements to their manuscript through the extensive clarifications to the explanations of their workflow. It is now fully intelligible and much more satisfactory than the previous version. I am now happy that the revisions made adequately address the comments made in the review process.

Reviewer 3

The authors have improved the manuscript significantly, particularly in the area of the focus and objective of the work. Hence I believe the work will be of great interest to the journal's audience. Also, all the concerns raised have been addressed satisfactorily.

Reviewer 4

In this paper, authors analyze decision tree classifiers in an attempt to understand/predict the interaction of polymer protons with mucin. The approach is predicated on the acquisition of DISCO NMR data, which is featurized and then transformed using principal component analysis. The investigators here highlight an interesting physical problem underlying polymer-biomacromolecule interactions. The selling point of the work is that their workflow provides a framework that can be used to generally understand “any receptor-ligand” system. Ultimately, the study centers on the analysis of a subset of 18 polymer systems (a very small dataset) and their interaction behavior. Authors distill analysis over the decision trees into “rules” that might be leveraged for future targets/modification. I find the premise of the paper to be quite interesting, although I am not convinced of its impact in the present form. Although the mechanics of model training is mostly well described, the most important and innovative aspect (analysis of subsequently generated models) is not articulated clearly enough to be readily adopted by the community. The dataset is relatively specialized and centered on interpreting what seems to be a newer experimental technique, so having a clear pathway for adoption is paramount. At this stage, I would therefore recommend major revisions. Below I provide some more detailed comments.


1. Authors should be aware that much of their audience for Digital Discovery may not be well versed in their experimental method or their systems. One important thing to understand about approaches such as this is the cost of measurement. Authors are examining an extraordinarily small dataset. I think it would behoove them to (1) briefly explain the physical premise of DISCO NMR and also highlight what kinds of systems are compatible, (2) provide readers with some sense of the labor-intensive or resource-intensive aspects of data collection and (3) further underscore the necessity to interrogate interactions using a data-driven approach. There are at least two components to the latter: are we (as humans) bad at making the interpretations or are we too biased? Authors seem to lean more on the latter, but I think the former may also play a role.

2. The machine learning methods in their construction seem reasonably thought out and well-implemented. However, I really struggle with the meritorious attempt at interpretability via their discussion and results. The following comments relate to this aspect.
a. I think that a brief introduction to the general utility of the biplots would be informative for some readers. Digital Discovery will have a diverse readership, some of whom are very familiar with certain techniques but not so well versed in others.
b. In Fig. 4, it is not clear to me which data correspond to which polymer. Perhaps the authors can use different marker shapes for the different polymers and then use the colors to denote inert vs. interactive. I also think using some transparency would be helpful, as some markers appear to be covered up by others, and so observing some trends is obfuscated by the order of plotting, which is undesirable. Similar comments apply to the other figures.
c. How were the particular couplings of principal components selected? Why PC3 vs. PC12, or PC11 vs. PC7? Authors should indicate the rationale or what led to their presented analysis; otherwise, how can this be treated as any kind of transferable framework for analysis?
d. As an addendum to the above, the authors’ stated intent is to provide a framework that can be “readily applied” stating that results provide an “actionable” foundation for further pursuits. The presentation of the results just does not quite convince me of this. Figure 1 should be made more specific of the workflow. The primary part that seems missing from the manuscript is providing a principled approach to analyzing the decision trees, which is the core of their presentation. How do authors navigate the construction of their rules/heuristics? These elements are not clear, and I am not sure that most readers will readily make the same conclusions that are presented.
e. Fig. 6 just shows a sea of data with mostly blue markers. There is a single orange point for CMC 131K, which is adjacent to a blue point for CMC 90k at the same chemical shift value. What is interpretable about this result? It is noted that there is a bimodal distribution along PC15. Why?
f. Is there any significance to the observation that many of the orange points lie proximate to the decision boundary in Fig. 5 and 6? This makes me wonder about how strongly these points were classified.

3. There needs to be some more clarity surrounding the featurization process.
a. Authors should indicate the total dimensionality of the initial feature vector.
b. Based on the author’s description of the construction of the feature vectors, I think there is potential for data pollution in their model training. Authors need to be very clear about what they have done in this area. Specifically, I am concerned about the statement “we applied linear PCA a second time to the total feature engineered dataset after standardization”. It sounds to me that authors took the feature vectors over the whole dataset and then applied PCA. If performing PCA over the whole dataset, some information about the training set can find its way into the test set. Even if the effect is not intuitive, this is not a standard best practice. Authors should clarify their procedure or repeat the analysis with the PCA transform being formulated only over training data.
4. One recent area of application that I think would be relevant is the design of so-called single-chain nanoparticles or polymer-protein hybrid systems that leverage polymer interactions on proteins to modulate activity. Originally, modeling and multiple experiments (10.1126/science.aao0335) were used to identify effective systems, but recently machine learning has also been used to guide design to tailor polymer-protein interactions (10.1002/adma.202201809, 10.1002/adhm.202102101, 10.1021/acsabm.2c00962) – authors may be interested in such works as they seem well within the purview of “predictive design of biomaterials for targeted interactions”. It would be good to evaluate whether their methods would also bolster that kind of materials class. I think these works are also appropriate in the statement by authors about other datasets featuring on the order of 100’s of datapoints (https://doi.org/10.34770/h938-nn26)

5. In Figure 1, some of the texts and images (particularly on the left-hand side and to a lesser degree the nodes in the middle) are too small to be useful. In addition, I think the figure caption should be expanded to explain what the images mean. What are the ribbons and the meaning of the different colors? What are the clouds? Presumably these are proteins and polymers but these elements should be explained.

6. In the supporting information (or main text, Table 1) I recommend including the chemical structures of the polymers. Is it clear what chemical shifts correspond to which hydrogens already in these polymers? If so, noting this would be useful.


 

Response to Reviewers
Reviewer #1 (Comments to Author)
The authors have made very significant improvements to their manuscript through the extensive clarifications to the explanations of their workflow. It is now fully intelligible and much more satisfactory than the previous version. I am now happy that the revisions made adequately address the comments made in the review process.

Reviewer #3 (Comments to Author)
The authors have improved the manuscript significantly, particularly in the area of the focus and objective of the work. Hence I believe the work will be of great interest to the journal's audience. Also, all the concerns raised have been addressed satisfactorily.

Reviewer #4 (Comments to Author)
In this paper, authors analyze decision tree classifiers in an attempt to understand/predict the interaction of polymer protons with mucin. The approach is predicated on the acquisition of DISCO NMR data, which is featurized and then transformed using principal component analysis. The investigators here highlight an interesting physical problem underlying polymer-biomacromolecule interactions. The selling point of the work is that their workflow provides a framework that can be used to generally understand “any receptor-ligand” system. Ultimately, the study centers on the analysis of a subset of 18 polymer systems (a very small dataset) and their interaction behavior. Authors distill analysis over the decision trees into “rules” that might be leveraged for future targets/modification. I find the premise of the paper to be quite interesting, although I am not convinced of its impact in the present form. Although the mechanics of model training is mostly well described, the most important and innovative aspect (analysis of subsequently generated models) is not articulated clearly enough to be readily adopted by the community. The dataset is relatively specialized and centered on interpreting what seems to be a newer experimental technique, so having a clear pathway for adoption is paramount. At this stage, I would therefore recommend major revisions. Below I provide some more detailed comments.

Reviewer Comment #1
Authors should be aware that much of their audience for Digital Discovery may not be well versed in their experimental method or their systems. One important thing to understand about approaches such as this is the cost of measurement. Authors are examining an extraordinarily small dataset. I think it would behoove them to (1) briefly explain the physical premise of DISCO NMR and also highlight what kinds of systems are compatible, (2) provide readers with some sense of the labor-intensive or resource-intensive aspects of data collection and (3) further underscore the necessity to interrogate interactions using a data-driven approach. There are at least two components to the latter: are we (as humans) bad at making the interpretations or are we too biased? Authors seem to lean more on the latter, but I think the former may also play a role.

Author Response
DISCO NMR is a refinement of existing transfer-based NMR techniques. Transfer-based NMR relies upon the nuclear Overhauser effect (nOe), which enables through-space transfer of magnetic excitation between systems of binding molecules or macromolecules.1 In brief, a selective excitation pulse is used to excite the protons present on a receptor molecule (in these experiments, mucin). In the case where a given ligand (in these experiments, polymers) binds to the excited receptor, transfer of this excitation can occur (up to a maximum separation distance of 5 Angstroms).1 This transfer of excitation presents on a given spectrum as an attenuation of the ligand signal, which is proportional to the total transfer of magnetic excitation (which is itself proportional to the steady-state separation distance between the protons of the ligand and receptor macromolecules). These attenuated signals are then subtracted from control signals where no attenuation occurs (by applying a selective excitation pulse far from any materials in solution) to quantify the binding intimacy. These experiments are best suited for binding complexes with nanomolar to micromolar association constants. In principle, these experiments are compatible with any system of molecules or macromolecules that is freely soluble in deuterated water or a mixture of water and deuterated water. This technique is typically limited by the requirement for a minimum concentration to obtain a sufficient signal-to-noise ratio. We have worked with polymer and protein concentrations as low as 1 µM, but these concentrations will vary depending on the system being interrogated. The following details have been added to the manuscript to explain the physical premise of DISCO NMR and the types of systems that are compatible:
“This transfer-based NMR relies on the transfer of magnetic excitation through the nuclear Overhauser effect (nOe) from a receptor macromolecule to a ligand. The intensity of this transfer signal is proportional to the steady-state proximity between ligand and receptor protons.”
“In total, this data collection and interpretation pipeline is applicable to any solution-state binding system that is freely soluble in deuterated water […]”

To the reviewer’s second point, DISCO NMR is relatively labor-light. In terms of resources, each sample requires 700 µL of solution created by simple mixing and dilution. In this study the solutions were created manually, but this sample preparation could be automated with a simple off-the-shelf liquid handler. In terms of labor, the workflow for data collection involves two steps, NMR spectroscopy and data aggregation. Spectroscopy was conducted with the aid of a robotic autosampler; thus, once samples were placed in the sample tray, no further human intervention was necessary. The NMR pulse sequence itself is composed of a series of saturation transfer difference with excitation sculpting (STD-ES) experiments, which are standard pulse sequences supplied by most NMR instrument vendors. Data from the spectrometer were then automatically transmitted to a central file server as the experiments completed. From the central data server, spectra were automatically preprocessed (Fourier transformed, baseline corrected and phase corrected) and integrated using the predefined integral regions. From this, the raw data were exported to Excel spreadsheets, which were then processed using the Python code provided in the GitHub repository listed in the supplementary information of this manuscript. These processes could be further streamlined and integrated by leveraging Python packages such as nmrglue; this is the focus of an ongoing refactor. To detail the resource and labor intensiveness of these experiments, the following passage has been added to the revised manuscript:
“In total, this data collection and interpretation pipeline is applicable to any solution-state binding system that is freely soluble in deuterated water. The NMR pulse sequences used in these experiments are based upon typical saturation transfer difference with excitation sculpting (STD-ES) experiments, which are easily accessible in most NMR spectrometers. In addition, this pipeline has the advantage of being minimally laborious as much of the handling and data preprocessing is easily automated. In this work all stages of experimentation were automated apart from the selection of candidate materials and preparation of samples for NMR.”
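The difference-spectrum arithmetic described above (attenuated on-resonance signals subtracted from unattenuated controls) can be sketched in a few lines. The helper function and the toy integral values below are hypothetical, for illustration only; they are not the authors' processing code:

```python
import numpy as np

def std_effect(on_resonance, off_resonance):
    """Fractional STD effect per integral region: (I_off - I_on) / I_off.

    Hypothetical helper illustrating the difference-spectrum arithmetic;
    larger values indicate stronger attenuation, i.e. closer steady-state
    proximity between ligand and receptor protons.
    """
    on = np.asarray(on_resonance, dtype=float)
    off = np.asarray(off_resonance, dtype=float)
    return (off - on) / off

# Toy integrals for three ligand-proton regions: the on-resonance
# (saturated) spectrum is compared against the off-resonance control.
off = np.array([100.0, 80.0, 50.0])   # control (off-resonance) integrals
on = np.array([90.0, 80.0, 35.0])     # attenuated (on-resonance) integrals
effects = std_effect(on, off)
```

Here the second region shows no attenuation (an inert proton), while the first and third show fractional attenuations of 0.10 and 0.30 respectively.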

To the reviewer’s third point, it is correct that both the challenge of interpreting mucoadhesion data/polymer design and human bias both play key roles. We have expanded the discussion of the challenges in interpreting the correlation between the performance of polymeric biomaterials and their underlying structural and chemical features with the addition of the following passage:
“In addition, the mucoadhesion process of polymeric biomaterials is complex, leading to several competing mechanistic theories and some prominent examples of contradictory adhesive behavior between identical materials.1,31,32 More generally, understanding the link between polymeric biomaterials performance and the underlying chemistry and structures of those polymers has been deemed a “formidable challenge”.33,34 Data-driven approaches relying on ML have been suggested to address this challenge, specifically in aiding to untangle their complexity.33,34”

Reviewer Comment #2 and 2a
The machine learning methods in their construction seem reasonably thought out and well-implemented. However, I really struggle with the meritorious attempt at interpretability via their discussion and results. The following comments relate to this aspect.
a. I think that a brief introduction to the general utility of the biplots would be informative for some readers. Digital Discovery will have a diverse readership, some of whom are very familiar with certain techniques but not so well versed in others.

Author Response
We have provided an introduction to principal component biplots in the manuscript. New additions are as follows:
“In this work, the set of decision rules constructed by the model directly corresponds to areas in the dataset to investigate for interaction behaviour insights. To assess the similarities or differences between classified data points, we used principal component biplots, a popular methodology to ease interpretability of classification.56 Each biplot illustrates a two-dimensional representation of essential variations in the data the model used to distinguish inert and interactive proton classes by constructing a rule. We can study the boundary between inert and interactive classes by visually examining which inert-labelled protons are proximal to interactive protons in the biplots.”

“[…] Herein, from a bird’s eye view, we draw attention to several key insights from the interpretation of principal component biplots that teach us about the behavior of mucoadhesive materials.”

Reviewer Comment #2b
In Fig. 4, it is not clear to me which data correspond to which polymer. Perhaps the authors can use different marker shapes for the different polymers and then use the colors to denote inert vs. interactive. I also think using some transparency would be helpful, as some markers appear to be covered up by others, and so observing some trends is obfuscated by the order of plotting, which is undesirable. Similar comments apply to the other figures.

Author Response
We updated all the biplot figures both in the main text and the supplementary information in the manner suggested by the reviewer (different marker shapes for each polymer and added transparency). These changes have much improved the communication of the data.

Reviewer Comment #2c
How were the particular couplings of principal components selected? Why PC3 vs. PC12, or PC11 vs. PC7? Authors should indicate the rationale or what led to their presented analysis; otherwise, how can this be treated as any kind of transferable framework for analysis?

Author Response
We clarified in the manuscript that couplings of principal components were selected in accordance with the structure of the decision tree. As such, in the framework, biplots are always to be constructed in accordance with the pairing of principal components that create the node of the tree being plotted.
New material has been introduced as follows:
“The principal components used in the biplots are, in all cases, the pairings that create the decision rule in the tree being plotted, in the sequence shown in Figure 3.”

“The principal component biplots shown in this work correspond in entirety to the set of decisions made in the descriptive tree”

Reviewer Comment #2d
As an addendum to the above, the authors’ stated intent is to provide a framework that can be “readily applied” stating that results provide an “actionable” foundation for further pursuits. The presentation of the results just does not quite convince me of this. Figure 1 should be made more specific of the workflow. The primary part that seems missing from the manuscript is providing a principled approach to analyzing the decision trees, which is the core of their presentation. How do authors navigate the construction of their rules/heuristics? These elements are not clear, and I am not sure that most readers will readily make the same conclusions that are presented.

Author Response
We updated Figure 1 of the manuscript to depict the workflow more clearly (the figure is also included in the response to reviewer comment #5).
Second, we made amendments to distinguish the methods we applied to analyze the decision tree rules and extract insights. Primarily, we constructed principal component biplots as a means of visually examining the boundary between inert and interactive datapoints in each decision tree rule. One biplot per rule was constructed, including two biplots in the ESI. The proton datapoints being terminally classified at the decision rule are labelled in the biplot. Where inert protons are present at the decision rule boundaries of terminal nodes, we examine their identities. These inert protons exhibit similar principal component scores to interactive protons in the dataset, but can be separated by a classification boundary. For example, there are inert protons that cluster the decision rule boundaries in Figure 5 and Figure 6. We list the inert proton identities near the boundaries and present them as hypotheses for further research in engineering towards interaction.
As a supplementary analysis, we enumerated the principal component factor loadings underlying each component in each biplot, and commented on the factor loadings in accordance with high level trends in the datapoints and mucoadhesion research. We summarized these brief discussion points to construct “heuristics” as a means of analyzing the factor loading data. This supplementary analysis was distinct from the analysis and insights derived from principal component biplots.
Analysis of decision rule biplots visually is the most effective means of studying interaction trends, as the complex high-dimensional trends derived from the original feature set are simply distilled into 2-D planes, and thus we emphasize this approach to analysis in the framework.
New material is presented below (in addition to the amendments made to introduce principal component biplots for comment 2a, 2c):
“Each principal component biplot distilled high dimensional information from problem space interrogation into simple 2D planes. By visual examination of the boundary between inert and interactive classes in decision regions of the model, we identify which inert protons exhibited similar principal component scores to interactive protons. Herein, we discuss the identities of protons present at each decision rule, with particular interest in inert labelled protons having scores close to the interactive class.”
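A minimal sketch of this boundary examination, with synthetic data standing in for the standardized proton feature matrix. The component index, rule threshold, and margin below are hypothetical stand-ins for values that would be read off a fitted tree:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy stand-in for the feature matrix
# (rows = proton observations, columns = engineered features).
X = rng.normal(size=(60, 8))
labels = rng.integers(0, 2, size=60)  # 0 = inert, 1 = interactive (synthetic)

# Principal component scores of the standardized features.
scores = PCA(n_components=4).fit_transform(StandardScaler().fit_transform(X))

# Suppose a tree node splits on (hypothetical) component index 2
# at threshold t; in practice only the datapoints routed to that
# node would be examined.
pc, t, margin = 2, 0.0, 0.25

# "Undervalued" inert protons: inert-labelled points lying within a
# small margin of the decision rule, i.e. with principal component
# scores similar to interactive protons.
near_boundary = np.abs(scores[:, pc] - t) < margin
undervalued = np.where(near_boundary & (labels == 0))[0]
```

The indices collected in `undervalued` correspond to the proton identities that would be enumerated as candidates for further engineering towards interaction.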

Reviewer Comment #2e
Fig. 6 just shows a sea of data with mostly blue markers. There is a single orange point for CMC 131K, which is adjacent to a blue point for CMC 90k at the same chemical shift value. What is interpretable about this result? It is noted that there is a bimodal distribution along PC15. Why?

Author Response
We have made amendments in the results and discussion section to clarify the process of interpreting the principal component biplots. In particular, we use biplots to examine data present at the boundary between inert and interactive classes. Specifically, these are cases where a decision rule at a leaf node has visually apparent inert protons bordering the interactive boundary. We distinguish and report the identities of rule-bordering inert protons (“undervalued” protons), which we posit merit further investigation for interactive tendencies given their similar principal component scores to interactive protons at leaf nodes.

Without a modelling workflow, there is no way to distinguish “undervalued” protons from true inert protons. Thus, enumeration of these protons provides a shortlist of protons in the dataset for further engineering towards designed interaction.

In Figure 6, we plot the datapoints which are classified by the model’s 8th decision rule, which are dispersed in two clusters (i.e. the bimodal distribution). The decision rule intersects the distribution containing the interactive observation, rendering the surrounding inert proton cluster as “undervalued” inert protons. For the protons of the second cluster, which does not contain a decision rule, we make no additional distinctions.

We have amended our description of Figure 6 to increase the specificity of our biplot interpretation.

“Figure 6 shows the remaining unclassified protons in the dataset. The final decision rule in PC 15 intersects a cluster subset of the datapoints, a largely inert group containing a single interactive proton. The interactive proton is a secondary interaction from CMC 131kDa at 3.76ppm. We identify and enumerate the neighboring undervalued protons clustering this decision rule, which may correlate to secondary interactions in their respective species. These are: CMC 131kDa (4.09ppm), CMC 90kDa (4.58ppm, 4.09ppm), DEX 150kDa (3.72ppm, 4.02ppm), HPMC 120kDa (3.71ppm, 4.05ppm), HPMC 86kDa (3.71ppm, 4.05ppm), PHPMA 40kDa (0.94ppm, 1.82ppm), PVP 55 (1.54ppm, 3.89ppm), PVP 1300kDa (1.54ppm), P407 13kDa (3.76ppm), PEOZ 50kDa (3.42ppm).

For the protons of the larger second cluster, which does not contain a decision rule, we make no additional distinctions.”

Reviewer Comment #2f
Is there any significance to the observation that many of the orange points lie proximate to the decision boundary in Fig. 5 and 6? This makes me wonder about how strongly these points were classified.

Author Response
The model’s decision rules slice the data to delineate the specific segments which most correlate to interaction. This encompasses: (1) identifying the optimal principal components to examine, (2) identifying which datapoints should be examined together at a leaf node, and (3) separating data classes according to a specific rule threshold to maximize information purity.

While the precision of the decision rule position (3) is one piece of the model’s construction, we view each classification decision holistically in terms of patterns in the decision node’s data and primarily emphasize aspects (1) & (2) during interpretation. The generalizability of (3) holds greater importance for claims of a predictive nature, whereas the present analysis is descriptive.

We achieve this by emphasizing the model’s identification of which principal components best facilitate interaction classification in the dataset (1), and identifying which inert datapoints are positioned near interactive datapoints in such leaf nodes (2). Such a descriptive approach presents the most utility for researchers seeking insight into the similarities and differences in interactive behavior across popular mucoadhesive materials. These differences can be as subtle as a change in a single design parameter, such as molecular weight, triggering a change in interaction outcome.2

Finally, we employed techniques to mitigate overfitting the final descriptive model’s decision rules, specifically performing hyperparameter tuning in conjunction with 5-fold cross validation.

New material is provided in the manuscript as follows:
“The modelling exercise provided a path to data interpretation in several ways within this complex problem space: first by identifying which principal components enabled interaction classification, second by segregating subsets of data to leaf nodes that should be considered together, and finally constructing decision rule boundaries themselves. We follow an approach to data analysis as such, by examining the principal components and datapoints segregated to each decision node holistically.”
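As an illustration of how the decision rules can be read directly off a fitted tree, consider the following hypothetical sketch using scikit-learn's `export_text` (the scores, labels, and feature names are synthetic, not our dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)

# Toy stand-in for principal component scores of the proton dataset,
# with a synthetic interaction label driven mostly by the third score.
scores = rng.normal(size=(80, 6))
labels = (scores[:, 2] + 0.3 * rng.normal(size=80) > 0.5).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(scores, labels)

# Each printed split (e.g. "PC3 <= ...") is one decision rule; the
# components appearing along a root-to-leaf path indicate which
# principal components to pair in a biplot for that rule.
rules = export_text(tree, feature_names=[f"PC{i+1}" for i in range(6)])
print(rules)
```

Reading the rules this way makes the pairing between tree nodes and biplots mechanical rather than ad hoc.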

Reviewer Comment #3 and 3a
3. There needs to be some more clarity surrounding the featurization process.
a. Authors should indicate the total dimensionality of the initial feature vector.

Author Response
We have added this information in the feature engineering section:
“[…]The feature vector input to model training in this work had 35 columns.”

Reviewer Comment #3b
Based on the author’s description of the construction of the feature vectors, I think there is potential for data pollution in their model training. Authors need to be very clear about what they have done in this area. Specifically, I am concerned about the statement “we applied linear PCA a second time to the total feature engineered dataset after standardization”. It sounds to me that authors took the feature vectors over the whole dataset and then applied PCA. If performing PCA over the whole dataset, some information about the training set can find its way into the test set. Even if the effect is not intuitive, this is not a standard best practice. Authors should clarify their procedure or repeat the analysis with the PCA transform being formulated only over training data.

Author Response
We thank the reviewer for bringing this lack of clarity to our attention. The original wording of the phrase is imprecise; we did account for this data pollution effect in both iterations of PCA applied to the data. We constructed a single pipeline (sklearn.pipeline) for preprocessing and grid search cross validation, which is a method for assembling steps to be cross validated together, such that target leakage is prevented across all steps. The first step in the preprocessing pipeline included a principal component analysis of only the time-series DISCO effect buildup curve (totaling 7 input features), such that one component was retained (one output feature), which introduced the CDE feature. The second chained step computed principal component analysis on the chemical property and physical property features and the CDE column (totaling 35 features including CDE). Finally, the grid search cross validation operator in the last portion of the pipeline received the preprocessing steps and model type (decision tree classifier) as inputs, such that during grid search cross validation the preprocessing steps were trained on only training data folds and applied to transform validation data folds.
The following changes have been provided in the manuscript:
“For each analysis, all data preprocessing steps prior to cross validation were conducted through a single pipeline (sklearn.pipeline). A pipeline is a method for assembling steps to be cross validated together, such that defined steps are trained on only training data folds, and applied to transform validation data folds. The first step in the pipeline computed the CDE feature as previously described. Next, the CDE feature alongside the chemical property and physical property features (i.e. sample proton δ 1H chemical shift, molecular weight, cohort fingerprint, and CDE, totalling 35 features) were passed into a principal component analysis workflow. We added this principal component analysis workflow to the pipeline as a means of removing intercorrelations in the modelling features while keeping underlying information intact.18,48 “
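The leakage-safe arrangement described in the response can be sketched as a single scikit-learn pipeline in which the buildup-curve PCA, the second PCA, and the classifier are cross validated together. The column indices, feature counts, synthetic data, and the `max_depth` grid below are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative layout: columns 0-6 hold the 7 buildup-curve time points,
# columns 7-40 hold the 34 chemical/physical property features.
buildup_cols = list(range(7))
property_cols = list(range(7, 41))

# Step 1: collapse the buildup curve to one component (the CDE feature)
# while passing the property features through unchanged -> 35 columns out.
cde_step = ColumnTransformer([
    ("cde", PCA(n_components=1), buildup_cols),
    ("props", "passthrough", property_cols),
])

pipe = Pipeline([
    ("cde", cde_step),
    ("scale", StandardScaler()),
    ("pca", PCA()),                       # decorrelate CDE + property features
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Grid search over the whole pipeline: within every CV split, each
# preprocessing step is fitted on the training folds only and then
# applied to the held-out fold, so no validation data leaks into PCA.
grid = GridSearchCV(pipe, {"tree__max_depth": [2, 3, 4]}, cv=5)

np.random.seed(0)
X = np.random.rand(60, 41)                # 60 synthetic observations
y = np.random.randint(0, 2, 60)           # synthetic binary labels
grid.fit(X, y)
```

Because the transformers live inside the pipeline passed to `GridSearchCV`, each PCA is refitted for every split rather than once over the full dataset, which is precisely what prevents the pollution the reviewer describes.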

Reviewer Comment #4
One recent area of application that I think would be relevant is the design of so-called single-chain nanoparticles or polymer-protein hybrid systems that leverage polymer interactions on proteins to modulate activity. Originally, modelling and multiple experiments (10.1126/science.aao0335) were used to identify effective systems, but recently machine learning has also been used to guide design to tailor polymer-protein interactions (10.1002/adma.202201809, 10.1002/adhm.202102101, 10.1021/acsabm.2c00962). The authors may be interested in such works, as they seem well within the purview of “predictive design of biomaterials for targeted interactions”. It would be good to evaluate whether their methods would also bolster that kind of materials class. I think these works are also appropriate in the authors' statement about other datasets featuring on the order of hundreds of datapoints (https://doi.org/10.34770/h938-nn26).

Author Response
We have reviewed the materials and cited them in the revised manuscript.

Reviewer Comment #5
In Figure 1, some of the text and images (particularly on the left-hand side and, to a lesser degree, the nodes in the middle) are too small to be useful. In addition, I think the figure caption should be expanded to explain what the images mean. What are the ribbons, and what is the meaning of the different colors? What are the clouds? Presumably these are proteins and polymers, but these elements should be explained.

Author Response
We agree that the first figure of the manuscript is difficult to interpret. We have replaced the original figure with a new version of figure 1 which demonstrates the general problem workflow more clearly. The new image is attached in the manuscript and shown below:


Reviewer Comment #6
In the supporting information (or main text, Table 1) I recommend including the chemical structures of the polymers. Is it clear what chemical shifts correspond to which hydrogens already in these polymers? If so, noting this would be useful.

Author Response
We have updated the supplementary information with a new table listing each polymer's name, abbreviation, and representative chemical structure (Supplementary Table 1). Addressing the second question in this comment: for some polymers the chemical shifts correspond unambiguously to unique protons (linear polymers such as PAA, PVA, PEG, etc.), while for others the assignment is not clear (e.g. distinguishing between the protons bound to the anomeric carbons in the cellulose derivatives CMC, HPC, HPMC, etc.). For the sake of clarity, we have opted to frame the discussion simply in terms of chemical shift, enabling a consistent discussion across all results, rather than complicate it by including proton identities where they are known and omitting them where they are not. While it would be possible to identify such protons unambiguously with more powerful 2D NMR techniques (such as HSQC), these experiments require isotopic enrichment (13C, 15N, or both), which is challenging and beyond the scope of this work.






Round 3

Revised manuscript submitted on 28 Jun 2023
 

18-Jul-2023

Dear Dr Gu:

Manuscript ID: DD-ART-01-2023-000009.R2
TITLE: An Interpretable Machine Learning Framework for Modelling Macromolecular Interaction Mechanisms with Nuclear Magnetic Resonance

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below. You can address the minor update requested in your proof corrections.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry


 
Reviewer 4

The authors have thoughtfully and adequately addressed my comments through the previous revisions. The paper is now much clearer to me, and some of the clarifications should prevent confusion for future readers. I have one last comment that can be easily addressed: please indicate which symbols correspond to which polymers, either in the figure caption or in an additional legend. In the figures, text is usually placed adjacent to the polymer species; the authors could note this instead, but the most unambiguous approach would be to provide a legend or more descriptive text.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.
Creative Commons BY license