From the journal Digital Discovery Peer review history

Discovering life's directed metabolic (sub)paths to interpret human biochemical markers using the DSMN tool

Round 1

Manuscript submitted on 14 Apr 2023
 

20-Jun-2023

Dear Mx Slenter:

Manuscript ID: DD-ART-04-2023-000069
TITLE: Discovering life’s directed metabolic (sub)paths to interpret biochemical markers using the DSMN

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process.   We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 1

This study developed the Directed Small Molecules Network (DSMN), a unified graph database that integrates multiple resources through ontological linking. The DSMN enables the generation of (sub)networks that explain biochemical relationships and facilitate the identification of relevant biological pathways. The researchers validated the DSMN's efficacy by utilizing three datasets focused on biomarkers for healthy aging. Their analysis, employing shortest path calculations within the DSMN, yielded results consistent with established pathways and interactions.

From a data and code perspective, this project represents a substantial undertaking in data retrieval, cleaning, database construction, and integration into Neo4j. It makes a valuable contribution to the field of metabolomics data analysis. The code exhibits sound quality. However, to enhance reproducibility, it is advisable for the authors to provide more detailed scripts and instructions in the paper for generating figures and conducting analytics.

Reviewer 2

The authors present a Directed Small Molecules Network (DSMN) which is a compound vs reaction bipartite graph harmonized from a set of metabolite and reaction knowledgebases. This metabolic graph resource provides an alternative to classic metabolic reaction and network knowledgebases such as KEGG, MetaCyc, and Brenda. While the manuscript comprehensively presents DSMN, there are several major issues that must be addressed, especially in terms of evaluating the quality of DSMN.

In the following issues, location in the manuscript is indicated by page, column, paragraph, and line, abbreviated as page, col, para, line. Page refers to the page number in the top right of the manuscript PDF.

Major:

1) page 4, col 1, para 1, line 14: The sentence “Asides from being difficult to measure accurately, metabolite concentrations fluctuate naturally over time, with large intra and inter-person variations” is over-broad and simply inaccurate. Implies that all or just most metabolite concentrations have large variation over time and/or between people. Many metabolites, especially central metabolism metabolites, are under strong homeostatic control and are maintained in relatively tight concentration ranges. ATP is the poster child metabolite for homeostatic control. Moreover, the reference given to justify this sentence mentions variation in genetics and environment, but there is NO statement about metabolism or metabolomics variation. Plus, it has been hard to distinguish between analytical variance and true biological variance in metabolomics datasets, since sample collection and preparation analytical variance is typically not quantified in most experiments.

2) page 5, col 1, para 2: The introduction of biochemical graphs leaves out atom-resolved graphs, where atoms are represented as individual nodes of a certain type. Such graphs have been used to prevent trivial paths considered in graphs with only nodes at the compound level. There are multiple publications that utilize atom-resolved metabolic graphs and networks. For example,
Starke C, Wegner A. MetAMDB: Metabolic Atom Mapping Database. Metabolites. 2022 Jan 27;12(2):122.
Jin H, Moseley HN. Hierarchical harmonization of atom-resolved metabolic reactions across metabolic databases. Metabolites. 2021 Jun 30;11(7):431.

3) page 5, col 2, para 1, line 4: Shortest path is NOT a panacea. Many reaction paths are not the shortest path. Also, the graph descriptions leave out compartmentalization via subcellular localization of enzymes and transporters, which is being considered in some genome-wide metabolic models now. Also, it is unclear how trivial pathways between metabolites that do not share any or very few atoms across reactions were excluded.

4) Abstract and Introduction: The authors failed to indicate that the DSMN is human centric, since the methods clearly indicate convergence to human metabolism on page 6, col 2, para 3 (section 2.1.1). If this is not the case, then the authors should explain why mouse genes are being converted to human orthologs?

5) page 6, col 2, para 4 (section 2.1.2): The authors have a lot of trust that the database cross-references are accurate. Others have pointed out problems with database cross-references, since many are created and never re-evaluated as databases change. The authors should at least state they are assuming that the database cross-references are correct and make these assumptions explicit.

6) page 10, col 1, para 2, line 8: Unclear what the authors are trying to say about using coefficient of variation to detect 14 age-related metabolites. Normally, a Cohen’s d or a correlation (r) effect size and adjusted p-value would be used. If the authors were detecting increase in variance, then coefficient of variation may be affected by the natural increase in mean with increased variance. If difference of variance was being used to select these 14 metabolites, why wasn’t a statistical test specific to detecting differences in variance used like a Bartlett’s test (after log transformation) or a Brown-Forsythe test if non-normality is indicated.

Very Major:

7) The authors provide adequate justification for not including KEGG, MetaCyc, SMPDB, and Pathway Commons. However, the fact that these databases were excluded makes them ideal for evaluating the quality of DSMN. The application of DSMN to analyze 3 datasets demonstrates the potential utility of DSMN. However, the quality of DSMN itself is not well demonstrated. This could be done by evaluating overlap and consistency of DSMN with KEGG and MetaCyc especially, since both of these metabolic network resources have had years of computational and manual curation. Evaluation of DSMN quality is paramount for preventing its misuse or at least understanding where it may fail.

Minor Issues:

page 8, col 1, para 1: There are several sentence fragments with periods. It is the “part 3” and “part 4”.

page 8, col 1, para 2: Would be nice if the authors would reference the tool they used to retrieve Ensembl IDs. Maybe the authors wrote their own code to grab it via a REST interface, but most likely a package was utilized.

page 10, Code Example 7: Cypher queries can be notoriously hard to craft in an efficient manner. The authors are encouraged to create lists of useful queries and identify which ones are very delicate/touchy with respect to staying efficient if modified.

Figures 3 and 4: Figure legend indicates significant p-value is > (greater than) 0.05. This should be < (less than) 0.05. Also, it would be much better if the p-value is corrected for multiple testing.
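
The multiple-testing correction requested in the last minor issue can be illustrated with a minimal sketch. The reviewer does not name a method; the Benjamini-Hochberg procedure is assumed here only because it is a common choice, and the p-values are hypothetical rather than taken from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five pathways (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.5]
adjusted = benjamini_hochberg(raw)
# A result is called significant when the adjusted value is below 0.05.
significant = [q < 0.05 for q in adjusted]
```

In this toy input four raw p-values fall below 0.05, but only two remain below 0.05 after adjustment, which is the kind of difference the reviewer's comment is concerned with.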

Reviewer 3

Authors present DSMN (Directed Small Molecules Network) as an approach to generate metabolomic study-specific subnetworks that provide detailed biochemical relationships, thereby providing a means to interpret results from metabolomics studies. The approach represents biochemical data from curated, reliable sources and models this data as directed graphs, upon which graph-based methods can be applied (e.g., degree, shortest path calculations) to ultimately yield a biochemical network visualization of metabolites identified as relevant in a particular study. The approach was applied to 3 independent datasets.
Overall, the approach is very well described and includes relevant Cypher queries to be used for constructing the graph. The authors also provided very relevant details on the current state of pathway databases, identifying key areas that the field needs to address. This study is very relevant for the field and congratulations to the authors for meticulously describing a complex method in an approachable manner.

Minor comments:
• Methods, Model Storage: when describing reaction nodes that connect proteins and metabolites, authors could clarify that those are uni-directional. It’s clear in the figures that they are, but this could be made explicit. Also, the reaction nodes are not highlighted in the figures and this could also be clarified a bit.
• Results, end of first paragraph: for clarity, authors could explicitly state that mapping nodes are – mapping to ChEBI/HMDB – to help remind readers.
• Figure 3:
o Authors mention in the figure legend the use of Disease Ontology but this is not described in the methods.
o Authors highlight carnosines in the text but this is not shown in Figure 3B. Figure 3B would benefit from further text to define metabolites/pathways from the original study in the text.
• Last paragraph of results: authors discuss a merged network for all three datasets. Could this be shown?
• While the approach is explained, is the code used to generate the graphs available through a public repository?


 

This text has been copied from the PDF response to reviewers and does not include any figures, images or special characters:

Response to the reviewers
We would like to thank all reviewers for their comments and feedback on our manuscript “Discovering life’s directed metabolic (sub)paths to interpret biochemical markers using the DSMN”. We address each of the points below and indicate if and where the manuscript was updated. All changes in the manuscript are highlighted in red.
REVIEWER REPORT(S):
Referee: 1
Comments to the Author
This study developed the Directed Small Molecules Network (DSMN), a unified graph database that integrates multiple resources through ontological linking. The DSMN enables the generation of (sub)networks that explain biochemical relationships and facilitate the identification of relevant biological pathways. The researchers validated the DSMN's efficacy by utilizing three datasets focused on biomarkers for healthy aging. Their analysis, employing shortest path calculations within the DSMN, yielded results consistent with established pathways and interactions.
From a data and code perspective, this project represents a substantial undertaking in data retrieval, cleaning, database construction, and integration into Neo4j. It makes a valuable contribution to the field of metabolomics data analysis. The code exhibits sound quality. However, to enhance reproducibility, it is advisable for the authors to provide more detailed scripts and instructions in the paper for generating figures and conducting analytics.
Response:
We appreciate the positive remarks from this reviewer on our manuscript. We have added the Cytoscape network session files to our GitHub repository (available at cyNeo4j/DSMN//exampleCytoscapeFiles) to support the Figures in our manuscript. Furthermore, an example script in R has been added to show how these visualisations could be created in an automated manner (available at cyNeo4j/DSMN/visualizationScripts). We have also updated this information in the Materials and Methods section of our manuscript (section 2.3.2, page 7). The interpretation of the subnetworks was performed manually, by comparing statements in the original publication on relevant pathways for their significantly changed metabolites to the edges in the subnetworks using Cytoscape’s filter options. Comparing pathway names from different databases is not a straightforward task and could not be captured in an (automated) script. We have added a statement in our Materials and Methods section to describe our analytical procedure of the calculated subnetworks in more detail (page 7).
Referee: 2
Comments to the Author
The authors present a Directed Small Molecules Network (DSMN) which is a compound vs reaction bipartite graph harmonized from a set of metabolite and reaction knowledgebases. This metabolic graph resource provides an alternative to classic metabolic reaction and network knowledgebases such as KEGG, MetaCyc, and Brenda. While the manuscript comprehensively presents DSMN, there are several major issues that must be addressed, especially in terms of evaluating the quality of DSMN.
In the following issues, location in the manuscript is indicated by page, column, paragraph, and line, abbreviated as page, col, para, line. Page refers to the page number in the top right of the manuscript PDF.
Major:
1) page 4, col 1, para 1, line 14: The sentence “Asides from being difficult to measure accurately, metabolite concentrations fluctuate naturally over time, with large intra and inter-person variations” is over-broad and simply inaccurate. Implies that all or just most metabolite concentrations have large variation over time and/or between people. Many metabolites, especially central metabolism metabolites, are under strong homeostatic control and are maintained in relatively tight concentration ranges. ATP is the poster child metabolite for homeostatic control. Moreover, the reference given to justify this sentence mentions variation in genetics and environment, but there is NO statement about metabolism or metabolomics variation. Plus, it has been hard to distinguish between analytical variance and true biological variance in metabolomics datasets, since sample collection and preparation analytical variance is typically not quantified in most experiments.
Response:
We appreciate the remarks and critical review of our manuscript. The erroneously added reference has been removed and replaced with the correct one, as well as adding the influence of circadian rhythm on metabolic concentrations. We have adapted the sentence to reflect the concerns regarding metabolite (classes) which are considered more stable. We also agree with the statement on distinguishing between analytical variance and biological variance and have added a sentence to address this topic (page 1).
2) page 5, col 1, para 2: The introduction of biochemical graphs leaves out atom-resolved graphs, where atoms are represented as individual nodes of a certain type. Such graphs have been used to prevent trivial paths considered in graphs with only nodes at the compound level. There are multiple publications that utilize atom-resolved metabolic graphs and networks. For example,
Starke C, Wegner A. MetAMDB: Metabolic Atom Mapping Database. Metabolites. 2022 Jan 27;12(2):122.
Jin H, Moseley HN. Hierarchical harmonization of atom-resolved metabolic reactions across metabolic databases. Metabolites. 2021 Jun 30;11(7):431.
Response:
Various algorithms exist that could be used to deconvolute a metabolic reaction to the atoms (atom-to-atom mapping, or AAM) making up the substrate(s) and product(s). Applying such an algorithm to the data we collected would lead to a harmonised overview of individual metabolic reactions; however, the result does not always correspond to findings represented in literature (e.g. describing compound classes), nor do these algorithms take the stereochemistry of substrate and products into account. Given the biological diversity of WikiPathways we foresee major issues here which could lead to a massive loss of information. Other pathway databases might exist with a more accurate description of the chemistry behind metabolic reactions; however, WikiPathways provides us with the unique opportunity to include novel research findings due to their community-based approach. Furthermore, this graph database was also developed to explore the potential of WikiPathways, LIPID MAPS, and Reactome regarding metabolomics data analysis in more detail. Several publications discuss the issues with the AAM algorithms (e.g. 1,2) in terms of reproducibility, accuracy, and user-friendliness. We believe that applying AAM mappings is not suitable for our research question at the moment, and have decided to rely on the curators of the three included databases regarding their expertise in biology and biochemistry. Last, we will take this comment into account for future versions of the DSMN to support 13C flux analysis using AAM, providing a broader applicability of our tool while also maintaining the content present in the covered databases. We have added a sentence to our manuscript to detail that the three main graph models described are aimed at modelling biochemical reactions at a molecular (not atomic) level.
1: Lin, Arkadii, et al. "Atom‐to‐atom mapping: a benchmarking study of popular mapping algorithms and consensus strategies." Molecular Informatics 41.4 (2022): 2100138.
2: Preciat Gonzalez, German A., et al. "Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: application to Recon 3D." Journal of cheminformatics 9 (2017): 1-15.
3) page 5, col 2, para 1, line 4: Shortest path is NOT a panacea. Many reaction paths are not the shortest path. Also, the graph descriptions leave out compartmentalization via subcellular localization of enzymes and transporters, which is being considered in some genome-wide metabolic models now. Also, it is unclear how trivial pathways between metabolites that do not share any or very few atoms across reactions were excluded.
Response:
We agree with the comment that the shortest path is not the only answer to find relevant biological paths. In our approach, the shortest path is used as a first approximation to find potentially relevant reactions for a metabolic dataset. Capturing complex and long biochemical paths between changed biomarkers can be computationally very intensive. Evaluation of paths other than the shortest path is a consideration for a follow-up study. We would also like to integrate transcriptomics, proteomics, and methylation data in such an approach. Such a study could provide a whole new insight into the biological plausibility of the temporal activity of biochemical reactions; however, this is outside the scope of this study.
In the Discussion section of our manuscript, we highlight how using the shortest path algorithm can affect the results from our method, and which additional information could be included in future releases to make these calculations more in line with biological reality, e.g. by using a weighted model through the addition of enzyme kinetics; or including transport reactions, gene regulation, and protein-protein interactions. We are currently working on another project to harmonise kinetic data from various databases into one structure to support pathway models, which could be used to include a weight in future versions of the DSMN tool. Interactions captured in our graph model can also be adapted by users to exclude reactions which they believe are not relevant to the subnetwork they have obtained; we have added a sentence about this in our discussion section to clarify the existence of this option (page 13).
For the remark on “trivial pathways”, our approach relies on the accuracy of the content in the included databases. Each interaction (edge) is provided with data on the occurrence of this reaction summed over all pathway models. This score could be used to filter out reactions with a low(er) occurrence to favour reactions which have been described in more models, potentially providing a higher accuracy for the shortest path calculations. The graph model includes the names and identifiers of the pathways where the information originates from (provenance), so users can investigate the reaction in the original resource. In case erroneous reactions or metabolic annotations are found, the pathway models within WikiPathways (also hosting the LIPID MAPS models) can be adapted directly by users; the Reactome team can also be requested to update their pathways. There is no database available that is 100% accurate or complete; having a mechanism in place to flag potentially erroneous paths and annotations (e.g. by (internal) quality control protocols) provides a useful manner to point biocurators to data worth checking. Atom-to-atom mapping (AAM) is one method that could be applied here, however, it does not solve all issues (as explained in our response to the previous point).
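
The occurrence-based filtering described in this response can be sketched as follows. This is a toy illustration in plain Python, not the DSMN implementation (which is stored in Neo4j and queried with Cypher): the node labels, the edge list, the occurrence values, and the `min_occurrence` parameter are all hypothetical.

```python
from collections import deque

# Hypothetical directed edges (source, target, occurrence). The occurrence is
# assumed to count how many pathway models describe the interaction.
EDGES = [
    ("A", "X", 1),
    ("X", "D", 1),
    ("A", "B", 5),
    ("B", "C", 4),
    ("C", "D", 3),
]


def shortest_path(edges, start, goal, min_occurrence=1):
    """Breadth-first search using only edges with occurrence >= min_occurrence."""
    adjacency = {}
    for source, target, occurrence in edges:
        if occurrence >= min_occurrence:
            adjacency.setdefault(source, []).append(target)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no directed path exists under this filter


unfiltered = shortest_path(EDGES, "A", "D")
filtered = shortest_path(EDGES, "A", "D", min_occurrence=2)
reverse = shortest_path(EDGES, "D", "A")
```

Without a filter the search returns the two-step route through the rarely described node X; with `min_occurrence=2` it returns the longer route through B and C, and the reverse query finds no path because the edges are directed.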
4) Abstract and Introduction: The authors failed to indicate that the DSMN is human centric, since the methods clearly indicate convergence to human metabolism on page 6, col 2, para 3 (section 2.1.1). If this is not the case, then the authors should explain why mouse genes are being converted to human orthologs?
Response:
Indeed, this graph is currently based on human data only. For the included LIPID MAPS models, the original data was curated on mouse metabolism, requiring a transformation to human data to be included in our developed graph. We have added the species name more clearly at various places in our manuscript (title, abstract, introduction), to avoid any further confusion.
5) page 6, col 2, para 4 (section 2.1.2): The authors have a lot of trust that the database cross-references are accurate. Others have pointed out problems with database cross-references, since many are created and never re-evaluated as databases change. The authors should at least state they are assuming that the database cross-references are correct and make these assumptions explicit.
Response:
We agree with this comment and acknowledge that this is an issue for many (if not all) databases. We have several projects focussing on updating erroneous or outdated identifiers in our pathway models. One such project is the curation cafés, where a group of community pathway model authors sit together to work on a subset of pathways for a specific task (e.g. find relevant publications for pathway models, update identifiers which are missing cross-references). Another curation effort is based on automated tests, which detect for example malformed identifiers, missing annotations, and many more (see https://github.com/BiGCAT-UM/WikiPathwaysCurator for details). Another project which is in the works is based on integrating data from many biological databases regarding their outdated, replaced, and secondary identifiers (https://github.com/sec2pri). We hope to leverage this project in the future to create even more accurate content with respect to identifiers.
The mappings included in the WikiPathways RDF (which was used to construct the DSMN) are updated regularly based on another tool developed in our group: BridgeDb. Re-evaluating identifiers and cross-references is a constant process for BridgeDb, so we do not consider outdated identifiers as an issue for our approach. Gene and protein mappings are updated four times a year based on the release schedule of the consulted database (Ensembl). Metabolite and chemical compound mappings are retrieved from three databases (HMDB, ChEBI, and Wikidata), and integrated into one mapping file; since the release schedules of these databases differ tremendously these mappings are made approximately twice a year. The current version of our graph database contains the mappings created at the time of the graph database creation. For future versions of the DSMN graph (which will also include the most recent pathway models of all three databases and the reactions included thereof), we will use the most recent mappings available. We have added a sentence to describe the used mappings and their cross-references in more detail in our materials and methods section (page 3, 2.1.2 Database Harmonization).
6) page 10, col 1, para 2, line 8: Unclear what the authors are trying to say about using coefficient of variation to detect 14 age-related metabolites. Normally, a Cohen’s d or a correlation (r) effect size and adjusted p-value would be used. If the authors were detecting increase in variance, then coefficient of variation may be affected by the natural increase in mean with increased variance. If difference of variance was being used to select these 14 metabolites, why wasn’t a statistical test specific to detecting differences in variance used like a Bartlett’s test (after log transformation) or a Brown-Forsythe test if non-normality is indicated.
Response:
For our manuscript we preferred to focus on showcasing this new methodology rather than re-analysing metabolomics datasets. Therefore, we used the preprocessed datasets as they were published as compared to raw data. Additional statistical approaches could indeed be used to identify metabolites of interest in these datasets. The coefficient of variation was used in a previous study (3), which we used to test the results from our graph model. We want to compare the performance of our method to that of other metabolomics data analysis. If we would change multiple variables, we could not perform this comparison. The reasoning of the original authors to use the CV30 method is described in their publication: “to quantify individual variation, we used a simple parameter, designated the coefficient of variation (CV), for each blood compound. The CV is the ratio of the SD of metabolite abundance (peak areas from LC-MS) divided by the mean. For stable and relatively invariant metabolites, SDs and CVs are low or negligible whereas CVs of variable metabolites may prove useful in the evaluation of metabolite variation among individuals.”
3: Chaleckis, Romanas, et al. "Individual variability in human blood metabolites identifies age-related differences." Proceedings of the National Academy of Sciences 113.16 (2016): 4252-4259.
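
The coefficient of variation quoted in the response above (SD of metabolite abundance divided by the mean) can be computed as in the minimal sketch below. The peak areas are invented for illustration, and the use of the sample standard deviation is an assumption, since the quoted passage does not say whether the sample or population SD was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Ratio of the sample standard deviation to the mean."""
    return stdev(values) / mean(values)


# Hypothetical LC-MS peak areas for two metabolites across five individuals.
peak_areas = {
    "stable_metabolite": [100.0, 102.0, 98.0, 101.0, 99.0],
    "variable_metabolite": [100.0, 160.0, 40.0, 130.0, 70.0],
}

cv = {name: coefficient_of_variation(areas) for name, areas in peak_areas.items()}
```

Both toy metabolites have the same mean, so the difference in CV here reflects only the spread between individuals; with real data the mean also differs between metabolites, which is the dependence Reviewer 2 raises in point 6.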
Very Major:
7) The authors provide adequate justification for not including KEGG, MetaCyc, SMPDB, and Pathway Commons. However, the fact that these databases were excluded makes them ideal for evaluating the quality of DSMN. The application of DSMN to analyze 3 datasets demonstrates the potential utility of DSMN. However, the quality of DSMN itself is not well demonstrated. This could be done by evaluating overlap and consistency of DSMN with KEGG and MetaCyc especially, since both of these metabolic network resources have had years of computational and manual curation. Evaluation of DSMN quality is paramount for preventing its misuse or at least understanding where it may fail.
Response:
We understand the concerns regarding this aspect. However, none of the databases that were excluded can be leveraged in the same manner as the graph model presented in our manuscript. Comparing two different methods (pathway analysis through overrepresentation analysis versus network analysis) would yield completely different results, simply because the methods are incomparable. A comparison of metabolite (coverage) between different databases is complicated by small differences in annotations, due to differences in stereochemistry or charge states of small molecules. These annotations can further be influenced by cross-referencing (as has been commented on before). One tool that integrates and harmonises metabolic data from various resources is RaMP (4), which increases the coverage for metabolic enrichment analysis (MEA). In their most recent paper, KEGG contains 5898 unique metabolites, WikiPathways 3695, and Reactome 2355 (from human pathway models). Out of these metabolites, 23% is only found in one source database. Reactome has a unique content of 457 metabolites, WikiPathways has 974 (KEGG seems to have no unique metabolites in this comparison). With these results, we believe that both Reactome and WikiPathways provide novel content compared to previously considered “golden standard” databases for pathway analysis such as KEGG.
Unfortunately, MEA comes with its own challenges and limitations, as have been recently described in detail by Wieder et al. (5). This is why our method leverages a graph-based approach, and integrates several databases with an open source licence using open source code. In general, the coverage of metabolites in pathway databases needs to increase, to reduce the effect of said coverage on the accuracy of enrichment analyses (6). Due to the reasons presented above, we do not believe in comparing our method to other metabolic databases to validate our graph model. The quality of the individual databases has been shown, and pathway data from these databases has been taken up by various other tools (e.g. RaMP for metabolomics, and various tools for transcriptomics analysis). Users can integrate data from other platforms such as KEGG and MetaCyc into the graph model; however, we cannot then distribute the model with a CC0 licence, which is important to us and other scientists in the light of Open Science. Another option would be to include the licence of each integrated database into the graph model; however, understanding the consequences of this might be difficult for users. For this reason we have decided to only integrate models which are available under a CC0 licence.
4: Braisted, John, et al. "RaMP-DB 2.0: a renovated knowledgebase for deriving biological and chemical insight from metabolites, proteins, and genes." Bioinformatics 39.1 (2023): btac726.
5: Wieder, Cecilia, et al. "Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis." PLoS computational biology 17.9 (2021): e1009105.
6: Marco-Ramell, Anna, et al. "Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data." BMC bioinformatics 19 (2018): 1-11.
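The coverage comparison discussed in this response can be illustrated with a minimal sketch. It is not the procedure used by RaMP or by the DSMN; it only shows one possible way to count metabolites unique to each source while ignoring stereochemistry and charge differences, by comparing the first (connectivity) block of each InChIKey. The source names and the sample keys below are illustrative assumptions, not data from the manuscript.

```python
from collections import Counter
from typing import Dict, Set


def connectivity_block(inchikey: str) -> str:
    """Return the first block of an InChIKey (the connectivity hash).

    Dropping the remaining blocks ignores stereochemistry and
    protonation state, at the cost of merging distinct stereoisomers.
    """
    return inchikey.split("-")[0]


def unique_content(sources: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each source, return the connectivity blocks found in no other source."""
    normalised = {
        name: {connectivity_block(key) for key in keys}
        for name, keys in sources.items()
    }
    # Count in how many sources each connectivity block occurs.
    counts = Counter(block for blocks in normalised.values() for block in blocks)
    return {
        name: {block for block in blocks if counts[block] == 1}
        for name, blocks in normalised.items()
    }


# Hypothetical toy input: three sources with a few sample keys each.
example_sources = {
    "source_a": {"QNAYBMKLOCPYGJ-REOHSZYBSA-N", "WQZGKKKJIJFFOK-GASJEMHNSA-N"},
    "source_b": {"QNAYBMKLOCPYGJ-UHFFFAOYSA-N"},
    "source_c": {"BTCSSZJGUNDROE-UHFFFAOYSA-N"},
}
example_unique = unique_content(example_sources)
```

In this toy input the first key of source_a and the key of source_b differ only after the first block, so they are treated as the same compound and neither counts as unique content.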
Minor Issues:
page 8, col 1, para 1: There are several sentence fragments with periods. It is the “part 3” and “part 4”.
Response:
We thank you for bringing this point to our attention and have adapted the corresponding sentences.
page 8, col 1, para 2: Would be nice if the authors would reference the tool they used to retrieve Ensembl IDs. Maybe the authors wrote their own code to grab it via a REST interface, but most likely a package was utilized.
Response:
The Ensembl identifiers have been added to the RDF data prior to us querying it, by using the BridgeDb tool, as described in section 2.1.2 Database Harmonization. To clarify this matter, we have adapted section 2.1.4 Data Retrieval describing the origin of the mappings.
page 10, Code Example 7: Cypher queries can be notoriously hard to craft in an efficient manner. The authors are encouraged to create lists of useful queries and identify which ones are very delicate/touchy with respect to staying efficient if modified.
Response:
Indeed, learning Cypher can be tricky at first. Neo4j provides ample examples and training opportunities to learn Cypher. The query we have constructed downloads all data on the nodes and edges, after which users can perform additional filters in Cytoscape to explore the subgraphs. We also have a tutorial and documentation page available, which includes an explanation of the query execution plan (https://cyneo4j.github.io/DSMN/DataQueryAdvanced).
Figures 3 and 4: The figure legend indicates that a significant p-value is > (greater than) 0.05. This should be < (less than) 0.05. Also, it would be much better if the p-values were corrected for multiple testing.
Response:
Thank you for catching this error; we have corrected this in the corresponding Figures. The p-values come from the original datasets, which we have used in their processed form. MTBLS265 (3) does not mention any correction for multiple testing in their manuscript, and MTBLS404 (7) included only non-adjusted p-values (indicated with a colour code); the third consulted dataset did not include any p-values. We therefore cannot include adjusted p-values in our visualisation; however, our method is capable of visualising these as well (by the use of Cytoscape).
7: Thévenot, Etienne A., et al. "Analysis of the human adult urinary metabolome variations with age, body mass index, and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses." Journal of proteome research 14.8 (2015): 3322-3335.
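For readers who do have raw p-values and wish to adjust them before visualisation, the correction requested by the referee could be sketched as follows. The referee did not name a method; the Benjamini-Hochberg procedure is used here purely as an assumption, and the function name is hypothetical. This was not applied in the manuscript, since the source datasets only provide unadjusted values.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value
    # never exceeds the adjusted value of a larger p-value.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

The adjusted values could then be loaded as an extra node attribute in Cytoscape and mapped to a colour in the same way as the unadjusted ones.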
Referee: 3
Comments to the Author
Authors present DSMN (Directed Small Molecules Network) as an approach to generate metabolomic study-specific subnetworks that provide detailed biochemical relationships, thereby providing a means to interpret results from metabolomics studies. The approach represents biochemical data from curated, reliable sources and models this data as directed graphs, upon which graph-based methods can be applied (e.g., degree, shortest path calculations) to ultimately yield a biochemical network visualization of metabolites identified as relevant in a particular study. The approach was applied to 3 independent datasets.
Overall, the approach is very well described and includes relevant Cypher queries to be used for constructing the graph. The authors also provided very relevant details on the current state of pathway databases, identifying key areas that the field needs to address. This study is very relevant for the field and congratulations to the authors for meticulously describing a complex method in an approachable manner.
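The idea summarised above, a directed graph in which metabolites are linked through reaction nodes and queried with shortest-path calculations, can be illustrated with a small sketch. This is not the DSMN implementation, which stores the model in Neo4j and queries it with Cypher; it is a plain breadth-first search over a toy graph, and all node names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

# Toy directed graph: substrates, enzymes and side metabolites point to a
# reaction node, and the reaction node points to its product.
TOY_GRAPH: Dict[str, List[str]] = {
    "metabolite_A": ["reaction_1"],
    "enzyme_X": ["reaction_1"],
    "side_metabolite_S": ["reaction_1"],
    "reaction_1": ["metabolite_B"],
    "metabolite_B": ["reaction_2"],
    "reaction_2": ["metabolite_C"],
}


def shortest_directed_path(
    graph: Dict[str, List[str]], source: str, target: str
) -> Optional[List[str]]:
    """Return one shortest path following edge direction, or None if unreachable."""
    if source == target:
        return [source]
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbour] = node
            if neighbour == target:
                # Walk back through the predecessors to rebuild the path.
                path = [neighbour]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None
```

Because edges are followed in one direction only, a path exists from metabolite_A to metabolite_C in this toy graph but not from metabolite_C back to metabolite_A.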
Minor comments:
• Methods, Model Storage: when describing reaction nodes that connect proteins and metabolites, authors could clarify that those are uni-directional. It’s clear in the figures that they are, but this could be made explicit. Also, the reaction nodes are not highlighted in the figures and this could also be clarified a bit.
Response:
We appreciate the positive remarks on our manuscript. We have added the detail that the interactions from the substrate through the reaction node to the product metabolite are unidirectional, as well as the connection from the side metabolites to the reaction nodes and from the enzymes to the reaction nodes. The reaction nodes in the figures are indeed not directly visible; we have replaced them with small rectangular nodes to adapt to this request.
• Results, end of first paragraph: for clarity, authors could explicitly state that mapping nodes are – mapping to ChEBI/HMDB – to help remind readers.
We have adapted this sentence to make this statement more clear.
• Figure 3:
o Authors mention in the figure legend the use of Disease Ontology but this is not described in the methods.
We have added more details to describe our data visualisation procedure in the Materials and Methods, section ‘Datasets and Analysis’, 2.3.2.
o Authors highlight carnosines in the text but this is not shown in Figure 3B. Figure 3B would benefit from further text to define metabolites/pathways from the original study in the text.
For Figure 3B, we use the same coordinates for each node as presented in Figure 3A, to avoid a cluttered image. However, we understand the need for readers to quickly identify the main (queried) metabolites in this Figure. We have therefore adapted Figure 3B, by adding the names of the queried biomarkers. The pathways described in the text are shown with a colour code on the edges and described in the legend. We have changed the filter used in Figure 3B to reflect the information presented in the results section. We have also added the colour code used in the Figure to the main text.
• Last paragraph of results: authors discuss a merged network for all three datasets. Could this be shown?
We have added the Cytoscape session file on our GitHub page for reproduction of our conclusion (cyNeo4j/DSMN/exampleCytoscapeFiles/Network_Comparison_265404Rist.cys). We have also added a statement about this comparison in the Materials and Methods, section ‘Datasets and Analysis’, 2.3.2.
• While the approach is explained, is the code used to generate the graphs available through a public repository?
We have added the Cytoscape network session files to our GitHub repository (available at cyNeo4j/DSMN/exampleCytoscapeFiles) to support the Figures in our manuscript. An example script in R has been added to show how these visualisations could be created in an automated manner (available at cyNeo4j/DSMN/visualizationScripts). We have also updated this information in the Materials and Methods section of our manuscript (page ).




Round 2

Revised manuscript submitted on 11 Sep 2023
 

19-Sep-2023

Dear Mx Slenter:

Manuscript ID: DD-ART-04-2023-000069.R1
TITLE: Discovering life’s directed metabolic (sub)paths to interpret human biochemical markers using the DSMN tool

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

After careful evaluation of your manuscript and the reviewers’ reports, I will be pleased to accept your manuscript for publication after revisions.

Please revise your manuscript to fully address the reviewers’ comments. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

I look forward to receiving your revised manuscript.

Yours sincerely,
Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry

************


 
Reviewer 2

The authors have addressed all but one of my concerns reasonably well.
To properly address the very major Issue 7 does require significant effort. The argument that KEGG or MetaCyc would need to be merged into DSMN, which would break CC0 license is not a valid argument for not doing a comparative validation. But performing some sort of comparative validation is at the heart of demonstrating the quality of DSMN. If the authors can find a better way to evaluate the quality of DSMN, then present this approach, perform it, and add it to the manuscript. If the authors cannot present an alternative, then either do:
1) Perform some comparison to one or more of these databases as suggested.
OR
2) State the lack of such a comparison currently. This will at least highlight the current lack of such a comparative validation. The authors could indicate that such a comparison is planned for the future, especially through the use of neutral InChIKeys and other methods improvements to DSMN that will enable such a comparison.


 

Dear Prof. Aspuru-Guzik,

Hereby we resubmit our second revised manuscript ‘Discovering life’s directed metabolic (sub)paths to interpret human biochemical markers using the DSMN tool’ to your journal Digital Discovery as a full paper. We believe our manuscript showcases how the combination of metabolic reaction data from several pathway databases introduces a novel way to analyse experimental metabolomics data with biological pathway knowledge leading to major improvement in metabolic pathway analysis.

We have adapted the last comment from reviewer 2 into this new version of our manuscript. Textual changes have been coloured red for easy tracking. We have also enclosed a detailed response to this comment in a separate document. We would again like to thank all reviewers for their comments and feedback on our manuscript, and hope that with the applied changes you find our paper acceptable for publication.

Kind regards, also on behalf of all co-authors,


Drs. Denise Slenter, MSc

This text has been copied from the PDF response to reviewers and does not include any figures, images or special characters:

Response to the reviewers
REVIEWER REPORT(S):
Referee: 2
Comments to the Author
The authors have addressed all but one of my concerns reasonably well.
To properly address the very major Issue 7 does require significant effort. The argument that KEGG or MetaCyc would need to be merged into DSMN, which would break CC0 license is not a valid argument for not doing a comparative validation. But performing some sort of comparative validation is at the heart of demonstrating the quality of DSMN. If the authors can find a better way to evaluate the quality of DSMN, then present this approach, perform it, and add it to the manuscript. If the authors cannot present an alternative, then either do:
1) Perform some comparison to one or more of these databases as suggested.
OR
2) State the lack of such a comparison currently. This will at least highlight the current lack of such a comparative validation. The authors could indicate that such a comparison is planned for the future, especially through the use of neutral InChIKeys and other methods improvements to DSMN that will enable such a comparison.
Response:
We agree with the reviewer that the suggested method for validation is currently lacking in our manuscript. We are still convinced, however, that such an experiment is non-trivial, as there are many aspects to study, e.g. the impact on the comparison of concept and identifier matching, and the mismatch of granularity of the biology described in a pathway. That is, a comparison that does not include these complexities would tell us they are different, but neither why, nor which one is “better”. We agree we could compare them, but are not convinced that we would be validating our approach rather than validating KEGG and MetaCyc. As the reviewer requested, we have added a statement in our Discussion section (page 14, textual changes in red) to state that the comparison to other metabolic models is currently lacking. This section also indicates that users of the DSMN can add the data from other databases themselves to increase the coverage of metabolic reactions. We have also added an issue on GitHub (https://github.com/cyNeo4j/DSMN/issues/2) to address this comment (and others previously made by this reviewer) in the future.




Round 3

Revised manuscript submitted on 02 Oct 2023
 

06-Oct-2023

Dear Mx Slenter:

Manuscript ID: DD-ART-04-2023-000069.R2
TITLE: Discovering life’s directed metabolic (sub)paths to interpret human biochemical markers using the DSMN tool

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Linda Hung
Associate Editor
Digital Discovery
Royal Society of Chemistry




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.