From the journal Digital Discovery Peer review history

GitHub as an open electronic laboratory notebook for real-time sharing of knowledge and collaboration

Round 1

Manuscript submitted on 08 Mar 2023
 

12-Apr-2023

Dear Dr Motion:

Manuscript ID: DD-ART-03-2023-000032
TITLE: GitHub as an open electronic laboratory notebook for real-time sharing of knowledge and collaboration

Thank you for your submission to Digital Discovery, published by the Royal Society of Chemistry, and apologies for the long review process. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, which are quite mixed. Two reports indicate that major revisions are necessary, with close attention to improving the presentation in the manuscript and on the online GitHub resource.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Digital Discovery strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/dd?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/dd) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Joshua Schrier
Associate Editor, Digital Discovery

************


 
Reviewer 1

The authors have addressed all my comments.

Reviewer 2

My previous comments have been addressed and I recommend accepting this paper.

Two minor issues: there seems to be an erroneous "coming" in the second paragraph on p2 col 1; I am not sure whether that was supposed to be deleted. There is also a mistake in p2 para 2 col 2, where it should be "described", not "describe".

Good luck with the paper!

Reviewer 3

Summary: This work presents the idea of using GitHub as a platform to share research data and results and facilitate collaboration and access. To demonstrate the implementation of this idea, the authors have created a GitHub-ELN for a project, "The Breaking Good." I have reviewed the provided ELNs and share my experience with them here.
Recommendation: While the idea is neat, its applicability is not at a publication-ready level. I recommend the authors resubmit their manuscript to Digital Discovery after major revisions. Please see below for more details:
The idea of a GitHub ELN to promote accessibility of scientific research, as well as transparency and accountability, is attractive, and I believe it will expand and come to life in the near future. Unfortunately, however, the example that the authors use to demonstrate these key features and convince their audience that GitHub is comparable to, if not better than, other previously available ELNs is poorly structured, disorganized, and lacks coherence and standardization. The presented GitHub ELNs appear to be usable only by their owners, and frankly, I suspect that even close team members would have a hard time navigating through the different sections. My general recommendation is that the authors take the time to polish their ELN and produce a revised and refined ELN that showcases all the key features they highlighted in their manuscript. I suggest that the authors include only one ELN Template and one actual ELN example under the Breaking Good Project for the purpose of this manuscript. Please ensure that the example ELN closely follows the styles and standards defined in the Template, so the reader can clearly see the implementation of the Template. Remove yourself from the owner position and see the ELN from the POV of an external party. Having a trusted third party beta-test the ELN is recommended. You can also consider creating a "mock collaborator" and showing their interactions with the ELN owner. Leaving the Discussion tabs empty is not helpful.
Comments:
1 - The Breaking Good Project
1.1 - Overview Tab: There are 13 repositories listed here, while only four of them are listed and linked in the manuscript. Please remove the others to reduce confusion.
1.2 - Projects Tab:
This tab contains 1 open project and 9 closed projects. Please clarify the difference or relationship between these projects and those listed in the Overview tab. Please explain the difference between Projects and Projects (classic).
2 - ELN Templates
2.1 - Only one template was provided for a generic synthesis experiment, which may indicate a lack of versatility and flexibility. There are no places to add the date, room temperature, and time required for each section. The information in the Reaction Table is repeated in the Procedure section.
2.2 - In the Template, there are two places to insert raw data, one under Procedure>raw Data and the other one under Characterization and Data. Please consider having all data stored in one place to avoid duplication.
Please consider developing a standardized format (Template) for the Readme file so the reader knows where to look for what information.
3 - USYD_PhD_ELN
3.1 - If this ELN is not related to the Breaking Good project and the manuscript, please remove it from the review because it does not follow the ELN Template and is not listed under repositories of the Breaking Good project, causing confusion.
3.2 - README file
This file only provides information about the PhD candidate but not the project itself. Some key information that seems to be missing from this page are:
• the title, aims, and goals of this project (SCQA); adding a high-level introductory PowerPoint presentation here would be helpful,
• the year the project started, and the estimated end date,
• where the funding is coming from (if possible),
• who are the collaborators (if any), and
• link to the project wiki.
3.3 - It would also be beneficial if there is a rundown of the project plan (or a statement of work), with the phases labeled as "completed" or "ongoing."
3.3 - ISSUES:
The next tab is Issues, which is the actual ELN.
3.3.1 - At first glance, the lack of a coherent structure is noticeable. The current version appears to be usable only by the person who created it (owner). I believe that for an ELN to be publicly and globally accessible, a standardized coherent structure must be defined to make the navigation smooth and efficient, and labyrinthine, personally coded frameworks must be avoided. Some main points are:
• On the ISSUES tab, the two main categories are open and closed issues, which are too general. This classification style suggests that a person should take considerable time to scroll up and down the closed issue list, for example, to understand what work was previously done (either completed or abandoned). I reckon that most busy researchers would not be willing to do so. I suggest grouping the experiments by topic. For example, one category could be "effect of solvent polarity on the yield of reaction X," where one can find all the relevant experiments.
• In the open tab, we see scattered individual experiments with names starting with an ID, i.e., "KBS** - *," followed by #* (* = number). Please explain the labeling system used in the README file (i.e., what each number represents). This would help greatly with the navigation within experiments. It would be even more beneficial if a consistent sample indexing system were defined and used throughout the ELN and between different ELNs within a project.
• Experiments are tagged with labels, some very helpful (such as repeat). I recommend that you number the repetitions, for example, "repeat 1" or "repeat 2". On the other hand, the label new reaction is confusing, as one might wonder what it is compared with. In other words, what is an old reaction? Please standardize your labels and explain them in the Readme file.
3.3.2 - General Notes on security:
I recommend:
• Adding a confidentiality statement or asking the readers to agree to some desired level of security, confidentiality, or ethical measures in the usage/sharing of data.
• Classifying your data based on the level of confidentiality and managing the access to each section accordingly. For example, you can choose to share the preps publicly but only allow your collaborators to access raw data.
• Inserting a watermark into all the unpublished graphical data that you include in the public ELN. That includes digital images and PDF files of spectra.
3.3.3 – Here, I will review two experiments as representative ISSUES.
KBS19-2 - Deprotection of bis-methylferrocenylcyclam:
• Link to HIRAC (LabArchives notebook): authentication is needed for access. HIRAC is then provided as a .docx file in the next line. Please consider removing the LabArchives link.
• Link to master page: please change [master page to master page or [master page].
• Define what the previous attempt is. Does it mean repeat?
• In SSP class sem 2, 2022, a new sample identification system is used without definition, which is confusing.

KBS29-2 - Appel reaction on KBS28
• Undefined abbreviations and indexing make it hard for anyone other than the owner to understand the experiment. Examples include SM, KBS29-2 RM, KBS29-2 cr, KBS29-2 C3 F18-28_1, etc.
• Data is linked in multiple locations. For example, NMR spectra are linked on the main page and also on the master page (with a different name). Please consider storing all data links in one place and using a unified indexing system for the data files.
• I have had a dozen lab books in the past, and I used to leave a lot of "note to self" comments here and there. Since those notes were directed at me, they were mainly meaningless to others. For a public ELN, however, which was created to improve scientific communication at a global scale, I reckon that it is rather unprofessional to have informal, vague notes that seem to be meaningful only to the owner. Otherwise, people might as well upload photographs of their physical lab books to a publicly accessible website. For example:
M = 580 mg, 2.58 mmol
CBr4 - need 1.014 g, 3.10 mmol, 1.2 equiv.
PPh3 - need 812.5 mg, 3.10 mmol, 1.2 equiv.
use more DCM than previously!

In the case above, is "need 1.014 g" a recommendation, or something that was done, …? In "use more DCM than previously!", please specify what volume was used previously, how much more will be added, and why this recommendation was made.

3.3.4 - Master page:
Please, on the home page, define the purpose of a master page and list the information it contains.
3.4 - Wiki:
In the project wiki, we see a list of topics but not a clear description of the project. For example, on the home page of the wiki, we see these headings:
Cyclam Derivatives
Linear amine derivatives
Pendant group SAR
Chain length SAR

The description under each heading is very vague. For instance, under Linear amine derivatives, we read:
Previous work in the Rutledge and Todd groups has indicated that "open cyclam" i.e. linear polyamine scaffolds show promising activity against Mtb. There are a number of areas of SAR yet to be explored, summarised in the image below.
This statement does not carry any information about what exactly the topic of research is here. Qualitative, vague, and abbreviated descriptions such as "promising activity against Mtb" must be replaced with clear, preferably quantitative descriptions.
4 - ELN-Kymberley-Scroggie
4.1 - README:
Unlike the previous ELN, this one has a table of contents on the README page. This inconsistency in the style/format should be avoided. Moreover, while providing a table of contents in the README file is helpful, the classification basis is unclear. We see only a list of (apparently) scattered synthesis experiments.
This ELN uses a different labeling and sample/experiment indexing system that must be introduced and defined.
4.2 - ISSUES:
This ELN follows the Template more closely than KBS's ELN, but still not exactly! Some sections are removed, and the Reaction Table does not have all the columns defined in the Template.
It is hard to understand the status of an open experiment by looking at it on the Issues tab, whether it is on the to-do list, is in progress, etc. Please define a clear way to describe the latest status here.
4.3 - Projects:
Four projects are listed here: 2022 DDI x BG Workshop and SSP 2021 (no details provided), Series-2-Aminothiazoles, and Series-1-Mycetoma. By clicking on one of the last two projects, we see lists of experiments categorized based on their completion status, but they do not seem to be in order within each category. Listing experiments in a defined order (e.g., chronological order) would be helpful. It is very hard for a non-owner to understand what is happening right now.
4.4 - Wiki:
The wiki main page is blank; only the sidebar has some links to follow. No additional information about the projects is provided here.
5 - Poster #23
This page represents an example of sharing a poster, I assume. If not, please explain what the purpose of this page is. Please describe the connection between the ELNs, The Breaking Good and the alintheopen.
6 - GitHub How to Guide
6.1 – The purpose of this page is to direct us to the wiki pages. When navigating the wiki tab, we see four links on the sidebar. Two are about how to get started with GitHub. The third link takes us to "wikis and how we use them," where we read:
If you are new to a project, the wiki is the best place to start. In the wiki you will find a brief background of the project. If others have already participated or are participating in this project you will also find information on their contributions. This may include literature reviews, laboratory notebooks, experimental reports and their future plans among a number of other things. You can read through this information to get a better understanding of the project and where it currently stands before considering how you might best be able to contribute.

It may go without saying that if you participate in a project, the wiki will be where you to share your contributions. Depending on in what way you contribute to the project you will need to know how to perform different tasks. Instructions on these task can be accessed by clicking the relevant links in the table of contents of this wiki.
Unfortunately, none of the underlined points in the text above have been demonstrated in KBS and USYD_PhD_ELN.
6.2 - The last (fourth) link on the sidebar is the lab book template, which is different from what was presented here https://github.com/TheBreakingGoodProject/ELN-Templates.
This Template has new sections that have not been included in the sample ELNs, including a Purpose, a Procedure and Observation, Reaction monitoring, Product table and References. This lack of consistency must be avoided for a product to be widely and globally used.

Reviewer 4

Thank you for the opportunity to review the manuscript entitled “GitHub as an open electronic laboratory notebook for real-time sharing of knowledge and collaboration” for Digital Discovery as a full paper. The authors describe the features of GitHub as they outline the perceived benefits of the platform that they have repurposed as an electronic notebook. This approach has the potential to serve as a cost-effective alternative for those seeking a stable, open source environment which supports public asynchronous, collaborative communication as well as private workspaces. The manuscript is well-written and easy to follow, and it offers several opportunities for expansion to strengthen and further bolster the position taken by the authors. The well-written manuscript offers much to build on toward publication as a full paper. Suggestions are provided below corresponding to the sections of the manuscript.

Title
The authors use the phrase “real-time sharing” in the title, which creates an expectation for sharing based on the reader’s own definition of “real-time.” This term can be confusing unless the authors define what is meant by “real-time” in the manuscript, especially as it can be argued that sharing information on GitHub alone is largely characterized by asynchronous communication (e.g. when not augmented simultaneously with synchronous communication such as face-to-face, chat, or video-based communication). It is therefore suggested that the authors define “real-time sharing” in the introduction, along with any other term that may cause confusion. For example, an early stage napkin sketch https://githubnext.com/projects/workspaces/ provides a compelling argument for why the process of use described in the manuscript may not necessarily be the best example of “real-time” although the process described by the authors can be considered collaborative and/or indicative of knowledge sharing.

Introduction
The stated objective of the manuscript is to “report on the utility of GitHub as an open ELN, detail its features in this dimension, and share our experiences in its implementation for open source drug discovery.” For each stated objective, 1) report on the utility, 2) describe features in detail, and 3) share experiences, more scientific domain context and supporting evidence (e.g. situating the discussion in concrete examples such as case studies, surveys, focus groups, interviews, etc.) is warranted. Contextualizing the authors’ experience by retelling how they derived their observations would help the reader discern how, when, and why the authors believe GitHub has utility as an ELN. That is to say, the manuscript would be much stronger if quantitative or qualitative findings from a usability study, or cases of use by different projects, or even preliminary pilot study data with the authors’ own team, were provided to support the claims made in the manuscript. It is suggested that the authors determine the best way to include more context of use, either as preliminary research with their own team as members of the open drug consortia or from a rigorous human subjects study (qualitative or quantitative). Perhaps Digital Discovery would consider a solid pilot study with preliminary findings that support the authors’ claims sufficient. It is therefore this reviewer’s recommendation that the present manuscript be expanded to provide supporting evidence, or findings, from a defensible examination of the use of GitHub as an ELN by human users. Since the use of GitHub for an ELN may be new to the readers of Digital Discovery, it is also suggested that the literature review be expanded to cite references of human subjects research from related scientific domains that provide findings that may serve to corroborate the authors’ anecdotal descriptions of use.
For example, the authors may draw inspiration from the methods and approach taken in studies such as the one reported in https://aisel.aisnet.org/hicss-54/st/software_survivability/2/.

GitHub
The authors refer to Figures 3 and 4, although they are not included in the manuscript. To this point, the section on GitHub would benefit from more visuals, in particular if the images are descriptive, support the text, and provide context as to how the authors used GitHub to collaborate, etc. For example, the authors may draw inspiration from https://link.springer.com/article/10.1007/s10278-017-0037-8 which presents case studies to discuss the features of a notebook available on GitHub in the context of the scientific requirements of their project activities.

The sentence, “Beyond an internet connection, there are no barriers for anyone who wishes to view work within a public GitHub repository, and no account or subscription is needed” may be considered an overstatement by some readers. It is suggested that the authors refrain from such absolute statements (“no barriers”) especially without evidence from a usability study to support such claims.

Real time Sharing of Knowledge and Collaboration
When discussing issues, wiki, discussion, and code tabs it is suggested that this text be accompanied by visual examples from the authors’ project repository in GitHub to facilitate understanding of how these features assisted or facilitated project requirements. In particular this section would benefit from focus group or interview testimonials on the benefits and challenges of use, collaboration, and knowledge sharing.

Shortcomings
When discussing shortcomings, this section would benefit from testimonials or quotes or findings from focus groups or interviews on the benefits and challenges of use, collaboration, and knowledge sharing.

Outlook
The sentence, “More broadly, GitHub’s extensive array of options for communication and discussion, as well as the minimal barrier to using the site, make it straightforward for new collaborators to get involved at whatever level they wish” would be strengthened with citations to corroborating usability studies or by rephrasing based on findings from research with users in which barriers to entry are either tested or systematically observed.
In the manuscript’s final sentence, the link https://github.com/TheBreakingGoodProject/ELN-%20Templates/discussions/2 is broken and results in a 404.

In summary, both the challenge and opportunity for the authors to improve the manuscript toward a full paper submission lie in their willingness to wrap the GitHub feature details described in the present manuscript in the context of one of their scientific activities that can serve as a case study and concrete example of how GitHub has been applied as an ELN in the authors’ work. Telling the story of how using GitHub has impacted project work, by providing examples, evidence, and testimonials, will be more compelling than a description of features. By providing supporting evidence (human subject research or usability study) for the claims made in the manuscript, the authors will be closer to the objective stated in the abstract: “By outlining its features and shortcomings through their implementation in our work, we demonstrate how using GitHub as a central platform can aid the real time sharing of knowledge and collaboration, and further democratise scientific research within both open and traditional research models.” While the manuscript in its current state does not adequately meet these objectives as stated by the authors, it shows promise. The present manuscript suggests that the authors have thought a lot about the utility of GitHub features and have undoubtedly observed benefits in their own workflow. The reviewer also appreciated the authors’ attention to clear and understandable descriptions of the GitHub platform and easy-to-follow presentation of information. Should the authors decide to accompany their initial observations with findings from additional inquiry and usability research, the present manuscript would be greatly enhanced.


 

Dear Editors and Reviewers,

Thank you for the supportive feedback and constructive suggestions. We have revised our manuscript in light of this, as detailed below and indicated by the tracked changes in the marked-up version of our revised manuscript.


Referee 1
Thank you.

Referee 2
Thank you for picking up on these typos; we have corrected both of them.

Referee 3
1. The Breaking Good Project
We would first like to note a misconception that we believe runs through the comments.

To demonstrate the implementation of this idea, the authors have created a GitHub-ELN for a project, "The Breaking Good."

The Breaking Good Project is an international citizen science initiative that empowers members of the public to be active researchers in projects that will improve human health. The Breaking Good Project brings together participants and projects such as Open Source Mycetoma and Open Source Tuberculosis. This paper shares GitHub ELNs for two researchers and authors, Kymberley and Klementine, who are team members of The Breaking Good Project but who are also involved in the Open Source Tuberculosis and Open Source Mycetoma projects. The ELNs contain work that contributes to The Breaking Good Project, Open Source Tuberculosis and Open Source Mycetoma, not to a project called “The Breaking Good”. For instance, KBS’ ELN encompasses all of her PhD, which includes both work on The Breaking Good Project and work on the Open Source Tuberculosis project.

This links in with:
1.1 - Overview Tab: There are 13 repositories listed here, while only four of them are listed and linked in the manuscript. Please remove the others to reduce confusion.

The Breaking Good Project uses GitHub for all its projects, not just those that are directly related to this manuscript. These include projects with high school students and our citizen science project E$$ENTIAL MEDICINE$. We can see that this may cause confusion and tried to make it clear by sharing the links to the related repositories in the manuscript. It is unfeasible for us to remove all the repositories from The Breaking Good Project’s GitHub account, so we hope this is acceptable but also welcome suggestions on how to make this clearer.

And:
1.2 - Projects Tab: This tab contains 1 open project and 9 closed projects. Please clarify the difference or relationship between these projects and those listed in the Overview tab. Please explain the difference between Projects and Projects (classic).
Similarly, we have a number of projects running. Those that are specifically relevant to each ELN are added to the ELN repositories under the projects tab.
Projects (classic) is the old projects functionality, which GitHub has now upgraded; the upgraded version is simply called Projects. There are limited differences, and Projects (classic) is being phased out: any new repositories that are created will only be able to use the new Projects. We note that with Projects (classic), a project was limited to the repository it was created in. With the new Projects, any projects that are created are also shared with the organisation and appear under the Projects tab both for the whole organisation and in the specific repository.

2. ELN Templates

We address another possible misconception here, namely that the ELN template was designed before the authors started using GitHub ELNs. The template is a direct output of our experiences in using GitHub as an ELN; therefore, the two ELNs we share in this manuscript do not follow the template. Rather, the template shared is a generic synthetic chemistry template that, based on our own experiences, we think would be a good starting point for those who are new to using GitHub as their ELN. Our main reason for completing the template was to assist in reducing the Markdown learning curve that both authors found to be a barrier when first starting to use GitHub (as discussed in the Shortcomings section of the manuscript).

2.1 - Only one template was provided for a generic synthesis experiment, which may indicate a lack of versatility and flexibility. There are no places to add the date, room temperature, and time required for each section. The information in the Reaction Table is repeated in the Procedure section.
We have chosen to share a template only for synthetic experiments, as this is our expertise and the use of the ELN that we share in the manuscript. We feel we are not well placed to generate templates for other fields in which we have little experience, and we hope that those who might use GitHub as their ELN will find the template a helpful starting place from which to develop their own.
A place for the date is not required because Git is a version control system that timestamps all changes, so this information is captured automatically. If the ELN is used as described in the manuscript, in real time such that it is updated as the activities are taking place (as a non-electronic notebook would be), then the timestamps represent the time and date that an activity was performed. Both Kymberley and Klementine have included times manually only when updates to the ELN did not occur in real time.
For synthetic chemistry at least, temperature and reaction times are generally included in the notes taken in the procedure section. We don’t see the need for a specific section for these details.

2.2 - In the Template, there are two places to insert raw data, one under Procedure>raw Data and the other one under Characterization and Data. Please consider having all data stored in one place to avoid duplication.
Please consider developing a standardized format (Template) for the Readme file so the reader knows where to look for what information.

The intent was for raw data to be included under raw data and processed data under characterisation. We have clarified that the Characterisation section is for processed data. We have also developed a standardised format for the README.md, which we have shared in the ELN Template repository for others to follow and which discusses where data can be/is stored.

3. USYD_PhD_ELN
3.1 - If this ELN is not related to the Breaking Good project and the manuscript, please remove it from the review because it does not follow the ELN Template and is not listed under repositories of the Breaking Good project, causing confusion.
This ELN is related to the manuscript. Klementine uses GitHub as an ELN for her work that contributes to the Open Source Tuberculosis project and The Breaking Good Project. It does not follow the ELN Template because the template has been developed based on our experiences in using GitHub as an ELN. This ELN and ELN-Kymberley-Scroggie are the ELNs that form the basis of our experiences (see response to template section above for further clarification).

3.2 - README file
Thank you for these suggestions. We have addressed these comments in both the ELN repositories and in the standardised README.md we have generated. We have included places for a title, information about the ELN owner/s and users, projects, the year the ELN was opened, and collaborators. We have also added reminders for users to link to the wiki and other external pages that might be of relevance and interest to visitors.

3.3 - It would also be beneficial if there is a rundown of the project plan (or a statement of work), with the phases labeled as "completed" or "ongoing."
These are helpful suggestions, but we do not consider them necessary for an ELN to function as we have described. Some of this information can be found elsewhere in the repository; for instance, project plans can be found under the Projects tab, and the aims and goals of the project can be found on the Wiki.

3.3 - ISSUES:
The next tab is Issues, which is the actual ELN.
3.3.1 - At first glance, the lack of a coherent structure is noticeable. The current version appears to be only usable for the person who created it (owner). I believe for an ELN to be publicly and globally accessible, a standardized coherent structure must be defined to make the navigation smooth and efficient, and labyrinthine, personally coded frameworks must be avoided. Some main points are:

• On the ISSUES tab, the two main categories are open and closed issues, which are too general. This classification style suggests that a person should take considerable time to scroll up and down the closed issue list, for example, to understand what work was previously done (either completed or abandoned). I reckon that most busy researchers would not be willing to do so. I suggest grouping the experiments by topic. For example, one category could be "effect of solvent polarity on the yield of reaction X," where one can find all the relevant experiments.

We do not suggest that anyone spend time scrolling up and down the list as described here. The open and closed categories are a feature of GitHub that we have found useful to use as the owner. For those who are visiting, we recommend the use of the Projects tab to easily find experiments that are relevant to one another, where we have grouped them by topic by placing them in different projects. They have also been given a status through placement on the To-Do, In progress and Complete titles and through the labels assigned to them which can be seen on the titles. Simply clicking on an experiment brings up a small overview and an option to open the whole issue in a new tab if they wish to read more details. Even before this, we recommend visitors head to the wiki, where they will find information on the projects to lead them in the right direction, rather than jumping straight into the issues tab.

• In the open tab, we see scattered individual experiments with names starting with an ID, i.e., "KBS** - *," followed by #* (* = number). Please explain the labelling system used in the README file (i.e., what each number represents). This would help greatly with the navigation within experiments. It would be even more beneficial if a consistent sample indexing system was defined and used throughout ELN and between different ELNs within a project.

The labelling system is a topic that comes up regularly in the Open Source projects, with much discussion around what provides adequate information and the best way to do it. Consistent IDs across The Breaking Good Project are an impractical ideal, given our involvement in so many projects, many of which we only contribute to rather than govern. To be more practical, we have allowed researchers to create their own identification systems. Most use a personal identification system which is then consistent across their whole ELN. We agree that it is helpful to have these explained and have added explanations to the README.md for both ELNs and to the ELN Template.
• Experiments are tagged with labels, some very helpful (such as repeat). I recommend that you number the repetitions, for example, "repeat 1" or "repeat 2". On the other hand, the label new reaction is confusing, as one might wonder what it is compared with. In other words, what is an old reaction? Please standardize your labels and explain them in the Readme file.

Repeats are detailed in the identifiers for experiments which we have now explained in the README.md for each ELN repository. We have also provided a link to the labels page where visitors can view all the labels and their descriptions.

3.3.2 - General Notes on security:
I recommend:
• Adding a confidentiality statement or asking the readers to agree to some desired level of security, confidentiality, or ethical measures in the usage/sharing of data.

A great point. All work that is done as part of the Open Source projects is under a CC-BY 4.0 licence. We have added a footnote to the ELNs that makes visitors aware of this. Outside of open science projects, this would be a more important aspect, so we have also added a section to the README.md template that promotes its consideration.

• Classifying your data based on the level of confidentiality and managing the access to each section accordingly. For example, you can choose to share the preps publicly but only allow your collaborators to access raw data.

Again, another great point. All the work presented here is completed under an open science research model where everything is shared freely, so this goes against that methodology. For other projects that follow the traditional research methodology, though, this would be required, and we note in the manuscript at the end of the Introduction that repositories can be kept private if required. We are unsure whether it is possible to have, for example, the wiki and issues public and any uploaded files private as you suggest; to do this, the raw data might need to be kept elsewhere, outside of the GitHub ELN.

• Inserting a watermark into all the unpublished graphical data that you include in the public ELN. That includes digital images and PDF files of spectra.

We like this idea, and will consider incorporating this in future work rather than amending existing figures, which would be impractical.
3.3.3 – Here, I will review two experiments as representative ISSUES
There were several comments here which we have addressed. Responses are dot pointed below.
• The LabArchives link is a requirement of our organisation for WHS documentation.
• The typos have been fixed.
• Yes, it means repeat. Hopefully, now that we have explained the labelling system in the README.md and in our responses, it is clear that this experiment is a repeat, as it is identified as KRS19-2, the 2 representing the second repeat of this reaction.
• Explanation of ID system added to README.md
• Abbreviations used have been added to README.md
• We have streamlined to a single location for finding raw data - the Wiki on KBS’.

The final comment, on the inclusion of “note to self” comments in ELNs, is one where we politely disagree with the reviewer, so we address it separately here.

An ELN, whether public or not, is a living document that is to be used to record notes on an individual’s scientific activities, be they informal or formal. ELNs are analogous to a physical lab notebook and should be used in the same way, but offer further advantages in longevity, searchability, etc. We would encourage researchers to write similar notes to themselves in a physical notebook, so believe they should also be present in an ELN. The ELNs in our project are public, but we don’t feel this should mean that we use them differently.
This idea that everything we share publicly should be formalised removes the transparency and ‘human’ side of the project that we are keen to share – e.g., that sometimes scientists make mistakes. We feel that the inclusion of these notes conveys an authentic aspect of the nature of science, including that the process is iterative and that understanding/explanations can change with more data/better analysis. The value of a public ELN in improving scientific communication is that the information can be shared in an authentic, transparent manner more readily than a physical notebook; more formal communication of the work will be achieved through peer reviewed publications, reports and news articles.

3.3.4 - Master page
We have added this to the Wiki home page.

3.4 – Wiki
We have added some more introductory detail about the projects: e.g., goal, and identified hits for the series relevant to each ELN. We have also provided links to published work that contains a more thorough overview of the work, including quantitative data.

4. ELN-Kymberley-Scroggie
4.1 - README:
Unlike the previous ELN, this one has a table of contents on the README page. This inconsistency in the style/format should be avoided. Moreover, while providing a table of contents in the README file is helpful, the classification basis is unclear. We see only a list of (apparently) scattered synthesis experiments.
This ELN uses a different labeling and sample/experiment indexing system that must be introduced and defined.

The table of contents has been removed in the update of the README.md file to match the standardised one provided in the ELN Template repository. A similar table has been added to the Wiki – Compound IDs, where all the compounds that have been made are listed along with their identifiers, projects and ELN entries.

Experimental identifier explanations have been added to the README.md

4.2 - ISSUES:
This ELN follows the Template more closely than KBS's ELN, but still not exactly! Some sections are removed, and the Reaction Table does not have all the columns defined in the Template.
It is hard to understand the status of an open experiment by looking at it on the Issues tab, whether it is on the to-do list, is in progress, etc. Please define a clear way to describe the latest status here.

Again, we note that the template was based on our experience using GitHub ELNs.
We recommend people interacting with the notebooks visit the projects tab for an overview of the experiment status. Note, however, that this ELN is no longer in use, so all experiments have been finalised (we have added this information to the README.md). For ELNs that are in use, we use the to-do, in progress and complete status in the projects tab as a clear way for visitors and owners to see the latest status.

4.3 - Projects:
Four projects are listed here: 2022 DDI x BG Workshop and SSP 2021 (no details provided), Series-2-Aminothiazoles, and Series-1-Mycetoma. By clicking on one of the last two projects, we see lists of experiments categorized based on their completion status, but they do not seem to be in order within each category. Listing experiments in a defined order (e.g., chronological order) would be helpful. It is very hard for a non-owner to understand what is happening right now.

The projects have been updated with a description. The project boards have been updated to group experiments based on topic, as this ELN is no longer in use, so status boards are no longer relevant. Experiments have been placed in chronological order. Please note that number order does not correlate to the order the experiments were conducted in; the numbers are related to the planning of experiments.

4.4 - Wiki:
The wiki main page is blank; only the sidebar has some links to follow. No additional information about the projects is provided here.

The wiki has been updated to provide information about the projects and details of the work that is entailed in the ELN.

5. Poster #23
This page represents an example of sharing a poster, I assume. If not, please explain what the purpose of this page is. Please describe the connection between the ELNs, The Breaking Good and the alintheopen.

This page shares a poster the authors presented on how we use GitHub as an ELN. It supports the manuscript we have submitted here.

alintheopen is Alice Motion’s GitHub handle. Alice is the director and founder of The Breaking Good Project and the leader of our research group, and hence the owner of The Breaking Good Project on GitHub. The ELNs are owned by Kymberley and Klementine, who are team members of The Breaking Good Project. Their ELNs contain all the laboratory work they have conducted as part of their research with Alice and The Breaking Good Project.

6. GitHub How to Guide
The GitHub How to Guide is not a guide of how to use it as an ELN. This repository is in development for future projects within The Breaking Good Project where we plan to have participants share their contributions on GitHub. We did not link to this repository in the manuscript because it is not relevant to this publication.

Referee 4

1. Title – “real-time sharing”
This is a great point and adds to a comment from another reviewer about how we do this type of asynchronous collaborative editing work, for example when we are writing up a draft manuscript which is part of the end-to-end process but not something we use GitHub for. We use real-time sharing as a term to distinguish this work from the much slower avenues of peer-reviewed publication and conference presentations. It’s true that communication via GitHub is asynchronous, but the sharing of data is instant - as soon as the experimentalist records their work, it is available to others. We have updated the Introduction and Real-time sharing of knowledge and collaboration sections of the manuscript to clarify this.

2. Introduction

The work we share in the manuscript is all based on our own use of GitHub as an ELN. We don’t have any formal quantitative or qualitative data to support it; however, how we describe the use of GitHub has come from our own experiences, which we have refined through casual discussions together. The “shortcomings” section details all the difficulties we specifically faced in the process. As such, we believe this manuscript presents preliminary findings of our own team members using GitHub as an ELN in open drug discovery.
We are actively developing this project to gain qualitative data through a survey of users of our GitHub ELN and of alternative ELNs. To do this research, we will seek ethics approval. We feel this is out of scope of this paper, which aims to share our initial experiences with using GitHub and garner more interest in the tool to broaden the users we hope will participate in our follow up study. As such, we have also not expanded the introduction as suggested, as we feel this will fit better into a follow up study.

3. GitHub
a. The authors refer to Figures 3 and 4, although they are not included in the manuscript. To this point, the section on GitHub would benefit from more visuals, in particular if the images are descriptive, support the text, and provide context as to how the authors used GitHub to collaborate, etc. For example, the authors may draw inspiration from https://protect-au.mimecast.com/s/wDhMCYW8NocDk5Ervc3eyO1?domain=link.springer.com which presents case studies to discuss the features a notebook available on GitHub in the context of the scientific requirements of their project activities.
Thanks for picking this up. Somehow we lost these figures before uploading! We have added them back in to the revised manuscript. As screenshots of different tabs on GitHub and of an example Issue we hope that they provide what you were missing.
b. The sentence, “Beyond an internet connection, there are no barriers for anyone who wishes to view work within a public GitHub repository, and no account or subscription is needed” may be considered an overstatement by some readers. It is suggested that the authors refrain from such absolute statements (“no barriers”) especially without evidence from a usability study to support such claims.
We have amended this to reduce the possibility that the reader may see it as an overstatement.

4. Real time Sharing of Knowledge and Collaboration
When discussing issues, wiki, discussion, and code tabs it is suggested that this text be accompanied by visual examples from the authors’ project repository in GitHub to facilitate understanding of how these features assisted or facilitated project requirements. In particular this section would benefit from focus group or interview testimonials on the benefits and challenges of use, collaboration, and knowledge sharing.
These visuals are in Figures 3 and 4, which we have included back into the revised manuscript. While we do not have focus group or interview testimonials, we have included here an example of real-time sharing to Twitter, which we use as evidence of GitHub ELNs being useful in cultivating collaboration. We share this as a preliminary finding from our team’s use of GitHub ELNs.

5. Shortcoming
When discussing shortcomings, this section would benefit from testimonials or quotes or findings from focus groups or interviews on the benefits and challenges of use, collaboration, and knowledge sharing.

This section is based on our own experiences and represents our preliminary findings. While we haven’t presented the findings as quotes or testimonials, they are findings from researchers (the authors) who have used GitHub as an ELN and detail the barriers they observed in its use. We have added a small introduction to this section that explains this such that it is clearer to the reader.

6. Outlook
The sentence, “More broadly, GitHub’s extensive array of options for communication and discussion, as well as the minimal barrier to using the site, make it straightforward for new collaborators to get involved at whatever level they wish” would be strengthened with citations to corroborating usability studies or by rephrasing based on findings from research with users in which barriers to entry are either tested or systematically observed.

We have tested this link and found it to function, both by clicking on the link directly or copying and pasting the text into a web browser. Please check that the whole of the text is highlighted if copying and pasting, as sometimes the hyphen between “ELN” and “template” is cut off when selecting the text.

7. Final comment
While we agree that including human research data would be very valuable, it is beyond the intended scope of this paper. Our aim is to introduce GitHub as a tool for researchers and share our preliminary findings into its usability. A future plan is to obtain ethics approval to undertake a study exploring user experiences once the number of users increases. We are planning to trial GitHub as an ELN in our Breaking Good Project with Schools/Undergraduates to that end. We believe that our preliminary findings support the claims we have made in the manuscript and hope to share more evidence in a later publication.




Round 2

Revised manuscript submitted on 16 Jun 2023
 

28-Jun-2023

Dear Dr Motion:

Manuscript ID: DD-ART-03-2023-000032.R1
TITLE: GitHub as an open electronic laboratory notebook for real-time sharing of knowledge and collaboration

Thank you for submitting your revised manuscript to Digital Discovery. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Digital Discovery. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

If you would like us to promote your article on our Twitter account @digital_rsc please fill out this form: https://form.jotform.com/213544038469056.

We are offering all corresponding authors on publications in gold open access RSC journals who are not already members of the Royal Society of Chemistry one year’s Affiliate membership. If you would like to find out more please email membership@rsc.org, including the promo code OA100 in your message. Learn all about our member benefits at https://www.rsc.org/membership-and-community/join/#benefit

By publishing your article in Digital Discovery, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Joshua Schrier
Associate Editor, Digital Discovery


 
Reviewer 3

Title: GitHub as an open electronic laboratory notebook for real-time sharing of knowledge and collaboration (ID: DD-ART-03-2023-000032.R1)
by: Scroggie, Kymberley; Burrell-Sander, Klementine J; Rutledge, Peter; and Motion, Alice


Reviewer comments:
I express my gratitude to the authors for their comprehensive response to my comments and the efforts they have made to address my concerns. The modifications they have implemented, particularly the inclusion of the previously missing images and the cleanup of the commented sections, have greatly enhanced the overall completeness, polish, and clarity of the work.
The updated README files on GitHub effectively minimize the miscommunications mentioned by the authors on multiple occasions. I believe that incorporating some of those explanations into the manuscript would enhance the readers' comprehension of its structure. Anyhow, I would like to provide some feedback (not questions) on two certain responses, as outlined below:

1.1 Authors: The Breaking Good Project uses GitHub for all its projects, not just those that are directly related to this manuscript. These include projects with high school students and our citizen science project E$$ENTIAL MEDICINE$. We can see that this may cause confusion, and tried to make it clear by sharing the links to the related repositories in the manuscript. It is unfeasible for us to remove all the repositories from The Breaking Good Project's GitHub account, so we hope this is acceptable but also welcome suggestions on how to make this clearer.

I understand and appreciate your point. I recognize the difficulties involved in removing all the links from the Breaking Good Project GitHub account. The confusion likely arose from the absence of any mention of the Breaking Good Project in the manuscript, despite it being the first linked item on the data source page. I am curious if the authors could have created a separate account specifically for the purpose of this manuscript.

2.1 Authors: A place for date is not required as Git is a version control system that time stamps all changes and is captured automatically. If used as described in the manuscript, in real-time such that it is updated as the activities are taking place as would be a non-ELN then the time stamps represent that time and date that an activity is performed. Both Kymberley and Klementine have made use of the inclusion of times only when updates to the ELN have not occurred in real-time.

I understand that the data entry and raw data collection are automatically timestamped. However, my intention was to highlight that in "the provided synthesis template," we can only view the time of ELN updates and the collection of characterization data once we open the raw data files. What I meant to convey is that it would be beneficial to have information about the start time, individual steps, and completion time of the synthesis experiment itself. In real-world scenarios, it is common for lab members to perform a synthesis initially and then complete the product isolation after a few days. Additionally, it is also typical for the synthesized material to be sent to other centers for further characterization. Therefore, having a clear understanding of the time elapsed between each step would be valuable.



In conclusion, although I have personal thoughts on the concept of public real-time sharing of raw data, my role in this review is to focus on the data/source aspects of the work. In this regard, I am satisfied that the authors have adequately addressed my comments and concerns, leading to improvements in their ELNs. I believe this work stands out for its innovative approach in harnessing the power of GitHub to enhance collaborations and facilitate data sharing, and it will likely spark engaging discussions within this field. Therefore, I recommend accepting this manuscript for publication in Digital Discovery.




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.