Changing the face of scientific publishing

RSC Publishing has long been a champion of technological innovation within publishing, and its recent activities in promoting the creation and adoption of new standards promise to transform the way that science is published, not just within our own publications but for all creators and users of scientific data.

These enhancements will be particularly relevant to Metallomics articles and reviews, as the journal brings together a diverse range of disciplines. We are actively developing them to enable readers to find articles easily through web search engines, to understand and interpret the concepts and principles described more fully, and to link more easily to related articles, information, compounds, databases and terms.

RSC Prospect has taken elements of semantic web developments (structuring documents so that their meaning can be interpreted) and applied them to the scientific content of our articles to show what becomes possible when standard identifiers are applied to chemicals and concepts. Identifying the real content of published science opens up new ways to discover, reuse, understand and analyse articles. We have demonstrated some applications here, but there are many more that we intend to explore for our readers.

Much of this work was originally developed for academic research and has been taken forward in-house at the RSC with our partners, the Unilever Centre for Molecular Informatics and the Computer Laboratory at the University of Cambridge. Other concepts are being developed in other research centres and within the RSC, and these will apply to different areas of the chemical sciences. Essentially, applying these concepts to publishing within RSC Prospect is a way to prime the pump: we are the first publisher to use these standards, and by doing so and promoting their advantages we hope to catalyse developments in research, spread them through the publishing ecosystem, and change the way chemical science information can be found, analysed, interpreted and reused.

All Metallomics articles are available free of charge (after a simple registration) and all have been enhanced with our award-winning project RSC Prospect. Take a look at the enhanced HTML articles on the web (www.rsc.org/metallomics) and see the fantastic potential of this interactive technology to enhance your papers in the journal (Fig. 1).


Fig. 1 Enhanced HTML articles are highlighted in the online contents pages.

Open standards for subject classifications

A common problem when trying to find information is knowing the right terminology to use. Agreed, open standards covering the sciences have been lacking for some time, and their absence hinders efforts to find and compare relevant text and data. The RSC has used selections from the Open Biomedical Ontologies (the Gene, Sequence and Cell Ontologies, and ChEBI for chemical entities), and as an active user has also contributed to them to help increase their accuracy and relevance.

In addition, we have started to build our own subject classifications covering selected areas related to our journals, to help us classify our own content better and offer new ways to search; we will make these openly available and act as their curator. Again, we hope that by making them available for anyone to use we can help link together related science, not just across our own publications but across other publishers’ content and other sources available online. Among the advantages that ontology terms offer over simple keywords are reduced ambiguity from synonyms and the ability to use the relationships defined between terms to widen or narrow collections in very specific ways.
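To make the synonym and hierarchy advantages concrete, here is a minimal Python sketch of how a search tool might use an ontology. It assumes a hypothetical mini-ontology stored as a dictionary of `is_a` links plus a synonym table; the terms are illustrative and are not taken from RXNO, CMO or ChEBI, and the function names are our own.

```python
from collections import defaultdict

# Hypothetical mini-ontology: each term maps to its parent terms (is_a links).
# Illustrative only; these entries are not copied from RXNO, CMO or ChEBI.
IS_A = {
    "substitution reaction": ["chemical reaction"],
    "addition reaction": ["chemical reaction"],
    "nucleophilic substitution": ["substitution reaction"],
    "SN1 reaction": ["nucleophilic substitution"],
    "SN2 reaction": ["nucleophilic substitution"],
}

# Hypothetical synonym table: free-text keywords -> one preferred term.
SYNONYMS = {
    "SN2": "SN2 reaction",
    "bimolecular nucleophilic substitution": "SN2 reaction",
}


def normalise(keyword):
    """Map a synonym onto its preferred term; unknown keywords pass through."""
    return SYNONYMS.get(keyword, keyword)


def narrower_terms(term, is_a):
    """Return the term plus every term below it, i.e. a widened search."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def broader_terms(term, is_a):
    """Return every ancestor of the term, following is_a links upwards."""
    found, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(narrower_terms("nucleophilic substitution", IS_A)))
# ['SN1 reaction', 'SN2 reaction', 'nucleophilic substitution']
print(sorted(broader_terms(normalise("SN2"), IS_A)))
# ['chemical reaction', 'nucleophilic substitution', 'substitution reaction']
```

A search for "nucleophilic substitution" can then be widened to articles tagged with either SN1 or SN2 reaction terms, while a keyword such as "SN2" is first mapped onto its preferred term before the hierarchy is used.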

The first two ontologies we’re making available are RXNO, a reaction ontology, and CMO, a chemical methods ontology. Both are freely available to download from www.rsc.org/ontologies.
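Ontologies of this kind are commonly distributed in the OBO flat-file format. Assuming the downloaded files follow it, the Python sketch below reads the `id`, `name` and `is_a` tags from each `[Term]` stanza. It is deliberately minimal (it ignores escapes, qualifiers and every other tag), the sample stanzas and `EX:` identifiers are hypothetical, and a dedicated OBO library would be a better choice for real use.

```python
# Hypothetical OBO-style sample; the EX: identifiers are placeholders.
SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: chemical reaction

[Term]
id: EX:0000002
name: substitution reaction
is_a: EX:0000001 ! chemical reaction
"""


def parse_obo_terms(text):
    """Collect id, name and is_a parents from each [Term] stanza of OBO text."""
    terms = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza starts here; keep only [Term] stanzas.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines and skipped stanzas
        tag, _, value = line.partition(":")
        value = value.split(" ! ", 1)[0].strip()  # drop a trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


terms = parse_obo_terms(SAMPLE)
print(terms["EX:0000002"])  # {'name': 'substitution reaction', 'is_a': ['EX:0000001']}
```

The resulting `is_a` lists are exactly the parent links needed to widen or narrow a search by following the hierarchy.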

Identifying and linking compounds

The RSC has been an enthusiastic early adopter of the InChI, the new standard for compound identifiers developed by IUPAC and NIST. An InChI contains full structural information about a compound, but can become long and unwieldy for normal use, so a fixed-length InChIKey, which is more search-engine friendly, can be produced from it. While an InChI can be converted back to the original structure, an InChIKey cannot: it has to be linked back to the original InChI to recover the actual compound information.
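The difference between a reversible identifier and a fixed-length hashed key can be illustrated with a short Python sketch. The `toy_key` function is hypothetical: it truncates a SHA-256 digest to a fixed length, which is not the published InChIKey algorithm (that has its own specified hashing, encoding and layout). It only shows why keys of constant length cannot be turned back into the identifier they came from. Methane's standard InChI is used as a sample input.

```python
import hashlib


def toy_key(identifier: str, length: int = 14) -> str:
    """Derive a fixed-length key from an identifier string.

    Hypothetical stand-in: NOT the InChIKey algorithm, just a truncated
    SHA-256 digest used to show the idea of a fixed-length hashed key.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
    return digest[:length].upper()


methane = "InChI=1S/CH4/h1H4"
# Artificially lengthened placeholder string, not a real InChI.
long_identifier = methane + "/placeholder-layer" * 50

for identifier in (methane, long_identifier):
    # Every key has the same length however long the input is, and the
    # original string cannot be computed from it; only a lookup can recover it.
    print(len(identifier), len(toy_key(identifier)))
```

Because information is discarded when the key is made, recovering the full identifier needs a table that maps keys back to identifiers, which is the role a resolver service plays.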

To underpin our commitment to the standard, the RSC has sponsored the development of an InChI resolver service via ChemZoo’s ChemSpider service. ChemSpider already contains over 21 million compounds, and the resolver will allow users to look up full InChI identifiers from the shorter, fixed-length InChIKey. This is a basic ‘plumbing’ service for the community that will make it easier to look up full compound information, and it will be useful to anyone with a compound collection they want to make available to all. The InChI Resolver is also intended to accept deposited compound collections, preserving continued access to them in the future. This free service will allow the community to use InChIs easily and will facilitate the sharing of compound collections.
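The resolver idea (look up a full identifier from its short key, and accept deposited collections) can be sketched as an in-memory Python class. Everything here is hypothetical: `ToyResolver`, its methods and the truncated-SHA-256 `toy_key` illustrate the concept only; they are not the ChemSpider or InChI Resolver interface, and the key function is not the real InChIKey algorithm.

```python
import hashlib
from collections import defaultdict


def toy_key(identifier: str, length: int = 14) -> str:
    """Hypothetical fixed-length key (truncated SHA-256), not a real InChIKey."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()[:length].upper()


class ToyResolver:
    """In-memory stand-in for a key-to-identifier resolver with deposition."""

    def __init__(self):
        self._by_key = {}                     # key -> full identifier
        self._collections = defaultdict(set)  # collection name -> keys

    def deposit(self, collection, identifiers):
        """Store identifiers under a named collection and return their keys."""
        keys = []
        for identifier in identifiers:
            key = toy_key(identifier)
            existing = self._by_key.get(key)
            # Hashed keys can collide, so refuse to overwrite a different entry.
            if existing is not None and existing != identifier:
                raise ValueError(f"key collision for {key}")
            self._by_key[key] = identifier
            self._collections[collection].add(key)
            keys.append(key)
        return keys

    def resolve(self, key):
        """Return the full identifier for a key, or None if it is unknown."""
        return self._by_key.get(key)

    def collection(self, name):
        """Return the identifiers deposited under a collection name, sorted."""
        return sorted(self._by_key[k] for k in self._collections.get(name, ()))


resolver = ToyResolver()
keys = resolver.deposit("example-collection", ["InChI=1S/CH4/h1H4"])
print(resolver.resolve(keys[0]))  # InChI=1S/CH4/h1H4
```

A production service would of course persist its tables and handle collisions more gracefully than raising an error; the sketch only shows the lookup and deposition roles side by side.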

RSC Publishing and ChemZoo launched this service at the ACS Spring Meeting in Salt Lake City. Please visit the InChI Resolver site at inchis.chemspider.com to find out more information.

Data collections

We already make supplementary data files available alongside our articles, and we know how powerful a standard data format can be: it becomes not just a means of preserving research data but also a way to share it and allow it to be visualised and reused. The RSC supports open data and will be encouraging authors to store and supply their research data files with their publications. We will be looking at possible standards covering areas relevant to Metallomics, and providing demonstrations of what can be done with data that is shared in an open, standard form.

RSC Prospect—we show what’s possible

We are using our award-winning project RSC Prospect to show some of the benefits of applying new standards to our journal articles. Using the standards mentioned above, together with our skills in developing them further and applying them to our areas of science publishing, we have added a layer of semantic enrichment to articles that enables them to be found more easily and understood better, makes their compound data available in machine-readable form, and links content together by subject term or by compound (Fig. 2).
Fig. 2 Compound information linked directly from within the article.

When this first went live in 2008, we were limited to offering an enhanced HTML view of a paper, with inline links highlighting compounds, each uniquely identified by its InChI, and terms from the Gene, Sequence and Cell Ontologies.
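A very simple form of inline linking can be sketched as dictionary lookup: find known names in the text and wrap each one in a link built from its identifier. RSC Prospect itself uses the OSCAR text-mining toolkit, which is far more sophisticated; the Python below is only an illustration under stated assumptions, with a hypothetical lexicon, placeholder keys and an example.org URL pattern.

```python
import html
import re

# Hypothetical lexicon: lower-case names -> placeholder identifiers.
# A real system would link to InChI/InChIKey records, not these placeholders.
LEXICON = {
    "zinc chloride": "PLACEHOLDER-KEY-1",
    "zinc": "PLACEHOLDER-KEY-2",
    "cysteine": "PLACEHOLDER-KEY-3",
}


def annotate(text, lexicon, base_url="https://example.org/compound/"):
    """Wrap each recognised name in an HTML link, preferring longer names."""
    # Longest names first, because regex alternation takes the first match.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    out, pos = [], 0
    for match in pattern.finditer(text):
        out.append(html.escape(text[pos:match.start()]))  # plain text between hits
        name = match.group(0)
        key = lexicon[name.lower()]
        out.append(f'<a href="{base_url}{html.escape(key)}">{html.escape(name)}</a>')
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(annotate("Zinc chloride binds cysteine.", LEXICON))
```

Sorting the names by length ensures "zinc chloride" is linked as one compound rather than as "zinc" followed by unlinked text; word boundaries stop "zinc" matching inside longer words.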

Since then we have extended this to create machine-readable RSS feeds containing real chemical information (and structures for humans!), compound image pop-ups on mouseover, and the application of the ChEBI ontology for chemical classes and groups. Last year we introduced chemical structure and substructure searching on our enhanced articles, becoming the first primary publisher to do so. We’re now using our subject and compound information pages to direct readers to content (“are you interested in this compound or subject area? These are our articles which include it…”). Most recently we have started identifying reactions and chemical methods within our papers.
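Pages of the form "these are our articles which include this compound" rest on an inverted index from compound (or subject term) to articles. Here is a minimal Python sketch, assuming each article has already been annotated with a set of identifiers; the article IDs, compound keys and term ID below are hypothetical placeholders.

```python
from collections import defaultdict


def build_index(article_terms):
    """Invert {article id: identifiers} into {identifier: sorted article ids}."""
    index = defaultdict(set)
    for article_id, identifiers in article_terms.items():
        for identifier in identifiers:
            index[identifier].add(article_id)
    return {identifier: sorted(ids) for identifier, ids in index.items()}


# Hypothetical annotations: compound keys and an ontology term per article.
annotations = {
    "article-A": {"COMPOUND-KEY-1", "COMPOUND-KEY-2", "TERM:0001"},
    "article-B": {"COMPOUND-KEY-2"},
}

index = build_index(annotations)
print(index["COMPOUND-KEY-2"])  # ['article-A', 'article-B']
```

Because compounds and subject terms are both just identifiers here, the same index can serve compound pages and subject pages alike.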

The future…?

We’ll be looking at promoting the use of InChI compound identifiers further, and at working with other publishers to link our compounds together. We’ll extend our classifications to cover all the subject areas that we publish. We’ll apply this markup to all our content to give our readers an unparalleled view of our publications, and we’ll work with our authors to preserve real scientific data through the publication process and make it openly available for reuse. An important part of this is providing compelling demonstrations of what all this can achieve, which we’ll do through our RSC Prospect developments. As a learned society publisher, here to promote the chemical sciences worldwide, we’re proud to do this, and with the valued participation of our authors and readers we will continue to do so. Please don’t hesitate to make your own suggestions as to how this technology could be further exploited and how YOU would like to see it develop.

Richard Kidd

Manager, Informatics

Glossary and further reading

ChEBI—Chemical Entities of Biological Interest is a database of molecular entities focused on ‘small’ chemical compounds. ChEBI specifies the relationships between molecular entities or classes of entities and their parents and/or children. www.ebi.ac.uk/chebi

GO—The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism. www.geneontology.org

InChI—The IUPAC International Chemical Identifier is a textual identifier for chemical substances, designed to provide a standard and human-readable way to encode molecular information and to facilitate the search for such information in databases and on the web. www.iupac.org/inchi

InChIKey—A hashed version of the full InChI designed to allow for easy web searches of chemical compounds.

OBO—Open Biomedical Ontologies is an effort to create controlled vocabularies for shared use across different biological and medical domains, and includes the GO and SO projects. www.obofoundry.org

OSCAR—Open Source Chemical Analysis Routines is a toolkit for the high-throughput and automated annotation of chemistry in scientific articles. RSC Prospect uses OSCAR for text mining. wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3

SciBorg—SciBorg is a project to apply natural language processing methodologies to chemistry texts. The project is a collaboration between groups at the University of Cambridge and three major publishers, including RSC Publishing. www.cl.cam.ac.uk/~aac10/escience/sciborg.html

SO—The Sequence Ontology project aims to develop an ontology for describing biological sequences. www.sequenceontology.org


This journal is © The Royal Society of Chemistry 2009