VOCC: a database of volatile organic compounds in cancer

Subhash Mohan Agarwal*, Mansi Sharma and Shehnaz Fatima
Bioinformatics Division, National Institute of Cancer Prevention and Research (NICPR-ICMR), I-7 Sector-39, Noida – 201301, India. E-mail: smagarwal@yahoo.com

Received 30th September 2016, Accepted 5th December 2016

First published on 6th December 2016


Abstract

Volatile organic compounds (VOCs) have become increasingly important in recent years, as they are catalyzing a shift towards non-invasive approaches to cancer diagnosis. Despite several efforts, non-invasive tests that detect cancer early are not yet available, and hence there is a need to develop new resources that facilitate cancer diagnosis. We have therefore developed, for the first time, a comprehensive literature-curated database termed VOCC, which catalogs experimentally identified VOCs that are reported to differentiate cancer from normal samples. The database contains 551 VOCs from 17 different cancers and 10 sample sources. Each record is presented in a compound-centric manner and provides information on the VOC name, cancer type, source, structure (2D and 3D), properties (physical, chemical and topological), patient sample information, VOC detection technique, statistical results and accuracy of detection. We have also assembled information about changes in the concentration levels of VOCs (qualitative and quantitative) in cancer compared to normal samples, as documented for sources such as blood, breath and urine. VOCC can easily be browsed or searched using various options, facilitating retrieval and dissemination of information. To further facilitate retrieval of related data, each record is hyperlinked to other databases such as the Human Metabolome Database, ChemSpider and PubChem. It is expected that the availability of this free web-based database will stimulate the development of new methodologies and improve understanding of what is essential for designing VOC-based early detection approaches.


Introduction

Cancer is a devastating disease and the second leading cause of death in the world.1,2 It is predicted that the number of deaths due to cancer alone will increase to over 13 million by 2030.3 Despite tremendous efforts in biomarker studies and targeted therapy,4,5 non-invasive tests that are easy to conduct, inexpensive and painless are still not available on the market to detect cancer early enough. There is therefore an urgent, unmet global need for inexpensive and non-invasive technology that enables early detection of cancer.6,7 One promising approach that has attracted the attention of researchers is volatolomics, i.e. the study of volatile organic compounds (VOCs) emanating from cancer cells.8–10 VOCs are compounds that exhibit high vapor pressure at normal room temperature. As VOCs are detectable in various body fluids, they provide a fast and convenient alternative for early cancer diagnosis and screening, with high acceptability among patients. Studies have demonstrated that volatile metabolites exhaled in the breath,11–14 excreted in the urine15–17 or present in the blood18,19 of cancer patients exhibit specific patterns that distinguish them from those of normal samples. In fact, breath testing has long been used to associate diseases with pathological processes in the body.20 For example, the sweet smell of acetone in breath has been used as a sign of uncontrolled diabetes, while a fishy smell indicates liver disease and a urine-like smell is related to kidney failure.21 Similarly, urine is another non-invasive source that is considered important for detecting disease. Among the advantages of urine are that it is available in large volumes and that the excreted analytes are already concentrated by the kidney. As a result, many researchers are detecting, analyzing and reporting VOCs with which cancer could be diagnosed early.
These VOCs can be identified from various sources such as the headspace of cancer cells, urine, skin, blood and/or exhaled breath. As VOCs in these body fluids may emerge at very early stages of cancer, their detection could serve as an indicator of the diseased state. Hence, analysis of VOCs has emerged as a new frontier in medical diagnostics, because it is non-invasive and potentially inexpensive. Many studies have investigated volatile metabolites from patients with different cancers, such as lung cancer,11 colorectal cancer,14 gastric cancer,22 bladder cancer,23 thyroid cancer13 and breast cancer.24 It has been suggested that identification of disease-specific VOCs may provide novel insights into the diagnosis and treatment of various diseases.25 Researchers have also proposed that limited information on VOCs has restricted our understanding of the relationship between VOCs in different body fluids.9 We have therefore developed the Volatile Organic Compounds in Cancer database (VOCC, http://smagarwal.in/vocc/), which provides comprehensive information on VOCs that distinguish cancer from normal samples across various malignancies and sample sources. To the best of the authors' knowledge, the present database is the first of its kind. It is expected that VOCC will cater to the needs of researchers and clinicians working in the field of cancer volatolomics.

Methods

Data source, compilation and structure

To identify VOCs whose concentration varies in human cancers compared to normal cases, we extensively searched PubMed with different keywords, alone or in combination, to find the relevant literature. We gathered information on VOCs, the cancer studied, the sample source used for collecting VOCs, period of collection, number and mean age of patients/controls, hospital, histology, variation in concentration values or fold change, technique used for detecting VOCs, journal information, and the sensitivity, specificity and accuracy of the predictive model. We also provide 2D and 3D structures of the collected VOCs. Different structural and topological molecular descriptors, including IUPAC name, mass, composition, InChI, InChIKey, SMILES, log P, refractivity, atom count, aliphatic atom count, chain atom count, bond count and ring count, were calculated for each compound using MarvinSketch. Once all the information was gathered, the data were entered into different tables as given in Fig. 1. The data were then integrated in MySQL, a relational database management system (RDBMS) that works at the back end, and the web interface was developed in PHP. We built VOCC on an Apache HTTP server with MySQL server, PHP, HTML and JavaScript, as these are platform-independent and open-source.26,27
Fig. 1 Schematic showing information available in VOCC database.
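The compound-centric organisation described above can be illustrated with a minimal relational sketch. It uses Python's built-in sqlite3 module in place of the MySQL back end, so that it runs without a server. The table and column names (voc, citation, observation) are hypothetical simplifications and are not VOCC's actual tables in Fig. 1. The VID for ethanol comes from the text, while the PMID and journal values are placeholders.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, simplified schema (not VOCC's actual tables): one row per
# compound, one per cited paper, and one per VOC reported in a given study.
SCHEMA = """
CREATE TABLE voc (
    vid       TEXT PRIMARY KEY,   -- VOCC identifier, e.g. 'VID00452'
    name      TEXT NOT NULL,
    voc_class TEXT,
    smiles    TEXT
);
CREATE TABLE citation (
    pmid      INTEGER PRIMARY KEY,
    journal   TEXT,
    year      INTEGER
);
CREATE TABLE observation (
    vid       TEXT NOT NULL REFERENCES voc(vid),
    pmid      INTEGER NOT NULL REFERENCES citation(pmid),
    cancer    TEXT NOT NULL,
    source    TEXT NOT NULL,      -- breath, blood, urine, ...
    direction TEXT CHECK (direction IN ('up', 'down'))  -- NULL if not reported
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one illustrative record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO voc VALUES ('VID00452', 'ethanol', 'alcohol', 'CCO')")
    conn.execute("INSERT INTO citation VALUES (1, 'Placeholder J.', 2016)")  # placeholder PMID
    conn.execute("INSERT INTO observation VALUES ('VID00452', 1, 'lung', 'breath', 'up')")
    return conn


def records_for(conn: sqlite3.Connection, vid: str) -> List[Tuple]:
    """Join a compound with its observations and the papers reporting them."""
    return conn.execute(
        """
        SELECT v.name, o.cancer, o.source, o.direction, c.pmid
        FROM voc AS v
        JOIN observation AS o ON o.vid = v.vid
        JOIN citation AS c ON c.pmid = o.pmid
        WHERE v.vid = ?
        """,
        (vid,),
    ).fetchall()


print(records_for(build_demo_db(), "VID00452"))
```

This design keeps one observation row per VOC, cancer, source and paper. Counts such as the number of papers shown on a compound page can then be derived by aggregation instead of being stored separately.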

Results

VOCC is a collection of human volatile metabolites whose levels vary in cancer compared to normal samples. In the current release, we have compiled 551 VOCs from 17 different cancers and 10 sample sources. Each entry in the database has been arranged in a compound-centric manner for easy access and retrieval.

Compound-centric portal

Each entry provides the following information: (i) VOC information, including the VOC name, unique VOCC identifier (VID), IUPAC name, synonyms, class, InChI, InChIKey, SMILES and CAS number; (ii) a schematic view of the VOC along with a 3D representation that can be downloaded in SDF format; (iii) the different cancer types in which the VOC has been detected, along with the number of papers; (iv) a description of the sample sources from which the VOC has been isolated, with the number of papers; (v) cancer type and sample source combination details; (vi) a VOC concentration tab which, on clicking, leads to another page that provides data on the changes in concentration level, fold change or increase/decrease of the VOC in cancer compared to normal samples; (vii) reference citations (title, journal name, volume, year and PubMed ID) supplementing each record, which provide details of sample size (normal and diseased state), mean age, the hospital from which samples were collected, and the distribution of samples according to stage and histological classification; (viii) a link to properties (physical, elemental and topological); and (ix) hyperlinks to external databases such as the Human Metabolome Database (HMDB),28 ChemSpider and PubChem29 (Fig. 2).
Fig. 2 Schematic workflow of the VOCC database showing the (A) compound-centric data card through which a user can access the content of the database, (B) cancer type information, (C) sample source information, (D) link to another page with VOC concentration information, (E) reference details, (F) link to another page providing detailed information about the sample and various analysis parameters extracted from the reference, (G) structural information and (H) cross-links to other databases.
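An entry with fields (i)–(ix) can be modelled as a single compound-centric record that holds its identifiers and derives its external links from them. The sketch below shows one way such a record might be structured. It is not VOCC's code, and the field names are hypothetical. The PubChem and HMDB URL patterns are the public ones as we understand them, and ChemSpider is left out because its link scheme is not assumed here. The VID for ethanol comes from the text. Its SMILES, CAS number, InChIKey and PubChem CID are standard identifiers, while the paper counts are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VocRecord:
    """Hypothetical compound-centric record (field names are illustrative)."""
    vid: str
    name: str
    voc_class: str
    smiles: str
    inchikey: str
    cas: Optional[str] = None
    pubchem_cid: Optional[int] = None
    hmdb_id: Optional[str] = None
    cancers: Dict[str, int] = field(default_factory=dict)  # cancer type -> number of papers
    sources: Dict[str, int] = field(default_factory=dict)  # sample source -> number of papers

    def external_links(self) -> Dict[str, str]:
        """Build cross-links only for the identifiers that are known."""
        links = {}
        if self.pubchem_cid is not None:
            links["PubChem"] = f"https://pubchem.ncbi.nlm.nih.gov/compound/{self.pubchem_cid}"
        if self.hmdb_id is not None:
            links["HMDB"] = f"https://hmdb.ca/metabolites/{self.hmdb_id}"
        return links


ethanol = VocRecord(
    vid="VID00452",
    name="ethanol",
    voc_class="alcohol",
    smiles="CCO",
    inchikey="LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    cas="64-17-5",
    pubchem_cid=702,
    cancers={"lung": 5},      # illustrative count
    sources={"breath": 5},    # illustrative count
)
print(ethanol.external_links())
```

Unknown identifiers are left as None. As a result, a missing cross-reference is omitted from the card instead of appearing as a broken link.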

Data access

For ease of access, text search and browse options have been designed. Users can either query the database by VOC name, cancer type, source, class, CAS number, SMILES, InChIKey, PMID or PubChem ID, or browse the entire collection. Results are displayed as compound-centric VOCC information on a new page.
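The search options above amount to a lookup over a fixed set of fields. The sketch below shows one possible implementation, and the field names and demo records are hypothetical. It makes three assumptions of our own. Free-text fields use case-insensitive substring matching. Identifiers use exact, case-insensitive matching. SMILES are compared case-sensitively, because lowercase letters denote aromatic atoms. Exact SMILES matching finds only identical strings, so a production service would first canonicalize them.

```python
from typing import Dict, List

TEXT_FIELDS = {"name", "cancer", "source", "voc_class"}   # substring, case-insensitive
ID_FIELDS = {"cas", "inchikey", "pmid", "pubchem_cid"}    # exact, case-insensitive
SEARCHABLE = TEXT_FIELDS | ID_FIELDS | {"smiles"}         # SMILES: exact, case-sensitive

DEMO = [
    {"name": "ethanol", "voc_class": "alcohol", "smiles": "CCO", "cas": "64-17-5"},
    {"name": "acetone", "voc_class": "ketone", "smiles": "CC(C)=O", "cas": "67-64-1"},
    {"name": "benzene", "voc_class": "aromatic", "smiles": "c1ccccc1", "cas": "71-43-2"},
]


def _matches(field: str, value: str, query: str) -> bool:
    if field in TEXT_FIELDS:
        return query.casefold() in value.casefold()
    if field == "smiles":
        # 'c1ccccc1' (benzene) must not match 'C1CCCCC1' (cyclohexane)
        return value == query
    return value.casefold() == query.casefold()


def search(records: List[Dict[str, str]], field: str, query: str) -> List[Dict[str, str]]:
    """Return the records whose `field` matches `query`."""
    if field not in SEARCHABLE:
        raise ValueError(f"unsupported search field: {field!r}")
    query = query.strip()
    return [r for r in records if field in r and _matches(field, str(r[field]), query)]


print([r["name"] for r in search(DEMO, "name", "ETH")])
```

Restricting queries to a whitelist of fields keeps the search predictable. In an SQL-backed implementation, it would also prevent user input from choosing arbitrary column names.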

Data statistics and analysis

Cancer type. At present, VOCC covers 551 VOCs corresponding to 17 cancer types; the cancer-wise distribution of VOCs is shown in Fig. 3. The majority of VOCs have been identified in lung cancer. The other cancers in which many VOCs have been detected are breast, colorectal, hepatocellular and skin cancer.
Fig. 3 Distribution of VOCs in VOCC database.
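A cancer-wise distribution like the one in Fig. 3 must count each VOC once per cancer type, even when several papers report it. The sketch below assumes the evidence is stored as (VID, cancer, source, PMID) tuples. The two VIDs come from the text, but the rows themselves and the PMIDs are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[str, str, str, int]  # (vid, cancer, source, pmid)


def vocs_per_cancer(rows: Iterable[Row]) -> Dict[str, int]:
    """Number of distinct VOCs per cancer type, largest first."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for vid, cancer, _source, _pmid in rows:
        seen[cancer].add(vid)  # a set ignores repeat reports of the same VOC
    counts = {cancer: len(vids) for cancer, vids in seen.items()}
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


rows = [
    ("VID00452", "lung", "breath", 1),    # toy rows; PMIDs are placeholders
    ("VID00452", "lung", "breath", 2),    # same VOC, second paper: counted once
    ("VID00285", "lung", "breath", 1),
    ("VID00452", "breast", "breath", 3),
]
print(vocs_per_cancer(rows))
```

Grouping on the source field instead of the cancer field gives the per-source counts in the same way.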
Sample source. VOCs have been reported in the literature to be isolated from 10 different sources (documented in VOCC), including breath, blood and urine (Fig. 4). Among all the sources, breath is the most explored, as evident from the data in VOCC.
Fig. 4 Number of VOCs according to sample source.
Classes. Classification of the VOCs on the basis of their structural skeleton led to the identification of 22 classes (Fig. 5). The five largest classes are alkanes, alcohols, aromatic compounds, ketones and alkenes. This classification may indicate which classes of VOCs are mainly released as a result of metabolic disturbances in different cancers.
Fig. 5 Classification of VOCs according to structural scaffold.

We further examined the class distribution in the 3 major sources of VOCs, i.e. breath, blood and urine (Fig. 6). We observed that alkanes are not present in urine samples, while nitriles are absent from blood samples. Different sources thus exhibit different class distributions, suggesting that VOC release varies depending on the source.


Fig. 6 VOC classes in the 3 major sources: breath, blood and urine.
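Findings such as the absence of alkanes from urine can be read from a class-by-source presence table. The sketch below uses invented rows and hypothetical VOC labels. For each of the three major sources, it lists the classes that occur in at least one of the other two sources but never in that one.

```python
from typing import Dict, Iterable, Set, Tuple

MAJOR_SOURCES = ("breath", "blood", "urine")


def classes_absent_by_source(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """rows are (vid, voc_class, source); sources outside MAJOR_SOURCES are ignored."""
    present: Dict[str, Set[str]] = {s: set() for s in MAJOR_SOURCES}
    for _vid, voc_class, source in rows:
        if source in present:
            present[source].add(voc_class)
    all_classes = set().union(*present.values())
    return {s: all_classes - classes for s, classes in present.items()}


toy_rows = [  # hypothetical data, not VOCC content
    ("voc1", "alkane", "breath"), ("voc2", "alkane", "blood"),
    ("voc3", "nitrile", "breath"), ("voc4", "nitrile", "urine"),
    ("voc5", "ketone", "blood"), ("voc5", "ketone", "urine"),
    ("voc6", "ketone", "breath"), ("voc7", "alcohol", "skin"),
]
print(classes_absent_by_source(toy_rows))
```

Here, "absent" means only "not reported in the curated studies". Such a result therefore also reflects how intensively each source has been studied.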
Techniques. The database also provides data on the analytical and detection techniques used for identifying VOCs. Most of the VOCs have been detected and identified using solid-phase microextraction (SPME) or thermal desorption (TD) coupled to gas chromatography–mass spectrometry (GC-MS), proton-transfer-reaction mass spectrometry (PTR-MS) or selected ion flow tube mass spectrometry (SIFT-MS). GC-MS has been the analytical technique most used by investigators for studying potential VOCs, owing to its sensitivity and reliability in analyte identification. It has been suggested that GC-MS provides the most detailed analytical information and identifies analytes with the most certainty.
Distribution of VOCs according to cancer type and sample source. We also analyzed how the VOCs detected in each sample source are distributed across the cancers (Fig. 7). Breath and urine samples have mainly been used to detect VOCs in several cancers, while some other sources, such as skin and lung tissue, have been used only for detecting skin and lung cancer, respectively. This distribution also shows that most research in volatolomics is concentrated on breath analysis for detecting VOCs.
Fig. 7 VOC distribution in each sample source and cancer type.
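The cancer-by-source distribution can be built as a nested count table. Sources used for only one cancer type, such as skin and lung tissue in the text, can then be identified directly from it. Below is a minimal sketch with invented rows and hypothetical VOC labels.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def source_cancer_table(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, int]]:
    """rows are (vid, cancer, source); returns source -> cancer -> distinct VOC count."""
    cells: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for vid, cancer, source in rows:
        cells[(source, cancer)].add(vid)
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (source, cancer), vids in cells.items():
        table[source][cancer] = len(vids)
    return dict(table)


def single_cancer_sources(table: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Sources that have been used for exactly one cancer type."""
    return {src: next(iter(by_cancer)) for src, by_cancer in table.items() if len(by_cancer) == 1}


toy_rows = [  # hypothetical data, not VOCC content
    ("voc1", "lung", "breath"), ("voc2", "breast", "breath"), ("voc3", "colorectal", "breath"),
    ("voc1", "lung", "urine"), ("voc4", "bladder", "urine"),
    ("voc5", "skin", "skin"), ("voc6", "lung", "lung tissue"), ("voc7", "lung", "lung tissue"),
]
table = source_cancer_table(toy_rows)
print(table["lung tissue"], single_cancer_sources(table))
```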
VOC concentration data. VOCC also provides concentration data (quantitative and qualitative) for various VOCs. There are 487 concentration value records and 78 fold-change records, corresponding to 183 VOCs; that is, about 33% of the VOCs in VOCC (183 of 551) have quantitative data. The database thus enables users to identify VOCs whose concentration varies in a particular cancer or across a large number of cancers. For example, the concentration of acetaldehyde (VID00285) was reported to be decreased in 6 references on lung cancer using breath as the sample source. Similarly, ethanol (VID00452) has data from five references on lung cancer with breath as the sample source, and in all 5 references the concentration of ethanol was increased in patients compared to controls/healthy subjects. This indicates that ethanol has consistently been reported as increased in breath samples of lung cancer patients. One advantage of the database is therefore that it may aid cancer detection by providing experimentally determined VOC concentrations in control and patient samples. However, a present difficulty is the use of varied sample collection methods, which makes comparison of VOC concentration values across studies of a given cancer complex. Thus, for the datasets available in the literature to complement each other, a standardized procedure will need to be developed in the near future.9
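The arithmetic and the per-study comparison in this paragraph can be made explicit. The sketch below computes the share of VOCs with concentration data (183 of 551). For each combination of VOC, cancer and source, it also checks whether all references agree on the direction of change. It assumes that each reference reports one direction, either 'up' or 'down'. The ethanol rows mirror the five concordant breath studies described above, with placeholder PMIDs. VID99999 is a hypothetical compound with conflicting reports.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]             # (vid, cancer, source)
Row = Tuple[str, str, str, int, str]   # (vid, cancer, source, pmid, direction)


def coverage_percent(quantified: int, total: int) -> float:
    """Percentage of catalogued VOCs that have concentration or fold-change data."""
    return 100.0 * quantified / total


def direction_consensus(rows: Iterable[Row]) -> Dict[Key, Tuple[str, int]]:
    """Per (vid, cancer, source): the shared direction and the number of references,
    or 'mixed' when references disagree. One direction per reference is assumed;
    a later row with the same PMID overwrites an earlier one."""
    by_key: Dict[Key, Dict[int, str]] = defaultdict(dict)
    for vid, cancer, source, pmid, direction in rows:
        by_key[(vid, cancer, source)][pmid] = direction
    summary = {}
    for key, per_ref in by_key.items():
        directions = set(per_ref.values())
        label = directions.pop() if len(directions) == 1 else "mixed"
        summary[key] = (label, len(per_ref))
    return summary


rows = [("VID00452", "lung", "breath", pmid, "up") for pmid in range(1, 6)]  # placeholder PMIDs
rows += [("VID99999", "lung", "breath", 6, "up"), ("VID99999", "lung", "breath", 7, "down")]

print(round(coverage_percent(183, 551)))  # 33
print(direction_consensus(rows))
```

A concordant label only records agreement among the curated studies. Because sampling methods vary across studies, agreement alone does not establish a diagnostic marker.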
Physical, chemical and topological properties. VOCC also provides computed physical, chemical and topological properties of VOCs. These properties are important for addressing issues such as the diffusion of VOCs from blood into alveolar air across the alveolar–capillary membrane.21 It has been suggested that properties such as polarity, fat solubility, Henry's law constant and volatility regulate the behavior of VOCs.
Comparison of VOCs present in blood, breath and urine samples. We investigated the overlap of VOCs across the three major sample sources, i.e. breath, blood and urine. There are 12 VOCs (acetone, acrolein, benzene, cyclohexanone, ethylene oxide, heptanal, hexanal, styrene, toluene, 2-butanone, xylene and ethanol) common to all three sources. In addition, 404, 63 and 42 VOCs are unique to breath, urine and blood, respectively, while 38 VOCs are common to breath and blood, 23 to breath and urine, and 5 to blood and urine.
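The overlap analysis can be expressed by assigning each VOC to the exact combination of sources in which it occurs. This sketch uses exclusive regions, so a VOC found in all three sources is not also counted in the pairwise regions. An inclusive count would use plain pairwise intersections instead. Acetone and ethanol are placed in all three sets, as in the text, while the other labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def exclusive_regions(by_source: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each exact combination of sources to the VOCs found in exactly those sources."""
    membership: Dict[str, Set[str]] = defaultdict(set)
    for source, vocs in by_source.items():
        for voc in vocs:
            membership[voc].add(source)
    regions: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for voc, sources in membership.items():
        regions[frozenset(sources)].add(voc)
    return dict(regions)


by_source = {
    "breath": {"acetone", "ethanol", "voc_b", "voc_bb"},
    "blood": {"acetone", "ethanol", "voc_bb", "voc_l"},
    "urine": {"acetone", "ethanol", "voc_u"},
}
regions = exclusive_regions(by_source)
for combo, vocs in sorted(regions.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(" & ".join(sorted(combo)), sorted(vocs))
```

With exclusive regions, the region sizes add up to the number of distinct VOCs across the three sources. This provides a quick consistency check on reported overlap counts.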
Comparison of VOCC with HMDB. HMDB (Human Metabolome Database) is a repository of small-molecule metabolites found in the human body.28 It contains quantitative chemical, physical, clinical and biological data on all experimentally ‘detected’ and biologically ‘expected’ human metabolites. In contrast, VOCC is a literature-curated database of human VOCs that vary in different cancers and sources. It is a unique resource that not only lists the VOCs found in cancer but also provides data on concentration levels in cancer compared to normal samples, on physical, chemical and topological properties, on structure and on the corresponding bibliographic sources. A comparison of VOCC with HMDB further supports its uniqueness: only 221 metabolites overlap with HMDB, while 330 (60%) are unique to VOCC.
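The comparison with HMDB reduces to set intersection and difference on a shared identifier. Matching on InChIKey instead of compound name is our suggestion here, since the text does not state how compounds were matched. Name matching would treat synonyms such as 2-butanone and methyl ethyl ketone as different compounds. The keys for ethanol and acetone are their standard InChIKeys, and the remaining entries are hypothetical placeholders.

```python
from typing import Set, Tuple


def overlap_summary(vocc_keys: Set[str], hmdb_keys: Set[str]) -> Tuple[int, int, float]:
    """Return (shared, unique_to_vocc, percent_unique) for two sets of identifiers."""
    shared = vocc_keys & hmdb_keys
    unique = vocc_keys - hmdb_keys
    return len(shared), len(unique), 100.0 * len(unique) / len(vocc_keys)


vocc = {
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",  # ethanol
    "CSCPPACGZOOCGX-UHFFFAOYSA-N",  # acetone
    "PLACEHOLDER-KEY-A",            # hypothetical VOC absent from HMDB
}
hmdb = {"LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "CSCPPACGZOOCGX-UHFFFAOYSA-N", "PLACEHOLDER-KEY-B"}

print(overlap_summary(vocc, hmdb))
# With the counts reported in the text: 330 of 551 VOCs are unique to VOCC.
print(round(100 * 330 / 551, 1))
```

The reported counts are self-consistent: 221 + 330 = 551, so the shared and unique groups together account for every VOC in the current release.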

Conclusion

VOCC is a first-of-its-kind attempt to provide a comprehensive, non-redundant catalogue of volatile organic compounds associated with specific cancer types. It includes information about their concentration patterns and supporting evidence from the published literature. It thus provides a unique value-added resource in the field of cancer volatolomics. We envisage that it will help in: (i) providing scientific evidence for the VOCs that are present/absent in cancers, thus acting as a reference point for other studies; (ii) identifying VOCs that are produced at significantly higher or lower levels than normal, which may therefore serve as biomarkers for the assessment or detection of disease; (iii) indicating which sample source has been most extensively used, and may be best suited, for identifying a particular cancer; (iv) determining in how many cancers and sample sources a VOC has been reported; since quantitative and qualitative data (i.e. concentration values or fold change) are provided, researchers can also compare concentrations across different cancers/samples and identify patterns of interest; (v) enabling researchers to compare thoroughly the volatile organic compounds that are common to most cancers and to detect those that are unique, thereby identifying metabolic disturbances in different cancers; and (vi) identifying structural features typical of volatile organic compounds that vary in different cancers using QSAR or binary classification approaches.30–32 We also plan to update the database regularly with new data from the literature so that it contributes to the development of the field. Finally, we hope that this database will save researchers time and effort and thus facilitate the development of non-invasive diagnostic methods.

References

  1. M. Mangal, M. I. Khan and S. M. Agarwal, Adv. Anticancer Agents Med. Chem., 2016, 16, 138–159.
  2. H. Haick, Y. Y. Broza, P. Mochalski, V. Ruzsanyi and A. Amann, Chem. Soc. Rev., 2014, 43, 1423–1449.
  3. M. Mangal, P. Sagar, H. Singh, G. P. Raghava and S. M. Agarwal, Nucleic Acids Res., 2013, 41, D1124–D1129.
  4. I. S. Yadav, P. P. Nandekar, S. Srivastavaa, A. Sangamwar, A. Chaudhury and S. M. Agarwal, Gene, 2014, 539, 82–90.
  5. V. K. Sharma, P. P. Nandekar, A. Sangamwar, H. Perez-Sanchez and S. M. Agarwal, RSC Adv., 2016, 6, 65725–65735.
  6. X. Sun, K. Shao and T. Wang, Anal. Bioanal. Chem., 2016, 408, 2759–2780.
  7. A. Amann, B. de Lacy Costello, W. Miekisch, J. Schubert, B. Buszewski, J. Pleil, N. Ratcliffe and T. Risby, J. Breath Res., 2014, 8, 034001.
  8. M. Hakim, Y. Y. Broza, O. Barash, N. Peled, M. Phillips, A. Amann and H. Haick, Chem. Rev., 2012, 112, 5949–5966.
  9. Y. Y. Broza, P. Mochalski, V. Ruzsanyi, A. Amann and H. Haick, Angew. Chem., Int. Ed. Engl., 2015, 54, 11036–11048.
  10. R. Vishinkin and H. Haick, Small, 2015, 11, 6142–6164.
  11. Y. Saalberg and M. Wolff, Clin. Chim. Acta, 2016, 459, 5–9.
  12. A. Krilaviciute, J. A. Heiss, M. Leja, J. Kupcinskas, H. Haick and H. Brenner, Oncotarget, 2015, 6, 38643–38657.
  13. L. Guo, C. Wang, C. Chi, X. Wang, S. Liu, W. Zhao, C. Ke, G. Xu and E. Li, Transl. Res., 2015, 166, 188–195.
  14. N. K. de Boer, T. G. de Meij, F. A. Oort, I. Ben Larbi, C. J. Mulder, A. A. van Bodegraven and M. P. van der Schee, Clin. Gastroenterol. Hepatol., 2014, 12, 1085–1089.
  15. T. Khalid, R. Aggio, P. White, B. De Lacy Costello, R. Persad, H. Al-Kateb, P. Jones, C. S. Probert and N. Ratcliffe, PLoS One, 2015, 10, e0143283.
  16. R. P. Arasaradnam, M. J. McFarlane, C. Ryan-Fisher, E. Westenbrink, P. Hodges, M. G. Thomas, S. Chambers, N. O'Connell, C. Bailey, C. Harmston, C. U. Nwokolo, K. D. Bardhan and J. A. Covington, PLoS One, 2014, 9, e108750.
  17. Y. Hanai, K. Shimono, K. Matsumura, A. Vachani, S. Albelda, K. Yamazaki, G. K. Beauchamp and H. Oka, Biosci., Biotechnol., Biochem., 2012, 76, 679–684.
  18. P. Mochalski, J. King, M. Haas, K. Unterkofler, A. Amann and G. Mayer, BMC Nephrol., 2014, 15, 43.
  19. C. Wang, P. Li, A. Lian, B. Sun, X. Wang, L. Guo, C. Chi, S. Liu, W. Zhao, S. Luo, Z. Guo, Y. Zhang, C. Ke, G. Ye, G. Xu, F. Zhang and E. Li, Cancer Biol. Ther., 2014, 15, 200–206.
  20. W. Ma, X. Liu and J. Pawliszyn, Anal. Bioanal. Chem., 2006, 385, 1398–1408.
  21. B. Buszewski, M. Kesy, T. Ligor and A. Amann, Biomed. Chromatogr., 2007, 21, 553–566.
  22. J. R. Huddy, M. Z. Ni, S. R. Markar and G. B. Hanna, World J. Gastroenterol., 2015, 21, 4111–4120.
  23. P. Bassi, V. De Marco, A. De Lisa, M. Mancini, F. Pinto, R. Bertoloni and F. Longo, Urol. Int., 2005, 75, 193–200.
  24. L. Lavra, A. Catini, A. Ulivieri, R. Capuano, L. Baghernajad Salehi, S. Sciacchitano, A. Bartolazzi, S. Nardis, R. Paolesse, E. Martinelli and C. Di Natale, Sci. Rep., 2015, 5, 13246.
  25. M. Shirasu and K. Touhara, J. Biochem., 2011, 150, 257–266.
  26. I. S. Yadav, H. Singh, M. I. Khan, A. Chaudhury, G. P. Raghava and S. M. Agarwal, Adv. Anticancer Agents Med. Chem., 2014, 14, 928–935.
  27. S. M. Agarwal, D. Raghav, H. Singh and G. P. Raghava, Nucleic Acids Res., 2011, 39, D975–D979.
  28. D. S. Wishart, T. Jewison, A. C. Guo, M. Wilson, C. Knox, Y. Liu, Y. Djoumbou, R. Mandal, F. Aziat, E. Dong, S. Bouatra, I. Sinelnikov, D. Arndt, J. Xia, P. Liu, F. Yallou, T. Bjorndahl, R. Perez-Pineiro, R. Eisner, F. Allen, V. Neveu, R. Greiner and A. Scalbert, Nucleic Acids Res., 2013, 41, D801–D807.
  29. S. Kim, P. A. Thiessen, E. E. Bolton, J. Chen, G. Fu, A. Gindulyte, L. Han, J. He, S. He, B. A. Shoemaker, J. Wang, B. Yu, J. Zhang and S. H. Bryant, Nucleic Acids Res., 2016, 44, D1202–D1213.
  30. H. Singh, S. Singh, D. Singla, S. M. Agarwal and G. P. Raghava, Biol. Direct, 2015, 10, 10.
  31. J. S. Chauhan, S. K. Dhanda, D. Singla, S. M. Agarwal and G. P. Raghava, PLoS One, 2014, 9, e101079.
  32. K. Dhiman and S. M. Agarwal, RSC Adv., 2016, 6, 49395–49400.

Footnote

Equal contribution.

This journal is © The Royal Society of Chemistry 2016