An integrative omics perspective for the analysis of chemical signals in ecological interactions

A. E. Brunetti a, F. Carnevale Neto a, M. C. Vera b, C. Taboada a, D. P. Pavarini a, A. Bauermeister a and N. P. Lopes *a
aPhysics and Chemistry Department, School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP, 14040-903, Brazil. E-mail:
bInstituto de Herpetología, Unidad Ejecutora Lillo, CONICET, CP 4000, Tucumán, Argentina

Received 28th May 2017

First published on 8th November 2017

All living organisms emit, detect, and respond to chemical stimuli, thus creating an almost limitless number of interactions by means of chemical signals. Technological and intellectual advances in the last two decades have enabled the analysis of chemical signals at several molecular levels, including gene expression, molecular diversity, and receptor affinity. These advances have also deepened our understanding of nature to encompass interactions at multiple organism levels across different taxa. This tutorial review describes the most recent analytical developments in ‘omics’ technologies (i.e., genomics, transcriptomics, proteomics, and metabolomics) and provides recent examples of their application in studies of chemical signals. We highlight how studies have integrated the enormous amount of information generated from different omics disciplines into one publicly available platform. In addition, we stress the importance of considering different signal modalities and an evolutionary perspective to establish a comprehensive understanding of chemical communication.

Key learning points

(1) Chemical signals might be regarded as the primary way of conducting information in the biosphere.

(2) High-throughput analytical techniques are valuable tools to identify chemical signals because of: the great diversity of chemical classes; the small quantities in which they act; and their synergistic effect.

(3) Integrating ‘omics’ techniques can be very useful to identify chemical signals within thousands of metabolites released to the environment.

(4) Combining ecological paradigms with integrative analytical methods can facilitate and accelerate the process of unveiling the structure and regulatory pathways of different chemical signals.

1. Introduction

Each organism, including its structure, development, and behaviour, is entirely dependent on the ecosystem.1 Because of this dependence, its actions are governed by the flux of energy and information through an enormous number of interactions. Among all possible signalling modalities (i.e., visual, acoustic, tactile, electrical, thermal), communication by means of chemical signals is the oldest and most widespread mode of interaction.2 It represents the flux of information through a chemical substance between an emitter and a receiver, either from the same species or from different species. There is a chemical signal network at every level of biological organization, from single-cell metabolism to social behaviours.3–5 Fig. 1 shows a schematic representation of a hypothetical forest patch; it illustrates some of the many chemical interactions that can occur in a single, small area. Understanding this complex system is therefore fundamental to a better comprehension of ecological behaviour, which could provide new concepts for different fields of science.
Fig. 1 Organism interactions in a hypothetical terrestrial environment. Interactions occur simultaneously above and below ground, within and across multiple taxonomic levels, either at the same time or in different timescales. Examples of signalling interactions shown in the figure include: (A) plant–animal; (B) host animal–bacterial symbiont; (C) plant–animal–animal (three levels of interaction); (D) between animals (intraspecific); (E) between microbes (intra- and interspecific); (F) host plant–bacterial symbiont; (G) plant–fungal; (H) host animal–fungal symbiont; (I) animal–animal (interspecific). Specimens are not drawn to scale. Some animal specimens (e.g., a mouse and spiders) are not readily visible in the drawing.

In 1959, Butenandt and co-workers6 described the first pheromone, a volatile 16-carbon alcohol, which was named bombykol after the moth's Latin name Bombyx mori. These authors conducted their research by combining the analytical instrumentation available at the time, such as gas chromatography-mass spectrometry (GC-MS) and nuclear magnetic resonance (NMR), with bioassay methods. This discovery opened a new research frontier in the biological sciences: it demonstrated that animals do indeed use chemical signals to communicate and that these signals can be identified by different analytical methods.7 Since then, researchers from all over the world have identified thousands of chemical signals in species across several taxa and in inhabitants of all environments and microenvironments, including deserts and forests above and below ground, seashores, temporary ponds, rivers, lakes, and oceans.

Because chemical signals act in such diverse environments and organisms, they are represented by a broad array of molecules and exhibit great variation in their chemical and physical properties. They include small water-soluble molecules, volatile organic compounds, proteins, and large saturated hydrocarbons.7 Scientific advances in different ‘omics’ technologies (i.e., genomics, transcriptomics, proteomics, metabolomics) and in their meta and holomics analogues have facilitated the identification and characterization of several of these molecules, even at very small concentrations and in complex biological matrices.7 These advances have also allowed the exploration of organism interactions that were unimaginable only a few years ago, such as those between plant–microbe, animal–microbe and microbe–microbe.8 This revealed a network connectivity across multiple levels of biological organization, which resembles the food web.4 Findings derived from these studies profoundly alter our conception of the natural world and, at the same time, pose deep challenges for biologists, chemists, and scientists from all disciplines interested in the study of chemical signals.

In this tutorial review, we present holistic approaches to the natural world that may be useful to address the role of chemical signals, as exemplified by the holobiont and infochemical web concepts. We introduce the most significant historical developments in analytical tools and present an overview of the state-of-the-art and emerging advances (technological and intellectual) in different omics techniques that have contributed to our understanding of chemical signals. Building on the perspective of an integrative approach, we discuss the importance of integrating the enormous volume of information generated from each omics discipline and, particularly in animals, the contribution of other signal modalities. Finally, given that living processes have originated through evolution, we highlight that phylogenetic analysis represents an invaluable tool to understand the evolution of interactions in different organisms and of the molecules involved. We hope this tutorial provides a concise overview of analytical tools and important concepts from different scientific areas, such as chemistry, biology, and informatics; together, this information can enable students and young researchers to investigate the compounds that mediate ecological interactions from an integrative omics perspective.

2. Chemical communication

Signals are the building blocks of communication.9 Despite subtle variations, signals are considered to be sources of information in the form of energy or matter, acting in a communication process between the emitter (as the message) and the receiver (as the meaning). In chemical communication, the signal is composed of one or more molecules released to the environment and detected by the receiver through chemical receptors.7 Animals, plants, and microorganisms respond to the molecules perceived by signalling biochemical cascades, while animals also exhibit concerted complex neural events.

Several terms are used to denote different chemical signals, depending on whether communication occurs between individuals of the same or different species (intra- or interspecific, respectively) and on the net effect on the emitter and receiver (positive or negative; see Table 1).7 The first term used in the literature referring to animal communication was ‘pheromone’, introduced by Karlson and Lüscher in 1959.10 It was proposed to refer to substances secreted to the outside of an individual; this was meant to represent communication between members of the same species, in contrast to hormones, which are secreted into the blood and serve humoral correlation within the organism. The different terms for chemical signals may be very useful to examine specific interactions, but they may prove unsatisfactory when multiple types of interactions in the natural world are considered, either at the same time or in different timescales (Fig. 1).

Table 1 Terms and emerging ecological concepts that can be found in chemical signal studies
Term: Definition
Allelochemicals: Semiochemicals mediating interspecific interactions (i.e., allomones, synomones, kairomones)
Allomones: Semiochemicals that favour the emitter
Cues: Phenotypic traits (e.g., morphological, acoustic, olfactory traits) that arise in a context other than communication and are maintained due to different selection pressures. These traits may, however, be co-opted and have an effect in communication (exaptation)
Holobiont: The ecosystem that is an individual animal and its many microbial communities. This term could be extended to other taxonomic units (e.g., plants and microbial communities)
Kairomones: Semiochemicals that favour the receiver
Pheromones: Semiochemicals mediating intraspecific interactions (e.g., sexual pheromones, alarm pheromones)
Semiochemicals (also infochemicals): Chemical compounds involved in interactions between organisms
Signals: Cues that have at least partly been modified by natural selection for their effectiveness in communication roles (adaptation)
Signature mixtures: Semiochemicals presented as chemical mixtures mediating intraspecific interactions (e.g., kin recognition)
Synomones: Semiochemicals that favour both the emitter and the receiver
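
The decision logic implied by Table 1 can be written out as a short sketch. The function name and its boolean arguments are hypothetical and are introduced here only for illustration; the sketch covers pheromones and the three allelochemical classes, and it deliberately ignores signature mixtures, which are also intraspecific.

```python
def classify_semiochemical(same_species: bool,
                           benefits_emitter: bool,
                           benefits_receiver: bool) -> str:
    """Map an interaction onto the terms of Table 1 (illustrative helper)."""
    if same_species:
        # Intraspecific semiochemicals are pheromones, whoever benefits.
        return "pheromone"
    # Interspecific semiochemicals (allelochemicals) are split by net benefit.
    if benefits_emitter and benefits_receiver:
        return "synomone"
    if benefits_emitter:
        return "allomone"
    if benefits_receiver:
        return "kairomone"
    return "unclassified allelochemical"


# A floral scent that attracts a pollinator benefits both partners.
print(classify_semiochemical(False, True, True))
```

The last branch is a reminder that real interactions do not always fit these categories, which is one reason the terminology can prove unsatisfactory when several interactions are considered at once.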

2.1 An integrative view of chemical signalling

Research on chemical communication has traditionally focused on the identification and behavioural examination of molecules in a direct two-level interaction (emitter–receiver), usually under controlled laboratory conditions.7 Since the discovery of bombykol, several research groups have provided extremely valuable information about the structure and function of chemical signals in a diverse group of organisms. However, these approaches led to numerous questions, thus far unsolved, regarding the interactions between organisms in the natural world. A series of recent and surprising studies has suggested a number of holistic concepts of the world. In 1991, Margulis and Fester11 introduced the term ‘holobiont’, referring to symbiotic associations throughout a significant portion of an organism's lifetime. Currently, this term considers each individual as an ecosystem in itself, that is, a system or complex represented by the host and its associated microbiota.3,8 From this perspective, chemical communication at intra- and interspecific levels arises from the dynamic interaction networks between each of these complexes. The holobiont concept is nowadays used as a reference framework for several behavioural studies, which have become feasible thanks to technological advances in the field of genomics.

At a broader scale, the holobiont is nested within communities of other organisms that comprise a web of interrelations and link species together into a dynamic ecosystem. These relationships are almost limitless and can involve many possible kinds of interactions, such as animal–plant, symbiont–host, parasite–host, predator–prey.4 This ecological concept was presented several decades ago as food webs, which represent the flux of material and energy derived from trophic interactions. Expanding this ecological concept, some authors proposed the term ‘infochemical web’ to refer to the information flow at all biological assembly levels.4 By adopting these emerging concepts to study communication, scientists are challenged to broaden their appreciation of organism interactions. Using them will, however, facilitate analyses of the structure and function of chemical signals in a multidimensional context and across timescales, as they occur in the natural world.

2.2 Diversity of chemical signals

Chemical signals include an enormous array of molecules with an extraordinary variation in their physical and chemical properties, including size, polarity, chirality, chemical reactivity, and stability. The diversity of molecules is affected by the environment in which they are secreted. In terrestrial environments, where they are often spread by air, they have a small molecular mass, and usually they are lipophilic and volatile molecules. For example, 4-methyl-3-heptanone is an ant alarm pheromone, macrolides represent amphibian volatile pheromones, and alkenes, alcohols, benzenoids, aldehydes, ketones, and terpenes have been implicated in microbe–microbe and plant–microbe interactions in the rhizosphere.5,7 They can also be transmitted through direct contact between the emitter and receiver. In this case, they tend to be more hydrophilic as exemplified by the plant allelochemical hordenine or the plethodontid modulating factor, which is a protein pheromone used during mating by some terrestrial salamanders.

In aquatic domains, chemical signals can be much more variable since they can be transported independently of their physicochemical properties. For instance, they include small lipid-soluble molecules, peptides and proteins.7 Under these conditions, the chemical signals must be very specific, and chemical receptors should be very sensitive in order to detect the signal at very low concentrations. Because of the great diversity of molecules, of the types of interactions that they mediate, and of the environments in which they are secreted, analysing chemical signals requires emerging biological concepts and broad technical approaches.

2.3 A general strategy for studying chemical signals

An initial consideration when addressing the complexity of chemical interactions in the natural world is that successful research depends on both technological and intellectual contributions. Regarding technological contributions, the use of different instrumentation and data analysis capabilities has allowed researchers around the world to generate a large volume of information. In addition, the scientific community is making a great effort to deposit all generated data in the public domain to make it freely accessible to any researcher. This gives a larger number of researchers, and the public in general, access to knowledge, and may allow them to use, share and discuss their results in the context of larger datasets.

Regarding the intellectual contributions, it is important to emphasize that, given the vast amount of information that is continuously generated, these contributions are essential to take advantage of publicly accessible databases as well as of the different available tools. These contributions include: asking the right question; formulating a good hypothesis; and designing the experiment that will allow the hypothesis to be tested. Fig. 2 shows an overall picture of the different steps of research based on the scientific method. This strategy may be helpful before, during and after the experimental process. For instance, among the topics indicated in Fig. 2, curiosity is not only the first step, but also one of the greatest motivations in science. It represents the driving force that generates a continuous stream of questions, such as: do plants communicate with each other? Is there any volatile substance involved in the process? Can they tell each other about the presence of a phytophagous insect? The combination of technological and intellectual advances can now revolutionize the way we see and understand the world.

Fig. 2 A proposed general strategy for studying interactions mediated by chemical signals under an integrative omics approach. This strategy gives a general overview of the different steps of the scientific method, and highlights some important aspects that can serve as a guide for undergraduate and graduate students interested in chemical ecology.

3. Omics in chemical signalling

Since the early days of chemical ecology, the identification of the diverse set of chemical signals has been dictated by bioassay-guided isolation methods.7 Generally, these methods include a biological screening step (e.g., a behavioural test) of the crude extract, followed by multiple steps of separation until semi-purified or pure compounds are obtained (these are elucidated by spectroscopic and spectrometric techniques, which are discussed later in topics 3.1 to 3.3). Fig. 3 shows some specific scientific developments, both technical and intellectual, that have greatly contributed to investigations of chemical signals. Due to initial limitations of analytical tools, such as low sensitivity, specificity, and flexibility, the identification process was time-consuming and required large amounts of biological material.7 For instance, the identification of bombykol took over twenty years and required hundreds of thousands of insects, even though its structure is not complex. Current advancements enable the examination of active molecules at micro-, nano-, or even femtomolar scales, which minimizes the use of animals and plants, among others, as will be discussed later. These technological advancements combine well-known spectroscopic and spectrometric strategies and give support to emerging fields of large-scale data.
Fig. 3 Timeline of technological and conceptual developments that have largely contributed to the study of chemical signals. The topics were selected to put scientific progress into perspective, and they do not cover all key discoveries.

Many of these emerging fields designate the specific object of study by adding the suffix ‘-omics’ to previously used terms, for example proteomics for the study of proteins. Omics technologies provide a holistic view of a biological system in response to chemical signals, considering all levels of biomolecules. This achievement is due to the development of analytical tools and high-throughput approaches, which generate progressively larger data sets. The great amount of information requires the use of powerful informatics tools that are capable of analysing data, annotating them against one or more reference databases, and integrating all omics fields, namely genomics, transcriptomics, proteomics and metabolomics. Overall, the emergence of omics has revolutionized many areas of science. This topic presents the main analytical tools used in omics applied to chemical signal studies and discusses their advantages and limitations. A general outline of some methodological procedures covered in this tutorial is depicted in Fig. 4.

Fig. 4 Flow chart of a typical ‘omics’ experiment for the three fields covered in this tutorial review. Although each omics field has its distinctive characteristics, from the sample acquisition to data analysis, they go through the same basic steps. See Fig. 5–7 for further methodological details.

It is worth noting that the omics tools presented in this tutorial aim to accelerate and facilitate the identification of candidate molecules that may modulate ecological interactions. However, biological assays, such as behavioural experiments, are usually necessary in order to test hypotheses about their function as chemical signals (Fig. 2). It is beyond the scope of this review to discuss the biological assays, but in the context of the scheme shown in Fig. 2 it is important to mention that the experimental design can take into account data acquisition and analysis as well as functional validation.

3.1 Transcriptomics

Transcriptomics is among the earliest of the omic sciences. It studies the mRNA expression of a particular tissue at a given moment, which helps to understand the relationship between protein-coding genes and cell functioning.12 Because there are several pathways at which the genome expression can be regulated, the transcriptome content can drastically vary in response to biotic and abiotic environmental factors.

Altered gene expression levels in different samples may provide some clues about what genes are active in specific cells and/or tissues.12,13 In addition, comparative transcriptomics can potentially uncover the function of unknown genes. For example, if an unknown gene is expressed in male exocrine glands but not in females nor in other male tissues, it may be involved in reproduction. Thus, the discipline of comparative transcriptomics seeks differential gene expression, which may provide important information for understanding the function of such genes and the biological process in which they participate.
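
The reasoning in the example above (a gene expressed in male exocrine glands but not in females or other male tissues) can be sketched as a simple filter over an expression table. This is only an illustration: the function name, the sample labels, the expression values, and the two thresholds are all hypothetical, and the values are assumed to be already normalized.

```python
def tissue_specific_genes(expression, target, min_expr=10.0, max_background=1.0):
    """Return genes expressed in `target` but essentially absent elsewhere.

    expression: dict mapping gene -> dict mapping sample -> normalized value.
    """
    hits = []
    for gene, by_sample in expression.items():
        # Expression in every sample other than the target one.
        background = [v for sample, v in by_sample.items() if sample != target]
        expressed_in_target = by_sample.get(target, 0.0) >= min_expr
        silent_elsewhere = all(v <= max_background for v in background)
        if expressed_in_target and silent_elsewhere:
            hits.append(gene)
    return sorted(hits)


# Hypothetical normalized expression values for three genes.
expression = {
    "geneA": {"male_gland": 85.0, "female_gland": 0.2, "male_liver": 0.0},
    "geneB": {"male_gland": 40.0, "female_gland": 38.0, "male_liver": 41.0},
    "geneC": {"male_gland": 0.5, "female_gland": 0.1, "male_liver": 12.0},
}
print(tissue_specific_genes(expression, "male_gland"))  # ['geneA']
```

A gene flagged in this way is only a candidate; as noted earlier, its role in reproduction or signalling would still need to be tested with biological assays.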

In a typical transcriptomic analysis (Fig. 5A), the expression levels of hundreds (or even thousands) of genes are measured to understand the mRNA profile. Because of the large amount of genetic material that is examined simultaneously in a given sample, high-throughput sequencing technologies are needed to estimate the expression of constitutive genes and to compare it with the expression of genes that are differentially expressed.13 Next-generation sequencing (NGS) methods have some advantages over DNA microarrays (for instance, they can evaluate transcript levels in both sequenced and unsequenced organisms), and NGS is therefore currently the most widely used sequencing technique. Currently used NGS systems include the Genome Analyzer (developed by Solexa, now part of Illumina) and the Ion Torrent NGS.

Fig. 5 Overview of a typical RNA-seq experiment. (A) RNA is isolated from the sample and reverse transcribed to cDNA. It is then fragmented into short fragments (25–450 bp), adapter ligated and sequenced. The reads are then either aligned to a reference genome or de novo assembled if no genomic data are available. (B) Modern high-throughput sequencing technologies have allowed researchers worldwide to assess many comparative biology hypotheses. (C) Some examples of the use of transcriptomics for the study of chemical signals.

In this tutorial review we give only a brief description of the analytical procedures of RNA-seq, which are summarized in Fig. 5A. For more methodological and practical considerations of the different steps of RNA-seq, a list of further references is provided in the ESI. The articles included in this list represent only a few examples of the available bibliography and aim to provide a useful guide, especially for young researchers who would like to become familiar with RNA-seq procedures.

The first step is to construct an RNA-seq library; the procedure depends on the protocol employed and on the organism of interest. In eukaryotes, the most traditional procedure includes the following: enrichment of mRNA using magnetic oligo(dT) beads, RNA shearing, and reverse transcription to cDNA. In prokaryotes, mRNA enrichment can be accomplished by removal of polyadenylated RNA as well as ribosomal RNA (both eukaryotic and bacterial). An example of a method to perform this removal is ‘cappable-seq’, which allows direct enrichment for the 5′ end of primary transcripts.14 As an alternative to reverse transcription, a single molecule of mRNA can be sequenced without prior conversion to cDNA, which is particularly suitable for short or degraded RNA samples.

The cDNA (or, where applicable, the single-molecule mRNA) is sequenced on a high-throughput platform, such as Illumina, which analyses millions of short DNA fragments (or reads) of 25 to 450 base pairs during one sequencing run. The reads are then aligned to a reference transcriptome (or, more often, to a reference genome) or assembled de novo when genomes are not available. With third-generation sequencing technologies, transcriptome assembly may become unnecessary in the future. There are two commercially available third-generation sequencing services, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT).15 Both are based on single-molecule sequencing platforms that allow longer read lengths than those of Illumina. However, both have similar drawbacks: a higher sequencing error rate and lower throughput. Because long reads improve de novo assembly, combining third-generation sequencing technologies with Illumina sequencing may facilitate transcriptomics studies in non-model species.

After assembly, the gene expression level is estimated by mapping the reads to a specific gene. Subsequently, the read profile is normalized (to avoid bias in read quantification), and differential expression (DE) of genes is detected and compared by statistical tests. Alternatively, DE can be examined by real-time quantitative reverse transcription PCR (Real-Time qRT-PCR).16 This method is more accurate and easier to perform than transcriptome comparisons because it amplifies a specific gene, but it requires prior information about the transcript (or gene) sequence.
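
The role of normalization described above can be illustrated with a minimal sketch. It assumes counts-per-million (CPM) scaling and a log2 fold change with a pseudocount, which is only one simple option among several normalization schemes; the gene names and read counts are hypothetical, and no statistical test is performed (a real analysis would use replicates and a dedicated statistical method).

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(cpm_ref, cpm_test, pseudocount=1.0):
    """log2(test / ref) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((cpm_test[gene] + pseudocount) / (cpm_ref[gene] + pseudocount))
        for gene in cpm_ref
    }


# Hypothetical raw counts: the male library was sequenced four times as deeply.
female = {"gene1": 500, "gene2": 500}
male = {"gene1": 1000, "gene2": 3000}

lfc = log2_fold_change(counts_per_million(female), counts_per_million(male))
for gene, value in lfc.items():
    print(gene, round(value, 2))
```

In this toy data set the raw count of gene1 doubles in the male library, yet after scaling for sequencing depth its relative expression is about half that of the female sample. This is the kind of quantification bias that normalization is meant to remove.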

Transcriptomics is a very valuable tool with a wide range of applications in studies of chemical signals. For instance, it can reveal how gene expression varies between sexes or along ontogeny, or even trace the evolutionary history of a specific class of molecules, such as protein pheromones and soluble protein receptors, through phylogenetic analyses (Fig. 5B). For example, using RNA-seq, Novo and co-workers (2013)17 determined that Attractin and Temptin, two invertebrate peptide sex pheromones, were present in all tissue samples of two species of earthworms (Hormogaster samnitica and H. elisae). However, they found that Temptin, which contained multiple paralogs, was slightly overexpressed in the digestive tissue, which led them to suggest that this pheromone could be secreted with the casts. In a different study, tissue-specific RNA-seq combined with phylogenetic analysis provided new information about the function and evolution of odorant-binding and chemosensory proteins in ants.18 The authors indicated that these two families of small soluble proteins are highly conserved across ants, and they suggested that these proteins are not likely to be involved in the recognition of species-specific signals. An additional example of the application of transcriptome analyses in chemical signal studies is depicted in Fig. 5C.19

3.2 Proteomics and peptidomics

The proteome of an organism encompasses the whole ensemble of proteins expressed at a particular time. In contrast to the genome, the proteome is dynamic and spatiotemporally diverse within an organism. Proteins outnumber genes because of different posttranscriptional mechanisms, as exemplified by splicing and truncation, and several hundred different posttranslational modifications. In addition, the proteome is responsive to different physiological and environmental stimuli. Altogether, this plasticity turns the proteome into one of the largest information spaces of chemical signals at all levels of communication. A list of further references on methodological procedures and protocol reviews in proteomics is included in the ESI. Since the aim of this tutorial is to provide a methodological overview, these references may help early-career researchers to gain a more comprehensive view of the field.

Proteins and peptides may be involved in different ways in the whole process; they act (1) as pheromones/allomones themselves (e.g., major urinary proteins in mice [MUPs]),20,21 (2) as carriers of the signalling compounds (e.g., natural ligands of hamster aphrodisin),22 or (3) as signal receptors. The importance and widespread nature of proteins in communication, as well as the broad availability of modern high-throughput analytical methodologies, have turned proteomics into a sine qua non in modern chemical communication studies. Traditionally, proteomic research encompasses all experimental analysis of proteins, including (1) expression proteomics, which aims at large-scale protein identification and at the description and quantification of differential expression patterns; (2) functional proteomics, which studies protein functions and interactions; and (3) structural proteomics, whose scope extends to the study of the three-dimensional arrangement of amino acid residues and atoms in proteins.

Common longstanding techniques for the study of expression proteomics have recently been aided by the constantly increasing availability of genomic, transcriptomic, and proteomic databases. Typical workflows aimed at protein identification in a particular organelle, cell, tissue, or organism are depicted in Fig. 6. At the root of every methodology there is a common protein enrichment and isolation step from the target tissue, which reduces the biochemical complexity of the sample and can proceed by the depletion of the highly abundant proteins (Fig. 6A).23 The protocols used will depend largely on the biological question being addressed. Proteins or peptides can undergo a fractionation step, depending on the desired depth of proteome coverage as well as on prior hypotheses regarding the nature of the proteins involved in chemical signalling, their physicochemical properties, and their size.

Fig. 6 Overview of a common proteomic workflow used for protein identification. (A) Once the biological sample of interest is selected and isolated, the first step involves a protein enrichment protocol. Mass spectrometry (MS) driven proteomics then proceeds by (B) top-down experiments that measure whole proteins after a separation step, or (C and D) bottom-up experiments that begin with an enzymatic or chemical proteolytic step followed by the analysis of peptides. Protein digestion can be performed on the whole protein mixture followed by a chromatographic separation step of the peptides (C) or on isolated proteins (commonly separated by 2D polyacrylamide gel electrophoresis (PAGE)) (D). (E) Peptides are analyzed by MS and/or MS/MS and compared to theoretical data from in silico digestions of protein sequences available in databases. (F) If there are no available sequences from the studied organism (or from closely related species), peptides usually need to be de novo sequenced and searched against databases for homology-driven identifications. (G) Some examples of the use of proteomics for the study of chemical signals.

Techniques that utilize mass spectrometry are then conducted for protein identification. Different strategies have been developed to approach this goal. First, top-down techniques analyse whole proteins by MS. In general, intact proteins undergo a separation step that usually consists of one liquid chromatography variant (Fig. 6B).24 Afterwards, fragments of the protein are generated by collisional activation, also known as collision-induced dissociation (CID), inside the mass spectrometer. The combination of modern multi-stage mass spectrometry (MSn) and tandem mass spectrometry (MS/MS) analysis, applying different ionization sources (e.g., electrospray ionization (ESI), matrix-assisted laser desorption ionization (MALDI)), analysers (e.g., tandem time-of-flight (TOF/TOF), Orbitrap) and fragmentation techniques (CID, electron-capture dissociation (ECD), and higher-energy collisional dissociation (HCD)), allows for accurate intact protein mass determination, N- and C-terminus sequencing, protein isoform identification, and analysis of posttranslational modifications (Fig. 6B).24,25

Bottom-up techniques are by far the most widespread among proteomic studies and commence with the proteolytic digestion of a protein mixture (Fig. 6C) or of isolated proteins (usually separated by two-dimensional denaturing polyacrylamide gel electrophoresis, Fig. 6D). The peptide mixtures obtained in each case can be fractionated by chromatography before injection into the mass spectrometer, or they can be used in MALDI experiments. MS analyses (Fig. 6E top) result in peak lists that correspond to proteolytic peptide masses, which are the input for peptide mass fingerprinting (PMF, Fig. 6E). In addition, MS/MS experiments provide fragment ions of the peptides, which carry more detailed information on their amino acid sequence. Both high resolution mass spectrometry (HR-MS) and tandem MS data can be compared, by means of search engines such as MASCOT and SEQUEST, with in silico digests of the proteins present in available databases.
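
The comparison between observed peptide masses and an in silico digest can be illustrated with a minimal sketch. It is not how MASCOT or SEQUEST work internally; it only assumes trypsin as the protease (cleavage after K or R unless followed by P, no missed cleavages), neutral monoisotopic masses, and an arbitrary 20 ppm tolerance. The function names and the example sequence are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide (free N- and C-termini)


def tryptic_digest(sequence):
    """Cleave after K or R, except before P (assumed rule, no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_peaks(observed_masses, sequence, tol_ppm=20.0):
    """Return (observed mass, peptide) pairs that agree within tol_ppm."""
    matches = []
    for peptide in tryptic_digest(sequence):
        theoretical = peptide_mass(peptide)
        for observed in observed_masses:
            if abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm:
                matches.append((observed, peptide))
    return matches


# Hypothetical database sequence and deconvoluted (neutral) peak list.
print(match_peaks([274.1641, 500.0], "GAKSPRAKPG"))
```

A real search engine scores many candidate proteins and accounts for modifications, missed cleavages and charge states; the sketch only shows why a sequence database is a prerequisite for this kind of identification.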

Protein identification relies largely on available sequence information (DNA, RNA, proteins). Hence, proteomics in many non-model organisms depends on other bioinformatics tools that predict peptide sequences de novo (e.g., PEAKS, PepNovo, pNovo), followed by homology-driven protein identification (e.g., msBLAST, FASTS) (Fig. 6F).26 This last approach is especially helpful for the study of peptidic semiochemicals present in scent marks or extracellular fluids. Judicious combinations of proteases followed by de novo sequencing pipelines have proven effective for the full sequencing and identification of small proteins, even in non-model organisms.27 When semiochemicals can be traced to a particular tissue of origin, technologies such as next generation sequencing (NGS) (Section 3.1) are very useful because they broaden access to nucleotide sequence information, making high-throughput proteomics possible even in non-model organisms. Combining transcriptome data with the proteomic workflows outlined above makes protein identification considerably more straightforward.

Quantification strategies based on high resolution mass spectrometry have evolved profoundly in recent years and represent a vast improvement over binary approaches that rely solely on protein identification. Differences in concentration among tissues, sexes, or ontogenetic stages provide useful grounds to explore putative sources of protein chemical signals that can later be tested experimentally. For instance, the discovery of a protein that is highly expressed in an organ or tissue known to be involved in social interactions, relative to its expression in other tissues, may provide fruitful avenues for further behavioural research. In fact, proteomic quantification strategies have proven valuable in chemical signalling studies, such as those on the induction of female post-mating behaviour by seminal fluid proteins (Sfps) in Drosophila species.28 Several quantification methodologies exist, spanning labeled and label-free techniques, which aim at either relative or absolute determinations. The current advantages and limitations of each technique have been thoroughly investigated, and a review of updated literature in the field is presented in the ESI. Developments in proteomics are likely to uncover many protein and peptide semiochemicals across a wide range of taxa, as well as to contribute to the elucidation of signalling pathways. Three examples of the application of proteomic techniques in the study of chemical signals are depicted in Fig. 6G.28–30

3.3 Metabolomics

Chemical signals were traditionally studied by bioassay-guided isolation. This procedure has some limitations, such as the need for large amounts of sample and for many separation and purification steps. Moreover, it is not unusual for the isolated molecules to lack the biological effect, because they act only in synergy with other molecules present in the original complex mixture. Recent improvements in equipment and technologies have allowed the introduction of the metabolomic concept into chemical ecology investigations, providing tools for analysing important chemical mediators even at low concentrations and in complex mixtures.31 This section briefly presents some metabolomic strategies for the study of chemical signals in an ecological context. Some recent technological improvements and traditional analytical tools are highlighted, and their importance is discussed in the context of this paper. In addition, some review articles, including seminal papers in MS- and NMR-based metabolomics, are provided as ESI. This material may help new investigators to complement the content of this section with more detailed methodologies.

Although the word ‘metabolome’ was already used in 1998, the term ‘metabolomics’ was introduced only in 2001. It refers to the process of characterizing the complete set of non-genetically encoded substrates, intermediates, and products of metabolic pathways (<1500 Da) associated with a given biological system, such as a cell, tissue, or organism, in a particular physiological state.32 The exploration of metabolic composition through metabolomics has led to new insights into diverse biological processes. It allows the screening of relevant mediator metabolites involved in interactions between organisms, which in turn prompts the elucidation of regulatory principles and pathways. The metabolomic concept has recently been expanded to obtain spatial information that considers three-dimensional content. While there is a need to clarify what is meant by three-dimensional content, this approach will certainly revolutionize chemical signalling investigations.33

Metabolomics has also been widely used by chemical ecologists as a “comparative metabolomics” approach, which enables easy visualization of the molecules upregulated in biological interactions. In this type of approach, it is first necessary to obtain the metabolic profile of each organism on its own, and then to examine the metabolic profile when the organisms of interest are allowed to interact. Metabolites that are candidate chemical signals can thus be highlighted in the metabolic profile, directing and accelerating their identification. Specific chemical groups, or even the global metabolome, can be monitored by comparative metabolomics. Because the metabolome by definition includes all small molecules produced by an organism, which implies large and complex data sets, a combination of sophisticated equipment and elaborate strategies is needed to analyse as many metabolites as possible. In addition, the strategy may uncover biosynthetic pathways that are affected by the presence of other organisms, which helps to understand interactions in different environments.31
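
The comparative logic described above can be sketched in a few lines. The example assumes that feature intensities have already been extracted and aligned across samples, and it flags features whose mean intensity in the interaction exceeds that in either single-organism profile by a chosen fold change. The function name, the feature identifiers, the two-fold threshold and the noise floor are illustrative assumptions, not values taken from the cited studies.

```python
from statistics import mean


def upregulated_features(mono_a, mono_b, coculture, min_fold=2.0, floor=1.0):
    """Return {feature: fold change} for features enriched in the interaction.

    Each argument maps a feature ID to a list of replicate intensities.
    `floor` stands in for the noise level, so that features absent from
    both single-organism profiles do not cause a division by zero.
    """
    hits = {}
    for feature, values in coculture.items():
        baseline = max(
            mean(mono_a.get(feature, [0.0])),
            mean(mono_b.get(feature, [0.0])),
            floor,
        )
        fold = mean(values) / baseline
        if fold >= min_fold:
            hits[feature] = fold
    return hits


# Hypothetical aligned intensities (arbitrary units) for three features.
organism_a = {"f1": [100, 120], "f2": [50, 50]}
organism_b = {"f1": [10, 10]}
interaction = {"f1": [110, 110], "f2": [200, 200], "f3": [30, 30]}
print(upregulated_features(organism_a, organism_b, interaction))
```

In practice a statistical test across replicates would accompany the fold change; the sketch only shows how candidate signals are narrowed down before identification.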

For all metabolomic studies, each step of the procedure must be carefully defined, from the choice of the solvent used for extracting the metabolites from the biological sample (Fig. 7A) to the detector used and its parameters. First of all, because sample collection and preparation are crucial steps in metabolomics, researchers need to define which chemical classes they want to evaluate. Organic solvents are usually employed to extract compounds from the environment, and also to partition crude extracts. In general, solvents of higher polarity, such as methanol, ethanol and water, should be used to select more polar compounds; likewise, solvents of lower polarity, such as hexane and ether, should be employed to select non-polar compounds. It is important for new users to keep in mind that there is no universal solvent for such a great diversity of metabolites; thus, combining different strategies is highly recommended to obtain a clearer picture of the phenomenon.

Fig. 7 Overview of a generalized metabolomic protocol. (A) Sample preparation should be carefully defined. This step is crucial to select the kind of compounds to be evaluated (according to their solubility, e.g. partition of compounds in the water/dichloromethane system). (B) A common workflow begins with one or more separation steps (e.g. liquid chromatography) designed to increase the number of detectable molecules, including those at low concentration. (C) The most common detection techniques are mass spectrometry (MS), nuclear magnetic resonance (NMR), and ultraviolet-visible (UV-VIS) and infrared (IR) spectroscopies. (D) Large data sets integrating the combined spectroscopic/spectrometric data can be analyzed by multivariate statistical analysis and depicted graphically for better visualization. Different databases can be used for data annotation (e.g. the public Global Natural Products Social Molecular Networking (GNPS) platform, which provides visualization by molecular networking). (E) Three examples of metabolomics applied to the investigation of chemical signals.

Given the great diversity of chemical structures, many analytical tools have been developed to broaden the range of metabolomic studies. Because natural products occur as complex mixtures, separation techniques (Fig. 7B) are almost indispensable in metabolomics: they increase the number of detected molecules and thus enable the analysis of compounds present at low concentrations. Gas chromatography (GC) and liquid chromatography (LC) are the separation techniques most widely applied in metabolomics. In particular, LC is preferred because it can be flexibly coupled to many detection techniques (Fig. 7C), such as mass spectrometry (MS), nuclear magnetic resonance (NMR), ultraviolet (UV), and infrared (IR). Capillary electrophoresis (CE) is also used in some metabolomic studies, particularly for charged molecules.

Several separation approaches have recently been developed and introduced into metabolomics studies to improve the separation steps and to allow the analysis of smaller amounts of sample. Ultra-high pressure liquid chromatography (UHPLC) increases chromatographic resolution owing to the smaller particle size of the column stationary phase. Another important advance is the microfluidic device, an emerging multi-function, miniaturized separation technique. Interest in microfluidics has recently grown because miniaturization of the system helps with the preparation, separation, and detection of small quantities of sample, and some devices are already commercially available. A key example concerns the chemical signals involved in the ability of sea urchin sperm to reach a conspecific egg, which have been studied for over a century. Owing to the lack of further evidence, this event had been attributed only to male aptitude. The use of a microfluidic chip coupled to ESI-MS allowed Hussain and co-workers34 to show that individual fertilization success is mediated by sperm chemotaxis: when the chemoattractant was removed from the gametic environment, fertilization success decreased.

As described above, chromatographic techniques are very important in metabolomics because they provide more detailed information about the samples. One of the key aspects of chromatography is that it can be coupled to different detection techniques. Consequently, the choice of detection technique is extremely important, since it limits the group of molecules that can be analysed according to their physicochemical properties. Mass spectrometry (MS) is the detection technique most commonly used in metabolomics. Recent advances have made MS more selective and sensitive, lowering detection limits. In addition, this technique can provide the molecular formula from high resolution (HR-MS) analysis and important structural information from tandem MS (MS/MS) experiments.35 This review also aims to call attention to important steps in the structure elucidation of chemical signals that cannot be directly identified through MS and MS/MS databases or molecular networking (Fig. 7A and B).
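
How an HR-MS measurement constrains the molecular formula can be shown with a short worked example. The sketch computes the theoretical m/z of the protonated molecule, [M+H]+, for each candidate formula and keeps those within a mass error tolerance expressed in parts per million. The 5 ppm tolerance, the restriction to C, H, N, O, S and P, the assumption of a singly protonated ion, and the function names are choices made for illustration only.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.00307401,
    "O": 15.99491462, "S": 31.97207117, "P": 30.973762,
}
PROTON = 1.00727647  # mass of H minus one electron


def formula_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(observed_mz, formula):
    """Signed mass error (ppm) assuming a singly protonated ion [M+H]+."""
    theoretical = formula_mass(formula) + PROTON
    return (observed_mz - theoretical) / theoretical * 1e6


def plausible_formulas(observed_mz, candidates, max_ppm=5.0):
    """Candidates within max_ppm of the observed m/z, best match first."""
    kept = [f for f in candidates if abs(ppm_error(observed_mz, f)) <= max_ppm]
    return sorted(kept, key=lambda f: abs(ppm_error(observed_mz, f)))


# Two nominally isobaric candidates are separated by accurate mass.
print(plausible_formulas(181.0707, ["C6H12O6", "C7H16O5"]))
```

Accurate mass alone rarely yields a single formula for larger molecules, which is why isotope patterns and MS/MS fragmentation are used alongside it.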

Overall, MS data acquisition can be split into two main procedures: targeted and untargeted analyses. Targeted analyses focus on known or specific molecules, whereas untargeted approaches provide data on all constituents, generating a fingerprint. Untargeted methodologies are the most widely used in metabolomics studies, including the investigation of molecules upregulated during interactions. Both targeted and untargeted analyses usually rely on a combination of two or more mass spectrometry instruments in order to cover metabolites with a wide range of polarities. As an example, over the last five years our group has sought to understand how geographic distribution and taxonomy influence the interaction of macro- and microorganisms associated with soft corals on the Brazilian coast (unpublished data). A much more inclusive perspective of the interaction is achieved when GC-MS, LC-DAD-IT, LC-DAD-TOF and MALDI-TOF/TOF analyses are combined. This example is intended to call students' attention to the importance of aligning the choice of analytical platforms with the hypothesis and experimental design, as suggested in Fig. 3.

It is worth noting, particularly for new users, that untargeted LC-MS/MS metabolomics experiments collect more information than can be annotated.36 As recently suggested by Aksenov and co-workers,36 on average only 2% to 10% of this information, depending on the matrix complexity, can be annotated. In this situation, unidentified compounds must either be isolated for classical structure elucidation protocols (applying UV, IR, NMR and MS; Fig. 7B) or be elucidated by MS/MS on the basis of reasonable proposed fragmentation mechanisms (grounded in organic chemistry principles).35 Both data sets are considered essential to assist with the identification and characterization of the chemical structures. In addition, synthesis or semi-synthesis can be useful to confirm the proposed structure and to supply large amounts of sample for biological assays.

New MS strategies for metabolomics studies, such as MS imaging (MSI) and single-cell analysis, are still in development and offer exciting perspectives, but they are not yet commonly used in studies of chemical signals. MSI has been increasingly applied to acquire information on the spatial distribution of compounds as well as on the chemical profile, which adds reliability to conclusions regarding interactions. Historically, MSI was first developed using secondary ion mass spectrometry (SIMS), followed by MALDI-MSI, LDI-MSI and DESI-MSI. Such techniques require few or no sample preparation steps, which makes them appropriate for use directly in natural environments. Known applications of MSI include the investigation of microbial interactions, owing to its capability for direct analysis of culture media.33 Nonetheless, results derived from direct analysis should be interpreted carefully, since some important chemical information may be hidden by ion suppression effects. To overcome this drawback, some authors have developed LC-MS-based cartography (3D cartography). In 3D cartography, several samples with a defined spatial position, each equivalent to one pixel, are analysed by LC-MS so as to represent the specific 3D environment. The premise of all of these techniques is to use MS as a molecular microscope, providing metabolic information from a single cell to its interaction with the environment. The information generated may provide invaluable knowledge about several ecological functions, including symbioses and chemical defences.33

Up to this point we have presented how different spectrometric methodologies can be used in metabolomics to evaluate multicellular systems, providing an average at the molecular level. However, there is growing interest from different disciplines in the development of single-cell metabolomics, which allows the detection of complete sets of molecules in a single cell, including proteins, peptides, and small organic molecules. The analysis of an individual cell might reveal important information about molecular behaviour that could be missed in multicellular examinations. One of the biggest challenges of this technique is cell isolation.37 Although the single-cell approach is not yet widely applied in chemical ecology studies, we believe that this MS-based metabolomic methodology may provide valuable information for the study of molecular interactions in the future. For example, it may help to improve the understanding of how cells communicate with each other, or to examine symbiotic relationships.

Besides the spectrometric tools introduced above, NMR is a spectroscopic detection technique broadly applied in metabolomics studies. In contrast to MS, NMR is a non-destructive method, which allows recovery of the sample and makes it very advantageous for chemical signalling studies. The most widely used NMR strategy in metabolomics is the acquisition of a metabolite fingerprint (1H NMR), which allows identification of the main chemical classes present in a crude natural extract. However, interpretation may be complicated, especially in cases with heavy peak overlap. Although 1D NMR (1H and 13C) is commonly used for metabolomics, 2D NMR (especially multinuclear NMR) provides details of the three-dimensional arrangement. The combination of chemical shifts, peak intensities (number of protons) and spin–spin couplings can generate invaluable structural information, which is fundamental for determining the structures of unknowns.38

The major disadvantage of NMR in metabolomics is its intrinsically low sensitivity, especially when compared to MS. Since, as mentioned above, chemical signals are present in minute amounts in the sample, this limits its application in studies of ecological interactions. Although the amount of sample is not a major problem when studying interactions between different microorganisms, between plants and microorganisms, or between plants and insects, it is extremely relevant in experiments involving animals. In particular, a smaller amount of sample means fewer animals sacrificed, which helps to address ethical concerns related to the use of animals in research. For this kind of study, LC-MS/MS or GC-MS are highly preferred. However, when the chemical signal is an unknown metabolite, NMR should be applied for its identification. Two tools can help to overcome the low sensitivity of NMR: cryogenic probes and LC-SPE-NMR. The first drastically enhances NMR sensitivity and consequently decreases the amount of sample required for analysis. In LC-SPE-NMR, the chromatographic peaks are automatically trapped on SPE cartridges, where the compounds can be concentrated through multiple injections in the LC method. The compound of interest can then be eluted with deuterated solvents for direct transfer to the NMR flow cell.38

From the above, it is clear that metabolomic approaches generate vast data sets, which pose major challenges for analysis and visualization. Several steps of computational processing are needed (Fig. 4), including pre-processing and statistical modelling methods (Fig. 7D), such as principal component analysis (PCA), hierarchical cluster analysis (HCA), partial least squares (PLS), and its orthogonal variant (OPLS). More recently, the development of molecular networking has made it possible to form clusters of molecules that share certain chemical properties, based on similarities in their molecular ion mass and fragmentation pattern. This emerging platform has eased the treatment and visualization of large MS/MS data sets, and has proven highly attractive in ecological studies, mainly because it enables the analysis of large amounts of data and allows integration with other omics.36 Fig. 7E presents three case studies that employ different metabolomics tools to investigate ecological interactions.39–41
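
The idea of clustering molecules by the similarity of their fragmentation patterns can be illustrated with a plain cosine score between MS/MS spectra. This is a simplified sketch and not the scoring used by GNPS, which additionally allows fragment peaks to be shifted by the precursor mass difference; the 0.02 m/z tolerance, the 0.7 edge threshold, the greedy peak pairing and the function names are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def cosine_score(spec_a, spec_b, tol=0.02):
    """Cosine similarity between two spectra given as (m/z, intensity) lists.

    Peaks are paired greedily, largest intensity product first, when their
    m/z values differ by at most `tol`; each peak is used at most once.
    """
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tol:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)
    used_a, used_b, dot = set(), set(), 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product
    norm_a = sqrt(sum(inten * inten for _, inten in spec_a))
    norm_b = sqrt(sum(inten * inten for _, inten in spec_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def network_edges(spectra, min_score=0.7):
    """Connect every pair of named spectra whose score reaches min_score."""
    edges = []
    for name_a, name_b in combinations(spectra, 2):
        score = cosine_score(spectra[name_a], spectra[name_b])
        if score >= min_score:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical spectra: "a" and "b" share fragments, "c" does not.
spectra = {
    "a": [(100.0, 10.0), (150.0, 5.0)],
    "b": [(100.01, 9.0), (150.0, 6.0)],
    "c": [(80.0, 10.0)],
}
print(network_edges(spectra))
```

The resulting edge list is what a network viewer would draw: nodes are spectra, and connected nodes are likely to be structurally related molecules.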

Finally, after formulating the hypothesis, planning the experiments, and collecting and analysing the data (Fig. 2), identifying the molecules responsible for the interaction is a crucial step towards reaching a conclusion; without it, the work is incomplete. In this sense, one of the biggest challenges in the metabolomic field remains the annotation process, owing to the need for more complete spectral libraries. The majority of the compounds detected in metabolomics (∼98%) cannot be identified and were recently named ‘dark matter’.36 Several libraries allow searching for chemical structures; however, most of them are only available commercially, such as NIST, Wiley, Metlin, and WEIZMASS, whereas only a few are publicly accessible, such as GNPS and Massbank. In addition, many spectral data published in scientific articles are not deposited in databases. Clearly, metabolomics urgently needs a solution for the large quantities of dark matter as well as for those molecules that are not in repositories.

4. Omics for the analysis of host–symbiont associations

The biological sciences should draw a clear distinction between terms associated with the holobiont and others that do not describe host–symbiont associations. For instance, the term metagenome refers to the sum of genetic information from an environmental sample, whereas the term hologenome encompasses the collection of genetic content in a holobiont. Distinguishing between the two is important in the context of chemical signal studies, because analyses of environmental samples disregard the fundamentals of symbiosis, which is the premise of the holobiont concept.42 The same reasoning can be applied to other omics techniques that aim to study host–symbiont associations, as exemplified by holoproteomics and holometabolomics.

Although microorganisms were traditionally related to food cycling and pathogenesis, their influence on animal and plant interactions was recognized several decades ago.3,5,43 However, such interactions could only be examined for microorganisms that were able to grow in culture in the laboratory. The advent of next generation sequencing (NGS), and more recently of third-generation sequencing techniques, has overcome this limitation by allowing culture-independent surveys with ease and at relatively low cost.44 The application of these technologies in microbial ecology is profoundly changing the view that different scientific disciplines have of the biosphere in at least two ways. First, it is unveiling a previously unimagined diversity of microorganisms, including completely new large taxonomic groups. Second, it is showing the importance that associated microbes from different organ systems, such as gut and skin, have in the life of their host, including effects at physiological and behavioural levels.

Two strategies can be employed in hologenomics studies: marker gene sequencing (mostly of the 18S rRNA and 16S rRNA genes) and shotgun sequencing.44 They differ in the complexity of the data analysis and in the type of information produced. Analysis of marker genes is much simpler than that of host-associated total microbiomes, which presents several challenges in terms of accuracy and computational demand. In terms of the information they produce, marker genes are frequently used to characterize the taxonomic composition and phylogenetic diversity of the microbial community, whereas shotgun sequencing can produce detailed metabolic and functional profiles. Mainly because of differences in sequencing cost and analytical demands, marker gene studies are routinely performed on large sample sets, whereas studies employing shotgun sequencing are less common.

Some computational approaches have been proposed to predict the functional composition of a hologenome based on marker gene data and reference genome databases.44 Although this type of approach may provide functional insights using only 16S data, its application depends largely on the existence of reference genomes, which may be very limited for most non-model species in which host–symbiont associations are unknown. In the near future, the advent of more cost-effective shotgun sequencing analysis will greatly facilitate functional annotation for a broader range of host organisms.

Culture-independent analyses can help to answer ecological questions, such as how microorganisms influence host behaviour. In the 1970s, some authors found that the scent glands of some mammal species contained odor-producing bacteria; they subsequently proposed the fermentation hypothesis: bacteria ferment nutrient-rich substrates in the glands and generate odorous metabolites that might be used by their hosts for intraspecific communication.45 In 2013, Theis and co-workers43 examined the diversity of the bacterial community in the scent glands of wild spotted and striped hyenas using NGS and classified the operational taxonomic units (OTUs) using ribosomal databases. They also analysed the volatile profile of the odorous secretion by gas chromatography-mass spectrometry (GC-MS). Although the study does not establish causal links between the microbes and the odors, it demonstrates a strong correlation between the bacterial community and the volatile profiles of spotted hyenas, supporting the fermentation hypothesis.
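
One generic way to ask whether two kinds of profile covary across the same individuals is to compare their pairwise dissimilarities. The sketch below is not the analysis performed in the cited study; it assumes Bray–Curtis dissimilarity for both the OTU counts and the volatile abundances, and reports the Pearson correlation between the two sets of pairwise distances (the statistic behind a Mantel test). Assessing significance would additionally require a permutation test, which is omitted here. All names and numbers are illustrative.

```python
from itertools import combinations
from math import sqrt


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / total if total else 0.0


def pairwise_dissimilarities(profiles):
    """Dissimilarity for every pair of samples, in a fixed order."""
    pairs = combinations(range(len(profiles)), 2)
    return [bray_curtis(profiles[i], profiles[j]) for i, j in pairs]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)


def mantel_statistic(otu_table, volatile_table):
    """Correlation between community and volatile dissimilarities.

    Rows are samples in the same order in both tables.
    """
    return pearson(
        pairwise_dissimilarities(otu_table),
        pairwise_dissimilarities(volatile_table),
    )


# Hypothetical data: three scent-gland samples, three OTUs, two volatiles.
otus = [[10, 0, 5], [8, 1, 6], [0, 9, 2]]
volatiles = [[4.0, 1.0], [3.5, 1.2], [0.5, 6.0]]
print(mantel_statistic(otus, volatiles))
```

A high value indicates that samples with similar bacterial communities also have similar odor profiles, which is a correlation and, as noted above, not evidence of a causal link.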

The combination of hologenomic analysis with other omics, such as holoproteomics and holometabolomics, will open research perspectives that were unimaginable a few years ago. An integrative omics approach to the analysis of host–symbiont associations can provide invaluable details of how these interactions occur and how they are regulated (at the genomic and expression levels). Information available from the few examples in the literature (few relative to the vast diversity of life on Earth) can serve as a reference for other taxa in which information is lacking. For instance, such approaches might be applied in reforestation and pest control programs and in projects for increasing crop productivity, and they might even provide important information about the evolution of eusociality in insects and mammals.

5. Integrating information

In the previous sections, isolated examples of databases and molecular networks have been presented and discussed. It is important to stress, however, that an initial barrier to data integration is the rapid advance in the instrumentation and methodologies applied across all omics disciplines. Even a single omics analysis requires the continuous evolution of informatics tools, including combinations of statistical and computational methods, to evaluate large data sets. Databases compile a large amount of computational information and provide enormous support for the identification of small and large biomolecules in omics investigations.36 NIST can be considered one of the pioneering databases for small-molecule identification, but there are numerous others, for instance: the Dictionary of Natural Products, the collection of marine natural products (MarinLit) and the database of pheromones and semiochemicals (Pherobase) (metabolite databases); the Kyoto Encyclopedia of Genes and Genomes (KEGG) (genomics/transcriptomics); ArrayExpress (functional genomics); EMBL (European Bioinformatics Institute) (genomics, proteomics); and UniProt and the Proteomics Identifications Database (PRIDE) (proteomics). In addition, there are several databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis (see ESI).

A comprehensive understanding of all aspects of chemical signals, from their production to their sensing and the behavioural responses they elicit, requires the integration of the vast amount of biological data generated by different omics disciplines.46 The terminology used to refer to such integration is surprisingly diverse, for example, trans-omics, multi-omics, and integratomics. Whatever term is chosen, the aim of combining omics strategies is to answer biological questions by interpreting these multiple disparate datasets. There are some examples of combined strategies in chemical signal studies that include two or more omics techniques. For instance, Poulson-Ellestad and co-workers (2014)47 used metabolomics and proteomics to characterize the physiological impacts of allelopathy by a dinoflagellate alga from the Gulf of Mexico (Karenia brevis) on two competing phytoplankton species. Thalassiosira pseudonana, which was isolated from the North Atlantic Ocean where K. brevis does not occur, proved sensitive and showed a completely altered metabolism. Proteomic analysis showed an increase in several metabolic enzymes, including glycolytic enzymes such as pyruvate kinase. Metabolomic data showed an enhanced acetate concentration as well as a decrease in the concentration of some metabolites, such as maltopentose, cellopentose, and amylopectin. Together, the data point to an allelopathic effect on this alga that enhances glycolysis as well as the synthesis and β-oxidation of fatty acids.

Although a few studies have combined different omics technologies, integrating information from multiple disparate datasets presents several challenges, which can be summarized in one question: how can we integrate individual repositories, each with its own data model, metadata representation, and identifiers, and each with its own variable amount of annotation? To help answer this question, a guideline was launched in 2016 that aimed to make data “findable, accessible, interoperable and re-usable” (FAIR). More recently, the Omics Discovery Index (OmicsDI) was released, an open source platform that is based on the FAIR principles and integrates genomics, transcriptomics, proteomics, and metabolomics.48 To date, OmicsDI comprises around 81 000 datasets from 11 different repositories (e.g., genomics: European Genome-Phenome Archive (EGA); transcriptomics: ArrayExpress and Expression Atlas; proteomics: PRIDE; and metabolomics: GNPS).
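
The integration problem posed above, where each repository has its own data model and identifiers, can be made concrete with a toy example. The sketch below does not reproduce the OmicsDI data model or API; the repository names, field names, accessions and taxon identifier are all hypothetical placeholders. It only shows the general pattern of mapping repository-specific records onto one common schema and then linking them through a shared key, here a taxonomy identifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Common schema onto which every repository record is mapped."""
    repository: str
    accession: str
    omics_type: str
    taxon_id: int


# Hypothetical repositories: (omics type, accession field, taxon field).
# Each one names the same information differently.
REPOSITORIES = {
    "repo_proteomics": ("proteomics", "id", "organism_taxid"),
    "repo_metabolomics": ("metabolomics", "dataset", "ncbi_taxonomy"),
}


def normalise(repository, raw):
    """Translate one repository-specific record into the common schema."""
    omics_type, accession_key, taxon_key = REPOSITORIES[repository]
    return Dataset(repository, str(raw[accession_key]), omics_type, int(raw[taxon_key]))


def link_by_taxon(datasets):
    """Group datasets by organism, keeping organisms covered by more than one omics type."""
    grouped = defaultdict(list)
    for dataset in datasets:
        grouped[dataset.taxon_id].append(dataset)
    return {
        taxon: group
        for taxon, group in grouped.items()
        if len({d.omics_type for d in group}) > 1
    }


# Placeholder records; 12345 and 99999 are made-up taxon identifiers.
records = [
    normalise("repo_proteomics", {"id": "P-001", "organism_taxid": "12345"}),
    normalise("repo_metabolomics", {"dataset": "M-042", "ncbi_taxonomy": 12345}),
    normalise("repo_metabolomics", {"dataset": "M-043", "ncbi_taxonomy": 99999}),
]
for taxon, group in link_by_taxon(records).items():
    print(taxon, [d.accession for d in group])
```

The hard part in practice is agreeing on the shared keys and controlled vocabularies, which is what the FAIR principles are meant to encourage.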

One of the main advantages of OmicsDI is its capability to integrate databases into one framework and web interface by finding and linking existing data sets. Although much of the research effort for integrating omics disciplines is concentrated in the area of public health, this tool may be very useful in chemical signal studies. As the number of publicly accessible omics data sets increases, OmicsDI may provide the basis for more comprehensive studies in chemical communication. For instance, consider a hypothetical study that seeks to identify volatile female pheromones in a wasp species through metabolomics (Fig. 8). The researchers might use the same platform to combine their results with available transcriptomic, proteomic, and genomic information from the same, or even a different, wasp species. With all of these data, the researchers might not only characterize the molecule acting as the pheromone, but also find that its biosynthetic pathway is typical of bacteria, which would lead them to propose a symbiotic interaction for the secretion of this pheromone. They would also be able to explore the specificity of the pheromone receptor, and the factors that regulate gene expression in the wasp and in the symbiotic bacteria.

Fig. 8 A hypothetical intraspecific interaction mediated by chemical signals in a wasp species. The emitter can be considered an emitter complex formed by the wasp host and its symbiotic bacteria. Integrative omics approaches that include genomics, transcriptomics, proteomics and metabolomics, and also their holomics analogues (e.g., hologenomics), can provide a comprehensive view of the communication process under study.

It seems natural to think that integrating databases from different omics disciplines would be the next step in chemical signal studies. Publicly accessible information, along with an integrated multi-omics platform such as OmicsDI, allows researchers to propose hypotheses that can then be tested with a proper experimental design. As Cantley and Clardy (2015)49 stated: “the task of annotating the incredibly complex but important set of chemical interactions that literally make our life on Earth possible now begins”.

6. Biological and evolutionary considerations

In 1973, Theodosius Dobzhansky50 stated that “nothing in biology makes sense except in the light of evolution”. The evolutionary history of species embraces their interdependence and their relationship with the environment. This interdependence can be observed in the enormous number of interactions occurring in natural environments (Fig. 1). Several of these interactions are mediated by chemical signals, whereas others rely on signals from different sensory modalities. In this section, the importance of considering interactions at multiple taxonomic levels, the participation of different signal modalities, and the role of evolutionary perspectives will be addressed in relation to chemical signal studies.

6.1 Multilevel interactions and multimodal signalling

Chemical signals may serve more than one function. In the natural world, several interactions occur at the same time. While a male frog is calling and secreting volatile compounds into the environment to attract a female, a snake predator may sense the same compounds while hunting the frog. In this example, the same compound may be defined as a pheromone (in frog mating attraction) or a kairomone (in the frog–snake interaction).

There is some agreement on the terminology used to define interactions (Table 1). It basically takes into account the costs and benefits of the interaction for both participants. However, because a single chemical signal may mediate many interactions, the costs and benefits can only be understood in a community context, from the perspective of an infochemical web.51 For instance, this web may reveal signalling connectivity within and across taxonomic levels, such as microbe–microbe, microbe–plant, microbe–animal, plant–plant, and animal–animal.
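The cost–benefit logic behind this terminology can be sketched in a few lines of code. The sketch below assumes the commonly used definitions (pheromone for intraspecific signals; allomone, kairomone, and synomone for interspecific signals that benefit the emitter, the receiver, or both), which may differ in detail from Table 1. The class and function names are hypothetical, and the frog and snake interactions simply restate the example given above as a two-edge infochemical web.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """One edge of an infochemical web (illustrative data model)."""
    compound: str
    emitter: str
    receiver: str
    same_species: bool
    emitter_benefits: bool
    receiver_benefits: bool


def classify(interaction: Interaction) -> str:
    """Name the signal according to who benefits from the interaction."""
    if interaction.same_species:
        return "pheromone"
    if interaction.emitter_benefits and interaction.receiver_benefits:
        return "synomone"
    if interaction.emitter_benefits:
        return "allomone"
    if interaction.receiver_benefits:
        return "kairomone"
    return "unclassified"


# The same volatile compound appears in two edges of the web.
WEB = [
    Interaction("frog volatile", "male frog", "female frog",
                same_species=True, emitter_benefits=True, receiver_benefits=True),
    Interaction("frog volatile", "male frog", "snake",
                same_species=False, emitter_benefits=False, receiver_benefits=True),
]


def roles_of(compound: str, web) -> set:
    """Collect every role a compound plays across the whole web."""
    return {classify(edge) for edge in web if edge.compound == compound}


if __name__ == "__main__":
    print(roles_of("frog volatile", WEB))
```

Collecting the roles per compound, rather than per interaction, makes explicit why a single label is insufficient once the community context is considered.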

Although chemical signals are the most widespread form of communication in living organisms, communication in other sensory modalities, namely acoustic, visual, tactile and electric, is very important in several animal species. In many cases, these other sensory modalities can represent the main form of conveying information, for example calls in frogs and toads. Researchers have traditionally focused on the analysis of individual signals in isolation. However, there is an increasing interest in the complexity of animal displays involving a combination of more than one signal in different sensory modalities, which is defined as multimodal communication.9 Although the analysis of multimodal communication poses several experimental challenges, a multimodal framework is essential to interpret the function of chemical signals within the biological context of each species. For instance, mating in the fluorescent tree frog Boana punctata may rely on visual, acoustic, and chemical signals (Fig. 9).52 Because mating success in B. punctata may depend on the interaction of these three signal modalities, examining chemical signals in isolation would limit our understanding of the reproductive biology of this species.

Fig. 9 Multimodal signaling in the fluorescent frog Boana punctata. A: acoustic signaling (males emit seven different types of calls); C: chemical signaling (males have sexually dimorphic skin glands with protein content in flanks and gular regions); V: visual signaling mediated by fluorescent compounds (Hyloin-G and Hyloin-L) from glands and lymph.

6.2 Phylogenetic perspective in chemical signalling

The evolutionary history of organisms, or more often of a specific taxonomic group, can be estimated through phylogeny. Phylogenetic analyses encompass a broad range of datasets, including molecular, chemical and morphological data, which allow the examination of evolutionary patterns and processes.53 While reviewing the scientific literature on chemical signals, we found very interesting examples that examined the evolutionary history of particular sets of molecules, such as peptide pheromones in earthworms17 and olfactory receptors, such as odorant binding proteins in ants.18 For this reason, we considered it important to introduce a brief overview of this type of analysis and its potential application in chemical signal studies.

Three methods are commonly used in phylogenetic analysis: (1) maximum parsimony; (2) maximum likelihood; and (3) Bayesian inference. The last two are statistical methods based on models of molecular evolution. In addition, some studies (in particular those of DNA barcoding) use neighbor joining (NJ). However, NJ is a distance method that only reflects the degree of similarity between the specimens examined. Thus, NJ is not a phylogenetic method and does not reflect evolutionary relationships. The phylogenetic methods produce phylogenetic trees whose topology can vary considerably depending on different phenomena that may affect the analysis, for instance a high number of homoplastic characters and differential rates of DNA evolution. The choice of method is not trivial and is influenced by philosophical and methodological considerations.53,54 Some researchers choose a single method, whereas others use a combination of two or three. When two or three methods are compared, the topologies may be similar, reflecting a robust phylogenetic hypothesis, or they may differ, in which case the results should be discussed cautiously. Examples of freely available software that perform phylogenetic analyses include MEGA and TNT.
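The comparison of topologies obtained with different methods can be made explicit with a simple score. The sketch below is only an illustration: it assumes rooted trees written as nested tuples of taxon names (the taxa A to D are placeholders), extracts the set of clades implied by each tree, and reports the fraction of clades that the two trees share. The function names are hypothetical, and this overlap score is a deliberately simple stand-in for the tree-comparison metrics implemented in dedicated phylogenetics software.

```python
def clades(tree) -> set:
    """Return every clade (as a frozenset of leaf names) in a rooted tree.

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def visit(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)  # each internal node defines one clade
        return leaves

    visit(tree)
    return found


def shared_clade_fraction(tree_a, tree_b) -> float:
    """Fraction of all clades (union) that both trees recover (intersection)."""
    clades_a, clades_b = clades(tree_a), clades(tree_b)
    return len(clades_a & clades_b) / len(clades_a | clades_b)


if __name__ == "__main__":
    # Hypothetical results of two different analyses of the same four taxa.
    parsimony_tree = ((("A", "B"), "C"), "D")
    likelihood_tree = (("A", "B"), ("C", "D"))
    print(shared_clade_fraction(parsimony_tree, parsimony_tree))
    print(shared_clade_fraction(parsimony_tree, likelihood_tree))
```

A score of 1 means the two analyses recover exactly the same clades, whereas lower values flag the conflicting groupings that, as noted above, should be discussed cautiously.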

The application of DNA sequencing technologies in phylogenetic analyses has substantially changed the understanding of the evolutionary relationships among several taxonomic groups during the past few decades. These technologies have helped to recover well-sampled and well-supported phylogenetic trees that can serve as a reference phylogenetic hypothesis for ancestral-character-state reconstructions. If the character is a certain molecule (and/or receptor), the optimization can be used to predict the occurrence of chemical signals (and/or receptors) in species that have not yet been investigated or have not yet been carefully examined.
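As an illustration of how such an optimization can yield predictions, the sketch below applies Fitch parsimony, one common way of optimizing a discrete character on a fixed tree, to the presence or absence of a compound. Several assumptions are made for the sake of brevity: the reference tree is known and strictly bifurcating, the character is binary, ties at the root are broken arbitrarily (alphabetically), and the species names and observations are hypothetical. Species without data are entered as `None` and receive the state inferred for their closest ancestor.

```python
STATES = frozenset({"present", "absent"})


def fitch_sets(node, observed: dict, sets: dict) -> frozenset:
    """Bottom-up pass: compute the candidate state set of every node."""
    if isinstance(node, str):
        state = observed.get(node)
        # Unexamined species are compatible with any state.
        result = STATES if state is None else frozenset({state})
    else:
        left, right = (fitch_sets(child, observed, sets) for child in node)
        # Intersection if the children agree, otherwise union (one change).
        result = (left & right) or (left | right)
    sets[node] = result
    return result


def assign(node, parent_state, sets: dict, out: dict) -> None:
    """Top-down pass: keep the parent's state whenever it is allowed."""
    candidates = sets[node]
    # Arbitrary (alphabetical) tie-break when the parent's state is not allowed.
    state = parent_state if parent_state in candidates else sorted(candidates)[0]
    if isinstance(node, str):
        out[node] = state
    else:
        for child in node:
            assign(child, state, sets, out)


def predict_unexamined(tree, observed: dict) -> dict:
    """Return the predicted state of every species lacking observations."""
    sets, out = {}, {}
    fitch_sets(tree, observed, sets)
    assign(tree, None, sets, out)
    return {leaf: s for leaf, s in out.items() if observed.get(leaf) is None}


if __name__ == "__main__":
    # Hypothetical reference phylogeny and compound survey.
    tree = ((("sp_A", "sp_B"), "sp_C"), ("sp_D", "sp_E"))
    survey = {"sp_A": "present", "sp_B": None, "sp_C": "present",
              "sp_D": "absent", "sp_E": "absent"}
    print(predict_unexamined(tree, survey))
```

Such a prediction is a hypothesis to be tested experimentally, and its reliability depends on the support of the reference tree and on how densely the character has been sampled.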

7. Conclusions and perspectives

Chemical signals might be regarded as the primary way of conducting information in the biosphere, mediating a vast diversity of ecological interactions across several taxonomic groups from different environments and microenvironments. As a result of technological advances in the fields of chemistry and biology, and of emerging ecological concepts, the molecules involved in such interactions can be identified even from small amounts of sample and at very low concentrations. Different omics technologies, mainly transcriptomics, proteomics and metabolomics, have become broadly applied research tools in chemical signal studies to address distinct ecological questions. Importantly, the use of combined omics strategies for ecological interactions presents exciting challenges, for instance integrating different omics datasets and making them freely available to search and download. In the future, we expect to see multi-omics study designs that combine ecological concepts under an evolutionary perspective, together with integrative analytical methods that unveil the function, structure, and regulatory pathways of different chemical signals.

Conflicts of interest

There are no conflicts to declare.


Acknowledgements

We thank Alan C. Pilon, Mariana Lyra, Martin Pereyra, Felipe Antunes Calil and Eduardo A. Silva-Junior for comments and discussion, and Gustavo R. Carrizo for the illustrations in Fig. 9. The authors thank Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) for project grants and fellowship support (N. P. L. 2014/50265-3, F. C. N. 2014/12343-2 and A. E. B. 2014/20915-6, 2013/50741-7), CONICET for a doctoral fellowship (M. C. V.), and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for project grants and fellowship support (D. P. P. 150353/2017-0 and A. B. 150056/2016-8).


References

  1. O. R. Gottlieb, Micromolecular evolution, systematics and ecology: an essay into a novel botanical discipline, Springer Science & Business Media, 2012.
  2. J. Meinwald and T. Eisner, Proc. Natl. Acad. Sci. U. S. A., 2008, 105, 4539–4540.
  3. M. Dicke and W. Takken, Chemical ecology: from gene to ecosystem, Springer Science & Business Media, 2006.
  4. M. Dicke and I. T. Baldwin, Trends Plant Sci., 2010, 15, 167–175.
  5. V. Venturi and C. Keel, Trends Plant Sci., 2016, 21, 187–198.
  6. A. Butenandt, R. Beckmann, D. Stamm and E. Hevker, Z. Naturforsch., B: Anorg. Chem., Org. Chem., Biochem., Biophys., Biol., 1959, 14, 283–284.
  7. T. D. Wyatt, Pheromones and animal behavior: chemical signals and signatures, Cambridge University Press, 2014.
  8. M. McFall-Ngai, M. G. Hadfield, T. C. Bosch, H. V. Carey, T. Domazet-Lošo, A. E. Douglas, N. Dubilier, G. Eberl, T. Fukami and S. F. Gilbert, Proc. Natl. Acad. Sci. U. S. A., 2013, 110, 3229–3236.
  9. E. A. Hebets and D. R. Papaj, Behav. Ecol. Sociobiol., 2005, 57, 197–214.
  10. P. Karlson and M. Lüscher, Nature, 1959, 183, 55–56.
  11. L. Margulis and R. Fester, Symbiosis as a source of evolutionary innovation: speciation and morphogenesis, MIT Press, 1991.
  12. Z. Wang, M. Gerstein and M. Snyder, Nat. Rev. Genet., 2009, 10, 57–63.
  13. J. A. Martin and Z. Wang, Nat. Rev. Genet., 2011, 12, 671–682.
  14. L. Ettwiller, J. Buswell, E. Yigit and I. Schildkraut, BMC Genomics, 2016, 17, 199.
  15. C. Bleidorn, Syst. Biodivers., 2016, 14, 1–8.
  16. A. K. White, M. VanInsberghe, I. Petriv, M. Hamidi, D. Sikorski, M. A. Marra, J. Piret, S. Aparicio and C. L. Hansen, Proc. Natl. Acad. Sci. U. S. A., 2011, 108, 13999–14004.
  17. M. Novo, A. Riesgo, A. Fernández-Guerra and G. Giribet, Mol. Biol. Evol., 2013, mst074.
  18. S. K. McKenzie, P. R. Oxley and D. J. Kronauer, BMC Genomics, 2014, 15, 718.
  19. A. J. Berens, J. H. Hunt and A. L. Toth, Mol. Biol. Evol., 2014, 32, 690–703.
  20. P. Chamero, T. F. Marton, D. W. Logan, K. Flanagan, J. R. Cruz, A. Saghatelian, B. F. Cravatt and L. Stowers, Nature, 2007, 450, 899–902.
  21. A. W. Kaur, T. Ackels, T.-H. Kuo, A. Cichy, S. Dey, C. Hays, M. Kateri, D. W. Logan, T. F. Marton and M. Spehr, Cell, 2014, 157, 676–688.
  22. L. Briand, F. Blon, D. Trotier and J.-C. Pernollet, Chem. Senses, 2004, 29, 425–430.
  23. Y. Zhang, B. R. Fonslow, B. Shan, M.-C. Baek and J. R. Yates III, Chem. Rev., 2013, 113, 2343–2394.
  24. A. D. Catherman, O. S. Skinner and N. L. Kelleher, Biochem. Biophys. Res. Commun., 2014, 445, 683–693.
  25. J. C. Tran, L. Zamdborg, D. R. Ahlf, J. E. Lee, A. D. Catherman, K. R. Durbin, J. D. Tipton, A. Vellaichamy, J. F. Kellie and M. Li, Nature, 2011, 480, 254–258.
  26. A. Shevchenko, C.-M. Valcu and M. Junqueira, J. Proteomics, 2009, 72, 137–144.
  27. J. Unsworth, G. M. Loxley, A. Davidson, J. L. Hurst, G. Gómez-Baena, N. I. Mundy, R. J. Beynon, E. Zimmermann and U. Radespiel, Sci. Rep., 2017, 7, 42940.
  28. G. D. Findlay, X. Yi, M. J. Maccoss and W. J. Swanson, PLoS Biol., 2008, 6, e178.
  29. B. Leroy, G. Toubeau, P. Falmagne and R. Wattiez, Mol. Cell. Proteomics, 2006, 5, 2114–2123.
  30. A. C. Nelson, C. B. Cunningham, J. S. Ruff and W. K. Potts, J. Evol. Biol., 2015, 28, 1213–1224.
  31. C. Kuhlisch and G. Pohnert, Nat. Prod. Rep., 2015, 32, 937–955.
  32. M. Ernst, D. B. Silva, R. R. Silva, R. Z. Vêncio and N. P. Lopes, Nat. Prod. Rep., 2014, 31, 784–806.
  33. D. Petras, A. K. Jarmusch and P. C. Dorrestein, Curr. Opin. Chem. Biol., 2017, 36, 24–31.
  34. Y. H. Hussain, J. S. Guasto, R. K. Zimmer, R. Stocker and J. A. Riffell, J. Exp. Biol., 2016, 219, 1458–1466.
  35. D. P. Demarque, A. E. Crotti, R. Vessecchi, J. L. Lopes and N. P. Lopes, Nat. Prod. Rep., 2016, 33, 432–455.
  36. A. A. Aksenov, R. da Silva, R. Knight, N. P. Lopes and P. C. Dorrestein, Nat. Rev. Chem., 2017, 1, s41570–s41017.
  37. R. Zenobi, Science, 2013, 342, 1243259.
  38. J. L. Wolfender, G. Marti, A. Thomas and S. Bertrand, J. Chromatogr. A, 2015, 1382, 136–164.
  39. N. E. Fatouros, D. Lucas-Barbosa, B. T. Weldegergis, F. G. Pashalidou, J. J. van Loon, M. Dicke, J. A. Harvey, R. Gols and M. E. Huigens, PLoS One, 2012, 7, e43607.
  40. W. J. Moree, J. Y. Yang, X. Zhao, W.-T. Liu, M. Aparicio, L. Atencio, J. Ballesteros, J. Sánchez, R. G. Gavilán and M. Gutiérrez, J. Chem. Ecol., 2013, 39, 1045–1054.
  41. M. Kamio, M. Koyama, N. Hayashihara, K. Hiei, H. Uchida, R. Watanabe, T. Suzuki and H. Nagai, J. Chem. Ecol., 2016, 42, 452–460.
  42. S. R. Bordenstein and K. R. Theis, PLoS Biol., 2015, 13, e1002226.
  43. K. R. Theis, A. Venkataraman, J. A. Dycus, K. D. Koonter, E. N. Schmitt-Matzen, A. P. Wagner, K. E. Holekamp and T. M. Schmidt, Proc. Natl. Acad. Sci. U. S. A., 2013, 110, 19832–19837.
  44. M. G. I. Langille, J. Zaneveld, J. G. Caporaso, D. McDonald, D. Knights, J. A. Reyes, J. C. Clemente, D. E. Burkepile, R. L. V. Thurber, R. Knight, R. G. Beiko and C. Huttenhower, Nat. Biotechnol., 2013, 31, 814–821.
  45. M. Gorman, D. B. Nedwell and R. M. Smith, J. Zool., 1974, 172, 389–399.
  46. A. Ebrahim, E. Brunk, J. Tan, E. J. O'Brien, D. Kim, R. Szubin, J. A. Lerman, A. Lechner, A. Sastry and A. Bordbar, Nat. Commun., 2016, 7, 13091.
  47. K. L. Poulson-Ellestad, C. M. Jones, J. Roy, M. R. Viant, F. M. Fernández, J. Kubanek and B. L. Nunn, Proc. Natl. Acad. Sci. U. S. A., 2014, 111, 9009–9014.
  48. Y. Perez-Riverol, M. Bai, F. da Veiga Leprevost, S. Squizzato, Y. M. Park, K. Haug, A. J. Carroll, D. Spalding, J. Paschall and M. Wang, Nat. Biotechnol., 2017, 35, 406–409.
  49. A. M. Cantley and J. Clardy, Nat. Prod. Rep., 2015, 32, 888–892.
  50. T. Dobzhansky, Am. Biol. Teach., 1973, 75, 87–91.
  51. C. Trapnell, A. Roberts, L. Goff, G. Pertea, D. Kim, D. R. Kelley, H. Pimentel, S. L. Salzberg, J. L. Rinn and L. Pachter, Nat. Protoc., 2012, 7, 562–578.
  52. C. Taboada, A. E. Brunetti, F. N. Pedron, F. C. Neto, D. A. Estrin, S. E. Bari, L. B. Chemes, N. P. Lopes, M. G. Lagorio and J. Faivovich, Proc. Natl. Acad. Sci. U. S. A., 2017, 201701053.
  53. D. L. Swofford, G. J. Olsen, P. J. Waddell and D. M. Hillis, in Molecular systematics, ed. D. M. Hillis, C. Moritz and B. K. Mable, Sinauer, Sunderland, 2nd edn, 1996, pp. 407–514.
  54. D. Pol and M. E. Siddall, Cladistics, 2001, 17, 266–281.


Electronic supplementary information (ESI) available. See DOI: 10.1039/c7cs00368d

This journal is © The Royal Society of Chemistry 2018