Chemoinformatics Representation of Chemical Structures – A Milestone for Successful Big Data Modelling in Predictive Toxicology
In computational toxicology, the representation of a chemical structure is key to predicting or retrieving toxicity information for a substance. Chemoinformatics provides efficient tools for handling chemical information computationally. This is all the more important in the big data era, given the growing amount of information available on chemical compounds, the effort to link activity information to chemicals across different databases, and the need to identify chemicals unambiguously and to take structural features into account in modelling. This chapter gives an overview of the different aspects of chemical structure representation used in chemoinformatics. It presents various techniques for formalising chemical information, together with the different levels of structure representation, from 0D (zero-dimensional) up to the more complex 3D and 4D representations that are essential for describing interactions with biomacromolecules. Structural descriptors that represent the chemical structure in bioactivity modelling are introduced. Furthermore, the challenges of unique structure representation and of representing chemical substances, as well as specific issues such as the handling of aromaticity and tautomerism, are discussed. Together, these approaches show how structural information is represented within chemical software applications, both for storing and searching structural data in large databases and for predictive modelling.