Kei-ichiro Takahashi a, Hiroshi Mamitsuka a, Masatoshi Tosaka b, Nanyi Zhu b and Shigeru Yamago *b
a Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, 6110011, Japan
b Institute for Chemical Research, Kyoto University, Gokasho Uji, 6110011, Japan
First published on 29th January 2024
Although a large amount of knowledge on copolymers and copolymerization has already been accumulated in the literature, there are no freely available databases on copolymerization with a proper graphical user interface (GUI). Focusing on copolymerization of only two types of monomers (to concentrate on the fundamentals) and on radical polymerization, i.e. the most widely used type of polymerization, we present CoPolDB, a database of copolymerization with numerous helpful GUI functions, including: (1) graphically showing multiple connections (with different reactivity ratios) between two monomers and the corresponding copolymer, assisting users’ intuitive understanding; and (2) providing a list of alternative monomers for each monomer pair, according to the similarity to the original monomer. We believe that CoPolDB is a useful resource for understanding the current state of copolymerization comprehensively, as well as an inspiring tool to promote polymer chemistry research. CoPolDB is available at https://www.copoldb.jp/.
Fig. 1 Copolymerization: (a) the general reaction scheme and (b) the four propagation pathways involving the copolymerization.
The structure of copolymers and the properties of the resulting polymer materials are determined by the reactivity of each monomer, defined using the monomer reactivity ratios, r1 = k11/k12 and r2 = k22/k21 (k11 and k12 refer to the rate constants of 1p reacting with 1 and 2, respectively, and k22 and k21 refer to the rate constants of 2p reacting with 2 and 1, respectively) (Fig. 1b). Thus, understanding and predicting monomer reactivity is a fundamental and important research theme.1–3 Due to the long history of radical polymerization, a large amount of knowledge on copolymers and copolymerization has already been accumulated in the literature. Organizing this knowledge and presenting it in a visually understandable manner would be useful for advancing research in polymer chemistry and relevant fields, not only scientifically but also from an engineering perspective. However, to the best of our knowledge, there are currently no freely available online data resources relevant to copolymerization, except for the following two cases.
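For reference, the reactivity ratios defined above can be connected to the copolymer composition through the standard Mayo-Lewis derivation. The sketch below assumes the terminal model (only the chain-end unit controls reactivity, as in the four pathways of Fig. 1b) and a steady state for the two radical populations; the symbols M_1^• and M_2^• for the chain-end radicals (written 1p and 2p above), and f_1, F_1 for the feed and copolymer mole fractions of monomer 1, are notation chosen here.

```latex
\begin{align}
-\frac{d[M_1]}{dt} &= k_{11}[M_1^{\bullet}][M_1] + k_{21}[M_2^{\bullet}][M_1], \\
-\frac{d[M_2]}{dt} &= k_{12}[M_1^{\bullet}][M_2] + k_{22}[M_2^{\bullet}][M_2], \\
k_{12}[M_1^{\bullet}][M_2] &= k_{21}[M_2^{\bullet}][M_1]
  \quad \text{(steady-state assumption)}, \\
\frac{d[M_1]}{d[M_2]} &=
  \frac{[M_1]\,\bigl(r_1[M_1] + [M_2]\bigr)}{[M_2]\,\bigl([M_1] + r_2[M_2]\bigr)},
  \qquad r_1 = \frac{k_{11}}{k_{12}},\; r_2 = \frac{k_{22}}{k_{21}}, \\
F_1 &= \frac{r_1 f_1^2 + f_1 f_2}{r_1 f_1^2 + 2 f_1 f_2 + r_2 f_2^2},
  \qquad f_2 = 1 - f_1 .
\end{align}
```

The last line follows from the fourth by writing F_1 = d[M_1]/(d[M_1] + d[M_2]) and replacing the concentrations with feed mole fractions.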
(1) PolyInfo10 (https://polymer.nims.go.jp/): a database (DB) of polymers, including homopolymers, copolymers, polymer blends, composites, etc., where 105 properties are given for each polymer entry. PolyInfo is a useful resource, providing the properties of each polymer, but it gives no information on copolymerization kinetics and monomer reactivities.
(2) Recently, an online DB for rate coefficients in radical polymerization (https://sql.polymatter.net) was created.11 This is also a useful resource on monomers, polymers and polymerization. However, this DB consists solely of rate constants of homopolymerization. In addition, this DB has no graphical user interface (GUI), which makes it hard for users to intuitively understand copolymerization reactions.
With the above considerations in mind, we propose CoPolDB (https://www.copoldb.jp/), equipped with a user-friendly GUI that allows users to visually see the connections between monomers, copolymers, and copolymerization kinetics. CoPolDB (although it is currently limited to the copolymerization of two monomers) has a number of unique features, particularly the following four: (1) CoPolDB consists primarily of two parts, monomers and copolymers, which are connected to each other, since monomers are the building blocks of copolymers (Fig. 2 and 3). (2) CoPolDB includes the kinetic parameters r1 and r2, from which the so-called copolymer composition curve, the F1/f1 plot, can be calculated and visualized (F1 and f1 refer to the molar fraction of monomer units derived from monomer 1 in the copolymer and the molar fraction of monomer 1 in the feed, respectively) (Fig. 4). This visualization allows users to intuitively understand the effect of the monomer reactivity ratios on the composition of synthesized polymers. Note that when multiple combinations of r1 and r2 have been reported for a monomer pair, all of them are included and shown in scatter plots. (3) All copolymerization data are extracted from a polymer handbook,12 and each entry is hyperlinked to the original reference through a digital object identifier (DOI), if the DOI is available. (4) Only the minimum information is kept in CoPolDB, so that the required storage is very small. Relevant information can be obtained from internet resources, such as PubChem,13 through the identifiers generated by CoPolDB.
Fig. 3 Interface examples: (a) Monomers “Browse”: for each monomer, the chemical structure, the monomer names, the molecular formula, the molecular weight, how many times used for generating copolymers, and how many times used in copolymerization reactions, are shown. The last two items have hyperlinks to the corresponding entries in the copolymers “Browse” page. (b) Copolymers “Browse”: a list of monomer pairs (each corresponding to a copolymer and linking to the detail page of (d)) is shown. This page can be switched to the corresponding Copolymerizations “Browse” page by clicking ‘copolymerizations’ in the upper right corner. (c) Monomers detail page: for a monomer (styrene), the identifiers, the molecular structure, the computed descriptors, the MOL file, and any similar monomers, are shown. (d) Copolymers detail page: the F1/f1 plots, the scatter plots and a reactivity ratio table with DOIs (see also Fig. 4) are shown. The alternative monomers usable in copolymerization reactions (Fig. 5) are suggested.
Fig. 4 Enlarged Fig. 3d. The reactivity ratios for the same monomer pair are shown in the F1/f1 (upper left) and scatter plots (upper right). The original reference of each ratio can be accessed by clicking the corresponding DOI. We can choose an arbitrary subset of the reactivity ratios, and the plots of only the selected ratios are shown.
We emphasize that CoPolDB is the first freely available DB with a sophisticated GUI on copolymerization in polymer chemistry. Even across chemistry as a whole, it would be hard to find a free DB with an advanced GUI like CoPolDB. On the other hand, in biology, a lot of data and information have been stored in a variety of DBs and presented with proper GUIs, such as GenBank14 for DNA sequences and GEO (Gene Expression Omnibus)15 for gene expression data. These DBs are useful not only for understanding the current state of biology but also for promoting research in biology. This data-based aspect might be a point lacking in current chemistry. We hope that CoPolDB will be one of the first steps to add more data-driven viewpoints to the present chemistry.
(1) Monomer: each entry has the entry ID (Monomer_ID), the trivial and IUPAC names, the SMILES (Simplified Molecular Input Line Entry System)16 string, the InChI (International Chemical Identifier), the InChI key,17 the molecular weight (MW) and the molecular formula (MF).
Using the monomer identifiers, i.e. the SMILES and InChI key, the physical or chemical information can be retrieved from external resources and presented.
(2) Copolymer: each entry has the entry ID (Copolymer_ID), a pair of monomers (specified by two Monomer_IDs) and the “order” with a numerical value to specify the first and second monomers.
(3) Copolymerization: each entry has the entry ID, Copolymer_ID, the reactivity ratios r1 and r2, and the reference information (the title, the authorship and the DOI). In addition, each entry has the 95% confidence limits of the ratios if they are provided in the polymer handbook.12
These three tables are connected through Monomer_ID and Copolymer_ID (Fig. 2), so that relevant information can be retrieved using these IDs. For instance, given a monomer, CoPolDB can provide the number of copolymers synthesized from this monomer and, conversely, given a copolymerization reaction, users can obtain the molecular formulas of the two monomers used in the reaction.
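The way the three tables link together can be illustrated with a minimal relational sketch using Python's built-in sqlite3 module. This is not CoPolDB's actual schema: the table names, column names and the two lookup functions are hypothetical, and the r1/r2 values are placeholders rather than data taken from the database.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not CoPolDB's own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monomer (
    monomer_id INTEGER PRIMARY KEY,
    name TEXT, smiles TEXT, formula TEXT
);
CREATE TABLE copolymer (
    copolymer_id INTEGER PRIMARY KEY,
    monomer1_id INTEGER REFERENCES monomer(monomer_id),
    monomer2_id INTEGER REFERENCES monomer(monomer_id)
);
CREATE TABLE copolymerization (
    copolymerization_id INTEGER PRIMARY KEY,
    copolymer_id INTEGER REFERENCES copolymer(copolymer_id),
    r1 REAL, r2 REAL, doi TEXT
);
""")

conn.executemany("INSERT INTO monomer VALUES (?, ?, ?, ?)", [
    (1, "styrene", "C=Cc1ccccc1", "C8H8"),
    (2, "methyl methacrylate", "CC(=C)C(=O)OC", "C5H8O2"),
    (3, "acrylonitrile", "C=CC#N", "C3H3N"),
])
conn.executemany("INSERT INTO copolymer VALUES (?, ?, ?)", [(1, 1, 2), (2, 1, 3)])
# Placeholder reactivity ratios, NOT real CoPolDB entries.
conn.executemany("INSERT INTO copolymerization VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 0.5, 0.5, None),
    (2, 1, 0.6, 0.4, None),
    (3, 2, 0.4, 0.1, None),
])


def count_copolymers(conn, monomer_id):
    """Number of copolymers in which the given monomer appears."""
    row = conn.execute(
        "SELECT COUNT(*) FROM copolymer WHERE monomer1_id = ? OR monomer2_id = ?",
        (monomer_id, monomer_id),
    ).fetchone()
    return row[0]


def formulas_for_reaction(conn, copolymerization_id):
    """Molecular formulas of the two monomers used in one reaction."""
    return conn.execute(
        """SELECT m1.formula, m2.formula
           FROM copolymerization AS r
           JOIN copolymer AS c ON c.copolymer_id = r.copolymer_id
           JOIN monomer AS m1 ON m1.monomer_id = c.monomer1_id
           JOIN monomer AS m2 ON m2.monomer_id = c.monomer2_id
           WHERE r.copolymerization_id = ?""",
        (copolymerization_id,),
    ).fetchone()
```

With this toy data, `count_copolymers(conn, 1)` walks from a monomer to its copolymers, and `formulas_for_reaction(conn, 3)` walks the other way, from a reaction back to its two monomers.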
Monomer: the SMILES code is created with reference to the original source and assigned to each monomer. Using the SMILES code, the molecular formula and molecular weight of a molecule are generated by RDKit.18 Similarly, using the SMILES code, the InChI and InChI key are generated using OpenBabel.19 The trivial and IUPAC names are obtained from PubChem by searching the SMILES code.
Copolymer and copolymerization: the information is generated from the literature by manual curation (not by any software). The Copolymer table stores the two Monomer_IDs corresponding to the two monomers found in the corresponding literature. The Copolymerization table stores the corresponding Copolymer_ID, reactivity ratios and reference (title, author and DOI) information. The DOIs of all references are stored, unless they are inaccessible or not publicly available.
As of the time of writing this paper, CoPolDB has 864 monomers, 1954 copolymers, and 2991 reactions of copolymerization in the Monomer, Copolymer, and Copolymerization tables, respectively. For the DOIs, we identified 725 unique identifiers, which can be assigned to 2449 out of the 2991 reactions of copolymerization.
(i) Overview: monomers for radical polymerization, e.g. fundamental concepts of radical polymerization, monomer types, and structures of representative monomers, are described. Examples of representative monomers are also introduced.
(ii) Browse: a comprehensive list of all monomer entries with pagination and a sort function is presented, where each monomer is shown with six key pieces of information (Fig. 3a): (1) the 2-dimensional structural formula displayed as a PNG image generated by RDKit based on the SMILES notation, (2) the monomer name (the conventional name or the IUPAC name if the conventional name is unavailable), (3) the molecular formula (generated by RDKit), (4) the molecular weight (calculated by RDKit), (5) the number of copolymers linked to the monomer, and (6) the number of reactions of copolymerization linked to the monomer.
Users can sort monomer entries, using any items other than the structural formulas, in ascending or descending order. For example, by sorting the number of copolymers (5) in descending order, users can see the most frequently used monomers in copolymers in CoPolDB. In each page, hyperlinks are given to all items, except for the monomer name and the molecular weight. For example, by clicking the molecular formula, a list of monomers with the same molecular formula is shown. By clicking the number of copolymers (or copolymerizations), a list of copolymers (copolymerizations) synthesized from the same monomer is shown. That is, these hyperlinks allow users to access a variety of information on reactions of copolymerization. Also, by clicking the structural formula, the corresponding monomer detail page (Fig. 3c) is presented.
The monomer detail page has five sections: (1) Identifiers (ID), (2) Structures (ST), (3) Molecular Formula and Computed Descriptors (MD), (4) MOL File (MO) and (5) Similar Monomers (SM), giving users helpful insight into copolymerization reactions, since the properties and behavior of the resulting copolymer often depend on the arrangement of monomers within the polymer chain. These five sections are described further: (1) ID: if a monomer is registered in PubChem, the corresponding IUPAC name, canonical SMILES, and isomeric SMILES are kept linked to the monomer in PubChem. These can be used as keywords to search external resources and tools for calculating physical or chemical properties. (2) ST: the 2D and 3D structures obtained by RDKit18 and 3Dmol.js,20 respectively, are shown. To generate the conformation for the 3D structure, CoPolDB uses a distance geometry (DG) method called Experimental–Torsion–Knowledge Distance Geometry (ETKDG).21 (3) MD: the molecular formula and fifteen molecular descriptors, such as the molecular weight, calculated by RDKit are presented. (4) MO: the MOL file, a de facto standard data format in chemoinformatics, is shown as text. This is the result of the conformational calculation used to generate the 3D structure in ST. (5) SM: the Tanimoto similarity22 of two molecules, represented by their fingerprints A and B, is defined as the ratio of the size of the intersection (A∩B) to the size of the union (A∪B). A list of monomers with a Tanimoto similarity of 0.25 or higher, computed using Morgan fingerprints,23 is presented in descending order of the Tanimoto similarity.
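The similarity ranking in the SM section can be sketched in a few lines. To keep the example dependency-free, a fingerprint is represented here simply as a set of integer "on-bit" indices; this is an assumption made for illustration, whereas CoPolDB computes Morgan fingerprints. The function names and the toy fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity |A ∩ B| / |A ∪ B| of two on-bit sets."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / len(union)


def similar_monomers(query_fp, library, threshold=0.25):
    """Names and scores of library entries at or above the threshold,
    in descending order of similarity (ties broken by name)."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Toy fingerprints, not real Morgan bits.
query = {1, 2, 3, 4}
library = {
    "monomer A": {1, 2, 3, 5},    # 3 shared bits of 5 in the union
    "monomer B": {1, 9, 10, 11},  # 1 of 7, below the 0.25 cut-off
    "monomer C": {1, 2, 7, 8},    # 2 of 6
}
ranking = similar_monomers(query, library)
```

Here `ranking` keeps monomer A and monomer C, in that order, and drops monomer B because its similarity falls below the 0.25 threshold quoted in the text.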
(iii) Search: users can search monomers using the following items: (a) the canonical or IUPAC name, (b) the molecular formula, (c) the SMILES code, (d) the InChI Key, (e) the minimum/maximum molecular weight, (f) the minimum/maximum occurrences in copolymers. For items (a) to (d), partial text matching, i.e. checking whether the query text is included in the target text, is available. Molecular formulas can be searched without regard to the order of the atom symbols. For example, C8H8 and H8C8 are treated as the same molecular formula. Query input is assisted by an auto-complete function: if a user types two or more characters in a text field, such as part of a name, a SMILES or a molecular formula, up to 10 candidate strings are displayed for the user to choose from.
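The search behaviour described above can be approximated as follows. This is a sketch under stated assumptions, not the site's implementation: formulas are assumed to be simple element/count strings without brackets or charges, matching is assumed case-insensitive, and all function names are hypothetical.

```python
import re
from collections import Counter


def formula_key(formula):
    """Order-insensitive key for a simple molecular formula."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return frozenset(counts.items())


def partial_match(query, target):
    """True if the query text is contained in the target text."""
    return query.lower() in target.lower()


def autocomplete(text, candidates, min_chars=2, limit=10):
    """Up to `limit` candidates containing `text`, once at least
    `min_chars` characters have been typed."""
    if len(text) < min_chars:
        return []
    return [c for c in candidates if partial_match(text, c)][:limit]


names = ["styrene", "methyl methacrylate", "4-methylstyrene"]
suggestions = autocomplete("sty", names)
same_formula = formula_key("C8H8") == formula_key("H8C8")
```

Comparing `formula_key` values makes C8H8 and H8C8 equal, while `autocomplete` returns nothing for a single character and caps its suggestions at ten.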
(iv) Statistics: the number of monomers in CoPolDB is shown, with the distribution of molecular weights and the distribution of the number of occurrences in copolymers.
(i) Overview: a chemical overview of radical polymerization, such as the principles of copolymerization and the structure of copolymers, is explained.
(ii) Browse: unlike in Monomers, there are two browse pages: Copolymers and Copolymerizations, which are switchable. The former shows a list of monomer pairs (Fig. 3b), each with a hyperlink to the detail page (Fig. 3d) of the corresponding copolymer, while the latter shows a list of copolymerizations, each also with a hyperlink to a detail page of the corresponding copolymer. The detail page has an interactive GUI (Fig. 4) to show and compare, for each monomer pair, various different reactivity ratios (from different references) by F1/f1 plots and scatter plots (upper left and right of Fig. 4, respectively), with the reported r1 and r2 pairs (lower half of Fig. 4) accessible to the original reference by clicking the corresponding DOI.
The reactivity ratios of monomers are crucial for the rational design and synthesis of copolymers with specific properties, but they can vary due to a variety of factors. First, different functional groups on the same monomer can lead to different reactivity ratios, owing to their different reactivity towards the other monomer. Second, the reaction environment, such as temperature, pressure, solvent, and additives, affects the reactivity ratios. In addition, the reactivity ratios can vary depending on the experimental method. Thus, even for the same monomer pair, various reactivity ratios can be obtained.
An F1/f1 plot is constructed by plotting the mole fraction of monomer 1 in the copolymer, F1, against the mole fraction of that same monomer in the feed, f1. This is a widely used graphical tool for understanding copolymerization; more specifically, it is used to predict the composition of the copolymer from the feed composition and then to design the copolymerization process so that the product has particular desired properties. The plot can also be used to compare the reactivity ratios of different monomer pairs, which is useful for selecting monomers for copolymerization on the basis of compatibility and expected copolymer properties. CoPolDB presents F1/f1 plots for all copolymerizations in the detail page to help users predict copolymer properties and access experimental conditions of interest. In particular, users can select one reactivity ratio among the various ones available and check the information on the selected ratio, such as r1 and r2, and access the original paper that reported this ratio by clicking the corresponding DOI.
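As an illustration of how such a plot can be computed from r1 and r2, the sketch below evaluates the standard Mayo-Lewis (terminal model) composition equation over a grid of feed fractions. Whether CoPolDB uses exactly this routine is not stated, so it should be read as an assumption; the function names and the example ratio pairs are placeholders, not database entries.

```python
import math


def copolymer_fraction(f1, r1, r2):
    """Instantaneous mole fraction F1 of monomer 1 in the copolymer
    for feed fraction f1 (Mayo-Lewis equation, terminal model)."""
    if not 0.0 <= f1 <= 1.0:
        raise ValueError("f1 must lie in [0, 1]")
    f2 = 1.0 - f1
    numerator = r1 * f1 * f1 + f1 * f2
    denominator = r1 * f1 * f1 + 2.0 * f1 * f2 + r2 * f2 * f2
    if denominator == 0.0:
        # Undefined corner case, e.g. f1 = 0 together with r2 = 0.
        return math.nan
    return numerator / denominator


def composition_curve(r1, r2, n_points=101):
    """List of (f1, F1) points covering the whole feed range."""
    last = n_points - 1
    return [(i / last, copolymer_fraction(i / last, r1, r2))
            for i in range(n_points)]


# Several reported (r1, r2) pairs for one monomer pair can be overlaid
# by computing one curve per pair. Placeholder values only.
reported = {"reference A": (0.5, 0.5), "reference B": (1.0, 1.0)}
curves = {label: composition_curve(r1, r2)
          for label, (r1, r2) in reported.items()}
```

Each entry of `curves` is a list of points ready to be handed to any plotting library; with r1 = r2 = 1 the curve is the diagonal F1 = f1, which is a convenient sanity check.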
The detail page further presents, for each monomer of a copolymer, a list of monomers that could replace the original monomer. Fig. 5 is an example detail page, showing the copolymer of styrene and methyl methacrylate: the candidate replacements for styrene, which are found among the registered copolymers of methyl methacrylate, are sorted by their structural similarity to styrene, and the same is done for the methyl methacrylate side. Each monomer on this page has a hyperlink (at the number of reactions of the corresponding monomer), and clicking the link opens the detail page of the alternative copolymer.
Fig. 5 Enlarged Fig. 3d: for the styrene and methyl methacrylate pair, the alternative monomers for styrene and methyl methacrylate are listed in the left and right columns, respectively. The number of registered reactions of copolymerization that include the corresponding molecule is shown, and the number is linked to the corresponding copolymers detail page. For a pair of monomers (A and B), CoPolDB generates a list of the alternative monomers for A in two steps: (1) in the registered monomer pairs, searching for monomers that can form a pair with B, and (2) sorting the found monomers according to their structural similarity to A.
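The two-step procedure in the caption of Fig. 5 can be written down compactly. The sketch below is an assumed reading of that description: `pairs` stands for the registered monomer pairs and `similarity` for any structural similarity function (for example a Tanimoto score); the function name, the toy pairs and the similarity scores are all made up for illustration.

```python
def alternative_monomers(a, b, pairs, similarity):
    """Alternatives for monomer `a` in the pair (a, b)."""
    # Step 1: collect monomers registered as a partner of b.
    partners = set()
    for m1, m2 in pairs:
        if m1 == b:
            partners.add(m2)
        elif m2 == b:
            partners.add(m1)
    partners.discard(a)  # the original monomer is not its own alternative
    partners.discard(b)
    # Step 2: sort by similarity to a, most similar first (ties by name).
    return sorted(partners, key=lambda m: (-similarity(a, m), m))


# Toy data: registered pairs and made-up similarity scores to styrene.
pairs = [
    ("styrene", "methyl methacrylate"),
    ("alpha-methylstyrene", "methyl methacrylate"),
    ("acrylonitrile", "methyl methacrylate"),
    ("styrene", "butadiene"),
]
toy_scores = {"alpha-methylstyrene": 0.7, "acrylonitrile": 0.2, "butadiene": 0.3}


def toy_similarity(reference, other):
    # Hypothetical lookup; a real system would compare fingerprints.
    return toy_scores.get(other, 0.0)


alternatives = alternative_monomers(
    "styrene", "methyl methacrylate", pairs, toy_similarity
)
```

In this toy case butadiene is excluded because it is never paired with methyl methacrylate, and the remaining candidates are ordered by their similarity to styrene.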
(iii) Search: an interface to search for the copolymerization reactions through the input queries of monomer information, such as the molecular formula, the SMILES code, and the InChI key. The interface accepts a range of monomer reactivity ratios as an input query.
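A range query of this kind can be sketched as a simple filter. The record layout, the function names and the numerical values below are hypothetical and only illustrate the idea of combining a monomer identifier with optional lower and upper bounds on r1 and r2.

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    monomer1: str  # any monomer identifier, here a molecular formula
    monomer2: str
    r1: float
    r2: float


def in_range(value, low, high):
    """True if value lies within [low, high]; None means unbounded."""
    return (low is None or value >= low) and (high is None or value <= high)


def search_reactions(reactions, monomer=None,
                     r1_range=(None, None), r2_range=(None, None)):
    """Reactions involving `monomer` (if given) whose r1 and r2 fall
    inside the requested ranges."""
    return [
        rx for rx in reactions
        if (monomer is None or monomer in (rx.monomer1, rx.monomer2))
        and in_range(rx.r1, *r1_range)
        and in_range(rx.r2, *r2_range)
    ]


# Placeholder records, not CoPolDB data.
reactions = [
    Reaction("C8H8", "C5H8O2", 0.5, 0.5),
    Reaction("C8H8", "C3H3N", 0.4, 0.1),
    Reaction("C5H8O2", "C3H3N", 1.2, 0.2),
]
hits = search_reactions(reactions, monomer="C8H8", r2_range=(0.3, None))
```

Leaving a bound as `None` makes that side of the range open, so the same function covers "at least", "at most" and "between" queries.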
(iv) Statistics: the numbers of copolymers, copolymerization reactions and references with DOIs are shown, together with the distributions of the number of copolymerizations per copolymer and of the r1 and r2 reactivity ratios.
As mentioned in the Introduction, there are already a variety of DBs in biology. Currently these DBs are important resources as input for data-driven approaches, such as statistics, data science and machine learning (ML), for biology. For example, an ML model is trained on the gene expression data of patients with a certain disorder, and the trained model is used for finding future patients or biomarkers relevant to the disorder. The data stored in CoPolDB could be used in the same way for research in polymer chemistry.
Currently, CoPolDB stores only data published in the literature, and there are no functions for users to input data into the DB. In biology, experimentally identified gene sequences must be registered in a DB, such as GenBank, when published. This type of rule has promoted the growth of DBs in biology. Similarly, in the future, we can modify CoPolDB to allow users to register copolymerization results. Also, we will provide CoPolDB with a function that gives users read access to its data. This extension will allow users to use the CoPolDB data for further studies on their own computers, such as building an ML model for predicting reactivity ratios. In fact, we think that applying such ML models to the data in CoPolDB would be useful for finding and fixing incorrect data hidden in the database.
Footnote
† If the molecular weights are the same, the order is determined based on molecular descriptor values, such as HeavyAtomCount and NumValenceElectrons, which are calculated by RDKit.
This journal is © The Royal Society of Chemistry 2024