Chemical bibliographic databases: the influence of term indexing policies on topic searches †

A comparative study of the three main chemical information systems (Scifinder, Web of Science and Scopus) was performed by studying the indexing policies of titles, abstracts and keywords within selected literature articles. Various chemical expressions were introduced as topic searches to illustrate the different search tools related to term indexing. The resulting article lists were compared pairwise by means of a script designed to identify the references common to two editors and those specific to each editor. Analyzing these specific reference lists reveals that only partial coverage of the references should be expected when querying a single platform. The discussion covers the term and keyword indexing policies and their influence on the retrievability of references in general and of highly cited papers in particular.


Introduction
Although many previous studies compare bibliographic databases [1][2][3] in terms of citation analysis, very few deal with the topic addressed here. Falagas compared the strengths and weaknesses of PubMed, Scopus, WoS and Google Scholar, providing an interesting overview of their main search tools [4]. This author introduced a single keyword as a topic search but did not report any hit counts resulting from this particular search. Another in-depth analysis of chemical databases was proposed by Zass and shows some inconsistencies in the indexing policies of the Chemical Abstracts Service (CAS), but his results were not compared with other major bibliographic platforms [5]. These preliminary studies prompted us to analyze the consequences of term indexing policies on the number and on the consistency of retrieved answers by comparing the three above-mentioned platforms.
Term indexing has received much attention for many years from the information systems compared here. The CAS has indexed journal articles, among other document types, since the beginning of the twentieth century in a highly hierarchical way. The bibliographic CAplus database currently contains more than 40 million records covering a wide range of chemical domains including biochemistry, organic, macromolecular and applied chemistry as well as inorganic, analytical and physical chemistry [6]. The CAS's title coverage comes close to 10 000 titles, among which 1700 key journals are gathered to form a core journal list [7]. From the outset the CAS's indexing policy has been document-oriented: the CAS provides, to a large extent, indexed terms from titles, abstracts and author keywords in the CAplus database. In its current version, supplementary information using a hierarchical set of controlled terms is also provided [8]. At the top level of the hierarchy a reference is first associated with one of the 80 CAS sections, and then indexing is divided into three main categories: concepts, substance-related information and supplementary indexing terms [9]. The concept category contains one or several subject headings at the first level and then terms or text-modifying phrases at the second level, both levels constituting the controlled vocabulary. Supplementary terms are keywords added by the editor that may either differ from controlled terms or be excerpted from author keywords. The substance-related information is categorized in a similar way, i.e. the first level displays substance identifiers such as the Registry Number and the common chemical names linked with the official chemical name. The second level consists of index terms excerpted from the controlled vocabulary and also from CAS-specific terms such as substance roles.
[10] Thus this powerful indexing relies on both the CAplus and Registry [11] databases, enabling the user to retrieve a large reference set while using a text-only querying language [12]. Moreover Scifinder, the CAS's web interface, enables reference searching from both the CAplus and MEDLINE [13] databases, the indexing of the latter relying on the National Library of Medicine's controlled vocabulary thesaurus, named Medical Subject Headings (MeSH) [14].

a Institut Charles Gerhardt Montpellier (ICGM), UMR 5253 CNRS-UM-ENSCM, Ecole Nationale Supérieure de Chimie, 8, rue de l'Ecole Normale, 34296 Montpellier, France. E-mail: gilles.niel@enscm.fr
b Institut Charles Gerhardt Montpellier (ICGM), UMR 5253 CNRS-UM-ENSCM, Université de Montpellier, Place E. Bataillon, 34095 Montpellier, France
† Electronic supplementary information (ESI) available: Additional information on the diversity of studied domains (Table S1), on the expanded timespan from 1990-2005 (Table S2), the studied references in Table 8 (Doc 1), the studied references in Table 10 (Doc 2) and the source code of the script (Doc 3). See DOI: 10.1039/c5nj01077b

Among the WoS databases, the Science Citation Index Expanded™ (SCIE) gives access to more than 40 million records from a large range of scientific domains [15]. The 8500 indexed journals cover a larger set of scientific domains divided into 182 categories related to mathematics, physics, chemistry, biology, medicine, engineering, etc. [16] Besides the title, abstract and author keyword fields, WoS provides supplementary information gathered in the Keywords Plus® field [17]. This information results from an algorithmic process that excerpts terms appearing at least twice in the titles of the cited references of a processed article [18]. SciVerse Scopus [19] indexes more than 21 000 titles in all scientific topics classified into four domains: social sciences, physical sciences, life sciences and health sciences.
[20] These two latter domains are especially well represented and the total record number comes close to 50 million today. The term indexing policy includes titles, abstracts, author keywords as well as matched terms. These matched terms include chemical names, CAS Registry Numbers, trade names, manufacturer names and index keywords. These index keywords form the hierarchically controlled vocabulary gathered in several thesauri such as the Compendex index [21], EMTREE index [22], MeSH, Species index, and GeoBase subject index [23]. This list is non-exhaustive but refers to the main indexes concerned by this comparative study.
A second important factor concerns the query language and the related query tools. [24][25][26] In recent years this led to the natural language query (NLQ) system, an algorithmic process that breaks down phrases into concepts [27]. The different instructions of the process were first described by J. Williams [28] and then thoroughly analyzed by A. Ben Wagner [29]. The last step of the algorithmic process consists in truncating any remaining term that was not parsed in a prior instruction; thus the term 'organocatalysis' will furnish references containing the terms organocatalysis, organocatalyst(s), organocatalytic, and organocatalys(z)ed. The main characteristics of the NLQ system lie in avoiding: (i) the use of Boolean operators, which are interpreted like prepositions [30], (ii) the use of proximity operators, and (iii) any knowledge about specific field searches. Prepositions are only used to break down phrases into simpler concepts. The NLQ process enables the end-user to focus on the scientific content owing to an easy-to-use topic search interface that may appear simpler at the outset than those of WoS or Scopus. Both of these editors provide basic and advanced search modes that enable searches on specific fields. WoS and Scopus provide a more classic use of Boolean operators, including proximity operators, thus giving the searcher a higher precision on the queried expressions. In advanced query mode, many different search fields of WoS and Scopus are searchable using a quite simple syntax based on field codes.
To assess the influence of factors such as term indexing and journal coverage, we selected single terms or short expressions intended to be representative of different chemical domains such as organic and inorganic chemistry, analytical and physical chemistry, chemistry related to energy and fuels or materials science, biochemistry and molecular biology, and biotechnology and biochemical research methods (see ESI, † Table S1). All selected terms and expressions were submitted to the query interfaces of Scifinder, WoS and Scopus and the resulting hit sets were thoroughly analyzed.

Querying methods
This study was limited to some document types such as journal articles, book chapters, conference papers, notes, letters and reviews because these citation types cover the most informative part of the chemical literature. As a second argument, chemists frequently need to refer to experimental procedures that are more often embedded in journal articles than in other document types. Thus meeting abstracts, errata and corrections were discarded from the initial queries. Patents were also discarded because they would require a parallel study owing to their intrinsic indexing, which is distinct from that of academic papers. The CAplus and Medline databases were queried through the Explore References by Research Topic form of Scifinder. The Science Citation Index Expanded and Conference Proceedings Citation Index of WoS were selected while querying these databases in advanced search mode. The three subject areas (life sciences, health sciences and physical sciences) were queried from Scopus's databases such as Embase and Medline [31]. Most queries covered the publication year 2010, but some covered earlier years to check the reproducibility of the initial results over a larger timespan. Only lists of English-written papers were saved and then exported into a standard bibliographic format for comparison.
Table 1 displays the Scifinder-specific queries corresponding to the selected terms and expressions, followed by the whole filtering process leading to the selection of unique articles. Column 2 displays the queried terms as they were typed in the Research Topic form of Scifinder's interface and column 3 specifies which candidate list was chosen at the next step unless otherwise noted. Filtering by year and language leads to crude hit counts (column 4). Column 5 displays the hit counts after combining answer sets when required. Column 6 (citations) refers to all citations after automatic removal of duplicates from the CAplus and Medline databases, while column 7 corresponds to article counts after the selection of document types such as journal articles, book chapters, conference papers, notes, letters and reviews. In some entries, discarding patents from this study causes a dramatic decrease between the citation column and the article column. Other document types, i.e. meeting abstracts, errata and corrections, were discarded from citation lists by means of a script, named Iddup, that is described below. Unique articles in column 8 result from parsing each reference list with this script so that no list contains any duplicate reference. The differences between the reference counts of columns 7 and 8 result from incomplete duplicate removal between CAplus and Medline and from some errata that could not be filtered during the document type selection.
As pointed out by Ben Wagner, singular form vs. plural form queries in Scifinder may lead to somewhat different results. Therefore we tested each term or expression under both forms [29]. In most cases the hit counts are equal except for entries 1, 2, 8, 9, 10, 11, 12 and 13. For example the answer lists corresponding to 'allene' (entry 1) and 'allenes' (entry 2) contain references where the queried term was found as a concept. Combining the two answer lists (917 hits) and then removing duplicates (771 citations) furnishes 630 journal articles. The expression 'N-heterocyclic carbene' (entry 8) leads to a greater hit count (668 hits) than the corresponding plural form (231 hits). A processing similar to that of entries 1 and 2 led to 671 citations and 508 journal articles. This emphasizes that in such cases both singular and plural forms need to be searched. With respect to the expressions 'modified nucleoside' and 'modified nucleosides' (entries 10 and 11), the largest list contains the smallest one after combining them. Because the references corresponding to the terms and expressions of entries 1, 2, 10 and 11 were selected through a concept search, the process of combining answer lists may be simplified by typing the singular and the plural forms within the same search and by using one of these forms within brackets. However this trick is not valid if the references corresponding to an expression are found containing this expression 'as entered', as in the cases of entries 8, 9, 12 and 13. Finally the term 'material' (entries 16 and 20) was searched as a concept on both databases under singular vs. plural forms; the resulting hit counts differed by less than 0.05%.
The results of the expression in entry 15 deserve some specific explanations because we initially performed this search by selecting the expression 'copper (Cu) catalyzed arylation' found as a concept, thus leading to 375 hits. This high value is mostly due to a high occurrence of the term 'aryl' resulting from the truncating step of the NLQ process. In order to retrieve only answers chemically relevant to the arylation concept, we ruled out the term 'aryl' by building this query as follows: (i) references were found containing 'copper catalyzed' as entered (869 hits), (ii) references were found containing 'Cu catalyzed' as entered (254 hits), and (iii) the two answer sets were combined (1044 hits). In parallel a reference list was found containing 'arylation' as entered (924 hits) and this latter hit set was intersected with the previously obtained 1044 hit set, thus furnishing a final list of 76 hits. For entry 16 a similar process was set up in order to get the terms 'hybrid' and 'materials' closer to each other. This query was built following the sequence: (i) references were found containing 'hybrid material' as entered (470 hits), (ii) references were found containing 'hybrid materials' as entered (780 hits), and (iii) the two answer sets were combined (1105 hits). In parallel a third reference set was found containing the concept 'nanoparticles' (39 914 hits) and this latter set was intersected with the 1105 hit set, providing 266 hits as a final result. Because all queries were performed in March, April and May 2013, the hit counts may vary slightly if performed now.
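The combine and intersect steps used for entries 15 and 16 are ordinary set operations on answer sets. The following minimal Python sketch illustrates them on made-up reference identifiers (not data from this study); the function names `combine` and `intersect` are illustrative only and do not correspond to any Scifinder API.

```python
def combine(*answer_sets):
    """Union of answer sets: a reference found by several searches is kept once."""
    merged = set()
    for answers in answer_sets:
        merged |= set(answers)
    return merged


def intersect(first, second):
    """References present in both answer sets."""
    return set(first) & set(second)


# Hypothetical identifiers standing in for retrieved references.
copper_catalyzed = {"ref1", "ref2", "ref3", "ref4"}
cu_catalyzed = {"ref3", "ref4", "ref5"}
arylation = {"ref2", "ref5", "ref9"}

metal_catalyzed = combine(copper_catalyzed, cu_catalyzed)
final_hits = intersect(metal_catalyzed, arylation)

# Inclusion-exclusion: references shared by the two 'as entered' searches.
shared = len(copper_catalyzed) + len(cu_catalyzed) - len(metal_catalyzed)

print(sorted(final_hits), shared)
```

Applied to the counts reported for entry 15, the same identity gives 869 + 254 - 1044 = 79 references retrieved by both 'as entered' searches.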
The first point we attempted to address is the non-negligible proportion of duplicate answers observed within Scifinder's answers, whose total count is equal to 204 when summing all duplicates corresponding to each query. These internal duplicates were found among many Medline articles that lack a DOI, whereas the corresponding articles are assigned a DOI if the PubMed interface is queried. Among these 204 references, we also observed that some journal names are indexed differently in the Medline and CAplus databases.
Representative examples are given in Table 2. With respect to the Scopus and WoS databases, only one and two duplicates were found respectively. Tables 3 and 4 display the queries specific to WoS and Scopus, respectively, and the resulting hit counts related to the selected terms and expressions used within Scifinder's topic searches. Keeping in mind that Scifinder's topic searches include by default all indexing terms from titles, abstracts, index terms and supplementary terms, we selected the corresponding WoS search field TS (column 3) that covers the fields: title, abstract, author keywords and Keywords Plus®. Queries to Scopus (column 3) were performed through the document search tab in basic mode together with the option gathering together title, abstract and keywords. In this way the retrieved answer lists are equivalent to the ones retrieved by using the field sum 'TITLE-ABS-KEY-AUTH' available in advanced search mode. In order to perform topic searches comparable to Scifinder's topic searches, right-hand truncation was systematically preferred because it enables a better control of WoS and Scopus queries. Boolean operators were also employed to target all queries precisely, especially the proximity operators available in WoS and Scopus, which retrieve the searched terms within the same bibliographic field.
The WoS operator NEAR searches terms that are separated by default by a maximum of 15 terms, but this distance may be shortened. Terms within double quotes were alternatively searched as an exact expression (entries 7, 11, 12 and 17, Table 3). The logic for the proximity operator W/n is similar in Scopus. This operator requires defining a number n equivalent to the distance between the searched terms. The automatic truncation in Scifinder was offset within WoS and Scopus searches by extensive use of wildcards as exemplified in entry 1 (Tables 3 and 4), thus enabling the terms 'allene(s)' or 'allenyl' or 'allenic' to be retrieved.
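To make the effect of right-hand truncation and of proximity operators concrete, the following Python sketch emulates them on plain strings. It is only an illustration under stated assumptions: the helper names are hypothetical, the word tokenization is simplistic, and the distance is taken here to mean "at most n intervening words", which may not match the exact counting rules of WoS (NEAR/n) or Scopus (W/n).

```python
import re


def truncation_to_regex(pattern):
    """Turn a right-truncated term such as 'allen*' into a regex matching
    whole words that start with the stem (case-insensitive)."""
    stem = re.escape(pattern.rstrip("*"))
    suffix = r"\w*" if pattern.endswith("*") else ""
    return re.compile(r"\b" + stem + suffix + r"\b", re.IGNORECASE)


def within_n_words(text, first, second, n):
    """True if a word matching `first` and a word matching `second` occur
    with at most n words between them, in either order (assumed semantics)."""
    words = re.findall(r"[\w-]+", text.lower())
    rx1, rx2 = truncation_to_regex(first), truncation_to_regex(second)
    pos1 = [i for i, w in enumerate(words) if rx1.fullmatch(w)]
    pos2 = [i for i, w in enumerate(words) if rx2.fullmatch(w)]
    return any(i != j and abs(i - j) - 1 <= n for i in pos1 for j in pos2)


# Hypothetical titles used only to exercise the helpers.
titles = [
    "Synthesis of allenes",
    "Allenyl ketones in catalysis",
    "Allenic esters",
    "Challenges in catalysis",
]
allen = truncation_to_regex("allen*")
hits = [t for t in titles if allen.search(t)]

title = "Tuning the band gap of polymer solar cells"
adjacent = within_n_words(title, "band", "gap", 0)
far_apart = within_n_words(title, "band", "cell*", 2)
```

The word boundary in front of the stem is what keeps 'Challenges' out of the hit list although it contains the letters 'allen'; a left-hand wildcard would be needed to retrieve it.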

Result analysis automation
All article lists (Table 1, column 7 and Tables 3 and 4, column 4) were exported as text files in a tagged format in order to analyze them and to find both the references common to two editors and those specific to each editor. The RIS file format was chosen as an export file format from Scifinder and Scopus while WoS data were exported into the CIW file format. In order to quickly identify duplicates among two or three reference lists we used the Iddup script, whose main instructions are described as follows. For each single input file, Iddup furnishes two text files in the RIS format: the first file contains unique articles (Table 1, column 8; Tables 3 and 4, column 5) while the second file contains duplicate references. When analyzing two different input files, Iddup first identifies internal duplicates in each list, discards them and then compares pairwise the remaining references of the two lists. As output files, Iddup provides a file containing common references and two files containing the references specific to each input file. When comparing two references from input lists without internal duplicates, Iddup assigns each pair a score that is computed based on the following filters:
initial score = 0
if same DOI then score = 10 (and references are identical)
if similar title then increment score +3
if same journal then increment score +1
if same author count then increment score +0.5
if similar author and same position then increment score +0.5
if same starting page then increment score +1.5
if same volume then increment score +0.5
if same issue then increment score +0.5
if score > 5, then the two references are considered as identical.
The second instruction enables the script to skip the following instructions when identical DOIs are found. A similarity computation was introduced at the third instruction, which compares the titles, because many titles contain abbreviations or Greek characters that are not always indexed in the same way by the different editors. These observations prompted us to introduce a 12% similarity threshold (12% of the length of the longest title) that was computed using the Levenshtein distance [32]. The influence of this parameter is discussed in Section 3.3. Likewise the author names present many discrepancies due to different spellings across languages, typing errors or a different ranking in indexing their names. Our script was completed by correspondence arrays for some journal titles and for the Latin transcription of Greek characters. Finally Iddup discards citations corresponding to errata or corrections.
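The scoring rules listed above can be sketched in a few lines of Python. This is not the authors' Iddup source code (which is provided in the ESI, Doc 3) but a minimal illustration under explicit assumptions: references are plain dictionaries with hypothetical keys (`doi`, `title`, `journal`, `authors`, `start_page`, `volume`, `issue`), two strings are "similar" when their Levenshtein distance does not exceed 12% of the longer one, the same tolerance is reused for author names, and the acceptance threshold is read as a score above 5.

```python
def levenshtein(a, b):
    """Edit distance between two strings (classic dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similar(a, b, tolerance=0.12):
    """True if the edit distance is at most `tolerance` x the longer length."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return longest > 0 and levenshtein(a, b) <= tolerance * longest


def pair_score(r1, r2):
    """Score a pair of references following the filters listed in the text."""
    doi1 = (r1.get("doi") or "").lower()
    doi2 = (r2.get("doi") or "").lower()
    if doi1 and doi1 == doi2:
        return 10.0                      # same DOI: identical, skip the rest
    score = 0.0
    if similar(r1.get("title") or "", r2.get("title") or ""):
        score += 3
    if r1.get("journal") and r1.get("journal") == r2.get("journal"):
        score += 1
    a1, a2 = r1.get("authors") or [], r2.get("authors") or []
    if a1 and len(a1) == len(a2):
        score += 0.5
    if any(similar(x, y) for x, y in zip(a1, a2)):  # same position, similar name
        score += 0.5
    for field, weight in (("start_page", 1.5), ("volume", 0.5), ("issue", 0.5)):
        if r1.get(field) and r1.get(field) == r2.get(field):
            score += weight
    return score


def same_reference(r1, r2, threshold=5.0):
    """Two references are treated as identical when the score exceeds 5."""
    return pair_score(r1, r2) > threshold
```

With these weights a similar title alone (+3) is never sufficient; it has to be supported by, for instance, the same journal (+1) and the same starting page (+1.5). A Greek letter spelled out by one editor (for example 'α' versus 'alpha') costs five edits, which may exceed the 12% tolerance for short titles, which is presumably why a transcription table is applied before comparison.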

Comparison of reference lists
Unique articles of Tables 1, 3 and 4 are reported in Fig. 1. Overall the orders of magnitude are similar except for entries 2, 5, 7 and 14, which display higher article counts found by WoS, and except for entry 15, where Scopus retrieves more articles than the other two systems. Scopus retrieves more articles in entries 9, 10 and 15, and Scifinder in entries 1 and 6. These results were refined through Iddup computing by identifying the articles common to each pair of editors (column 4 in Tables 5-7) and the articles specific to each editor (columns 3 and 6 in Tables 5-7). The union of the total article counts (column 7, Tables 5-7) is given by the sum of columns 3, 4 and 6 while column 8 represents the proportion of articles common to two editors. Preliminary observations show that these proportions vary dramatically from a maximum of 80.0 to a minimum of 11.4 percent (entries 4 and 15, Table 6). Higher proportions of common articles were generally observed for single-, double- or triple-term queries than for the queries including four terms.
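The arithmetic behind columns 7 and 8 of Tables 5-7 can be written out explicitly. The short Python sketch below uses hypothetical counts (not values from the tables) and an illustrative function name; it only restates the definitions given above: the union is the sum of the two specific counts and the common count, and the proportion is the common count divided by that union.

```python
def overlap_stats(specific_a, common, specific_b):
    """Return (union, percentage of common articles) for a pair of editors.

    specific_a : articles found only by the first editor  (column 3)
    common     : articles found by both editors           (column 4)
    specific_b : articles found only by the second editor (column 6)
    """
    union = specific_a + common + specific_b              # column 7
    share = 100.0 * common / union if union else 0.0      # column 8
    return union, round(share, 1)


# Hypothetical counts for one query.
union, share = overlap_stats(30, 120, 50)
print(union, share)
```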
Though the main results were recorded for 2010, we extended the query timespan to the years 1990, 1995, 2000, and 2005 for the four expressions: 'allenes', 'peptidomimetics', 'battery electrodes' and 'band gap in solar cells'. These expressions were selected because their corresponding queries furnished sufficient hit counts to be representative as early as 1990. For example expressions such as 'organocatalysis' or 'N-heterocyclic carbenes' returned no answer in 1990 and 1995 and were thus discarded. A second selection criterion was based on the variable lengths of these four expressions.
The full resulting data are included in the ESI † (Table S2). As general conclusions of this supplementary study, we noticed that: (i) the three databases lead to different result sets, as in 2010, (ii) large non-overlapping result sets were found during the years 1990, 1995, 2000, and 2005, and (iii) the proportion of overlapping papers increases over the years except for 'peptidomimetics'.
To close this section, we may mention that the overall averages of shared references for Scifinder/WoS, Scifinder/Scopus and Scopus/WoS are 40.8, 46.8 and 52.2% respectively.

Discussion
These quite low overlaps between the three information systems may appear surprising, but at least one precedent was observed in the computer sciences [33].

Influence of term indexing
What are the reasons why these differences are often so high? To answer this question, some lists of specific references (columns 3 and 6) were selected and each reference of these lists was thoroughly examined in order to determine why it was found by one editor or omitted by another. Such reasons may be related a priori to journal indexing or keyword indexing, but we finally found some other reasons that enabled us to assign each reference to one of the following categories:
-Journal: journal indexing may be absent, or stopped before 2010, or issue indexing is incomplete.
-Document types: Conference Proceedings, Book Reviews, and International Symposia that are not homogeneously indexed by the editors.
-Index terms: Indexing terms, Keywords and Keywords Plus®. In the case of Scifinder, supplementary terms are included in index terms.
-Modified terms: (a) some journals do not provide any abstract; in those cases Scifinder designs an abstract that seems to be excerpted from the article conclusion, (b) some queried terms are indexed using a hyphen included in the retrieved term, i.e. organocatalytic, (c) the journal title is indexed in two different spellings, and (d) author keywords or titles or abstracts are modified.
-Abstracts: though provided by the journal, some abstracts are not indexed.
-Author keywords: though provided by the publisher, some author keywords are excluded from indexing.
-Different year: some issues are assigned a different year because the dates of the online publication and of the printed version are different.
-Wrong DOI: typographic errors were found, in agreement with recent similar observations [34]. We noticed that a non-negligible number of articles were missing an assigned DOI. Indeed concatenation of all articles from a particular editor followed by the removal of internal duplicates revealed that 8.7, 6.5 and 4.7% of articles from Scifinder, Scopus and WoS, respectively, were missing a DOI.
-Miscellaneous.

Footnotes to Table 5: a Entries 1-16 correspond to the queried expressions of previous Tables 3 and 4. b Specific articles to Scifinder. c Shared articles by both editors. d Specific articles to WoS. e Sum of columns 3, 4 and 6. f Proportion of common articles to two editors.
ESI † (Doc 1) details the full results corresponding to the queried term 'organocatalysis' and to the expressions 'N-heterocyclic carbenes', 'phosphine ligands' and 'viscosity of ionic liquids'. Table 8 displays the results obtained for the queried term 'organocatalysis'. The main observed differences arise from the Index Terms row. The Keywords Plus® indexing of WoS provided more articles than those retrieved by Scopus's or Scifinder's term indexing, the latter editor showing the weakest efficiency of its term indexing policy within this example. We also checked the relevance of 50 randomly selected references from the 234 references only retrieved by Keywords Plus®. At least 45 of these 50 references were strongly related to organocatalysis. With respect to the Modified terms row, Scifinder designed an abstract excerpted from the article conclusion in one case, and in the other one a hyphen was introduced in the term 'organocatalytic' by WoS (Table 8, column 3). On the same row (Table 8, entry 4, column 4), a hyphen was introduced in the term 'organocatalytic' eight times by Scifinder and in one case the term 'organocatalyst*' was shortened to 'catalyst*' within the title. Within the Abstracts row, the reference found by Scifinder (Table 8, column 3) presents an abstract that was not indexed by WoS. In the case of the journal 'Angewandte Chemie, International Edition in English', we checked 500 articles of this journal and found that WoS had not indexed their abstracts. This statement is valid up to 2010, but many abstracts are indexed in more recent years. In column 4 (entry 5) the 10 references found specifically by WoS result from a left truncation of the term 'organocatalyst' to 'catalyst' in Scifinder. More surprising are the 149 references (entry 6, column 4) where Scifinder modified the original author keywords by shortening or suppressing the queried term.
Five articles were indexed by WoS with one misspelled character in their DOI compared to the original DOI (Table 8, entry 8). Finally the miscellaneous category contains articles where: (i) the filters applied to the document types during the querying step differ from one editor to another, so that during the analysis step Iddup discards citations corresponding to some unwanted document types, i.e. book chapters and corrections, and (ii) the 0.8 similarity score on the titles and on the author names was in one case the reason why two references were wrongly differentiated. If we consider all articles of a particular editor that are classified in the index terms, modified terms, abstracts or author keywords categories, the next question is whether the concurrent editor's database is really missing this specific information or not. To check this hypothesis we injected the DOIs or the bibliographic data of a given editor's articles corresponding to the above-mentioned indexing categories into the query interface of the concurrent editor. The results are displayed in the last row (Table 8, entry 10). For example 6 of the 7 specific articles retrieved by Scifinder (Table 8, column 3) are also present in WoS, thus emphasizing the importance of Scifinder's indexing policy in this case. Once this statement had been established we noted that only 22 articles (Table 8, entry 1) from specific journals and 5 articles (Table 8, entry 2) from the document type category belong specifically to Scifinder. The vast majority of articles retrieved by WoS (Table 8, entry 10, column 4) would likewise have been retrieved by Scifinder if different indexing rules had been applied.
Comparing Scifinder and Scopus (Table 8, columns 5 and 6) on their specific references led to similar observations. The coverage of journals is in favour of Scifinder whereas Scopus retrieves a higher article count owing to its term indexing. Moreover, in the case of 3 reviews, Scopus indexes not only the abstracts but also the tables of contents where the queried term is present. We also noticed that author keywords were neither suppressed nor modified.
By comparing Scopus and WoS (Table 8, columns 7 and 8), we observed that WoS shows a high count of articles retrieved by the Keywords Plus® indexing. Among the 11 articles included in the abstracts category (Table 8, column 7), 3 are reviews that Scopus indexed through their tables of contents. The 8 remaining articles of the abstracts category correspond to references for which WoS did not index the abstract. We observed that the different year category displays a rather large number of articles: 11 articles are indexed by WoS in 2009 or 2011 and 3 articles are indexed by Scopus in 2009. Obviously these articles would have been retrieved by a multiple-year query. The wrong DOI category contains the same articles as previously noticed.
In order to confirm the results displayed in Table 8, we analyzed some data from two-term queries and a three-term query (Table 9). The first expression studied was 'N-heterocyclic carbenes' (Table 9, columns 3 and 4), with the articles retrieved by Scifinder and Scopus respectively. Here again the influence of term indexing is predominant, but to a smaller extent than previously. Within the modified terms category we observed that in some cases Scifinder expanded the NHC acronym to 'N-heterocyclic carbenes', thus enabling the corresponding article to be retrieved. Finally 6 of the 7 articles present in the miscellaneous category (Table 9, column 3) correspond to misspellings or typographic errors from Scopus.
The next results concern the two-term expression 'phosphine ligands' and the articles retrieved by Scopus and WoS (Table 9, columns 5 and 6). Apart from the predominant influence of term indexing by both editors, Scopus offers in this case a slightly better journal coverage and a better abstract coverage. In the miscellaneous category Scopus retrieved some articles containing expanded forms of the term 'phosphine' such as 'bisphosphine' or 'triphenylphosphine'. Finally we looked at the three-term query 'viscosity of ionic liquids' (Table 9, columns 7 and 8) and examined the specific articles retrieved by Scopus and WoS. The observed proportions within the different categories are similar to those obtained in previous cases, the index term category remaining the main differentiating one.
These last results (Tables 8 and 9) were not computed by any algorithmic process and only concern a part of the study presented in Tables 5-7. Nevertheless they reveal some interesting trends about the scope and the limits of the term and keyword indexing policies of Scifinder, Scopus and WoS. If we now focus on the values displayed in the different columns of entry 10 (Tables 8 and 9), we observe that a high proportion of the articles retrieved in the indexing categories by a particular editor are present in both other editors' databases. Ultimately this emphasizes the influence of the term and keyword indexing policies of these editors because most informative articles are shared by the three editors. In other words the proportion of information specific to a given editor is not as high as could be expected from the preliminary results displayed in Tables 5-7. Moreover the term and keyword indexing policies clearly

Conclusions
Topic searches in chemical information systems are expected to return precise answers, and we attempted to show in the first section of this paper how challenging it can be to query the web interfaces of Scifinder, Scopus and WoS using the most suitable syntax. Although the learning period is shorter when starting a topic search with Scifinder, the higher precision of the WoS and Scopus query languages may justify a slightly longer learning period. Crude results of these topic searches using simple terms or expressions up to four-term queries show rather uniform trends of the three information systems in retrieving large reference lists, but with a noticeably greater hit count retrieved by WoS over the whole answer sets. This feature results from a combined citation and semantic indexing affording new indexed terms that really expand the capacities of topic searches. Though the coverage of common references retrieved by Scifinder, Scopus and WoS was shown to be incomplete owing mainly to keyword indexing (and to journal indexing, though to a lesser extent), most of the references are shared by the three information systems, including highly cited papers. Ideally all three should be queried to get exhaustive answer lists, or they should combine the powerful capacities of reliable thesauri and citation computing [35].

) catalyzed arylation    (TS = ("copper catalyzed") OR TS = ("Cu catalyzed")) AND TS = arylation    105
12    Hybrid materials and nanoparticles    TS = ("hybrid material*") AND TS = nanoparticle*    256
13    Viscosity of ionic liquids    TS = viscosity AND TS = ionic liquid*    360
14    Band gap in solar cells    TS = ((band NEAR gap) AND (solar NEAR cell*))    677
15    Statistical analyses of DNA microarrays    TS = statistical analyses of dna microarrays    93
16    Surface area in mesoporous materials    TS = ("surface area") AND TS = (mesopor* material*)    207
a Article counts after filtering by year, language and document type. b After parsing by the Iddup script.
indexed using a hyphen included in the retrieved term i.e. organocatalytic, (c) the journal title is indexed in two different spellings, and (d) author keywords or titles or abstracts are modified.

Table 1
Scifinder's queried expressions and the filtering process
Column headings (partial): Combine answer sets b, Citations c, Articles d
The two concepts "surface area" and "mesoporous materials" closely associated with one another

Table 2
Different indexed journal titles between CAplus and Medline
18    Acta Crystallographica, Section E. Structure Reports Online / Acta crystallographica. Section E, Structure reports online
39    Angewandte Chemie, International Edition / Angewandte Chemie (International ed. in English)
86    Chemistry - A European Journal / Chemistry (Weinheim an der Bergstrasse, Germany) / Chemistry - A European Journal

Table 3
Queries and results from WoS

Table 4
Queries and results from Scopus
a Article counts after filtering by year, language and document type. b After parsing by the Iddup script.

Table 5
Iddup parsing of reference lists from Scifinder and WoS
Column headings (unique articles): Spec. b, Common c, Spec. d, Union e

Table 6
Iddup parsing of reference lists from Scifinder and Scopus
Column headings (unique articles): Spec. b, Common c, Spec. d, Union e

This journal is © The Royal Society of Chemistry and the Centre National de la Recherche Scientifique 2015. New J. Chem., 2015, 39, 8807-8817

Table 7
Iddup parsing of reference lists from Scopus and WoS
a Entries 1-16 correspond to the queried expressions of previous Tables 3 and 4. b Specific articles to Scopus. c Shared articles by both editors. d Specific articles to WoS. e Sum of columns 3, 4 and 6. f Proportion of common articles to two editors.

Table 8
Study of reference lists corresponding to the 'organocatalysis' term

Table 9
Study of reference lists corresponding to the expressions 'N-heterocyclic carbenes', 'phosphine ligands' and 'viscosity of ionic liquids'

WoS's and Scopus's specific references are stable. In a second time the variation of the Levenshtein distance only affects Scifinder's specific references for values less than or equal to 9. the