Exploring the mastery of French students in using basic notions of the language of chemistry

Sophie Canac; Isabelle Kermen

doi:10.1039/C6RP00023A

DOI: 10.1039/C6RP00023A (Paper) Chem. Educ. Res. Pract., 2016, 17, 452-473

Sophie Canac *^a and Isabelle Kermen ^b
^aLDAR (EA 4434), univ. Paris-Est Créteil, univ. Artois, univ. Paris-Diderot, univ. Cergy Pontoise, univ. Rouen, F-94010 Créteil, France. E-mail: sophie.canac@u-pec.fr
^bLDAR (EA 4434), univ. Artois, univ. Paris-Est Créteil, univ. Paris-Diderot, univ. Cergy Pontoise, univ. Rouen, F-62300 Lens, France. E-mail: isabelle.kermen@univ-artois.fr

Received 18th January 2016 , Accepted 22nd May 2016

First published on 23rd May 2016

Abstract

Learning chemistry includes learning the language of chemistry (names, formulae, symbols, and chemical equations) which has to be done in connection with the other areas of chemical knowledge. In this study we investigate how French students understand and use names (of chemical species and common mixtures) and chemical formulae. We set a paper and pencil test composed of open-ended and multiple choice questions (5 questions in total) to students (N = 603) who have been learning chemistry for 2 years (age 14) and others for up to 7 years (age 19, first year university). For all grade levels we found that the students have great difficulties understanding notions introduced right from the first two years of chemistry teaching. The scientific name opposed to a common name does not seem to be a relevant tool used by the students to classify chemical species and mixtures. They struggle to decode a chemical formula out of the context of a chemical equation and fail to decode them in that context. The students surveyed are not able to correctly associate with a name or a formula, both macroscopic (a pure substance or a mixture) and microscopic (an atom or a molecule) criteria. They seem to have mainly a microscopic reading of the names and the chemical formulae. Therefore we think that the language of chemistry could be a source of trouble for the learning of the notion of substance. These results confirm the need to offer teachers new didactical tools to develop the teaching of the language of chemistry.

Introduction: the working-out of a language

In this brief introduction, we review the historical basis of the modern language of chemistry and some commonalities and specificities of this language compared with an ordinary one.

According to Laszlo (1993), before Lavoisier, chemistry and language did not match. Lavoisier thought that language was important not only to name objects of science but also to make them intelligible to people who were not alchemists. Following Condillac's principles he argued that a science cannot exist without a language.

“It is impossible to separate the nomenclature from science and science from the nomenclature because any physical science is necessarily composed of three things: the series of facts that constitute science; the ideas that remind of them; the words that express them. The word should give birth to the idea; the idea should depict the fact: these are three impressions of the same seal; and because words do keep the ideas and convey them, the result is that language cannot be improved without improving science, neither can science without language” (Lavoisier, 1801).

Dagognet (2002, p. 7) regarded Lavoisier as being at the origin of “the semantic and classifying chemical revolution”. In the nineteenth century empirical chemical knowledge and chemical theories developed together with the working-out of the language of chemistry which would support these new theories.

The language of chemistry that conveys the new theories is an essential element of the developing process of chemistry in the nineteenth century. Indeed three moments which link empirical knowledge, theories and language arose:

• The working-out of a nomenclature by Lavoisier relies on the concept of indecomposable substance and on laboratory experiments that decompose and synthesize substances. Water no longer has the status of an element. This nomenclature is worked-out against the phlogiston theory and opens up the way for chemistry to become a science.

• The working-out of the symbols and the arithmetic by Berzelius is linked with the atomic theory and the fixed combinations of atoms. “Since Berzelius, chemists have been able to translate the laboratory experiments into chemical equations written by means of formulae” (Laugier, 1998).

• The working-out of structural formulae due to the dramatic development of organic chemistry is linked with the concept of molecules and of atom arrangement in a molecule. For chemists such as Laurent, Couper or Butlerov the chemical structure becomes the foundation of the chemical properties of compounds. “The chemical nature of a composed molecule depends on the nature and quantity of its elementary constituents and on its chemical structure” (Kluge and Larder, 1971, p. 290). According to Dumon and Luft (2008) “these planar molecular structures, which can be reproduced on a sheet of paper or on the blackboard, were powerful auxiliaries for chemists attempting to interpret chemical changes”.

The development of a nomenclature, a specialized ordinary language (Jacob, 2001), is a necessity for the existence of a chemical science (Lavoisier, 1801). The use of this language is “essential for effective chemical communication” and “a pre-condition for the formation of general chemical theories” (Jacob, 2001).

The symbolic language is more than a representation tool. The representation of the composition of each substance by means of letters and figures “enables the formal handling of symbols independently of their empirical signifieds as far as the chemist follows some general rules” (Laszlo, 1993). Klein (2001) argued that this representational system allowed some significant advances between 1820 and 1850. She showed that Dumas proposed the concept of substitution by using the Berzelius system of representation of chemical substances. She regarded the chemical formulae as real “paper tools” (Klein, 2001).

“The signs denote the ideas and there is an algebra which is both harmonious and necessary giving them their internal binding. Science is nothing more than the quest for this algebra, for this combinatory which renders words able to express things” (Laszlo, 1993). Chemists worked out what is regarded by some people as a real language including an alphabet, a syntax and semantic rules (Jacob, 2001). Chemical symbols constitute the alphabet. Words are built from the associations of symbols following “orthographic” rules and are the chemical formulae. The sentences resulting from the association of words according to grammatical rules are the chemical equations. For example, valency and electronegativity are orthographic rules whereas the conservation of elements and the conservation of charges are grammatical rules; these rules are named syntactical rules (Jacob, 2001). Chemical formulae can be segmented “as if they were terms” (Laszlo, 1993). In a chemical formula such as (CH₃)₂C₃H₅OH or in its name 3-methylbutan-1-ol we can recognize the roots, the prefixes, and the suffixes as a grammarian would do with words. In the water formula H₂O and in the sodium hydroxide formula NaOH, the O symbol stands for the oxygen atom and has the same meaning in both formulae whether it is combined with sodium or two hydrogen atoms. “This transferability means that the handling of chemical formulae is a language very close to ordinary language” (Laszlo, 1993). In these examples, the oxygen atom does not vary but is not exactly identical, its electronic organisation being different. “The language of chemistry is more than a code, because it includes semantic shifts as a language does” (Laszlo, 1993).
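The segmentation of a formula into symbols ("letters") and subscripts can be illustrated with a short program. The following is only a minimal sketch in Python and is not part of the study: the function name `count_atoms` and the tokenizing rule are our own assumptions, and the sketch counts atoms without checking any "orthographic" rule such as valency.

```python
import re
from collections import Counter

# Map the Unicode subscript digits printed in the text to ASCII digits.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

# One token is an element symbol, an opening or closing parenthesis, or a number.
TOKEN = re.compile(r"([A-Z][a-z]?)|(\()|(\))|(\d+)")


def count_atoms(formula: str) -> Counter:
    """Return the number of atoms of each element written in a formula."""
    text = formula.translate(SUBSCRIPTS)
    stack = [Counter()]  # one Counter per open parenthesis level
    last = None          # last symbol or group read, target of a subscript
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        symbol, opening, closing, number = match.groups()
        if symbol:
            last = Counter({symbol: 1})
            stack[-1].update(last)
        elif opening:
            stack.append(Counter())
            last = None
        elif closing:
            if len(stack) == 1:
                raise ValueError("unbalanced parenthesis")
            last = stack.pop()
            stack[-1].update(last)
        else:
            if last is None:
                raise ValueError("subscript without a preceding symbol")
            # The symbol or group was counted once already: add it n - 1 times.
            for element, count in last.items():
                stack[-1][element] += count * (int(number) - 1)
            last = None
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unbalanced parenthesis")
    return stack[0]


print(count_atoms("(CH₃)₂C₃H₅OH"))
print(count_atoms("NaOH"))
```

Such a program handles the symbols formally, without any reference to what they stand for. It says nothing about whether O in H₂O and O in NaOH denote the element, the atom or part of a substance, which is precisely the reading that has to be learnt.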

Can the symbolic language of chemistry be considered as an ordinary language? Weininger (1998) thought that it is “plausible to approach the language of chemistry just as we would any other language”. But, unlike an ordinary language which seems to be built from conventions and more or less rational conceptions, the chemical symbolic system relies on empirical data. Contrary to a classical alphabet, the symbols are not without any meaning and not limited in number. Their number has not stopped increasing since Berzelius because new elements have been discovered. The formula (or the name in an ordinary language) is used to name chemical species but not only these. It enables us to know its composition and sometimes its structure. Jacob (2001) illustrated this point by comparing the formula NaCl to the word “screwdriver”. We know that the former is salt and composed of chlorine and sodium and that the latter means a tool the function of which the word structure indicates by chance but we have no idea of its composition (a handle and a blade). Moreover, “the ability to evoke a fictional world indistinguishable from the real one is another characteristic that the chemical code shares with the natural languages” (Weininger, 1998).

Finally, one could think that a symbolic language is an effective communication tool and thus that a single signifier† is associated with each signified and inversely. But, as for words in an ordinary language, the signified of a symbol or a formula may depend on the context. What does C stand for? For the chemist, it represents carbon, but is it the chemical element, the atom or the chemical species? In the context of a chemical equation such as C(s) + O₂(g) = CO₂(g), the chemist is able to interpret both C letters and to say that they do not have the same meaning.

The language of chemistry and the concept of chemical species developed in the 19th century. The chemical formulae and the structural formulae which appeared with microscopic entities (atoms and molecules) became the representation tool of the chemical species. At the beginning of the 19th century the chemical species was linked to the notion of simple substance (Lavoisier) and at the end of the century it was seen as composed of a single microscopic entity. “Chemist's expectation to denote unambiguously a one-to-one correspondence of word with thing (name with substance) induced Kekule to mobilize hundreds of colleagues from all Europe to the first international conference on science held in 1860 at Karlsruhe. The goal was to give each substance a name, and compounds were to be given names synthesized from the names of their parts” (Chamizo, 2013, pp. 1620–1621). The notion of pure substance evolved with the development of chemistry in the 20th century: allotropes, non-stoichiometric compounds, chemistry of solutions… “Pure” water includes both water molecules H₂O, and hydroxide and hydronium ions. How is it then possible to define a pure substance? Schummer (1998) argued that the notion of pure substance is an artificial but necessary notion. Van Brakel (2014) differentiates pure substance and chemical species: “a material, phase, or substance has a (macroscopic) chemical composition and a (molecular) species composition” (Van Brakel, 2014, p. 21). In the rest of the paper we will not make the distinction between pure substance and chemical species in the sense proposed by Van Brakel (2014).

Like every language, the language of chemistry has to be learnt and taught. The specialized (ordinary) language (Jacob, 2001) is a necessity for the teacher to communicate with the students. The learning of the symbolic language and that of the specialized language are interwoven. In the following section, we argue that learning the language of chemistry is quite specific.

Learning the language of chemistry is specific

Johnstone (2006) was one of the first educators to pay much attention to the difficulties, particularly cognitive overload, that students faced when listening to a teacher frequently moving between the signs of the language of chemistry, the experimental facts and their interpretation in terms of microscopic entities. Drawing on Johnstone's triangle (Johnstone, 1993), Kermen and Méheut (2009) set out a tetrahedron the apexes of which represent four interacting levels of knowledge: two model levels (microscopic and macroscopic), the empirical level and the symbolic level. Models are expressed with ordinary language (sentences in French or English) and specialized language and represented by specific signs (symbols or equations) and experiments are also described with ordinary and specialized languages and specific signs (drawings…) (Fig. 1).


	Fig. 1 Symbolic level interaction with empirical, microscopic models and macroscopic models levels. (Kermen and Méheut, 2009).

Kermen and Méheut did not focus on the role of the symbolic level but others did such as Talanquer (2011) who considered three chemistry knowledge types about experiences, models and visualisations. According to him (Talanquer, 2011), visualisations include all the signs that “facilitate qualitative and quantitative thinking and communication about both experiences and models in chemistry”. These signs are chemical symbols, iconic symbols, mathematical symbols, graphs and so on.

Taking another stance, Taber (2013) decided not to consider the symbolic level as a discrete level of chemical knowledge in the same way as the macroscopic and the microscopic levels. The chemist's language‡, particularly about chemical reactions, is a meta-level between the macroscopic one (the substances) and the microscopic one (the particles). The chemist uses the same symbols at both levels. From a single piece of writing the chemist can work at both levels and move from one to the other. According to Taber (2013) the chemical equation, which is entirely symbolic, can be read and used by the expert both at the experimental level and at the macroscopic and microscopic concept levels (Fig. 2). This can be seen as a lack of precision by an observer but is actually a richness for the chemist and can become a difficulty for a learner.


	Fig. 2 Symbolic representations bridging two levels of concepts (the macroscopic and the microscopic ones) extract from Taber (2013).

Like Taber (2013) we think that language is a knowledge domain not to be set on the same plane as the empirical level and the model levels. Empirical results and models are the core of the activity of experimental sciences. Models interpret and predict the achieved experiments. The language in chemistry is at the interface of models and experiments. “The heuristic function of signs consists to enable constant going to and fro between signs and experimentation” (Edeline, 2009).

Learning the language of chemistry is possible if one knows syntactic rules stemming from models and one correctly uses empirical knowledge, that is to say semantic rules. The signifier is built from laws stemming from models. The signified relies upon empirical knowledge. Laszlo (1993) claimed that the mastery of symbol definitions and of words composed of symbols following strict rules can be done without thinking of their empirical signifieds and that it is the real scientific rigour.

Learning chemistry without the language of chemistry is impossible but the language of chemistry can only be learned from theoretical and empirical knowledge of chemistry. “From a psychological viewpoint, the development of concepts and of words meaning is a single process” (Vygotski, 1997, p. 297). Chemistry learning requires multiple going to and fro between the symbolic level, the empirical level and the model level. A teacher should be able to help the students to work with the four knowledge levels (empirical, macroscopic model, microscopic model and representations) and to make links between them. “A teacher using a sign without any explanation can refer both to macroscopic concepts and microscopic ones whereas a student may only see a letter or a figure without any link with the observed chemical phenomenon” (Dehon and Snauwaert, 2015). A student may become rather adept at handling a chemical equation but without really understanding the underlying concepts and associated models (Talanquer, 2011). Thus the learner cannot attempt by himself to link the symbolic level and other levels and the teacher’s role is of utmost importance.

Already known students’ difficulties

Chemists can move from one level to another because they master the entire body of knowledge. But the student who does not master this knowledge encounters numerous difficulties in correctly using the different levels and moving from one to the other. Taskin and Bernholt (2014) reviewed research conducted in the last 30 years that dealt with students’ understanding of chemical formulae. They noticed that the students had conceptual and syntactic difficulties using symbols. Chemical symbols may be understood as a simple abbreviation of the substance name (Al-Kunifed et al., 1993). The students struggle to decode the different types of representations (molecular formulae and submicroscopic diagrams) or to interpret the underlying model of the representation used; for example, they consider SiO₂ to be equivalent to Si₂O₄ (Keig and Rubba, 1993). Looking at molecular formulae, the students have an additive conception of compounds rather than an interactive one (Ben-Zvi et al., 1987; Roletto and Piacenza, 1994); for instance CH₂O is seen as a carbon linked to water. Using molecular formulae in chemical equations sometimes comes down to a simple arithmetic handling of symbols and to confusing subscripts and coefficients (Nakhleh and Mitchell, 1993; Sanger, 2005). Studies in France and Tunisia (which were not reported in Taskin and Bernholt's review) show that the students have difficulties applying rules and conventions about chemical formulae in chemical reactions (Carretto and Viovy, 1994; Fillon, 1997; Mzoughi-Khadhraoui and Dumon, 2012). Fillon (1997) reported that the students (in grade 9) write a chemical equation either with names or formulae, being guided by the search for a permutation symmetry effect. This leads them to write CuO + C → CO + Cu instead of 2CuO + C → CO₂ + 2Cu. The students struggle to interpret a chemical equation from a microscopic viewpoint; they look for alternative solutions because they do not master the symbolic level (Laugier and Dumon, 2004).
Le Maréchal and Cross (2010) considered that the students’ difficulties arise from managing the rules of the knowledge level, in moving from one level to another or in choosing the level(s) that enables/enable them to efficiently solve the problem under scrutiny.

Objectives of the study and research questions

Our first goal is to examine the way the nomenclature and the chemical formulae are introduced in chemistry teaching in France. Is it done in relation to the level of models and the empirical level? A symbol is linked to both microscopic and macroscopic levels and may both represent the chemical element, the atom or the molecule and the elementary substance§. Is the problem of the polysemy of names and formulae being taught? We examined the curriculum and the textbooks of the first three years of chemistry teaching to list what instructional guidelines the teachers should follow. We also made a very brief literature review to determine what proposals the teachers make to cope with these difficulties.

In the following sections we set out the results of these preliminary investigations which will then enable us to specify our research questions.

The curriculum of the first three years of chemistry teaching in France

In this curriculum, the word “nomenclature” is not present. Regarding scientific names of chemical species, no knowledge and no skills are required. Some names such as dioxygen¶ or carbon dioxide are quoted in the curriculum from the first year of chemistry teaching and thus before the concept of molecule as a combination of atoms is introduced. Although in the ordinary language the term “oxygen” is used to speak of the component of air, the teacher may use the word “dioxygen” without giving any specific explanation. The names of chemical species were worked out in relation to models but this is never made explicit in this curriculum. Its purpose is just to clarify the notions of pure substance and mixture. Thus the curriculum quotes some chemical names without specifying any guidelines for how to teach nomenclature.

In contrast, the introduction of chemical formulae is proposed in relation to the model of the molecule as a combination of atoms, but the formula is only set out as a communication and representation tool. The language of chemistry is both microscopic and macroscopic, this double aspect constitutes its interpretative difficulty but also its richness. This richness is a real tool to work in chemistry (Klein, 2001) but it is never evoked in the French curriculum. The chemical formula is introduced in such a way that it seems to only represent the microscopic level. And the polysemy of names and formulae is totally ignored.

The textbooks of the first three years of chemistry teaching in France

In line with the curricula, the textbooks do not propose any activity around the scientific names of chemical species. They use ordinary names as well as scientific names without ever making any distinction or any link between them. The exercises that are proposed about chemical formulae basically revolve around atom counting and writing rules (type of the symbols to be used, position of subscripts). Beginning students are expected to understand on their own that the scientific names and the chemical formulae may be used to represent both the empirical level and the microscopic and macroscopic model levels. No exercise mentions the polysemic aspect of names, symbols or formulae.

Common names like water can represent the pure substance as well as the mixture (think of tap water or mineral water which are often simply called “water”)|| whereas some names like coal only stand for mixtures. Scientific names like carbon dioxide are used at the macroscopic level to name the chemical species but also at the microscopic level to name the molecule. The name “dioxygène” that we translated into dioxygen (see footnote¶) represents both the molecule and the chemical species. The symbols are polysemic too. For instance, the letter “C” represents the carbon atom, the chemical species and the chemical element. Some textbooks even speak of the carbon “molecule”. The formulae are introduced on a microscopic level, and as soon as it is done, the textbooks carelessly use them as standing for the molecule and for the chemical species without stressing the difference.

Finally no mention is made of how scientists developed the language of chemistry in the history of science and of the early representations which have been discarded in favour of present names and symbols. Reading the curricula and the textbooks, we might think that learning the language of chemistry is obvious and does not raise any specific difficulty. Furthermore, this seems to reduce the importance of learning the language of chemistry compared with learning the empirical level and the model levels.

What the teachers do

According to Duval (1993), the teachers underestimate the students’ difficulties when they use different semiotic registers in mathematics. Given some similarities between mathematics and chemistry, which deal with inaccessible concepts (numbers) or objects (molecules) and represent them with various semiotic registers, the same problem exists in chemistry teaching. The teachers do not seem to implement strategies enabling the students to work on the distinction between the object and the sign that stands for it, the change in semiotic registers or the choice of the appropriate register to achieve the task at stake (Le Maréchal and Cross, 2010). The chemical equation, which is the core of the French curriculum, is a polysemic concept which integrates the meaning of the signs, the identification of chemical species and the ability to move from the microscopic to the macroscopic level or conversely. The teacher moves easily from one level to another but this remains invisible for a student (Barlet and Plouin, 1994). The teaching of the language of chemistry is not taken into account in some other countries either (Mzoughi-Khadhraoui and Dumon, 2012; Kaya and Erduran, 2013). As far as we know, chemistry teacher training in France does not tackle the semiotic problems of teaching and learning.

Research questions

The language of chemistry poses real difficulties to the students. Relying on the overview of the curriculum, of the textbooks and previous research, we think that these difficulties are currently ignored in chemistry teaching in France. The first years of chemistry teaching obey the following pattern: the experiments are introduced first, then the model (atoms and molecules) and finally the language of chemistry to represent the microscopic level. However the history of science shows that language, empirical knowledge and models developed jointly.

We hypothesize that teaching how the language of chemistry is built and developed should foster the learning of the empirical level and the model levels. A student should be able to associate a scientific name with the notion of pure substance whereas he/she should know that a usual name corresponds to a mixture. Introducing chemical formulae should allow teachers and students to work on the notion of a molecule as a fixed combination of atoms and on the notion of a chemical reaction as a rearrangement of atoms between reactants and products.

Thus, our research questions are as follows:

Which criterion do the students use to classify names from a list of usual and scientific names?

What characteristics of the macroscopic level (pure substance or mixture) and of the microscopic level (atom or molecule) do the students associate with a name and/or a formula? We will examine whether the students’ answers for the names and the formulae are different.

Are they able to handle the concept of molecule as a fixed combination of atoms the number of which is constant, out of context and in the context of the use of a chemical reaction (this being the main use of chemical formulae at the beginning of chemistry study)?

Our final goal, which will not be developed in this paper, is to propose to teachers teaching-learning sequences aiming to work on the scientific names of chemical species and the development of chemical formulae. To elaborate these didactic tools, we first make an inventory of some students’ difficulties about these two points in France (Duit et al., 2012).

Methodology

Elaboration of the questionnaire

To answer the research questions, we developed a paper-and-pencil questionnaire (Appendix 1) containing 5 questions. To test the relevance of the questions and finalize their development, we conducted exploratory interviews with four students (fourth year of chemistry study) in two different schools. To allow more fruitful exchanges, they were interviewed in pairs. All the questions involve concepts and notions introduced in the first and second year of chemistry teaching in France.

Question 1: the distinction between scientific and common names

To answer the first research question, the students are asked to classify names from a list of common and scientific names. The common names exclusively correspond to mixtures: air, coal, oil, and milk. Some of the scientific names are well-known to the students: carbon dioxide and methane. The others are less well-known but the syntactic construction of the name emphasizes its scientific nature (for example hydrogen peroxide, cyclohexane). All the names in the list can be found in the chemistry textbooks starting with the first year.

In the exploratory interviews conducted with four students (grade 10), three of them spontaneously proposed a classification based on the states of matter and the fourth made a distinction between chemical names and common names which allowed her to recognize pure substance and mixture. This shows that the task was within the reach of the students.

Questions 2 and 3: criteria associated with the names and formulae

The exploratory interviews showed that the students were not able to answer an open-ended question such as “what does the letter C mean to you?”. In question 1 we tested which criteria the students could spontaneously propose regarding the names of chemicals. We chose to develop multiple-choice questions in question 2 (and 3) to explicitly check whether they could associate given criteria with names (and formulae).

We prepared two series of names (dioxygen, water, cyclohexane, methane, carbon, carbon dioxide) and formulae (O₂, H₂O, C, CO₂, C₂H₆O, Fe). For each name and formula four criteria (to be chosen) are proposed: pure substance, mixture, atom and molecule. We refer to “pure substance and mixture” as macroscopic criteria (Macro) and to “atom and molecule” as microscopic criteria (micro). For each proposal the students can tick one or more of the four criteria.

In the list of names, carbon is the only one that corresponds to the answer “atom”, so its correct answer is “pure substance and atom”. The correct answers for the other names are “pure substance and molecule”. Water may have an ambiguous status, being both the name used by chemists and the common name. In the list of formulae, the correct answer for carbon and iron is “pure substance and atom” and for the others is “pure substance and molecule”. The lists also include names or formulae probably less well-known to the students in their first years of chemistry study (methane, cyclohexane, C₂H₆O, Fe). Finally, the answers regarding the chemical species which are in both lists (dioxygen and O₂, water and H₂O, carbon and C, carbon dioxide and CO₂) should give us elements of comparison between the students’ understanding of names and formulae.
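The expected answers to questions 2 and 3 can be summarized as a small answer key. The sketch below is only an illustration written in Python; the names `ANSWER_KEY`, `is_correct` and `levels_used` are hypothetical and do not describe the coding procedure actually used in the study.

```python
# The four criteria offered for every name and formula in questions 2 and 3.
MACRO = {"pure substance", "mixture"}  # macroscopic criteria ("Macro")
MICRO = {"atom", "molecule"}           # microscopic criteria ("micro")

NAMES = ["dioxygen", "water", "cyclohexane", "methane", "carbon", "carbon dioxide"]
FORMULAE = ["O2", "H2O", "C", "CO2", "C2H6O", "Fe"]

# Items whose expected microscopic criterion is "atom"; the others are molecular.
ATOMIC = {"carbon", "C", "Fe"}

# Every item is a pure substance, combined with either "atom" or "molecule".
ANSWER_KEY = {
    item: {"pure substance", "atom" if item in ATOMIC else "molecule"}
    for item in NAMES + FORMULAE
}


def is_correct(item, ticked):
    """True when the ticked criteria are exactly the expected ones."""
    ticked = set(ticked)
    unknown = ticked - (MACRO | MICRO)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return ticked == ANSWER_KEY[item]


def levels_used(ticked):
    """Return which levels (Macro, micro) a set of ticked criteria involves."""
    ticked = set(ticked)
    used = set()
    if ticked & MACRO:
        used.add("Macro")
    if ticked & MICRO:
        used.add("micro")
    return used


# A student who ticks only "molecule" for H2O gives a purely microscopic reading.
print(is_correct("H2O", {"molecule"}), levels_used({"molecule"}))
```

Separating `is_correct` from `levels_used` reflects the two things the questions are meant to show: whether the answer is the expected one, and whether the student reads the name or formula at one level only.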

Questions 4 and 5: decoding a chemical formula

To develop these two questions that will enable us to answer the third research question, we rely on research studies quoted by Taskin and Bernholt (2014). We propose three pairs of chemical formulae: H₂O and H₂O₂, O and O₂, CH₄ and C₂H₈. For each pair the students are asked whether the chemical formulae represent the same molecule. The first pair should allow us to verify the students’ ability to accurately count the number of atoms in a formula. The following pairs should allow us to verify their mastery of the concept of molecule as a fixed combination of atoms. Do the students make out the difference between the atom and the molecule in elementary substances (O and O₂)? Do they consider two formulae to be identical when their numbers of atoms are proportional (CH₄ and C₂H₈)? For this last question we rely on Keig and Rubba’s research (Keig and Rubba, 1993).
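The difference between identical and merely proportional formulae can be made explicit with a few lines of code. This is a minimal sketch in Python under our own assumptions: the helper names are hypothetical, formulae are written with ASCII digits and without parentheses, and identical atom counts are taken as a stand-in for "same molecule", which ignores isomers.

```python
import re
from functools import reduce
from math import gcd


def atom_counts(formula):
    """Count the atoms of each element in a simple formula such as 'C2H8'."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(digits or 1)
    return counts


def reduced(counts):
    """Divide all atom counts by their greatest common divisor."""
    divisor = reduce(gcd, counts.values())
    return {symbol: n // divisor for symbol, n in counts.items()}


def compare(first, second):
    """Classify a pair of formulae by their atom counts."""
    a, b = atom_counts(first), atom_counts(second)
    if a == b:
        return "same atom counts"
    if reduced(a) == reduced(b):
        return "proportional only"
    return "different"


for pair in [("H2O", "H2O2"), ("O", "O2"), ("CH4", "C2H8")]:
    print(pair, compare(*pair))
```

A student who treats CH₄ and C₂H₈ as the same molecule is in effect applying `reduced` before comparing, that is, reading the subscripts as a ratio rather than as the fixed number of atoms in the molecule.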

Then, we propose two pairs of chemical equations:

– One for the decomposition of water into dihydrogen and dioxygen

2H₂O → 2H₂ + O₂ and H₂O → H₂ + O

– The second for the combustion of methane which produces carbon dioxide and water

CH₄ + 2O₂ → CO₂ + H₄O₂ and CH₄ + 2O₂ → CO₂ + 2H₂O.

Strictly speaking, the law of conservation of elements is verified and these equations seem correct, but in each pair one of the equations includes an incorrect chemical formula (O or H₄O₂). In each question it is clearly specified that there is formation of dioxygen (in the first case) and formation of water (in the second case). We ask the students whether each equation is valid. We want to check whether the students who correctly answer the question on the pairs of formulae are still correct when the context changes. So as not to add further difficulties, these equations include only simple molecules and simple stoichiometric numbers. Students in the last two years of high school and university students could even think they are trivial.
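That the conservation of elements alone cannot separate the valid equation from the invalid one in each pair can be checked mechanically. The following Python sketch is an illustration only: the function names are hypothetical, the arrow is written `->`, formulae use ASCII digits, and the expected products are those stated in the wording of the questions (dihydrogen and dioxygen; carbon dioxide and water).

```python
import re
from collections import Counter


def split_terms(side):
    """Split one side of an equation into (coefficient, formula) pairs."""
    terms = []
    for term in side.split("+"):
        coefficient, formula = re.fullmatch(r"\s*(\d*)\s*(\S+)\s*", term).groups()
        terms.append((int(coefficient or 1), formula))
    return terms


def element_totals(side):
    """Total number of atoms of each element on one side of an equation."""
    totals = Counter()
    for coefficient, formula in split_terms(side):
        for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
            totals[symbol] += coefficient * int(digits or 1)
    return totals


def check(equation, expected_products):
    """Return (elements conserved?, products are the expected species?)."""
    left, right = equation.split("->")
    conserved = element_totals(left) == element_totals(right)
    products = {formula for _, formula in split_terms(right)}
    return conserved, products == set(expected_products)


EQUATIONS = [
    ("2H2O -> 2H2 + O2", {"H2", "O2"}),
    ("H2O -> H2 + O", {"H2", "O2"}),
    ("CH4 + 2O2 -> CO2 + H4O2", {"CO2", "H2O"}),
    ("CH4 + 2O2 -> CO2 + 2H2O", {"CO2", "H2O"}),
]
for equation, expected in EQUATIONS:
    print(equation, check(equation, expected))
```

All four equations pass the first test, and only the second test, which requires knowing the formula of the species actually formed, rejects O and H₄O₂.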

For all these questions, the students can explain their choices.

Data collection and sample

The questionnaire was administered to students who have been learning chemistry for two years (age 14) and others for up to seven years (age 19, first year university) in different parts of France. All the students surveyed already have the knowledge required to answer the questionnaire. Teachers were asked to administer the paper and pencil test to their students in an ordinary teaching session. We collected 603 answers from December 2014 to June 2015. The answers are anonymous, but the students had to mention their grade and school. All students participated in the research as volunteers.

We grouped the answers into four categories corresponding to the students’ orientation changes in the French school system:

• Group 1: age 14–15: this group corresponds to the second or third year of chemistry study. We collected 233 answers in 9 classes coming from 6 secondary schools.

• Group 2: age 16: this group corresponds to the fourth year of chemistry study and the first year of high school (not vocational high school) for 16-year-old students. We collected 178 answers in seven classes coming from 5 high schools.

• Group 3: age 17–18: this group corresponds to the fifth or sixth year of chemistry study in high schools. These students study science in depth. We collected 147 responses in six classes coming from 6 high schools.

• Group 4: university: this group corresponds to the seventh year of chemistry study and the first year university. These students, aged 19, study general chemistry for a whole year. We collected 45 answers from students belonging to the same university. We will not systematically analyse the results of this group because it is statistically less reliable. At the beginning of our research, we did not plan to investigate chemistry knowledge from the second year of chemistry study up to the university level. But after we got the first results, we decided to add this group.

Analysis methods

Analysis of the answers to question 1: the distinction between scientific and common names. In this question students have to give a classification of chemical species or mixtures from a list of names. We want to verify whether the students use scientific names to make a classification. We first start the analysis by dividing the students’ answers in two categories:

• The first category is called “Name” category. The students classify the names splitting scientific terms and common terms. To belong to this category, it is necessary for the student to group on one side: oil, milk, and air, and on the other side: carbon, cyclohexane, hydrogen peroxide, carbon dioxide and methane.

Example (group 2: age 16):

“Assembly of molecules: air, coal, oil, milk;

Molecules: cyclohexane, hydrogen peroxide, carbon dioxide, methane;

Atom: carbon”

• The second category is called “No Name”. The students do not rely on scientific names and propose classifications that indifferently mix the scientific and common names.

Of course there is a third category, the students who do not answer.

Not all chemical species are necessarily present in the classifications proposed by the students. Cyclohexane and hydrogen peroxide correspond to two chemical species which are little known to the students, although mentioned in textbooks from the first year of chemistry study. The presence of one of these two names in the group of scientific names made by a student determines our choice to put the answer in the “name” category. Likewise, the presence of one of these names in a group of common names made by a student determines our choice to categorise the answer in the “no name” category.

We then analysed the criteria proposed by the students in their classifications. Four subcategories emerged:

• The students classify the names according to two criteria which are pure substance and mixture. We call this sub-category “pure/mix”. In this subcategory, we also analysed the accuracy of the answers.

• The students group the names according to the state of matter: gas, liquid, or solid. We call this subcategory “g/l/s”.

• The students write on one side the names they think correspond to the microscopic level and on the other side the other names. We called this subcategory “Macro/micro”.

Example of answer (group 1: age 14–15):

“Substances that can be touched: coal, oil, milk

Substances that cannot be touched: air, carbon, cyclohexane peroxide, dioxide, methane”

When the students propose the term molecule or atom in their classification, we consider that this answer belongs to the subcategory “Macro/micro” even if terms belonging to the previous two subcategories appear (“pure/mix” or “g/l/s”). For us the “Macro/micro” criterion prevails over the others.

Example (group 3: age 17–18):

“Mixture: milk, air, oil, coal

Molecules: cyclohexane, hydrogen peroxide, methane, carbon dioxide

Atom: carbon”

Example (group 1: age 14–15):

“Gas: air, peroxide, dioxide, methane

Liquid: petroleum, milk

Molecule: carbon, cyclohexane, coal”.

• The students also group the names according to the following criteria toxic/non-toxic, chemical substance/non-chemical substance, pollutant/non-pollutant. We put these answers in a subcategory “natural/not natural”.

Example (group 1: age 14–15):

“Toxic: cyclohexane, hydrogen peroxide, oil, coal

Natural: air, carbon, carbon dioxide, milk, methane”

Finally when the criteria used by the students to make groups of names could not be found we put the answers in the “others” subcategory and when they did not name the groups they made we put the answers in the “no criterion” category. To classify the students’ answers, we never considered the accuracy of their proposals, except in the “pure/mix” subcategory. If a liquid is classified as a solid, this does not prevent us from counting the answer in the “g/l/s” subcategory.
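The precedence rule described above (“Macro/micro” prevails as soon as the terms atom or molecule appear) can be sketched as a small keyword-based function. This is only a rough illustration: the actual coding was done by reading each answer, the keyword lists and the function name are hypothetical, and the order in which the remaining subcategories are tested is our assumption, since the text only fixes the priority of “Macro/micro”.

```python
def subcategory(labels: list) -> str:
    """Assign a subcategory from the headings a student gave to their groups.

    Accuracy of the classification is deliberately ignored, as in the text.
    """
    if not labels:
        return "no criterion"  # groups were made but not named
    text = " ".join(labels).lower()
    # "Macro/micro" prevails over every other criterion.
    if any(word in text for word in ("molecule", "atom")):
        return "Macro/micro"
    # Assumption: the order of the next three tests is arbitrary.
    if any(word in text for word in ("pure", "mixture")):
        return "pure/mix"
    if any(word in text for word in ("gas", "liquid", "solid")):
        return "g/l/s"
    if any(word in text for word in ("toxic", "natural", "chemical", "pollutant")):
        return "natural/not natural"
    return "others"


print(subcategory(["Mixture", "Molecules", "Atom"]))
print(subcategory(["Gas", "Liquid", "Molecule"]))
print(subcategory(["Toxic", "Natural"]))
```

An answer such as “Substances that can be touched / cannot be touched” contains none of these keywords, so a sketch of this kind would miss it; it was placed in “Macro/micro” by interpretation.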

Analysis of the answers to questions 2 and 3: association of macroscopic and microscopic criteria with names and formulae. For each group, we identified the percentages of correct answers by chemical species for the names and formulae for all four criteria: pure substance, mixture, atom and molecule. We then restricted the analysis to chemical species which appear both in the names and formulae. These species are the most frequently encountered by the students in the chemistry study: carbon, dioxygen, carbon dioxide and water. We then recorded, for all groups, the percentages of correct answers for the macroscopic criteria: pure substance (“pure”) and mixture (“mix”), which we call “Macro”, and for the microscopic criteria: molecules (“m”) and atoms (“a”), which we call “micro”. We also looked at the answers in which the students ticked no macroscopic criterion or no microscopic criterion. Moreover we examined the evolution of answers according to the grade level. For this, in each group we counted the students able to give more than 50% correct answers for the names and then for the formulae. Then, using Fisher's exact test, we compared the results in group 1 and group 3. This test serves the same purpose as a chi-squared test but remains valid regardless of the size of the sample. Finally, we compared the answers obtained for the same chemical species from the name and from the formula for both macroscopic and microscopic criteria and by the grade level.
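For readers unfamiliar with Fisher's exact test, the following Python sketch shows how a two-sided p-value is obtained for a 2 × 2 table (group 1 or group 3, more than 50% correct or not). It uses only the standard library. The counts in the example are hypothetical placeholders chosen to match the group sizes, not the data of the study, and the function name is our own.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    probabilities = [prob(x) for x in range(low, high + 1)]
    observed = prob(a)
    # Small tolerance so that ties with the observed table are included.
    return sum(p for p in probabilities if p <= observed * (1 + 1e-9))


# Hypothetical counts: students above / not above 50% correct answers.
group1_above, group1_not = 40, 193   # sums to N = 233
group3_above, group3_not = 95, 52    # sums to N = 147
p_value = fisher_exact_two_sided(group1_above, group1_not, group3_above, group3_not)
print("grade level effect at the 5% level:", p_value < 0.05)
```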

Analysis of the answers to questions 4 and 5: decoding empirical chemical formula linked with the concept of molecule. For each group of students, we collected the percentages of correct answers for each pair of formulae the students had to compare. We then identified the percentage of students answering that the equation containing the formula “O” or “H₄O₂” is wrong. We looked at the students’ justifications and retained all those indicating that O does not represent oxygen and H₄O₂ does not represent water. Finally, we restricted the study to the students who answer correctly for the three pairs of formulae and analysed their answers for the equations.

Analysis of results

Question 1

Students taking into account the “scientific” names. Question 1 is in Appendix 1. The students had to group names from the list into different categories. We divided the students’ answers into two categories: those who made a difference between scientific names and common names to propose classifications from the list (“name” category) and those whose classifications did not rely on the name (“no name” category).

Only a quarter (26%) of the students (Table 1) used the scientific names to provide classifications. Half the students (49%) in group 1 (Table 1) did not answer this question, and neither did about a quarter in the other groups. The higher the grade, the higher the proportion of students who answered. The opposition between scientific and common names is used more often by students who have been learning chemistry for longer, but it is only in the last group that it prevails.

Table 1 Distribution of the students' answers between the three categories

Students' group and age	Categories
Students' group and age	No answer (%)	No name (%)	Name (%)
Group 1: age 14–15 (N = 233)	49	33	18
Group 2: age 16 (N = 178)	29	45	26
Group 3: age 17–18 (N = 147)	25	42	33
Group 4: university (N = 45)	20	38	42
All answers	35	39	26
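The “All answers” row of Table 1 can be recovered from the group rows as a mean weighted by group size. The short Python check below is an illustration of that arithmetic; it uses only the values printed in Table 1, and the variable names are our own.

```python
# Group sizes and percentages copied from Table 1:
# (no answer, no name, name) for each group.
group_sizes = {"Group 1": 233, "Group 2": 178, "Group 3": 147, "Group 4": 45}
percentages = {
    "Group 1": (49, 33, 18),
    "Group 2": (29, 45, 26),
    "Group 3": (25, 42, 33),
    "Group 4": (20, 38, 42),
}

total_students = sum(group_sizes.values())


def overall(column: int) -> float:
    """Size-weighted mean percentage for one column of Table 1."""
    weighted = sum(group_sizes[g] * percentages[g][column] for g in group_sizes)
    return weighted / total_students


all_answers_row = [round(overall(column)) for column in range(3)]
print(total_students, all_answers_row)
```

Because the group percentages are themselves rounded, the recomputed values are approximate, but they round to the same integers as the printed row.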

Criteria proposed by the students. We grouped the students’ classification proposals into four subcategories (Chart 1).


	Chart 1 Distribution of criteria proposed by the students (N = 603).

The two most often used criteria are:

– names associated with a microscopic level and a macroscopic level (24%)

– classification as gas, liquid, and solid (17%)

The “pure substance/mixture” criterion is not widely used (6%) nor is the “natural/not natural” criterion (5%). If we analyse the answers of the “pure substance/mixture” category, half of them are incorrect. 16 students out of 34 write that the common names (milk, coal, oil) correspond to pure substance and/or that the scientific names (hydrogen peroxide, cyclohexane …) are associated with mixtures. The distribution of sub-criteria in the two categories of answers is displayed in Chart 2.


	Chart 2 Distribution of criteria proposed by the students (N = 603) in each category: “Name” or “no Name”.

In the “Name” category, the opposition between microscopic and macroscopic is the first criterion proposed (“Macro/micro” = 17%). All the students classify the “scientific” names in the microscopic level and the usual names in the macroscopic level.

Example group 3:

“Products: coal, oil, milk, air

Molecules: cyclohexane, hydrogen peroxide, carbon dioxide, methane

Other: carbon”

In the “No Name” category, the state of matter is the first criterion (“g/l/s”: 17%) proposed. When the students use this criterion, the common names and the scientific names are classified together. For example, the common name “air” and the scientific name “methane” correspond to gases. So when the students choose to classify names according to the state of matter (“g/l/s”), they put “air” and “methane” together. Therefore the criterion “g/l/s” does not appear in the “name” category.

Finally, only 8 out of 603 students (1%) explain clearly that they rely on common names and scientific names to classify the list of names. These eight students belong to the higher grade levels (the last two groups).

Two examples:

group 4: “Common name: air, milk, coal, oil - chemical term: carbon, cyclohexane, hydrogen peroxide, carbon dioxide, methane.”

group 3: “Common language: air, coal, oil, milk - Scientific language: carbon, cyclohexane, hydrogen peroxide, carbon dioxide, methane”

Comments. The large number of “no answer” indicates that the students are not used to this type of question. In the curricula, nothing is proposed in connection with the scientific names. We did not find any exercises in textbooks offering activities using scientific names and common names. However, classifying chemical species into categories is part of the work of the chemist (acid/base, metals/non-metals, gas/solid/liquid, organic/mineral…) (Schummer, 1998). The difference in nature between the scientific name and the common name does not seem to help the students to classify a list of names. Only a quarter of them actually rely on this difference. Few students use the “pure substance and mixture” criterion (about 6%) and, when they do, half of them make a mistake, for instance by associating the scientific names with mixtures. The microscopic/macroscopic opposition is the first criterion used by the students. Scientific names are then associated with the microscopic level (atom or molecule). So the students do not seem to be able to use the scientific names and the common names to correctly distinguish chemical species from mixtures, which are concepts associated with the macroscopic level. Although the notion of pure substance is introduced in the first year of chemistry teaching in France, the students are not able to use this concept.

Questions 2 and 3

Correct answers. First from a list of names and then from a list of chemical formulae, the students must choose one criterion or more among the following four: pure substance (ps), mixture (mix), atom (a) and molecule (m) (Appendix 1). More than 98% of the students answered these two questions.

The percentages of correct answers are in Table 2 for the names and in Table 3 for the formulae.

Table 2 Distribution of correct answers for each name and by grade levels

Students' group and age	Correct answers (%)
Students' group and age	Carbon (ps and a)	Water (ps and m)	Dioxygen (ps and m)	Carbon dioxide (ps and m)	Methane (ps and m)	Cyclohexane (ps and m)
Group 1: age 14–15 (N = 233)	9	8	11	4	3	3
Group 2: age 16 (N = 178)	21	19	8	3	2	2
Group 3: age 17–18 (N = 147)	22	16	14	5	7	4
Group 4: university (N = 45)	40	18	33	7	11	9

Table 3 Distribution of correct answers for each formula and by grade levels

Students' group and age	Correct answers (%)
Students' group and age	C (ps and a)	H₂O (ps and m)	O₂ (ps and m)	CO₂ (ps and m)	Fe (ps and a)	C₂H₆O (ps and m)
Group 1: age 14–15 (N = 233)	13	6	15	5	7	2
Group 2: age 16 (N = 178)	29	8	15	3	25	2
Group 3: age 17–18 (N = 147)	22	12	16	6	22	4
Group 4: university (N = 45)	49	16	33	4	36	7

The percentages are very low: few students are able to choose correctly among the four criteria: pure substance, mixture, atom and molecule (Tables 2 and 3). This is the case even for the most common chemical species such as dioxygen, carbon and carbon dioxide. We wanted to check whether the students answer in a similar way on microscopic criteria and on macroscopic criteria for certain names or certain formulae. Using R software we drew a similarity tree according to Lerman (2008) with the correct answers to the names (Chart 3) and with the correct answers to the formulae (Appendix 2 Chart 9) taking all levels into account.


	Chart 3 Similarity tree for all correct answers (N = 603) on macroscopic criteria (pure substance or mixture) and on microscopic criteria (atom or molecule) to the name question.

Looking at Chart 3, the height (the value at which a horizontal line is drawn between two vertical lines) is the lowest for the following variables: “microDioxygen” and “microCarbondioxide”. The greater the similarity, the lower the height. This means that the group of students who answered correctly that dioxygen is a molecule and the group of students who answered correctly that carbon dioxide is a molecule are very similar. The same conclusion holds for cyclohexane and methane, but the height is a little higher. The height is the highest between the variable “macroWater” and the variables on macro criteria for all the other names. Thus the group of students who answered that water is a pure substance and the group of students who gave correct answers on macroscopic criteria for the other names are not very similar.

Globally we get two separate classes for the names question: a class grouping all the correct answers on macroscopic criteria and a class grouping all the correct answers on microscopic criteria (see Chart 3). We have the same kind of tree for the question on formulae; the correct answers on macroscopic criteria are separate from the correct answers on microscopic criteria (Appendix 2). This means that statistically the group of students who gave correct answers on macroscopic criteria is not the same group as those who answered correctly for microscopic criteria.
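The idea of “similar groups of students” behind the tree can be illustrated with a much simpler measure than the one used here. The Python sketch below computes the Jaccard index (shared students divided by all students in either group) between sets of students who gave a given correct answer. It is a stand-in for illustration, not Lerman's similarity analysis, and the student identifiers are invented.

```python
def jaccard(first: set, second: set) -> float:
    """Share of students common to both sets among students in either set."""
    union = first | second
    return len(first & second) / len(union) if union else 1.0


# Hypothetical identifiers of students giving each correct answer.
correct = {
    "microDioxygen": {1, 2, 3, 4, 5, 6},
    "microCarbondioxide": {1, 2, 3, 4, 5, 7},
    "macroWater": {1, 8, 9, 10},
}

print(jaccard(correct["microDioxygen"], correct["microCarbondioxide"]))
print(jaccard(correct["microDioxygen"], correct["macroWater"]))
```

With these invented sets, the two microscopic variables overlap far more than a microscopic and a macroscopic one, which is the kind of pattern that places variables in the same or in separate classes of a tree.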

The results are so low for both the names and formulae (Tables 2 and 3) that we decided to study in depth the students' answers of the first three groups, which are statistically reliable. Examining the similarity trees led us to analyse the detailed results between the names and formulae on microscopic criteria and on macroscopic criteria. Because we did not get any dissimilarity between the class of names or the class of well-known formulae (carbon, water, dioxygen, carbon dioxide or C, O₂, CO₂ and H₂O) and the class of names or the class of less-known formulae (cyclohexane and methane or Fe and C₂H₆O), we restricted the study to four chemical names (water, carbon, dioxygen, carbon dioxide) and to four chemical formulae (C, O₂, CO₂ and H₂O). They correspond to the chemical species that the students usually know.

We tried to determine why and how the students went wrong. Thus we wondered whether the students chose mainly a macroscopic or microscopic single criterion, whether their choice showed that they mastered one or both types of criterion (macroscopic or microscopic), whether they answered differently for names and for formulae.

Criteria chosen by the students. We analysed in detail the macroscopic or microscopic criteria chosen by the students for each chemical both for the names and formulae. We tried to see whether the students selected a macroscopic and a microscopic criterion (Macro and micro) or whether they ticked a single criterion: a macroscopic one (only Macro) or a microscopic one (only micro).

The results for dioxygen are put on Chart 4a for the name and Chart 4b for the formula.


	Chart 4 Criteria selected by the students for dioxygen (name and formula).

Fewer than a third of the students selected a macroscopic and a microscopic criterion (Macro and micro). There was no significant evolution with the grade level. A majority of students only chose a microscopic criterion (“only micro”). This choice increased with the grade level (63% in group 3 for the name or formula). In contrast the percentage of students only choosing a macroscopic criterion (“only Macro”) decreased with the grade level (10% for the name and 8% for the formula in group 3). We did not notice any difference in the choice of criteria between the name and formula. We got similar results for carbon and carbon dioxide (Appendix 3).

In the case of water (Chart 5) we noticed a significant difference with the name (Chart 5a) for groups 1 and 2. The students chose mainly a single macroscopic criterion (47% for group 1 and 40% for group 2). This difference was already visible in Chart 3, where the lack of similarity between the answers to the name “water” and those to the other names could be noted. The name “water” is ambiguous: it is both a scientific and a common name. This could explain the difference we got only in this case.


	Chart 5 Criteria selected by the students for water (name and formula).

Correct answers for microscopic and macroscopic criteria. We looked in detail at the students’ answers for microscopic criteria and then for macroscopic criteria. The percentages of correct answers are reported in Charts 6 and 7.


	Chart 6 Percentage of correct answers (atom or molecule) for each name and each formula.


	Chart 7 Percentage of correct answers (pure substance) for each name and each formula.

Except for “water” in groups 1 and 2 (Charts 6a and 7a), the results were always better on microscopic criteria (Charts 6 and 7). The answers are similar for the four chemical species on the microscopic level (Chart 6). Conversely the correct answers on the macroscopic criteria (Chart 7) vary by a factor of more than four depending on the chemical species. For example (Chart 7a), 11% of the students in group 2 answered correctly for carbon dioxide, 28% for dioxygen, 31% for carbon and 53% for water. We tested the effect of the grade level on the percentage of correct answers using a Fisher's exact test. We applied the test to the students in group 1 (2nd and 3rd year chemistry study) and the students in group 3 (5th and 6th year chemistry study). We counted the students who ticked more than 50% of the correct macroscopic and microscopic criteria for all chemicals. We got the following results:

• with the name:

– concerning the macroscopic criteria, χ² = 0.267 and p-value = 0.605 > 0.05. The null hypothesis cannot be rejected at the 5% level, so the students' correct answers and their grade level can be treated as independent.

– concerning the microscopic criteria, χ² = 54.1 and p-value = 1.87 × 10⁻¹³ < 0.05. Unlike the macroscopic criteria, the results for the microscopic criteria properly reflect the grade level.

• with the formula:

– concerning the macroscopic criteria, χ² = 3.25 and p-value = 0.0715 > 0.05. The p-value is lower than the one calculated with the ‘names’ question but we can still conclude that the students’ answers and the grade level are independent.

– concerning the microscopic criteria, χ² = 49.5 and p-value = 1.94 × 10⁻¹² < 0.05. The results are the same with the formulae and with the names.
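The four p-values above can be related to the χ² statistics with a one-line formula. As a sketch, assuming each comparison is a 2 × 2 table (group 1 or group 3, more than 50% correct or not) and therefore has one degree of freedom, the upper-tail probability of χ² is erfc(√(χ²/2)). The Python check below uses only the values printed above; the function name is our own.

```python
from math import erfc, sqrt


def p_value_chi2_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-squared statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Statistics as printed in the list above.
reported = {
    "names, macroscopic": 0.267,
    "names, microscopic": 54.1,
    "formulae, macroscopic": 3.25,
    "formulae, microscopic": 49.5,
}

for label, chi2 in reported.items():
    print(f"{label}: chi2 = {chi2}, p = {p_value_chi2_1df(chi2):.3g}")
```

Under this assumption the recomputed values are close to the printed ones; small gaps for the two very small p-values are expected because the χ² values are given to three significant figures only.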

So the results for the microscopic criteria are well connected to the grade level. But the number of years of chemistry study does not increase the number of correct answers for the macroscopic level with the names and with the formulae. The lowest results are obtained for “Macro” criteria with “carbon dioxide” and “CO₂” (Chart 7). We compared the percentages of students who chose the pure substance (“ps”) and mixture (“mix”) criteria for “carbon dioxide” and “dioxygen” and those who did not answer (Table 4).

Table 4 Comparison of the students' answers between “dioxygen” and “carbon dioxide” for macroscopic criteria

Students' group and age	Dioxygen: ps (%)	Dioxygen: mix (%)	Dioxygen: no answer (%)	Carbon dioxide: ps (%)	Carbon dioxide: mix (%)	Carbon dioxide: no answer (%)
Group 1: age 14–15 (N = 233)	27	25	46	12	42	46
Group 2: age 16 (N = 178)	28	18	54	11	39	50
Group 3: age 17–18 (N = 147)	24	10	66	8	29	63

When the students gave an answer, the main selected criterion was “mixture” for carbon dioxide and “pure substance” for dioxygen whereas it should be pure substance for both. The name of the compound which can be segmented into two parts (oxide and carbon) may be the cause of this confusion. With the formulae, we noticed that the answers were wrong especially if the formula is composed of different atoms. In group 2, 10% of the students answered correctly for CO₂ and 17% for H₂O whereas 34% answered correctly for O₂ and 43% for C (Chart 7b). It might be due to misunderstandings between elementary substance and pure substance as often reported in research (Roletto and Piacenza, 1994; Solomonidou and Stavridou, 1994; Fillon, 1997; Sanger, 2000; Kind, 2004; Stains and Talanquer, 2007). But this does not explain the low rate of correct answers for “carbon” or “dioxygen” – fewer than 31% and 28% respectively (Chart 7a) – and for “C” and “O₂” – fewer than 43% and 34% respectively (Chart 7b).

Comparison of answers to macroscopic criteria between the questions on names and formulae. We compared the answers given on the macroscopic criteria to the names and formulae (water, carbon, dioxygen, carbon dioxide and H₂O, C, O₂, CO₂) which correspond to the same chemical species.

We got the same results with “CO₂” and “carbon dioxide” (Chart 8a). The most significant differences were observed with “H₂O” and “water” (three times more correct answers with the name for group 1 and group 2) (Chart 8d). For “C/carbon” and “O₂/dioxygen” pairs, we obtained slightly better answers with formulae (Chart 8c and d). We examined whether this difference was significant by applying a McNemar’s test. This is a statistical test used on paired nominal data. It is based on the counting of subjects who change their answers between two sets of measures. This test allows us to conclude that:


	Chart 8 Comparison of the correct answers (pure substance) between name and formula.

• The results are statistically better in the case of water with the name regardless of the grade level and in the case of carbon with the formula in groups 1 and 2 (the p-values are lower than 0.05).

• The results are statistically identical between formulae and names in the case of dioxygen and carbon dioxide regardless of the grade level (the p-values are higher than 0.05).
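As an illustration of how McNemar's test works on such paired answers, the Python sketch below computes an exact two-sided p-value from the two counts of students who change their answer between the name and the formula. It is a minimal standard-library version written for this explanation (the exact binomial variant, which may differ from the variant used for the results above), and the counts in the example are hypothetical.

```python
from math import comb


def mcnemar_exact(name_only: int, formula_only: int) -> float:
    """Exact two-sided McNemar p-value.

    name_only: students correct with the name but not with the formula.
    formula_only: students correct with the formula but not with the name.
    Students who are right (or wrong) in both cases do not enter the test.
    """
    changes = name_only + formula_only
    if changes == 0:
        return 1.0
    smaller = min(name_only, formula_only)
    # Under the null hypothesis each change goes either way with probability 1/2.
    tail = sum(comb(changes, i) for i in range(smaller + 1)) / 2 ** changes
    return min(1.0, 2 * tail)


# Hypothetical counts for one species in one group.
print(mcnemar_exact(name_only=30, formula_only=8))
print(mcnemar_exact(name_only=12, formula_only=11))
```

A p-value below 0.05 would indicate that answers differ between name and formula; a larger one, as in the second hypothetical case, would not.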

According to the results above, we believe that:

• There is no difference between formulae and names in the case of dioxygen and carbon dioxide because the syntax of the two names (in French) gives indications on the chemical formula.

• In the case of carbon and water, the name does not give any information on the chemical formula.

• The name “water” is above all a common term for students beginning to study chemistry. This may explain why the results are different from those obtained for the other chemical species.

Comparison of answers between the names and formulae for microscopic criteria. We compared the answers given on the microscopic criteria to the names and formulae (water, carbon, dioxygen, carbon dioxide and H₂O, C, O₂, CO₂) which correspond to the same chemical species (Appendix 4). Whatever the grade level and the chemical species, the students gave more correct answers from formulae for the atom and molecule criteria. But this difference was statistically significant only for water and carbon in group 1 (McNemar’s test). For students beginning to study chemistry, we found a significant difference in the results between formula and name for C/carbon and for H₂O/water. In these two examples, the name does not give any indications of the formula. This difference does not exist for CO₂/carbon dioxide and for O₂/dioxygen where the name conveys information about the composition.

Comments. More than three-quarters of high school students were wrong when they selected criteria among the four proposed (pure substance, mixture, molecule and atom), whether for the name or the formula. We observed this even for common chemical species like water, carbon, dioxygen and carbon dioxide.

The low percentage of totally correct answers may be explained as follows.

• The groups of students who gave the correct answer for macroscopic criteria were not similar to those who gave the correct answer for microscopic criteria (Chart 3).

• Fewer than a third of the secondary school students selected both a macroscopic and a microscopic criterion.

• For macroscopic criteria, less than a quarter of the students in group 3 answered correctly. No significant improvement with grade level was observed. The students did not seem to master the concepts of pure substance and mixture.

To complete this last point, we noticed that they have far less difficulty with the microscopic criteria than with the macroscopic ones, for both the formulae and the names. The number of correct answers on the microscopic criteria (atom and molecule) improved with the number of years of study. More than 70% of students in group 3 (5 and 6 years of chemistry study) were able to give correct answers for both the names and formulae. Moreover, the higher the number of years of chemistry study, the more the students seemed to have a microscopic vision of the name or formula (correct and wrong answers), which we had already noticed with question 1. These results seem to be consistent with curricula that explicitly deal with the microscopic level of the formulae. And, except for the first year of chemistry teaching, macroscopic criteria are always implicit in the syllabus.

For the chemical species under scrutiny, we do not observe significant differences in answers between the formula and name when the syntax of the name provides information on the composition of the molecule (e.g. dioxygen and carbon dioxide). But this is no longer the case when the name does not provide any information (such as carbon and water). The formula then helps the students in group 1 (the second and third year of chemistry study) to correctly answer for microscopic criteria. However on the macroscopic criteria, the formula can cause more errors (for example with H₂O and water). Another cause of errors may be the confusion between the concepts of pure substance and elementary substance (carbon dioxide and CO₂) or an additive conception of compounds (Ben-Zvi et al., 1987; Roletto and Piacenza, 1994).

Questions 4 and 5

The answers to these two questions should allow us to examine the students' ability to decode a chemical formula, their understanding of the concept of molecule and their ability to decode formulae in a chemical equation.

Question 4 results. We asked the students to indicate whether the formulae correspond or not to different molecules for the three following pairs: H₂O and H₂O₂, O and O₂, CH₄ and C₂H₈ (Appendix 1). For each pair we wrote down the percentage of students who answer that the formulae do not correspond to the same molecule. This is the correct answer (Table 5).

Table 5 Percentage of students who answer that each pair does not correspond to the same molecule

Students' group and age	H₂O and H₂O₂ (%)	O and O₂ (%)	CH₄ and C₂H₈ (%)	Correct answer for the three pairs (%)
Group 1: age 14–15 (N = 233)	57	48	43	21
Group 2: age 16 (N = 178)	85	70	56	42
Group 3: age 17–18 (N = 147)	87	77	58	50
Group 4: university (N = 45)	98	76	67	53

For the first example (H₂O and H₂O₂), 85% of students in group 2 (fourth year of chemistry study) were able to answer correctly. But only 56% believed that CH₄ and C₂H₈ do not correspond to the same molecule (Table 5). In each group a quarter of students stated that CH₄ and C₂H₈ are identical. Other students answered they did not know or did not answer. Around half the students surveyed who answered that both formulae are identical justified this and the most widely used argument for them was that C₂H₈ is twice CH₄ (Table 6).

Table 6 Main argument given by the students who answer that CH₄ and C₂H₈ are identical

Students' group and age (total number)	Number of students
Students' group and age (total number)	Answering that CH₄ and C₂H₈ are identical	And giving a justification	Argument: C₂H₈ is twice CH₄
Group 1: age 14–15 (N = 233)	50	22	14
Group 2: age 16 (N = 178)	42	20	17
Group 3: age 17–18 (N = 147)	37	29	26
Group 4: university (N = 45)	11	10	10
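The proportions discussed around Table 6 follow directly from its counts. The short Python sketch below, using only the numbers printed in the table and variable names of our own, computes for each group the share of students who judged CH₄ and C₂H₈ identical, and the share of justified answers that use the “twice” argument.

```python
# Rows of Table 6: (group size, answered "identical", gave a justification,
# justification is "C2H8 is twice CH4").
table6 = {
    "Group 1": (233, 50, 22, 14),
    "Group 2": (178, 42, 20, 17),
    "Group 3": (147, 37, 29, 26),
    "Group 4": (45, 11, 10, 10),
}

share_identical = {}
share_twice = {}
for group, (size, identical, justified, twice) in table6.items():
    share_identical[group] = round(100 * identical / size)
    share_twice[group] = round(100 * twice / justified)
    print(group, share_identical[group], share_twice[group])
```

The first share stays between about a fifth and a quarter in every group, while the “twice” argument accounts for most of the justifications given.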

Examples of explanations:

Example group 3: “C₂H₈ is a CH₄ molecule with a stoichiometric ratio of 2”

Example group 4: “C₂H₈ it's twice CH₄”

Almost a quarter of university students still did not know the difference between the atom represented by O and the molecule represented by O₂ (Table 5).

Example group 4: “If O₂ is divided by 2, we get O”

And finally, only 37% of all students were able to correctly answer for the three pairs (21% group 1, 42% group 2, 50% group 3 and 53% group 4) (Table 5).

Question 5 results. In this question, we asked the students to decide on the validity of the chemical equations that were proposed (Appendix 1). We collected the percentage of students who answered that the chemical equations including the chemical formulae O and H₄O₂ were wrong (Table 7).

Table 7 Correct answers to the equations containing the wrong chemical formula

Students' group and age	Correct answers (%)
Students' group and age	“No” to the equation with “O”	“No” to the equation with “H₄O₂”	“No” to both equations
Group 1: age 14–15 (N = 233)	15	23	7
Group 2: age 16 (N = 178)	13	22	6
Group 3: age 17–18 (N = 147)	45	39	29
Group 4: university (N = 45)	67	80	58

Fewer than half of the students in groups 1, 2 and 3 considered that the chemical equations including O and H₄O₂ were incorrect, and even in group 3 fewer than a third (29%) rejected both incorrect equations (Table 7). Their justifications show that fewer than 20% of the students in groups 1, 2 and 3 managed to give a correct justification (Table 8).

Table 8 Percentages of students who give a justification to question 5 (a pair of equations)

Students' group and age	Decomposition of water: justifications (%)	Decomposition of water: correct justifications (%)	Combustion of methane: justifications (%)	Combustion of methane: correct justifications (%)
Group 1: age 14–15 (N = 233)	27	3	21	3
Group 2: age 16 (N = 178)	44	6	31	6
Group 3: age 17–18 (N = 147)	46	17	39	17
Group 4: university (N = 45)	82	51	78	53

Finally fewer than 10% of the students (total number 603) gave a correct answer to both chemical equations and explained clearly that the formulae O and H₄O₂ were incorrect.

Comparison of answers to questions 4 and 5. We checked whether the students who gave correct answers for the formulae in question 4 also gave correct answers for the equations in question 5.

• Comparison of answers on O and O₂ (question 4) and on the equation containing O (question 5)

We first selected the students who considered that the formulae O₂ and O did not correspond to the same molecule (110 in group 1, 124 in group 2, 113 in group 3 and 28 in group 4) (Table 5). Among them, we looked for those who answered that the equation containing the formula O for oxygen was wrong.

Most secondary school students in groups 1 and 2 who answered that the formulae O and O₂ did not correspond to the same molecule were not able to use this knowledge in the context of the chemical equation (only 9% in group 1 and 13% in group 2 rejected the equation). Only 3% in group 1, 6% in group 2 and 22% in group 3 were able to explain that the equation was not correct because it did not contain the correct formula for oxygen (Table 9). The answers to the two questions (4 and 5) are consistent only for the university students (71% in group 4) (Table 9).

Table 9 Correct answers to the equation containing the chemical formula O among the students who consider that O and O₂ are different

Students' group and age	“No” to the equation with “O” (%)	“No” to the equation with “O” with correct justification (%)
Group 1: age 14–15 (N = 110)	9	3
Group 2: age 16 (N = 124)	13	6
Group 3: age 17–18 (N = 113)	52	22
Group 4: university (N = 28)	71	68

• Comparison of answers on CH₄ and C₂H₈ (question 4) and on the equation containing H₄O₂ (question 5)

We selected the students who considered that the formulae C₂H₈ and CH₄ did not correspond to the same molecule (100 in group 1, 100 in group 2, 85 in group 3 and 30 in group 4) (Table 5). Among them, we looked for those who answered that the equation containing the formula H₄O₂ was wrong.

The results (Table 10) are quite comparable to those of the previous comparison (Table 9).

Table 10 Correct answers to the equations containing the chemical formula H₄O₂ among the students who consider that C₂H₈ and CH₄ are different

Students' group and age	“No” to the equation with “H₄O₂” (%)	“No” to the equation with “H₄O₂” with correct justification (%)
Group 1: age 14–15 (N = 100)	27	5
Group 2: age 16 (N = 100)	18	6
Group 3: age 17–18 (N = 85)	48	25
Group 4: university (N = 30)	83	60

• Comparison of answers between question 4 and question 5:

We then selected the students who answered correctly for all three pairs of formulae in question 4 (Table 5). Among them, we looked for those who gave correct answers for the two equations containing the wrong formulae. Among the 221 students who gave a correct answer to question 4, only 25% rejected the wrong equations in question 5 (Table 11).

Table 11 Correct answers to the equations containing the chemical formula O and H₄O₂ among the students who correctly answer question 4

Students' group and age	“No” to equations with “O” and with “H₄O₂” (%)
Group 1: age 14–15 (N = 48)	27
Group 2: age 16 (N = 75)	18
Group 3: age 17–18 (N = 74)	48
Group 4: university (N = 24)	83
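The group sizes given in Table 11 can be cross-checked against the percentages of fully correct answers to question 4 quoted earlier (21%, 42%, 50% and 53%, i.e. 37% overall). The sketch below assumes that the N values of Table 11 are the numbers of students who answered all three pairs correctly; the variable names are illustrative.

```python
# Total number of students per group, as given in Tables 6 to 8.
group_sizes = {"Group 1": 233, "Group 2": 178, "Group 3": 147, "Group 4": 45}

# N values of Table 11, read here as the number of students per group
# who answered all three pairs of question 4 correctly (an assumption).
correct_q4 = {"Group 1": 48, "Group 2": 75, "Group 3": 74, "Group 4": 24}


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


per_group = {g: percent(correct_q4[g], group_sizes[g]) for g in group_sizes}
total_correct = sum(correct_q4.values())
total_students = sum(group_sizes.values())
overall = percent(total_correct, total_students)

print(per_group)
print(total_correct, "of", total_students, "students, i.e.", overall, "%")
```

The recomputed values agree with the 221 students and the percentages reported in the text.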

Comments. Fewer than 40% of students answered that a molecule is an entity composed of atoms the number of which is fixed and cannot change (question 4). Only a quarter of the students who correctly answered question 4 were able to apply this concept when using a chemical equation. Finally, fewer than 10% of all students surveyed were able to indicate explicitly that “O” is not dioxygen and “H₄O₂” is not water when they were confronted with chemical equations involving these formulae.

In school curricula and textbooks for the first three years of chemistry teaching, work on the chemical formulae seems to focus on counting the atoms in the molecule. But our results show that this kind of work does not guarantee that the students understand that there is only one formula for a given molecule. Therefore students would probably need to work differently on the chemical formula. We think it is essential to make the students handle chemical formulae in the context of chemical equations from the beginning of chemistry study.
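Counting atoms is indeed not enough to detect a wrong formula: an equation can conserve every kind of atom and still contain a formula that does not represent the intended substance. The sketch below illustrates this with four equations written for this example; they are assumptions and not necessarily the equations of the questionnaire. The list of accepted formulae and all function names are likewise hypothetical.

```python
import re
from collections import Counter

SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

# Formulae accepted for the substances of this sketch (an assumption made for the example).
KNOWN_FORMULAE = {"H₂O": "water", "H₂": "dihydrogen", "O₂": "dioxygen",
                  "CH₄": "methane", "CO₂": "carbon dioxide"}


def parse_side(side):
    """Split one side of an equation into (coefficient, formula) pairs."""
    terms = []
    for term in side.split("+"):
        match = re.fullmatch(r"\s*([0-9]*)\s*(\S+)\s*", term)
        terms.append((int(match.group(1) or 1), match.group(2)))
    return terms


def atom_counts(formula, coefficient=1):
    """Count atoms in a simple formula, multiplied by its stoichiometric coefficient."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)([0-9]*)", formula.translate(SUBSCRIPTS)):
        counts[symbol] += coefficient * int(number or 1)
    return counts


def side_atoms(side):
    """Total atom counts on one side of the equation."""
    total = Counter()
    for coefficient, formula in parse_side(side):
        total += atom_counts(formula, coefficient)
    return total


def is_atom_balanced(equation):
    left, right = equation.split("→")
    return side_atoms(left) == side_atoms(right)


def unknown_formulae(equation):
    """Formulae in the equation that are not in the accepted list."""
    left, right = equation.split("→")
    return [f for _, f in parse_side(left) + parse_side(right) if f not in KNOWN_FORMULAE]


# Illustrative equations only (not taken from the questionnaire).
for equation in ["2 H₂O → 2 H₂ + O₂", "H₂O → H₂ + O",
                 "CH₄ + 2 O₂ → CO₂ + 2 H₂O", "CH₄ + 2 O₂ → CO₂ + H₄O₂"]:
    print(equation, "| balanced:", is_atom_balanced(equation),
          "| unknown formulae:", unknown_formulae(equation))
```

All four equations pass the atom count, yet two of them contain a formula (O, H₄O₂) that does not represent dioxygen or water. Only the second check, which treats the formula as the fixed representation of a substance, separates them.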

Discussion

Let us recall the research questions, sum up our findings for each of them, and then discuss these results further.

Our first research question was about the criterion used by the students to classify a list of names. According to our results the students did not propose a classification of names into mixture and pure substance (fewer than 6%). The most widespread criterion of classification was the opposition between microscopic and macroscopic and in that case scientific names were only associated with the microscopic level. The scientific name opposed to a common name did not seem to be a relevant tool used by the students to classify chemical species and mixtures.

Our second research question examined what characteristics of the macroscopic level and of the microscopic level the students associate with names and chemical formulae.

What criteria did they choose? Only a third of the students surveyed associated both pure substance or mixture (which we call macroscopic criteria) and atom or molecule (which we call microscopic criteria) with a name or a formula, and those who answered correctly for microscopic criteria were not those who gave a correct answer for macroscopic criteria. The students mainly chose microscopic criteria (atom or molecule) for the formulae and the names (those which seemed scientific to the students), and this microscopic “vision” increased with the number of years of chemistry study. In the case of chemical formulae, we expected this result because in the syllabus the formulae are introduced with the concepts of atom and molecule (grade 8). But this result is much more surprising in the case of names. The concept of pure substance is introduced in grade 7, and the substances are named without any study of the nomenclature. So, the chemical language does not seem to facilitate the shifting between the macroscopic and the microscopic levels (Taber, 2013).

Did the students answer correctly? They gave more correct answers for the microscopic criteria as the number of years of chemistry study increased, whereas there was no improvement for the macroscopic criteria. The incorrect answers on macroscopic criteria revealed that the students confused pure substance with elementary substance.

Was there any difference between names and formulae? When the syntax of the name gives an indication of the chemical composition of the substance, the percentages of answers (correct or not) are similar to those for the formulae. So, the syntax facilitates the choice between the atom and the molecule but is a source of error for the choice between pure substance and mixture. Some previous studies already noticed this confusion among students who were asked about particulate drawings (Sanger, 2000; Stains and Talanquer, 2007) or chemical formulae (Roletto and Piacenza, 1994; Fillon, 1997). Our study shows that this confusion also seems to occur when the students are asked about names, probably because they rely on the syntax of the name. The understanding of names or formulae as part of the language of chemistry could have been a source of confusion.

Our third research question sought to determine whether the students were able to handle the concept of molecule out of context and in the context of a chemical equation. The students surveyed struggled to decode a chemical formula out of the context of a chemical equation and failed to decode it in that context. We found that the students did not master the use of simple chemical formulae like those of dioxygen or water in the context of a chemical equation. They were not able to use the concept of molecule as a fixed and constant combination of atoms, which confirms previous results (Ben-Zvi et al., 1987; Al-Kunifed et al., 1993; Sanger, 2005).

Finally, this study shows that teaching the language of chemistry is not a priority in the French curriculum, yet students right up to the last year of secondary school have great difficulty using and interpreting this language. Could the language of chemistry be the source of a wrong interpretation of the concept of chemical substance? Or, by contrast, does a wrong understanding of the concept of substance lead the students towards a partial understanding of the language of chemistry, namely a microscopic vision of names and formulae? Our data do not allow us to answer. We make the assumption that this microscopic vision of names and formulae is due to the introduction of the formulae with the concepts of atom and molecule in the syllabus and to the lack of specific work on nomenclature.

Previous studies (Stavridou and Solomonidou, 1998; Johnson, 2000; Solomonidou and Stavridou, 2000) already showed that it is important to build the concept of chemical substance in conjunction with the concept of chemical reaction. Different studies addressed the learning of substance more or less directly; pedagogical progressions were proposed to link microscopic and macroscopic models (de Vos and Verdonk, 1987; Sanger, 2000), other studies dealt with formulae and substances (Roletto and Piacenza, 1994) or with formulae and microscopic representations (Sanger, 2000). Some researchers recently suggested that the concept of “chemical identity” (Ngai et al., 2014; Sevian and Talanquer, 2014) should become the central issue of chemistry syllabuses. This concept relies on the assumption that each chemical substance can be differentiated from other substances because it possesses a specific property “that makes it unique” (Enke, 2001 quoted by Sevian and Talanquer, 2014). These researchers proposed a learning progression aiming at helping “the students build connections between macroscopic experiences and submicroscopic models of matter” and promoting “the development of an interactionist view of matter” (Ngai et al., 2014). “The suggested curricular sequence designed to foster students’ conceptual understanding of chemical identity at the macroscopic and microscopic levels” (Ngai et al., 2014) mentions the different types of models to be used, the way to analyse the properties of substance, and the evolution of the concept of chemical identity from novice to advanced learner. All these aspects are extremely relevant and cannot be ignored, but they do not seem to include a specific learning of the chemical language. Our results showed that the students had a microscopic reading of the names and the chemical formulae and did not link them to the notion of chemical substance. Therefore they would probably have trouble making the right connections between macroscopic and microscopic levels. Consequently, the language of chemistry could become an impediment to the learning of chemical identity.

None of the studies cited proposes a progression linking all three aspects: macroscopic, microscopic and language.

Perspectives

Like Lavoisier, who was convinced that a systematic nomenclature should facilitate the study of chemistry, we think that teaching the language of chemistry, and specifically the scientific names and chemical formulae, should be developed from the first year of chemistry study in France. Teaching progressively with numerous examples (Nelson, 2002) and promoting the connection between representations and theoretical notions (Gilbert and Treagust, 2009) could be done with “identity cards”. Each time a new chemical species is introduced in the curriculum and thus in the classroom, an “identity card” could be created; on this card the chemical name and the common name, if any, could be distinguished, and the formula and the chemical properties recorded. Each year new information would be added to these “identity cards” as new empirical knowledge and new models are introduced to the students: microscopic knowledge (the chemical formula of the molecule in the second year of chemistry study, the ionic formulae in the third year, …) and macroscopic knowledge (pure substance versus mixture in the first year, acids and bases, metals and non-metals in the third year, …).
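As an illustration only, such an “identity card” can be thought of as a record that keeps the chemical name and the common name apart and that grows year by year with entries attached to one of the two levels. The sketch below is a hypothetical data structure, not a tool developed in this study; the class name, its fields and the example entries are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

LEVELS = ("macroscopic", "microscopic")


@dataclass
class IdentityCard:
    """Hypothetical record for one chemical species, enriched each school year."""
    chemical_name: str
    common_name: Optional[str] = None
    # Maps a year of chemistry study to the (level, fact) pairs introduced that year.
    entries: dict = field(default_factory=dict)

    def add(self, year, level, fact):
        """Record a new fact, tagged with the level of knowledge it belongs to."""
        if level not in LEVELS:
            raise ValueError(f"level must be one of {LEVELS}")
        self.entries.setdefault(year, []).append((level, fact))

    def known_by(self, year, level=None):
        """Facts introduced up to and including the given year, optionally for one level."""
        return [fact
                for y in sorted(self.entries) if y <= year
                for lvl, fact in self.entries[y]
                if level is None or lvl == level]


# Example card, following the progression suggested in the text:
# pure substance versus mixture in the first year, molecular formula in the second year.
card = IdentityCard(chemical_name="dioxygen", common_name="oxygen")
card.add(1, "macroscopic", "pure substance")
card.add(2, "microscopic", "molecule with formula O₂")

print(card.known_by(1))
print(card.known_by(2))
print(card.known_by(2, level="macroscopic"))
```

Tagging each entry with its level keeps the macroscopic and microscopic readings of the same name visible side by side, which is the connection the cards are meant to support.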

Curricula must change and stress that it is important to teach the different levels of chemistry knowledge (Kaya and Erduran, 2013) as well as the language. But we think that teacher training also has to tackle this issue. Our current project is to investigate how the chemical formula could be introduced in the second year of chemistry study in France. Following the framework of didactical engineering (Artigue, 1990) we intend to propose a teaching-learning sequence relying on historical texts used as didactical tools according to de Hosson’s view (de Hosson, 2011; de Hosson and Décamp, 2014). This sequence aims at working on both the elaboration of the concepts of molecule and of chemical equation, and the elaboration of a representation of a molecule by a chemical formula. We think that if the students work simultaneously on the language of chemistry and on the models of chemistry, they should acquire both types of knowledge better.

Appendix 1: questionnaire distributed to the students

All questions are translations of the original French used in the questionnaire

Investigating the representations used in chemistry

This questionnaire is not a test. It is anonymous. It is a survey carried out as part of a research project to find out what you understand. Do not try to give the “right” answer. For us the “right” answer is actually “what you think”!

Grade: Name of the school:

1. Can you put some names in the list below into categories? If so, name your categories and specify the names from the list that you put into each of them.

List of names: air; carbon; cyclohexane; coal; hydrogen peroxide; oil; carbon dioxide; milk; methane

……………………………………………………………

2. Indicate what each name represents for you (you can tick one or more answers per line):

Dioxygen:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
Water:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
Cyclohexane:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
Methane:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
Carbon:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
Carbon dioxide:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□

3. Indicate what each formula or symbol represents for you (you can tick one or more answers per line):

O₂:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
H₂O:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
C:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
CO₂:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
C₂H₆O:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□
Fe:	Pure substance□	Mixture□	Molecule□	Atom□	I don't know□

4. According to you, do these empirical formulae correspond to the same molecule?

H₂O and H₂O₂

yes □

no □

I don't know □

Explanation: ……………………………………………………………

O and O₂

yes □

no □

I don't know □