Teaching of experimental design skills: results from a longitudinal study

L. Szalay *a, Z. Tóth b and R. Borbás c
aFaculty of Science, Institute of Chemistry, Eötvös Loránd University, Pázmány Péter sétány 1/A, H-1117 Budapest, Hungary. E-mail: luca.szalay@ttk.elte.hu
bFaculty of Science and Technology, Institute of Chemistry, University of Debrecen, Egyetem tér 1., H-4032 Debrecen, Hungary
cSzent István Secondary School, Ajtósi Dürer sor 15., 1146 Budapest, Hungary

Received 10th November 2020 , Accepted 27th July 2021

First published on 28th July 2021


This paper reports the findings of the second and third years of a four-year longitudinal empirical study on modifying ‘step-by-step’ instructions for practical activities into tasks that require one or more steps to be designed by the students. This method had previously been applied successfully over a short period with 14–15-year-old students. However, the first year of the current longitudinal study, which investigated the approach's effectiveness (a) for younger students and (b) over a longer period of time, did not produce the expected results. Therefore, the research model was modified at the beginning of the second year, which began in September 2017 with over 800 13–14-year-old students. In each school year the students spent six lessons carrying out practical activities using worksheets we provided. The participating classes were allocated to one of three groups. Group 1 was the control group: students simply followed the step-by-step instructions. Groups 2 and 3 were experimental groups. Group 2 students followed the same instructions, but from the beginning of the second school year their worksheets explained the principles of experimental design behind the step-by-step experiments carried out. Group 3 students followed the same instructions, but one or more steps were incomplete and students were required to design these steps, as in the first year. However, from the second year onwards Group 3 students were taught the relevant principles of experimental design before they started planning and carrying out the experiments. The impact of the intervention on the students’ experimental design skills and disciplinary content knowledge was measured by structured tests. After the second school year of the project it was clear that both types of instruction (used with Group 2 and Group 3) had a significant positive effect on the students’ disciplinary content knowledge and experimental design skills.
However, this development seemed to stop in the third year of the project, when a student's achievement was influenced mostly by their school's ranking.


Content knowledge and development of process skills in science

“How do they know what goes on inside an atom?” asked Bill Bryson (2004) in his widely praised book “A Short History of Nearly Everything”. It seems he did not get a satisfactory answer from the author of his 1950s school textbook. “It was not exciting at all. … It was as if he wanted to keep the good stuff secret,” complained a frustrated Bryson. He wanted to know not just the scientists’ answers, but the processes by which the answers were reached.

Seeing science as a finished (albeit impressive) building, instead of one still under construction, can lead to loss of interest and motivation. This is not the only problem that may appear in students’ lives. From time to time everybody needs to make informed decisions about their own or other people's lives that require a judgement as to whether a scientific-looking statement might be true or false. Therefore, people need a knowledge of science, along with technology, to make rational decisions and participate in society (e.g. Sjøberg and Schreiner, 2006; Kousa et al., 2018). Everybody should understand at least the basics of the scientific process by which scientific content knowledge is built: how hypotheses and models are created, how investigations and experiments are designed, how data and evidence are gathered and how the results are interpreted, discussed and accepted or rejected by the scientific community.

Several expressions are used in the literature to describe this dichotomy. Walker (2007) differentiates between ‘scientific knowledge (hypotheses, theory and law)’ and ‘scientific process: basic and integrated process skills (among the latter: controlling variables)’. Seery et al. (2019) interpret Kirschner's (1992) definition of the ‘substantive structure of science’ as ‘the body of knowledge making up science’; as opposed to Kirschner's term ‘syntactical structure of science’, explained by Seery et al. (2019) as ‘the habits and skills of those who practice science’. Lederman and Lederman (2012) called these two components ‘content knowledge’ and ‘knowledge of inquiry methods’. One goal of science education is to produce scientifically literate people armed with both, enabling them to make intelligent decisions and differentiate between science and pseudoscience. The PISA 2018 science framework (OECD, 2019) asserts that scientific literacy requires knowledge of the concepts and theories of science, and knowledge of the common procedures and practices associated with scientific enquiry. (‘Enquiry’ and ‘inquiry’ are used interchangeably in the literature.) The terms ‘scientific literacy competencies’ or ‘scientific practices’ are also used by Cannady et al. (2019) to frame the notion of science processes as a set of core skills that supports the learning of science. They argue that the general mastery level in science practices can be conceptualized as scientific sensemaking. This sensemaking requires skills and knowledge, including an epistemological understanding of the process of knowledge generation, and it can improve with effective science instruction. For many years the primary goal of science literacy was to improve thinking skills, including the ability to engage in research activity and to plan research studies (AAAS, 1993; NRC, 1996), which are among the scientific process skills (SPS).
Therefore, science content and SPS are two primary components of science education (Tosun, 2019). SPS are also referred to as the scientific method, scientific thinking, critical thinking, scientific inquiry, scientific reasoning skills and thinking skills; these terms are used interchangeably to describe the skills students need to acquire in the 21st century (Kambeyo, 2017). SPS include skills that a scientifically literate person can use to increase the quality and standard of life (Williams et al., 2004).

Inquiry and the lost opportunities in teaching science in the lab

Nowadays it is widely accepted that it is important to integrate scientific content and science practices (NRC, 2012; NGSS, 2013; Cannady et al., 2019; OECD, 2019). Several approaches have been used to achieve this integration (Walker et al., 2016). Curricular reforms have proposed the incorporation of more inquiry-based instruction in science courses (NRC, 2005; Lyons, 2006; Chandrasena et al., 2014; Seery et al., 2019). School chemistry laboratories are the obvious places where students should be introduced to procedural knowledge and skills in science (Bybee, 2000; Walker et al., 2016). However, far too often opportunities are lost. Students conduct laboratory experiments following recipes (Akkuzu and Uyulgan, 2017). These step-by-step protocols allow students to carry out experiments without thinking about the reasoning behind the procedures (Woolfolk, 2005; Boyd-Kimball and Miller, 2018). Allen et al. (1986) stated that the cookbook nature of verification experiments can inhibit intellectual stimulation. They deprive students of opportunities to develop key skills, such as experimental design and critical thinking. Shiland (1999) and Domin (1999) stated that laboratory activities based on the confirmatory approach do not improve students’ cognitive skills, since they cannot engage them in critical and higher-order thinking.

These experiences usually fail to develop a wider range of transferable skills and lack any real-world context (Johnstone and Al-Shuaili, 2001; McDonnell et al., 2007). The development of higher-order thinking skills (Zoller and Tsaparlis, 1997) requires the integration of conceptual knowledge and procedural skills within a complex problem context (Chonkaew et al., 2016). The basic skills required by general school science curricula (Nagy et al., 2015) include data handling, identification of variables, setting research questions, formulating hypotheses, planning variables and experiments, and drawing conclusions.

Such opportunities could be provided with a different approach to practical work (Goodey and Talgar, 2016). Alongside critical thinking, secondary school students should learn about experimental design and laboratory skills/techniques. These are important goals for the undergraduate chemistry laboratory (Bruck et al., 2010; Bretz et al., 2013; Bruck and Towns, 2013), and it is reasonable to suggest they could be introduced, at the appropriate level, in schools. One approach is the use of pre-laboratory activities. Students may be asked to plan an experiment using provided information and criteria for a good experimental design. Critical thinking was elicited when this approach was used during a chemical inquiry practical in the senior year of secondary school (Brederode et al., 2020).

What, then, could be considered ‘real inquiry’ that is worth doing? According to the simplest definition, inquiry-based learning happens when students proceed in the way scientists work (PRIMAS, 2017). Other sources are more specific, each listing activities that characterise inquiry-based education (Gott and Duggan, 1998; Phillips and Germann, 2002; Kahn and O’Rourke, 2005; Zimmerman, 2005; Mayer, 2007; Rocard Report, 2007; Akkuzu and Uyulgan, 2017). Three main phases may be separated: (1) formulating research questions, (2) planning a method to get the answer and carrying out the investigation, and (3) interpreting the results (Schwab, 1962; Colburn, 2000; Nowak et al., 2013). These phases are applicable to four types of inquiry. (1) Open inquiry: students decide both on the question to investigate and the method they use to get to an answer. (2) Guided/bounded inquiry: the teacher provides the question, but the students decide on the test method used to answer it. (3) Structured inquiry: both the question and the method are provided to the students, but the outcome is unknown to them. (4) Confirmation inquiry: the question, the method and even the outcome are known in advance by the students (Tafoya et al., 1980; Lederman, 2004; Bell et al., 2005; Fay et al., 2007; Walker, 2007; Wenning, 2007; Banchi and Bell, 2008; Akkuzu and Uyulgan, 2017). There are other classifications. Fradd et al. (2001) defined six levels of inquiry according to the roles of the teacher and the students. Domin (1999) distinguishes the expository approach, in which students deduce findings from the outcome, from the discovery approach, in which students need to recognise trends or patterns that explain their results. Domin (1999) calls the two approaches in which students must develop their own procedure problem-based (requiring deduction) and inquiry-based (requiring induction). However, Duch et al. (2001) described problem-based learning as a composite of inquiry-based and context-based learning.

It is challenging to find effective ways to support students within the inquiry process, and there is a fine balance between providing too little and too much guidance (Kahn and O’Rourke, 2005). Shiland (1999) and Domin (1999) referred to laboratory activities based on the confirmatory approach as closed or low-inquiry laboratories, which Xu and Talanquer (2013) characterised as “essentially no inquiry”. According to Banchi and Bell (2008) and Chin and Chia (2006), only guided and open inquiry develop the students’ skills at a higher level.

Parameters influencing achievement in science

Snook et al. (2009), in their critical review of Hattie's (2008) book, warn of the difficulty of clearly defining variables in education. According to an OECD report (2005), the largest source of variation in student learning is attributable to differences in what students bring to school – their abilities and attitudes, and their family and community. Snook et al. (2009) quote Gray, Jesson and Jones (1986), who summarised their large-scale research in Britain by saying that around 80% of the difference in achievement can be explained by the intake, and half of the remaining difference may be explained by the schools' examination policies. This would leave 10% to be explained by other variables within the school. Harker (1995) claimed that “anywhere between 70–80% of the between schools variance is due to the student ‘mix’ and only between 20% and 30% is attributable to the schools themselves”. However, Harker (1996) found significant differences between schools in their results even after the influence of social background was controlled for (the “value added” effect). Hattie (2003) estimates that 40% of the improvement can be explained by school and teacher influences, still smaller than the influence of social background and other parameters. Snook et al. (2009) summarise several studies, stating that most of the variance comes from social variables and only a small part from the school, but that within the school the teacher is the most important variable. The study by Zeidan and Jayosi (2015) reported that the SPS level of students in rural middle schools was higher on average than that of students in urban middle schools. However, Tosun (2019) found that school location did not have a predictive effect on students’ SPS level. Tosun's (2019) study revealed that, of the demographic features examined, the most important predictors of students’ SPS level were gender, grade level and mother's education level.
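The partition of achievement variance quoted above from Gray, Jesson and Jones (1986) can be written out as a simple worked calculation (our arithmetic rendering of the quoted percentages, not a formula from that source):

```latex
\underbrace{80\%}_{\text{intake}}
\;+\; \underbrace{\tfrac{1}{2}\,(100\% - 80\%) = 10\%}_{\text{examination policies}}
\;+\; \underbrace{10\%}_{\text{other within-school variables}}
\;=\; 100\%
```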
Onukwo (1995) also found a significant gender difference in the levels of SPS, whereas others did not find any significant difference in students’ acquisition of SPS with respect to gender (Ofoegbu, 1984; Walters and Soyibo, 2001; Böyük et al., 2011; Güden and Timur, 2016). There has been a long, ongoing debate about gender differences in science achievement, which, according to Becker (1989), vary with the subject matter under study. Jäncke (2018) summarised and interpreted the current status of sex/gender differences in terms of brain anatomy, brain function, cognition and behaviour in his review paper. He concluded that most of these differences are not large enough to support the assumption of sexual dimorphism. However, traditions, laws and expectations obviously can influence how boys and girls perform in school. Eccles and Davis-Kean (2005) reported that parents’ education predicts children's educational outcomes. A significant positive relationship between parents’ education level and their children's academic achievements was found by Asad Khan, Iqbal and Tasneem (2015) in India. The studies by Karar and Yenice (2012) and Germann (1994) also reported that the SPS of students differed according to their parents’ education level. Parents’ educational backgrounds enhanced the rate of acquisition of students’ SPS (Martina, 2007). According to other studies (Böyük et al., 2011; Ocak and Tümer, 2014; Tosun, 2019), a significant difference was found between the SPS levels of students whose mothers are high school or university graduates and those whose mothers are primary school graduates. Interestingly, Kambeyo (2017) states that although fathers had no significant influence on students’ scientific inquiry skills in Namibia, students whose mothers did not finish primary education outperformed those whose mothers reached secondary and higher education levels. He suggests a possible explanation: in Namibia, children from low-income families tend to work harder than those from affluent families, because they want to escape poverty and live a better life.

Factors and conditions hindering the application of inquiry methods

There are reasons why open and guided inquiry instruction is rarely implemented in secondary classrooms (Roehrig and Luft, 2004). The primary reason cannot be that teachers do not know about these methods, because even those trained to use inquiry strategies often use teacher-centred approaches in their school classrooms (Simmons et al., 1999). Many publications identify problems that inhibit the successful application of inquiry methods. These include time and resource constraints, large group sizes, a broad curriculum, and lack of student motivation and background knowledge. Also, some teachers are uncomfortable managing inquiry situations (Lagowski, 1990; Edelson et al., 1999; Lawson, 2000; Walker, 2007; PRIMAS, 2011; Akkuzu and Uyulgan, 2017; George-Williams et al., 2018; Hennah, 2019). Several authors call for suitable teaching materials that would efficiently help teachers to overcome those hurdles (e.g. Powell and Anderson, 2002; Markic and Abels, 2014; Benny and Blonder, 2018; Kousa et al., 2018). Perhaps, however, assessment has the greatest influence on the learning experience. Therefore, assessment should include thought-provoking items to evaluate higher-level skills, not just knowledge recall (Kahn and O’Rourke, 2005; Walker, 2007). Chemistry teachers need support to develop these (Schafer and Yezierski, 2020).

Modifying step-by-step recipes to inquiry activities

One approach that has proved beneficial at university level is to modify traditional expository/step-by-step/‘cookbook’ laboratory procedures into inquiry activities (Walker et al., 2016; Boyd-Kimball and Miller, 2018). The latter authors redesigned an advanced biochemistry lab to transition students from step-by-step laboratory procedures to open inquiry via the use of guided inquiries. Goodey and Talgar (2016) found that a guided inquiry course (in which the “cookbook modules” of the control group followed the same sequence of related experiments as the modules in the inquiry set) improved the students’ experimental design ability. Designing experiments is part of a scientific investigation (Zimmerman, 2005). According to Tosun (2019), designing an experiment, including changing and controlling variables, is part of experimental scientific process skills. Cannady et al. (2019) also say that designing investigations includes the control of variables. Interventions focused on students’ manipulation of materials can improve their ability to design investigations and draw causal inferences (Triona and Klahr, 2010). Thinking ahead is also critical for a well-planned design (Chonkaew et al., 2016).

Previous results

Two studies (Szalay and Tóth, 2016; Szalay et al., 2020) provided preliminary results for the research described in this paper. The first, a brief research project, changed ‘step-by-step’ instructions into practical activities that required some stages to be designed. This was straightforward, requiring limited time and effort. The students were 14–15 years old. A control group followed step-by-step recipes. An experimental group did the same experiments, but had to design some steps. The result of the intervention was that the experimental design skills (EDS) of the experimental group developed significantly more than those of the control group (Szalay and Tóth, 2016). A similar idea was also used later at university level by Rodriguez and Towns (2018).

The second was a report on the first year of a four-year longitudinal research project that started in September 2016. The approach was similar to the first study, but with more than 900 students aged 12–13 involved. They were divided into three groups. Two groups were defined as in the first study: Group 1 was the control group following step-by-step recipes, and Group 3 did the same experiments but had to design some of the steps. There was also a third group (Group 2) that followed step-by-step recipes, but whose students were additionally given theoretical experimental design tasks. In the first year of the longitudinal study these three groups used the worksheets in six 45-minute lessons. However, for Group 3 no significant effect of the intervention was detected by the structured tests taken at the end of the first school year. It was thought that most of the students were probably still in Piaget's concrete operational stage (Cole and Cole, 2006) and that this explained the lack of effect. Padilla (1990) found that experimental skills are closely related to the formal thinking abilities described by Piaget. Nor, perhaps, was the cognitive load placed on the Group 3 students managed properly (Sweller, 1988). It is also possible that some Group 3 teachers provided the experimental design steps when time was short.

Research questions (RQ)

Since the method did not appear to work for younger students and/or over the longer term, the research model was modified. From the beginning of the second school year students were taught the relevant principles of experimental design, either after doing step-by-step experiments (experimental Group 2) or before doing the same experiments partly designed by themselves (experimental Group 3). Group 1 was the control group, with students following step-by-step instructions without any explanation of the principles of experimental design (Fig. 1).
Fig. 1 Research model applied in the second and the third school year of the project.

In the remaining three school years of the project, answers to the following questions were sought.

RQ1: Was there any significant change in the students’ ability to design experiments (experimental design skills, EDS) in either of the experimental groups (Group 2 and Group 3), compared to the control group, as a result of the intervention after the research model was changed?

RQ2: Did students in the experimental groups achieve significantly different scores on questions that assess disciplinary content knowledge (DCK) than control group students, because of these interventions?

RQ3: Was there a difference in EDS between students of Group 2 and Group 3?

Research method and research design

Eighteen of the twenty-four in-service chemistry teachers and the five university chemistry lecturers remained members of the research group. Twelve chemistry teachers worked in the project for one or more years only, since they took maternity leave, retired, left the participating school, or changed teaching groups. Five pre-service chemistry teacher students participated for one or two years only. As in the first year of the project, six student worksheets and teacher guides were written for each of the remaining three school years. At the start of the study (September 2016), 883 participating seventh grade students took the first test (Test 0). At the end of each of the first three school years they also took additional tests (Test 1 at the end of the first school year, taken by 853 students; Test 2 at the end of the second school year, by 812 students; and Test 3 at the end of the third school year, by 722 students). Frustratingly, the fourth year of the project could not be finished as planned in June 2020, because of disruption caused by the Covid-19 pandemic. However, twenty-two teachers agreed to get as many students as possible to complete the remaining student sheets and Test 4 from September 2020, enabling the final results to be collected and analysed in the 2020/2021 school year. Covid-19 has meant that some schools have been unable to complete the work. Therefore, only the results of the second and third years of the project are published here. The research model, modified at the beginning of the second year of the project, is summarised in Fig. 1.

For each group, the intervention took seven chemistry lessons in both the second (Grade 8) and third (Grade 9) school years. Teachers chose when the six lessons using the worksheets provided took place. After this, students took Test 2 in Grade 8 and Test 3 in Grade 9.
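The paper states that the effect of the intervention was measured by structured tests taken by the three groups, but this section does not name the statistical procedure. Purely as an illustrative sketch of one common way to compare mean test scores between groups, the following computes a one-way ANOVA F statistic with plain Python; the group names follow the study, but all scores are hypothetical:

```python
# Illustrative only: a hand-rolled one-way ANOVA (stdlib, no SciPy).
# The score lists below are invented, not data from the study.

def one_way_anova(groups):
    """Return the F statistic for a list of groups of numeric scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    ms_between = ss_between / (k - 1)      # df_between = k - 1
    ms_within = ss_within / (n_total - k)  # df_within = N - k
    return ms_between / ms_within

# Hypothetical test-score gains for Group 1 (control), Group 2 and Group 3
f = one_way_anova([[1, 2, 3], [3, 4, 5], [5, 6, 7]])
print(f)  # F statistic for the three hypothetical groups
```

In practice the F statistic would be referred to an F distribution with (k − 1, N − k) degrees of freedom to obtain a p value, e.g. via `scipy.stats.f_oneway`.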


The students came from thirty-one classes in eighteen Hungarian secondary schools. They were taught by teachers in the research group. All teachers were voluntary participants. Participating students had to attend schools that teach chemistry from Grade 7 to Grade 10. In our publication describing the first year of the project (Szalay et al., 2020), it was explained why the sample was not representative of the whole school population; rather, it was representative of higher-achieving students. While unfortunate, there is no practical way to follow for four years the development of the knowledge and skills of students who change school at the age of 14.

When the project started in autumn 2016, 920 students were involved. All were 7th grade 12–13-year-olds. They remained in the project in the second and third years. However, any student who missed a test was no longer considered part of the sample.

The classes were assigned randomly to the three groups at the beginning of the project (Szalay et al., 2020). In the second and third years, Group 1 continued to carry out step-by-step experiments only. However, Groups 2 and 3 (the experimental groups) were treated differently from their first year. From the beginning of the second year, Group 2 continued to carry out the same step-by-step experiments as Group 1, but after doing the experiments the relevant principles of experimental design, written on their student sheets, were discussed with them by their teachers. Group 3 carried out the same experiments as Group 1, but with one or more steps missing. Students were required to design the missing steps and complete the experiment (see Tables 1 and 2). Note: student worksheets and teacher guides are numbered according to the school years, with 1–6 used in the first school year, 7–12 in the second and 13–18 in the third. Unlike in year 1, the relevant principles of experimental design were given in Group 3's student sheets and discussed with the students before they designed the missing steps.

Table 1 Topics of the student worksheets and teacher guides used in the second school year, 2017/2018 (No. 7–12), and what the students learnt about the experimental design
No. Topic Experiments that Group 1 and Group 2 students had to do following step-by-step instructions, but Group 3 students had to plan before doing the experiment What did the experimental groups learn about experimental design? (Group 2: AFTER doing the step-by-step experiments; Group 3: BEFORE designing and doing the same experiments)
7. Acids, bases, indicators, neutralisation, chemistry in the household Students do step-by-step experiments to observe the colour of phenolphthalein in acidic/alkaline solutions and how it changes during neutralisation. Students are given two glasses. There is the same amount of distilled water in both glasses, but one drop of white wine vinegar in one of them and two drops of the same vinegar in the other glass. The Group 3 students are asked to design an experiment to determine which of the glasses contains two drops of vinegar. 1. Fair testing. The dependent variable (quantity of sodium hydroxide solution added to the contents of both glasses until the colour of the phenolphthalein changes) depends only on the independent variable (quantity of vinegar in the glasses); all other variables are held constant (e.g. concentration of the sodium hydroxide solution, the Pasteur pipette used).
2. Quantitative analysis (the basic idea of acid–base titration).
(3. The possible causes of measurement errors.)
8. Reactivity series of metals, reactions of metals Students investigate the reactivity of magnesium, zinc and copper with cold and hot water, dilute hydrochloric acid and solutions of one another's salts by doing step-by-step experiments. They are given iron nails. Groups 1 and 2 get instructions to determine the place of the iron in the metal reactivity series. Group 3 do the same, except they are not given instructions. They must plan what to do. 1. Fair testing.
2. Choosing equipment and materials: considering the possible alternative solutions, depending what is available.
3. Algorithmic thinking: deciding whether the order of the steps of the experiment is important or not (it is not).
9. Reactions of iron, natural waters, quality of drinking water Students do step-by-step experiments to observe that iron(III) ions form a red compound with thiocyanate ions and that the intensity of the colour depends on the concentration of the iron(III) ions (all other variables kept constant). Group 3 students have to decide whether or not a water sample contains too high a concentration of iron(III) ions to be appropriate for drinking. They are given the official concentration limit issued by the authorities. 1. Fair testing.
2. Qualitative analysis: sensitive and characteristic reaction of iron(III) ions.
3. Quantitative analysis: the properties (e.g. the colour) of a solution depend on the concentration of the solute, colorimetry, calibration series.
10. Hardness of water, water softeners, precipitation reactions Students read about problems caused by hard water. They do step-by-step experiments to find out which cations cause the hardness of the water, and how to show their effects by measuring the height of soap foam after shaking. Using a table that shows which anions form precipitates with the cations, Group 3 students must work out which compounds could be used as water softener, and show this by experiment. 1. Fair testing.
2. Using data: interpreting a table about the solubility of salts.
3. Qualitative analysis: precipitation reactions of calcium ions and magnesium ions.
4. Semi-quantitative measurement: testing the hardness of the “water samples” (solutions) before and after adding the softeners by measuring (with a ruler) the height of the soap foam formed after shaking the solutions for the same time with the same intensity.
11. Reactions of calcium carbonate with acids, calcination of limestone, lime slaking Students do step-by-step experiments to model calcination and lime slaking by holding a piece of limestone (and later a piece of eggshell) in the flame and then dropping it in distilled water containing phenolphthalein. They also model the effect of the limescale remover on limescale. Group 3 students are asked to design and do an experiment modelling how the waterbed made of limestone can neutralise acidic rain, whereas sand does not have this effect. 1. Fair testing.
2. Model experiments: modelling chemical reactions taking place in the industry, in the household and in the environment.
3. Choosing equipment and materials: what could replace the water of the lake, the acid rain, the limestone or the sand on the bottom of the lake? How can the changes of pH be followed?
4. Control experiment: how does the colour of the red cabbage indicator change without limestone/sand after adding acid to it?
5. Reference material: comparing the effect of limestone and sand.
12. Nutrients (fat, carbohydrates, proteins, nutritional components of milk) Students do step-by-step experiments to test for the oil in an oil-and-water emulsion, for reducing sugars using Fehling's reaction, and for proteins using the biuret test. Group 3 students must design and carry out experiments to test whether milk contains all three types of nutrients. 1. Fair testing.
2. Choosing equipment and materials, and what is needed to test the milk sample for each nutrient
3. Reference materials: comparing the extraction of oil from oil-and-water emulsion and milk; comparing the effect of Fehling reagent on glucose solution and milk; and comparing the effect of the biuret reagents on egg white solution and milk.
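Worksheet 9 in Table 1 introduces colorimetry with a calibration series: the colour intensity of standards of known iron(III) concentration is compared with that of the unknown water sample. As a hedged sketch of the underlying idea (all numbers are invented for illustration, not data from the study), a straight calibration line can be fitted by least squares and the unknown read off:

```python
# Illustrative calibration-series (colorimetry) sketch; invented numbers.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Calibration series: iron(III) concentration (mg/L) vs. measured colour signal
conc = [0.0, 1.0, 2.0, 3.0]
signal = [0.0, 0.2, 0.4, 0.6]
slope, intercept = fit_line(conc, signal)

# Unknown water sample whose colour signal is 0.3: invert the calibration line
estimated = (0.3 - intercept) / slope
print(estimated)  # estimated concentration of the unknown, in mg/L
```

Comparing the estimate with the official concentration limit then answers the worksheet's question of whether the sample is appropriate for drinking.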

Table 2 Topics of the student worksheets and teacher guides used in the third school year, 2018/2019 (No. 13–18), and what the students learnt about the experimental design
No. Topic Experiments that Group 1 and Group 2 students had to do following step-by-step instructions, but Group 3 students had to plan before doing the experiment What did the experimental groups learn about experimental design? (Group 2: AFTER doing the step-by-step experiments; Group 3: BEFORE designing and doing the same experiments)
13. Structure of atoms, excitation of electrons, flame tests Students do a step-by-step experiment to observe the flame test of a sodium chloride solution. The relationship of the excitation energy of the atoms of certain elements and the colour of their flame tests is explained. All students are given two solutions, each containing a metal salt. Groups 1 and 2 students do flame tests following instructions and explain the difference in excitation energies with that of sodium. Group 3 students do the same, except that they must design an experiment to identify the metal ions and decide whether their excitation energies are higher or lower than that of sodium. 1. Qualitative analysis: flame tests.
2. Using data: interpreting the tables containing the colours of regions of the visible spectrum, their wavelength range, and the relationship between the wavelength and energy of electromagnetic radiation.
3. Understanding the steps of scientific investigations: finding a research question; collecting previous theoretical and practical knowledge; designing the experiments; collecting and evaluating data; drawing conclusion; answering the research question.
14. Structure of matter, bonding energy, amount of matter Students are given data to calculate the volume of the water in the oceans. They are asked to determine the volume of a drop of water, and from that to calculate how many drops of water are in the oceans. Group 3 students design experiments to determine the volume of one drop of water, one drop of ethanol, and one drop of a 50% by volume ethanol–water mixture as accurately as they are able. The results must be compared and interpreted. 1. Fair testing.
2. Choosing equipment: considering the possible alternative solutions to measure the volume of liquids.
3. Error of measurement: ways of reducing measurement errors.
4. Using data: calculations of volume and mass of a drop of liquid, the amount of matter and the number of particles in it, comparing the surface tension of water, ethanol and their mixture; interpretation of the relationship of these amounts.
15. Heat of dissolution, exothermic and endothermic processes Students do step-by-step experiments to investigate exothermic and endothermic dissolutions, modelling how the self-heating and self-cooling cups work. They read an advertisement for sodium acetate heat pad and must decide if it would work. Group 3 students are asked to design a model experiment to answer the question. 1. Fair testing.
2. Model experiments: differentiation of the important and unimportant aspects while modelling how products based on exothermic and endothermic dissolution work.
3. Drawing conclusions from the results of the model experiment to differentiate between science and pseudoscience.
16. Rate of chemical reactions and factors influencing the reaction rate Students watch the ‘elephant toothpaste’ demonstration to learn about influencing the rate of reaction by a catalyst (the decomposition of hydrogen peroxide catalysed by potassium iodide). They do step-by-step experiments to find out how the concentrations of sodium thiosulfate and hydrochloric acid solutions influence their reaction rates. Group 3 students must design an experiment to decide which of three hydrochloric acid solutions, of differing concentrations, the unknown solution has. Finally, the students investigate the effect of heat on the rate of reaction. 1. Fair testing: identification of independent variable (concentration of hydrochloric acid) and dependent variable (time needed to see the formation of colloidal sulphur) and controlling variables (concentration of thiosulfate solution, temperature, volumes of solutions etc.).
2. Semiquantitative analysis: detemination of the range of concentration of hydrochloric acid by measuring the time needed to see the formation of colloidal sulphur.
17. Strong and weak acids and bases, hydrolyses of salts, natural and artificial indicators Students fill in a table about the colours of various indicators in acidic, neutral, and alkaline solutions using the information collected as homework. They do step-by-step experiments to differentiate between a strong and a weak acid, and a strong and a weak base by red cabbage indicator. They investigate the pH of salt solutions using universal indicator. Group 3 student must design an experiment to identify potassium hydroxide, sodium carbonate and nitric acid solutions of the same concentration. They must also design what solutions and indicators should be used to paint a Hungarian flag. 1. Fair testing: controlling variables (concentration and volume of solutions).
2. Qualitative analysis: identification of solutions by their pH.
3. Choosing equipment and materials: pair the acidic/neutral/alkaline solutions with certain indicators to get the colours of the Hungarian flag (red, white/colourless, green).
18. Redox reactions, oxidation as giving off electron(s), reduction as gaining electron(s). As a homework, students must balance equations describing various reactions when hydrogen peroxide is either an oxidative or a reductive agent. Group 3 students are asked to identify a pattern: when oxygen gas is among the products, then the hydrogen peroxide reduces its reaction partner. Also, they must design an experiment to prove this, using the materials and equipment available. 1. Practicing the steps of the scientific investigations: working out patterns by observation, asking a research question, creating a hypothesis, designing experiments to test ideas, collecting data, interpreting the results and drawing conclusions.
2. Choosing equipment and materials: what is needed to test for the presence of oxygen gas?
3. Qualitative analysis: proving the production of oxygen gas by a glowing splint.

Class sizes varied between 13 and 35 students in the second year and 14 and 35 in the third year of the project, reflecting typical class sizes in Hungarian schools. The method used to produce random samples is described in the previous study (Szalay et al., 2020). Some teachers participated with only one class, whereas others with two classes. If a teacher had two classes of students participating in the research and one class had been chosen randomly to be in Group 1, the other class would be in Group 2 or Group 3. This choice was also random.

Teachers had written permission of their school principals to participate. Also, as there were no institutional ethics committees or boards in the participating schools, only those students were involved whose parents gave their written permission for their children to participate in the research. Teachers told students that test results would not count in their school chemistry assessment, but they would be participating in work that aims to improve the teaching of chemistry.

Student worksheets

In each of the four years six student worksheets and teacher guides were written for each group, describing practical activities, each planned for a 45-minute lesson (see the English translation of the 7th Student worksheet and teacher notes in Supplement 1, ESI as an example). 18 student sheets are available in English, http://ttomc.elte.hu/publications/90). They were piloted by the groups with students working in small teams.

Topics were cross-referenced to the curriculum, together with the experimental design tasks given to the Group 3 students on the student worksheets (see Table 1 for the second year and Table 2 for the third year). As in the first year, the student worksheets had an introduction intended to stimulate interest and curiosity. As with other studies (e.g.Hennah, 2019), the activities had to fit curriculum timetabling.

The student worksheets provided important DCK that was needed to solve the experimental design task and develop EDS. As in the first year of the project, each experimental design task required problem-solving skills and related to the topic of the lesson. These were guided inquiries. Group 1 and Group 2 students followed the same sequence of experiments. Unlike Group 3 students, they were provided with step-by-step instructions. This was similar to the investigation of the effect of the guided inquiry that Goodey and Talgar (2016) used in a university biochemistry laboratory course aiming to improve experimental design ability.

The difference between the first and the second/third year treatment of the Group 3 was that in the first year the students were not given help on their student sheets to design the experiments. However, in the second year and subsequently to reduce the Group 3 students’ cognitive load (Behmke and Atwood, 2013; van Merrienboer et al., 2003; Paas et al., 2003; Sweller, 2004), student worksheets were scaffolded. Group 3 students were taught the relevant principles of the experimental design before they began planning and carrying out the experiments. This is similar to advice on experiment design given in bounded inquiry modules used by Goodey and Talgar (2016) at undergraduate level. Group 3 students worked collaboratively to propose, design, and troubleshoot their own experiments, rather like the undergraduates in Boyd-Kimball and Miller's research (2018). Each series of experiments in our project had required students to gather their previous theoretical and practical knowledge, answer questions, and do step-by-step experiments. This is in accord with Crujeiras-Pérez and Jiménez-Aleixandre's (2017) suggestion that students need previous knowledge relevant to designing an experiment, as well as an opportunity to reflect on the design adopted.

Our worksheets also have similarities with the unfinished recipes prepared by Seery et al. (2019) that scaffold EDS and echoing Kirschner's (1992) suggested middle ground between expository and inquiry. As in the first year of the project, our experimental design tasks were built on the following components of the EDS defined and assessed by Csíkos et al. (2016): identification and control of variables (including the principle of ‘fair testing’, in other words ‘how to vary one thing at a time’ or ‘other things/variables held constant’); choosing equipment and materials and determination of the correct order of the steps of the experiment. The two types of analyses, qualitative and quantitative were covered. The concept of systematic and random error, introduced in the first year, were reinforced. According to Metz (1998) separating random error from true effects is not a skill that eighth grade students spontaneously engage in without scaffolding, but emerging understanding is evident (Masnick and Klahr, 2003). Since developing and using models (Lehrer and Schauble, 2000) is an important scientific skill, designing experiments modelling certain environmental, everyday, industrial or laboratory processes was also included among the tasks.

In their meta-analysis Furtak et al.’ (2012) concluded that evidence from some studies suggests that teacher-led inquiry has a greater effect on student learning than student-led inquiry. So, from the beginning of the second school year Group 2 students continued to do the same step-by-step experiments as Group 1. However, their worksheets explained the design of the step-by-step experiments. Group 2 students did paper-based experimental design tasks in their first year, but not in their second. This was done to exclude the effect of practising for the test. The approaches taken with Groups 2 and 3 resemble the use of postlab and prelab questions asked by Rodriguez and Towns (2018) to engage university undergraduate students in critical thinking.

Monetary and time expenses remained a consideration when planning activities and assessment. If these constraints were ignored, it is unlikely that the tried and tested activities would be suitable and widely used once this research project finished (Boesdorfer and Livermore, 2018). Detailed teacher guides were also provided for each student worksheet (see the links to their Hungarian versions in Supplement 1, ESI).


The effects of the different types of instructional interventions were shown by randomized control trials (Szalay et al., 2020). It is insufficient to ask the students about their perceived improvement, as there appears to be a disconnection in the students’ perceived and actual skill level, even at undergraduate level (Boyd-Kimball and Miller, 2018). There is general agreement that assessments and tasks should go beyond content knowledge, and require more than factual recall (Cooper, 2013; Reed and Holme, 2014; Rodriguez and Towns, 2018; Underwood et al., 2018), which is unavoidable when inquiry skills are assessed. The complexity of measuring disciplinary research skills development was given as one reason that assessment of these skills is largely ignored, even in the higher education (Rueckert, 2008; Feldon et al., 2010; Timmerman et al., 2010; Harsh et al., 2017). Harsh (2016) identified Experimental Problem Solving as a core element of scientific literacy (AAAS, 2011). It includes the ability to ‘‘define a problem clearly, develop testable hypotheses, design and execute experiments, analyze data, and draw appropriate conclusions’’ (American Chemical Society [ACS], 2008). For measuring EDS, Csíkos et al. (2016) developed independent tasks concerned with identifying variables, choosing equipment, and determining the experimental sequence. To measure the development of EDS in the present project, problem solving tasks were used that required the application of the components of EDS defined by Csíkos et al. (2016).

The tasks needed to be different in each test for the reasons previously described (Szalay et al., 2020). Meanwhile a study published by Cannady et al. (2019) reinforced the concern that using the same instrument in a pre/post setting could invite repeated testing effects wherein students can remember correct answers. Tests 2 and 3 contained DCK and EDS tasks that could be undertaken after students had completed the tasks on the student worksheets. Only the DCK given in the National Curriculum of Hungary (2012) could be assessed on the tests Szalay et al., 2020).

EDS tasks had to be in contexts relevant to their previously gained knowledge and understanding. However, the main goal of the research remained to develop EDS that can be applied under different circumstances (Szalay et al., 2020). This also accords with the approach of Cannady et al. (2019), that tasks must integrate content with which students are familiar, and focus on the ability to apply the practices. Other authors support these views (Zimmerman, 2000 and 2007; Tosun, 2019). Cannady et al. (2019) emphasize the importance of including and supporting access to the necessary content knowledge within the assessment. This approach is also used in PISA assessments of scientific literacy competencies (OECD, 2017). It was also considered while developing the EDS tasks of the present project.

Test questions were structured according to the categories of the Cognitive Process Dimension of the Revised Bloom Taxonomy (Bloom et al., 1956; Krathwohl, 2002) as interpreted in previous publication (Szalay et al., 2020). Therefore, the different levels of the DCK are represented on the tests, and the EDS tasks require the application of higher order cognitive skills. Each test consists of eighteen compulsory items, each worth 1 mark. Nine were to assess EDS. The other nine were to assess DCK, with three items each for recall, understanding and application. By analysing the results, the effects of the different types of treatments on EDS and DCK could be investigated.

As for Tests 0 and 1, students had 40 minutes to complete Test 2 (English translation, Supplement 2, ESI) and 40 minutes for the Test 3 (English translation, Supplement 3, ESI). Students were coded so that teachers knew their identities and gender, but the researchers only received the anonymous data coded for statistical analysis. These codes were used throughout the project.

The following tasks were used on Tests 2 and 3 to compare the development of students’ EDS across the three groups.

Test 2; Task 2.b:

There is 6% vinegar and 20% vinegar in two identical looking plastic bottles. The labels have fallen off from both bottles. You must decide, by doing an experiment, which bottle contains the more concentrated vinegar. (Smelling does not give a satisfactory answer and you must not taste the vinegar.) The following materials and equipment are available: 2 empty glasses (you can only use both of them once), 2 spoons, 4 droppers (with a scale on them showing the volume), aqueous solution of an alkaline caustic drain cleaner, red cabbage juice. (Vinegar is acidic and red cabbage juice indicates the changes of pH by its change of colour.)

Question 1: What materials would you put into one glass and what into the other glass?

Question 2: In what order would you put the materials into one glass and into the other glass?

Question 3: How much would you put of the different materials into one glass and how much into the other glass?

Question 4: What different experiences would you expect in the case of each glass?

Question 5: Based on your experiences, how could you decide which bottle contains the more concentrated vinegar?

Test 2; Task 7:

There are three test tubes. One contains silver ions (Ag+), another aluminium ions (Al3+), and the third zinc ions (Zn2+). All are colourless aqueous solutions. (There is about 1 cm3 dilute solution in each.)

There are two labelled bottles beside them, one contains ammonia (NH3) solution and the other sodium hydroxide (NaOH) solution. The table below shows what would be seen if a little (a few drops) or much more (several cm3) ammonia solution or sodium hydroxide solution was added to the test tubes containing the solutions of the different ions.

ion + little NH3 solution + much NH3 solution
Ag+ brown precipitate brown precipitate, dissolves
Al3+ white precipitate white precipitate
Zn2+ white precipitate white precipitate, dissolves
ion + little NaOH solution + much NaOH solution
Ag+ brown precipitate brown precipitate
Al3+ white precipitate white precipitate, dissolves
Zn2+ white precipitate white precipitate, dissolves

(a) What is the fewest number of tests needed to decide which solution of metal ions is in each of the three test tubes? Explain your answer.

(b) Only one of the solutions (NH3 or NaOH) can be used to determine which ion is in which test tube. Which one? Explain your answer.

(c) How should your chosen solution (NH3 or NaOH) be added to the solutions in the test tubes?

(d) What would you see when your chosen solution (NH3 or NaOH) is added to the test tube containing the solution of aluminium ions (Al3+)?

Test 3; Task 3:

The equation of the reaction that takes place between the bromine water and the methanoic acid:

Br2 + HCOOH = 2HBr + CO2

Bromine water is yellow. The other reactant and products are colourless. Your task is to show that the rate of reaction depends on the concentrations of the starting materials. The following are available: methanoic acid solution (in a glass bottle), bromine water (in a glass bottle), distilled water (in a flask), 4 beakers (50 cm3, with a scale on their side to show the volume), 4 eyedroppers (without a scale), 4 Pasteur pipette (with a scale on their side to show the volume), 4 measuring cylinder (10 cm3, with a scale on their side to show the volume), stopwatch.

(a) What equipment and how many pieces of them are needed among the above listed ones for the experiment?

(b) Which materials and how much of them would you put into the chosen equipment at the time of the preparation of the experiment?

(c) How would you start the experiment?

(d) What experiences would you expect by the end of the experiment?

(e) Based on your experiences, how could you decide, how the rate of reaction depends on the concentrations of the starting materials?

Test 3; Task 8:

You want to identify the following three water samples, with the help of acid–base indicators:

(A) Rainwater that is only slightly acidic because of the carbon dioxide dissolved in it, and its pH is 5.6.

(B) Acid rainwater that was collected in a polluted area and its pH is 2.8.

(C) A water sample collected from the Lake Balaton that has got a pH 8.0, because the waterbed is made of alkaline rocks.

You can do no more than two experiments, and you can choose indicators from the following table. (Between the two pH values belonging to the two colours, the transition colour of the indicator could be seen that changes depending on the pH.)

Name of the indicator One colour of the indicator
phenolphthalein colourless, if pH ≤ 8.2
bromothymol blue yellow, if pH ≤ 6.0
crystal violet green, if pH ≤ 0.8
litmus red, if pH ≤ 5.0
methyl orange red, if pH ≤ 3.1
Name of the indicator The other colour of the indicator
phenolphthalein purple, if pH ≥ 10.0
bromothymol blue blue, if pH ≥ 7.6
crystal violet blue, if pH ≥ 2.6
litmus blue, if pH ≥ 8.0
methyl orange orange, if pH ≥ 4.4

(a) Which indicator would you use for Experiment 1? What colour change would you expect for each of the water samples?

(b) Which indicator would you use for the Experiment 2? What colour change would you expect for each of the water samples?

(c) Would false result be obtained if identical amounts of the water samples, but different amounts of indicators were used in the experiments that you have described? Explain your answer.

(d) Would you get a different result in the experiments that you described, if we replaced (as a model) the water sample “B” by 0.1 mol dm−3 acetic acid that has got a pH 2.7? Why?


Nowadays it is accepted that it is possible to measure scientific practices in content through relatively short pencil paper static instruments (Cannady et al., 2019). These are generally easier to administer and can be used in a broader range of learning environments. The assessment criteria need to reflect closely the nature of the enquiry (inquiry), as well as the characteristics of the specific method of assessment (Kahn and O’Rourke, 2005). However, there is a fine balance between test validity and measuring the so called “deep learning, which is the relationship between the content” (Hattie, 2015). In terms of validity, the more similar the assessment tasks are to the ones used for the trials of the intervention methods under investigation are the better. But the results must show whether the students can apply the EDS at any time they need to (e.g. when they have to judge whether a scientific-looking statement is based on a well-designed scientific research or not).

Similar to Test 1, the first versions of Tests 2 and 3, including marking instructions, were made and corrected following suggestions from the university educators in the research group. Participating teachers had not seen the test before piloting the six student worksheets of the respective school years. This was to avoid tasks on Tests 2 and 3 influencing the pilot. However, to ensure test validity, Test 2 was tried with three classes/groups (N1 = 27, N2 = 12, N3 = 13, altogether 52) of 13–14 years old students not participating in the research in March 2018. For the same reason (to ensure test validity), Test 3 was tried with three classes/groups (N1 = 26, N2 = 15, N3 = 11, altogether 52) of 14–15 years old students not participating in the research in October 2018. Both tests and their marking instructions were further revised in response to results of the trial before they were filled in by the students participating in the sample.

Participating teachers marked the students’ tests, recording the marks in an Excel spreadsheet as instructed (see Supplement 2 and 3, ESI). As there was an element of subjectivity in the grading protocol, the research group tried to standardize the grading to ensure that the application of the rubric is the same for each test, as done by Goodey and Talgar (2016). An experienced chemistry teacher reviewed the teachers’ marking and suggested modifications to the marking instructions. After discussions within the team, alterations were made. Based on these, the teachers’ marks were changed to ensure that a unified marking process, free from individual teachers’ decisions was used.

Several factors can threaten the validity of inferences made of any quasi-experimental work in chemistry education research (e.g.Shadish et al., 2002; Mack et al. 2019). Comparing outcomes of the tests written by Groups 2 and 3 with those of Group 1 (control group) was done to eliminate threats due to maturation (Shadish et al., 2002). Students’ natures and circumstances vary widely. A large cohort of students is important and, therefore, our research sample consisted of 920 students, from several schools in various parts of the country. The classes were randomly assigned to either the control group or one of the experimental groups. The content of Tests 1, 2 and 3, used to measure the outcome, differed. This was unsurprisingly unavoidable (as explained in the section written about the tests). However, construct validity (Cronbach and Meehl, 1955) was achieved, at least in part, by using the same structure for all tests, i.e. the same number of items related to the levels of the Revised Bloom Taxonomy. Statistical methods (see following section) have also been used to counterbalance the covariations.

Statistical methods

The number of students (N) in each group completing the tests are shown in Table 3. The sample was smaller after each test because some students did not complete it. Following the incompletion of a test, that student was excluded from the analysis and future test.
Table 3 The number of students (N) in each group completing the tests
Test Group 1 Group 2 Group 3 Σ
Test 0 298 291 294 883
Test 1 291 283 279 853
Test 2 272 258 282 812
Test 3 240 225 257 722

The following data were collected and analysed statistically:

• Student total scores for Tests 0, 1, 2 and 3 (T0 to T3).

• Student scores (marks) for EDS tasks (Tests 0, 1, 2 and 3).

• Student scores (marks) for DCK tasks (Tests 0, 1, 2 and 3).

• Gender of the student.

• School ranking. The student's school ranking amongst Hungarian secondary schools, according to the website “legjobbiskola.hu”. The participating schools were grouped into high, medium, and low ranking categories and a categorical variable was used according to these three levels (Appendix, Table 15). This allowed a statistical assessment of the impact of participating schools “quality” on the development of the students’ knowledge and skills.

• Mother's education. Two categories were formed depending on whether or not the student's mother had a degree in higher education. This categorical variable was intended to characterise the student's socioeconomic status.

• Note: Each test contained questions concerning the students’ science/chemistry grade and his or her attitude toward science or chemistry and the scientific experiments. However, the analysis of the students’ answers of the attitude questions is not included in the present study.

The results were analysed statistically. It was assumed that apart from the three types of instructional methods used during the intervention for Group 1, 2 and 3, other parameters (e.g. gender, mother's education, school ranking) and a covariate (student scores for T0 test) had also influenced the results. Therefore, the statistical analysis of data was accomplished by analysis of covariance (ANCOVA) of the SPSS Statistics software. Theobald and Freeman (2014) demonstrated how multiple linear regression techniques can distinguish treatment effects from student factors on an outcome variable across groups (Mack et al., 2019). However, if only the effects of continuous independent variables are investigated, linear regression analysis is equivalent with using ANCOVA, where the independent variables are the covariates. Nevertheless, regression analysis cannot handle nominal variables, like the ‘groups’. In this case dummy variables must be computed. In case of ANCOVA, nominal variables are the fix factors. That is the explanation, why the statistical analysis of data gained from the tests in the present research project was accomplished by ANCOVA. In the case of multiple comparisons Bonferroni correction was applied. Details of the other parameters and covariates were published earlier (Szalay et al. 2020). However, the statistical analyses which results are published in this present paper did not investigate the changes in the students’ grades and attitude, neither their effect on the development of the students’ EDS.

Analysis of the students’ scores in tests

There were significant differences in gender, mother's education, school ranking and the student scores for T0 among the three groups. Therefore, the following method was used to create a reduced sample for Groups 1, 2 and 3 that did not differ significantly in any parameters listed above (Appendix, Table 16). First, the students were chosen in each of the Group 1, 2 and 3 who were identical in terms of each examined parameter. Then other students were involved in each group who differed from the ones that had been chosen first only in one parameter, meanwhile making sure that the number of students remained the same in each group. This process was continued until it could be ensured that the three groups were not significantly different in any of the examined parameters. This way the total number of students (N) participating in all of the four tests was reduced from 692 to 480 (160 for each group). Three tables show the distribution of the students among the categories of mothers’ education (Appendix, Table 17), gender (Appendix, Table 18), and school ranking (Appendix, Table 19), respectively in the groups.

In the ANCOVA analysis total scores on Test 1 (T1total), Test 2 (T2total) and Test 3 (T3total) were the dependent variables, respectively. According to the preliminary analysis, the mothers’ education and the gender had a significant effect on the T0 results, but had little effect on the students’ achievement in T1, T2 and T3 tests. It might be assumed that the T0 results represent differences observed among mother's education and gender. However, over time a higher ranking school may provide more opportunities and greater incentive for development than a lower ranking school. If so, this suggests that the school ranking remains an important parameter. Therefore, at the time of the further calculations, only the group and the school ranking were parameters, and prior knowledge (represented by the T0 scores) was the covariant. According to the preliminary analysis, class size had a significant effect on the results. It is not independent of the school ranking, however, as without exception, class sizes were low (<20) in the high ranking schools and high in several of the medium and low ranking schools. For this reason, class size was not used as a parameter, but it was considered as being involved in the school ranking.

According to the Shapiro-Wilk tests the test scores are normally distributed in representative small groups (N = 24 or 48) (Appendix, Table 20). There are linear relationships between the scores on T0 and scores on T2 and T3 tests (Appendix, Table 21). The same approach was taken with the analyses of the students’ scores on the DCK tasks and on the EDS tasks on Tests 1–3 (at the end of Grade 7–9, resp.).

Results and discussion

Table 4 shows the average raw scores, prior to ANCOVA analysis, and their standard deviations (SD) on the whole tests for the three groups for each test. The achievement of the Group 2 students exceeded that of the other two groups in the end of the first year of the project (T1total). Both experimental groups (Groups 2 and 3) had significantly better results than the control group (Group 1) at the end of the second school year (Grade 8, T2total). However, the experimental groups did not do significantly better than the control group by the end of the Grade 9 (T3total).
Table 4 The average of the students’ total scores and their SD-s in Test 0–3 (N = 160 for each group)
Group T0total (SD) T1total (SD) T2total (SD) T3total (SD)
Group 1 42.4% (13.2%) 35.7% (15.7%) 28.8% (20.3%) 35.4% (20.6%)
Group 2 41.9% (14.1%) 43.4% (21.6%) 40.0% (20.7%) 34.9% (21.2%)
Group 3 42.4% (12.7%) 35.6% (18.9%) 40.0% (22.3%) 38.7% (21.2%)

The ratios of the average scores of the experimental groups to that of the control group at the beginning of the research project (‘Start’) and at the end of each school year (Grades 7, 8 and 9) are shown in Table 5.

Table 5 Average total scores of the experimental groups divided by the average total scores of the control group in Test 0–3 (N = 160 for each group)
Ratio Start Grade 7 Grade 8 Grade 9
Group 2/Group 1 0.989 1.217 1.388 0.988
Group 3/Group 1 0.999 0.997 1.388 1.094

To investigate which parameters might have influenced these changes, the ANCOVA analysis described in the previous section was applied to the total scores on each test. The Partial Eta Squared (PES) values characterising the effect sizes are shown in Table 6.
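Partial eta squared is computed from the effect and residual sums of squares of the fitted model; a minimal sketch (the numbers in the example are illustrative, not taken from Table 6):

```python
def partial_eta_squared(ss_effect: float, ss_residual: float) -> float:
    """Partial eta squared: the share of variance attributable to one
    source, ignoring all other sources:
    SS_effect / (SS_effect + SS_residual)."""
    return ss_effect / (ss_effect + ss_residual)

# Illustrative values only: an effect accounting for 10% of the
# effect-plus-error variance gives PES = 0.10.
pes = partial_eta_squared(ss_effect=50.0, ss_residual=450.0)  # -> 0.1
```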

Table 6 The effects of the assumed parameters (sources) and the covariate on the whole-test scores at the beginning of the project (T0) and at the end of Grade 7 (T1), Grade 8 (T2) and Grade 9 (T3) (N = 480)
Parameter (source) PES (partial eta squared)
T0total T1total T2total T3total
a Significant at p < 0.0125 level.
Group 0.000 0.042a 0.095a 0.007
School ranking 0.037a 0.018 0.176a 0.180a
Prior knowledge (T0total) 0.003 0.110a 0.115a

The effects of the different instruction methods used with the different groups are shown in the row ‘Group’. According to these data, the intervention had only a weak significant effect at the end of Grade 7, but a stronger effect at the end of the following school year (Grade 8). In contrast, the calculations showed no significant effect of the intervention at the end of Grade 9. The ranking of the students’ school, however, already had a weak but significant effect on the students’ achievement at the beginning of the project (T0), and it became stronger in the second (T2) and third (T3) years. The influence of prior knowledge (represented by T0total) can be explained by the fact that the higher ranking schools are able to select from many more applicants than the other schools. Unfortunately, the differences among the average achievements of the students from the different schools grew over the years. T0total has a relatively strong effect in the second and the third year, which might be caused by the better prepared students coping more easily with the growing number of more demanding tasks.

Looking at the average scores on the tasks intended to measure DCK, the main trends are similar to those seen in the full tests. The intervention had a weak significant positive effect on the results of Group 2 in Grade 7 and on the results of both Groups 2 and 3 in Grade 8 (Table 7). This is also shown by the relative average scores (Table 8) and the PES values (Table 9). However, in Grade 9, Group 3 performed a little better on the DCK tasks than the other groups.

Table 7 The average of the students’ scores and their SD-s in the DCK tasks of T0–3 (N = 160 for each group)
Group T0DCK (SD) T1DCK (SD) T2DCK (SD) T3DCK (SD)
Group 1 57.6% (15.5%) 41.1% (17.5%) 33.7% (22.0%) 35.6% (21.8%)
Group 2 56.2% (15.8%) 48.3% (22.6%) 44.4% (23.7%) 38.3% (21.5%)
Group 3 58.1% (16.0%) 38.6% (19.3%) 45.1% (25.8%) 41.7% (21.5%)

Table 8 Average scores in the DCK tasks of the experimental groups divided by the average scores of the control group in Test 0–3 (N = 160 for each group)
Ratio Start Grade 7 Grade 8 Grade 9
Group 2/Group 1 0.976 1.175 1.318 1.076
Group 3/Group 1 1.009 0.939 1.338 1.171

Table 9 The effects of the assumed parameters (sources) and the covariate on the DCK-task scores at the beginning of the project (T0) and at the end of Grade 7 (T1), Grade 8 (T2) and Grade 9 (T3) (N = 480)
Parameter (source) PES (partial eta squared)
a Significant at p < 0.0125 level.
Group 0.001 0.045a 0.070a 0.018
School ranking 0.025a 0.013 0.142a 0.123a
Prior knowledge (T0DCK) 0.002 0.039a 0.036a

School ranking had a significant positive effect on students’ DCK scores at the beginning of the project (T0), and it became stronger in the second (T2) and third (T3) years (Table 9). In contrast, prior knowledge had a significant positive effect only in Grades 8 and 9.

The raw data, prior to ANCOVA analysis, showed lower EDS scores than DCK scores (Table 10). The trends shown by the changes in the relative averages compared with those of the control group (Table 11), and by the changes in the Partial Eta Squared values characterising the effect of the intervention (Table 12), are again similar to the changes in the results on the whole test.

Table 10 The average of the students’ scores and their SD-s in the EDS tasks of T0–3 (N = 160 for each group)
Group T0EDS (SD) T1EDS (SD) T2EDS (SD) T3EDS (SD)
Group 1 27.2% (18.6%) 30.3% (21.5%) 24.0% (26.3%) 35.1% (24.9%)
Group 2 27.7% (18.8%) 38.6% (24.9%) 35.6% (26.1%) 31.5% (26.6%)
Group 3 26.7% (16.8%) 32.6% (24.9%) 34.9% (28.4%) 35.6% (26.4%)

Table 11 Average scores in the EDS tasks of the experimental groups divided by the average scores of the control group on Test 0–3 (N = 160 for each group)
Ratio Start Grade 7 Grade 8 Grade 9
Group 2/Group 1 1.018 1.274 1.483 0.897
Group 3/Group 1 0.982 1.076 1.454 1.014

Table 12 The effects of the assumed parameters (sources) and the covariate on the EDS-task scores at the beginning of the project (T0) and at the end of Grade 7 (T1), Grade 8 (T2) and Grade 9 (T3) (N = 480)
Parameter (source) PES (partial eta squared)
a Significant at p < 0.0125 level.
Group 0.001 0.025a 0.057a 0.002
School ranking 0.025a 0.017 0.124a 0.168a
Prior knowledge (T0EDS) 0.001 0.075a 0.090a

The intervention seemed to have a weak significant effect on the achievement of Group 2 on the EDS tasks at the end of Grade 7, whereas it had a somewhat stronger significant effect on the results of both experimental groups in Grade 8, with Group 2 students significantly outperforming Group 3 students. However, there was no significant difference between the average scores of the three groups on the EDS tasks in Grade 9. School ranking had a significant effect on the students’ achievement on the EDS tasks in the first year of the project, and it became stronger in Grades 8 and 9, when the effect of prior knowledge also became significant.


These results might be interpreted as follows: although the intervention accelerated the development of the EDS of the experimental groups, in the long term the method used cannot raise these skills to a much higher level than the “natural” development of abstract thinking does. In other words, the concrete cases on the student sheets might have helped the experimental groups to understand the principles and practice of experimental design, which requires abstract thinking, but once students have reached Piaget's formal operational stage (Cole and Cole, 2006), they can work out how to do it by themselves. In a 2015 video interview John Hattie was asked why he thinks the average effect size of inquiry-based learning has been found to be so low. He argued that inquiry-based learning is often introduced too early, before students know enough content to make a meaningful inquiry. It may also be too early in the sense that abstract thinking is not yet sufficiently developed.

Another possible reason why no significant increase in students’ EDS was seen during the project's third year is that the students were less motivated by Grade 9. According to some of the participating teachers, many students had decided about their further education by this stage. If chemistry was not needed to gain a university place, they usually put less effort into learning the subject in Grades 9 and 10.

Coincidentally, Grade 9 chemistry in Hungary requires a high level of abstract thinking relative to learning in Grades 7 and 8. This may reduce students' enthusiasm: perhaps they found learning chemistry hard, and this also influenced the work they were prepared to invest. Students knew from the beginning that the project test results would not count towards their school chemistry assessment, which may have enhanced these effects. This was unavoidable, since teachers used different marking schemes to arrive at students’ end-of-semester grades. Even students intending to study chemistry further may have recognised, rightly or wrongly, that they do not need EDS to attend their chosen university. The national curriculum valid from September 2020 explicitly prescribes the development of EDS, influencing textbooks and workbooks. Unfortunately, the development of EDS will probably still be neglected if teachers are not convinced that EDS are important for their students’ entry to Higher Education. Assessment drives teaching and learning (e.g. Kahn and O’Rourke, 2005). Teaching EDS will only become standard practice when the final national exams include their assessment.

School ranking had a strong effect on the students’ test results (Tables 6, 9 and 12). This could support both the effect of the development of abstract thinking and the effect of the lack of motivation on the Grade 9 results. It is reasonable to assume that in higher ranking schools the students’ abstract thinking could develop more quickly in a more challenging environment. On the other hand, those students are probably also encouraged, and often urged, by their teachers and parents to reach higher achievement in several different subjects and activities. Therefore, they might feel obliged to put more effort into writing a test, even if its result does not count in their formal assessment. Snook et al. (2009), in a commentary on Hattie's book (2008), quote the OECD volume (2005), which states that the largest source of variation in student learning is due to differences in what students bring to school – their abilities and attitudes, and family and community. Apart from these social variables, the “school effects” (policies, principal, buildings, school size, curriculum, teachers) are regarded as important, and among them the teacher is considered to be the most important variable (Snook et al., 2009).

The relatively strong development of the Group 2 students’ EDS in Grade 8 (Table 10, T2EDS) compared with Group 3 students might have been caused by the different treatments of the two groups. Group 2 students did not have to plan experiments; therefore, those classes probably had more time to discuss with their teacher why the experiments were planned as they were (as written on the student sheets). In contrast, Group 3 students had to design experiments in teams. Teachers facilitated each team's work and had to share their attention amongst the teams. Some teams might have successfully worked out what to do in time, but there might have been others that could not quite manage, and simply watched and followed other teams. Teachers might also have helped those teams more with planning the experiment if they thought it was the only way to finish the lesson in time. Where the latter happened, it would have reduced the development of experimental design skills considerably compared with Group 2 students.


Summary of the results and answers to the research questions

The statistical analysis of the results measured at the end of Grades 7–9 showed that two parameters had a significant effect on the students’ scores: the intervention and the school ranking (Table 13). The intervention had a relatively large effect on the students’ scores on both the whole test and the EDS tasks in Grade 8. However, school ranking had a strong effect on students’ scores on the tasks intended to measure EDS in Grade 8 and an even greater effect in Grade 9. It seems that school ranking had an increasingly stronger positive effect on the students’ EDS than the intervention did. Table 14 shows the development of Groups 2 and 3 compared with Group 1 in Grades 7, 8 and 9. It seems that the intervention had a significant effect only on Group 2 students’ development in Grade 7, but accelerated both experimental groups’ development in Grade 8. However, the effect was temporary: the control group caught up with the experimental groups in Grade 9 in terms of the development of EDS, and only a very weak significant effect could be detected on the DCK. The positive effect of the intervention on the development of EDS thus proved to be only transitional.

The answers to the research questions are as follows.

RQ1: A significant (albeit moderate) positive change was detected in the students’ ability to design experiments (EDS) in the experimental groups in contrast to the control group. This is the result of the intervention in the second year of the project (Grade 8). However, this difference disappeared by the end of the third year (Grade 9).

RQ2: Both methods used in the interventions seemed to have a significant positive effect on DCK in Grade 8. However, only a very weak effect was detected in Group 3 in Grade 9.

RQ3: Group 2 students had significantly higher average scores on the EDS tasks of Test 2 than Group 3 students. However, no significant difference was found between Groups 2 and 3 at the end of Grade 9.


Limitations

The sample was not representative of the examined cohort of Hungarian students, let alone at international level. This is unfortunate, but it was the only way to track the effect of the interventions on the students’ development for four years.

Performance on any assessment is at least partially driven by the students’ motivation to succeed on the measure and by their test-taking abilities (Cannady et al., 2019), and our research is no exception. The opening sentences of each test reminded students that the research aimed to make the teaching of chemistry interesting and effective. Asking them to complete the test to the best of their knowledge was intended as an incentive, but it probably did not convince all students. Tests and student sheets were designed to make the work more interesting, but not all students may have found this to be the case. Unsurprisingly, ability scores are influenced by motivational levels, and scientific sensemaking is no exception.

Despite our efforts to organise a randomized controlled trial and evaluate the results using an appropriate statistical method, the statement asserted by Mack et al. (2019) still applies to our research: “No single study can evaluate every variable and every theoretical relationship underlying an instructional model. Therefore, intervention studies should progressively build upon one another with evidence for the relationships between people, treatments, outcomes, and settings accumulating over time.”

The instruments used (40-minute paper-based tests) could only provide a limited picture of how students benefited from the interventions. The research literature has repeatedly found that practical work does not offer a significant advantage in developing students’ scientific conceptual understanding as measured by written tests (Abrahams, 2017; Hennah, 2019). Nevertheless, this was the most feasible way to ensure that all students had an equal chance to complete the test successfully. This requirement would have been even more difficult to meet in a practical exam.

Finally, an improvement that appears on just one measurement could be due to statistical noise or to something peculiar to that measurement. It is possible that other instruments would have shown different effects.

Implications and further research

This research needs to be extended. It is widely accepted, and often emphasized, how important it is for society that science education positively influences students’ general attitudes, social skills and problem-solving abilities (Kahn and O’Rourke, 2005; Casner-Lotto and Barrington, 2006; Weaver et al., 2008; Snook et al., 2009; Hanson and Overton, 2010; Jaschik, 2015; Sarkar et al., 2016; George-Williams et al., 2018). Many of these skills overlap with science process skills, including the ability to design experiments (Goodey and Talgar, 2016). Both future scientists and responsible citizens benefit from these skills, for example in recognising pseudoscience. It is also important that enough students choose science- and technology-based careers (Kousa et al., 2018). But social and school variables limit the effect of instruction methods.

However, teachers, teacher educators and educational researchers should do all they can to provide the best opportunities for each and every student to learn science.

From the test results of Group 2, it may be concluded, and suggested to teachers, that discussing experimental design with students after they have done a step-by-step experiment is a worthwhile activity. However, asking students to design experiments, even in a scaffolded way (as with Group 3), needs careful planning. This is especially important when choosing the age at which it will be used. Reid and Ali (2020), discussing the implications of Piaget's and Ausubel's findings, warn that the potential to think in the abstract emerges slowly during adolescence. They suggest that we can only be confident that students are capable of thinking in terms of hypotheses, a critical skill in scientific thought, by the age of 16.

The question remains whether EDS could have been developed further in the two experimental groups in the last part of the present project. However, the relevance of the Test 4 results will be limited because of the disruptions to teaching caused by Covid-19. Therefore, a thorough enquiry into the participating teachers’ knowledge and opinions concerning the intervention might help to interpret the data.

Two aspects are important in our long-term research plans. First, students seem to need even more guidance and detailed instructions on how to design experiments; this is needed even at undergraduate level (George-Williams et al., 2018). The usefulness of an experimental design template, like the one described by Cothron et al. (2000), should be investigated. Apart from being provided with the questions to be investigated and the basic principles of experimental design, students need to be taught in a systematic way to identify the equipment and materials, the independent and dependent variables, the parameters to be held constant, the sources of possible error, etc. A meta-analysis of studies on guided inquiry instruction concluded that more specific guidance during inquiry assignments results in higher quality learning products (Lazonder and Harmsen, 2016).

Secondly, students need motivation. Chemistry topics that impact the environment and human health are likely to be favoured by students and are congruent with augmenting their self-expression in chemistry (Ross et al., 2020). This could be accompanied by systems thinking, which is a possible way to address emerging global challenges in the context of a rapidly changing world (e.g. Mahaffy et al., 2018); accordingly, many science educators’ interest has recently turned toward this approach. The prompt used should stimulate student interest and curiosity (Cothron et al., 2000) in order to involve students in the learning of chemistry, and systems thinking seems to be a promising approach for this.

Conflicts of interest

There are no conflicts of interest to declare.


Appendix A: tables for statistical analysis

Table 13 The effects (Partial Eta Squared) of the significant parameters (‘sources’) on the students’ scores in Tests 1–3, and on their DCK and EDS tasks (N = 480)
Source T1 (Grade 7) T2 (Grade 8) T3 (Grade 9)
(three values per test: whole test, DCK tasks, EDS tasks)
a Significant effect (p < 0.0167).
Group 0.042a 0.045a 0.025a 0.095a 0.070a 0.057a 0.007 0.018a 0.002
School ranking 0.018a 0.013 0.017a 0.176a 0.142a 0.124a 0.180a 0.123a 0.168a

Table 14 The effects (partial eta squared) of instruction on the development of experimental groups compared to that of the control group in Grade 7, 8 and 9 (N = 480)
Group T1 (Grade 7) T2 (Grade 8) T3 (Grade 9)
(three values per test: whole test, DCK tasks, EDS tasks)
a Significant effect (p < 0.0167).
Group 2 0.032a 0.025a 0.024a 0.081a 0.058a 0.049a 0.001 0.009 0.001
Group 3 0.000 0.002 0.002 0.063a 0.048a 0.037a 0.007 0.017a 0.000

Table 15 Ranking of the participating schools (according to the school ranking of the website “legjobbiskola.hu”)
Ranking High Medium Low
a Number of students completing Test 0.
School ranking 4, 15, 17, 29, 32 36, 41, 72, 84, 91, 114, 170 198, 214, 289, 325, 416, 526
N 254 345 284

Table 16 There is no significant difference among the groups in the assumed parameters in the reduced sample at the beginning of Grade 7 (N = 480)
Parameter χ2 df p
Gender 0.202 2 0.904
Mother's education 2.333 2 0.311
School ranking 4.555 4 0.336
T0total 26.13 28 0.566
T0DCK 13.83 16 0.612
T0EDS 21.00 18 0.279

Table 17 Distribution of the students among the categories of mothers’ education in the groups (N = 480)
Mothers’ education Group 1 Group 2 Group 3 Total
No degree in higher education 34 31 42 107
Has a degree in higher education 126 129 118 373
Total 160 160 160 480
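As a check, the mothers'-education row of Table 16 can be reproduced from the counts in Table 17 with a chi-squared test of homogeneity; a sketch using scipy (scipy availability is assumed; the counts are taken from Table 17):

```python
# Chi-squared test of homogeneity of mothers' education across the three
# groups, using the observed counts from Table 17.
import numpy as np
from scipy.stats import chi2_contingency

counts = np.array([
    [34, 31, 42],     # no degree in higher education (Groups 1-3)
    [126, 129, 118],  # has a degree in higher education (Groups 1-3)
])
chi2, p, dof, expected = chi2_contingency(counts)
# chi2 ~ 2.333, dof = 2, p ~ 0.311, matching the corresponding
# row of Table 16: no significant difference among the groups.
```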

Table 18 Distribution of the students among the categories of gender in the groups (N = 480)
Gender Group 1 Group 2 Group 3 Total
Boy 74 72 70 216
Girl 86 88 90 264
Total 160 160 160 480

Table 19 Distribution of the students among the categories of school ranking in the groups (N = 480)
School ranking Group 1 Group 2 Group 3 Total
Low 53 67 54 174
Medium 68 66 72 206
High 39 27 34 100
Total 160 160 160 480

Table 20 Shapiro-Wilk test of normality (N = 24)
Test/DCK or EDS tasks Statistic df p
T0 0.985 24 0.963
T1 0.979 24 0.877
T2 0.977 24 0.838
T3 0.960 24 0.438
T0DCK 0.963 24 0.496
T1DCK 0.970 24 0.673
T2DCK 0.965 24 0.540
T3DCK 0.960 24 0.445
T0EDS 0.944 24 0.205
T1EDS 0.933 24 0.113
T2EDS 0.921 24 0.062
T3EDS 0.936 24 0.134

Table 21 Linear correlations with T0 test and DCK and EDS tasks (N = 480)
T1 T2 T3
a p < 0.01.
Total 0.075 0.364a 0.383a
DCK 0.042 0.229a 0.231a
EDS 0.060 0.302a 0.327a


Acknowledgements

This study was funded by the Content Pedagogy Research Program of the Hungarian Academy of Sciences (Project No. 471026). Many thanks to all the colleagues and students for their work.


1. AAAS (American Association for the Advancement of Science), (1993), Benchmarks for science literacy, New York: Oxford University Press.
  2. AAAS (American Association for the Advancement of Science), (2011), Vision and change in undergraduate biology education: a call to action, Washington, DC: American Association for the Advancement of Science.
  3. Abrahams I., (2017), Minds-On Practical Work for Effective Science Learning, in Taber K. S. and Akpan B., (ed.), Science Education. New Directions in Mathematics and Science Education, Rotterdam: Sense Publishers, p. 404.
  4. Akkuzu N. and Uyulgan M. A., (2017), Step by step learning using the I diagram in the systematic qualitative analyses of cations within a guided inquiry learning approach, Chem. Educ. Res. Pract., 18, 641.
  5. Allen J. B., Barker L. N. and Ramsden J. H., (1986), Guided inquiry laboratory, J. Chem. Educ., 63(6), 533–534,  DOI:10.1021/ed063p533.
  6. American Chemical Society Committee on Professional Training (ACS), (2008), Development of Student Skills in a Chemistry Curriculum, (accessed November 2012 from: http://portal.acs.org/portal/PublicWebSite/about/governance/committees/training/acsapproved/degreeprogram/CNBP_025490).
7. Asad Khan R. M., Iqbal M. and Tasneem S., (2015), The influence of parents’ educational level on secondary school students’ academic achievements in district Rajanpur, J. Educ. Pract., 6(16), 76–79.
  8. Banchi H. and Bell R., (2008), The many levels of inquiry, Sci. Child., 46(2), 26–29.
  9. Becker B. J., (1989), Gender and science achievement: A reanalysis of studies from two meta-analyses. J. Res. Sci. Teach., 26(2), 141–169.
  10. Behmke D. A. and Atwood C. H., (2013), Implementation and assessment of Cognitive Load, Chem. Educ. Res. Pract., 14, 247–256.
11. Bell R. L., Smetana L. and Binns I., (2005), Simplifying inquiry instruction: Assessing the inquiry level of classroom activities, Sci. Teach., 72(7), 30–33.
  12. Benny N. and Blonder R., (2018), Interactions of chemistry teachers with gifted students in a regular high-school chemistry classroom, Chem. Educ. Res. Pract., 19, 122–134.
  13. Bloom B. S., Engelhart M. D., Furst E. J., Hill W. H., Krathwohl D. R., (1956), Taxonomy of Educational Objectives: Part I, New York: Cognitive Domain, McKay.
  14. Boesdorfer S. B. and Livermore R. A., (2018), Secondary school chemistry teacher's current use of laboratory activities and the impact of expense on their laboratory choices, Chem. Educ. Res. Pract., 19, 135–148.
15. Boyd-Kimball D. and Miller K. R., (2018), From Cookbook to Research: Redesigning an Advanced Biochemistry Laboratory, J. Chem. Educ., 95, 62–67.
  16. Böyük U., Tanık N. and Saracoglu S., (2011), Analysis of the scientific process skill levels of secondary school students based on different variables, J. Tubav Sci., 4(1), 20–30.
  17. Brederode M. E., Zoon S. A., Meeter M., (2020), Examining the effect of lab instructions on students’ critical thinking during a chemical inquiry practical, Chem. Educ. Res. Pract., 21, 1173–1182.
  18. Bretz S., Fay M., Bruck L. B., Towns M. H., (2013), What faculty interviews reveal about meaningful learning in the undergraduate laboratory, J. Chem. Educ., 90 (3), 281–288.
  19. Bruck A. D. and Towns M., (2013), Development, Implementation, and Analysis of a National Survey of Faculty Goals for Undergraduate Chemistry Laboratory, J. Chem. Educ., 90 (6), 685–693.
20. Bruck L. B., Towns M. and Bretz S. L., (2010), Faculty perspectives of undergraduate chemistry laboratory: Goals and obstacles to success, J. Chem. Educ., 87(12), 1416–1424.
  21. Bryson B., (2004), A Short History of Nearly Everything, London, UK: Black Swan, pp. 17–24.
  22. Bybee R. W., (2000), Teaching science as inquiry, in Minstrell J. and van Zee E. (ed.), Inquiring into inquiry learning and teaching in science, Washington, DC: American Association for the Advancement of Science, pp. 20–46.
  23. Cannady M. A., Vincent-Ruzb P., Chungc J. M. and Schunn C D., (2019), Scientific sensemaking supports science content learning across disciplines and instructional contexts, Contemp. Educ. Psychol., 59, 101802.
  24. Casner-Lotto J. and Barrington L., (2006), Are They Really Ready to Work? Employers’ Perspectives on the Basic Knowledge and Applied Skills of New Entrants to the 21st Century US Workforce, ERIC.
  25. Chandrasena W., Craven R., Tracey D. and Dillon A., (2014), Seeding Science Success: Relations of Secondary Students’ Science Self-concepts and Motivation with Aspirations and Achievement, Aust. J. Educ. Dev. Psychol., 14, 186–201.
26. Chin C. and Chia L. G., (2006), Problem-based learning: using ill-structured problems in biology project work, Sci. Educ., 90(1), 44–67.
  27. Chonkaew P., Sukhummek B. and Faikhamta C., (2016), Development of analytical thinking ability and attitudes towards science learning of grade-11 students through science technology engineering and mathematics (STEM education) in the study of stoichiometry, Chem. Educ. Res. Pract., 17, 842.
  28. Colburn A., (2000), An inquiry primer, Sci. Scope, 23(6), 42–44.
  29. Cole M. and Cole S. R., (2006), Fejlődéslélektan, Osiris Kiadó, Budapest, pp. 481–505, pp. 642–656, Hungarian translation of Cole M. and Cole S. R., (2001), The Development of Children, 4th edn, New York: Worth Publishers.
  30. Cooper M. M., (2013), Chemistry and the Next Generation Science Standards. J. Chem. Educ., 90 (6), 679–680.
  31. Cothron J. H., Giese R. N., & Rezba R. J., (2000), Students and research: Practical strategies for science classrooms and competitions, 3rd edn, Dubuque, IA: Kendall/Hunt Publishing Company.
  32. Cronbach L. J. and Meehl P. E., (1955), Construct Validity in Psychological Tests. Psychol. Bullet., 52, 281–302.
  33. Crujeiras-Pérez B. and Jiménez-Aleixandre M. P., (2017), High school students’ engagement in planning investigations: findings from a longitudinal study in Spain, Chem. Educ. Res. Pract., 18, 99–112.
  34. Csíkos C., Korom E. and Csapó B., (2016), Tartalmi keretek a kutatásalapú tanulás tudáselemeinek értékeléséhez a természettudományokban, Iskolakultúra, 26, 3, 17 DOI:10.17543/ISKKULT.2016.3.17.
  35. Domin D. S., (1999), A review of laboratory instructional styles, J. Chem. Educ., 76(4), 543–547.
  36. Duch B. J., Groh S. E. and Allen D. E., (2001), Why problembased learning, The Power of Problem-Based Learning: A Practical “How-to” for Teaching Undergraduate Courses in Any Discipline, Sterling, Va: Stylus, pp. 3–11.
  37. Eccles J. S., Davis-Kean P. E., (2005). Influence of parents’ education on their children’ educational attainments: the role of parent and child perceptions, London Rev. Educ., 3(3), 191–204.
  38. Edelson D.C., Gordin D.N. and Pea R.D., (1999), Addressing the Challenges of Inquiry-Based Learning Through Technology and Curriculum Design, J. Learn. Sci., 8, 391–450.
  39. Fay M. E., Grove N. P., Towns M. H. and Bretz S. L., (2007), A rubric to characterize inquiry in the undergraduate chemistry laboratory, Chem. Educ. Res. Pract., 8 (2), 212–219.
  40. Feldon D. F., Maher M. A. and Timmerman B. E., (2010), Performance-based data in the study of STEM PhD education, Science, 329, 282–283.
  41. Fradd S. H., Lee O., Sutman F. X., Saxton M. K., (2001), Promoting science literacy with English language learners through instructional materials development: A case study. Billingual Res. J., 25(4), 417–439.
42. Furtak E. M., Siedel T., Iverson H. and Briggs D. C., (2012), Experimental and Quasi-Experimental Studies of Inquiry-Based Science Teaching: A Meta-Analysis, Rev. Educ. Res., 82, 300–329.
  43. George-Williams S. R., Soo J. T., Angela L. Ziebell A. L., Thompson C. D. and Overton T. L., (2018), Inquiry and industry inspired laboratories: the impact on students’ perceptions of skill development and engagements, Chem. Educ. Res. Pract., 19, 583.
  44. Germann P. J., (1994), Testing a model of science process skills acquisition: an interaction with parents, education, preferred language, gender, science attitude, cognitive development, academic ability, and biology knowledge, J. Res. Sci. Teach., 31(7), 749–783.
  45. Goodey N. M., Talgar C. P., (2016), Guided inquiry in a biochemistry laboratory course improves experimental design ability, Chem. Educ. Res. Pract., 17, 1127.
  46. Gott R. and Duggan S., (1998), Understanding Scientific Evidence – Why it Matters and How It Can Be Taught, in Ratcliffe M. (ed.), ASE (The Association for Science Education) Guide to Secondary Science Education, Cheltenham: Stanley Thornes, pp. 92–99.
  47. Gray J., Jesson D., Jones, B., (1986), Towards a framework for interpreting examination results, in Rodgers R. (ed.), Education and social class, London: Falmer Press, pp. 51–57.
  48. Güden C. and Timur B., (2016), Ortaokul öğrencilerinin bilimsel süreç becerilerinin incelenmesi (Çanakkale örneği) [Examining secondary school students’ cognitive process skills (Canakkale sample)], Abantİzzet Baysal Üniversitesi Eğitim Fakültesi Dergisi, 16(1), 163–182.
  49. Hanson S. and Overton T., (2010), Skills required by new chemistry graduates and their development in degree programmes, Hull, UK: Higher Education Academy UK Physical Sciences Centre.
  50. Harker R., (1995), Further comment on ‘So Schools Matter?’. N. Z. J. Educ. Stud., 30 (1), 73–76.
51. Harker R., (1996), On ‘First year university performance as a function of type of secondary school attended and gender.’ N. Z. J. Educ. Stud., 32(2), 197–198.
  52. Harsh J. A., (2016), Designing performance-based measures to assess the scientific thinking skills of chemistry undergraduate researchers, Chem. Educ. Res. Pract., 17, 808.
  53. Harsh J. A., Esteb J. J. and Maltese A. V., (2017), Evaluating the development of chemistry undergraduate researchers’ scientific thinking skills using performance-data: first findings from the performance assessment of undergraduate research (PURE) instrument, Chem. Educ. Res. Pract., 18, 472.
  54. Hattie J., (2003), Teachers make a difference: What is the research evidence? Paper presented to the Australian Council for Educational Research annual conference: Building teacher quality, retrieved 13 February 2009 from http://www.leadspace.govt.nz/leadership/articles/teachers-make-a-difference.php.
  55. Hattie J., (2008), Visible learning: A synthesis of over 800 meta-analyses relating to achievement, London: Routledge.
  56. Hattie J., (2015), John Hattie on inquiry-based learning (video interview), 9 November 2015, available online at https://www.youtube.com/watch?v=YUooOYbgSUg (accessed 28 October, 2020).
  57. Hennah N., (2019), A novel practical pedagogy for terminal assessment, Chem. Educ. Res. Pract., 20, 95.
  58. Jäncke L., (2018), Sex/gender differences in cognition, neurophysiology, and neuroanatomy, F1000Research, 7, F1000 Faculty Rev-805 DOI:10.12688/f1000research.13917.1.
  59. Jaschik S., (2015), Well-Prepared in Their Own Eyes, Retrieved from Inside Higher Ed website: http://www.insidehighered.com/news 2015/01/20.
  60. Johnstone A. and Al-Shuaili A., (2001), Learning in the laboratory; some thoughts from the literature, Univ. Chem. Educ., 5(2), 42–51.
  61. Kahn P. and O’Rourke K., (2005), Understanding Enquiry-Based Learning, in T. Barrett, I. Mac Labhrainn and H. Fallon (ed.), Handbook of Enquiry & Problem Based Learning, pp. 1–12.
  62. Kambeyo L., (2017), The Possibilities of Assessing Students’ Scientific Inquiry Skills Abilities Using an Online Instrument: A Small-Scale Study in the Omusati Region, Namibia, Eur. J. Educ. Sci., 4(2), 1–21, ISSN 1857-6036.
  63. Karar E. E. and Yenice N., (2012), The investigation of scientific process skill level of elementary education 8th grade students in view of demographic features, Proc. – Soc. Behav. Sci., 46, 3885–3889.
  64. Kirschner P. A., (1992), Epistemology, practical work and academic skills in science education, Sci. Educ., 1(3), 273–299.
  65. Kousa P., Kavonius R. and Aksela M., (2018), Low-achieving students’ attitudes towards learning chemistry and chemistry teaching methods, Chem. Educ. Res. Pract., 19, 43.
  66. Krathwohl D. R., (2002), A Revision of Bloom's Taxonomy: An Overview, Theory Into Practice, 41(4), 212–218.
  67. Lagowski J. J., (1990), Entry-level science courses: the weak link, ACS Publications.
  68. Lawson A. E., (2000), Managing the inquiry classroom; problems and solutions. Am. Biol. Teach., 62(9), 641–648.
  69. Lazonder A. W. and Harmsen R., (2016), Meta-analysis of inquiry-based learning: effects of guidance, Rev. Educ. Res., 86, 681–718.
  70. Lederman N. G., (2004), Laboratory experiences and their role in science education, America's lab report, Washington, DC: National Academies Press.
  71. Lederman N. and Lederman J., (2012), Nature of scientific knowledge and scientific inquiry, in Fraser B. J., Tobin K. and McRobbie C. J. (ed.), Second international handbook of science education, Dordrecht: Springer, pp. 335–359.
  72. Lehrer R. and Schauble L., (2000), Modeling in mathematics and science, NJ: Lawrence Erlbaum, pp. 101–159.
  73. Lyons T., (2006), Different Countries, Same Science Classes: Students’ experiences of school science in their own words, Int. J. Sci. Educ., 28(6), 591–613.
  74. Mack M. R., Hensen C. and Barbera J., (2019), Metrics and Methods Used To Compare Student Performance Data in Chemistry Education Research Articles, J. Chem. Educ., 96, 401–413.
  75. Mahaffy P. G., Brush E. J., Haack J. A. and Ho F. M., (2018), Journal of Chemical Education Call for Papers—Special Issue on Reimagining Chemistry Education: Systems Thinking, and Green and Sustainable Chemistry, J. Chem. Educ., 95(10), 1689–1691.
  76. Markic S. and Abels S., (2014), Heterogeneity and diversity: a growing challenge or enrichment for science education in German schools? Eurasia J. Math. Sci. Technol. Educ., 10(4), 271–283.
  77. Martina E. N., (2007), Determinants of J.S.S. students’ acquisition of science process skills, Master Thesis, Nsukka: University of Nigeria.
  78. Masnick A. M. and Klahr D., (2003), Error matters: An initial exploration of elementary school children's understanding of experimental error. J. Cognit. Dev., 4, 67–98.
  79. Mayer J., (2007), Inquiry as Scientific Problem Solving, in Kruger D. and Vogt H. (ed.), Theories in Biology Didactic, Heidelberg: Springer, pp. 177–186.
  80. McDonnell C., O’Connor C. and Seery M. K., (2007), Developing practical chemistry skills by means of student-driven problem based learning mini-projects, Chem. Educ. Res. Pract., 8(2), 130–139.
  81. Metz K. E., (1998), Emergent understanding of attribution and randomness: Comparative analysis of the reasoning of primary grade children and undergraduates, Cognit. Instr., 16, 285–365.
  82. Nagy L.-né, Korom E., Pásztor A., Veres G. and B. Németh M., (2015), A természettudományos gondolkodás online diagnosztikus értékelése [Online diagnostic assessment of scientific reasoning], in Csapó B., Korom E. and Molnár G. (ed.), A természettudományi tudás online diagnosztikus értékelésének tartalmi keretei [Content framework for the online diagnostic assessment of science knowledge, in Hungarian], Budapest: Oktatáskutató és Fejlesztő Intézet.
  83. National Curriculum of Hungary, (2012), available online: https://ofi.oh.gov.hu/sites/default/files/attachments/mk_nat_20121.pdf (accessed 28 October, 2020).
  84. NGSS, (2013), Next Generation Science Standards, https://www.nextgenscience.org/search-standards (accessed 28 October, 2020).
  85. Nowak K. H., Nehring A., Tiemann R. and Upmeier zu Belzen A., (2013), Assessing students’ abilities in processes of scientific inquiry in biology using a paper-and-pencil test, J. Biol. Educ., 47(3), 182–188.
  86. NRC (National Research Council), (1996), National science education standards, Washington, DC: National Academies Press.
  87. NRC (National Research Council), (2005), America's Lab Report: Investigations in High School Science, Washington, DC: National Academies Press.
  88. NRC (National Research Council), (2012), A framework for K-12 science education: Practices, crosscutting concepts, and core ideas, Washington, DC: National Academies Press.
  89. Ocak I. and Tümer H., (2014), The scientific process skills of primary school students (Afyonkarahisar sample), Afyon Kocatepe Univ. J. Sci. Eng., 14, 1–21.
  90. OECD, (2005), Teachers matter: Attracting, developing and retaining effective teachers, Overview, Paris: OECD, p. 2, http://www.oecd.org/dataoecd/39/47/34990905.pdf (accessed 28 October, 2020).
  91. OECD, (2017), PISA 2015 Technical Report, Computer-based tests, ch. 18, pp. 369–374.
  92. OECD, (2019), “PISA 2018 Science Framework”, PISA 2018 Assessment and Analytical Framework, OECD Publishing, Paris,  DOI:10.1787/f30da688-en (accessed 28 October, 2020).
  93. Ofoegbu L. I. J., (1984), Acquisition of science process skills among elementary pupils in some northern states of Nigeria, Unpublished PhD dissertation, Nsukka: University of Nigeria.
  94. Onukwo G. I. N., (1995), Development and validation of a test of science process skills in integrated science, Unpublished PhD dissertation, Nsukka: University of Nigeria.
  95. Paas F., Tuovinen J. E., et al., (2003), Cognitive Load Measurement as a Means to Advance Cognitive Load Theory, Educ. Psychol., 38(1), 63–71.
  96. Padilla M., (1990), The Science Process Skills, Paper presented at the annual meeting of the National Association for Research in Science Teaching, French Lick, IN.
  97. Phillips K. A. and Germann P. J., (2002), The inquiry ‘I’: a tool for learning scientific inquiry, Am. Biol. Teach., 64(7), 512–520.
  98. Powell J. and Anderson R. D., (2002), Changing teachers' practice: curriculum materials and science education reform in the USA, Stud. Sci. Educ., 37, 107–135.
  99. PRIMAS project, Deliverable D.4.1, Version 1, 31st March, 2011: Promoting inquiry-based learning (IBL) in mathematics and science education across Europe, PRIMAS guide for professional development providers, available online: https://primas-project.eu/wp-content/uploads/sites/323/2017/10/PRIMAS_Guide-for-Professional-Development-Providers-IBL_110510.pdf (accessed 28 October, 2020).
  100. Reid N. and Ali A. A., (2020), How do students learn? Making Sense of Learning, Cham, Switzerland: Springer Nature, pp. 18–19.
  101. Rocard M., (2007), Science Education NOW: A Renewed Pedagogy for the Future of Europe, Brussels: European Commission, Directorate-General for Research.
  102. Reed J. J. and Holme T. A., (2014), The Role of Non-Content Goals in the Assessment of Chemistry Learning, in Kendhammer L. K. and Murphy K. L., (ed.), Innovative Uses of Assessment for Teaching and Research, Washington, DC: American Chemical Society, pp. 147–160.
  103. Rodriguez J.-M. G. and Towns M. H., (2018), Modifying Laboratory Experiments To Promote Engagement in Critical Thinking by Reframing Prelab and Postlab Questions, J. Chem. Educ., 95, 2141–2147.
  104. Roehrig G. H. and Luft J. A., (2004), Inquiry Teaching in High School Chemistry Classrooms: The Role of Knowledge and Beliefs, J. Chem. Educ., 81(10), 1510.
  105. Ross J., Guerra E. and Gonzales-Ramos S., (2020), Chem. Educ. Res. Pract., 21, 357–370.
  106. Rueckert L., (2008), Tools for the Assessment of Undergraduate Research Outcomes, in Miller R. L. and Rycek R. F. (ed.), Developing, Promoting and Sustaining the Undergraduate Research Experience in Psychology, Washington, DC: Society for the Teaching of Psychology, pp. 272–275.
  107. Sarkar M., Overton T., Thompson C. and Rayner G., (2016), Graduate Employability: Views of Recent Science Graduates and Employers, Int. J. Innov. Sci. Math. Educ., 24(3), 31–48.
  108. Schafer A. G. L. and Yezierski E. J., (2020), Chemistry critical friendships: investigating chemistry-specific discourse within a domain-general discussion of best practices for inquiry assessments, Chem. Educ. Res. Pract., 21, 452.
  109. Schwab J. J., (1962), The teaching of science as enquiry, in Schwab J. J. and Brandwein P. F. (ed.), The teaching of science, Cambridge, MA: Harvard University Press.
  110. Seery M. K., Jones A. B., Kew W. and Mein T., (2019), Unfinished Recipes: Structuring Upper-Division Laboratory Work To Scaffold Experimental Design Skills, J. Chem. Educ., 96, 53–59.
  111. Shadish W. R., Cook T. D. and Campbell D. T., (2002), Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Boston, MA: Houghton Mifflin.
  112. Shiland T. W., (1999), Constructivism: the implication for laboratory work, J. Chem. Educ., 76(1), 107–109.
  113. Simmons P. E., Emory A., Carter T., Coker T., Finnegan B., Crockett D., Richardson L., Yager R., Craven J., Tillotson J., Brunkhorst H., Twiest M., Hossain K., Gallagher J., Duggan-Haas D., Parker J., Cajas F., Alshannag Q., McGlamery S., Krockover J., Adams P., Spector B., LaPorta T., James B., Rearden K. and Labuda K., (1999), Beginning Teachers: Beliefs and Classroom Actions, J. Res. Sci. Teach., 36, 930–954.
  114. Sjøberg S. and Schreiner C., (2006), How do students perceive science and technology? Sci. Sch., 1, 66–69.
  115. Snook I., O’Neil J., Clark J., O’Neil A. and Openshaw R., (2009), Invisible Learnings: A commentary on John Hattie's book Visible learning: A synthesis of over 800 meta-analyses relating to achievement, N. Z. J. Educ. Stud., 44(1), 93–106.
  116. Sweller J., (1988), Cognitive Load during Problem Solving: Effects on Learning, Cognit. Sci., 12, 257–285.
  117. Sweller J., (2004), Instructional Design Consequences of an Analogy between Evolution by Natural Selection and Human Cognitive Architecture, Instr. Sci., 32(1/2), 9–31.
  118. Szalay L. and Tóth Z., (2016), An inquiry-based approach of traditional ‘step-by-step’ experiments, Chem. Educ. Res. Pract., 17, 923–961.
  119. Szalay L., Tóth Z. and Kiss E., (2020), Introducing students to experimental design skills, Chem. Educ. Res. Pract., 21, 331–356.
  120. Tafoya E., Sunal D. and Knecht P., (1980), Assessing inquiry potential: a tool for curriculum decision makers, Sch. Sci. Math., 80, 43–48.
  121. Theobald R. and Freeman S., (2014), Is It the Intervention or the Students? Using Linear Regression to Control for Student Characteristics in Undergraduate STEM Education Research, CBE-Life Sci. Educ., 13, 41–48.
  122. Timmerman B., Strickland D., Johnson R. and Payne J., (2010), Development of a universal rubric for assessing undergraduates’ scientific reasoning skills using scientific writing, [Online]. University of South Carolina Scholar Commons, http://scholarcommons.sc.edu/.
  123. Tosun C., (2019), Scientific process skills test development within the topic “Matter and its Nature” and the predictive effect of different variables on 7th and 8th grade students’ scientific process skill levels, Chem. Educ. Res. Pract., 20, 160–174.
  124. Triona L. M. and Klahr D., (2010), Point and click or grab and heft: Comparing the influence of physical and virtual instructional materials on elementary school students' ability to design experiments, Cognit. Instr., 21(2), 149–173.
  125. Underwood S., Posey L., Herrington D., Carmel J. and Cooper M., (2018), Adapting Assessment Tasks To Support Three-Dimensional Learning, J. Chem. Educ., 95(2), 207–217.
  126. van Merrienboer J. J. G., Kirschner P. A., et al., (2003), Taking the Load Off a Learner's Mind: Instructional Design for Complex Learning, Educ. Psychol., 38(1), 5–13.
  127. Walker M., (2007), Teaching inquiry based science, LaVergne, TN: Lightning Source.
  128. Walker J. P., Sampson V., Southerland S. and Enderle P. J., (2016), Using the laboratory to engage all students in science practices, Chem. Educ. Res. Pract., 17, 1098.
  129. Walters Y. B. and Soyibo K., (2001), An analysis of high school students' performance on five integrated science process skills, Res. Sci. Technol. Educ., 19(2), 133–143.
  130. Weaver G. C., Russell C. B. and Wink D. J., (2008), Inquiry-based and research-based laboratory pedagogies in undergraduate science, Nat. Chem. Biol., 4, 577.
  131. Wenning C. J., (2007), Assessing inquiry skills as a component of scientific literacy. J. Phys. Teach. Educ. Online, 4(2), 21–24.
  132. Williams W. M., Papierno P. B., Makel M. C. and Ceci S. J., (2004), Thinking like a scientist about real world problems: the Cornell institute for research on children science education program, J. Appl. Dev. Psychol., 25, 107–126.
  133. Woolfolk A., (2005), Educational Psychology, Boston, MA: Allyn and Bacon.
  134. Xu H. and Talanquer V., (2013), Effect of the level of inquiry of lab experiments on general chemistry students’ written reflections, J. Chem. Educ., 90(1), 21–28.
  135. Zeidan A. H. and Jayosi M. R., (2015), Science process skills and attitudes toward science among Palestinian secondary school students, World J. Educ., 5(1), 13–24.
  136. Zimmerman C., (2000), The development of scientific reasoning skills, Dev. Rev., 20, 99–149.
  137. Zimmerman C., (2005), The Development of Scientific Reasoning Skills: What Psychologists Contribute to an Understanding of Elementary Science Learning, Final Draft of a Report to the National Research Council Committee on Science Learning Kindergarten through Eighth Grade, Illinois State University.
  138. Zimmerman C., (2007), The development of scientific thinking skills in elementary and middle school, Dev. Rev., 27, 172–223.
  139. Zoller U. and Tsaparlis G., (1997), Higher-order and lower-order cognitive skills: the case of chemistry, Res. Sci. Educ., 27, 117–130.


Electronic supplementary information (ESI) available. See DOI: 10.1039/d0rp00338g

This journal is © The Royal Society of Chemistry 2021