Introducing students to experimental design skills

L. Szalay *a, Z. Tóth b and E. Kiss a
aEötvös Loránd University, Faculty of Science, Institute of Chemistry, Pázmány Péter sétány 1/A, H-1117 Budapest, Hungary. E-mail: luca@caesar.elte.hu
bUniversity of Debrecen, Faculty of Science and Technology, Institute of Chemistry, Egyetem tér 1., H-4032 Debrecen, Hungary

Received 9th October 2019, Accepted 31st October 2019

First published on 3rd December 2019


Abstract

An earlier empirical research study, in which ‘step-by-step’ instructions for practical activities were modified so that students had to design one or more steps of the experiments themselves, prompted a longitudinal study to investigate the effectiveness of the approach for younger students and over a period of time. The longitudinal study that followed took the form of a four year research project that began in September 2016. Over 900 students have been involved; all were 12–13 years old at the beginning of the study. Each year they spend six lessons carrying out practical activities using worksheets we provide. This paper reports the findings of the first year, when the participating classes were allocated to one of three groups. Group 1 was the control group: students simply followed the step-by-step instructions. Groups 2 and 3 were experimental groups. Group 2 students not only followed the same instructions, but also had to complete experimental design tasks on paper. Group 3 students followed the same instructions, but one or more steps were incomplete and students were required to design these steps. The impact of the intervention on the students’ experimental design skills, disciplinary content knowledge and attitude toward chemistry was measured by structured tests. After the first school year of the project it was clear that the type of instruction had only a weak, though significant, positive effect on the Group 2 students’ disciplinary content knowledge. No significant effect of the intervention could be detected on the changes in the students’ grades and attitudes toward the subject, which seemed to depend only on the ranking of their schools. This paper provides the details of the results of the first year (pilot) of the research and discusses changes to the approach that have been made for the remaining three years of the project.


Introduction

Achievement goals and their assessment in Hungary

Hungarian chemistry students may be divided into two categories. The progress and achievements of Category 1 students are assessed by their chemistry teachers; there is no external evaluation. Category 2 students are those wishing to attend certain university courses. They sit an externally set and assessed final chemistry examination at the end of secondary school, when they are 18–19 years old. In the last five years Category 2 students constituted just 3% of all students who completed their secondary school course (statistics published by the Educational Authority of Hungary, 2018). It should also be noted that not all students progress to secondary school. The vast majority of Hungarian students are therefore assessed internally, using tests chosen or written by their chemistry teachers and marked by them. In accord with the principles of fair testing, chemistry teachers only ask questions on content they have taught and assess students in ways familiar to their students.

The Hungarian National Curriculum (National Curriculum of Hungary, 2012) sets ambitious goals for achievement in scientific literacy. In the 2018 revision (Plan of the National Curriculum of Hungary, 2018) the goal of being able to plan and design chemical experiments was added. However, there is no mechanism for checking whether or not those goals have been achieved by Category 1 students. A possible approach is for students to do chemistry practicals designed to develop scientific literacy.

This can be time-consuming. Teachers must be convinced that it is worth the effort. Many are reluctant to stray from content knowledge prescribed by the curriculum and are attached to their established textbooks. Providing tried and tested teaching materials, readily and freely available, may alleviate this problem. It is important, however, to recognise that teachers will only use these teaching materials if consideration is given to the conditions and constraints under which they work. New teaching materials must be relatively straightforward to apply to their own circumstances.

The present National Curriculum of Hungary (2012) supports Sevian and Talanquer's statement (2014) that, besides future professional chemists, all responsible citizens need to be scientifically literate. It follows that contemporary science education should produce scientifically literate adults. Further, even those who do not pursue a career in science will benefit from the skills taught in the classroom (Metz, 2004; O'Neill and Polman, 2004; Ford and Forman, 2006).

Despite curriculum revisions, Hungarian students’ PISA (Programme for International Student Assessment) scores in science have decreased. Debate as to why the scores have fallen intensifies whenever the latest PISA results are published (e.g. OECD, 2007; OECD, 2016). Stakeholders in public education, including politicians and journalists, call for changes in the teaching of science. They demand that students are taught knowledge that can be applied in real life situations. Curricula come and go, and various projects are started and finished. Still, systematic and compulsory measurement of the development of competencies in science at the national level does not happen. Chemistry teachers of Category 1 students remain responsible for the approaches they use to teaching, learning and assessment. There is no watchdog.

The PISA 2015 framework (OECD, 2013) states that scientific literacy (SL) requires knowledge of the concepts and theories of science. It also requires knowledge of the common procedures and practices associated with scientific enquiry, together with an understanding of how these procedures and practices enable scientific advancement. (Note: The PISA 2015 Draft Science Framework uses the term ‘enquiry’. Enquiry and inquiry are often used interchangeably. However, the former usually refers to asking about something and the latter to a formal investigation. The PISA 2015 Draft Science Framework also uses ‘competence’ instead of competency. Competence is the ability to do something successfully. Competency is the skill set required for a person to display competence.)

There are three components of scientific knowledge in the PISA 2015 framework: content knowledge (concepts and ideas of science); procedural knowledge (procedures and strategies used in scientific enquiry); and epistemic knowledge (the manner in which ideas are justified in science). In each knowledge area three aspects of SL are assessed: explain phenomena scientifically; evaluate and design scientific enquiry; and interpret data and evidence scientifically (OECD, 2013; OECD, 2016).

Scientific thinking and its development by authentic inquiry

The science education research literature suggests that the problem concerning learning opportunities to gain scientific literacy is not country-specific. Kuhn (2010) asserted that science education does not necessarily involve scientific thinking. Considering learning experiences commonplace in much of science education, she observed that information may be presented or a phenomenon demonstrated, with the questions the new information is intended to answer either left unclear or externally imposed. She argued that students may, in such cases, respond routinely, avoiding scientific thinking completely. She defined scientific thinking as knowledge seeking: any purposeful thinking intended to enhance the seeker's knowledge. Chinn and Malhotra (2001, 2002) argued that if schools do not focus on the ‘core attributes’ of authentic inquiry, then the cognitive processes developed will be very different from those used in real inquiry; moreover, students may develop epistemological understanding that is not just different from, but antithetical to, that of authentic science. However, Zimmermann (2007) quoted Chinn and Malhotra, who also warned that “there is no way to condense authentic scientific reasoning into a single 40 to 50 min science lesson”. Zimmermann concluded that curricula will need to incorporate numerous composite skills, and that further research will be needed to determine in what order such skills should be mastered and which early acquisitions are most effective at supporting the development of subsequent ones. According to Zimmermann (2007), the approach taken to assessing the effectiveness of instructional interventions is also an issue to be resolved. The literature contains a vast array of examples of practical and theoretical approaches to help students acquire and consolidate experimental design skills; their relative merits are the subject of an ongoing debate. In an educational climate that endorses increased standardised testing as one method of accountability, assessments of scientific thinking are subject to discussion and disagreement about whether they are valid measures.

According to Zimmermann's definition (2007), scientific thinking includes the skills involved in inquiry, experimentation, evidence evaluation and inference that lead to conceptual change or scientific understanding. She also quoted other authors (p. 173): “Scientific thinking is defined as the application of the methods or principles of scientific inquiry to reasoning or problem-solving situations, and involves the skills implicated in generating, testing and revising theories, and in the case of fully developed skills, to reflect on the process of knowledge acquisition and change (Koslowski, 1996; Wilkening and Sodian, 2005; Kuhn and Franklin, 2006). Participants engage in some or all of the components of scientific inquiry, such as designing experiments, evaluating evidence and making inferences in the service of forming and/or revising theories about the phenomenon under investigation.”

Inquiry has been characterised in various ways. However, the active involvement of students, engaging them in scientific practices and collaboration, is typically mentioned as a characteristic (Hake, 1998; Pintrich, 2003; Minner et al., 2010). This interpretation embraces the epistemic processes of scientists, including posing questions, designing experiments and argumentation. Holistic learning objectives are also important features. Researchers and practitioners often differentiate types of inquiry. For example, in the classification made by Bell et al. (2005), there are four levels which differ in how much information the teacher provides. In a confirmatory inquiry the research question and procedure are provided; students only have to confirm a previously taught relationship. A structured inquiry is similar, except that students do not know the expected outcome. In a guided inquiry, students are given a research question and have to develop a procedure to solve the problem. In an open inquiry, students come up with their own research question and procedure. The type of inquiry in which the teacher poses the research question but the students must design and carry out the experiment is also called bounded inquiry by Wenning (2007). According to some other interpretations, for example, the definition in a survey report published in the PRIMAS project (PRIMAS, 2013), inquiry only happens when students inquire and pose questions, explore and evaluate, and the problems they address are relevant to them. In this sense, only guided or bounded inquiry and open inquiry could be considered inquiry-based learning.

Instructional methods to learn the process skills of investigation

Kuhn (2010) said that there is a divergence of opinion as to the most productive instructional methods for learning the process skills of investigation and inference that lie at the heart of authentic scientific thinking. The control of variables is central to all investigative work. It is key to experimental design and is a domain-general strategy (Klahr, 2000). Klahr and colleagues (Chen and Klahr, 1999; Klahr and Nigam, 2004) focused on single-session direct instruction, specifically of the control-of-variables strategy. Other studies engaged children in the practice of scientific inquiry over longer periods of time (Kuhn and Pease, 2008; Lehrer et al., 2008). Direct instruction about controlling variables was effective, but the effect diminished over time, as did the ability to transfer the knowledge to other situations. The Dean and Kuhn (2007) study showed that sustained practice with problems requiring the strategy overcame issues of knowledge retention and transfer. Further, they found that a group engaged in practice alone, with no direct instruction, performed as well after several months as the group who in addition had received direct instruction. Kuhn and Phelps (1982), in their investigation using fourth- and fifth-graders, found that successful students understood the principles of experimental planning and could apply them. The researchers speculated that students who considered and discarded invalid strategies and retained valid ones achieved some level of metastrategic understanding. Without any direct instruction but with frequent practice, half of the students were able to produce successful solutions consistently, and these students were more likely to employ valid experimentation strategies and to understand why such strategies were effective. According to Øyehaug and Holt (2013), it may be a necessary step for students to develop fragmented and incomplete understanding and draw wrong conclusions in the learning process; it is not a problem if students later restructure and reorganise their knowledge structures. Bullock and Ziegler (1999) used story problems that involved manipulating a number of variables to determine which were important in producing a particular outcome. In a longitudinal study to assess the logic of experimentation, children were tested once a year beginning in the third grade, with data reported through to the sixth grade. Children were able to produce tests to compare two things, but it was not until the fifth grade that the production of appropriate controlled tests was evident.

Kanari and Millar (2004) argued that authentic scientific thinking involves an evaluation of primary data sources. Kuhn and Ho (1980) examined children's inferences from data they collected themselves or from second-hand data already collected by another child. Children evaluating second-hand data did make progress, but not to the same extent or at the same speed as children who conducted the experiments. Kuhn and Ho suggested that an ‘anticipatory scheme’ that results from designing and generating data may be responsible for the differences in progress. In another empirical research study, an awareness of one's memory limitations and the need to keep records appeared to emerge between the ages of 10 and 13 and was directly related to successful performance (Siegler and Liebert, 1975).

Lehrer et al. (2001) suggested that experimentation can (or should) be thought of as a form of argument. Therefore, the experiment should be aligned more closely with the idea of modelling than with the canonical method of investigation. According to Simon (2001), students need to learn science in contexts in which they are able to find patterns in the world, where curiosity and surprise are fostered; such contexts would be ‘authentic’. This might also influence the students’ attitudes towards science. Attitude toward science is important, since it determines the level of interest, sustains engagement and motivates taking action (Schibeci, 1984).

Constraints to conducting inquiry

Data collected and evaluated at the time of the survey mentioned above (PRIMAS, 2013) showed that Hungary belonged to the group of countries where teachers were less positively oriented towards inquiry-based learning and saw system restrictions as more of a hindrance than did their colleagues of some other nationalities. Several reasons can be found in the literature as to why inquiry is not used as frequently as many educational experts suggest. One of the difficulties is the assessment of such activities (e.g. Zoller, 2001; Branan and Morgan, 2010; Briggs et al., 2011; Sevian and Talanquer, 2014; Crujeiras-Pérez and Jiménez-Aleixandre, 2017). Another is teachers’ beliefs (e.g. Cheung, 2011; Herrington et al., 2011). Cheung (2011) wrote that some non-users may believe guided-inquiry labs are not feasible in their chemistry classes due to constraints such as large class size, lack of time and the need to prepare students for public examinations. Another belief is that inquiry might lead to less robust comprehension of science concepts than more traditional teaching strategies (Criswell, 2012). Herrington et al. (2011) found that in most cases substantial changes to classroom instruction did not occur until after the materials adaptation component (when teachers were asked to try to develop an activity to introduce a topic, as opposed to one that requires students to apply many concepts in designing a procedure). Boesdorfer and Livermore (2018) pointed out that while most teachers use laboratory activities regularly at current funding levels, monetary and time expenses were shown to influence the specific choice of laboratory activity.

Previous results

In our earlier empirical research project (Szalay and Tóth, 2016), step-by-step instructions for two relatively simple and inexpensive experiments, well known to the teachers, were modified into practical activities requiring one or more steps to be designed by the students. All students did the same experiments. However, the control group students were given full step-by-step instructions, while the experimental group students were given incomplete instructions and asked to design the missing step or steps. The impact of the intervention was measured by tests taken before and after the intervention, the results of which were statistically analysed. A positive change in designing skills was observed in both the control group and the experimental group. However, the experimental group's scores were significantly higher.

Research questions (RQ)

The hypothesis emerging from the previous results described above was that it is useful to modify step-by-step practical laboratory activities into ones where the same experiments have to be partially designed by the students. As a result of the intervention, students might learn and consolidate experimental design skills that are important components of scientific literacy. It was also assumed that this method is valuable even if it is used only a few times in each of the four school years in which Hungarian students learn chemistry. On the other hand, the importance of actually carrying out the designed experiments was questioned: it might be enough to solve experimental design tasks on paper. However, no students should be deprived of carrying out experiments, since that would be bad practice and therefore morally unacceptable (Taber, 2014). The following question was raised: if another experimental group of students not only follows step-by-step instructions while doing experiments (in the same way as the control group), but also receives paper-based tasks to design experiments (‘theoretical experimental design’), how would their experimental design skills develop compared to those of the students who carried out the experiments they designed (‘experimental design in practice’)? We wanted to study the long term impact of these limited interventions on the development of the students’ experimental design skills (EDS), disciplinary content knowledge (DCK), and attitude toward the subject and scientific experiments. It was necessary to assess and compare the effectiveness of the two instructional interventions in randomised controlled trials in which the control group did not design experiments, but only followed step-by-step instructions to carry out the experiments. Accordingly, the following research questions were composed.

RQ1: Is there any significant change in the students’ ability to design experiments (experimental design skills, EDS), in any of the experimental groups compared to the control group, as a result of the intervention?

RQ2: Do students in the experimental groups achieve significantly different scores on questions that assess disciplinary content knowledge (DCK) than students in the control group do, as a result of the intervention?

RQ3: Does it matter if the students carry out the designed experiments, or does designing the experiments in theory (a paper-based activity) have a similar effect?

RQ4: Does the intervention change how much the students like science/chemistry, how important they think experiments are for testing an idea in science, and whether they prefer step-by-step experiments to ones they must design?

It was also thought that if either (or both) of the types of instruction had a positive effect, the ready-made teaching materials used for the research would provide significant help for teachers in developing experimental design skills, given the constraints and conditions under which teachers work.

Research method and research design

The main goal of our empirical research was to try out the two different types of practice opportunity described under the heading ‘Research questions (RQ)’ and compare their results with those of students who only carry out ‘step-by-step’ experiments. The long term plan is to carry out longitudinal research over four school years, which started in September 2016 in Hungary. The research group consists of twenty-four in-service chemistry teachers, five university chemistry lecturers and several pre-service chemistry teachers (four have been involved). In each school year of the four year research project six student worksheets and teacher guides are written and piloted (twenty-four altogether over the duration of the project). At the start of the study (September 2016) the participating 12–13-year-old students (seventh graders) took the first test (called Test 0). At the end of each school year they also take tests (Test 1 at the end of the first school year, Test 2 at the end of the second school year, etc.). In the present study the results of the first year of the four year research project are published as the outcome of a pilot of the research method.

The research model is summarised in Fig. 1.


Fig. 1 Research model of the first year (pilot) of the four year research project.

For each group the intervention took eight chemistry lessons in the school year 2016/2017. Teachers could choose the dates on which the lessons would take place. In the first lesson students took Test 0. In the next six lessons they followed their student worksheets to complete the practical activities. Students took Test 1 in the eighth (final) lesson.

Sample

The students came from thirty-one classes in eighteen Hungarian secondary schools, taught by twenty-four teachers. These teachers were voluntary participants. Many had participated in a professional development programme some years before the empirical research project started. Participating students must remain in the same school for the four years of the project. This restricts participation to students in schools that teach chemistry from grade 7 to grade 10.

Students are selected for entry to these schools by an entrance examination. Those not selected remain in their primary schools and may sit a further entrance examination at the age of 14. Therefore, the students participating in the project represent a sample of the higher achieving students rather than the whole school population. This is unfortunate, but there is no practical way to follow for four years the development of the knowledge and skills of students who change school at the age of 14.

In total, 920 students were involved, all 12–13-year-old 7th graders. The students’ classes were assigned randomly to the following three groups:

Group 1: carry out step-by-step experiments only.

Group 2: carry out the step-by-step experiments as Group 1 and do theoretical experimental design tasks on paper.

Group 3: carry out the step-by-step experiments as Group 1, but with one or more steps missing. Students are required to design the missing steps and complete the experiment. For example, they had to choose the necessary materials and equipment from those provided, decide how to identify and control the variables, or determine the right order of the steps (or some combination of these; see Table 1).

Table 1 Topics, student worksheets and teacher guides used in the school year 2016/2017 and the experimental design tasks given to the Group 2 and Group 3 students. For each topic, ‘Experiment’ is the practical that Group 1 and Group 2 students carried out following step-by-step instructions, but that Group 3 students had to plan before doing; ‘Design tasks (Group 2)’ are the paper-based exercises completed by Group 2 students after following the step-by-step instructions for the same experiments that Group 1 students did.

1. The particle model of matter
Experiment: Students were given some water-based paint. They had to determine whether the paint particles move faster in cold water or in hot water. The Group 3 students were given a list of the materials and equipment that were available and could be used.
Design tasks (Group 2): 1. Students were asked to design an experiment to show that, as well as the paint particles, water particles also move faster at higher temperatures. They were told that there is a way in which water particles can be marked, enabling their whereabouts to be tracked. 2. Students were asked to design an experiment to find out whether particles move faster in hot air than in cold air.

2. Physical and chemical properties of matter
Experiment: When baking powder becomes moist, carbon dioxide gas is produced, which helps make cookies rise. Students had to find out from which two components of baking powder the carbon dioxide gas is produced.
Design tasks (Group 2): 1. Students were told that, in the old days, some people believed baking powder is more effective at making cookies rise when it is dissolved in vinegar. They were asked to design an experiment to investigate this claim. 2. Baking soda is also used to make cakes rise, because carbon dioxide gas is produced when it is heated strongly. Students were asked to design an experiment to support this statement.

3. Dissolution and bonding
Experiment: Students were asked to determine whether iodine is more soluble in oil than in water. Group 3 students were given a list of the materials and equipment that were available and could be used. Oil was not among the materials, but petrol was provided. (Students knew from the preliminary experiments that oil mixes with petrol.)
Design task (Group 2): Students were reminded of the coloured layers in goulash soup. They were asked whether red paprika powder is water soluble or oil soluble, and to design an experiment to test their idea.

4. Composition of solutions
Experiment: Rum was added to the chocolate sauce of the ‘Gundel’ pancake and then ignited at the table. The alcohol in the rum burnt with a blue flame and made the dish tasty. Students were told to assume that half of the volume of the chocolate sauce was rum, and had to determine the minimum concentration of rum needed to set the chocolate sauce alight.
Design tasks (Group 2): 1. Students were told that it is often important to know the concentration of a solution. A saline drip (a 0.9% by mass solution of salt) was given as an example, and it was explained that solutions of higher or lower concentration may be dangerous, even lethal. Students were asked to design an experiment to determine the concentration of a salt solution. 2. Students were told that the sugar in grapes and other fruits is changed into alcohol by the fungi in yeast, which produce carbon dioxide; however, the fungi die if the alcohol content is too high. They were asked to plan an experiment to determine the maximum alcohol concentration in which the yeast fungi can grow.

5. Separation of mixtures
Experiment: Students were reminded of the Cinderella fairy tale, in which the wicked stepmother threw lentils into the ashes of a fire and made Cinderella pick them out. They were asked to imagine an even odder stepmother who mixed iron filings, copper sulfate powder, sand and mustard seed. Students had to separate this mixture into its components.
Design tasks (Group 2): 1. Students were told that the main components of ‘Vegeta’ (a stock cube) are dried vegetables, salt, flavouring (insoluble in water) and food colouring. They were asked to use their knowledge of separation techniques to devise a method for separating the main components of ‘Vegeta’. 2. Students were told that ‘Vegeta’ also contains fats and oils. They were asked to plan a method for separating these substances from ‘Vegeta’.

6. Identification of materials
Experiments: 1. Students were given three dark grey powders (zinc, graphite and iodine), dilute hydrochloric acid and petrol. They had to use the two liquids to identify the powders. 2. Students were given baking soda, powdered Hyperol (an adduct of hydrogen peroxide with urea) and caustic soda flakes and asked to make a solution of each in distilled water. They had to determine which test tube contained which substance.
Design task (Group 2): Students were asked to name a black or dark grey powder and a white powder not used in the step-by-step experiments; that is, they had to choose materials other than zinc, graphite, iodine, baking soda, powdered Hyperol and caustic soda. They had to design experiments that would enable them to identify the black/dark grey powder and the white powder using any physical or chemical processes.


Group 1 is the control and Groups 2 and 3 are the experimental groups.

Class sizes varied between 15 and 36 students, reflecting typical class sizes in Hungarian schools. The method used to produce random samples was straightforward. The classes were numbered and each number was written on a piece of paper. These pieces of paper were folded and placed into a box, then drawn out randomly to assign the classes to the three groups described above (Group 1, Group 2 and Group 3). Some teachers participated with only one class, whereas others participated with two classes. If a teacher had two classes of students participating in the research and one class had been chosen randomly to be in Group 1, the other class would be in Group 2 or Group 3. This choice was also random.
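
Purely as an illustration, the drawing procedure behaves like the following sketch; the helper name assign_groups, the class labels and the teacher mapping are hypothetical, not part of the study's materials.

```python
import random

def assign_groups(classes, teacher_of, seed=None):
    """Shuffle the class labels and deal them round-robin into Groups 1-3,
    re-drawing whenever two classes of the same teacher land in the same
    group (the constraint described in the text)."""
    rng = random.Random(seed)
    while True:
        order = list(classes)
        rng.shuffle(order)
        group_of = {c: i % 3 + 1 for i, c in enumerate(order)}
        # One (teacher, group) pair per class means no teacher is repeated
        # within a group; otherwise draw again.
        pairs = {(teacher_of[c], g) for c, g in group_of.items()}
        if len(pairs) == len(classes):
            return group_of

# Hypothetical usage: teacher 'T1' has two classes, so they are forced
# into different groups.
print(assign_groups(["7A", "7B", "7C"], {"7A": "T1", "7B": "T1", "7C": "T2"}))
```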

Teachers had written permission from their school principals to participate. Students’ parents gave their written permission for their children to participate in the research. Teachers explained to the students that the test results would not count in their school chemistry assessment, but that they would be participating in work that aims to improve the teaching of chemistry.

Student worksheets

In the first year of the four year research project, six student worksheets and teacher guides describing practical activities (each planned for a 45 minute chemistry lesson) were written, one set of six for each group (see the English translation of the 1st student worksheet and teacher notes in Appendix 1). Each activity of the instructional intervention described by the student worksheets is built on students’ experiments. The worksheets were piloted with students working in teams, because scientific thinking is most often social in nature, rather than a phenomenon that occurs only inside people's heads (Kuhn, 2010).

The topics, together with the experimental design tasks given to the Group 2 and Group 3 students on the student worksheets, were matched to the curriculum (Table 1). These topics were chosen because the participating students in each school will have covered the knowledge content by the end of the project's first year (pilot). The six student worksheets provided important disciplinary content knowledge (DCK) that each seventh grader should learn in chemistry lessons and that was needed to solve the experimental design tasks (aiming to develop experimental design skills, EDS).

Each experimental design task was a problem-solving task related to the given topic of the lesson. These were not verification type experiments or confirmatory inquiry, but could be called guided inquiry (Bell, Smetana and Binns, 2005) or bounded inquiry (Wenning, 2007) tasks, since students were given a research question and had to develop a procedure to solve the problem. All of them were relatively complex, because our aim is to develop the students’ experimental design skills such that they can tackle any experimental design task for which they have the necessary background knowledge. These skills are needed in real life situations, for example, when differentiating between science and pseudoscience and when deciding if an experiment or investigation is capable of producing evidence to support a theory or hypothesis.

Csíkos et al. (2016) produced a framework to assess students’ scientific inquiry skills. It was used in the SAILS project (SAILS, 2015) and was based on models published earlier by Fradd et al. (2001) and Wenning (2007). Csíkos et al. (2016) defined and assessed the following components of experimental design skills: identification and control of variables (including the principle of “how to vary one thing at a time” or “other things/variables held constant”); choosing equipment and materials; and determining the correct order of the steps of the experiment.

The experimental design tasks presented to the students in our present research also focused on the components defined by Csíkos et al. (2016). It was decided not to give direct instruction on the control of variables, assuming that without any direct instruction but with frequent enough practice students eventually attain some level of metastrategic understanding (Kuhn and Phelps, 1982). This is supported by the fact that the production of appropriate controlled tests was evident for fifth grade students in Bullock and Ziegler's experiment (1999), and we started to work with seventh graders (12–13 years old). Piaget's formal operational stage starts around the age of 12 (Cole and Cole, 2006), so it was reasonable to assume that these students could solve experimental design tasks. After doing this several times in concrete cases, they could be expected to understand and generalise the main principles of experimental design. However, the need to keep records appears to emerge only between the ages of 10 and 13 (Siegler and Liebert, 1975). Therefore, each student worksheet asks the students to write a description of the designed experiment, to keep a record of their observations and measurements, and to write down their interpretations and conclusions.

In addition to the components of experimental design skills defined by Csíkos et al. (2016), the experiments were intended to help the students understand the basics of qualitative and quantitative analysis, algorithmic thinking, the types of measurement errors and how to reduce them, and the idea of modelling a real-life situation by designing an experiment.

Different types of tasks were given to the two experimental groups of students to investigate whether it is important to carry out the designed experiments, or whether it is just as (or even more) efficient to design experiments on paper. Concerning the development of data evaluation skills, Kuhn and Ho (1980) suggested that designing experiments and generating data might be the better approach, because children who conducted the experiments made more progress than those who evaluated second-hand data. Therefore, students in one of the experimental groups (Group 3) had to design and carry out experiments (by guided inquiry, according to Bell et al., 2005; or bounded inquiry, according to Wenning, 2007), once it was ensured that they had the necessary knowledge and skills (both theoretical and practical), which is important according to Bruck and Towns (2009). Applying open inquiry, in which students come up with their own research question and procedure (Bell et al., 2005), might have reduced the chances that teachers could fit the activities into their normal classroom practice.

According to Simon (2001), students need to learn science in contexts in which they are able to find patterns in the world and where curiosity and surprise are fostered. The student experiments chosen for the instructional intervention therefore had to fit into the curriculum but also needed interesting contexts. The student worksheets were thus composed so that the topics appeared in an interesting, or at least everyday, context that presumably seemed relevant to the students. The hope was that the intervention would positively influence the students’ attitudes toward chemistry (Schibeci, 1984).

While planning the activities and the assessment of the intervention, monetary and time expenses had to be considered, according to the teachers participating in the empirical research project. If these constraints were ignored, the tried and tested activities would not be widely used (Boesdorfer and Livermore, 2018) once this research project finished, even if they proved to be effective in developing experimental design skills.

Detailed teacher guides were also provided for each student worksheet (see the links to their Hungarian versions in Appendix 1).

Tests

The effects of the different types of instructional intervention needed to be shown by randomised controlled trials. Statistical tests were necessary to determine whether the differences in outcomes under the different conditions were more likely due to the difference in the types of intervention than to other incidental or accidental factors. Class size, socioeconomic factors and teaching style could not be controlled. However, sufficient teachers and classes were involved that randomisation was likely to lead to three heterogeneous groups containing similar mixtures of teachers and classes (Taber, 2013).

From the results of tests taken before and after the intervention, we hoped to compare the effects of the different types of intervention in all the important aspects listed under the heading ‘Research questions (RQ)’ (experimental design skills, disciplinary content knowledge and attitude). Each test had to have the same number of measurable items to assess the various levels of disciplinary content knowledge and experimental design skills.

To measure the development of the experimental design skills (EDS), we used problem solving tasks that required the application of the components of experimental design skills defined by Csíkos et al. (2016). The EDS tasks needed to be different in each test for three reasons. First, the chances of successfully solving a task would be higher if it were used a second time, since students might discuss it with others in the meantime. Secondly, the goal of the research is to develop experimental design skills that may be applied under circumstances different from those of the intervention; it was necessary, therefore, to show that the transfer had happened successfully. Thirdly, the EDS tasks had to be put into contexts relevant to the students’ previously gained knowledge, which increases over time. Only the disciplinary content knowledge (DCK) given in the National Curriculum of Hungary (2012) could be assessed on the tests. The DCK covered also changed from test to test, for two reasons. The first is the same as in the case of the EDS tasks. The second is that the DCK learnt by the students increased and broadened over time.

The different levels of the DCK had to be represented on the tests. The DCK test questions were structured according to the first three categories of the Cognitive Process Dimension of the Revised Bloom Taxonomy: Remember, Understand and Apply (Krathwohl, 2002). It was not possible to cover all four categories of the Knowledge Dimension of the Taxonomy Table: this would require far more items than can be fitted into a 40 minute test. (The tests could only contain items that students could solve in 40 minutes, since lessons last only 45 minutes in Hungarian schools.)

The EDS test questions were intended to measure higher order cognitive skills (HOCS; Tomperi and Aksela, 2014) that represent the other three categories of the Cognitive Process Dimension of the Revised Bloom Taxonomy (Analyse, Evaluate, Create). In our view, the correct design of even a relatively simple but unfamiliar experiment provides evidence that a student is thinking in a scientifically correct way (Szalay and Tóth, 2016).

Students had 40 minutes to complete Test 0 (see its English translation in Appendix 2) and 40 minutes for Test 1 (see its English translation in Appendix 3). Students were coded so that their teachers knew the students’ identities, but the researchers did not.

Each test consisted of eighteen compulsory items, each worth 1 mark. Nine were intended to assess experimental design skills (EDS). The others were intended to assess students’ disciplinary content knowledge (DCK), with three items each for recall, understanding and application (according to the Revised Bloom Taxonomy).

There were additional questions that were not marked but provided the researchers with useful information. One question (5-point Likert scale) was designed to assess the students’ attitudes toward the subject called ‘science’, learnt in the previous two school years, on Test 0, and toward chemistry on Test 1. Another question (5-point Likert scale) was intended to measure how important the students thought it is to test ideas in science by doing experiments. Students were also asked to give their gender and their mark in science at the end of the previous school year on Test 0, and their half-term mark in chemistry on Test 1. There was an extra question (5-point Likert scale) on Test 1 concerning the students’ preference for carrying out step-by-step experiments versus experiments they are required, in part, to design.

Test 1 contained DCK and EDS tasks that could be answered after students had completed the tasks on the six student worksheets provided in the first year of the project. The diversity of the EDS tasks reflects our notion that a metastrategic understanding of experimental design skills should be developed by completing the tasks given on the student worksheets. (As mentioned before, the main goal of the research is to develop the students’ ability to design any experiment, provided that they have the necessary theoretical and practical knowledge.) The following tasks were used to compare the development of the Group 1, Group 2 and Group 3 students’ experimental design skills.

Task 1 (Test 0 question 2.a): The volume of water increases when it is frozen. How can you determine how many times bigger the volume of ice is than the volume of water? Choose the equipment and the materials you need from the following items. (Note: you do not need all of them.) Describe how you would do the experiment and the calculation.

water

salt

ice cubes

ice cube tray

freezer

ruler

permanent marker

string

volume measuring vessel

glass jar (cylindrical, without a top)

spoon

Task 2 (Test 0 question 2.b): How could you increase the accuracy of your measurement compared to the one you described above?

Task 3 (Test 0 question 6.a): To determine the energy stored in a food, a sample of the food is burned (combusted) and the energy released is measured. The energy content of the walnut, for example, can be determined by burning a piece of it and using the flame to heat some water. We know how much energy is needed to increase the temperature of 1 kg of water by 1 °C. What quantities must be measured to be able to calculate the energy content of the walnut?

1st quantity:

2nd quantity:

3rd quantity:

Task 4 (Test 0 question 6.b): Other factors can affect the result of the measurement (that shows how much energy had been released). Name one or more of these factors.

Task 5 (Test 0 question 6.c): The energy content determined by the experiment is less than the true value. Explain why.

Task 6 (Test 1 question 2.a): Sea salt is made from seawater. Seawater is left to evaporate in sandy reservoirs. The solid that can be collected is salt contaminated by sand. For further treatment it is important to know what mass of salt is contained in 100 g of the sand-contaminated salt. How could you separate the salt from the sand and determine the mass of the purified salt? Write down the steps of the process you design.

Task 7 (Test 1 question 2.b): Give a possible error that would mean the measurement as described would not be accurate.

Task 8 (Test 1 question 7.b): In a messy household the following substances are kept in unlabelled boxes: tartaric acid, caustic soda (NaOH), powdered Hyperol and baking soda. We want to decide which substance is stored in which box. An aqueous solution of each substance has been made. Plenty of clean test tubes are available, as is phenolphthalein indicator. Samples taken from the solutions can be added to one another. Write a plan listing the steps you would take to identify the substances. Record your expected observations and write your conclusions.

Validity

The question arises as to whether or not experimental design skills can be assessed effectively using written tests. Early PISA tests used such an approach (e.g. OECD, 2007) and in PISA 2015 the primary mode of assessment was computer-based written tests (OECD, 2017).

The research project started on 1st September 2016, when the school year began. Using Test 0 we wanted to assess the participating students’ knowledge of chemistry gained while studying science in the previous two years. Therefore, the research group prepared the first version of Test 0, with marking instructions, during the summer holiday. Each member of the research group (the participating teachers and university educators) had the opportunity to send their opinion of the test to the research group leader. Using this feedback, a revised test and marking instructions were produced. Two 12–13-year-old students, not participating in the project, were asked to complete the test. Following a discussion of the results by the research group, the test and marking instructions were refined further. (Only two students were available as it was holiday time; nonetheless, their input was useful.)

The first version of Test 1, with its marking instructions, was produced and corrected in response to suggestions from the university educators in the research group. Participating teachers did not see the test before the piloting of the six student worksheets, to avoid the tasks on Test 1 influencing the pilot. However, Test 1 was tried with three classes (N1 = 26, N2 = 20, N3 = 20; 66 students altogether) not participating in the research. The test and marking instructions were further revised in response to the results of this trial.

Participating teachers marked the students’ completed tests (Test 0 and Test 1), recording the marks in an Excel spreadsheet according to the instructions provided (see Appendices 2 and 3). A pre-service chemistry teacher and the research group leader compared the teachers’ marking and corrected the marking instructions again. Then the pre-service chemistry teacher corrected the marking of each test, to make sure that the marking process was unified and free from individual decisions made by the teachers.

Statistical methods

The results were analysed statistically. It was assumed that, apart from the three types of instructional method used during the intervention, other parameters and covariates had also influenced the results. Therefore, the statistical analysis of the data was accomplished by analysis of covariance (ANCOVA) using the SPSS Statistics software.
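
All analyses in the study were run in SPSS. Purely as an illustration of the kind of model being fitted, an equivalent ANCOVA can be sketched in Python with statsmodels as follows; the file name and all column names are hypothetical, and Type III sums of squares under the default treatment coding may not reproduce the SPSS values exactly.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical flat file of the coded test results; the column names are
# illustrative, not the project's actual SPSS variable names.
df = pd.read_csv("test_scores.csv")

# ANCOVA for the Test 1 total score: group and background parameters as
# categorical factors, Test 0 total score as a covariate.
model = ols(
    "t1_total ~ C(group) + C(mother_degree) + C(school_rank)"
    " + C(gender) + t0_total",
    data=df,
).fit()
table = sm.stats.anova_lm(model, typ=3)  # Type III sums of squares, as in SPSS

# Partial eta squared (PES) per source: SS_effect / (SS_effect + SS_error).
ss_error = table.loc["Residual", "sum_sq"]
table["partial_eta_sq"] = table["sum_sq"] / (table["sum_sq"] + ss_error)
print(table.drop(index="Residual"))
```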

In total, 883 students completed Test 0, and 853 students completed both tests (Test 0 and Test 1); only the results of the latter were analysed. The number of students (N) in the groups completing Test 1 was as follows:

Group 1: 291 students

Group 2: 283 students

Group 3: 279 students

The following data were collected and analysed statistically (a schematic layout of one student’s record is sketched after this list):

• Student total scores for Test 0 and Test 1.

• Student scores (marks) for experimental design skills (EDS) tasks (Test 0 and Test 1).

• Student scores (marks) for disciplinary content knowledge (DCK) tasks (Test 0 and Test 1).

• Gender of the student.

• Grades. Student end-of-school year grade for science (Test 0) or end-of-semester grade for chemistry (Test 1).

• Answers to Attitude Question 1 (AQ1), ‘Enjoyment of subject’. How much the student likes science (Test 0) or chemistry (Test 1) on a 5-point Likert scale (0 point: does not like it, 4 points: likes it very much).

• Test 0 and Test 1. Answers to Attitude Question 2 (AQ2), ‘Importance of experiments’. How important the student thinks it is to test scientific ideas experimentally on a 5-point Likert scale (0 point: unimportant; 4 points: very important).

• Test 1. Answers to Attitude Question 3 (AQ3), ‘Preference of step-by-step experiments’. How much the student prefers step-by-step experiments to those he/she has to design, on a 5-point Likert scale (0 point: dislikes; 4 points: likes).

• School ranking. The student's school ranking amongst the Hungarian secondary schools, according to the website “legjobbiskola.hu”. The participating schools were grouped into high, medium and low ranking categories and a categorical variable was used according to these three levels (Appendix 4, Table 10). This allowed a statistical assessment of the impact of participating schools’ ‘quality’ on the development of the students’ knowledge and skills.

• Mother's education. Two categories were formed depending on whether or not the student's mother had a degree in higher education. This categorical variable was intended to characterise the student's socioeconomic status.
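
To make the coding above concrete, the sketch below lays out one student’s record as it might look for the analysis. It is a minimal illustration only: every field name is hypothetical rather than the project’s actual spreadsheet header.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudentRecord:
    """One row of the analysis data set; all field names are illustrative."""
    student_code: str      # anonymised code: teachers knew identities, researchers did not
    group: int             # 1 = control; 2 = design on paper; 3 = design in practice
    gender: str
    grade: int             # end-of-year science grade (Test 0) or half-term chemistry grade (Test 1)
    t0_total: float        # total score on Test 0 (%)
    t0_dck: float          # DCK sub-score on Test 0 (%)
    t0_eds: float          # EDS sub-score on Test 0 (%)
    t1_total: float        # the corresponding Test 1 scores
    t1_dck: float
    t1_eds: float
    aq1: int               # AQ1 'Enjoyment of subject', 0-4 Likert
    aq2: int               # AQ2 'Importance of experiments', 0-4 Likert
    aq3: Optional[int]     # AQ3 'Preference of step-by-step experiments', Test 1 only
    school_rank: str       # 'high', 'medium' or 'low' ranking category
    mother_degree: bool    # whether the mother holds a higher education degree
```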

Analysis of the students’ scores

It was assumed that the students’ gender, the ranking of their school and their socioeconomic status (characterised by their mother's education) might influence the extent of the development of the students’ knowledge and skills. Therefore, the effects of these parameters on the total scores achieved on Test 0 (T0total), together with the group in which the student was placed, were examined by ANCOVA. It was found that the students’ gender did not have a significant effect. However, the school ranking and the mother's education both had a significant effect on the students’ achievement on Test 0. The total scores of the groups (Group 1, Group 2 and Group 3) were also found to be significantly different (Appendix 4, Table 11).

Therefore, the matched pair method was used to create reduced samples of Group 1, Group 2 and Group 3 that were not significantly different in any of the parameters mentioned above (Appendix 4, Table 12). In this way the total number of students (N) was reduced from 853 to 510.
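
The paper does not spell out the matching algorithm, so the following is only one plausible sketch of the matched-sample idea, reusing the hypothetical column names introduced above: within each combination of the background parameters, the three groups’ students are ranked by their Test 0 total score and equal numbers of rank-matched students are kept from each group.

```python
import pandas as pd

def reduce_by_matching(df):
    """Within each stratum of the background parameters, rank the three
    groups' students by Test 0 total score and keep the same number of
    rank-matched students from each group, discarding the surplus."""
    kept = []
    for _, stratum in df.groupby(["school_rank", "mother_degree", "gender"]):
        per_group = [g.sort_values("t0_total") for _, g in stratum.groupby("group")]
        if len(per_group) < 3:
            continue  # a group is missing from this stratum: nothing to match
        n = min(len(g) for g in per_group)
        kept.extend(g.head(n) for g in per_group)
    return pd.concat(kept, ignore_index=True)
```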

Repeating the ANCOVA analysis on the reduced sample (N = 510), only the mother's education had a significant effect on the students’ total scores on Test 0 (Appendix 4, Table 13). Each subsequent ANCOVA analysis was made on a sample reduced by the matched pair method.

In the next analysis, the total scores on Test 1 (T1total) were the dependent variable. The parameters were the group, the mother's education, the school ranking and the gender. It was assumed that, apart from the parameters mentioned above, the following covariates might have influenced the results of Test 1:

• The total scores achieved on Test 0 (T0total).

• Student end-of-school year grade for science, given on Test 0 (called the ‘Grade’).

• How much the student liked science, given on Test 0 (called the ‘Enjoyment of subject’).

• How important the student thought it is to test scientific ideas experimentally, given on Test 0 (called the ‘Importance of experiments’).

Analysis showed no significant effect of any of the covariates mentioned above (including T0total) on the students’ total scores on Test 1 (Appendix 4, Table 14). Therefore, the effects of these covariates were not examined in the repeated analysis of the students’ total scores on Test 1, so that they would not affect the results. Only the effects of the following basic parameters were examined: group, mother's education, school ranking and gender.

The same approach was taken with the analyses of the students’ scores on the DCK tasks and on the EDS tasks. As presented in Appendix 4, Table 13, only the mother's education had a significant effect on the students’ scores on both the DCK tasks of Test 0 (T0DCK) and the EDS tasks of Test 0 (T0EDS). It was found that the only covariate that had a significant effect on the students’ scores on the DCK tasks of Test 1 (T1DCK) was T0DCK. For the students’ scores on the EDS tasks of Test 1 (T1EDS), none of the covariates, and among the parameters only the school ranking, had a significant effect (Appendix 4, Table 14). Therefore, the effect of T0DCK and the parameters were examined in the repeated analysis of the students’ scores on the DCK tasks of Test 1, whereas only the effects of the parameters were examined in the repeated analysis of the students’ scores on the EDS tasks of Test 1.

Analysis of the changes in students’ attitudes and grades

The differences between the students’ answers on the two tests (Test 1 minus Test 0) were analysed as continuous variables applying ANCOVA, with the exception of AQ3 (‘Preference of step-by-step experiments’), for which the dependent variable was the value (5-point Likert scale) given by the student on Test 1. The parameters were the group, school ranking, mother's education and gender in each case. When AQ3 (‘Preference of step-by-step experiments’) was the dependent variable, the covariates were T0total, the value given by the student (5-point Likert scale) for the ‘Enjoyment of subject’ (AQ1) on Test 0, the value given by the student (5-point Likert scale) for the ‘Importance of experiments’ (AQ2) on Test 0 and the end-of-school-year grade the student reported having received in science in Grade 6. When the dependent variable was AQ1, AQ2 or the grade, all the others named above as covariates (in the case of AQ3) became covariates (Appendix 4, Table 15). The correlations among the assumed covariates are shown in Appendix 4, Table 16.

Results and discussion

Results according to types of tasks

The results of the model calculations described in the previous section showed only a weak effect of the intervention (see ‘Group’ in Table 2, i.e. the different instructional methods used for the different groups) and of the ranking of the students’ schools (the partial eta squared values, PES, were only 0.014 and 0.020, respectively). It is worth noting that, contrary to the results of the test written at the beginning of the school year (T0total), the mother's education had no significant effect on the total scores of the test written at the end of the school year (T1total).
Table 2 The effects of the assumed parameters (“sources”) on the students’ total scores on Test 1 (T1total) (N = 510)
Sourcea df F Sig. Partial eta squared
a Parameters are called “sources” in SPSS.
Group 2, 503 3.613 0.028 0.014
Mother's education 1, 503 0.087 0.768 0.000
School ranking 2, 503 5.197 0.006 0.020
Gender 1, 503 0.599 0.439 0.001
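
For reference, the partial eta squared (PES) values reported in Tables 2–8 are conventionally computed from the ANCOVA sums of squares as

```latex
\eta_p^2 = \frac{SS_\text{effect}}{SS_\text{effect} + SS_\text{error}}
```

so the PES of 0.014 for ‘Group’ means that roughly 1.4% of the variance in T1total left unexplained by the other sources is attributable to the type of instruction.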


According to the parameter estimates, students of Group 3 had slightly worse results on the whole of Test 1 than students of Group 1 (control), but the difference was not found to be significant. On the other hand, students of Group 2 had slightly better results on the whole of Test 1 than students of Group 1 (control), but this difference was not found to be significant either (see Table 3). However, Group 2 students’ development on the whole test was found to be significantly greater than that of Group 3 students (p = 0.026).

Table 3 The estimated average of the students’ total scores on Test 1 (T1total) according to the type of intervention (‘Group’) and the significance of the difference from the results of the control group (Group 1) (NGroup 1 = 170; NGroup 2 = 170; NGroup 3 = 170)
Group Estimated average (T1total) (%) Significance of difference Partial eta squared
Group 1 37.05
Group 2 40.49 p = 0.081 0.006
Group 3 35.29 p = 0.371 0.002


In the case of the tasks intended to measure disciplinary content knowledge (the DCK sub-test), prior knowledge, characterised by the T0DCK covariate, showed a significant effect in the preliminary model calculations (Appendix 4, Table 14). Therefore, it was taken into consideration in the repeated calculations, along with the assumed parameters. As shown in Table 4, only the type of intervention (‘Group’) had a significant effect, but even that was found to be very weak (PES = 0.029).

Table 4 The effects of the assumed parameters (“sources”) and the T0DCK covariate on the students’ scores on the DCK tasks of Test 1 (T1DCK) (N = 510)
Source df F Sig. Partial eta squared
Group 2, 502 7.419 0.001 0.029
Mother's education 1, 502 0.665 0.415 0.001
School ranking 2, 502 2.412 0.091 0.010
Gender 1, 502 0.139 0.709 0.000
T0DCK (%) 1, 502 3.189 0.075 0.006


According to the estimation of parameters (Parameter Estimates), students of Group 3 achieved worse results on the DCK sub-test of Test 1 than students of Group 1 (control), but the difference fell short of significance (p = 0.087). On the other hand, the students of Group 2 achieved significantly better results on the DCK sub-test of Test 1 than students of both Group 1 (control) and Group 3 (Table 5).

Table 5 The estimated average of the students’ scores on the DCK sub-test of Test 1 (T1DCK) according to the type of intervention (‘Group’) and the significance of difference to the results of the control group (Group 1) (N = 510)
Group Estimated average (T1DCK) (%) Significance of difference Partial eta squared
Group 1 42.51
Group 2 46.98 p = 0.034 0.009
Group 3 38.92 p = 0.087 0.006


In the case of the tasks intended to measure experimental design skills (the EDS sub-test), no covariate showed a significant effect in the preliminary model calculations (Appendix 4, Table 14). Therefore, only the assumed parameters were taken into consideration in the repeated calculations. As shown in Table 6, only the school ranking had a significant effect, and even that was found to be very weak (PES 0.020).

Table 6 The effects of the assumed parameters (“sources”) on the students’ scores on the EDS tasks of Test 1 (T1EDS) (N = 510)
Source df F Sig. Partial eta squared
Group 2, 503 0.631 0.532 0.003
Mother's education 1, 503 0.575 0.449 0.001
School ranking 2, 503 5.232 0.006 0.020
Gender 1, 503 0.964 0.327 0.002


According to the estimation of parameters (Parameter Estimates), students of Group 3 achieved worse results on the EDS sub-test of Test 1 than students of Group 1 (control), whereas the students of Group 2 achieved better results than Group 1; however, the difference was not significant in either case (Table 7).

Table 7 The estimated average of the students’ scores on the EDS sub-test of Test 1 (T1EDS) according to the type of intervention (‘Group’) and the significance of difference to the results of the control group (Group 1) (N = 510)
Group Estimated average (T1EDS) (%) Significance of difference Partial eta squared
Group 1 31.89
Group 2 34.46 p = 0.306 0.002
Group 3 32.17 p = 0.908 0.000


Summarising the results of the statistical analysis of Test 0 and Test 1, it seems that only the type of instructional method (‘Group’) and the school ranking had a significant (albeit weak) effect (Table 8).

Table 8 The effects of the significant parameters (“sources”) on the students’ scores on Test 0 and Test 1 (partial eta squared values) (N = 510)
Source T0total T1total T0DCK T1DCK T0EDS T1EDS
Group – 0.014 – 0.029 – –
School ranking – 0.020 – – – 0.020
Mother's education 0.087 – 0.071 – 0.037 –


The ranking of the students’ schools mainly influenced the results of the EDS sub-test (T1EDS), whereas the type of instructional method (‘Group’) mainly influenced the scores gained on the DCK tasks (T1DCK).

The model calculations show that the type of instruction (‘Group’ parameter) had a weak significant effect mainly on the whole test and on the DCK sub-test (Table 9).

Table 9 The estimated average of the students’ scores on the whole test and on the sub-tests of Test 1 according to the type of intervention (‘Group’) and the significance of difference to the results of the control group (Group 1) (N = 510)
Group T1total (%) T1DCK (%) T1EDS (%)
Group 1 37.05 42.51 31.89
Group 2 40.49 46.98 34.46
Group 3 35.29 38.92 32.17
Significant differences (p < 0.05): T1total: Group 2 – Group 3; T1DCK: Group 1 – Group 2 and Group 2 – Group 3; T1EDS: none


Attitude questions (AQ)

AQ1 (‘Enjoyment of subject’). The participating students had not studied chemistry before the research project began; in the previous two years they had studied science. Therefore, in Test 0 they were asked about their ‘Enjoyment of science’. By Test 1 they had studied chemistry for a school year, so the question asked how much they liked chemistry. The statistical analysis of the answers to the 5-point Likert scale questions (categories 0–4) suggests that students like ‘chemistry’ less than ‘science’ (studied in the previous two school years). This may be because the content knowledge taught in chemistry, physics and biology in grade 7 is much more challenging than that taught in science in grades 5 and 6. The parameter ‘School ranking’ and two covariates (the result of Test 0, T0total, and the ‘Importance of experiments’) influenced the change in the ‘Enjoyment of subject’ significantly (Appendix 4, Table 15). Both covariates had a negative correlation coefficient with the change in the ‘Enjoyment of subject’: overall, the negative change was bigger for students who achieved better on Test 0, and the same was true for the ‘Importance of experiments’. Students who, at the beginning of the school year, considered experiments in science important appeared to show a bigger negative change in the ‘Enjoyment of subject’ by the end of the school year; it could be said that they had more to lose in terms of the ‘Enjoyment of subject’. Students with better results on Test 0 probably understood the importance of experiments in science better and liked the subject more at the beginning of the school year than students with worse results on Test 0. This is supported by the correlation calculations: a medium or strong positive correlation was found among the ‘Grade’, the ‘Enjoyment of subject’ and the ‘Importance of experiments’ (Appendix 4, Table 16). The change in the ‘Enjoyment of subject’ is positive in the high ranking schools and negative in the lower ranking schools, and the differences are significant (Appendix 4, Table 17). It appears that students from the high ranking schools liked chemistry more than they had liked science at the beginning of the school year; this was not true for students in medium ranking schools and was markedly different for students in low ranking schools.
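The correlation calculations referred to above (Appendix 4, Table 16) can be reproduced along the following lines. The column names are hypothetical, and the use of Spearman rank correlation is our assumption for these Likert-type and grade variables; the text does not fix the method.

    import pandas as pd

    df = pd.read_csv("students.csv")  # hypothetical file, one row per student
    # Grade, 'Enjoyment of subject' and 'Importance of experiments' at Test 0.
    cols = ["grade_science_6", "enjoyment_T0", "importance_T0"]
    print(df[cols].corr(method="spearman"))  # pairwise correlation matrix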
AQ2 (‘Importance of experiments’). In both Test 0 and Test 1, the students were asked their view about the importance of testing scientific ideas by experiments. Unfortunately, the students in each group considered the role of science experiments significantly less important after they had studied chemistry for a year than they did when they had studied science (in the previous two school years). Perhaps in grade 7 they encountered more rules and laws presented without empirical evidence than they had in science. This is discouraging, and probably shows that the main message of the chemistry taught in Hungarian schools is missing some important aspects. None of the parameters had a significant effect on this change (Appendix 4, Table 15). In the case of AQ2 the effect of ‘School ranking’ is the opposite of that experienced in the case of AQ1: after learning chemistry for a year, the students of the high ranking schools regarded the role of experiments as less important than the students of the lower ranking schools did (Appendix 4, Table 18).
Grades in science/chemistry. The students received significantly lower marks in chemistry than they had previously received in science. Only the school ranking had a significant effect on this change (Appendix 4, Table 15). The higher the school's ranking, the more the grades decreased. This might be caused partly by the more demanding requirements in higher ranking schools and/or by the fact that students attending higher ranking schools had better marks in science than those attending lower ranking schools (Appendix 4, Table 19).
AQ3 (‘Preference of step-by-step experiments’). The students’ preference for step-by-step experiments over experiments they must design was evaluated in Test 1. It was measured by their response to the statement: “I prefer the step-by-step experiments to the ones that I have to design” on a 5-point Likert scale (categories 0–4). The estimated group averages lie between 3 and 4, showing that each group has a strong preference for step-by-step experiments. No significant effect of the type of intervention could be detected in this respect in any group, by either the ANCOVA analysis or the chi-squared test. According to the ANCOVA analysis the only parameter with a significant influence is the school ranking (Appendix 4, Table 15). The higher the school ranking, the more the students reject designing experiments; however, the difference is only significant between the high and the low ranking schools (Appendix 4, Table 20).
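The chi-squared test mentioned above compares the distributions of the AQ3 responses across the three groups. A minimal sketch, with placeholder counts rather than the study's data:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: Groups 1-3; columns: Likert categories 0-4 (placeholder counts).
    observed = np.array([
        [10, 15, 30, 60, 55],
        [12, 14, 28, 62, 54],
        [ 9, 16, 31, 59, 55],
    ])
    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")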

It may be that Hungarian students do not usually design experiments. Facts, laws and logical explanations of the results of known experiments are the main ingredients of their chemistry curriculum (Chemistry Curricula of Hungary, 2012). Further, the design of experiments may have been considered too challenging for them.

On the whole, no significant effect of the intervention could be detected on the changes in the students’ grades and attitudes; these seemed to depend only on the ranking of their schools. Negative tendencies were detected in both grades and attitudes. The only positive change was that the students of the high ranking schools seemed to like chemistry at the end of the school year more than they had liked science at the beginning of the school year.

Conclusions

Summary of the results and answers to the research questions

A small positive effect of the intervention was detected only for Group 2 students, on the results of the disciplinary content knowledge (DCK) sub-test. The differences between the Group 2 students’ Test 1 and Test 0 results were significantly higher than those of the Group 3 students, on the DCK tasks and on the whole test too. No significant effect of the intervention on the experimental design scores was found for either experimental group.

The answers to the research questions are as follows:

RQ1: No long-term significant change in the 12 to 13-year-old students’ ability to design experiments (experimental design skills, EDS) could be detected in any of the experimental groups compared to the control group, as a result of the intervention.

RQ2: In terms of the disciplinary content knowledge (DCK), designing experiments on paper (Group 2) seemed to have a small, but significant positive effect. Carrying out the designed experiments (Group 3) had no significant effect on the scores the students achieved on DCK tasks.

RQ3: No statistically significant difference was found between the average scores of the students of the experimental groups in the extent of the development of their experimental design skills (EDS).

RQ4: No significant effect of the intervention on the students’ average attitudes and grades was shown by the results of the statistical analysis in any of the groups. Each group's attitude toward chemistry (‘Enjoyment of subject’, AQ1) was more negative than their attitude toward the ‘science’ subject learnt in the previous two school years (when they were grade 5 and 6 students). This is not surprising, since the ‘science’ subject had been much less demanding than the chemistry they studied in grade 7. The results of each group show that after one year of the intervention students consider it less important to use experiments to support scientific statements (AQ2) than they did before starting to study chemistry. It is evident that students need to understand why experiments and experimental design are important in science, and that they need help to learn how to design experiments well. Students much prefer step-by-step experiments to experiments which they must, in part, design (AQ3).

It is disheartening that negative changes in attitudes were detected across the groups. The negative trends in the attitudes of the students (who had been carefully selected by their schools from among the highest attainers in their cohort in Hungary) may suggest that the Hungarian chemistry curriculum is overcrowded and demotivating. Teaching and learning probably concentrate on disciplinary content knowledge. This is supported by the fact that the vast majority of students participating in the project regarded the role of experiments in science as less important by the end of their first year of studying chemistry than they did at the beginning. That happened despite their probably spending more time on student experiments than students in many other Hungarian schools. In addition, they did not only carry out verification-type experiments: new concepts and relationships were introduced with the help of the student worksheets, which could be considered ‘authentic’ inquiry (Simon, 2001). Group 3 students also had to design some steps of the experiments before carrying them out during each practical activity, whereas Group 2 students solved interesting paper-based experimental design problems. Therefore, the blame cannot lie simply with an overloaded curriculum and traditional teaching methods. Six lessons represent some 8–17% of the entire time that these students spent studying chemistry at school in grade 7. Nevertheless, the intervention had no significant positive effect, either on the development of the experimental design skills or on the attitudes of the experimental groups. That was unexpected after the encouraging results of the previous research (Szalay and Tóth, 2016), in which the simple method used for Group 3 in the present project was applied.

At least one explanation for the lack of significant development of the experimental design skills of Groups 2 and 3 students, and for their general rejection of designing experiments, might be that many of the participating 12–13-year-old students could not yet think about multiple variables in systematic ways. Most of those students were probably still in Piaget's concrete operational stage (Cole and Cole, 2006), in which operations are carried out in the presence of the objects and events concerned. Therefore, after solving the concrete experimental design tasks on their student worksheets, they could not generalise the knowledge gained, and so at the time of Test 1 they still did not understand the main abstract principles needed to solve experimental design tasks in general. The conclusion can be drawn that many of these 12–13-year-old students had probably not achieved a satisfactory level of metastrategic understanding through ‘frequent enough practice’, as had been expected on the basis of what Kuhn and Phelps (1982) reported. Although students in Group 3 had to carry out the experiments they designed, they were not more successful in applying their knowledge when solving the experimental design tasks of Test 1 than Group 2 students, and there was no significant difference in this respect between the Group 1 (control) and Group 3 students either (Table 9). This seems to confirm what Sweller (1988) emphasised: inquiry approaches can be less effective than traditional approaches such as direct instruction if the cognitive load placed on students is not properly managed. It could therefore reasonably be assumed that designing and carrying out experiments was simply too much to handle for the 12–13-year-old students in Group 3. However, the present results do not mean that designing experiments in practice might not work better with older students, whose abstract thinking is more developed, as we observed in the previous project (Szalay and Tóth, 2016). But this earlier method needs to be modified, since it does not appear to work well enough for younger students and/or over a longer term.

It is also possible that some of the teachers assigned to the Group 3 students provided some of the missing experimental design steps. Although the teachers were given detailed instructions for the implementation of the experimental design tasks, some of them worked with many students (e.g. 36) in parallel. When certain teams in these classes encountered difficulties and time was running out toward the end of the lesson, some of these teachers may have helped those teams more than instructed. This might be another reason for the lack of significantly different development of the experimental design skills of the Group 3 students (compared to that of the control group).

Hane (2007) reported that even university students’ understanding of experimental design was improved by inquiry activities in which they took increasing levels of responsibility for designing experiments or observational studies; concepts of experimental design were also emphasised in inquiry-based components of the course lecture. Gott and Duggan (1995) warned that not all inquiry-based laboratory tasks are appropriate for engaging students in scientific practices and competencies, as this depends on their structure and requirements. Taber (2011) stated that teachers need to see student learning as an incremental process, and so plan teaching in terms of ‘suitable, regularly reinforced, learning quanta’. Several other authors have also argued that students need scaffolding from the teacher to solve inquiry-type tasks (e.g. Puntambekar and Kolodner, 2005; Blanchard et al., 2010; Crujeiras-Pérez and Jiménez-Aleixandre, 2017). Considering all of these observations and the experiences of the first year of the present research project, it was assumed that it would be a safer and more promising approach to teach the basic principles of experimental design than simply to let the students solve several concrete experimental design problems and expect them to work out the general concepts by themselves. This is also in agreement with Baird's (1990) view that purposeful inquiry does not happen spontaneously – it must be learned.

A discussion with an educational psychologist was initiated to better understand why the first-year intervention had not produced the expected outcome, either in terms of the development of experimental design skills or in terms of the change in attitude. It was suggested that (besides their still insufficiently developed abstract thinking) Prensky's (2001) often quoted statements about ‘digital natives’ have to be considered. The participating students’ preferences might not suit the ‘traditional’, ‘serious’, ‘systematic’ and ‘logical’ ways of thinking and investigating in science. Therefore, according to the educational psychologist's advice, besides more training, students nowadays probably need more patience and motivation than in the past.

There might be other reasons why students participating in the first year of the project preferred the ‘step-by-step’ experiments to the ones they designed. Experimental design obviously requires more effort and thinking than carrying out experiments according to step-by-step descriptions. Bolte et al. (2013) reasoned that ‘achiever’ and ‘conscientious’ students do not like inquiry. It is plausible that the students were afraid of being in control, since they were not used to that situation. As Cheung (2011) wrote, there are always students who feel uncomfortable when asked to plan their experiments. However, according to Deters (2005), these students have to be convinced that inquiry-based practical work is worthwhile, because it gives them an opportunity to find their own answers to problems rather than simply being told how to do something. With this comes a ‘sense of accomplishment’ and pride in a ‘job well done’. By doing this, they also develop scientific and other (soft) skills needed for their lives.

Therefore, it was concluded that this research project should continue, though modified to include the teaching of the main concepts and principles of experimental design. The hope is that this will alleviate the students’ cognitive load, which made experimental design so difficult for them. In this way, with practice and patience, they might acquire the metacognitive understanding necessary to design experiments. This should also be helped by the fact that, with time, more and more of them will reach Piaget's formal operational stage, when they become capable of metacognitive thinking (Cole and Cole, 2006).

Implications and further research

Discussions with experts in psychology and assessment led to the conclusion that a detailed explanation and practice of the main principles of designing experiments (for instance, ‘other things/variables held constant’) is more promising than simply asking the students to design one or more steps of some experiments in theory (on paper) or in practice. Therefore, the first school year of the project was treated as a pilot, and a different type of intervention was applied in this respect in the remaining three years. No other changes were made to the first-year approach.

To reduce the Group 3 students’ cognitive load, the student worksheets were scaffolded in the following school years: students of Group 3 were given prompts and clues before they started to plan and carry out the experiments. The hope remains that students will benefit from this type of intervention, based on Siegler and Liebert's (1975) finding that grade 5 and 8 students learning about factors, levels and tree diagrams were more successful in the manipulation and isolation of variables when they applied the ideas to new situations in practice than students who were simply taught the conceptual framework.

Since Group 2 students’ results on the experimental design tasks were not found to be significantly better than those of the control group, it was evident that their treatment had to be changed too. In deciding how, Furtak et al.'s (2012) meta-analysis was considered: they concluded that evidence from some of the studies suggests that teacher-led inquiry has a greater effect on student learning than student-led inquiry. Crujeiras-Pérez and Jiménez-Aleixandre's (2017) suggestion that students need prior knowledge of which aspects an experimental design should include, as well as reflection on them, was also taken into consideration. Since engaging students in reflecting on scientific investigations could improve their ability to plan investigations, from the second school year the Group 2 student worksheets explained the design of the step-by-step experiments that the students had just carried out. Group 2 students no longer received paper-based experimental design tasks, which also helps to exclude the possibility that any better performance on the experimental design tasks of the test is caused by the similarity of their treatment to those test tasks.

Therefore, after the first year of this project the question remains whether experimental design skills can be developed effectively in either of the ways described above, used with the two experimental groups from the second year of the present project onwards. The evaluation of the results of the interventions of the second and third years is in progress, but the preliminary results are promising.

Conflicts of interest

There are no conflicts of interest to declare.

Appendix 1: an example of student worksheets and teacher notes used in the school year 2016/2017

It is important to note that the student worksheets are not intended to be stand-alone. They were used in class with an accompanying dialogue from the teacher; in other words, the teachers talked students through the sheets. The following student worksheet and teacher notes were part of a teacher guide file that contained detailed instructions for teachers on how to prepare and guide the students through the activities. The six complete files for the six student worksheets and teacher guides are available in Hungarian at the following link: http://ttomc.elte.hu/publications/90; titled:

1. feladatlap: A mi világunk – a részecskék világa (Worksheet 1: Our world – the world of particles)

2. feladatlap: Hogyan működik a sütőpor? (Worksheet 2: How does baking powder work?)

3. feladatlap: Oldás és kötés (Worksheet 3: Dissolving and binding)

4. feladatlap: Milyen tömény rum kell a Gundel-palacsintához? (Worksheet 4: How concentrated does the rum need to be for Gundel pancakes?)

5. feladatlap: Segítsünk Hamupipőkének! (Worksheet 5: Let's help Cinderella!)

6. feladatlap: Fekete, fehér, igen, nem… (Worksheet 6: Black, white, yes, no…)

The English translations of the other five files containing student worksheets and teacher notes used for the research in the school year 2016/2017 are available on the website of the research project (http://ttomc.elte.hu/publications/90), titled:

1.-6. Student sheets and teacher notes used in the school year 2016/2017

(Last visited: 05.10.2019.)

1st Student worksheet: Our world – the world of particles

(type 1: ‘step-by-step’ version for Group 1 students)

Each substance consists of particles which are constantly moving. You are going to investigate what determines how quickly particles move in gases and liquids. (In solids, they can only vibrate.) Your observation will be explained by the particle model of matter.

Experiment 1: (a) Each member of your group should measure the time taken by the fragrance particles to reach them after being let out/poured out at the teacher's desk. Calculate the average of the times measured by the group. …………… minutes.

(b) Similarly calculate the average distance of your group from the teacher's desk, approx. ……………………… m.

(c) How many meters did the fragrance particles travel in 1 minute? …………… meters at a speed of…m/min.

(d) Why do you think the speed calculated was not the same for all groups in the class?

……………………………

(e) One particle of oxygen gas travels about 500 meters in 1 second. However, you found that particles moved only a few meters in the air in 1 minute. Can you think of a reason why this happened?

……………………………

Experiment 2: Pour cold tap water into a plastic bowl. Wait for approx. 5 minutes until the fluid is calm and does not move. Then close to the surface of the water, drop 1 drop of paint solution over the “X” mark and record what is happening. Explain what you see.

Observation: ……………………………

Explanation: ……………………………

Experiment 3: Pour cold water into a bowl to a depth of about 1 cm. Wait for about 5 minutes until the liquid settles. Add 1 drop of paint solution over the “X” mark. When the edge of the paint spot reaches the innermost circle (1st) drawn on the bottom of the tray, start the stopwatch. Measure the time until the particles of the paint reach the 2nd circle, and then after reaching the 3rd circle. Repeat the experiment with warm water. Measure the distance between the 1st and 2nd circles, and the 2nd and 3rd circles with a ruler.

Observations Time Distance Speed of movement
Cold water, between the 1st and 2nd circles …… seconds = …… min. …… cm …… cm/min.
Cold water, between the 2nd and 3rd circles …… seconds = …… min. …… cm …… cm/min.
Warm water, between the 1st and 2nd circles …… seconds = …… min. …… cm …… cm/min.
Warm water, between the 2nd and 3rd circles …… seconds = …… min. …… cm …… cm/min.

The average speed of the particles at the class level in cold water between the 1st and 2nd circles is ………… cm/min.

The average speed of the particles at the class level in warm water between the 1st and 2nd circles is ……… cm/min.

Explanation: Underline the correct word and then complete the sentence.

The particles move in warm water at a lower/higher speed than in cold water because

……………………………

Why do you think different groups calculated different values for the speed of the particles under the same conditions (e.g. in cold water)?

……………………………

4. Based on your experiments, do the particles travel further in air or in water? Underline the correct answer. What is the reason for this?

……………………………

5. Homework:

(a) At a given temperature and if there is nothing in the way, particles of oxygen travel at 461 m/s, particles of nitrogen travel at 492 m/s and particles of hydrogen travel at 1844 m/s between the collisions. What can you conclude from this?

……………………………

(b) Make a drawing in your notebook or on the back of this worksheet with arrows to show the route of a particle (represented by a small circle) in a gas as it travels from one wall to the opposite wall.

1. Student worksheet: Our world – the world of particles

(type 2: ‘step-by-step’ version + theoretical experiment-design task for Group 2 students)

It is the same as the type 1 student worksheet (‘step-by-step’ version for Group 1 students), but the students also have to solve the tasks below.

(c) If you could mark the particles of water, design an experiment to show that at higher temperatures not only the paint particles but also the water particles move faster. Describe the experiment.

……………………………

(d) How could you show that particles are moving faster in hot air than in cold air? Describe how you would prepare and carry out the experiment in practice.

……………………………

1. Student worksheet: Our world – the world of particles

(type 3: experiment-designing version for Group 3 students)

Up to this point it is the same as the type 1 student worksheet (‘step-by-step’ version for Group 1 students), but it is continued with the experimental design task below.

Experiment 3: Design and carry out an experiment to measure whether the particles move faster in cold water or in hot water. Suitable tools and materials: cold and warm tap water, paint solution, plastic tray, dropper (Pasteur pipette), ruler, permanent marker, stopwatch (on a mobile phone).

Plan of the experiment:

……………………………

Observations and measurement:……………………………

The remaining part is the same as text following this experiment on the type 1 student worksheet (‘step-by-step’ version for Group 1 students).

1. Student worksheet: Our world – the world of particles

(teacher notes)

Each substance consists of particles which are constantly moving. You are going to investigate what determines how quickly particles move in gases and liquids. (In solids, they can only vibrate.) Your observation will be explained by the particle model of matter.

Experiment 1: (a) Each member of your group should measure the time taken by the fragrance particles to reach them after being let out/poured out at the teacher's desk. Calculate the average of the times measured by the group. E.g. approx. 2.5 minutes.

(b) Similarly calculate the average distance of your group from the teacher's desk, approx. 5 m.

(c) How many meters did the fragrance particles travel in 1 minute? 2 meters at a speed of 2 m/min.

(d) Why do you think the speed calculated was not the same for all groups in the class?

E.g. we could not measure the distance and time with the same accuracy; not everyone has the same reaction time; our noses are not equally sensitive to that smell; we were moving while we were measuring; at larger distances the particles of the fragrance were more thinly spread in the air.

(e) One particle of oxygen gas travels about 500 meters in 1 second. However, you found that particles moved only a few meters in the air in 1 minute. Can you think of a reason why this happened?

Because of the collisions, the particles travelled in a “zigzag” path.

Experiment 2: Pour cold tap water into a plastic bowl. Wait for approx. 5 minutes until the fluid is calm and does not move. Then close to the surface of the water, drop 1 drop of paint solution over the “X” mark and record what is happening. Explain what you see.

Observation: The paint slowly spreads through the liquid.

Explanation: The paint and water particles move and mix.

Experiment 3: [Only for type 1 and 2 student worksheets.] Pour cold water into a bowl to a depth of about 1 cm. Wait for about 5 minutes until the liquid settles. Add 1 drop of paint solution over the “X” mark. When the edge of the paint spot reaches the innermost circle (1st) drawn on the bottom of the tray, start the stopwatch. Measure the time until the particles of the paint reach the 2nd circle, and then after reaching the 3rd circle. Repeat the experiment with warm water. Measure the distance between the 1st and 2nd circles, and the 2nd and 3rd circles with a ruler.

Observations Time Distance Speed of movement
Cold water, between the 1st and 2nd circles 125 seconds = 2.1 min. 1 cm 0.5 cm/min.
Cold water, between the 2nd and 3rd circles 91 seconds = 1.5 min. 1 cm 0.7 cm/min.
Warm water, between the 1st and 2nd circles 22 seconds = 0.4 min. 1 cm 2.5 cm/min.
Warm water, between the 2nd and 3rd circles 25 seconds = 0.4 min. 1 cm 2.5 cm/min.

The average speed of the particles at the class level in cold water between the 1st and 2nd circles is 0.5–1 cm/min.

The average speed of the particles at the class level in warm water between the 1st and 2nd circles is 2–5 cm/min.

Explanation: Underline the correct word and then complete the sentence.

The particles move in warm water at a lower/higher speed than in cold water because in the warm water the particles move more intensely (“their energy is higher”). (The correct word to underline is ‘higher’.)

Why do you think different groups calculated different values for the speed of the particles under the same conditions (e.g. in cold water)?

For example, because we did not drip the paint solution into the water in the same way, we did not measure the time and distance with the same accuracy, there was a ‘current’ (turbulence) in the fluid during the measurement, the paint particles were not evenly dispersed in the liquid during the measurements.

Experiment 3: [Only for type 3 student worksheets.] Design and carry out an experiment to measure whether the particles move faster in cold water or in hot water. Suitable tools and materials: cold and warm tap water, paint solution, plastic tray, dropper (Pasteur pipette), ruler, permanent marker, stopwatch (on a mobile phone).

Plan of the experiment: For example, it can be similar to the one described in the ‘step-by-step’ version; alternatively, the time required for the paint to travel between two designated points can be measured, or the distance the paint spot travels in a given time.

Observations and measurement: If the temperatures of cold and warm water and other conditions are similar, then the speed of the particles obtained should be similar to the ones measured in the ‘step-by-step’ experiment.

4. Based on your experiments, do the particles travel further in air or in water? Underline the correct answer. (The correct answer to underline is ‘in air’.) What is the reason for this?

In the air, because the particles in gases are further apart, they move more easily.

5. Homework:

(a) At a given temperature and if there is nothing in the way, particles of oxygen travel at 461 m/s, particles of nitrogen travel at 492 m/s and particles of hydrogen travel at 1844 m/s between the collisions. What can you conclude from this?

The reason for the differences may be that the particles are not the same (they differ in size or mass).

(b) Make a drawing in your notebook or on the back of this worksheet with arrows to show the route of a particle (represented by a small circle) in a gas as it travels from one wall to the opposite wall.

Note: In the frame symbolising the vessel, the circles representing the gas particles must be roughly evenly distributed. Straight arrows must indicate the free path between two collisions and its direction. Collisions can occur with other particles and with the wall of the vessel too. The targeted particle moves not only towards the opposite wall, but also in any other direction (e.g. to the side and even backwards). If the task is modified so that the students need to draw multiple frames, the locations of the surrounding particles must also change from frame to frame, showing that they too are moving.

[Example drawing: the zigzag path of a particle between the two walls, with straight arrows between collisions]

(c) [Only for type 2 student worksheets.] If you could mark the particles of water, design an experiment to show that at higher temperatures not only paint particles move faster, but the water particles too. Describe the experiment.

The time needed for the marked water particles to travel the same distance in the water at two different temperatures has to be measured (or the distance they cover within the same time).

(d) [Only for type 2 student worksheets.] How could you show that particles are moving faster in hot air than in cold air? Describe how you would prepare and carry out the experiment in practice.

First, the time it takes for the perfume particles to travel a certain distance in the room at the given temperature has to be measured. The air in the room should then be warmed up (e.g. with a high-power radiator). A fan should be operated to make the air temperature as even as possible. After stopping the fan, we have to wait for a while to let the air settle. Then exactly the same measurement has to be carried out as at the lower temperature.

END OF THE 1st STUDENT WORKSHEETS AND TEACHER NOTES

Appendix 2: Test 0

School number:…… Teacher number:… Group number:…… Student number:…

The aim of our research is to make the teaching of chemistry as interesting and effective as possible.

Thank you for completing this test to the best of your knowledge; by doing so you help our work.

1. (a) What is the visible sign of boiling in a liquid while it is heated?

…………………

(b) In one dish, we boil 1 litre of water and in the other dish 2 litres of water. In which case is more heat required if the initial temperatures are the same? How many times more heat is needed?

…………………

2. (a) The volume of water increases when it is frozen. How can you determine how many times bigger the volume of the ice is than the volume of the water? Choose the equipment and the materials you need from the following items. (Note: you do not need all of them.) Describe how you would do the experiment and the calculation.

• water

• salt

• ice cubes

• ice cube tray

• freezer

• ruler

• permanent marker

• string

• volume measuring vessel

• glass jar (cylindrical, without a top)

• spoon

…………………

(b) How could you increase the accuracy of your measurement compared to the one you described above?

…………………

3. (a) What is between the particles of a gas?

…………………

(b) The drawing illustrates an experiment in which an uninflated balloon and an inflated balloon are placed on the two pans of a balance. The balloons have the same mass.

[Drawing: a balance with an uninflated balloon on one pan and an inflated balloon on the other]

Use dots (·) to show where air particles are in the diagram wherever there is air.

Points should be closer together where more particles are in a given volume.

(c) Why are air particles closer together where you showed them to be?

…………………

4. (a) Explain the difference between melting and dissolution.

…………………

(b) Drawing A shows the beginning of distillation of a solution. Particles of the solvent are shown by white circles and the solid solute particles by black circles. Complete drawing B to show where the particles of solvent and solute are when the distillation is stopped after a while.

[Drawings A and B: the distillation apparatus; solvent particles shown as white circles, solute particles as black circles]

5. (a) Name the component of air that feeds combustion

…………………

(b) Which gas is there more in the exhaled air than in the inhaled air?

…………………

6. (a) To determine the energy stored in a food, a sample of the food is burned (combusted) and the energy released is measured. The energy content of the walnut, for example, can be determined by burning a piece of it and using the flame to heat some water. We know how much energy is needed to increase the temperature of 1 kg of water by 1 °C. What quantities must be measured to be able to calculate the energy content of the walnut?

1st quantity: …………………

2nd quantity: …………………

3rd quantity: …………………

(b) Other factors can affect the result of the measurement. Name one or more of these factors.

(c) The energy content determined by the experiment is less than the true value. Explain why.

…………………

Please give us the following information! Your gender: boy/girl (Underline the right answer!)

• In the 6th grade, the end-of-the-school year grade you got from science: …………………

• The larger the number you circle, the more you preferred the subject called ‘science’.

(0: you did not like it at all, 4: you really liked it): 0 1 2 3 4

• The bigger the number, the more you consider it is important to test ideas in sciences by experiments (0: not important at all; 4: very important): 0 1 2 3 4

Instructions given to the teachers to mark the students’ answers of Test 0

Please complete the columns of the Excel spreadsheet with the marks obtained by following the instructions below. A student's marks should be written in the appropriate row of the Excel spreadsheet.

Columns A–D contain information about the student's identity.

Columns E–V contain marks for the student's answers.

Column ‘W’ contains the student's gender.

Column ‘X’ contains the student's science mark in the previous year.

Columns Y–Z contain the student's attitude responses.

Column ‘A’:

School number (see it in the table sent by Luca Szalay on 16th September 2016).

Column ‘B’:

Teacher number (see it in the table sent by Luca Szalay on 16th September 2016).

Column ‘C’:

Group number (class) (see it in the table sent by Luca Szalay on 16th September 2016).

Column ‘D’:

Student number: the student's number in the alphabetical list of names of the group (class).

Column ‘E’ (task 1.a)

If the word “bubble” or a synonym appears in the answer. Mark: 1

In any other case. Mark: 0

1 item: recall (disciplinary content knowledge task: DCK task)

Column ‘F’ (task 1.b)

If the expression “2 litres” and the word “double” also appear in the answer. Mark: 1

In any other case. Mark: 0

1 item: understanding (DCK task)

Column ‘G’ (task 2.a)

If a correct method for measuring the volume of the water (or the height of the water, if the cross-sectional area is the same) appears in the answer. Mark: 1

In any other case. Mark: 0

1 item: higher order cognitive skills (experimental design task: EDS task)

Column ‘H’ (task 2.a)

If a correct method for measuring the volume of the ice (or the height of the ice, if the cross-sectional area is the same) appears in the answer. Mark: 1

In any other case. Mark: 0

1 item: higher order cognitive skills (EDS task)

Column ‘I’ (task 2.a)

If dividing the volume/height of the ice by the volume/height of the water appears in the answer. Mark: 1

In any other case. Mark: 0.

1 item: higher order cognitive skills (EDS task)

Column ‘J’ (task 2.b)

If a correct method is described to increase the accuracy of the measurement (see examples in the teacher's guide). Mark: 1

In any other case. Mark: 0

1 item: higher order cognitive skills (EDS task)

Column ‘K’ (task 3.a)

If the word “nothing” or “vacuum” or any synonymous expression appears in the answer. Mark: 1

In any other case. Mark: 0

1 item: understanding (DCK task)

Column ‘L’ (task 3.b)

If dots are drawn wherever there is air and the dots are denser inside the balloon. Mark: 1

In any other case. Mark: 0

1 item: application (DCK task)

Column ‘M’ (task 3.c)

If the answer proves that the student understands and can apply the following relationship: there are more particles in a unit of volume inside the (blown-up) balloon than in a unit of volume of the surrounding air, and/or the pressure is higher inside the (blown-up) balloon than in the air around it. Mark: 1

In any other case. Mark: 0

1 item: application (DCK task)

Column ‘N’ (task 4.a)

If the answer proves that the student understands: there are at least two substances in the case of dissolution. Mark: 1

In any other case. Mark: 0.

1 item: understanding (DCK task)

Column ‘O’ (task 4.b)

If drawing B shows that only the particles of the solvent are in the right-hand vessel and the particles of the solute are not. Mark: 1

In any other case. Mark: 0

1 item: application (DCK task)

Column ‘P’ (task 5.a)

If the word “oxygen” appears in the answer. Mark: 1

In any other case. Mark: 0

1 item: recall (DCK task)

Column ‘Q’ (task 5.b)

If the expression “carbon dioxide” appears in the answer. Mark: 1

In any other case. Mark: 0

1 item: recall (DCK task)

Column ‘R’ (task 6.a)

If the mass of the walnut appears in the answer. Mark: 1

In any other case. Mark: 0

1 item: higher order cognitive skills (EDS task)

Column ‘S’ (task 6.a)

If the mass or the volume of the water appears in the answer. Mark: 1

In any other case. Mark: 0

1 item: higher order cognitive skills (EDS task)

Column ‘T’ (task 6.a)

If the temperature of the water, or the change in temperature of the water, or the temperatures of the water before and after the warming appear in the answer. Mark: 1

In any other case. Mark: 0

1 item: higher order cognitive skills (EDS task)

Column ‘U’ (task 6.b)

If a circumstance/condition that indeed influences the result of the measurement appears in the answer (see examples in the teacher's guide). Mark: 1

In any other case. Mark: 0

1 item: higher order cognitive skills (EDS task)

Column ‘V’ (question 6.c)

If the answer proves that the student understands: there is a loss of heat or not only the water is warmed, but the materials around it too (e.g. the vessel, the air). Mark: 1

In any other case. Mark: 0

1 item: higher order cognitive skills (EDS task)

Column ‘W’

1: if the student's answer to the question concerning gender is ‘boy’.

2: if the student's answer to the question concerning gender is ‘girl’.

Column ‘X’

The student's grade in science at the end of the 6th grade.

Column ‘Y’

The answer given by the student to the question of how much he/she liked the subject “science”. (Insert the number circled by the student.)

Column ‘Z’

The answer given by the student to the question of how important he/she thinks it is in science to test ideas by experiments. (Insert the number circled by the student.)

END OF EVALUATION OF TEST 0
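Given a spreadsheet marked as described above, the scores analysed in this paper can be aggregated as follows. This is a minimal sketch: it assumes the data frame columns are named by their spreadsheet letters, and it uses the DCK/EDS item classification given in the key (nine DCK items: E, F, K, L, M, N, O, P, Q; nine EDS items: G, H, I, J, R, S, T, U, V).

    import pandas as pd

    df = pd.read_excel("test0_marks.xlsx")  # hypothetical file name

    dck_cols = list("EFKLMNOPQ")  # items classified above as DCK tasks
    eds_cols = list("GHIJRSTUV")  # items classified above as EDS tasks

    # Sub-scores and total, expressed as percentages of the available marks.
    df["T0DCK"] = df[dck_cols].sum(axis=1) / len(dck_cols) * 100
    df["T0EDS"] = df[eds_cols].sum(axis=1) / len(eds_cols) * 100
    df["T0total"] = df[dck_cols + eds_cols].sum(axis=1) / 18 * 100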

Appendix 3: Test 1

School number: …… Teacher number: …… Group number: …… Student number: ……

The aim of our research is to make the teaching of chemistry as interesting and effective as possible.

Thank you for completing this test to the best of your knowledge; by doing so you help our work.

1. (a) What colour can be seen when iodine dissolves in petrol?

……………

(b) Alcohol dissolves in water and in petrol. Explain this using your knowledge of the structure of alcohol particles.

……………

2. (a) Sea salt is made from seawater. Seawater is left to evaporate in sandy reservoirs. The solid that can be collected is a mixture of salt contaminated with sand. For further treatment it is important to know what mass of salt is contained in 100 g of sand-contaminated salt. How could you separate the salt from the sand and determine the mass of the purified salt? Write down the steps of the designed process.

……………

(b) Give a possible error that would mean the measurement as described would not be accurate.

……………

3. (a) How could you test that the glass was full of carbon dioxide?

……………

(b) How could you test that the potato contains starch?

……………

4. (a) Using your knowledge of the structure of matter, explain why sugar dissolves more slowly in cold water than in hot water. The cold water and the hot water have the same volume, and the same amount of sugar is put into both. Neither is stirred.

……………

(b) We make a solution from plant leaves that contain green coloured materials. The dissolved substances in this solution are separated using a white chalk standing in the solution (as shown in the figure). The solution is drawn up into the chalk and the dissolved substances separate, appearing as coloured stripes of different heights. Why do the dissolved substances travel at different speeds up the chalk?

[Figure: a piece of white chalk standing in the green leaf solution]

……………

5. How can hydrogen gas be produced in a test tube?

……………

6. At a party organised for adults a spritzer is made using 1 dl (100 cm3) wine containing 12 percent by volume alcohol and 3 dl (300 cm3) soda water. Calculate the percent by volume of alcohol in the spritzer; show your working.

……………

7. (a) We mix hydrochloric acid and sodium hydroxide solutions. Why can red cabbage juice be used to decide whether the particles influencing the acidity/alkalinity of the hydrochloric acid or of the sodium hydroxide solution were in excess before we mixed the solutions?

……………

(b) In a messy household the following substances are kept in unlabelled boxes: tartaric acid, caustic soda (NaOH), powdered Hyperol and baking soda. We want to decide which substance is stored in which box. An aqueous solution of each substance has been made. Plenty of clean test tubes are available, as is phenolphthalein indicator. Samples taken from the solutions can be added to one another. Write a plan listing the steps you would take to identify the substances. Record your expected observations and write your conclusions.

……………

Please give us the following information!

The end-of-semester grade you got in chemistry:

• The larger the number you circle, the more you like chemistry:

(0: you did not like it at all, 4: you really liked it): 0 1 2 3 4

• The bigger the number, the more you consider it is important to test ideas in sciences by experiments (0: not important at all; 4: very important): 0 1 2 3 4

• The bigger the number, the more you agree with the following statement:

“I prefer the step-by-step experiments to the ones that I have to design.”

0 1 2 3 4

• Complete the following sentences. What I find most interesting in the chemistry lessons is when………

……………

What I find most boring in the chemistry lessons is when……………

……………

Instructions given to the teachers to mark the students’ answers of Test 1

Teachers marking the test may judge whether a particular answer is acceptable, since that should be determined by the meaning of the answer.

Please fill in the columns of the Excel spreadsheet with the marks obtained by following the instructions below. A student's marks should be written in the appropriate row of the Excel spreadsheet.

Columns ‘AA’–‘AL’ contain marks for the student's answers.

Column ‘AM’ contains the student's end-of-semester grade in chemistry.

Columns ‘AN’–‘AP’ contain the student's attitude responses.

Column ‘AA’ (task 1.a)

If one of the colours given in the teacher's guide (purple/pink/lilac) is in the answer and no other colour(s) is/are mentioned. Mark: 1

In any other case. Mark: 0

1 item: recall (disciplinary content knowledge task: DCK task)

Column ‘AB’ (task 1.b)

The particles/molecules of alcohol have water-friendly/polar/water soluble and oil-friendly/apolar/oil-soluble parts. Mark: 1

In any other case. Mark: 0

1 item: understanding (DCK task)

Column ‘AC’ (task 2.a)

Alternative answer I:

Step 1: Adding water to the mixture (and stirring). Mark: 1

Step 2: The sand is filtered or the solution is decanted. Mark: 1

Step 3: The water is evaporated from the solution or the sand is dried. Mark: 1

Step 4: Measurement of the mass of the salt after the evaporation and drying, or measurement of the mass of the dried sand and subtraction of its mass from the 100 g of the original mixture. Mark: 1

Alternative answer II:

Step 1: Measurement of the mass of water. Mark: 1

Step 2: Adding the water with known mass to the mixture (and stirring). Mark: 1

Step 3: The sand is filtered or the solution is decanted. Mark: 1

Step 4: Measurement of the mass of filtrate/salt solution and subtraction of the mass of added water. The increase of mass was caused by the salt. Mark: 1

4 items: higher order cognitive skills (experimental design task: EDS task)

Column ‘AD’ (task 2.b)

If a concrete and acceptable factor that can cause error (see the teacher's guide) appears in the answer. Mark: 1

In any other case. Mark: 0

1 item: higher order cognitive skills (EDS task)

Column ‘AE’ (task 3.a)

Use of a burning splint/candle or lime water. Mark: 1

In any other case. Mark: 0

1 item: recall (DCK task)

Column ‘AF’ (task 3.b)

Use of iodine solution. Mark: 1

In any other case. Mark: 0

1 item: application (DCK task)

Column ‘AG’ (task 4.a)

Particles move faster at higher temperature or particles move more slowly at lower temperature. Mark: 1

In any other case. Mark: 0

1 item: understanding (DCK task)

Column ‘AH’ (task 4.b)

Alternative answer I:

Particles of the different solutes are attracted/bonded to the particles of the chalk with different forces, or interactions of different strengths form between the particles of the chalk and the different solute particles. Mark: 1

Alternative answer II:

The reason is the different polarity or structure of the particles. Mark: 1

In any other case. Mark: 0

1 item: application (DCK task)

Column ‘AI’ (task 5)

Alternative answer I:

From (hydrochloric) acid with zinc or magnesium or any other suitable metal. Mark: 1

Alternative answer II:

From water with an alkali metal or an alkaline earth metal. Mark: 1

Alternative answer III:

By the electrolysis of water, if the hydrogen is collected in a separate test tube. Mark: 1

In any other case. Mark: 0

1 item: recall (DCK task)

Column ‘AJ’ (task 6)

The volume of the solution is increased four times by the dilution. Therefore the concentration is decreased to one-fourth of the original, i.e. 3 percent by volume (see the check after this item), or any other correct calculation.

Mark: 1

If the calculation is not correct. Mark: 0

1 item: application (DCK task)
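A one-line check of the dilution arithmetic above (the figures come from the task itself):

    alcohol = 0.12 * 100          # 12 cm3 of alcohol in 100 cm3 of wine
    total = 100 + 300             # 400 cm3 of spritzer in total
    print(alcohol / total * 100)  # 3.0, i.e. 3 percent by volume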

Column ‘AK’ (task 7.a)

Alternative answer I:

The red cabbage juice indicates with different colours whether the solution is acidic, neutral or basic. Mark: 1

Alternative answer II

The red cabbage juice is a natural (acid–base) indicator. Mark: 1

In any other case. Mark: 0

1 item: understanding (DCK task)

Column ‘AL’ (task 7.b)

Alternative answer I:

Step 1: Phenolphthalein is added to one portion of each sample. The two colourless solutions contain tartaric acid and hydrogen peroxide. The two purple ones contain caustic soda and baking soda. Mark: 1

Step 2: One solution that remained colourless with phenolphthalein is added to a further portion of each of the solutions that became purple (pink, cyclamen, magenta) with phenolphthalein. Mark: 1

(a) If there is a fizz in one case, then the solution which was colourless with phenolphthalein contains tartaric acid and the other solution which was colourless with phenolphthalein contains hydrogen peroxide. The solution that was purple with phenolphthalein and fizzed contains baking soda, and the one which did not fizz is the solution of caustic soda. Mark: 1

(b) If there is no fizz in either case, then the solution which was colourless with phenolphthalein contains hydrogen peroxide and the other solution which was colourless with phenolphthalein contains tartaric acid. Mark: 1

In the latter case a step 3 is also needed. The tartaric acid solution is added to both samples that were purple with phenolphthalein. The one that fizzes is the solution of baking soda, and the one that does not fizz is the solution of caustic soda. Mark: 1

Step 2 (alternative): One solution that became purple with phenolphthalein is added to a further portion of each of the solutions that remained colourless with phenolphthalein. Mark: 1

(a) If there is a fizz in one case, then the solution which was purple with phenolphthalein contains baking soda and the other solution which was purple with phenolphthalein contains caustic soda. The solution that was colourless with phenolphthalein and fizzed contains tartaric acid, and the one that did not fizz is the solution of hydrogen peroxide. Mark: 1

(b) If there is no fizz in either case, then the solution which was purple with phenolphthalein contains caustic soda and the other solution which was purple with phenolphthalein contains baking soda. Mark: 1

In the latter case a step 3 is also needed. The baking soda solution is added to both samples that were colourless with phenolphthalein. The one that fizzes is the solution of tartaric acid, and the one that does not fizz is the solution of hydrogen peroxide. Mark: 1

Alternative answer II:

Step 1: Pairs of the samples are added together in every possible combination. When there is a fizz, one of those solutions is tartaric acid and the other is baking soda. Mark: 1

Step 2: The solutions of tartaric acid and baking soda are separated by adding phenolphthalein to them. The one that is purple is baking soda. Mark: 1. The other is tartaric acid. Mark: 1

Step 3: Caustic potash and hydrogen peroxide are distinguished by adding phenolphthalein to both of them. The one that turns purple is caustic potash, and the other is hydrogen peroxide. Mark: 1.

Alternative answer III: Any other plan that ensures the correct identification of the substances. (For example, according to a method that has been tried in practice and works: approximately the same mass of caustic potash and baking soda is dissolved in approximately the same volume of water and the same number of drops of phenolphthalein is added; the purple colour of the caustic potash solution is more intense than that of the baking soda solution.) For such alternative plans, 1 mark is given for each substance that could be correctly identified (altogether 4 substances, 4 marks). (The branching logic of Alternative answer I is sketched in the code after this marking block.)

4 items: higher order cognitive skills (EDS task)
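
For readers who wish to trace the branching logic of Alternative answer I, the following minimal Python sketch encodes it as a decision tree. All names are hypothetical illustrations, not part of the evaluation guide; fizz models the carbon dioxide evolution that occurs only when tartaric acid meets baking soda.

```python
# Minimal sketch of the decision tree in Alternative answer I.
# All names are hypothetical; the four unknowns are tartaric acid,
# hydrogen peroxide, caustic potash and baking soda.

def purple_with_phenolphthalein(substance):
    # Only the two basic solutions turn purple with phenolphthalein.
    return substance in {"caustic potash", "baking soda"}

def fizz(a, b):
    # Carbon dioxide is evolved only when tartaric acid meets baking soda.
    return {a, b} == {"tartaric acid", "baking soda"}

def identify(samples):
    """Map each sample label to a substance, following Steps 1-3."""
    # Step 1: sort the samples by their colour with phenolphthalein.
    colourless = [s for s in samples if not purple_with_phenolphthalein(samples[s])]
    purple = [s for s in samples if purple_with_phenolphthalein(samples[s])]
    result = {}
    # Step 2: add one colourless solution to fresh portions of both purple ones.
    probe = colourless[0]
    fizzed = [p for p in purple if fizz(samples[probe], samples[p])]
    if fizzed:  # branch (a): the probe was tartaric acid
        result[probe] = "tartaric acid"
        result[colourless[1]] = "hydrogen peroxide"
        result[fizzed[0]] = "baking soda"
        result[[p for p in purple if p != fizzed[0]][0]] = "caustic potash"
    else:       # branch (b): the probe was hydrogen peroxide; Step 3 is needed
        result[probe] = "hydrogen peroxide"
        acid = colourless[1]
        result[acid] = "tartaric acid"
        # Step 3: add the tartaric acid solution to both purple samples.
        for p in purple:
            result[p] = "baking soda" if fizz(samples[acid], samples[p]) else "caustic potash"
    return result

# Usage: the labels A-D hide the four substances in an arbitrary order.
unknowns = {"A": "hydrogen peroxide", "B": "baking soda",
            "C": "tartaric acid", "D": "caustic potash"}
assert identify(unknowns) == unknowns  # every sample is identified correctly
```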

Column ‘AM’

The student's end-of-semester grade in chemistry.

Column ‘AN’

Insert the number circled by the student.

Column ‘AO’

Insert the number circled by the student.

Column ‘AP’

Insert the number circled by the student.

Evaluation of the answers given to the last two questions concerning motivation is not required.

END OF EVALUATION OF TEST 1.

Appendix 4: tables for statistical analysis

Table 10 Rankings of the participating schools (according to the school ranking of the website “legjobbiskola.hu”)
Ranking | High | Medium | Low
School ranking | 4, 15, 17, 29, 32 | 36, 41, 72, 84, 91, 114, 170 | 198, 214, 289, 325, 416, 526
N (students completing Test 0) | 254 | 345 | 284


Table 11 The effects of the assumed parameters (“sources”) on the students’ total scores on Test 0 (T0total) (N = 853)
Source (parameters are called “sources” in SPSS) | df | F | Sig. | Partial eta squared
Group | 2, 846 | 10.36 | 0.000 | 0.024
Mother's education | 1, 846 | 59.97 | 0.000 | 0.066
School ranking | 2, 846 | 11.81 | 0.000 | 0.027
Gender | 1, 846 | 0.000 | 0.997 | 0.000
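
For reference, the partial eta squared reported in Tables 11–15 is the standard effect size

$$\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}},$$

which is linked to the reported F values by

$$F = \frac{\eta_p^2}{1 - \eta_p^2} \cdot \frac{df_{\text{error}}}{df_{\text{effect}}},$$

so the F and partial eta squared columns can be checked against each other.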


Table 12 The effects of the assumed parameters (“sources”) on the students’ total scores on Test 0 (T0total) before and after applying the matched pair method
Source | Before (N = 853) | After (N = 510)
Mother's education | χ²(2) = 13.4, p = 0.001 | χ²(2) = 0.000, p = 1.000
School ranking | χ²(4) = 103, p = 0.000 | χ²(4) = 0.170, p = 0.997
Gender | χ²(2) = 1.28, p = 0.527 | χ²(2) = 2.66, p = 0.265
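
The before/after comparison in Table 12 is a balance check: a chi-square test of independence between group membership and each background variable, which should become non-significant once the matched pair method has equalised the groups. Below is a minimal sketch of such a check, assuming pandas and SciPy and hypothetical column names (the paper's own analysis was carried out in SPSS):

```python
# Minimal sketch of the balance check summarised in Table 12.
# All column names are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency

def balance_check(students, background_var, group_var="group"):
    """Chi-square test of independence between group membership and a
    background variable; a large p suggests the groups are balanced."""
    table = pd.crosstab(students[background_var], students[group_var])
    chi2, p, dof, _ = chi2_contingency(table)
    return chi2, dof, p

# Usage on a hypothetical data frame with one row per student:
# for var in ("mother_education", "school_ranking", "gender"):
#     chi2, dof, p = balance_check(students, var)
#     print(f"{var}: chi2({dof}) = {chi2:.3g}, p = {p:.3f}")
```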


Table 13 The effects of the assumed parameters (“sources”) on the students’ scores on Test 0 after applying the matched pair method (N = 510)
Source | df | F | Sig. | Partial eta squared
On the total test (T0total)
Group | 2, 503 | 0.002 | 0.998 | 0.000
Mother's education | 1, 503 | 48.01 | 0.000 | 0.087
School ranking | 2, 503 | 1.321 | 0.268 | 0.005
Gender | 1, 503 | 0.300 | 0.584 | 0.001
On the DCK tasks (T0DCK)
Group | 2, 503 | 0.885 | 0.413 | 0.004
Mother's education | 1, 503 | 38.22 | 0.000 | 0.071
School ranking | 2, 503 | 1.367 | 0.256 | 0.005
Gender | 1, 503 | 1.509 | 0.220 | 0.003
On the EDS tasks (T0EDS)
Group | 2, 503 | 0.622 | 0.537 | 0.002
Mother's education | 1, 503 | 19.52 | 0.000 | 0.037
School ranking | 2, 503 | 0.365 | 0.694 | 0.001
Gender | 1, 503 | 0.102 | 0.750 | 0.000


Table 14 The effects of the assumed parameters (“sources”) and covariates on the students’ scores on Test 1 (N = 510)
Source | df | F | Sig. | Partial eta squared
On the total test (T1total)
Group | 2, 499 | 3.492 | 0.031 | 0.014
Mother's education | 1, 499 | 0.001 | 0.979 | 0.000
School ranking | 2, 499 | 4.350 | 0.013 | 0.017
Gender | 1, 499 | 0.645 | 0.422 | 0.001
T0total (%) | 1, 499 | 0.999 | 0.318 | 0.002
Grade | 1, 499 | 0.007 | 0.935 | 0.000
Enjoyment of subject | 1, 499 | 0.326 | 0.568 | 0.001
Importance of experiments | 1, 499 | 0.035 | 0.851 | 0.001
On the DCK tasks (T1DCK)
Group | 2, 499 | 8.076 | 0.000 | 0.031
Mother's education | 1, 499 | 0.437 | 0.509 | 0.001
School ranking | 2, 499 | 2.686 | 0.069 | 0.011
Gender | 1, 499 | 0.232 | 0.630 | 0.000
T0total (%) | 1, 499 | 4.034 | 0.045 | 0.008
Grade | 1, 499 | 1.538 | 0.215 | 0.003
Enjoyment of subject | 1, 499 | 0.652 | 0.420 | 0.001
Importance of experiments | 1, 499 | 0.007 | 0.935 | 0.000
On the EDS tasks (T1EDS)
Group | 2, 499 | 0.500 | 0.607 | 0.002
Mother's education | 1, 499 | 0.733 | 0.392 | 0.001
School ranking | 2, 499 | 3.586 | 0.028 | 0.014
Gender | 1, 499 | 0.836 | 0.361 | 0.002
T0total (%) | 1, 499 | 2.012 | 0.157 | 0.004
Grade | 1, 499 | 1.374 | 0.242 | 0.003
Enjoyment of subject | 1, 499 | 0.000 | 0.982 | 0.000
Importance of experiments | 1, 499 | 0.284 | 0.594 | 0.001
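
Tables 13–15 summarise general linear model (Type III) analyses of the kind SPSS produces. As a rough guide to how such a table could be reproduced outside SPSS, here is a minimal statsmodels sketch with hypothetical column names; sum-to-zero contrasts are used so that the Type III sums of squares are meaningful:

```python
# Minimal sketch of an ANCOVA of the kind summarised in Table 14.
# All column names are hypothetical; the paper's analysis used SPSS.
import statsmodels.api as sm
from statsmodels.formula.api import ols

def ancova_table(students, outcome="T1total"):
    """Type III ANCOVA with categorical factors and continuous covariates,
    plus the partial eta squared effect size for each source."""
    formula = (f"{outcome} ~ C(group, Sum) + C(mother_education, Sum)"
               " + C(school_ranking, Sum) + C(gender, Sum)"
               " + T0total + grade + enjoyment + importance")
    model = ols(formula, data=students).fit()
    aov = sm.stats.anova_lm(model, typ=3)  # Type III sums of squares
    ss_error = aov.loc["Residual", "sum_sq"]
    aov["partial_eta_sq"] = aov["sum_sq"] / (aov["sum_sq"] + ss_error)
    return aov

# Usage: print(ancova_table(students)) lists df, F, p and partial eta
# squared for each source, as in the df/F/Sig. columns above.
```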


Table 15 The effects of the assumed parameters (“sources”) and covariates on the changes in attitudes and grades (N = 510)
Source | df | F | Sig. | Partial eta squared
Enjoyment of subject (AQ1)
Group | 2, 500 | 1.755 | 0.174 | 0.007
Mother's education | 1, 500 | 0.057 | 0.812 | 0.000
School ranking | 2, 500 | 6.783 | 0.001 | 0.026
Gender | 1, 500 | 0.000 | 0.985 | 0.000
T0total (%) | 1, 500 | 7.358 | 0.007 | 0.015
Grade | 1, 500 | 1.939 | 0.164 | 0.004
Importance of experiments | 1, 500 | 21.46 | 0.000 | 0.041
Importance of experiments (AQ2)
Group | 2, 500 | 0.613 | 0.542 | 0.002
Mother's education | 1, 500 | 0.159 | 0.691 | 0.000
School ranking | 2, 500 | 2.301 | 0.101 | 0.009
Gender | 1, 500 | 0.322 | 0.571 | 0.001
T0total (%) | 1, 500 | 1.570 | 0.211 | 0.003
Grade | 1, 500 | 2.771 | 0.097 | 0.006
Enjoyment of subject | 1, 500 | 3.364 | 0.067 | 0.007
Changes in grades
Group | 2, 500 | 1.614 | 0.200 | 0.006
Mother's education | 1, 500 | 0.190 | 0.663 | 0.000
School ranking | 2, 500 | 4.045 | 0.018 | 0.016
Gender | 1, 500 | 1.793 | 0.181 | 0.004
T0total (%) | 1, 500 | 2.382 | 0.123 | 0.005
Enjoyment of subject | 1, 500 | 2.608 | 0.107 | 0.005
Importance of experiments | 1, 500 | 0.002 | 0.966 | 0.000
Preference of step-by-step experiments (AQ3)
Group | 2, 500 | 1.891 | 0.152 | 0.008
Mother's education | 1, 500 | 0.035 | 0.852 | 0.000
School ranking | 2, 500 | 2.408 | 0.091 | 0.010
Gender | 1, 500 | 0.001 | 0.970 | 0.000
T0total (%) | 1, 500 | 0.007 | 0.936 | 0.000
Enjoyment of subject | 1, 500 | 1.568 | 0.211 | 0.003
Importance of experiments | 1, 500 | 1.156 | 0.283 | 0.002
Grade | 1, 500 | 2.327 | 0.128 | 0.005


Table 16 Correlations among the assumed covariates (N = 510)
 | Grade | Enjoyment of subject | Importance of experiments | T0total
Grade | 1 | 0.188* | 0.084 | 0.222*
Enjoyment of subject | | 1 | 0.305* | 0.219*
Importance of experiments | | | 1 | 0.150*
* p < 0.01.
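
Table 16 reports Pearson correlations, starred when significant at p < 0.01. A minimal sketch of how such a flagged upper-triangular matrix could be computed (hypothetical column names):

```python
# Minimal sketch of the flagged correlation matrix in Table 16.
# All column names are hypothetical.
from scipy.stats import pearsonr

def flagged_correlations(students, cols, alpha=0.01):
    """Upper-triangular Pearson r values, starred when p < alpha."""
    flagged = {}
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r, p = pearsonr(students[a], students[b])
            flagged[(a, b)] = f"{r:.3f}" + ("*" if p < alpha else "")
    return flagged

# Usage:
# flagged_correlations(students, ["grade", "enjoyment", "importance", "T0total"])
```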


Table 17 The estimated average change in the ‘Enjoyment of subject’ according to the ranking of the students’ schools, with the significance of the difference from the results of the students of the high-ranking schools (N = 510)
School ranking | Estimated average change in the ‘Enjoyment of subject’ | Significance of difference | Partial eta squared
Low | −0.429 | p = 0.001 | 0.023
Medium | −0.173 | p = 0.035 | 0.009
High | +0.151 | |


Table 18 The estimated average change in the ‘Importance of experiments’ according to the ranking of the students’ schools, with the significance of the difference from the results of the students of the high-ranking schools (N = 510)
School ranking | Estimated average change in the ‘Importance of experiments’ | Significance of difference | Partial eta squared
Low | −0.647 | p = 0.028 | 0.010
Medium | −0.706 | p = 0.039 | 0.008
High | −1.055 | |


Table 19 The estimated average change in the students’ grades according to the ranking of their schools, with the significance of the difference from the results of the students of the high-ranking schools (N = 510)
School ranking | Estimated average change in grades | Significance of difference | Partial eta squared
Low | −0.205 | p = 0.005 | 0.015
Medium | −0.338 | p = 0.055 | 0.007
High | −0.560 | |


Table 20 The estimated average of the students’ ‘Preference of step-by-step experiments’ according to the ranking of their schools, with the significance of the difference from the results of the students of the high-ranking schools (N = 510)
School ranking | Estimated average ‘Preference of step-by-step experiments’ | Significance of difference | Partial eta squared
Low | 3.337 | p = 0.005 | 0.015
Medium | 3.469 | p = 0.055 | 0.007
High | 3.575 | |


Acknowledgements

This study was funded by the Content Pedagogy Research Program of the Hungarian Academy of Sciences (Project No. 471026). Many thanks to all the participating colleagues and students for their work.

References

  1. Baird J. R., (1990), Metacognition, purposeful inquiry and conceptual change, in Hegarty-Hazel E. (ed.), The student laboratory and the science curriculum, London: Routledge, pp. 183–200.
  2. Bell R. L., Smetana L. and Binns I., (2005), Simplifying inquiry instruction: assessing the inquiry level of classroom activities, Sci. Teach., 72 (7), 30–33.
  3. Blanchard M. R., Southerland S. E., Osborne J. W., Sampson V. D., Annetta L. A. and Granger E. M., (2010), Is inquiry possible in light of accountability? A quantitative comparison of the relative effectiveness of guided inquiry and verification laboratory instruction, Sci. Educ., 94, 577–610.
  4. Boesdorfer S. B. and Livermore R. A., (2018), Secondary school chemistry teachers’ current use of laboratory activities and the impact of expense on their laboratory choices, Chem. Educ. Res. Pract., 19, 135–148.
  5. Bolte C., Streller S. and Hofstein A., (2013), How to motivate students and raise their interest in chemistry education? in Eilks I. and Hofstein A. (ed.), Teaching Chemistry – A Studybook, Sense Publishers, pp. 67–95.
  6. Branan D. and Morgan M., (2010), Mini-Lab Activities: Inquiry-Based Lab Activities for Formative Assessment, J. Chem. Educ., 87, 69–72.
  7. Briggs M., Long G. and Owens K., (2011), Qualitative Assessment of Inquiry-Based Teaching Methods, J. Chem. Educ., 88, 1034–1040.
  8. Bruck L. B. and Towns M. H., (2009), Preparing Students to Benefit from Inquiry-Based Activities in the Chemistry Laboratory: Guidelines and Suggestions, J. Chem. Educ., 86, 820–822.
  9. Bullock M. and Ziegler, A., (1999), Scientific reasoning: Developmental and individual differences, in Weinert F. E. and Schneider W. (ed.), Individual development from 3 to 12: Findings from the Munich Longitudinal Study, Cambridge: Cambridge University Press, pp. 38–54.
  10. Chemistry Curricula of Hungary, grade 7 – grade 12, (2012), available online: http://kerettanterv.ofi.hu/04_melleklet_7-12/index_6_gimn.html (last visited: 07.01.2019).
  11. Chen Z. and Klahr D., (1999), All other things being equal: children's acquisition of the control of variables strategy, Child Dev., 70, 1098–1120.
  12. Cole M. and Cole S. R., (2006), Fejlődéslélektan, Budapest: Osiris Kiadó, pp. 481–505; pp. 642–656, Hungarian translation of Cole M. and Cole S. R., (2001), The Development of Children, 4th edn, New York: Worth Publishers.
  13. Cheung D., (2011), Teacher Beliefs about Implementing Guided-Inquiry Laboratory Experiments for Secondary School Chemistry, J. Chem. Educ., 88, 1462–1468.
  14. Chinn C. A. and Malhotra B. A., (2001), Epistemologically authentic scientific reasoning, in Crowley K., Schunn C. D. and Okada T. (ed.), Designing for science: implications from everyday, classroom, and professional settings, Mahwah, NJ: Lawrence Erlbaum, pp. 351–392.
  15. Chinn C. A. and Malhotra B. A., (2002), Epistemologically authentic inquiry in schools: a theoretical framework for evaluating inquiry tasks, Sci. Educ., 86, 175–218.
  16. Criswell B., (2012), Framing Inquiry in High School Chemistry: Helping Students See the Bigger Picture, J. Chem. Educ., 89, 199–205.
  17. Crujeiras-Pérez B. and Jiménez-Aleixandre M. P., (2017), High school students’ engagement in planning investigations: findings from a longitudinal study in Spain, Chem. Educ. Res. Pract., 18, 99–112.
  18. Csíkos C., Korom E. and Csapó B., (2016), Tartalmi keretek a kutatásalapú tanulás tudáselemeinek értékeléséhez a természettudományokban [Content frameworks for assessing the knowledge elements of inquiry-based learning in science], Iskolakultúra, 26, 17–29.
  19. Dean D. and Kuhn D., (2007), Direct instruction vs. discovery: the long view, Sci. Educ., 91, 384–397.
  20. Deters K. M., (2005), Student Opinions Regarding Inquiry-Based Labs, J. Chem. Educ., 82, 1178–1180.
  21. Educational Authority of Hungary, (2018), Published statistics of the final exams, available online: https://www.ketszintu.hu/publicstat.php (last visited: 31.12.2018.).
  22. Ford M. J. and Forman E. A., (2006), Redefining disciplinary learning in classroom contexts, in Green J. A. and Luke A., (ed.), Review of research in education, vol. 30.
  23. Fradd S. H., Lee O., Sutman F. X. and Saxton M. K., (2001), Promoting science literacy with English language learners through instructional materials development: a case study, Bilingual Res. J., 25, 417–439.
  24. Furtak E. M., Seidel T., Iverson H. and Briggs D. C., (2012), Experimental and Quasi-Experimental Studies of Inquiry-Based Science Teaching: A Meta-Analysis, Rev. Educ. Res., 82, 300–329.
  25. Gott R. and Dugan S., (1995), Investigative work in the Science curriculum, Buckingham: Open University Press.
  26. Hake R. R., (1998), Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses, Am. J. Phys., 66(1), 64–74.
  27. Hane E. N., (2007), Use of an inquiry-based approach to teaching experimental design concepts in a general ecology course, in Teaching Issues and Experiments in Ecology (TIEE) project of the Education and Human Resources Committee of the Ecological Society of America, vol. 5, pp. 1–19, http://tiee.ecoed.net (last visited: 19.04.2019.).
  28. Herrington D. G., Yezierski E. J., Luxford K. M. and Luxford C. J., (2011), Target inquiry: changing chemistry high school teachers’ classroom practices and knowledge and beliefs about inquiry instruction, Chem. Educ. Res. Pract., 12, 74–84.
  29. Kanari Z. and Millar R., (2004), Reasoning from data: how students collect and interpret data in science investigations, J. Res. Sci. Teach., 41, 748–769.
  30. Klahr D., (2000), Exploring science: the cognition and development of discovery processes, Cambridge: MIT Press.
  31. Klahr D. and Nigam M., (2004), The equivalence of learning paths in early science instruction: effects of direct instruction and discovery learning, Psychol. Sci., 15, 661–667.
  32. Koslowski B., (1996), Theory and evidence: the development of scientific reasoning, Cambridge: MIT Press.
  33. Krathwohl D. R., (2002), A Revision of Bloom's Taxonomy: An Overview, Theory Pract., 41(4), 212–218.
  34. Kuhn D., (2010), What is Scientific Thinking and How Does it Develop? in Goswami U. (ed.), Handbook of Childhood Cognitive Development, Blackwell.
  35. Kuhn D. and Franklin S., (2006), The second decade: what develops (and how), in Damon W. and Lerner R. M. (ser. ed.), Kuhn D. and Siegler R. S. (vol. ed.), Handbook of child psychology: Cognition, perception and language, 6th edn, Hoboken, NJ: John Wiley & Sons, vol. 2, pp. 953–993.
  36. Kuhn D. and Ho V., (1980), Self-directed activity and cognitive development, J. Appl. Dev. Psychol., 1, 119–130.
  37. Kuhn D. and Pease M. (2008), What needs to develop in the development of inquiry skills? Cognit. Instruct., 26, 512–559.
  38. Kuhn D. and Phelps E., (1982), The development of problem-solving strategies, in Reese H. (ed.), Advances in child development and behavior, Academic Press, vol. 17, pp. 1–44.
  39. Lehrer R., Schauble L. and Petrosino A. J., (2001), Reconsidering the role of experiment in science education, in Crowley K., Schunn C. D. and Okada T. (ed.), Designing for science: implications from everyday, classroom, and professional settings, Mahwah, NJ: Lawrence Erlbaum, pp. 251–278.
  40. Lehrer R., Schauble L. and Lucas, D., (2008), Supporting development of the epistemology of inquiry, Cognit. Dev., 23, 512–529, [special issue, The Development of Scientific Thinking, B. Sodian and M. Bullock, ed.].
  41. Metz K. E., (2004), Children's understanding of scientific inquiry: their conceptualization of uncertainty in investigations of their own design, Cognit. Instruct., 22, 219–290.
  42. Minner D. D., Levy A. J. and Century J., (2010), Inquiry-based Science Instruction – What Is It and Does It Matter? Results from a Research Synthesis Years 1984 to 2002, J. Res. Sci. Teach., 47, 474–496.
  43. National Curriculum of Hungary, (2012), available online: http://ofi.hu/nemzeti-alaptanterv (last visited: 31.12.2018).
  44. OECD, (2007), PISA 2006: Science Competences for Tomorrow's World, Paris: PISA, OECD Publishing, analysis, vol. 1, pp. 64–68.
  45. OECD, (2013), PISA 2015 Draft Science Framework, Paris: PISA, OECD Publishing.
  46. OECD, (2016), PISA 2015 Assessment and Analytical Framework: Science, Reading, Mathematic and Financial Literacy, Paris: PISA, OECD Publishing.
  47. OECD, (2017), PISA 2015 Technical Report, Chapter 18, Computer-based tests, pp. 369–374.
  48. O'Neill D. K. and Polman J. L., (2004), Why educate “little scientists?” Examining the potential of practice-based scientific literacy, J. Res. Sci. Teach., 41, 234–266.
  49. Øyehaug A. B. and Holt A., (2013), Students’ understanding of the nature of matter and chemical reactions – a longitudinal study of conceptual restructuring, Chem. Educ. Res. Pract., 14, 450–467.
  50. Pintrich P. R., (2003), A Motivational Science Perspective on the Role of Student Motivation in Learning and Teaching Contexts, J. Educ. Psychol., 95, 667–686.
  51. Plan of the National Curriculum of Hungary, (2018), https://www.oktatas2030.hu/wp-content/uploads/2018/08/a-nemzeti-alaptanterv-tervezete_2018.08.31.pdf (last visited: 31.12.2018).
  52. Prensky M., (2001), Digital Natives, Digital Immigrants, in On the Horizon, NCB University Press, vol. 9 (No. 5).
  53. PRIMAS project, (2013), Promoting inquiry-based learning (IBL) in mathematics and science education across Europe, IBL implementation survey report, 20.12.2013, http://www.primas-project.eu (last visited: 01.01.2019.).
  54. Puntambekar S. and Kolodner J. L., (2005), Toward implementing distributed scaffolding: helping students learn science from design, J. Res. Sci. Teach., 42, 185–217.
  55. SAILS project, (2015), Strategies for Assessment of Inquiry Learning in Science, available online: http://www.sails-project.eu/ (last visited: 11.01.2019).
  56. Schibeci R. A., (1984), Attitudes to science: an update, Stud. Sci. Educ., 11, 26–59.
  57. Sevian H. and Talanquer V., (2014), Rethinking chemistry: a learning progression on chemical thinking, Chem. Educ. Res. Pract., 15, 10–23.
  58. Siegler R. S. and Liebert R. M., (1975), Acquisition of formal scientific reasoning by 10 and 13 year-olds: designing a factorial experiment, Dev. Psychol., 11, 401–402.
  59. Simon H. A., (2001), Seek and ye shall find, in Crowley K., Schunn C. D. and Okada T. (ed.), Designing for science: implications from everyday, classroom, and professional settings, Mahwah, NJ: Lawrence Erlbaum, pp. 5–20.
  60. Sweller J., (1988), Cognitive Load during Problem Solving: Effects on Learning, Cognit. Sci., 12, 257–285.
  61. Szalay L. and Tóth Z., (2016), An inquiry-based approach of traditional ‘step-by-step’ experiments, Chem. Educ. Res. Pract., 17, 923–961.
  62. Taber K. S., (2011), Inquiry teaching, constructivist instruction and effective pedagogy, Teach. Dev., 15, 257–264.
  63. Taber K. S., (2013), Non-random thoughts about research, Chem. Educ. Res. Pract., 14, 359–362.
  64. Taber, K. S., (2014), Ethical considerations of chemistry education research involving ‘human subjects’, Chem. Educ. Res. Pract., 15(2), 109–113.
  65. Tomperi P. and Aksela M., (2014), In-service Teacher Training Project On Inquiry Based Practical Chemistry, LUMAT, 2(2), 2015.
  66. Wenning C. J., (2007), Assessing inquiry skills as a component of scientific literacy, J. Phys. Teach. Educ. Online, 4(2), 21–24.
  67. Wilkening F. and Sodian B., (2005), Scientific reasoning in young children: introduction, Swiss J. Psychol., 64, 137–139.
  68. Zimmerman C., (2007), The development of scientific thinking skills in elementary and middle school, Dev. Rev., 27, 172–223.
  69. Zoller U., (2001), Alternative Assessment as (Critical) Means of Facilitating HOCS-Promoting Teaching and Learning in Chemistry Education, Chem. Educ. Res. Pract., 2, 9–17.

This journal is © The Royal Society of Chemistry 2020