Luca Szalay *a, Zoltán Tóth b, Réka Borbás c and István Füzesi d
aMTA-ELTE Research Group on Inquiry-Based Chemistry Education, Research Programme for Public Education Development of the Hungarian Academy of Sciences, Eötvös Loránd University, Faculty of Science, Institute of Chemistry, Pázmány Péter sétány 1/A, H-1117 Budapest, Hungary. E-mail: luca.szalay@ttk.elte.hu
bUniversity of Debrecen, Faculty of Science and Technology, Institute of Chemistry, Egyetem tér 1., H-4032 Debrecen, Hungary
cSzent István Secondary School, Ajtósi Dürer sor 15., H-1146 Budapest, Hungary
dEötvös Loránd University, Bolyai János Practicing School, Bolyai János utca 11, H-9700 Szombathely, Hungary
First published on 18th January 2023
The results of an earlier four-year longitudinal research study on the development of experimental design skills led to the conclusion that 12–13 year old students probably need more help to design experiments than was offered to them in that project. This paper reports the findings of the first year of a further four-year empirical research project, whose aim is to modify ‘step-by-step’ instructions for practical activities in a way that may enable the development of experimental design skills. Nine hundred and thirty-one (931) students have been involved, all of whom were 12–13 years old at the beginning of the study (September 2021). Each year the students spend six lessons (some or all of each) doing practical activities using worksheets we provide. The students were divided into three groups. Group 1 (the control group) follow step-by-step instructions. Groups 2 and 3 are experimental groups. Group 2 follow the same instructions as Group 1 but also complete a scheme (a series of questions) on their worksheets concerned with the design of the experiment. Group 3 are required to design the experiments, guided by the scheme. The scheme asks indirect questions about the identification of the independent variable, the dependent variable and the constants. The impact of the intervention on the students’ experimental design skills (focusing on the identification and control of variables) and disciplinary content knowledge is measured by structured tests. After the first school year of the project it was clear that the type of instruction applied had a significant positive effect on the Group 3 students’ experimental design skills (Cohen's d effect size: −0.28). However, no significant effect of the intervention could be detected on the changes in the Group 2 students’ experimental design skills (Cohen's d effect size: −0.11). ANCOVA analysis showed that these effects were due to a combination of several factors (mainly the intervention, school ranking and prior knowledge). This paper presents the detailed results of the first year of the research and discusses the minor changes to the approach that have been made for the remaining three years of the project.
Many science educators argue that the organising principle of science education should be the scientific process rather than subject knowledge (Klainin, 1988). Further, this should be the case for all students, not just those seeking a career in science (see, e.g., Fensham's 1982 review). Often referred to as scientific literacy (Schwartz et al., 2006), this view places higher order learning and thinking skills at the forefront of science education. These skills include analogical thinking, deductive thinking, inductive thinking, problem solving, creative/divergent thinking and critical thinking (Csapó, 2022). According to Reid and Amanat Ali (2020), an important goal of formal science education is to develop thinking skills that will prove useful in life, rather than the mere transmission of facts and understanding. Therefore, rather than simply promoting ‘the scientific method’, the school laboratory should focus on how we know what we know and why we believe certain claims rather than competing alternatives (Duschl and Grandy (ed.), 2008). Applying knowledge requires learning by thinking, which leads to deeper understanding, a fundamental condition for knowledge transfer. However, transfer is not automatic; knowledge will be more widely applicable if its applicability is included in the learning objectives (Csapó, 2022).
Experiments in science teaching and learning should help students understand and practice the scientific process. To achieve this, purposeful practical work has been interpreted as data collection aimed at developing learners’ understanding of the scientific method (Abrahams and Millar, 2008). The activities that students engage in while learning science should be effective in stimulating the development of their thinking. Al-Ahmadi (2008) compiled a list of characteristics that distinguish scientific thinking from critical thinking. Prominent among these is the search for experimental evidence to support or reject a hypothesis. This is not surprising, since the “scientific method is a method of procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses” (Oxford Dictionaries, 2016). Therefore, evidence in science must be ‘empirical’, which according to the Merriam-Webster dictionary (2022) means ‘based on observation or experience’. Scientific thinking, as it manifests itself in the understanding and interpretation of scientific phenomena, has elements that are more closely related to scientific cognition, such as the identification and control of variables, causality, hypothesis generation and testing, and the design of experiments (Csapó, 2022). Through such activities, learners can also gain control over their own thinking processes as they engage in learning. In other words, they can develop metacognitive learning skills (Kuhn et al., 2000; Hofstein, 2015). These practical (‘hands-on’) and reflective (‘mind-on’) activities delegate more responsibility to the learners for their own learning. Therefore, a general understanding of formal scientific epistemologies, and not only the performance of specific types of routine experiments, should be the main goal of student learning (Driver et al., 2000; Kind, 2003).
Table 1 summarises a simple classification of laboratory activities. These four types of inquiry correspond to the four levels of inquiry presented by Schwab (1962), Herron (1971) and Fay et al. (2007). The rubric was expanded to five levels, and the levels named, by Bruck et al. (2008); this version was applied by Bretz et al. (2016). Fradd et al. (2001) defined six levels of inquiry according to the roles of the teacher and the students. Different names have also been used for the various levels (e.g. Wenning, 2007). The names of the levels in Table 1 are taken from Tafoya et al. (1980) and have also been used by Walker (2007). There has been some confusion over the years about the naming of the levels, but for simplicity the names shown in Table 1 are used here. Confirmatory/closed or verification-type experiments are not considered to be inquiry-based learning (e.g. Xu and Talanquer, 2013). Open inquiry would obviously be the most credible scientific practice. However, Varadarajan and Ladage (2022) question its pedagogical feasibility on a large scale, as it cannot be guaranteed that all the necessary chemicals and equipment are available. Apotheker (2019) argues that it is crucial to involve students in some way in the design of the experiment, which means that structured inquiry is not good enough for this purpose. This leaves guided or bounded inquiry, which seems useful and can still be implemented in school chemistry lessons. In a guided-inquiry format, students are required to design the experiment; it is therefore more likely that they understand what they have done and why they have done it (Burke et al., 2006).
| Type of laboratory activity | Does the learner know the research question in advance? | …the research method? | …the explanation? |
|---|---|---|---|
| Open inquiry | No | No | No |
| Guided/bounded inquiry | Yes | No | No |
| Structured inquiry | Yes | Yes | No |
| Confirmation/closed experiments | Yes | Yes | Yes |
According to Apotheker (2019), it is generally not difficult to convert the ‘cookbook-type’ (‘recipe-like’) experiments found in most textbooks into ‘more open’ experiments. To do this, students should be aware of the research question related to the experiment. Hattie (2008) describes these activities as needing to be open-ended, as they do not aim to achieve a single ‘right’ answer to a particular question they are addressing, but rather to engage students in the process of observation, questioning, experimentation and discovery, and in learning to analyse and reason. Hennah et al. (2022) have developed a framework for understanding how to positively influence the outcome of a secondary school laboratory task without having to change the hands-on practical tasks. They considered it key to place greater emphasis on how students talk during the activity and what they talk about (Mercer, 2007).
Concerning chemistry in schools, Hofstein pointed out in 2015 that for many chemistry teachers (and often curriculum developers), practical work still involves simple ‘recipe-like’ activities – ‘hands-on’ rather than ‘mind-on’. He notes that many teachers do not perceive that laboratory activities have the important purpose of helping students to understand how scientific knowledge is developed and used in the scientific community. Abrahams and Millar (2008) found that teachers do not focus on developing students’ understanding of scientific inquiry procedures. Ferreira et al. (2022) claim that experimentation is generally not common in Brazilian schools, although students in private schools are given more opportunities to experiment. Unfortunately, Brazil is not the only country where this happens. Lack of knowledge about various instruction methods, as well as a lack of laboratory assistants and equipment, are common problems in Hungarian schools (Kertész and Szalay, 2009), and probably elsewhere.
According to Reid and Amanat Ali (2020), overcrowded curricula and the demands of national assessment are obstacles to teaching thinking skills. They argue that thinking skills are very difficult to measure in formal school examinations. It is much easier to assess accurate recall of knowledge and known procedures. Therefore, where teachers and schools are assessed based on students’ performance in examinations, repeated calls for teaching thinking skills are unlikely to make a difference. Cole (2015) confirms that one of the barriers to adapting active learning strategies is the fear that content coverage will suffer.
It is not very helpful either that the results of research in the field of inquiry-based science education are rather mixed and therefore not convincing enough (Szalay et al., 2020). Hattie (2008), in a review of numerous meta-analyses on teaching science through inquiry/activity-based methods, summarised the average effect size as only medium to low. There are also large differences in the effects on content, process skills and critical thinking. Looking for the reasons, one may cite Johnstone and Wham, who argued as early as 1982 that teachers underestimate the high cognitive demands of practical work on the learner – overloading the student's memory. Reid and Amanat Ali (2020) agree with Johnstone and El-Banna (1989), Kirschner et al. (2006) and Mayer (2010) that the extent of thinking is regulated by the capacity of working memory. Reid and Amanat Ali (2020) also warn that scientific thinking is probably not accessible until around 15–16 years of age; in other words, not before formal operational thinking (first described by Piaget) begins to develop, between 12 and 15 years of age (Wadsworth, 1979). Serumola (2003) found little evidence that scientific thinking develops between the ages of 12 and 15, although critical thinking, they suggest, can be developed at younger ages. Al-Ahmadi (2008), on the other hand, found strong evidence that scientific thinking can be developed between the ages of about 16 and 18.
Ferreira et al. (2022) add to the reasons listed above the resistance of students to carrying out inquiry-based activities, arguing that this may be because the activities require autonomy from learners that they are not used to. However, they also find that inquiry-based activities had a positive effect on students’ motivation. Nevertheless, according to Eichler (2022), it still appears to be an ongoing challenge to promote both skill-based learning and a deeper conceptual understanding of scientific thinking.
• Content knowledge and development of thinking skills can help one another.
Reid and Amanat Ali (2020) emphasised that developing thinking skills takes time, which means reducing the content of the curriculum. They named a set of skills that characterise critical thinking: “questioning judgements and evaluations, weighing arguments, judging the quality of evidence, evaluating claims, and credibility of sources, being open minded and aware of implicit assumptions, questioning possible interpretations”. These are also applied in the scientific process, but in the latter the evidence is experimental. Willingham (2019) concludes that general critical thinking skills are not transferable from one subject to another, but that discipline-specific critical thinking skills (close transfer) can be explicitly taught. Therefore, critical thinking about open-ended problems is enabled by broad content knowledge of the domain. Cannady et al. (2019) termed critical thinking in the context of applying the scientific method ‘scientific sense making’ (SSM). They stressed the importance of a meta-level understanding of the nature of science. Their conclusion was that SSM is necessary to better understand content knowledge, but that teaching SSM requires a content-rich classroom first and foremost. Thinking cannot be learned as an abstract skill; it requires the concepts taught as content knowledge and the connections between them. They also state that SSM can be a widely transferable skill across content areas within science. This is supported by Bangert-Drowns and Bankert, who wrote as early as 1990 that inquiry-based instruction has been shown to have significant benefits in the domain in addition to transferable critical thinking skills. Nokes-Malach and Mestre (2013) argued that a focus on practices as sensemaking (rather than as rituals or disconnected processes) in particular might facilitate transfer. Ferreira and colleagues (2022) found that students who did not know the theory (content knowledge) required for an inquiry-based activity preferred not to participate in such activities. This is in line with other authors who had previously written that prior knowledge (theoretical and practical) is needed for inquiry-based activities to be successful (e.g. Crujeiras-Pérez and Jiménez-Aleixandre, 2017). The level of students’ current thinking skills also determines their ability to understand new material (Csapó, 2022). In summary, reduced but carefully selected and organised content knowledge is needed to further develop scientific thinking skills.
• The nature of classroom discourse is essential.
Cannady et al. (2019) found a main effect of classroom discourse on science learning. Students in learner-centred classrooms tended to make larger learning gains than students in teacher-centred classrooms. However, no interaction effect was found between classroom discourse and type of instruction (practical vs. traditional). Both styles can be effective, provided teachers ask appropriate questions during the lessons; the nature of these questions fosters students’ thinking skills. However, according to Greenbowe and Hand (2005), students who participate in the Science Writing Heuristic (an approach that blends inquiry and writing practices) are better able to design an experiment to address a hypothesis than students who participate in a traditional cookbook laboratory activity, as measured on laboratory practical examination tasks. Marx and colleagues (1998) argued that science teachers often have difficulty helping students to ask thoughtful questions, design investigations and draw conclusions from data. Therefore, tried and tested student sheets containing tasks that help meaningful learning could become very useful.
• Collaboration among teachers and educational researchers is important.
Researchers have shown that inquiry-based teaching is facilitated by teacher training and that long-term pedagogical support is needed to sustain the impact of teacher development groups (Stains et al., 2015). In design-based research, the scenario of teachers as formal researchers collaborating with academic researchers has become increasingly common, which can be attributed in part to the increase in teacher professional development through practitioner-led research (Anderson and Shattuck, 2012; Cochran-Smith and Lytle, 2015; Lawrie, 2022). Sweitzer and Anderson (1983) concluded that teacher training programmes, both pre-service and in-service and in different (university and school) settings, resulted in changes in teachers’ knowledge, classroom behaviour and attitudes.
• Advantages and difficulties of making connections between the classroom and real life.
Studies have shown that if students see how learning activities relate to their ‘real life’ and feel a greater sense of ownership of the inquiry, they are likely to be more motivated (Hofstein and Lunetta, 2004; Hofstein and Kesner, 2006). Teaching chemistry in the context of real-life problems has been extensively studied and suggested as a way to increase student motivation (e.g. Mandler et al., 2012). This context-based approach helps learners to relate abstract chemical concepts to everyday life experiences, which facilitates the development of scientific understanding (e.g. Marks and Eilks, 2010; Baydere, 2021). Arguably, it is even better if an activity provides the opportunity to understand problem situations holistically, i.e. by addressing them in conjunction with relevant interrelated components within and beyond the problem scope (Nagarajan and Overton, 2019; Varadarajan and Ladage, 2022). This systems thinking approach has also been researched in recent years (Ben-Zvi Assaraf and Orion, 2010; Nagarajan and Overton, 2019; Varadarajan and Ladage, 2022). Eight hierarchical features of systems thinking were described by Ben-Zvi Assaraf and Orion (2010); the hierarchy starts with the identification of the components of a system and the relationships between them, while thinking temporally, with retrospection and prediction, is the last and obviously the most difficult skill on the list. Previous research has shown that systems thinking, as a learner-centred approach, can be used to help students understand and work through real-world problems (Richmond, 1993; Jacobson, 2001; Koral Kordova et al., 2018). Jackson and Hurst (2021), based on interviews with university lecturers, reported a broadly positive perception of the integration of systems thinking into the university chemistry curriculum. However, Chen et al. (2019), who investigated assessment related to systems thinking and teaching strategies among undergraduate students involved in contextualised problem solving, found that only 8% of students were able to engage in higher order systems thinking. Most participants showed difficulties in organising related system components, understanding the cyclic nature of relationships among systems, and identifying limitations in a specific problem context. In 2020, Reid and Amanat Ali pointed out that the extent to which systems thinking can play a role at school level remained to be explored. In systems thinking, the whole is certainly more than the sum of the parts, and the situation is made more demanding when the outcome can feed back and influence some of the input factors. Therefore, in systems thinking, the limitations of working memory are seen as the biggest obstacle to overcome. It has been suggested that if the consideration of the system allows substantial chunking (Miller, 1956), then this approach may be beneficial.
• Appropriate assessment is crucial.
Finally, inquiry-based activities will not be used often enough if the skills developed over time are not assessed. Al-Osaimi (2012) stressed that test development should start with a detailed analysis of the skills central to critical thinking. Schafer and Yezierski (2020) published the results of a discussion among secondary school chemistry teachers on best practices for assessing inquiry-based teaching. The teachers agreed on the importance of assessing learning objectives at different conceptual levels. Interestingly, effective implementation of inquiry approaches enabled the achievement gap between males and females to be closed on an ACS diagnostic test (Greenbowe and Hand, 2005).
The third study described the second and third school years of the above-mentioned longitudinal study, when the same sample of students was in the 8th and 9th grades. As the method used in the first study did not seem to work for younger students (7th grade) and/or for the longer term (covering a full school year) described in the second study, the research model was modified. From the beginning of the second school year, when the students in the same sample were in Grade 8, students in the two experimental groups were taught the relevant principles of experimental design, either after conducting step-by-step experiments or before conducting identical experiments that they had partly designed themselves. The control group continued to follow the step-by-step instructions without explanation of the principles of experimental design. Statistical analysis of the results measured at the end of Grades 7–9 showed that two parameters had a significant effect on students’ outcomes: the intervention and school ranking. The intervention seems to have accelerated the progress of both experimental groups in Grade 8. However, the effect was temporary: the control group caught up with the experimental groups in terms of the development of experimental design skills (EDS) in Grade 9. The conclusion was that school ranking had an increasingly stronger effect on pupils’ EDS than the intervention. Unfortunately, the fourth year of the project could not be completed as planned in June 2020 due to disruptions caused by the Covid-19 pandemic.
Since then, the Education and Training Monitor 2020, HUNGARY has been published, which contains the following findings and recommendations: “Educational outcomes are below the EU average in the latest survey of the OECD Programme for International Student Assessment (PISA 2018). …and have decreased since 2009, with the sharpest decline in science. … Socio-economic background is a strong predictor of pupil performance and … – the gap in pupils’ performance between socio-economically advantaged and disadvantaged schools is the largest in the EU. … the Council of the EU recommended that Hungary take measures to ensure access to quality education for all”. The Future of Education and Skills 2030 position paper published by the OECD (2018) lays out three areas of skills: (1) cognitive and meta-cognitive skills (e.g. critical and creative thinking); (2) social and emotional skills (e.g. collaboration and empathy); (3) physical and practical skills (e.g. inquiry and problem-solving skills). However, their implementation in school curricula is not straightforward (e.g. Voogt and Roblin, 2012; Reimers and Chung, 2016). The National Curriculum of Hungary (2020), valid from September 2020, explicitly prescribes the development of EDS, influencing textbooks and workbooks. Unfortunately, the development of EDS will probably still be neglected if teachers are not convinced that EDS are important for their students’ entry to higher education. Clearly, further intensive educational research is also needed to foster positive changes. As part of this, our research team launched a new four-year longitudinal research project in September 2021, supported by the Research Programme for Public Education Development of the Hungarian Academy of Sciences.
RQ1: Did the intervention result in a significant change in students’ ability to design experiments (Experiment Design Skills, EDS) in either of the experimental groups compared to the control group?
RQ2: Did the students in the experimental groups score significantly differently on the Disciplinary Content Knowledge (DCK) questions because of the intervention compared to the students in the control group?
RQ3: Was there a difference in EDS between students in the two experimental groups?
A quasi-experimental design with a non-equivalent control group was applied. Group 1 (the control group) performed only step-by-step experiments. Group 2 carried out the same step-by-step experiments as Group 1, but after performing the experiments they had to answer questions on the worksheets about the design of the experiments, following the relevant parts of the fair testing scheme mentioned above (under the heading “Research questions”). The answers were discussed with them by their teachers. Group 3 also carried out the same experiments as Group 1, but they had to design the parts that were not written on their student sheet. To help them in this process, they had to answer questions about the design of the experiments according to a similar scheme used for Group 2. The answers were discussed with the Group 3 students by their teachers before the experiments were carried out (see Table 2).
| No. | Topic | Experiments that Group 1 and Group 2 students had to do following step-by-step instructions, but that Group 3 students had to design before doing the experiment | Context and elements of systems thinking in the “Let's think!” parts, included for motivation purposes. These are the same on the student worksheets of all three groups. |
|---|---|---|---|
| 1. | The particle model of matter | Students are given three coloured candies/sweets, cold and warm water, three flat bowls, a ruler and a glass. They can also use the stopwatch function of their mobile phones or take photographs with their mobiles. They investigate how the rate at which the food colouring dissolved from the coating of the candies spreads depends on the temperature of the water (cold, lukewarm and warm). | How does the lifespan of glow sticks (items used for parties, but also to save lives in disasters such as earthquakes or floods) depend on the temperature at which they are kept before use? It is explained to students that at higher temperatures particles are more likely to transform during collisions. They have to work out how temperature affects the movement of particles, the rate of the chemical reactions in the glow sticks, and how long the sticks can emit the light produced by the chemical reaction. |
| 2. | Chemical reactions | Students are given baking soda, tartaric acid, starch, water, a dropper, three spoons and a tile. They are told that baking powder contains these three ingredients/components (baking soda, tartaric acid and starch). When the baking powder meets water, carbon dioxide gas is formed, which ‘puffs up’ the cake. They need to investigate which of the three components is not needed for the chemical reaction. | Is it really worth adding lemon juice to the baking powder before it is put into the cake (according to the advice in some old recipes and on popular websites)? It is explained to students that carbonates and hydrogencarbonates react with acids to produce carbon dioxide; this is why sculptures made of limestone (calcium carbonate) are damaged by acid rain. Students should answer the above question knowing that baking soda is sodium bicarbonate. |
| 3. | Solubility | It is explained to students that water and fat particles do not mix, but that particles of dishwashing liquid/soap can mix with both, because one part of them is similar to water particles and another part is similar to fat particles. The students are given three pieces of paper, grease (lard), clean water, water containing dishwashing liquid, three bowls, ear cleaning sticks and toothpicks. They have to investigate (by modelling the situations) how the greasing of ducks’ feathers affects whether the ducks float or sink in clean water or in water containing dishwashing liquid/soap. | What is the solubility of alcohol if it behaves like soap in dissolving the outer shell of the coronavirus? Students look at a diagram showing the double layer of particles that makes up the outer coat of viruses and bacteria. It is explained to them that the particles of this protective double layer are similar to soap particles in that they have one part that dissolves in water and another part that dissolves in fat. Students read that washing hands with soap or using an alcohol-based disinfectant can prevent infection, as both destroy the outer coating of the coronavirus. Understanding this, they should answer the question above. |
| 4. | Constitution of solutions | It is explained to students that the use of a solution depends on its concentration: for example, a solution of hydrogen peroxide used to dye hair would be too concentrated to cure a sore throat; it would damage our throats, not just kill germs. They are given 9 tablets that dissolve in water to form a hydrogen peroxide solution, dried yeast, dishwashing liquid, lukewarm water, 3 empty beverage bottles (1.5 litres), 4 beakers/glasses, 4 glass rods, 1 spoon, 1 funnel and a ruler. Hydrogen peroxide is known to decompose into water and oxygen, a process accelerated by yeast; the resulting oxygen gas can blow foam from the water containing the detergent. Students need to test whether the hypothesis that the more tablets used, the more foam is formed in the beverage bottles (all other things being constant) is true. | Many pseudoscientific websites recommend drinking hydrogen peroxide solutions to cure everything from AIDS to cancer. Is it advisable to use it INSIDE (orally)? What effect would a hydrogen peroxide solution have on our digestive system? Would you drink it? (You can think of sodium hypochlorite too, which is also a disinfectant. Is that okay to drink?) Students should answer the question, giving their reasoning in the light of the facts presented earlier on the student worksheet and the experiment they have just done. |
| 5. | Separation of mixtures | It is explained to students how sea salt is made. They are given three samples (of the same weight) containing different proportions of salt and sand. They are also given a kitchen scale, water, three beakers/glasses and three glass sticks/spoons (filtering equipment is optional). They must determine which sample contains the most salt. | Students are given examples of the advantages and disadvantages of salt (sodium chloride). They also learn that our body needs a certain amount of salt every day, but that too much can lead to high blood pressure in the long term. They read that infusions/injections have an optimal salt concentration. They should find the links between too high or too low salt concentrations and their effects. |
| 6. | Conditions of combustion | Students watch their teacher carry out an experiment: a paper tissue is soaked in a mixture of 50% alcohol and 50% water and then lit. They discuss that the paper tissue itself does not burn because the heat produced by the burning alcohol is used to evaporate the water, so the tissue does not reach the ignition temperature of the paper. Next, the students carry out experiments on the other two conditions of combustion (combustible substance and supporter of combustion). | It is explained to students how the recent increase in the frequency of forest fires contributes to global warming through the production of carbon dioxide, smoke and soot. They are asked to study a graph showing a cycle of the following quantities: frequency of forest fires, soot on the surface of the polar ice, heat reflected by the polar ice, surface temperature of the Earth, water evaporating from the oceans into the atmosphere and contributing to global warming, and temperature in the atmosphere (the circle being completed by the relationship between the latter and the frequency of forest fires). Students should determine which of these quantities increase and which decrease as a result of more frequent forest fires. Finally, they have to decide whether this is a self-accelerating or a self-decelerating cyclical process. |
In the absence of institutional ethics committees or local procedures, our research team had to develop its own ethical protocol to ensure that informed consent was obtained and that the privacy and confidentiality of the individuals were protected (Lawrie et al., 2021). A letter describing the key features of the research was drafted in accordance with the General Data Protection Regulation (GDPR) in the European Union and sent to the mother or guardian of each participating student by their chemistry teachers. Only those students’ results whose mothers or guardians gave written consent for their children to participate in the research and for data collection (including the highest educational qualification of the mother or guardian) have been used anonymously for statistical analysis. Photographs were taken of the students carrying out the experiments (based on the worksheets provided) and uploaded to the research team's website (https://ttomc.elte.hu/galleries). However, teachers had to ensure that only students whose mothers or guardians had given their signed consent to share these photos for this purpose were visible. Teachers also had written permission from school principals to participate. Teachers told the students that the test results would not count in their school's chemistry assessment, but that they were participating in a project to improve chemistry education.
The topics were cross-referenced to the curriculum, along with the experimental design tasks given to Group 3 students on the student worksheets (see Table 2). As in other studies (e.g.Hennah, 2019), the activities had to fit the curriculum timetable. An important consideration and limitation in the choice of experiments was that students should be able to carry them out at home if the situations caused by the Covid-19 pandemic made this necessary. Therefore, only materials and equipment were used that the students would have found in the household or could have easily obtained.
Each topic of the student worksheets (see Table 2) has been agreed by all participating teachers. The first version of Student sheet 1 was written by the research group leader and distributed as a sample to the other student sheet authors. These authors were teachers who are members of the research team, working closely with the research group leader. The first versions of each student worksheet were read by four university lecturers (i.e. instructors, who are also members of the research team). They are experts in the development of chemistry teaching materials for primary and secondary school students. The first versions of the student worksheets have been improved by the authors based on the experts’ suggestions. This second version was then proofread by one of the experts and the leader of the research team, who then agreed on the final changes. Only the student worksheets that had undergone this process were sent to the teachers, who used them in their work with the students.
One way to reduce the cognitive load of the students was to use much shorter and simpler experiments than in the previous longitudinal project. Thus, the student worksheets designed for each group were also much shorter. This also made the student worksheets easier and more flexible to use. However, the teacher guides were not short, as they also included versions of the student worksheets describing the student experiments to be carried out at home for each group.
In designing and writing the student worksheets, a number of theoretical guidelines were also taken into account. The activities had to integrate the learning of science content and processes, and include ongoing student reflection and discussion (e.g. National Research Council, 1996), while adopting an inquiry-based approach (Boud and Feletti, 2013). It was also considered that critical thinking can be elicited when students are asked to plan an experiment using provided information and criteria for a good experimental design during a chemical inquiry practical (Brederode et al., 2020). Therefore, more specific guidance was provided during the inquiry assignments, in the hope that it would result in higher quality learning products (Lazonder and Harmsen, 2016). The student worksheets included an introduction designed to arouse interest and curiosity. They also provided important DCK which, as in the study published by Ferreira et al. (2022), was needed to solve the experimental design task and develop the EDS.
Each experimental design task required problem-solving skills and was related to the topic of the lesson. Students in Groups 1 and 2 followed the same sequence of experiments. Unlike the students in Group 3, they were given step-by-step instructions. Group 3 students had to carry out guided inquiry, as the research questions were given on the student worksheets, but they had to design experiments (according to a scheme) to answer the research questions (Table 1). The experiment design tasks were based on the following components of the EDS, as defined and evaluated by Csíkos et al. (2016): identification and control of variables (including the principle of ‘fair testing’, i.e. ‘how to vary one thing at a time’ or ‘holding other things/variables constant’); selection of equipment and materials; and determining the correct order of the experiment steps. In their meta-analysis, Furtak and colleagues (2012) concluded that evidence from some studies suggests that teacher-led inquiry has a greater impact on student learning than student-led inquiry. Thus, Group 2 students conducted the same step-by-step experiments as Group 1 students, but their worksheets included (as part of a scheme) important questions about the design of step-by-step experiments, concerning the components of the EDS mentioned above.
The main difference between the treatment in the second/third year of the previous longitudinal research and that in the first year of the current four-year project is the level of support. Although, according to the classification used by Choo et al. (2011), both projects used soft scaffolding, involving peer and teacher interactions, as well as hard scaffolding in the form of student worksheets, the need to further reduce the students’ cognitive load was evident before the current project began (Reid and Amanat Ali, 2020). This was achieved by giving students even more structured problem-solving tasks (Hmelo-Silver et al., 2007). From the second year of the previous project, students in Group 3 were taught the relevant principles of experimental design before they started to design and carry out experiments. However, this resulted in only a temporary increase in measured EDS (Szalay et al., 2021). Therefore, in the present project, the scheme described under the heading “Research questions” was applied in the students’ worksheets. This was the second way of reducing the cognitive load on students, alongside the shorter and simpler experiments. Following the advice of Apotheker (2019), students were specifically trained in the different steps of experiment design used in scientific procedures through the application of the generic scheme. Although the sub-steps for solving the problem tasks were not labelled with the goal they serve, as suggested by Catrambone (1998), the wording of the questions for identifying the independent and dependent variables and the constants is the same across the different student worksheets. The above-mentioned specific terms used to name the variables will only be introduced in the second year of the project, to avoid too much new information and too high a level of abstraction in the first year. This was thought to be an appropriate level of difficulty that would provide an intellectual challenge without requiring unachievable performance from students (Reid and Amanat Ali, 2020). Structured group discussions also provided an opportunity to develop the metacognitive engagement required for the reconstruction and reorganisation of students’ initial experimental designs, as suggested by Varadarajan and Ladage (2022). This was a deliberate shift from the knowledge to be learned towards how the understandings had developed and how they related to the life around the students (Reid, 1999). By using the same general scheme applied in different contexts, the six student sheets follow the suggestions of Kurtz et al. (2013), who advised showing learners two solved problems with different surface structures but the same deep structure and asking them to compare the two. These techniques can reduce the load on working memory and help learners see the deep structure behind the surface of the problem.
According to Reid and Skryabina (2002), one reason for students’ negative attitudes is that they do not see the subject matter as related to their lifestyle and context. Jung (2005) suggested that teaching material should present the social context of the topics studied, showing their relationship to cultures and practical ways of doing things. Reid and Amanat Ali (2020), however, warned that systems thinking is complex and that, without a higher working memory capacity (which is fully developed by about age 16), systems thinking (like scientific thinking) is highly unlikely. Critical thinking, however, is less demanding on limited working memory capacity and is possible at a younger age. Therefore, each type of student worksheet (the versions for all three groups, for use at school and at home) includes a section under the heading “Let's think!”, which puts the knowledge gained in carrying out the experiment into context. Where appropriate, these sections include some elements of systems thinking, e.g. identifying the dynamic relationships between the components of a system and understanding the cyclic nature of systems, as in Student worksheet 6. In subsequent years, more complex relationships between the different components of systems will be introduced.
According to Cole (2015), the creation of a comprehensive teacher's guide that includes the more significant tips and insights suggested by both the development team and the first classroom testers of the activities provides an additional resource that can help instructors implement each activity in their own teaching practice. For this reason, the tested teacher's guides (which include all versions of the student worksheets, metadata, technical notes and even photographs of the prepared experiments and their results) are published in Hungarian on the research team's website (https://ttomc.elte.hu/publications/92) for download and use by all interested parties.
When planning activities, the cost in terms of time and money remained a consideration. If these constraints were ignored, it is unlikely that the tried and tested activities would be suitable and widely usable after the research project is completed (Boesdorfer and Livermore, 2018; Orosz et al., 2022).
The effects of the two different types of educational interventions used in the quasi-experimental research described above were measured by tests. Because of the importance of developing process skills and content together, disciplinary content knowledge (DCK) was also measured. The main aim, however, was to show how, if at all, experiment design skills (EDS) were changed by the tasks of the six student worksheets, based on a simplified version of the experiment design checklist developed by Cothron et al. (2000). In balancing these two components of the tests, the views of several previous authors were considered. They have advised that assessments and tasks should go beyond content knowledge and require more than factual recall (e.g. Cooper, 2013; Reed and Holme, 2014; Rodriguez and Towns, 2018; Underwood et al., 2018), which is inevitable when measuring inquiry skills. According to Reid and Amanat Ali (2020), test success should not be related to the working memory capacity of learners. In addition to the research team's own experiences in previous projects (Szalay et al., 2020; Szalay et al., 2021), the following three assessment tools found in the literature were also studied for guidance before constructing the test questions:
• the assessment tool developed by Chen et al. (2019);
• the evaluation form for Science Olympiad (2020);
• the criteria of measuring scientific thinking skills, summarised by Sirum and Humburg (2011).
None of those three assessment instruments could be used directly in the present study, as they were designed for older students with more advanced science knowledge and skills, but they still provided valuable support. Although the development of systems thinking skills was not measured in this study (as its components were used only for motivational purposes in conjunction with contextual learning), the assessment tool developed by Chen et al. (2019) was studied. It included open-ended questions to explore how university students connect and translate their conceptual representations when engaged in contextual problem solving. The second assessment instrument, the evaluation form for the Science Olympiad (2020), is (understandably) very detailed and goes well beyond the needs and possibilities for evaluating the results of the present research. However, it provided a good starting point, as it includes sections on correctly identifying the independent and dependent variables and the controlled variables (constants), and on judging the quality of the description of the procedure. The six criteria for measuring scientific thinking skills summarised by Sirum and Humburg (2011) were also considered important: activities should not be time-consuming; should be based on a practical challenge from a ‘daily life’ problem to increase student participation and effort; should require minimal quantitative skills; should be open-ended to explore student thinking (i.e. not multiple choice); should be easy and consistent to score; and should provide a quantitative measurement. However, the scoring rubric for their experimental design skills test was designed for undergraduate students in an introductory biology course. This meant that only the following elements could be used in the present research: identification of the independent and dependent variables, description of how the dependent variable is measured, and choice of which variables should be held constant.
The tasks had to be different in each test. This is because the use of the same instrument in a pre- and post-test environment can invite repeated testing effects during which students may remember correct answers (Cannady et al., 2019; Szalay et al., 2020; Szalay et al., 2021).
Test 0 and Test 1 included DCK and EDS tasks (see their English translations in Appendix 1 and 2). Each test consists of eighteen compulsory tasks (items), each worth 1 point. Nine were used to assess the EDS. The other nine were used to assess DCK, with three each for recall, understanding and application. The results were analysed to assess the impact of different types of treatments on EDS and DCK.
The EDS tasks had to be set in the context of everyday life, since the main goal of the research continued to be to develop transferable EDS (Szalay et al., 2020; Szalay et al., 2021). The approach used in the present assessment tools was previously adopted by Cannady et al. (2019) and supported by other authors (e.g. Zimmerman, 2000 and 2007; Tosun, 2019). It was also used in the PISA assessment of science literacy competences (OECD, 2017). The idea is that tasks should integrate content that learners are familiar with and focus on the ability to apply scientific practices. Again, the advice of Cannady et al. (2019) was followed: it is important to incorporate the content knowledge necessary to solve the problems into the assessment of the application of scientific practices.
The following tasks were used on Tests 0 and 1 to compare the development of students’ EDS across the three groups.
(a) How many ice cubes do they need for the experiment?
(b) How many pieces of cloth are needed for the experiment and what should they look like?
(c) Where should the children put the ice cubes?
(d) How should the children place each piece of cloth in the case of each ice cube?
(e) What do the children need to observe to decide which of them was right?
(f) Put a (+) sign in front of the statement(s) in the list below that are important and a (−) sign in front of the statement(s) that are not important. (You can write a different sign after a clear strike-through if you change your mind.)
• The ice cubes must be the same size.
• The ice cubes must be taken out at the same time, from the same freezer.
• The ice cubes should be placed directly next to each other.
• The pieces of cloth must be of the same type and thickness.
(a) Which of the materials available at home (and mentioned above) should be put into each glass during the experiments?
(b) How should the contents of each glass be compared to the chocolate sauces that are made from different concentrations of rum bought from the shop?
(c) What should be changed in each experiment because of your answer to question b) above?
(d) Which of the properties of the contents of the glasses should be tested in each experiment?
(e) How can you test the property of the contents of the glasses in your answer to d)?
(f) How can you decide which rum to buy based on the experience of the experiments?
(g) Put a (+) sign in front of the statement(s) in the list below that are important and a (−) sign in front of the statement(s) that are not important. (You can write a different sign after a clear strike-through if you change your mind.)
• The glasses must be of the same shape and volume.
• The liquids must be the same temperature.
• The volumes of the liquids must be measured with equal accuracy.
Each student was given 40 minutes to complete Test 0 and 40 minutes to complete Test 1. The students were coded so that teachers would know their identity and gender, but the researchers only received anonymous data coded for statistical analysis. These codes are used throughout the project. Participating teachers marked the students’ tests, recording the marks in an Excel spreadsheet as instructed (see “T0 test and instructions for teachers” and “T1 test and instructions for teachers”, available on the research team's website: https://ttomc.elte.hu/publications/92). As there was an element of subjectivity in the grading protocol, the research group tried to standardise the grading to ensure that the application of the rubric was the same for each test, as done by Goodey and Talgar (2016). An experienced chemistry teacher reviewed all the teachers’ marking and suggested modifications to the marking instructions. After discussions within the team, alterations were made. Based on these, the teachers’ marks were changed to ensure that a unified marking process, free from individual teachers’ decisions, was used.
In terms of validity, it is better to use assessment tasks similar to those used in the trials of the intervention methods under investigation. However, the tasks needed to be different in each test, for three reasons. First, the chances of successfully solving a task would be higher the second time it was used, since students might discuss it with others in between times (this could have caused construct-irrelevant easiness). Secondly, the goal of the research is to develop experimental design skills that may be applied under circumstances different from those in which the intervention happened; it was necessary, therefore, to show that the transfer had happened successfully. Thirdly, the EDS tasks had to be put into contexts relevant to the students’ previously gained knowledge and understanding, which increase over time. The test scores of Groups 2 and 3 were compared with those of Group 1 (the control group) to eliminate the risk of maturation effects (Shadish et al., 2002).
The definition of test content given by the Standards for Educational and Psychological Testing (American Educational Research Association, 1999) and interpreted by Arjoon et al. (2013) was applied. Evidence for content validity was established by a panel of domain experts judging whether the items appropriately sample the domain of interest (Crocker and Algina, 2006). Evidence for content validity can be used to argue against construct underrepresentation that is one of the main threats to construct validity (Wren and Barbera, 2013).
To avoid construct-irrelevant variance, only content given for Grades 5 and 6 in the chapter titled ‘Human in nature’ of the National Curriculum of Hungary (2012) could be assessed in Test 0, since the participating students had only just started to learn chemistry at the beginning of the project. Taking the relevant curriculum and the content of the textbooks into account, the first version of Test 0, including the marking instructions, was written in Hungarian (Lewis, 2022) by the research team leader. Tasks intended to measure DCK were based on the DCK tasks used in the test at the beginning of the previous project (Szalay et al., 2020; Szalay et al., 2021) and corrected in the light of the analysis of earlier experiences reported in a pre-service chemistry teacher student's thesis work. An experienced science teacher of Grade 5 and 6 students, who is not a member of the research team, checked the content, the wording, the format and the assumed correct and incorrect answers given in the detailed marking instructions, and suggested modifications. The corrections were made accordingly before the university educators in the research group (the same ones who checked each student sheet) were asked to give their opinion on the test and its marking instructions. Further improvements were made following their suggestions.
The first version of Test 1 and its marking instructions was also produced by the research team leader. Each task in Test 1 could be completed after finishing the tasks on the six student worksheets for the first school year of the present project. Table 3 shows how each task of Test 1 can be matched with the relevant content of certain student sheet(s). The same university educators in the research group who checked the content of the T0 test then checked the T1 test and its marking instructions, and corrections were made according to their suggestions.
| No. of task in Test 1 | No. of student sheet and topic |
|---|---|
| 1. a–b | 3. Solubility |
| 2. a–g | 4. Constitution of solutions; 6. Conditions of combustion |
| 3. | 6. Conditions of combustion |
| 4. a–b | 2. Chemical reactions |
| 5. a | 4. Constitution of solutions |
| 5. b | 5. Separation of mixtures |
| 6. | 4. Constitution of solutions |
| 7. | 1. The particle model of matter |
This process of item evaluation and revision took place for all items of both tests. Expert feedback on item content, wording, and consensus of the correct answer are all sources for evidence of expert response process validity and against construct-irrelevant variance, both construct irrelevant difficulty and easiness (Wren and Barbera, 2013).
Test 0 was trialled with two classes (N1 = 30, N2 = 32; 62 students altogether) of 12–13 year old students not participating in the research, in autumn 2021. Test 1 was also trialled with the same two classes (N1 = 30, N2 = 32) in April 2022. The chemistry teacher organising this pilot and correcting each test gave detailed suggestions on how to improve the wording of the tasks and the marking instructions, based on her experiences (i.e. what her students found difficult to interpret and what kinds of answers they gave). Both tests and their marking instructions were further revised in response to the results of the trial before they were completed by the students participating in the sample.
Participating teachers had not seen Test 1 before piloting the six student worksheets of the school year. This was to avoid the tasks in Test 1 influencing the pilot.
The following data were collected and analysed statistically:
• Student total scores (marks) for Test 0 and Test 1.
• Student scores for the EDS tasks in Test 0 and Test 1.
• Student scores for the DCK tasks in Test 0 and Test 1.
• Gender of the student.
• School ranking. The student's school's ranking amongst Hungarian secondary schools, according to the website “legjobbiskola.hu”. The participating schools were grouped into high-, medium- and low-ranking categories and a categorical variable was used according to these three levels (Appendix 3, Table 13). This allowed a statistical assessment of the impact of the participating schools’ ‘quality’ on the development of the students’ knowledge and skills.
• Mother's education. Two categories were formed depending on whether or not the student's mother (or guardian) had a degree in higher education. This categorical variable was intended to characterise the student's socioeconomic status. (In Hungary, children are likely to spend much more time with their mother than with their father during the formative years. This means that, of the two parents, the mother's knowledge, skills and attitudes are likely to have the greater influence on the growing child.)
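For illustration only, the sketch below shows how such a data set might be organised for the analyses that follow (in Python/pandas rather than the SPSS actually used). All file and column names are hypothetical; the coding of the categorical variables mirrors the descriptions above.

```python
# Illustrative only: one row per student; every file and column name here
# is hypothetical, not taken from the project's actual data files.
import pandas as pd

df = pd.read_csv("students.csv")

# Categorical predictors described in the list above
df["group"] = df["group"].astype("category")  # 1 = control, 2 and 3 = experimental
df["school_rank"] = pd.Categorical(
    df["school_rank"], categories=["low", "medium", "high"], ordered=True
)
df["mother_degree"] = df["mother_degree"].astype(bool)  # higher-education degree?
df["gender"] = df["gender"].astype("category")

# Total scores are the sums of the DCK and EDS sub-scores for each test
for t in ("t0", "t1"):
    df[f"{t}_total"] = df[f"{t}_dck"] + df[f"{t}_eds"]
```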
Cronbach's alpha values (Cronbach and Meehl, 1955) for the two tests were acceptable: 0.736 for the T0 test and 0.696 for the T1 test. Cronbach's alpha was 0.483 for T0DCK (poor), 0.775 for T0EDS (good), 0.516 for T1DCK (poor) and 0.634 for T1EDS (fair). The relatively low values in the case of the DCK tasks are partly explained by the small number of items (only 9 items in each test). However, no more items could be included in a 40 minute test constructed for children of this age.
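For readers who wish to reproduce such reliability estimates, Cronbach's alpha can be computed directly from an item-score matrix. A minimal sketch follows, with randomly generated scores standing in for the real data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_students, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Toy example: 9 items (as in the DCK sub-tests), scores 0-2, 100 students
rng = np.random.default_rng(42)
toy_scores = rng.integers(0, 3, size=(100, 9)).astype(float)
print(round(cronbach_alpha(toy_scores), 3))
```

Alpha generally increases with the number of items, which is consistent with the lower values observed for the 9-item DCK sub-tests.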
Statistical analysis of the data was done using SPSS Statistics software. ANOVA and ANCOVA analyses were also performed. Raw mean scores (before ANCOVA analysis) and their standard deviations (SD) for the three groups were calculated for both the T0 and T1 tests for the whole test (‘total’), the DCK tasks and the EDS tasks. The effect of the intervention on the development of the experimental groups (Group 2 and Group 3) was characterised by Cohen's d effect size (Cohen, 1988). The Cohen's d values were calculated from the means and standard deviations of the differences between the two test scores (T1–T0).
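As we read this description, each effect size compares an experimental group's gain scores (T1 − T0) with the control group's gain scores. A minimal sketch, assuming the pooled-standard-deviation form of Cohen's d and reusing the hypothetical data frame above:

```python
import numpy as np

def cohens_d(gain_exp: np.ndarray, gain_ctrl: np.ndarray) -> float:
    """Cohen's d between two sets of gain scores, using the pooled SD."""
    n1, n2 = len(gain_exp), len(gain_ctrl)
    pooled_var = ((n1 - 1) * gain_exp.var(ddof=1) +
                  (n2 - 1) * gain_ctrl.var(ddof=1)) / (n1 + n2 - 2)
    return (gain_exp.mean() - gain_ctrl.mean()) / np.sqrt(pooled_var)

# Gain scores (T1 - T0) per student, then d for Group 3 versus Group 1
gain = (df["t1_total"] - df["t0_total"]).to_numpy()
d_group3 = cohens_d(gain[df["group"] == 3], gain[df["group"] == 1])
```

With the group means and standard deviations in Table 6, this formulation reproduces the values in Table 7 (e.g. for the total scores of Group 3: (−1.18 − (−2.42))/≈3.6 ≈ 0.34).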
Although Cohen's d effect size can be used to characterise the effect of the intervention on development, it was assumed that, apart from the three types of instruction used with Groups 1, 2 and 3, other hypothesised parameters (school ranking, mother's education, gender) and a covariate (prior knowledge, i.e. student scores on the T0 test) had also influenced the results. Therefore, the statistical analysis was also carried out by analysis of covariance (ANCOVA) to examine the effects in more detail. Effect sizes in the ANCOVA analysis were characterised by the calculated partial eta squared (PES) values. In the case of multiple comparisons, Bonferroni correction was applied. When testing the differences among groups and sub-groups, a significance level of p < 0.05 was applied; however, a significance level of p < 0.025 was used in the comparison of the results of Test 0 and Test 1 (according to the Bonferroni correction).
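The same kind of model can be fitted outside SPSS. The sketch below (statsmodels, with the hypothetical column names introduced earlier) fits an ANCOVA for the T1 total score and derives partial eta squared from the sums of squares. Note that SPSS uses Type III sums of squares by default, so an exact replication would additionally require sum-to-zero contrasts and typ=3.

```python
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# ANCOVA: T1 total score explained by the fixed factors, with the
# T0 total score (prior knowledge) as the covariate
model = smf.ols(
    "t1_total ~ C(group) + C(school_rank) + C(mother_degree)"
    " + C(gender) + t0_total",
    data=df,
).fit()
aov = anova_lm(model, typ=2)  # Type II sums of squares

# Partial eta squared (PES) = SS_effect / (SS_effect + SS_residual)
ss_residual = aov.loc["Residual", "sum_sq"]
aov["PES"] = aov["sum_sq"] / (aov["sum_sq"] + ss_residual)
print(aov[["F", "PR(>F)", "PES"]])
```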
Table 4 shows the raw mean scores, prior to ANCOVA analysis, and their standard deviations (SD) for the three groups on the T0 test, for the whole test (‘total’), the DCK tasks and the EDS tasks. The high standard deviations show that the sample was very heterogeneous in terms of the knowledge and skills measured by the tests.
Similarly, Table 5 shows the raw mean scores and their standard deviations for the three groups on the T1 test. In all cases, the average raw scores for T1 were lower than for the T0 tasks. This is understandable, as the knowledge and skills measured by T1 exceeded those measured by T0. (The two tests contained different tasks, for the reasons explained earlier under the heading “Validity”.)
ANOVA analysis revealed no significant difference between the groups in performance on T0total [F(2, N = 890) = 1.461, p = 0.233], T0DCK [F(2, N = 890) = 0.920, p = 0.399] or T0EDS [F(2, N = 890) = 1.040, p = 0.354]. There is, however, a significant difference between the groups' performance on T1total [F(2, N = 890) = 10.306, p < 0.001], T1DCK [F(2, N = 890) = 4.846, p = 0.008] and T1EDS [F(2, N = 890) = 12.329, p < 0.001]. The achievement of the Group 3 students exceeded that of the other two groups at the end of the first year (Grade 7) of this project, whereas Group 2 performed worse on the T1 test than both the control group (Group 1) and the other experimental group (Group 3).
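For reference, each of these F tests corresponds to a one-way ANOVA of the relevant score across the three groups. A minimal sketch (scipy, again with the hypothetical data frame above):

```python
from scipy.stats import f_oneway

# One-way ANOVA of T0 total scores across the three groups
samples = [df.loc[df["group"] == g, "t0_total"] for g in (1, 2, 3)]
f_stat, p_value = f_oneway(*samples)
print(f"F(2, {len(df) - 3}) = {f_stat:.3f}, p = {p_value:.3f}")
```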
For further analysis, the dependent variable was the difference between the two test scores (T1–T0). The means and standard deviations for each group are shown in Table 6.
Table 6 Means (SD) of the differences between the two test scores (T1–T0) by group

| Group | T1total–T0total (SD) | T1DCK–T0DCK (SD) | T1EDS–T0EDS (SD) |
|---|---|---|---|
| Group 1 | −2.42 (3.62) | −1.19 (2.15) | −1.23 (2.61) |
| Group 2 | −2.90 (3.56) | −1.61 (2.15) | −1.29 (2.75) |
| Group 3 | −1.18 (3.55) | −0.94 (2.20) | −0.24 (2.73) |
The ANOVA analysis also showed significant differences between the groups in terms of the differences in scores between the two tests: T1total–T0total [F(2, N = 890) = 17.966, p < 0.001], T1DCK–T0DCK [F(2, N = 890) = 7.278, p = 0.001] and T1EDS–T0EDS [F(2, N = 890) = 13.817, p < 0.001]. In Group 3, the decrease in scores was significantly smaller than in the other two groups, while in Group 2 it was significantly larger than in the other two groups. Based on the means and standard deviations of the differences between the two test scores (T1–T0), Cohen's d effect size values were calculated; these are presented in Table 7. They also clearly show that Group 3 developed better, and Group 2 less well, than Group 1.
Table 7 Cohen's d effect sizes for the changes in the experimental groups' scores relative to Group 1

| Group | Cohen's dtotal | Cohen's dDCK | Cohen's dEDS |
|---|---|---|---|
| Group 2 | −0.13 | −0.20 | −0.11 |
| Group 3 | 0.34 | 0.11 | 0.28 |
Comparing the results of the two experimental groups, the change in performance of Group 3 students was found to be significantly better than that of Group 2 students on both the DCK tasks (Cohen's d: 0.31) and the EDS tasks (Cohen's d: 0.38).
Previous experience had shown that students' performance can depend on several factors, not only on the intervention. Therefore, an ANCOVA analysis was conducted with the test scores as the dependent variable. Group (the type of instruction), school ranking, mother's education and the student's gender were the fixed factors; the covariate was the student's prior knowledge (T0 test scores). The partial eta squared (PES) values characterising the effect sizes are shown in Tables 8–10.
Initially, it was mainly school ranking and, to a lesser extent (in the DCK tasks), the mother's education that had a significant effect on the scores in the whole test and the sub-tests (DCK and EDS). After the intervention, four factors were found to be important: group (type of instruction), school ranking, mother's education and prior knowledge. Of these, prior knowledge had the largest effect size (PES) in the whole test and the EDS tasks, whereas in the DCK tasks it had a smaller effect than school ranking.
At the end of the school year (in the T1 test), the type of instruction (‘Group’) and school ranking appear to have similar effect sizes for the whole test (Table 8). School ranking, however, has a greater effect in the DCK tasks (Table 9) than in the EDS tasks (Table 10), while the type of instruction (‘Group’) appears to have a greater effect on performance in the EDS tasks than in the DCK tasks.
The relative estimated average scores (ratios of the estimated mean scores of the experimental groups to that of the control group) for the whole test and for the sub-tests are shown in Table 11 for the beginning of the present project (Grade 7, T0) and in Table 12 for the end of the first school year (Grade 7, T1). These data show that the EDS changes in Group 2 were essentially identical to those in the control group, whereas a marked increase was observed in Group 3.
Table 11 Relative estimated average scores at the beginning of Grade 7 (T0)

| Ratio | T0total | T0DCK | T0EDS |
|---|---|---|---|
| Group 2/Group 1 | 1.01 | 1.01 | 1.00 |
| Group 3/Group 1 | 0.97 | 0.97 | 0.96 |
Table 12 Relative estimated average scores at the end of Grade 7 (T1)

| Ratio | T1total | T1DCK | T1EDS |
|---|---|---|---|
| Group 2/Group 1 | 0.95 | 0.91 | 0.99 |
| Group 3/Group 1 | 1.12 | 1.04 | 1.19 |
The means estimated by the ANCOVA model (absolute scores and percentage performances) and the significance of their differences are shown in Appendix 3, Tables 14–17 for the whole T1 test and its sub-tests. According to the data in Table 14, Group 2 performed significantly worse than the other two groups on the DCK tasks (and consequently on the whole test). On the EDS tasks, however, Group 3 significantly outperformed the other two groups. Therefore, using the scheme before designing the experiments seemed to have helped Group 3 to achieve better results on the EDS tasks, whereas answering the questions of the scheme after doing the step-by-step experiments did not seem to help the development of the Group 2 students' experimental design skills. The Group 3 findings seem to support the view that adequate and appropriate scaffolds should be provided if students coming from a traditional teaching style are to complete an inquiry-based investigation task successfully (Seery et al., 2019). These results are also in line with Bredderman's (1983) data, which showed that the use of inquiry-based methods had a greater effect on science process than on science content. However, showing learners solved problems with different surface structures but the same deep structure (Kurtz et al., 2013) concerning experimental design did not seem to increase the Group 2 students' EDS, as measured by the tests.
In the first year of the present project, school ranking had a significant positive effect on students' DCK scores (Appendix 3, Table 15). It is interesting to note, however, that students from medium-ranking schools did not achieve significantly better scores on the EDS tasks than students from low-ranking schools.
Students whose mothers had a higher-education degree scored significantly higher than the others, both at the beginning and at the end of the first year of this project, in the whole test and the sub-tests (Appendix 3, Table 16). However, no significant difference was found between boys' and girls' performances (Appendix 3, Table 17) in any of the test scores.
The answers to the research questions are as follows.
RQ1: In Grade 7, the intervention produced a significant positive change in Group 3 students' ability to design experiments (experimental design skills, EDS) compared to the control group (Group 1), as measured by the tests (Cohen's d: 0.28). It can reasonably be assumed that this was because, in the first year of this project, an experimental design scheme was provided on the Group 3 students' worksheets, with questions supporting the experimental design. In contrast, the change in the performance of Group 2 students on the EDS tasks was slightly worse than that of the control group (Cohen's d: −0.11), but this effect was not found to be statistically significant. ANCOVA analysis also showed that there was no statistically significant difference between Group 2 and Group 1 in the change in performance on the EDS tasks.
RQ2: In terms of disciplinary content knowledge (DCK), answering the questions of the scheme intended to help students learn experimental design after doing the experiments (Group 2) seemed to have a negative effect (Cohen's d: −0.20). Filling in a similar scheme before carrying out the designed experiments (Group 3) had no significant effect on the scores the students achieved on the DCK tasks (Cohen's d: 0.11). Comparing the results of the two experimental groups, the change in performance of Group 3 students was found to be significantly better than that of Group 2 students on the DCK tasks (Cohen's d: 0.31). It is possible that Group 2 students found it unnecessary and/or boring to fill in the scheme after they had finished the experiments. Perhaps Group 2 students viewed the scheme as unrelated to experimental design because the experiments had already been designed for them. If the latter happened, it could have considerably reduced their enthusiasm for solving the DCK tasks in the T1 test compared with the Group 3 students.
RQ3: A statistically significant difference was found between the average scores of the students in the two experimental groups in terms of the extent of the development of their experimental design skills. The change in performance of Group 3 students on the EDS tasks was significantly better than that of Group 2 students (Cohen's d: 0.38). The relatively strong positive development of the Group 3 students' EDS in Grade 7 compared with the Group 2 students' might have been caused by the different treatments of the two groups. Group 2 students did not have to plan experiments; those classes instead had to discuss with their teacher why the experiments had been planned as they were (according to the questions of the scheme on their student sheets). In contrast, Group 3 students had to design the experiments in teams while answering the questions of the scheme helping them to learn experimental design. These results do not seem to support Furtak and colleagues' (2012) finding that teacher-led inquiry has a greater effect on student learning than student-led inquiry. (However, these results could have been due to reasons other than the type of teaching method used.)
The ANCOVA model calculations show that the mother's education has a weak but significant effect on the development of experimental design skills in this project. However, gender did not seem to have any significant effect on achievement in any of the test scores.
School ranking is an important parameter according to the present results, and its effect on the T1 scores was significant. In the first year of this project, the ranking of the student's school influenced mainly the results of the DCK sub-test (T1DCK), whereas the type of instruction (‘Group’) influenced mainly the scores gained on the EDS tasks (T1EDS). It is surprising, though, that students from medium-ranking schools did not have significantly higher mean scores on the T1 experimental design tasks than students from low-ranking schools.
The instruments used (40 minute paper-based tests) could provide only a limited picture of how students benefited from the interventions. It is possible that other instruments would have shown different effects.
Performance on any assessment is at least partially driven by the students' motivation to succeed on the measure and by their test-taking abilities (Cannady et al., 2019), and probably not all students found the contexts of the tasks equally interesting. In addition, the T1 test was completed at the end of the school year, when many of the students might already have known their end-of-year grades, which could also have reduced their motivation to perform well.
No single study can evaluate every variable and every theoretical relationship underlying an instructional model (Mack et al., 2019). An improvement shown on just one measure could be due to statistical noise or to something peculiar to that measure. For example, the personalities that make up the classroom community, and their relationships with each other, influence the atmosphere in the classroom and the attitude towards the subject. The teacher's personality and how well they get along with the class are also complex factors that are impossible to take into account in a study like this. There are many other random events that can affect the final data. Although the large sample size should compensate for most of these, one can never be sure (Lawrie, 2021).
Social variables, students' prior knowledge and ‘school effects’ (including the teacher's effect), which the literature (Snook et al., 2009) considers to be variables affecting students' performance, were also found to be important in this research.
Since knowledge will be more widely applicable if its applicability is included in the learning objectives (Csapó, 2022), the new worksheets for Group 2 students will include the following sentences: “In science, evidence is gathered through well-designed experiments. To avoid being misled by pseudo-scientific hoaxes, it's good to understand how to design an experiment correctly. To do this, answer the following questions.” The Group 3 students' worksheets, in contrast, will contain the statement in this form: “In science, evidence is gathered through well-designed experiments. To avoid being misled by pseudo-scientific hoaxes, it's good to understand how to design an experiment correctly. Your answers to the following questions will help you do this.”
The basics of correct terminology (independent and dependent variables, constants, hypotheses, control experiments) will be introduced on the student worksheets of both experimental groups from Grade 8. It is also evident that students need to understand why experiments and experimental design are important in science; therefore, the research group has to make more effort in this respect.
Students will need motivation to do these inquiry tasks; therefore, finding interesting contexts remains a challenge. Chemistry topics that impact the environment and human health, accompanied by the systems thinking needed to address global challenges (e.g. Mahaffy et al., 2018), will be applied in the coming years too.
If this research were to yield positive results in the long term, it could be a step forward on the long road to realising the old dreams of the great reformers of science education.
Table 13 Rankings of the participating schools and the number of students in each category

| Ranking | Low | Medium | High |
|---|---|---|---|
| School ranking | 230, 246, 253, 284, 295, 374, 463, 476, 500 | 106, 128, 134, 169, 170, 171 | 1, 3, 11, 17, 20, 25, 28, 30, 35, 37 |
| N^a | 304 | 273 | 354 |

^a Number of students completing Test 0.
Footnote
† Each test contained questions concerning the student's science or chemistry grade and his or her attitude towards science or chemistry and scientific experiments. However, the analysis of the students' answers to the attitude questions is not included in the present study.