Benefits of desirable difficulties: comparing the influence of mixed practice to that of categorized sets of questions on students’ problem-solving performance in chemistry

O. Gulacar *, Arista Wu , V. Prathikanti , B. Vernoy , H. Kim , T. Bacha , T. Oentoro , M. Navarrete-Pleitez and K. Reedy
Department of Chemistry, University of California-Davis, One Shields Avenue, Davis, California 95616, USA. E-mail: ogulacar@ucdavis.edu

Received 9th December 2021 , Accepted 25th January 2022

First published on 25th January 2022


Abstract

The questions in practice assignments given to students, whether as worksheets or in other formats, are often grouped by chapter, topic, or concept; there is a strong emphasis on categorization, and most end-of-chapter problems in chemistry textbooks are organized by section. Although this is intended to help students navigate the assignments more easily and practice in an orderly way, it is not what they are expected to do during tests: there is a mismatch between what students practice and how they are tested. The goal of this study is to examine the influence of the structure of assignments on students’ problem-solving performance. Two groups of students from chemistry classes were recruited to participate in this study. Each group had the same length of practice and identical questions, with only one difference: the experimental group received assignments with a mixed organization of questions, while the control group received traditional assignments with questions organized around chapters and topics. Students completed three two-hour problem-solving sessions on weekends. Evaluation of their progress was based on their solutions to one pre-test and three post-tests, one given after each problem-solving session. The study revealed that students in the experimental group increased their problem-solving success more than those in the control group, starting from the first intervention, and the achievement gap widened as the study progressed. It is recommended that educators and textbook publishers create and utilize assignments that mix questions from different topics and chapters.


Introduction

In the 21st century, problem solving has become more prevalent than ever before, and as such, numerous research projects have focused on finding ways to improve students’ problem-solving abilities across university curricula (Gabel and Bunce, 1994; States, 2013; Kivunja, 2014). STEM courses, especially at the undergraduate and postgraduate levels, have emphasized problem solving due to its importance in accomplishing everyday tasks and in succeeding in academia and industry, where employees are expected to develop new knowledge regularly and adapt to quickly changing technology (Candy and Crebert, 1991; Jonassen et al., 2006; Kennedy and Odell, 2014). General chemistry curricula expect students not only to understand concepts, but also to apply them to unfamiliar problems (Nurrenbern and Pickering, 1987; Bodner and Herron, 2002; Bodner and Bhattacharyya, 2005).

Problems and problem solving are used not only in assessments to measure students’ level of comprehension, but also to give students a chance to self-reflect on their progress and identify ways to better connect pieces of knowledge into a successful solution (Kauffman et al., 2008). One of the main facets of constructivist theory is that each learner individually constructs their understanding of a phenomenon or problem as they learn (Bretz, 2001). The role of the educator is then to facilitate this process and guide students to construct their knowledge based on their beliefs, talents, and perceptions of the things they interact with. Even though there is an enormous body of data showing that the mind is not an empty vessel to be filled by instructors (Yager, 2000), some educators do not take this important principle into consideration when developing interventions to improve students’ understanding of concepts and problem-solving achievement. In addition, those educators hold students responsible for their failure without utilizing research-based, effective strategies in classrooms. In these environments, there is also a strong belief that students will learn successfully if they simply listen to their professors well (Su, 1991). Although paying attention is a necessary step in receiving knowledge before processing it, it is only one step (Su, 1991). Many other factors need to be investigated to reveal the sources of the challenges. Research methodologies guided by constructivist theory aim to explore all possible influences and to follow problem solvers’ thinking, for example through think-aloud interviews, in order to bring hidden issues to light (Hardiman et al., 1989; Yager, 2000; Gulacar et al., 2020).

Constructivist theory also recommends moving away from rote memorization and toward conceptual understanding, which necessitates examining and discussing concepts at a finer level, considering all dimensions, to create a knowledge system in which every piece makes sense (Hein, 1991). In chemistry, physics, and mathematics education, it has been consistently observed that practice problems often emphasize the quantitative, algorithmic side of problem solving rather than the qualitative, conceptual side (Nurrenbern and Pickering, 1987; Moseley, 2005; Rohrer and Taylor, 2007). In a study conducted in 1995, it was determined that while students were able to solve problems related to Boyle's law or Charles's law, about two-thirds of them were unable to explain critical aspects of gas behaviour. Students tended to simply use the “plug and chug” method, relying on surface elements to solve problems without understanding the underlying principles involved (Chi et al., 1981; Schoenfeld, 1982; Hegde and Meera, 2012). Although promoting conceptual understanding is important and needed, students’ algorithmic problem-solving skills should not be neglected. This idea is based on studies showing that students who receive instruction focusing purely on the conceptual aspects of topics are also less likely to be able to answer “traditional” numerical problems, meaning that simply teaching concepts is not always enough for a student to be able to approach other kinds of problems introduced in academia (Hardiman et al., 1989; Pickering, 1990). In other words, students must be given the opportunity to improve their conceptual understanding and sharpen their algorithmic problem-solving skills at the same time.

Educators have adopted different methods to accomplish this important and challenging goal. For example, in biology education, the Biological Sciences Curriculum Study 5E (BSCS 5E) model structures teaching activities that put students at the center of instruction and guide them toward becoming experts. It divides inquiry into five stages, whose implementation varies from study to study: Engagement, Exploration, Explanation, Elaboration, and Evaluation (Bybee, 2014; Pedaste et al., 2015). All the steps in this method mainly focus on improving students’ conceptual understanding and guiding them to identify the underlying principles in the given questions. In other words, this method and similar ones, including interleaved practice, aim to transform students’ problem solving from searching for simple surface elements and familiar terms in a question to exploring and identifying the concepts and underlying principles of the question, so that successful solutions can be developed more efficiently (Hardiman et al., 1989; Rohrer et al., 2014; Persky and Robinson, 2017). Specifically, interleaved practice has proven successful in guiding students to evaluate a question as an overall task, which is a known expert characteristic, instead of focusing on steps individually (Gerson, 2001; Rohrer et al., 2014; Persky and Robinson, 2017).

In order to promote expert attributes and help students think like experts, it is also imperative that educators use techniques that improve long-term retention of the information learned (Balch, 1998). Existing literature shows that introducing conditions that make performance during practice more difficult, commonly referred to as desirable difficulties, improves long-term retention (Docktor et al., 2012; Brown et al., 2014). This aspect is believed to be one of the reasons interleaved practice is successful in helping students retain learned information longer and retrieve it more effectively when needed, e.g., when solving problems (Rohrer et al., 2014).

Interleaved practice can take on a variety of meanings and applications, but its most consistent aspect is its contrast with blocked practice (Janice Lin, 2013). In blocked practice, students solve multiple problems of the same category, or “block,” before moving on to a different block. Interleaved practice stresses breaking up and reshuffling these blocks in a student's practice set (Goldin et al., 2014). These breaks may take the form of time breaks between study sessions, studying different subjects entirely, or learning about various topics within the same subject. This stands in stark contrast to today's education system, which guides and encourages students to practice chemistry problems one “block” at a time. On examination, it becomes obvious that end-of-chapter problems in General Chemistry textbooks (Petrucci et al., 2011; Zumdahl and DeCoste, 2017; Brown et al., 2018) are organized by section. For example, stoichiometry questions are usually broken down into eight or nine sections, each focusing on a separate subtopic such as limiting reagent, percent yield, or the mole concept. Similarly, professors traditionally assign homework as a set of questions categorized by chapter or section, with clear headings and descriptions, for repetition and reinforcement (Sayan and Mertoğlu, 2020). These are all examples of blocked practice and promote the aforementioned “plug and chug” techniques: students memorize the solutions they found in the exercises and try to apply them to the questions given on exams. Studies show that although students do well on questions involving single topics or simple combinations of topics, they fail to generate correct solutions to problems that combine topics within the same chapter or across chapters, especially when the connections are complex (Gulacar and Fynewever, 2010; Gulacar et al., 2014).
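The contrast between the two practice structures can be sketched in a few lines of code; the topic labels and question identifiers below are hypothetical, chosen only to illustrate the ordering difference:

```python
import random

# Hypothetical practice set: (topic, question label) pairs
questions = [
    ("limiting reagent", "Q1"), ("percent yield", "Q2"),
    ("mole concept", "Q3"), ("limiting reagent", "Q4"),
    ("percent yield", "Q5"), ("mole concept", "Q6"),
]

def blocked(qs):
    """Blocked practice: group all questions of the same topic together,
    as end-of-chapter problem sets do."""
    return sorted(qs, key=lambda q: q[0])

def interleaved(qs, seed=42):
    """Interleaved practice: the same questions, reshuffled so that
    consecutive questions rarely share a topic."""
    rng = random.Random(seed)
    mixed = list(qs)
    rng.shuffle(mixed)
    return mixed
```

Both orderings contain exactly the same questions; only the sequence, and therefore the cue a student receives about which schema to apply, differs.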

When it comes to implementing an interleaved study plan, there are many ways to break up the traditional “blocks” of study material. In one study, Rohrer and Taylor (2007) assessed students across three comparisons: massed versus spaced, light versus heavy, and blocked versus mixed. While our study is primarily interested in the blocked versus mixed comparison, the results from massed versus spaced were relevant to our study design. Massed versus spaced dealt with the time intervals between study sessions: students in the massed group studied all the content in one week, while the spaced group covered the material over two weeks. The results showed that the spaced group vastly outperformed the massed group, indicating that spaced practice is a more effective study strategy than massed practice. Rohrer and Taylor (2007) also investigated the success of the blocked versus mixed groups, where students in the blocked group practiced four similar problems in a row followed by another four similar problems, whereas the mixed group had their questions shuffled. The students in the mixed group scored worse on practice assignments but higher on exams. The higher practice scores in the blocked group could be due to students’ success in reusing equations and schemas adopted from similar questions, which are the fundamental steps of the plug-and-chug method (Manji, 1995; Ellis et al., 2015).

Similar to studies conducted in mathematics education, research examining the efficacy of interleaved practice in chemistry has revealed a positive impact on students’ problem-solving performance. Eglington and Kang (2017) compared students’ success in organic chemistry four times using interleaved versus blocked practice. In their research, a special emphasis was placed on the role of problem difficulty, problem similarity, and the number of categories in students’ performance. In their first experiment, the interleaved practice group showed stronger categorization; that is, students were able to identify categories more accurately for themselves. The second experiment was similar but included categories that were less distinctive and generally had overlapping concepts; the interleaved practice group scored higher once again. In the third and fourth experiments, Eglington and Kang (2017) highlighted various characteristics of practice problems in red. This was done to give both the interleaved and blocked groups hints that might identify the categories of the respective problems. They determined that the students’ ability to identify categories did not change when these characteristics were highlighted. However, in all four cases, the interleaved group scored better on assessments than the blocked group. Another study, carried out by Herrington (2021), tested high school chemistry students and added more visual and conceptual components. Consistent with prior studies, there was an interleaved group and a blocked group, and the interleaved group scored higher, displaying the effectiveness of interleaved practice beyond computational problems.

Due to the lack of studies investigating the effects of interleaved practice in General Chemistry courses, this study was designed to determine how the structure of practice assignments, mixed versus categorized, influences students’ success with problems. It should be noted that, in order to create the instruments used in the interventions, a wide range of topics across chapters, including chemical equilibrium, acid–base equilibria, and thermodynamics, was selected rather than using questions from the same chapter.

Eglington and Kang (2017) also studied the effect of explicitly sharing the underlying content of the questions in both interleaved and blocked practice assignments. However, they found no significant effect of including such information on students’ problem-solving achievement.

Unlike the study by Eglington and Kang (2017), in our study the questions in the blocked set, labeled the categorized assignment, had headings and short descriptions, while the questions in the mixed group's assignments carried no information that might reveal the topics. For example, if a problem requires the use of percent yield, a subtopic of stoichiometry (a common chapter covered in General Chemistry), the question in the practice assignments given to the categorized group carries an identifier reflecting the topic, the chapter, or a combination of the two. Although many research projects focus on different aspects of interleaved learning, such as the neural basis of contextual interference (Lin et al., 2011), the role of interleaved learning in solving multi-digit subtraction problems (Nemeth et al., 2019), and the effect of blocked compared to interleaved practice on knowledge retention (Schorn and Knowlton, 2021), to date little attention has been paid to the role of interleaving categories in this specific way in chemistry.

Methodology

Research questions

This study aims to examine the influence of the structure of assignments, categorized or mixed, on students’ problem-solving performance in chemistry. Given that most textbooks and instructors tend to organize and classify content to help students navigate the knowledge base, this study is critical to providing empirical data on the benefits and shortcomings of the traditional method compared with the mixed strategy. The study was guided by the following research questions:

• How do mixed versus categorized practice assignments influence students’ problem-solving performance?

• How does the method affect different groups of students’ problem-solving success?

As part of the second question, the role of achievement level in a previous course and ethnic background were examined. These variables were found to be correlated at different levels with students’ academic success in different studies (Ascher, 1985; Macphee et al., 2013; Veloo et al., 2015; Gulacar et al., 2019).

Participants and settings

The participants for this study were recruited from students taking the second course in the General Chemistry series offered in Winter 2020 at a research university in northern California. For recruitment, an email with a participation request and a link to a survey, including questions about the factors listed below, was sent to all 801 students enrolled in the course that quarter. Seventy-nine of them filled out the survey and took the pre-test. These students were then split equally into experimental and control groups (Szu et al., 2011). All 79 students took the course with the same instructor and thus had an equivalent curriculum and identical homework problems. They also took the same course examinations, which helped us categorize them as A, B, and C students. To avoid any kind of bias, students were randomly assigned to each group. Some of the students attended the first session but were unable to participate in the subsequent sessions; their data were therefore removed from the analysis. Table 1 summarizes the attributes of the remaining students.
Table 1 The number of participants in each sub-group

Attribute                             Experimental group (17)   Control group (14)   General student profile
Grade in Gen Chem I   A               3                         3                    —a
                      B               7                         4                    —a
                      C               7                         7                    —a
Ethnic background     White           5 (29%)                   4 (29%)              22%
                      Asian/Pacific   9 (53%)                   7 (50%)              28%
                      Latino/Hispanic 3 (18%)                   2 (14%)              26%
                      Other           0                         1 (7%)               1%
a No data was available.


The research team had received Institutional Review Board (IRB) approval before the project was initiated. All the participants voluntarily agreed to be in the study and consented through electronic forms to share their solutions and written thinking process on the assignments. To provide anonymity and protect privacy, every participant was assigned a random number to be put on practice assignments. In addition, the surveys did not contain questions with possible identifying characteristics.

Instruments and design

The study utilized a quasi-experimental design. To better examine the effects of blocked versus mixed practice on students’ problem-solving achievement, participants were divided into experimental and control groups. Each group had the same length of practice and the same set of pre-test and post-tests. As visualized in Fig. 1, three in-person problem-solving sessions were held on three consecutive weekends, in weeks 5, 6, and 7 of the 10-week quarter.
Fig. 1 The study design highlighting the interventions implemented each week.

The gap between each session was one week. Three interventions were held in order to better examine the impact on students’ problem-solving performance over a series of sessions. The problem-solving sessions took place on weekends because the students had no other common time during the weekdays. Except for the first session, each session lasted one hour and thirty minutes, comprising a 50-minute intervention, a 10-minute reflection, and a 30-minute post-test. The first session was thirty minutes longer than the rest because of the 30-minute pre-test, which enabled us to determine a baseline and evaluate students’ academic strengths in general chemistry for further data analysis.

Interventions were designed to give students the experience appropriate to the group, mixed versus categorized, in which they were placed. After students’ solutions to the intervention questions were collected, students in both groups were given keys with detailed solutions prepared by the research team, giving them a chance to reflect on their work, compare their answers to those on the key, and write down any questions they wanted to ask the student instructor during the review session offered at the end of each problem-solving session. The mixed group's keys also excluded any explicit information that could help students identify the topics. Each problem-solving session ended with a post-test. The review sessions are not considered a direct part of the study; they were used to encourage more students to participate in the study and attend the problem-solving sessions. The review sessions were put together to answer the questions students identified during reflection and to prepare them for the upcoming course exam. Both the mixed and categorized groups had the same review sessions with the same student instructor throughout the study. To make the study more attractive, students were also told that their names would be entered into a raffle for a gift card if they attended all three problem-solving sessions.

The pre-tests and post-tests

Both the pre-test and the post-tests consisted of five problems and had a time constraint of 30 minutes, in consideration of the average time allowed for general chemistry exams. The students solved the problems by themselves in a quiet environment to better simulate the standard general chemistry test setting found in college. In addition, no notes or other resources except a calculator were allowed during testing. The problems on the pre-test were similar in structure to those on the post-tests, with different chemical compounds and numerical values adding variation. All the problems were open-ended questions selected from standard general chemistry questions with varying degrees of difficulty and topic focus from the chapters covered in a typical General Chemistry II course. Over the period of the study, one pre-test and three post-tests were given. All three post-tests had similar problems so as to measure participants’ problem-solving performance on the same scale. This allowed us to evaluate participants’ learning progress while avoiding the introduction of new challenges and variables into the study.

The intervention

For the intervention, the experimental group received assignments with problems from different chapters mixed and listed in random order, whereas the control group was given categorized, blocked assignments with questions organized according to the sequence of chapters. In the control group's assignments, questions also included identifying information giving students a hint about which topic or chapter each question pertained to. The intervention assignments for both groups included the same questions, selected from chemical equilibrium, acid–base equilibria, buffer solutions, titrations, thermochemistry, and properties of solutions. The questions were picked from these chapters because they were being covered in the course at that time. Participants were given 50 minutes to solve 8 questions while sitting in silence and working alone. All the questions used in the intervention involved concepts similar to those targeted in the pre-test and post-tests; however, some topics from the intervention did not appear in the pre- and post-tests, in order to keep the tests relatively short. While both groups had identical questions in the intervention, students in the control group were given the corresponding chapter number and topic in each question's title to help them correlate problems with concepts. The different formats of the intervention problems enabled us to examine the influence of the structure of the assignments on students’ problem-solving performance. During the problem-solving sessions, students were not allowed to use any additional study materials or resources (i.e., textbooks and notes).
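The difference between the two assignment formats can be illustrated with a short sketch; the topics and question texts below are hypothetical stand-ins, not the actual instrument items:

```python
# Hypothetical question pool: (chapter/topic, question text) pairs
pool = [
    ("Chemical equilibrium", "Write the equilibrium-constant expression for N2(g) + 3H2(g) <-> 2NH3(g)."),
    ("Acid-base equilibria", "Calculate the pH of 0.10 M acetic acid (Ka = 1.8e-5)."),
    ("Thermochemistry", "How much heat is required to warm 50.0 g of water by 10.0 degrees C?"),
]

def categorized_assignment(pool):
    """Control-group format: each question is preceded by a heading that
    identifies its chapter or topic."""
    lines = []
    for topic, text in pool:
        lines.append(f"[{topic}]")
        lines.append(text)
    return "\n".join(lines)

def mixed_assignment(pool):
    """Experimental-group format: the same questions with no identifying
    information (the ordering would also be randomized; omitted here)."""
    return "\n".join(text for _, text in pool)
```

The two functions emit identical question sets; the only difference is whether the topic label appears, which is exactly the cue the categorized group received and the mixed group did not.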

Data analysis

In order to identify the key differences below the surface between the two types of interventions, find out how they affect students’ problem-solving performance, and pinpoint the common challenges faced in solving chemistry problems, the team decided to use a reliable and valid coding system, COSINE (Coding System for Investigating Subproblems and Networks), described in detail by Gulacar et al. (2013). With this system, it is possible to determine whether students are having issues with the individual subproblems or subtopics involved in the questions or with the overarching concept(s) of a question (Gulacar et al., 2021). The COSINE method converts a vast amount of qualitative data into a smaller amount of quantitative data, giving instructors an efficient way to interpret where exactly students are having the most trouble. In this way, one problem in a set can yield multiple data points based on the number of sub-problems within that question. General chemistry problems rarely focus on one concept, and thus COSINE works to define how well students can connect the different concepts within each problem.

For the purpose of better identifying and comparing participants’ achievement on the long-answer questions, the sub-problems were determined for each problem used in both the pre- and post-tests. The analysis of all the problems revealed the following subtopics: Writing Equations (WEQ), Molarity Calculation (MC), ICE Table (ICE), Calculating pH and pOH (CPH), Le Chatelier's Principle (LCP), Writing the Equilibrium Equation (KEQ), Heat Calculation (HC), First Law of Thermodynamics (FLT), Boiling Point Differences (DBP), and Molality Calculation (MOC). During the application of COSINE, one of the key rules is to evaluate and code every sub-problem individually and independently, without regard to the previous or next step. For each sub-problem, a COSINE code is chosen from a set of eight different codes. All these codes are then entered into two different equations, Attempt Success Rate (ASR) and Complete Success Rate (CSR), to facilitate the measurement of students’ success with subproblems and with overall problems.

As shown in Table 2, the COSINE codes can be generalized into three categories: successful, neutral, and unsuccessful. It should be noted that Unsuccessful Guessed (UG) and Unsuccessful Received Hints (URH), two of the COSINE codes, were not applied because no hints were provided in this study and no audio data were available to determine whether students guessed or simply could not do the subproblem. All such instances were therefore interpreted as cases belonging to the Could Not Do (CD) category.

Table 2 The descriptions of COSINE codes with examples

Successful
  S (Successful): Participants complete the sub-problem successfully.
    Example: the student writes the correct equation for the dissociation of sulfuric acid.

Neutral
  NR (Not Required): The participant's solving strategy does not involve the sub-problem as a required step; the problem is solved successfully by a different method.
    Example: due to limited space, no example can be shown here; see the book chapter published by Gulacar et al. (2021).
  DD (Did not know to Do): Participants skip the sub-problem or only include data; their solving strategy requires the sub-problem as a key step, but they omit it.
    Example: the student was given the initial amounts of reactants and products in moles and skipped the step of converting them to molarity.
  DSE (Did Something Else): Participants skip the sub-problem, which is a required step in the solving strategy, and do something else instead.
    Example: the student did not know the equation (ΔTb = i·Kb·m) and instead used another equation to solve for molality.

Unsuccessful
  UDI (Unsuccessful Did Incorrectly): Participants correctly identify and attempt the sub-problem but fail to complete it successfully.
    Example: the student knows the equilibrium constant formula is needed but writes it as reactants over products instead of the reciprocal.
  CD (Could Not Do): Participants get stuck on the sub-problem and cannot finish the remaining sub-problems because of the failure to solve this specific one.
    Example: the student got stuck and could not figure out how to complete the calculation.

Detailed examples for each code can be found in Gulacar et al. (2021).


The successful category contains only one code, S (Successful), which indicates that the participant's solution to the corresponding sub-problem is correct and complete. There are three codes under the neutral category: NR (Not Required), DD (Did not know to Do), and DSE (Did Something Else). These codes indicate that the participant failed to attempt a sub-problem for some reason. NR indicates that the participant skipped a sub-problem pre-determined in the rubric by adopting a different successful strategy, making that particular step unnecessary in their solution. Unlike NR, DD is assigned when the particular step is needed in the predicted routine of the adopted solving strategy, but the participant omitted it from their solution. DSE indicates that the participant did something unnecessary in the predicted procedure instead of completing the required sub-problem as laid out in the rubric. The unsuccessful category includes UDI (Unsuccessful Did Incorrectly) and CD (Could Not Do), which highlight cases where participants made attempts but failed to complete the sub-problem successfully. UDI indicates that the participant properly identified and attempted the sub-problem but did not produce a successful end product. CD is assigned when the participant got stuck on a sub-problem and could not finish the remaining sub-problems for a variety of reasons.
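The code-to-category mapping described above can be captured in a small lookup table; this is a sketch of the bookkeeping, not part of the published COSINE materials:

```python
# COSINE codes grouped into the three categories described in Table 2.
# UG and URH are listed for completeness even though they were not
# applied in this study.
COSINE_CATEGORIES = {
    "successful": {"S"},
    "neutral": {"NR", "DD", "DSE"},
    "unsuccessful": {"UDI", "CD", "UG", "URH"},
}

def category_of(code):
    """Return the category ('successful', 'neutral', or 'unsuccessful')
    that a COSINE code belongs to."""
    for category, codes in COSINE_CATEGORIES.items():
        if code in codes:
            return category
    raise ValueError(f"unknown COSINE code: {code!r}")
```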

Attempt Success Rate (ASR) and Complete Success Rate (CSR) are calculated using the COSINE codes. As shown in eqn (1), ASR incorporates only the successful and unsuccessful codes to measure participants’ true ability to do each individual sub-problem, as determined by the clear evidence in their solutions. Unlike the ASR formula, the CSR formula (see eqn (2)) additionally incorporates the neutral codes (i.e., DD and DSE) to evaluate participants’ conceptual understanding and their ability to put the subproblems together.

 
ASR = S / [S + UDI + CD + (UG + URH)]  (1)
 
CSR = S / [S + UDI + CD + DD + DSE + (UG + URH)]  (2)

where each code symbol denotes the number of sub-problems assigned that code.
It should be noted that the codes in parentheses were not used in this study for the reasons mentioned above. Doing subproblems accurately is obviously necessary for a successful solution, but it does not guarantee a correct overall answer. Therefore, it is vital to investigate where students fail most of the time and how each intervention helps them with the subproblems, in addition to evaluating their overall achievement on the questions. Analysing both scores, ASR and CSR, together with the distribution of codes helps us determine whether participants have a good conceptual understanding and illuminates what prevents them from connecting the subproblems successfully to obtain a correct answer (Gulacar et al., 2014; Gulacar et al., 2021).
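Given the definitions of eqn (1) and (2), computing the two rates from a list of sub-problem codes is straightforward; the sketch below omits UG and URH, which were not applied in this study:

```python
from collections import Counter

def asr(codes):
    """Attempt Success Rate: successes divided by attempted sub-problems
    (successful + unsuccessful codes; neutral codes are excluded)."""
    n = Counter(codes)
    attempted = n["S"] + n["UDI"] + n["CD"]
    return n["S"] / attempted if attempted else 0.0

def csr(codes):
    """Complete Success Rate: successes divided by all required
    sub-problems, additionally counting the neutral codes DD and DSE."""
    n = Counter(codes)
    required = n["S"] + n["UDI"] + n["CD"] + n["DD"] + n["DSE"]
    return n["S"] / required if required else 0.0

# Example: seven coded sub-problems from one hypothetical student
codes = ["S", "S", "UDI", "DD", "NR", "CD", "S"]
# ASR = 3/5 = 0.6 (NR and DD excluded); CSR = 3/6 = 0.5 (DD included)
```

Because NR marks a step that a valid alternative strategy made unnecessary, it never enters either denominator.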

Results and discussion

The data were analysed by considering several factors to answer the two main questions. In order to streamline the analysis and provide concrete evidence for the changes, the research team decided to convert the instances observed in students' solutions into quantitative values, which were later utilized in the calculation of ASR and CSR scores. Along with the investigation of the changes in both ASR and CSR scores, an in-depth analysis was conducted to better understand whether unsuccessful, neutral, or successful codes changed more after the interventions for each group. The examination at this fine level provided interesting details about the effectiveness of the intervention, which could not be obtained with traditional methods that essentially compare the students' scores on individual questions or overall tests.

Influence of mixed vs. categorized practice assignments on students’ problem-solving performance

The main goal of the study was to examine the influence of mixed versus categorized problem assignments on students’ problem-solving achievement in general chemistry. In order to accomplish this goal, students’ codes were utilized in the calculation of CSR scores before and after the interventions. Then, delta CSR values were obtained to quantify the improvement in their problem solving after the study was completed. Table 3 shows the changes in the average delta CSR scores determined based on the students’ post-test scores obtained after each intervention.
Table 3 Descriptive data for the changes in the CSR scores after interventions

Changes in CSR scores^a    Experimental                    Control
                           M      SD     Min     Max       M      SD     Min     Max
ΔT1–T0                     0.14   0.18   −0.08   0.46      0.13   0.21   −0.19   0.59
ΔT2–T0                     0.30   0.21   0.00    0.73      0.25   0.21   −0.05   0.65
ΔT3–T0                     0.34   0.21   0.00    0.73      0.27   0.23   −0.10   0.76

^a T0: pre-test, T1: the first post-test, T2: the second post-test, and T3: the third post-test.


Upon examination, it was noted that the changes for both the experimental and control groups steadily became larger. The control group values increased from 0.13 to 0.25 and, eventually, to 0.27 after the third intervention. On the other hand, the average delta CSR values for the experimental group started at 0.14, more than doubled after the second intervention, and peaked at 0.34 after the last intervention. Since both groups had the chance to attend the problem-solving sessions and refresh their memories of the topics assessed on the post-tests, they naturally benefited from those activities.

Even though the sample size is relatively small, a series of statistical tests was run to determine whether the changes in the CSR scores, which were determined using hundreds of codes assigned to each sub-step in students' solutions, are significant. First, a series of two-tailed Wilcoxon signed-rank tests, the nonparametric alternative to the paired-samples t-test, was run to investigate the changes in the CSR averages of both the experimental and control groups after each intervention. The findings are organized and compared in Table 4. The tests indicate that all the changes observed in CSR scores for both groups are statistically significant. Although the values in each case are higher for the experimental group, it is difficult to make a strong argument that the effect of mixed practice is statistically greater than that of categorized practice. This finding reflects the idea that any practice is better than none. Previous studies indicate that an extensive amount of practice is key to achieving a high level of performance (Ericsson, 2006).

Table 4 Statistical test results for the changes in the CSR scores

Changes in CSR scores^a    Experimental                     Control
                           V      z-Value   p-Value         V      z-Value   p-Value
ΔT1–T0                     107    −2.67     <0.001          84     −1.98     0.048
ΔT2–T0                     120    −3.41     <0.001          100    −2.98     0.003
ΔT3–T0                     136    −3.52     <0.001          101    −3.04     0.002

^a T0: pre-test, T1: the first post-test, T2: the second post-test, and T3: the third post-test.


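A paired comparison of this kind can be run with SciPy's Wilcoxon signed-rank test. The sketch below uses made-up pre- and post-test CSR scores, not the study's data, purely to show the mechanics of the two-tailed test described above.

```python
from scipy.stats import wilcoxon

# Hypothetical paired CSR scores for one group (pre-test vs. a post-test);
# these numbers are illustrative only, not the study's data.
pre  = [0.20, 0.35, 0.10, 0.42, 0.28, 0.15, 0.33, 0.25]
post = [0.38, 0.50, 0.22, 0.55, 0.39, 0.31, 0.47, 0.35]

# Two-tailed Wilcoxon signed-rank test on the paired differences,
# the nonparametric alternative to a paired-samples t-test.
stat, p = wilcoxon(pre, post, alternative="two-sided")

# Here every student improved, so the smaller rank sum (the statistic) is 0.
print(stat, p)
```

The test ranks the absolute pre/post differences and compares the rank sums of positive and negative changes, so it needs no normality assumption on the CSR scores themselves.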
To further examine the overall effect of the interventions, the average of the CSR scores calculated after each post-test was determined and compared to that of the pre-test. This analysis should reveal whether all mixed interventions combined had a larger effect on students' problem-solving performance. A two-tailed Mann–Whitney U test indicated that the difference between groups was not significant based on an alpha value of 0.05, U = 141, z = −0.84, p = 0.399. The effect size for this test was determined to be 0.20, which is treated as a small effect according to Cohen's classification of effect sizes (Sullivan and Feinn, 2012). The mean rank for the experimental group was 17.29 and the mean rank for the control group was 14.43. This suggests that the distribution of the changes in CSR scores for the experimental group (Mdn = 0.25) was not significantly different from that of the control group (Mdn = 0.20). However, it should be noted that the success gap between the experimental and control groups grew wider as the study progressed. The problem-solving performance of students using mixed assignments improved more than that of students using a traditional practice approach. It can be argued that further exposure to mixed practice would yield statistically significant differences between the groups. Future iterations of this study will aim to recruit a larger student population and extend the number of workshops.
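The between-groups comparison and its effect size can be sketched in a few lines of Python. The group values below are invented for illustration, and the z-score is obtained from the usual normal approximation to the U statistic; this is one common way to compute r = |z|/√N, not necessarily the exact procedure the authors used.

```python
import math
from scipy.stats import mannwhitneyu

# Hypothetical delta CSR values for two independent groups
# (illustrative numbers only, not the study's data).
experimental = [0.25, 0.31, 0.18, 0.40, 0.22, 0.35, 0.28, 0.19]
control      = [0.20, 0.27, 0.15, 0.33, 0.21, 0.24, 0.17, 0.26]

# Two-tailed Mann-Whitney U test for two independent samples.
u, p = mannwhitneyu(experimental, control, alternative="two-sided")

# Effect size r = |z| / sqrt(N), with z from the normal approximation to U;
# by Cohen's conventions, r ~ 0.1 is small, ~0.3 medium, ~0.5 large.
n1, n2 = len(experimental), len(control)
z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
r = abs(z) / math.sqrt(n1 + n2)
print(u, round(p, 3), round(r, 2))
```

Unlike the Wilcoxon test used for the within-group changes, the Mann–Whitney U test does not pair observations, which is why it is the appropriate choice for comparing the independent experimental and control groups.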

The increased differences in the CSR scores can be interpreted to mean that students improved their overall problem-solving performance a little more each time. Gulacar et al. (2021) correlate CSR scores with the ability to connect sub-problems successfully. While doing individual sub-problems, assessed by the ASR formula, is critical for the success of any problem solution, it is not sufficient. In order to promote overall improvement, it is necessary to enhance students' performance in figuring out which sub-problems need to be done, executing them successfully, and placing them in the correct schema (Gulacar et al., 2021). Previously, it was determined that this overlooked aspect of problem solving is the biggest factor contributing to the differences between successful and unsuccessful students taking a General Chemistry course (Gulacar et al., 2014). Considering the findings of this study and those of the previous studies, it can be argued that mixed practice not only helps students do sub-problems correctly, but also improves their ability to organize them successfully. As knowledge organization is associated with conceptual understanding (Loh and Subramaniam, 2018), it could also be claimed that students exposed to mixed practice develop a better conceptual understanding of topics.

The increase in CSR scores implies that students in general obtained better ASR scores as well after the interventions, but it is important to examine them closely and see how the ASR scores and the distribution of assigned codes (unsuccessful, neutral, and successful) changed after the study was completed. It should be remembered that ASR deals with students' performance on individual sub-problems. Fig. 2 shows the differences in average delta ASR scores for each subtopic assessed in the pre-test and post-test #3. The total number of subtopics involved in the tests was 10. The topics ranged from equilibrium to acid–base chemistry and from properties of solutions to thermodynamics. As mentioned in the methodology section, all these topics were covered in the second course of the General Chemistry series offered at the institution where the study was conducted.


Fig. 2 Delta ASR scores for the topics involved in pre- and post-tests.

The analysis of the trends in the changes of delta ASR scores reveals that students in the experimental group showed greater improvement on individual subtopics than students in the control group. The control group showed better improvement on only three topics out of ten: Molarity Calculation (MC), Calculating pH and pOH (CPH), and Molality Calculation (MOC). It can be argued that, although conceptual understanding is important across the chemistry curriculum, all three of these subtopics require more algorithmic knowledge. It is possible that mixed practice is not as effective at improving students' calculation and algorithmic problem-solving performance as it is at producing the enhancement seen in more conceptual topics. This falls in line with Hardiman et al.'s (1989) and Zoller et al.'s (1995) findings that students who rely heavily on conceptual information tend to struggle early on with algorithmic problem solving compared to students who simply memorize equations.

In addition to being a subtopic on which the control group did better, CPH was also the subtopic on which both the control and experimental groups showed the greatest improvement. This improvement could also be related to other materials students utilized in the course. It is also possible that students had a fresher memory of this topic, as it is covered toward the end of the quarter, which is about the time the study was conducted. On the other hand, the subtopic HC (Heat Calculation) showed the lowest increase in delta ASR values. This is a topic covered in the very first week of the quarter. So, it appears that knowledge retention is one of the important factors influencing students' problem-solving performance (Taylor et al., 2017; Nabulsi et al., 2021). Another interesting trend was observed in the delta ASR scores of the subtopic FLT (First Law of Thermodynamics). Although the experimental group did better with this subtopic, the control group surprisingly did worse after several problem-solving sessions in which they solved categorized problems. While it is impossible to make a strong claim about the inefficacy of categorized practice in improving students' problem-solving performance based on students' scores on one topic, it should be noted that this kind of practice may not help much and may, on the contrary, deteriorate the success of students working with specific types of problems (Carvalho and Goldstone, 2014).

Following the analysis of the changes in delta ASR scores, an in-depth and unusual analysis was performed to better understand how each intervention improved students' ability to solve problems. In traditional assessment, students' performances are examined based on the scores they get on individual questions or overall exams (Szu et al., 2011; Gao and Lloyd, 2020). However, very few studies analyse performance at the subtopic level, and almost none investigate beyond that level by not only checking for the presence of errors but also identifying their types and differences. This kind of analysis provides great insight into students' challenges, with a clarity achieved in very few studies. Here, the students' performances will be evaluated not at the level of CSR or ASR scores, but at a level that displays the changes in the code distributions of each subtopic before and after the study. As seen in Fig. 3, each bar is split into three segments, each representing a specific code category: green indicates successful codes (S), yellow is associated with neutral codes (DD and DSE), and red shows unsuccessful codes (UDI and CD).


Fig. 3 Code breakdown for experimental students for each subtopic utilized in pre- and post-tests.

The analysis of delta CSR and ASR scores highlighted the effectiveness of mixed practice in improving students' problem-solving performance. Therefore, this third analysis was conducted for the experimental group only, to better understand where mixed practice makes the difference and how and where exactly it enhances students' ability to solve problems. It should be noted that the novelty of this analysis lies in the investigation of the neutral codes. In traditional assessment, these codes are neglected or impossible to determine for various reasons, such as limited time or a lack of awareness of their presence. Although it is favorable to see that the number of students' successful codes went up and the number of unsuccessful codes went down after the mixed practice, this finding is not unexpected. The most intriguing piece of information determined here is that the neutral codes are also responsible for students' poor problem-solving performance in each subtopic. In some cases, by a higher percentage, they indicate reasons that are more important than those captured by the unsuccessful codes. Even this finding alone necessitates a closer look at the meaning of these codes and what they tell us about students' challenges in problem solving.

When the changes in the number of unsuccessful, neutral, and successful codes were examined, it was determined that, overall, the greatest change happened in the number of neutral codes. It can be argued that mixed practice is more effective at reducing the number of DD (Did Not Know to Do) and DSE (Did Something Else) codes. The decrease in these codes indicates that students got better at figuring out which subtopics or sub-problems are needed to successfully execute a problem, while avoiding unnecessary sub-problems. After the interventions, the number of neutral codes dropped to zero for two subtopics, Boiling Point Differences (DBP) and Molality Calculation (MOC). The remaining subtopics showed decreases ranging from 37% to 90%. Notably, students showed the greatest decrease, 90%, for the subtopic LCP (Le Chatelier's Principle), followed by the subtopics CPH (Calculating pH and pOH) with a 71% decrease and ICE (ICE Table) with a 67% decrease. In other words, judging by the trends in the changes in the neutral codes, it can be claimed that mixed practice helped significantly with these three topics. Seeing these positive changes after only three interventions provides an important hint that students can learn relatively quickly how to tackle problems on these challenging topics (Tyson et al., 1999; Kala et al., 2013) if proper and effective methods are used.
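The subtopic-level comparison of neutral codes reduces to a simple percent-change computation over code counts. The counts below are invented for illustration (chosen so that the example reproduces decreases of the same magnitude as those reported, e.g. 90% for LCP); they are not the study's tallies.

```python
# Hypothetical counts of neutral codes (DD + DSE) per subtopic before and
# after the interventions; the numbers are illustrative only.
neutral_before = {"LCP": 10, "CPH": 7, "ICE": 6, "DBP": 4, "MOC": 3}
neutral_after  = {"LCP": 1,  "CPH": 2, "ICE": 2, "DBP": 0, "MOC": 0}

def percent_decrease(before, after):
    """Percent decrease in code counts from pre-test to post-test,
    rounded to the nearest whole percent."""
    return {topic: round(100 * (before[topic] - after[topic]) / before[topic])
            for topic in before}

# DBP and MOC drop to zero neutral codes (a 100% decrease).
print(percent_decrease(neutral_before, neutral_after))
```

The same computation applies unchanged to the unsuccessful and successful code categories, which is how the code-breakdown trends across categories can be compared on a common scale.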

Examining the role of different variables on problem-solving performances

In the previous section, the descriptive data indicated that students in the experimental group performed better than their counterparts in the control group, even though the differences were not determined to be statistically significant. The research group explored the results further to find out whether a certain group of students (e.g., Asian Pacific Islander versus Hispanic/Latino students) benefited from the mixed practice more than others. Therefore, the role of different student attributes was investigated in light of the changes observed in CSR scores and assigned codes. Although the already small sample size was further reduced in this analysis, some meaningful data were gathered due to the number of codes assigned to each sub-problem. When each question in all four tests, one pre-test and three post-tests, was split into sub-problems for an in-depth investigation, 148 sub-problems in total were identified. Hence, even the data of a group as small as that of the A students included a total of 444 individual data points. These numbers are shown in Fig. 4 and 5 as code breakdowns. Although the trends observed from the pre-test to the last post-test were interesting for both achievement and ethnic groups, the analysis of these results should be assessed with some scepticism due to the small sample size, which will be addressed in future studies to provide further data with much greater statistical significance.
Fig. 4 Code breakdown for experimental students across grade levels (0 = pre-test and 3 = the third post-test).

Fig. 5 Code breakdown for experimental students with different ethnic backgrounds (0 = pre-test and 3 = the third post-test).

The first variable considered for the analysis was students' achievement level, determined based on their overall grade in General Chemistry I. Table 5 presents the changing delta CSR values for students belonging to each of the three achievement levels.

Table 5 Descriptive data of students grouped by success levels in General Chemistry I

Changes in CSR scores^a    A students        B students        C students
                           M      SD         M      SD         M      SD
ΔT1–T0                     0.15   0.14       0.22   0.23       0.06   0.12
ΔT2–T0                     0.16   0.14       0.36   0.25       0.31   0.18
ΔT3–T0                     0.18   0.12       0.34   0.23       0.39   0.22

^a T0: pre-test, T1: the first post-test, T2: the second post-test, and T3: the third post-test.


Even though the small sample size gives little power to statistical tests, a Kruskal–Wallis rank-sum test was conducted to assess whether there were statistically significant differences in the average delta values between the post-tests and pre-test among the grade levels. The Kruskal–Wallis test is a non-parametric alternative to the one-way ANOVA and does not share the ANOVA's distributional assumptions (Conover, 2007). The results of the Kruskal–Wallis test were not significant based on an alpha value of 0.05, χ²(2) = 1.60, p = 0.450, indicating that the mean rank of the averages of post-tests and pre-test was similar for each grade level. It is speculated that this result was obtained mainly because of the small number of students in each grade group. The discussions in the following sections, therefore, are based on the descriptive data and the types of codes students received for their work.
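A three-group comparison of this kind can be run with SciPy's Kruskal–Wallis implementation. The delta CSR values below are invented for illustration only, and the group sizes are hypothetical; the sketch simply shows the mechanics of the test described above.

```python
from scipy.stats import kruskal

# Hypothetical delta CSR values for the three grade groups (A, B, C);
# illustrative numbers only, not the study's data.
group_a = [0.15, 0.18, 0.16]
group_b = [0.22, 0.36, 0.34, 0.28]
group_c = [0.06, 0.31, 0.39, 0.25]

# Kruskal-Wallis rank-sum test: a nonparametric alternative to one-way
# ANOVA that compares mean ranks across more than two independent groups.
h, p = kruskal(group_a, group_b, group_c)
print(round(h, 2), round(p, 3))
```

Because the test operates on ranks rather than raw values, it tolerates the skewed, small-sample score distributions that would violate ANOVA's normality assumption, which is why it was the appropriate choice here.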

Regarding Fig. 4, student group A shows the smallest positive difference in CSR scores across the three post-tests. This affirms the idea that A-level students have the least room to grow, as they have already developed efficient problem-solving abilities through previous means and know how to utilize them properly (Zhang et al., 2014). In accordance with the previous trends, there is modest but meaningful growth, which suggests that there is still some positive benefit to the utilization of mixed practice even for high-achieving students.

In contrast to the small improvement observed in group A, the students in both groups B and C enhanced their performance considerably. The greatest progress, seen in group C, indicates that students with low achievement levels may benefit from mixed practice the most because, in general, they use less effective study strategies (Chan and Bauer, 2016). Blocked or categorized examples are utilized by all students to some extent, but because they are easy to use, they are possibly adopted most by low-achieving students (Rohrer et al., 2020). If the goal is to close the gap between successful and unsuccessful students, it is critical to use evidence-based methods and move away from traditional, less effective methods, even if they are easy to implement. Instead, educators and textbook publishers should promote mixed practice widely.

Following the analysis of the differences in the CSR scores, the team delved into an examination of the differences in the codes the students in each group received on the pre-test and the third post-test. When the pre-test results in particular were examined, it was noted that neutral codes made up a higher percentage of the received codes for the students with B and C grades than for the students with grade A. Fig. 4 shows an interesting trend: the mixed practice helped the students in group A make fewer mistakes captured by unsuccessful codes (UDI and CD), while the same practice helped the students in groups B and C not to forget needed sub-problems and to do fewer irrelevant calculations, captured by the DD and DSE codes, respectively, which make up the neutral codes.

It can be argued that B and C students need more help with the big picture, rather than spending extensive time on individual subtopics, so they can more successfully put the pieces together without losing their direction in generating successful solutions.

It is also important to investigate the study strategies of the students in group A and understand why they had the fewest neutral codes, which translate into ineffective, inefficient, and undesired methods of studying chemistry or any subject. Similarly, further analysis is needed to understand how exactly mixed practice helps them do fewer incorrect sub-problems.

An additional variable considered for further analysis was ethnic background, which is an important attribute given that several reforms have taken place in the last few decades aiming to close the gap between students belonging to different groups (O'Sullivan, 1993; Von Secker, 2002). Due to the small sample size, only the three largest ethnic groups were included in this analysis. Although we had a small subset of students in each group, their data are shared here to give educators hints about how the students in each group, with a larger sample size, could benefit from extended mixed practice. For example, Table 6 indicates that after the first intervention, Hispanic/Latino students improved their problem-solving performance the most, with a delta CSR value of 0.22, compared to 0.16 for Asian students and 0.07 for White students.

Table 6 Descriptive data of different ethnic groups

Changes in CSR scores^a    Asian/Pacific     White             Latino/Hispanic
                           M      SD         M      SD         M      SD
ΔT1–T0                     0.16   0.20       0.07   0.11       0.22   0.22
ΔT2–T0                     0.26   0.19       0.38   0.14       0.31   0.38
ΔT3–T0                     0.30   0.18       0.35   0.19       0.42   0.38

^a T0: pre-test, T1: the first post-test, T2: the second post-test, and T3: the third post-test.


Interestingly, as the students participated in the interventions, it was noted that the benefits for each group became almost the same, with delta CSR values ranging from 0.30 to 0.42. The results of the Kruskal–Wallis test were not significant based on an alpha value of 0.05, χ²(2) = 0.16, p = 0.925, indicating that the mean rank of the averages of post-tests and pre-test was similar for each ethnic group. So, regardless of their ethnic background, students improved their problem-solving performance with mixed practice. This progress is also evident in Fig. 5, which shows the changes in the code breakdowns for each ethnic group. The unsuccessful codes decreased for all groups after the interventions, but it is interesting to notice that this decrease was smallest for the Latino/Hispanic students: 20% versus about 50% for both the White and Asian students. On the other hand, the decrease in the number of neutral codes was largest for the Hispanic students: 74% versus 50% (White students) and 68% (Asian students). It can be argued that the interventions help all students, regardless of their ethnic background, improve their problem-solving performance, but the in-depth analysis of the changes in codes reveals that each group benefits from the interventions in different ways. In order to better understand this trend, it is necessary to consider additional variables, such as the study skills used by students in each group, and to utilize think-aloud protocols.

Limitations

Although the invitation to participate in the study was sent to many students, about 900, a relatively low number were able to attend all three interventions, which took place over the weekends. The small population limited the types of analysis the team could conduct. In parallel, the small number of participants prevented the team from obtaining the desired diversity in the participant groups. On the other hand, the sample was too large to use think-aloud protocols and capture each student's thinking process, due to the sheer amount of work required to create the student transcripts. The lack of audio data made the coding process more challenging for the team. For example, the UG code had to be removed from the analysis.

Conclusions

This study has demonstrated that using mixed problem sets as a method of practice has the potential to improve college students' achievement in problem solving in general chemistry. Moreover, the data showed that the use of mixed practice versus categorized practice not only increases students' Complete Success Rate scores, but also widens the success gap between the experimental and control groups after each additional intervention. Further analysis of the changes in the neutral codes also revealed that students, after three mixed practice sessions, were more successful in figuring out the sub-problems needed for a solution and made fewer unsuccessful attempts, as evidenced by the smaller number of Did Not Know to Do and Did Something Else codes assigned. It was determined that after the study, the neutral codes decreased by about 70% on average across all the subtopics studied, while the unsuccessful codes showed a 40% decrease on average. On the other hand, the successful codes increased by 52% on average.

This study also demonstrated that even though college students showed different levels of improvement after the interventions, they all benefited from the mixed practice. For example, students who got a C or B in the previous course showed greater improvement, while the A-level students had the smallest increase in their scores. It seems that the benefit gained with this method is inversely proportional to the students' achievement level. As many models and interventions aim to transform C students into B or A students and B students into A students, this model shows promising results in this respect as well. Further analysis of the codes highlighted the nature of the differences between A and C students and provided evidence of how the interventions helped them. The students with a grade of A initially received only 7% neutral codes based on their solutions on the pre-test, while the C students had 28% neutral codes, which could be translated into confusion and struggles with problem solving. Although the difference between those percentages was initially 21%, this gap became significantly smaller, 3% to be exact, by the end of the study, suggesting that the C students became more successful in analysing the given questions and determining the right strategy to solve them.

Even though the small sample size did not provide great statistical power, the mixed practice format appeared to positively influence students' problem-solving strategies, as evidenced by the decreased numbers of neutral and unsuccessful codes. These findings may encourage educators to assign more mixed practice assignments and textbook publishers to consider utilizing more of them at the end of each chapter. Mixed problem sets may be considered as an alternative to traditional practice assignments across a given curriculum at any level. It is important that these assignments not focus on one concept only but incorporate multiple concepts from different chapters, or even from courses taught previously. To increase the benefits of the desirable difficulties, the questions within the problem sets should not carry details identifying the topics or chapters to which they belong.

To better understand the trends observed in the data analysis, further explore the correlation between CSR and the number of interventions, and provide stronger statistical evidence for the effectiveness of the method, this study should be repeated with a larger number of students. A more diverse population should be sampled to include additional ethnic groups (e.g., African American students). For the analysis, gender information should be collected and utilized to examine the role of gender, if any exists, in problem-solving improvement. Finally, participants should be exposed to a greater number of mixed-problem interventions to test whether the improvements continue to grow, reach a threshold and stay constant, or eventually drop.

Conflicts of interest

There are no conflicts of interest to declare.

References

  1. Ascher C., (1985), Increasing science achievement for disadvantaged students, Urban Rev., 14, 279–284.
  2. Balch W. R., (1998), Practice versus review exams and final exam performance, Teach. Psychol., 25, 181.
  3. Bodner G. M. and Bhattacharyya G., (2005), A cultural approach to problem solving, Educ. Quim., 16, 222–229.
  4. Bodner G. M. and Herron J. D., (2002), Chemical Education: Towards Research-Based Practice, ed. Gabel D. L., Gilbert J. K., De Jong O., Justi R., Treagust D. E. and Van Driel J. H., Dordrecht, The Netherlands: Kluwer Academic Publishers, pp. 235–266.
  5. Bretz S. L., (2001), Novak's theory of education: Human constructivism and meaningful learning, J. Chem. Educ., 78, 1107.
  6. Brown P. C., Roediger H. L. and McDaniel M. A., (2014), Make It Stick: The Science of Successful Learning, Cambridge: Harvard University Press.
  7. Brown T., LeMay E. and Bursten B., (2018), Chemistry: The central science, Upper Saddle River, NJ: Pearson.
  8. Bybee R. W., (2014), The BSCS 5E instructional model: Personal reflections and contemporary implications, Sci. Child., 51, 10–13.
  9. Candy P. C. and Crebert R. G., (1991), Ivory tower to concrete jungle: The difficult transition from the academy to the workplace as learning environments, J. High. Educ., 62, 570–592.
  10. Carvalho P. F. and Goldstone R. L., (2014), Putting category learning in order: Category structure and temporal arrangement affect the benefit of interleaved over blocked study, Mem. Cogn., 42, 481–495.
  11. Chan J. Y. K. and Bauer C. F., (2016), Learning and studying strategies used by general chemistry students with different affective characteristics, Chem. Educ. Res. Pract., 17, 675–684.
  12. Chi M. T. H., Feltovich P. J. and Glaser R., (1981), Categorization and representation of physics problems by experts and novices, Cogn. Sci., 5, 121–152.
  13. Conover W. J., (2007), Practical nonparametric statistics, [S.l.]: Academic Internet Publishers.
  14. Docktor J. L., Mestre J. P. and Ross B. H., (2012), Impact of a short intervention on novices’ categorization criteria, Phys. Rev. ST: Phys. Educ. Res., 8, 020102.
  15. Eglington L. G. and Kang S. H. K., (2017), Interleaved presentation benefits science category learning, J. Appl. Res. Mem. Cogn., 6, 475–485.
  16. Ellis J., Hanson K., Nuñez G. and Rasmussen C., (2015), Beyond plug and chug: An analysis of calculus I homework, Int. J. Res. Undergrad. Math. Educ., 1, 268–287.
  17. Ericsson K. A., (2006), The influence of experience and deliberate practice on the development of superior expert performance, Cambridge Handb. Expert. Expert Perform., 683–703.
  18. Gabel D. L. and Bunce D. M., (1994), Research on problem solving: Chemistry. In Handbook of research on science teaching and learning, New York: Macmillan.
  19. Gao R. M. and Lloyd J., (2020), Precision and accuracy: Knowledge transformation through conceptual learning and inquiry-based practices in introductory and advanced chemistry laboratories, J. Chem. Educ., 97, 368–373.
  20. Gerson H. H., (2001), Making connections: Compartmentalization in pre-calculus students' understanding of functions, Master's degree, University of Iowa.
  21. Goldin S. B., Horn G. T., Schnaus M. J. J., Grichanik M., Ducey A. J., Nofsinger C., Hernandez D. J., Shames M. L., Singh R. P. and Brannick M. T., (2014), FLS skill acquisition: A comparison of blocked vs. interleaved practice, J. Surg. Educ., 71, 506–512.
  22. Gulacar O. and Fynewever H., (2010), A research methodology for studying what makes some problems difficult to solve, Int. J. Sci. Educ., 32, 2167–2184.
  23. Gulacar O., Overton T. L., Bowman C. R. and Fynewever F., (2013), A novel code system for revealing sources of students' difficulties with stoichiometry, Chem. Educ. Res. Pract.
  24. Gulacar O., Eilks I. and Bowman C. R., (2014), Differences in general cognitive abilities and domain-specific skills of higher- and lower-achieving students in stoichiometry, J. Chem. Educ., 91, 961–968.
  25. Gulacar O., Milkey A. and McLane S., (2019), Exploring the effect of prior knowledge and gender on undergraduate students' knowledge structures in chemistry, Eurasia J. Math., Sci. Technol. Educ., 15, em1726.
  26. Gulacar O., Cox C., Tribble E., Rothbart N. and Cohen-Sandler R., (2020), Investigation of the correlation between college students’ success with stoichiometry subproblems and metacognitive awareness, Can. J. Chem., 98, 676–682.
  27. Gulacar O., Cox C. and Fynewever H., (2021), in Tsaparlis G. (ed.), Problems and Problem Solving in Chemistry Education: Analysing Data, Looking for Patterns and Making Deductions, The Royal Society of Chemistry, pp. 68–92.
  28. Hardiman P., Dufresne R. and Mestre J., (1989), The relation between problem categorization and problem solving among experts and novices, Mem. Cogn., 17, 627–638.
  29. Hegde B. and Meera B. N., (2012), How do they solve it? An insight into the learner's approach to the mechanism of physics problem solving, Phys. Rev. ST – Phys. Educ. Res., 8, 010109.
  30. Hein G. E., (1991), Constructivist learning theory, presented in part at the International Committee of Museum Educators Conference, Jerusalem, Israel.
  31. Herrington A., (2021), The Effect of Interleaved Practice in a High School Chemistry Class, Master of Education, Northwestern College.
  32. Lin C.-H. J., (2013), Interleaved practice enhances skill learning and the functional connectivity of fronto–parietal networks, Hum. Brain Mapp., 34, 1542–1558.
  33. Jonassen D., Strobel J. and Lee C. B., (2006), Everyday problem solving in engineering: Lessons for engineering educators, J. Eng. Educ., 95, 139–151.
  34. Kala N., Yaman F. and Ayas A., (2013), The effectiveness of predict–observe–explain technique in probing students’ understanding about acid–base chemistry: A case for the concepts of pH, pOH, and strength, Int. J. Sci. Math. Educ., 11, 555–574.
  35. Kauffman D. F., Ge X., Xie K. and Chen C.-h., (2008), Prompting in web-based environments: Supporting self-monitoring and problem solving skills in college students, J. Educ. Comput. Res., 38, 115–137.
  36. Kennedy T. J. and Odell M. R. L., (2014), Engaging students in STEM education, Sci. Educ. Int., 25, 246–258.
  37. Kivunja C., (2014), Do you want your students to be job-ready with 21st century skills? Change pedagogies: A pedagogical paradigm shift from Vygotskyian social constructivism to critical thinking, problem solving and Siemens' digital connectivism, Int. J. High. Educ., 3, 81.
  38. Lin C.-H., Knowlton B. J., Chiang M.-C., Iacoboni M., Udompholkul P. and Wu A. D., (2011), Brain–behavior correlates of optimizing learning through interleaved practice, NeuroImage, 56, 1758–1772.
  39. Loh A. S. L. and Subramaniam R., (2018), Mapping the knowledge structure exhibited by a cohort of students based on their understanding of how a galvanic cell produces energy, J. Res. Sci. Teach., 55, 777–809.
  40. Macphee D., Farro S. and Canetto S. S., (2013), Academic self-efficacy and performance of underrepresented STEM majors: Gender, ethnic, and social class patterns, Anal. Soc. Iss. Publ. Pol., 13, 347–369.
  41. Manji J., (1995), College students say good-bye to plug-and-chug calculus, Mach. Des., 67, 30.
  42. Moseley B., (2005), Students' early mathematical representation knowledge: The effects of emphasizing single or multiple perspectives of the rational number domain in problem solving, Educ. Stud. Math., 60, 37–69.
  43. Nabulsi L., Nguyen A. and Odeleye O., (2021), A comparison of the effects of two different online homework systems on levels of knowledge retention in general chemistry students, J. Sci. Educ. Technol., 30, 31–39.
  44. Nemeth L., Werker K., Arend J., Vogel S. and Lipowsky F., (2019), Interleaved learning in elementary school mathematics: Effects on the flexible and adaptive use of subtraction strategies, Front. Psychol., 10, 86–86.
  45. Nurrenbern S. C. and Pickering M., (1987), Concept learning versus problem solving: Is there a difference? J. Chem. Educ., 64, 508–510.
  46. O'Sullivan K. A., (1993), Reforms in science education, K-12, Sch. Educ. Rev., 5, 4–5.
  47. Pedaste M., Mäeots M., Siiman L. A., de Jong T., van Riesen S. A. N., Kamp E. T., Manoli C. C., Zacharia Z. C. and Tsourlidaki E., (2015), Phases of inquiry-based learning: Definitions and the inquiry cycle, Educ. Res. Rev., 14, 47–61.
  48. Persky A. M. and Robinson J. D., (2017), Moving from novice to expertise and its implications for instruction, Am. J. Pharm. Educ., 81, 6065.
  49. Petrucci R. H., Herring F. G., Madura J. D. and Bissonnette C., (2011), General chemistry: Principles and modern applications, Toronto: Pearson.
  50. Pickering M., (1990), Further studies on concept learning versus problem solving, J. Chem. Educ., 67, 254.
  51. Rohrer D. and Taylor K., (2007), The shuffling of mathematics problems improves learning, Instruct. Sci., 35, 481–498.
  52. Rohrer D., Dedrick R. F. and Burgess K., (2014), The benefit of interleaved mathematics practice is not limited to superficially similar kinds of problems, Psychon. Bull. Rev., 21, 1323–1330.
  53. Rohrer D., Dedrick R. F., Hartwig M. K. and Cheung C. N., (2020), A randomized controlled trial of interleaved mathematics practice, J. Educ. Psychol., 112, 40–52.
  54. Sayan H. and Mertoğlu H., (2020), Investigation of the opinions of science teachers about homework, J. Educ. Learn., 9, 232.
  55. Schoenfeld A. H., (1982), Measures of problem-solving performance and of problem-solving instruction, J. Res. Math. Educ., 13, 31–49.
  56. Schorn J. M. and Knowlton B. J., (2021), Interleaved practice benefits implicit sequence learning and transfer, Mem. Cogn.
  57. NGSS Lead States, (2013), Next Generation Science Standards: For States, By States, Washington, DC: The National Academies Press.
  58. Su W. Y.-J., (1991), A Study of Student Learning through Lectures Based on Information Processing Theory, Doctor of Philosophy of Chemistry, University of Glasgow.
  59. Sullivan G. M. and Feinn R., (2012), Using effect size—Or why the P value is not enough, J. Grad. Med. Educ., 4, 279–282.
  60. Szu E., Nandagopal K., Shavelson R. J., Lopez E. J., Penn J. H., Scharberg M. and Hill G. W., (2011), Understanding academic performance in organic chemistry, J. Chem. Educ., 88, 1238–1242.
  61. Taylor A. T. S., Olofson E. L. and Novak W. R. P., (2017), Enhancing student retention of prerequisite knowledge through pre-class activities and in-class reinforcement, Biochem. Mol. Biol. Educ., 45, 97–104.
  62. Tyson L., Treagust D. F. and Bucat R. B., (1999), The complexity of teaching and learning chemical equilibrium, J. Chem. Educ., 76, 554.
  63. Veloo A., Lee H. H. and Seung C. L., (2015), Gender and ethnicity differences manifested in chemistry achievement and self-regulated learning, Int. Educ. Stud., 8, 1–12.
  64. Von Secker C., (2002), Effects of inquiry-based teacher practices on science excellence and equity, J. Educ. Res., 95, 151–160.
  65. Yager R. E., (2000), The constructivist learning model, Sci. Teach., 67, 44–45.
  66. Zhang D. K., Ding Y., Barrett D. E., Xin Y. P. and Liu R. D., (2014), A comparison of strategic development for multiplication problem solving in low-, average-, and high-achieving students, Eur. J. Psychol. Educ., 29, 195–214.
  67. Zoller U., Lubezky A., Nakhleh M. B., Tessier B. and Dori Y. J., (1995), Success on algorithmic and LOCS vs. conceptual chemistry exam questions, J. Chem. Educ., 72, 987.
  68. Zumdahl S. S. and DeCoste D. J., (2017), Chemical principles, Boston, MA: Cengage Learning.

This journal is © The Royal Society of Chemistry 2022