O. Gulacar*, Arista Wu, V. Prathikanti, B. Vernoy, H. Kim, T. Bacha, T. Oentoro, M. Navarrete-Pleitez and K. Reedy
Department of Chemistry, University of California-Davis, One Shields Avenue, Davis, California 95616, USA. E-mail: ogulacar@ucdavis.edu
First published on 25th January 2022
The questions in the practice assignments given to students, in the form of worksheets or other formats, are often grouped by chapter, topic, or concept, with great emphasis placed on categorization. Most of the end-of-chapter problems in chemistry textbooks are organized by section. Although this organization is intended to help students navigate the assignments more easily and practice in order, it does not reflect what they are expected to do during tests; there is a mismatch between how they practice and how they are tested. The goal of this study is to examine the influence of the structure of practice assignments on students’ problem-solving performance. Two groups of students from chemistry classes were recruited to participate in this study. Each group had the same length of practice and identical questions, with only one difference: the experimental group had assignments with a mixed organization of questions, while the control group had traditional assignments with the questions organized by chapter and topic. Students completed three two-hour problem-solving sessions during the weekends. Their progress was evaluated using their solutions to one pre-test and three post-tests, one given after each problem-solving session. The study revealed that students in the experimental group increased their problem-solving success more than those in the control group, starting from the first intervention, and the achievement gap widened as the study progressed. It is recommended that educators and textbook publishers create and utilize assignments that mix questions across different topics and chapters.
Problems and problem solving, as a method, are not only used in assessments to measure students’ level of comprehension, but also give them a chance to self-reflect on their progress and identify ways to better connect the knowledge pieces to come up with a successful solution (Kauffman et al., 2008). One of the main facets of constructivist theory is that each “learner” individually constructs their understanding of a phenomenon or problem while they learn (Bretz, 2001). The role of the educator is then to facilitate the process and guide students to construct their knowledge based on their beliefs, talents, and perceptions of things they interact with. Even though there is abundant evidence that the mind is not an empty vessel to be filled by instructors (Yager, 2000), some educators do not take this important principle into consideration while developing interventions to improve students’ understanding of concepts and problem-solving achievement. In addition, those educators hold students responsible for their failure without utilizing research-based effective strategies in classrooms. In these environments, there is also a strong belief that students will learn successfully if they simply listen to their professors well (Su, 1991). Although paying attention is a necessary step in receiving knowledge before processing it, it is only one step (Su, 1991). There are many other factors that need to be investigated to reveal the source of the challenges. Research methodologies guided by constructivist theory aim to explore the influence of all possible factors and follow problem solvers’ thinking through interviews such as think-aloud protocols to bring hidden issues to light (Hardiman et al., 1989; Yager, 2000; Gulacar et al., 2020).
The constructivist theory also recommends moving away from rote memorization and toward developing conceptual understanding, which necessitates the examination and discussion of concepts at a finer level, considering all dimensions to create a knowledge system in which every piece makes sense (Hein, 1991). In chemistry, physics, and mathematics education, it has been consistently observed that practice problems often emphasize the quantitative, algorithmic side of problem solving rather than the qualitative, conceptual side (Nurrenbern and Pickering, 1987; Moseley, 2005; Rohrer and Taylor, 2007). In a study conducted in 1995, it was determined that while students were able to solve problems related to Boyle's law or Charles's law, about two-thirds of them were unable to explain critical aspects of gas behaviours. Students tended to simply use the “plug and chug” method, relying on surface elements to solve problems without understanding the underlying principles involved in the questions (Chi et al., 1981; Schoenfeld, 1982; Hegde and Meera, 2012). Although promoting conceptual understanding is important and needed, students’ algorithmic problem-solving skills should not be neglected at the same time. This idea is based on studies showing that students who receive instruction focusing purely on the conceptual aspect of topics are also less likely to be able to answer “traditional” numerical problems, meaning that simply teaching concepts is not always enough for a student to be able to approach other kinds of problems introduced in academia (Hardiman et al., 1989; Pickering, 1990). In other words, it is necessary that students are given the opportunity to improve their conceptual understanding and sharpen their algorithmic problem-solving skills at the same time.
There are different methods adopted by educators to accomplish this important and challenging goal. For example, in biology education, the Biological Sciences Curriculum Study 5E (BSCS 5E) model exists to structure teaching activities that put the students at the center of instruction and guide them toward becoming experts. It rests on the notion that inquiry proceeds in five stages, whose definitions vary somewhat across studies. The model lists these stages as Engagement, Exploration, Explanation, Elaboration, and Evaluation (Bybee, 2014; Pedaste et al., 2015). All the steps in this method mainly focus on improving students’ conceptual understanding and guiding them to determine the underlying principles in the given questions. In other words, this method and similar ones, including interleaved practices, aim to transform students’ problem-solving approach from looking for simple surface elements and familiar terms in the given question to exploring and identifying the concepts and underlying principles of the questions to develop successful solutions more efficiently (Hardiman et al., 1989; Rohrer et al., 2014; Persky and Robinson, 2017). Specifically, interleaved practices have proven successful in guiding students to evaluate questions as an overall task, which is known as an expert characteristic, instead of focusing on steps individually (Gerson, 2001; Rohrer et al., 2014; Persky and Robinson, 2017).
In order to promote expert attributes and help students think like experts, it is also imperative that educators use techniques that improve their long-term retention of information learnt (Balch, 1998). Existing literature shows that introducing a structure that makes performance difficult, commonly referred to as desirable difficulties, during practice improves long-term retention (Docktor et al., 2012; Brown et al., 2014). It is believed that this aspect is one of the reasons that makes interleaved practice successful in helping students retain the learned information longer and retrieve it more effectively when needed, e.g., in solving problems (Rohrer et al., 2014).
Interleaved practice can take on a variety of meanings and applications, but its most consistent aspect is its contrasting structure to blocked practice (Janice Lin, 2013). In blocked practice, students solve multiple problems of the same category or “block” before moving on to a different block. Interleaved practice stresses the breaking up and reshuffling of these blocks in a student's practice set (Goldin et al., 2014). These breaks in the blocks may take the form of time breaks between study sessions, the studying of different subjects entirely, or learning about various topics within the same subject. This is in stark contrast to today's education system, which guides and encourages students to practice chemistry problems in “blocks” at a time. When examined, it becomes obvious that end-of-chapter problems in General Chemistry textbooks (Petrucci et al., 2011; Zumdahl and DeCoste, 2017; Brown et al., 2018) are organized by section. For example, stoichiometry questions are usually broken down into eight or nine sections, each focusing on a separate subtopic such as limiting reagent, percent yield, and the mole concept. Similarly, professors traditionally assign homework as a set of questions categorized by chapter or section with clear headings and descriptions for repetition and reinforcement (Sayan and Mertoğlu, 2020). These are all examples of blocked practice and promote the use of the aforementioned “plug and chug” techniques adopted by students. Students memorize solutions they found in the exercises and try to apply them to the questions given on the exams. Studies show that although students do well with questions involving single topics or a simple combination of these topics, they fail to generate correct solutions to problems involving topics from the same chapter or across chapters, especially with complex connections (Gulacar and Fynewever, 2010; Gulacar et al., 2014).
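The contrast between blocked and interleaved sets can be made concrete with a small sketch: given topic blocks like the end-of-chapter sections described above, a mixed set can be produced by drawing one problem per topic in round-robin fashion. The topic and problem labels below are hypothetical, and this is only one of many possible interleaving schedules:

```python
from itertools import chain, zip_longest

def interleave(blocks):
    """Turn blocked practice sets into one mixed set by taking problems
    round-robin from each topic block (a simple form of interleaving)."""
    rounds = zip_longest(*blocks)  # one problem per topic per round
    return [p for p in chain.from_iterable(rounds) if p is not None]

# Hypothetical blocked assignment: three topic blocks of unequal size.
blocked = [
    ["stoich-1", "stoich-2", "stoich-3"],
    ["equil-1", "equil-2"],
    ["thermo-1", "thermo-2", "thermo-3"],
]
mixed = interleave(blocked)
# → ['stoich-1', 'equil-1', 'thermo-1', 'stoich-2', 'equil-2',
#    'thermo-2', 'stoich-3', 'thermo-3']
```

In practice, researchers often also shuffle within rounds and remove topic headings so that the problem's category is not revealed by its position.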
When it comes to implementing an interleaved study plan, there are many ways of breaking up the traditional “blocks” of study material. In one particular study conducted by Rohrer and Taylor (2007), students were assessed across three comparisons: massed versus spaced, light versus heavy, and blocked versus mixed. While our study was primarily interested in the blocked versus mixed comparison, the results from massed versus spaced were relevant to the design of our study. Massed versus spaced dealt with the time intervals between study sessions, with students in the massed group studying all the content in one week compared to the spaced group covering the material in two weeks. The results showed that the spaced practice group vastly outperformed the massed group, indicating that spaced practice is a more effective study strategy than massed practice. Rohrer and Taylor (2007) also investigated the success of the blocked versus mixed groups, where students in the blocked group practiced four similar problems in a row followed by another four similar problems, whereas the mixed groups had their questions shuffled. It was determined that the students in the mixed group scored worse on practice assignments but higher on exams. The higher practice scores in the blocked group could be due to students’ success with using equations and schema adopted from similar questions, which are fundamental steps of the plug-and-chug method (Manji, 1995; Ellis et al., 2015).
Similar to the studies conducted in mathematics education, the research that examined the efficacy of interleaved practice in chemistry revealed a positive impact on students’ problem-solving performance. Eglington and Kang (2017) compared students’ success in organic chemistry across four experiments using interleaved versus blocked practice. In their research, special emphasis was placed on the role of problem difficulty, problem similarity, and the number of categories in students’ performance. In their first experiment, the interleaved practice group showed stronger categorization; that is, students were able to identify categories more accurately for themselves. The second experiment was similar but included categories that were less distinctive and generally had overlapping concepts; the interleaved practice group scored higher once again. In the third and fourth experiments, Eglington and Kang (2017) highlighted various characteristics of the practice problems in red. This was done to give both the interleaved and blocked groups hints that might identify the categories of the respective problems. They determined that the students’ ability to identify categories did not change when these characteristics were highlighted in red. However, in all four cases, the interleaved group scored better on assessments than the blocked group. Another study, carried out by Herrington (2021), tested high school chemistry students and added more visual and conceptual components. Consistent with prior studies, there was an interleaved group and a blocked group. The interleaved group scored higher than the blocked group, displaying the effectiveness of interleaved practice beyond computational problems.
Due to the lack of studies investigating the effects of interleaved practice in General Chemistry courses, this study was designed to determine how the structure of practice assignments, mixed versus categorized, influences students’ success with problems. It should be noted that in order to create the instruments used in the interventions, a wide range of topics across chapters, including chemical equilibrium, acid–base equilibria, and thermodynamics, was selected, rather than using questions from the same chapter.
Eglington and Kang (2017) also studied the effect of explicitly sharing the underlying content of the questions with both interleaved and blocked practice assignments. However, they found that including such information had no significant influence on students’ problem-solving achievement.
Unlike the Eglington and Kang (2017) study, in our study the questions in the blocked set, labeled the categorized assignment, had headings and short descriptions, while the questions in the mixed group assignments did not have any kind of information that might reveal the topics. For example, if a problem required the use of percent yield, a subtopic of stoichiometry, a chapter commonly covered in General Chemistry, the question would have an identifier in the practice assignments given to the students in the categorized group that reflected the topic, the chapter, or a combination of the two. Although many research projects focus on different aspects of interleaved learning, such as the neural basis of the contextual interference effect (Lin et al., 2011), the role of interleaved learning in solving multi-digit subtraction problems (Nemeth et al., 2019), and the effect of blocked practice compared to interleaved practice on knowledge retention (Schorn and Knowlton, 2021), to date there has been a lack of attention to the role of interleaving categories in this specific way in chemistry.
• How do mixed versus categorized practice assignments influence students’ problem-solving performance?
• How does the method affect different groups of students’ problem-solving success?
As part of the second question, the role of achievement level in a previous course and ethnic background were examined. These variables were found to be correlated at different levels with students’ academic success in different studies (Ascher, 1985; Macphee et al., 2013; Veloo et al., 2015; Gulacar et al., 2019).
The research team had received Institutional Review Board (IRB) approval before the project was initiated. All the participants voluntarily agreed to be in the study and consented through electronic forms to share their solutions and written thinking process on the assignments. To provide anonymity and protect privacy, every participant was assigned a random number to be put on practice assignments. In addition, the surveys did not contain questions with possible identifying characteristics.
The gap between each session was one week. The reason for holding three interventions was to better examine the impact on students’ problem-solving performance over a series of sessions. It should be noted that the problem-solving sessions took place over the weekends because students did not have any other common time during the weekdays. Except for the first session, each session lasted one hour and thirty minutes, which included a 50 minute intervention, a 10 minute reflection, and a 30 minute post-test. The first session was thirty minutes longer than the rest due to the 30 minute pre-test. The pre-test enabled us to determine a baseline and evaluate students’ academic strengths in general chemistry for further data analysis.
Interventions were designed to give the students the proper experience depending on the group, mixed versus categorized, in which they were placed. After collecting students’ solutions to the intervention questions, students in both groups were provided keys with detailed solutions prepared by the research team to give them a chance to reflect on their work, compare their answers to those on the key, and write down any questions that they wanted to ask our student instructor during the review session offered at the end of the problem-solving sessions. The mixed group's keys also excluded any explicit information that could help students identify the topics. Each problem-solving session ended with a post-test. The review sessions are not considered a direct part of the study; they were used to encourage more students to participate in the study and attend the problem-solving sessions. The review sessions were put together to answer students’ questions identified during the reflection and to prepare them for the upcoming exam given in the course. Both the mixed and categorized groups had the same review sessions with the same student instructor throughout the study. To make the study more attractive, students were also told that their names would be entered into a raffle for a gift card if they attended all three problem-solving sessions.
As shown in Table 2, the COSINE codes can be generalized into three categories: successful, neutral, and unsuccessful. It should be noted that Unsuccessful Guessed (UG) and Unsuccessful Received Hints (URH), two codes of the COSINE, were not applied because no hint was provided in this study and audio data were not available to determine whether students guessed or simply could not do the subproblem. Therefore, all such instances were interpreted as cases belonging to the Could Not Do (CD) category.
| Category | Code^a | Description | Example | Explanation |
| --- | --- | --- | --- | --- |
| Successful | S (Successful) | Participants complete the sub-problem successfully | (image of student work) | Student writes the correct equation for the dissociation of sulfuric acid |
| Neutral | NR (Not Required) | Their solving strategy does not involve the sub-problem as a required step; participants solve the problem successfully by using a different method | Due to limited space, it is not possible to show an example for this code here; see the book chapter published by Gulacar et al. (2021) | |
| | DD (Did not know to Do) | Participants skip the sub-problem or only include data; their solving strategy requires the sub-problem as a key step, but they omit it | (image of student work) | Student was given initial amounts of reactants and products in moles and skipped the step of converting them to molarity |
| | DSE (Did Something Else) | Participants skip the sub-problem, which is a required step in the solving strategy, and do something else instead | (image of student work) | Student did not know the equation (ΔT = i·k_b·m) and instead used another equation to solve for molality |
| Unsuccessful | UDI (Unsuccessful Did Incorrectly) | Participants correctly identify and attempt the sub-problem, but they fail to complete it successfully | (image of student work) | Student knows the subproblem of using the equilibrium constant formula, but writes it as reactants over products instead of the reciprocal |
| | CD (Could Not Do) | Participants get stuck on the sub-problem and could not finish the rest of the sub-problems because of the failure to solve this specific sub-problem | (image of student work) | Student got stuck and could not figure out how to complete the calculation |

^a The detailed examples for each code can be found in Gulacar et al. (2021).
The successful category contains only one code, S (Successful), which indicates the participant's performance in solving the corresponding sub-problem is correct and complete. There are three codes under the neutral category: NR (Not Required), DD (Did not know to Do), and DSE (Did Something Else). These codes indicate that the participant did not attempt a sub-problem for some reason. NR suggests that the participant skipped the sub-problem pre-determined in the rubric by adopting a different successful strategy, indicating that the particular step was unnecessary in the solution. Unlike NR, DD is assigned when the particular step is needed in the predicted routine of the adopted solving strategy, but the participant omitted it from their solution. The code DSE indicates that the participant did something unnecessary in the predicted procedure instead of completing the required sub-problem as laid out in the rubric. The unsuccessful category includes UDI (Unsuccessful Did Incorrectly) and CD (Could Not Do), which highlight cases in which the participants made attempts but failed to complete the sub-problem successfully. UDI indicates that the participants correctly identified and attempted the sub-problem without a successful end product. CD is assigned when participants got stuck on the sub-problem and could not finish the rest of the sub-problems for a variety of reasons.
Attempt Success Rate (ASR) and Complete Success Rate (CSR) are calculated using the COSINE codes. As shown in eqn (1), ASR incorporates only the successful and unsuccessful codes to measure the participants’ true problem-solving ability on each individual sub-problem, as determined by the clear evidence shown in the participants’ solutions. Different from the formula for ASR, CSR (see eqn (2)) additionally incorporates the neutral codes (i.e., DD and DSE) to evaluate the participants’ conceptual understanding and ability to put the subproblems together.
ASR = n(S) / [n(S) + n(UDI) + n(CD)] (1)

CSR = n(S) / [n(S) + n(UDI) + n(CD) + n(DD) + n(DSE)] (2)

where n(X) denotes the number of sub-problems assigned code X; NR codes are excluded from both rates.
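The two rates can be computed directly from code counts. The sketch below assumes, following the description above, that ASR counts only the successful (S) and unsuccessful (UDI, CD) codes, while CSR additionally counts the neutral DD and DSE codes; the code list is illustrative:

```python
from collections import Counter

def asr(codes):
    """Attempt Success Rate: successful over (successful + unsuccessful).
    Neutral codes (NR, DD, DSE) are excluded entirely."""
    c = Counter(codes)
    s = c["S"]
    u = c["UDI"] + c["CD"]
    return s / (s + u) if (s + u) else None

def csr(codes):
    """Complete Success Rate: additionally counts the neutral codes
    DD and DSE in the denominator (NR stays excluded)."""
    c = Counter(codes)
    s = c["S"]
    denom = s + c["UDI"] + c["CD"] + c["DD"] + c["DSE"]
    return s / denom if denom else None

# Illustrative codes for one student's eight sub-problems.
codes = ["S", "S", "UDI", "DD", "NR", "CD", "S", "DSE"]
# asr(codes) → 3 / 5 = 0.6 ; csr(codes) → 3 / 7 ≈ 0.43
```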
| Changes in CSR scores^a | M (Exp.) | SD (Exp.) | Min (Exp.) | Max (Exp.) | M (Ctrl) | SD (Ctrl) | Min (Ctrl) | Max (Ctrl) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ΔT1–T0 | 0.14 | 0.18 | −0.08 | 0.46 | 0.13 | 0.21 | −0.19 | 0.59 |
| ΔT2–T0 | 0.30 | 0.21 | 0.00 | 0.73 | 0.25 | 0.21 | −0.05 | 0.65 |
| ΔT3–T0 | 0.34 | 0.21 | 0.00 | 0.73 | 0.27 | 0.23 | −0.10 | 0.76 |

^a T0: pre-test, T1: the first post-test, T2: the second post-test, and T3: the third post-test.
Upon examination, it was noted that the changes for both the experimental and control groups steadily became larger. The control group values increased from 0.13 to 0.25 and, eventually, to 0.27 after the third intervention. On the other hand, the difference in the average delta CSR values for the experimental group started at 0.14, more than doubled after the second intervention, and peaked at 0.34 after the last intervention. Since both groups had a chance to attend the problem-solving sessions and refresh their memories of the topics assessed on the post-tests, both naturally benefited from those activities.
Even though the sample size is relatively small, a series of statistical tests were run to determine whether the changes in the CSR scores, which were determined using hundreds of codes assigned to each sub-step in students’ solutions, are significant. First, a series of two-tailed Wilcoxon signed-rank tests, the nonparametric alternative to the paired-samples t-test, were run to investigate the changes in the CSR averages of both the experimental and control groups after each intervention. The findings are organized and compared in Table 4. The tests indicate that all the changes observed in CSR scores for both groups are statistically significant. Although the values in each case are higher for the experimental group, it is difficult to make a strong argument that the effect of mixed practice is statistically more important than that of categorized practice. This finding reflects the idea that any practice is better than none. Previous studies indicate that an extensive amount of practice is a key to achieving a high level of performance (Ericsson, 2006).
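A minimal sketch of such a paired comparison with scipy: the signed-rank test is run on each student's pre- and post-test CSR scores. The values below are illustrative, not the study's data:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-student CSR scores before and after the interventions
# (illustrative values only; every student improves here).
pre = np.array([0.20, 0.35, 0.10, 0.40, 0.25, 0.30, 0.15, 0.28])
post = np.array([0.41, 0.57, 0.33, 0.64, 0.50, 0.56, 0.42, 0.56])

# Two-tailed Wilcoxon signed-rank test on the paired differences,
# the nonparametric alternative to a paired-samples t-test.
stat, p = wilcoxon(post, pre, alternative="two-sided")
```

Because every difference is positive in this toy data, the smaller rank sum (the test statistic) is zero and the two-tailed p-value falls below 0.05.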
To further examine the overall effect of the interventions, the average of the CSR scores calculated after each post-test was determined and compared to that of the pre-test. This analysis should reveal whether all mixed interventions combined have a larger effect on students’ problem-solving performance. The two-tailed Mann–Whitney U test indicated that the difference between the groups is not significant based on an alpha value of 0.05, U = 141, z = −0.84, p = 0.399. The effect size for this test was determined to be 0.20, which is treated as a small effect according to Cohen's classification of effect size (Sullivan and Feinn, 2012). The mean rank for the experimental group was 17.29 and the mean rank for the control group was 14.43. This suggests that the distribution of the changes in CSR scores for the experimental group (Mdn = 0.25) was not significantly different from that of the control group (Mdn = 0.20). However, it should be noted that the success gap between the experimental and control groups grew wider as the study progressed. The problem-solving performance of students using mixed assignments improved more than that of students using a traditional practice approach. It can be argued that further exposure to mixed practice would yield statistically significant differences between the groups. Future iterations of this study will aim to recruit a larger student population and extend the number of workshops.
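The same independent-groups comparison can be sketched with scipy. The delta-CSR values below are illustrative, and the effect size here uses r = |z| / √N, one common convention for rank-based tests (the paper does not state which convention it used):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical delta-CSR averages for two independent groups
# (illustrative values, not the study's data).
experimental = [0.30, 0.25, 0.40, 0.22, 0.35, 0.28, 0.45, 0.19]
control = [0.20, 0.18, 0.27, 0.15, 0.33, 0.21, 0.24, 0.12]

u, p = mannwhitneyu(experimental, control, alternative="two-sided")

# Normal-approximation z statistic (without tie correction) and the
# effect size r = |z| / sqrt(N).
n1, n2 = len(experimental), len(control)
mu = n1 * n2 / 2
sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu) / sigma
r = abs(z) / np.sqrt(n1 + n2)
```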
The increased differences in the CSR scores could be interpreted as students improving their overall problem-solving performance a little more each time. Gulacar et al. (2021) correlate CSR scores with the ability to connect sub-problems successfully. While doing individual subproblems, as assessed by the ASR formula, is critical for the success of any problem solution, it is not sufficient. To promote overall improvement, it is necessary to enhance students’ performance in figuring out which subproblems need to be done, executing them successfully, and putting them into the correct schema (Gulacar et al., 2021). Previously, it was determined that this overlooked aspect of problem solving is the biggest factor contributing to differences between successful and unsuccessful students taking a General Chemistry course (Gulacar et al., 2014). Considering the findings of this study and those of previous studies, it can be argued that mixed practice not only helps students do subproblems correctly, but also improves their ability to organize them successfully. As knowledge organization is associated with conceptual understanding (Loh and Subramaniam, 2018), it could also be claimed that students exposed to mixed practice develop a better conceptual understanding of topics.
The increase in CSR scores implies that students in general got better ASR scores as well after the interventions, but it is important to examine them closely and see how the ASR scores and the distribution of assigned codes (unsuccessful, neutral, and successful) changed after the study was completed. It should be remembered that ASR deals with students’ performance on individual subproblems. Fig. 2 shows differences in average delta ASR scores for each subtopic assessed in the pre-test and post-test #3. The total number of subtopics involved in the tests was 10. The topics ranged from equilibrium to acid–base chemistry and from properties of solutions to thermodynamics. As mentioned in the methodology section, all these topics were covered in the second course in the General Chemistry series offered at the institution where the study was conducted.
The analysis of the trends in the changes of delta ASR scores reveals that students in the experimental group showed better improvement on individual subtopics when compared to the students in the control group. The control group showed better improvement on only three topics out of ten: Molarity Calculation (MC), Calculating pH and pOH (CPH), and Molality Calculation (MOC). It can be argued that although conceptual understanding is important across the chemistry curriculum, all these subtopics require more algorithmic knowledge. It is possible that mixed practice is not as effective at improving students’ calculation and algorithmic problem-solving performance as it is at improving performance on more conceptual topics. This falls in line with Hardiman et al.'s (1989) and Zoller et al.'s (1995) findings that students who rely heavily on conceptual information tend to struggle early on with algorithmic problem solving compared to students who simply memorize equations.
In addition to the fact that the control group did better with subtopic CPH, it was also noted that CPH was the subtopic in which both the control group and the experimental group showed the greatest improvement. This improvement could also be related to other materials students utilized in the course. It is also possible that students had a fresher memory of this topic, as it is covered toward the end of the quarter, which is about the same time the study was conducted. On the other hand, the subtopic HC (Heat Calculation) showed the lowest increase in delta ASR values. This is a topic covered in the very first week of the quarter. So, it appears that knowledge retention is one of the important factors influencing students’ problem-solving performance (Taylor et al., 2017; Nabulsi et al., 2021). Another interesting trend was observed in the delta ASR scores of the subtopic FLT (First Law of Thermodynamics). Although the students in the experimental group did better with this subtopic, the control group surprisingly did worse after several problem-solving sessions in which they solved categorized problems. Although it is impossible to make a strong claim about the inefficacy of categorized practice in improving students’ problem-solving performance based on students’ scores on one topic, it should be noted that this kind of practice does not seem to help much and may, on the contrary, deteriorate the success of students working with specific types of problems (Carvalho and Goldstone, 2014).
Following the analysis of the changes in delta ASR scores, an in-depth and unusual analysis was performed to better understand how each intervention improved students’ ability to solve problems. In traditional assessment, students’ performances are examined based on the scores they receive on individual questions or overall exams (Szu et al., 2011; Gao and Lloyd, 2020). However, very few analyses go down to the subtopic level, and almost none investigate beyond that level by not only checking for the presence of errors but also identifying their types and differences. This kind of analysis provides great insight into students’ challenges, with a clarity achieved in very few studies. Here, the students’ performances will not be evaluated at the level of CSR or ASR scores, but at a level that displays the changes in the code distributions of each subtopic before and after the study. As seen in Fig. 3, each bar is split into three segments, each representing a specific code category: green indicates successful codes (S), yellow is associated with neutral codes (DD and DSE), and red shows unsuccessful codes (UDI and CD).
The analysis of delta CSR and ASR scores highlighted the effectiveness of mixed practice in improving students’ problem-solving performance. Therefore, this third analysis was conducted for experimental group students only, to better understand where mixed practice makes the difference and how and where exactly it enhances students’ ability to solve problems. It should be noted that the novelty of this analysis lies in the investigation of neutral codes. These are the codes neglected, or impossible to determine, in traditional assessment for different reasons, such as limited time and a lack of awareness of their existence. Although it is favorable to see that the number of students’ successful codes goes up and that of unsuccessful codes goes down after the mixed practice, this finding is not unexpected. The most intriguing piece of information determined here is that neutral codes are also responsible for students’ poor problem-solving performances in each subtopic. In some cases, and by a higher percentage, they indicate reasons that are more important than those captured by unsuccessful codes. This finding alone necessitates a closer look at the meaning of these codes and what they tell us about students’ challenges in problem solving.
When the changes in the number of unsuccessful, neutral, and successful codes were examined, it was determined that, overall, the greatest change occurs in the number of neutral codes. It can be argued that mixed practice is particularly effective at reducing the number of DD (Didn’t Know to Do) and DSE (Did Something Else) codes. The decrease in these codes indicates that students got better at figuring out which subtopics or subproblems are needed to execute a problem successfully and at avoiding unnecessary subproblems. After the interventions, the number of neutral codes dropped to zero for two subtopics, Boiling Point Differences (DBP) and Molality Calculation (MOC). The remaining subtopics showed decreases ranging from 37% to 90%. It was particularly important to note that students had the greatest decrease, 90%, for the subtopic LCP (Le Chatelier’s Principle), followed by CPH (Calculating pH and pOH) with a 71% decrease and ICE (ICE table) with a 67% decrease. In other words, judging by the trends in the neutral codes, it can be claimed that mixed practice helped significantly with these three topics. Seeing these positive changes after only three interventions provides an important hint that students can learn relatively quickly how to tackle problems on these challenging topics (Tyson et al., 1999; Kala et al., 2013) if proper and effective methods are used.
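The percent decreases quoted above follow the usual convention of comparing pre- and post-intervention counts. A minimal sketch of that calculation is shown below; the counts are hypothetical placeholders chosen only to illustrate the arithmetic, not the study’s actual data:

```python
# Percent decrease in neutral codes (DD + DSE) per subtopic.
# All counts below are hypothetical illustrations, not the study's data.
neutral_counts = {
    # subtopic: (pre-test count, post-test count)
    "LCP": (10, 1),  # Le Chatelier's Principle
    "CPH": (7, 2),   # Calculating pH and pOH
    "ICE": (6, 2),   # ICE table
    "DBP": (4, 0),   # Boiling Point Differences
}

for subtopic, (pre, post) in neutral_counts.items():
    decrease = (pre - post) / pre * 100
    print(f"{subtopic}: {decrease:.0f}% decrease in neutral codes")
```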
Fig. 4 Code breakdown for experimental students across grade levels (0 = pre-test and 3 = the third post-test).
Fig. 5 Code breakdown for experimental students with different ethnic backgrounds (0 = pre-test and 3 = the third post-test).
The first variable considered for the analysis was students’ achievement level, determined based on their overall grade in General Chemistry I. Table 5 presents the delta CSR values for students belonging to each of the three achievement levels.
Even though the small sample size gives little power to statistical tests, a Kruskal–Wallis rank sum test was conducted to assess whether there were statistically significant differences between the grade levels in the average delta values of the post-tests relative to the pre-test. The Kruskal–Wallis test is a non-parametric alternative to the one-way ANOVA and does not share the ANOVA's distributional assumptions (Conover, 2007). The results were not significant at an alpha value of 0.05, χ2(2) = 1.60, p = 0.450, indicating that the mean ranks of the averaged post-test and pre-test differences were similar across grade levels. It is speculated that this result was obtained mainly because of the small number of students in each grade group. The discussions in the following sections were therefore based on the descriptive data and the types of codes students received for their work.
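For readers who wish to reproduce this kind of analysis, the Kruskal–Wallis H statistic can be computed from pooled ranks alone. The sketch below is a minimal pure-Python implementation (without the tie correction that statistical packages often apply); the delta CSR values it uses are hypothetical placeholders, not the study’s data. Conveniently, with k = 3 groups the statistic is referred to a chi-squared distribution with 2 degrees of freedom, whose survival function is simply exp(−H/2).

```python
import math

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (ties share average ranks; no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    # Assign 1-based average ranks, sharing the mean rank across ties.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(pooled)
    # H = 12 / (N(N+1)) * sum_i n_i * mean_rank_i**2  -  3(N+1)
    return 12 / (n * (n + 1)) * sum(
        len(g) * (sum(rank_of[v] for v in g) / len(g)) ** 2 for g in groups
    ) - 3 * (n + 1)

# Hypothetical delta CSR values for the A, B, and C grade groups.
grade_a = [0.05, 0.10, 0.08, 0.12]
grade_b = [0.15, 0.22, 0.18, 0.20]
grade_c = [0.25, 0.30, 0.20, 0.28]

h = kruskal_wallis_h(grade_a, grade_b, grade_c)
p = math.exp(-h / 2)  # chi-squared survival function for df = 2
```

In practice, `scipy.stats.kruskal` performs the same test (with tie correction) and returns the p-value directly.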
With regard to Fig. 4, student group A shows the smallest positive difference in CSR scores across the three post-tests. This affirms the idea that A-level students have the least room to grow, as they have already developed efficient problem-solving abilities through previous means and know how to utilize them properly (Zhang et al., 2014). In line with the previous trends, there is still some meaningful growth, which leads us to believe that even high-achieving students gain some benefit from the mixed practice.
In contrast to the small improvement observed in group A, the students in groups B and C enhanced their performance considerably. The greatest progress, observed in group C, indicates that students with low achievement levels may benefit most from mixed practice because, in general, they use less effective study strategies (Chan and Bauer, 2016). Blocked or categorized examples are utilized by all students to some extent, but because they are easy to use, they are possibly adopted most by low-achieving students (Rohrer et al., 2020). If the goal is to close the gap between successful and unsuccessful students, it is critical to use evidence-based methods and move away from traditional, less effective methods even if they are easy to implement. Educators and textbook publishers should instead promote mixed practice widely.
Following the analysis of the differences in CSR scores, the team examined the differences in the codes the students in each group received in the pre-test and the third post-test. When the pre-test results were examined, it was noted that neutral codes made up a higher percentage of the received codes for students with B and C grades than for students with an A grade. Fig. 4 reveals an interesting trend: mixed practice helped the students in group A make fewer mistakes captured by unsuccessful codes (UDI and CD), while it helped the students in groups B and C remember to use the needed subproblems and do fewer irrelevant calculations, captured by the DD and DSE codes, respectively, which make up the neutral codes.
It can be argued that B and C students need more help with the big picture, rather than spending extensive time on individual subtopics, so they can more successfully put the pieces together without losing their direction in generating successful solutions.
It is also important to investigate the study strategies of the students in group A and understand why they had the fewest neutral codes, which translate into ineffective, inefficient, and undesired methods of studying chemistry or any subject. Similarly, further analysis is needed to understand how exactly mixed practice helps them complete fewer incorrect subproblems.
An additional variable considered for further analysis was ethnic background, an important attribute given the several reforms of the last few decades aiming to close the gap between students belonging to different groups (O'Sullivan, 1993; Von Secker, 2002). Due to the small sample size, the analysis considered only students from three ethnic backgrounds: Hispanic/Latino, Asian, and White. Although each group contained only a small subset of students, their data are shared here to give educators hints about how students in each group might benefit from extended mixed practice with a larger sample. For example, Table 6 indicates that after the first intervention, Hispanic/Latino students improved their problem-solving performance the most, with a delta CSR value of 0.22, compared to 0.16 for Asian students and 0.07 for White students.
Interestingly, as they continued participating in the interventions, the benefits for each group became almost the same, with delta CSR values ranging from 0.30 to 0.42. The results of the Kruskal–Wallis test were not significant at an alpha value of 0.05, χ2(2) = 0.16, p = 0.925, indicating that the mean ranks of the averaged post-test and pre-test differences were similar for each ethnic group. Thus, regardless of their ethnic background, students improved their problem-solving performance with mixed practice. This progress is also evident in Fig. 5, which shows the changes in the code breakdowns for each ethnic group. The unsuccessful codes decreased for all groups after the interventions, but it is interesting that this decrease was smallest for the Latino/Hispanic students, 20% versus about 50% for both White and Asian students. On the other hand, the decrease in the number of neutral codes was largest for Hispanic students, 74% versus 50% (White students) and 68% (Asian students). It can be argued that the interventions help all students, regardless of ethnic background, improve their problem-solving performance, but the in-depth analysis of the changes in codes reveals that each group benefits in different ways. To better understand this trend, it is necessary to consider additional variables, such as the study skills used by students in each group, and to utilize think-aloud protocols.
This study also demonstrated that even though college students showed different levels of improvement after the interventions, they all benefited from the mixed practice. For example, students who received a C or B in the previous course showed greater improvement, while the A-level students had the smallest increase in their scores. The benefit gained with this method appears to be inversely related to students’ achievement level. As many models and interventions aim to transform C-students into B- or A-students and B-students into A-students, this model shows promising results in that respect as well. Further analysis of the codes highlighted the nature of the differences between A- and C-students and provided evidence on how the interventions helped them. Students with a grade of A initially received only 7% neutral codes on their pre-test solutions, while C-students received 28%, which could be translated into confusion and struggles with problem solving. Although the difference between these groups was initially 21 percentage points, the gap narrowed to only 3 percentage points by the end of the study, suggesting that C-students became more successful at analysing given questions and determining the right strategy to solve them.
Even though the small sample size did not provide great statistical power, the mixed practice format appeared to positively influence students' problem-solving strategies, as evidenced by the decreased numbers of neutral and unsuccessful codes. These findings may encourage educators to assign more mixed practice assignments and textbook publishers to include more of them at the end of each chapter. Mixed problem sets may be considered an alternative to traditional practice assignments across a given curriculum at any level. It is important that these assignments not focus on one concept only but incorporate multiple concepts from different chapters or even from courses taught previously. To increase the benefits of desirable difficulties, the questions within the problem sets should not include details identifying the topics or chapters to which they belong.
To better understand the trends observed in the data analysis, further explore the correlation between CSR and the number of interventions, and provide stronger statistical evidence for the effectiveness of the method, this study should be repeated with a larger number of students. A more diverse population should be sampled to include additional ethnic groups (e.g., African American students). Gender information should also be collected and used to examine the role of gender, if any, in problem-solving improvement. Finally, participants should be exposed to a greater number of mixed-problem interventions to test whether the improvements continue to grow, reach a plateau, or decline.
This journal is © The Royal Society of Chemistry 2022