Improving students’ summative knowledge of introductory chemistry through the forward testing effect: examining the role of retrieval practice quizzing

Kaylee Todd a, David J. Therriault *b and Alexander Angerhofer *a
aDepartment of Chemistry, University of Florida, P.O. Box 117200, Gainesville, FL 32611, USA. E-mail: kmtodd8485@gmail.com; alex@chem.ufl.edu
bHuman Development and Organizational Studies in Education, University of Florida, P.O. Box 117042, Gainesville, FL 32611, USA. E-mail: therriault@coe.ufl.edu

Received 23rd June 2020, Accepted 11th October 2020

First published on 13th October 2020


Abstract

Building domain knowledge is essential to a student's success in any course. Chemistry, like other STEM disciplines, has a strong cumulative element (i.e., topic areas continuously build upon prior coursework). We employed the testing effect, in the form of post-exam retrieval quizzes, to improve students’ understanding of chemistry over an entire semester. Students (n = 146) enrolled in Introduction to Chemistry were given retrieval quizzes, released one week after each during-term exam and covering that exam's content. We measured students’ level of quiz participation, during-term exam scores (a control variable), and cumulative final exam scores to determine the effectiveness of implementing a post-exam retrieval quiz system. Most critically, students completing more than 50% of the retrieval quizzes performed significantly better (i.e., by more than a half letter grade) on the cumulative final exam than those below 50% participation, as determined by a one-way between-subjects ANOVA and planned follow-up analyses. We found no significant differences between the participation groups on during-term exam scores, suggesting that high-achieving students were not more likely than struggling students to participate in the practice testing (and thus benefit from it).


Introduction

College-level science, technology, engineering, and mathematics (STEM) degrees require students to either complete a series of prerequisite introductory courses or pass placement exams. For the biological and physical sciences, the general chemistry sequence is a primary prerequisite – providing the foundational content needed in upper-level courses – and its courses are commonly referenced as “weed-out” courses (Koebler, 2012). This term is used colloquially to label courses that drastically reduce the number of students in a course and major, sometimes by as much as 50% of the starting enrolment, which contributes to the competitive and prestigious nature of STEM programs (Mervis, 2010). For many students, their experiences with chemistry result in a change in majors and career plans (Mervis, 2010; Baker, 2016). Many factors beyond the rigor of the content itself contribute to this trend, e.g., adjusting to college-level academics and to new social networks and environments (Smith and Zhang, 2010; Higgins, 2014).

While social aspects and new environments are important contextual considerations, more students fail to complete the introductory sequence simply because of the difficulty associated with the pace and volume of material covered during the semester (Higgins, 2014; Baker, 2016). To address this issue, some institutions have implemented supplemental chemistry courses (University of Florida, 2020). However, there may be effective in-course alternatives that could reduce student failure rates and withdrawals. For example, employing empirically-based effective study approaches (from psychology) could lead to a more successful course outcome (Dunlosky et al., 2013). In the current study, we examine the effectiveness of using one particular method, described as the testing effect or retrieval-based study, to augment students’ learning of chemistry over an entire semester.

Review of the testing effect: retrieval based study

The learning benefit of the testing effect (i.e., that actively retrieving information by testing oneself leads to significant learning gains) was demonstrated empirically in early psychology and has a storied history (Witasek, 1907; Gates, 1917; Spitzer, 1939; Woodworth and Schlosberg, 1954; Tulving, 1967; Carrier and Pashler, 1992). There has been a resurgence of interest in the testing effect – recently recast as retrieval-based study (Karpicke, 2012; Rowland, 2014; Karpicke, 2017).

The bulk of research investigating the testing effect has been conducted in controlled laboratory settings. Typically, in such experiments, participants read an expository passage and then either reread it or attempt to freely recall it (Roediger III and Karpicke, 2006). A week later, all participants complete an exam covering the material. Results consistently show that some form of testing oneself (i.e., free recall) is better than rereading, restudying, or even elaborative study (e.g., concept mapping) (Karpicke and Blunt, 2011). The advantages of the testing effect also tend to be robust; Roediger III and Karpicke (2006) reported a 21% performance increase on exam scores for students who engaged in testing compared to those who simply reread.

Given the powerful performance increases associated with the testing effect, there was a natural shift in the research landscape toward exploring how retrieval practice could be implemented in the classroom (McDaniel et al., 2007; Agarwal et al., 2012; McDaniel et al., 2013; Grimaldi and Karpicke, 2014; Karpicke et al., 2014; Dobson and Linderholm, 2015). For example, Grimaldi and Karpicke (2014) used a self-paced computer program to adapt free recall prompts for university students attempting to learn human anatomy. Students in the program's repeated retrieval group outperformed those in a control group on a final test by 17%. Similar, but slightly less pronounced, effects were reported by Dobson and Linderholm (2015) using a write-all-you-can recall manipulation. Research has also demonstrated that the form of the retrieval practice can be effectively varied – from free recall to short answer and/or multiple-choice (Pashler et al., 2007; Smith and Karpicke, 2014; Bae et al., 2019).

The transition from the lab to the classroom has seen mostly positive results, with some important cautionary tales. The cautions relate to quizzing, and particularly to the use of multiple-choice materials. Wooldridge et al. (2014) explored how quizzing was actually being used in the classroom compared to how it was assessed in research demonstrating a testing effect. They reported that laboratory experiments typically used identical questions in their quizzes and final tests, whereas most instructors used topically related but different questions. In a follow-up experiment to their survey, Wooldridge et al. (2014) manipulated quizzes to be either identical to the final test or topically relevant but differently worded, and then had students take a final test. They report the usual testing effect for repeated questions but no effect when the same conceptual information was tested through different questions. Roediger III and Marsh (2005) examined the benefits and pitfalls of multiple-choice testing on a final cued-recall knowledge test. They found evidence for a typical testing effect; however, exposure to a larger number of multiple-choice lures increased incorrect answers and the possible creation of false knowledge.

The most recent research employing the testing effect makes use of interim testing and what has been referred to as the forward testing effect or potentiated learning (Pastötter and Bäuml, 2014; Chan et al., 2018; Yang et al., 2018). In this line of research, traditional testing effect studies are characterized as the backward testing effect: testing of previously studied materials improves retention of those materials compared to rereading or doing nothing. For example, students are given a set of vocabulary words to learn and are subsequently tested on those words; their memory for those previously studied words is improved. In contrast, the forward testing effect refers to gains in learning and retention of new but conceptually related information, in a bounded domain, that is subsequently delivered – again compared to rereading or doing nothing (i.e., learning in a domain is generally improved by retrieval testing). Revisiting the example above, students would be given the same vocabulary words to study; however, their performance would improve on learning new words that were not studied but that shared some overlapping features or could be generalized from Latin roots in the word structure.

It is clear from the research trajectory reviewed above that considerably more work is needed to document the boundary conditions under which retrieval practice is feasible and its level of effectiveness in the classroom. In the domain of chemistry education, our review revealed a single direct application of the testing effect (Pyburn et al., 2014). Pyburn et al. leveraged multiple-choice and elaborative interrogation questions (e.g., “why” questions) as retrieval-practice stimuli, obtaining mixed results. For low comprehenders, answering multiple-choice questions led to a subsequent testing effect; this result was not obtained for the high comprehenders. In addition, the elaborative interrogation questions had a negative effect on all students’ performance. There is much we need to explore and specify to fully understand the applicability of the testing effect in chemistry.

Study goals and specific research questions

The present study makes use of psychological research on the forward testing effect (i.e., interim practice quizzing – the testing effect applied via multiple-choice testing) to develop a retrieval practice quizzing system and employ it throughout an entire semester. Research on the forward testing effect provides laboratory evidence for memory gains in list learning, categorical learning, and the learning of complex narratives, videos, and faces/names (Thomas et al., 2010; Bäuml and Kliegl, 2013; Szpunar et al., 2013; Lee and Ahn, 2018; Yang et al., 2019). However, to our knowledge, the forward testing effect has not been assessed in the chemistry domain, nor has it been employed naturally in a classroom, i.e., outside of laboratory control. Our goal was to document how the forward testing effect may, or may not, benefit chemistry students in a real classroom setting. The research hypothesis tested by the present study is that individuals who participate more in the retrieval practice quizzing will score higher on the cumulative final exam than individuals who elected not to participate. Specifically, we hypothesize that students who intentionally engage in the retrieval practice for more than half of the during-term exams, i.e., the so-called moderate and high participation groups, will perform statistically better on the cumulative final exam than those who do not consistently engage in retrieval practice throughout the course.
Null hypothesis. There will be no difference in participants’ cumulative final exam scores as a function of participation level in retrieval practice quizzing.
Research hypothesis. Individuals who participate more (greater than 50% completion) in retrieval practice quizzing will score higher on the cumulative final exam than individuals who participate less (50% completion or less).

To assess our research questions, we employed a quasi-experimental design, measuring the effects of completing post-exam retrieval quizzes on students’ cumulative final exam scores. Study participants were grouped based on the number of completed retrieval quizzes, which served as our independent variable; this resulted in low-, moderate-, and high-level participation groups. The participants’ score on the cumulative final exam was our dependent variable.

Implementation design and research methods

Course description/participants

Students in our Introduction to Chemistry course were required to meet the minimum threshold on a mathematics placement exam and either be co-enrolled in precalculus or already have course credit for it. Students in this study were enrolled in a single section of Introduction to Chemistry, which was taught by one instructor for two class periods per week, consistent with the 2-credit-hour course load. Introduction to Chemistry was offered in a face-to-face format but had an online course page (Canvas) where materials and assignments were managed. Recruitment of participants was initiated within two weeks of the semester's beginning via in-class announcements and announcements on the Canvas course page. Completion of an online “agree to participate” survey satisfied the IRB-mandated requirement of a signed informed consent form. In this survey, students were asked two questions: their age at the beginning of the study and their willingness to participate in the study. Participants were required to both agree to participate and be between the ages of 18 and 25.

The during-term exams were administered in paper-and-scantron format during assembly exam times. University of Florida assembly exams are evening exams from 8:20 pm to 10:20 pm. The scantron key was posted within 24 hours of each exam. Students were allowed to mark their answers on the exam papers, which they kept, to compare with the posted key. Students were encouraged to re-work the exam questions they had answered incorrectly, but no additional or make-up credit was offered for doing so. The cumulative final exam was also administered in paper-and-scantron format, but it took place in the morning, from 10:00 am to 12:00 pm, and no key was posted for student access.

Quasi-experimental design

We chose a quasi-experimental design because we wanted an authentic assessment of how students would use a quizzing system in a classroom setting, i.e., participants could choose to use the quizzing system at their discretion, at any level of participation, or not at all. In addition to capturing natural student-classroom behavior, we can also garner realistic effect sizes of implementing the quizzing system outside of a laboratory setting. Correct answers to the post-exam retrieval quizzes were counted as bonus credit up to 1% of the overall course grade. Students were given the opportunity to take up to twelve such quizzes throughout the semester. Because the post-exam quizzes were optional, students could select the quizzes they wished to complete; by doing so, students self-selected into one of the three participation levels for later analyses and comparisons.

Materials and design

Previous semesters’ during-term exams (DTEs) and final exams for Introduction to Chemistry were analyzed for content that was likely to be taught in each unit and that would also reappear on the cumulative final exam. These topics were used as a guideline for creating three retrieval quizzes for each of the four DTEs, resulting in a total of twelve retrieval quizzes given over the semester. Each set of three retrieval quizzes was released concurrently, one week after the corresponding DTE, via Canvas. Each set remained open for 72 hours so students could complete the quizzes at their leisure. Students were given 60 minutes to complete each of the three retrieval quizzes, totalling a maximum time commitment of three hours per RQ set. The quiz length and time allotted were designed to impose a gentler pace than the exams, which were between 33 and 45 questions with a 2 hour time limit. Each retrieval quiz covered a single topic that would appear on both a DTE and the cumulative final and contained questions ranging from simple recognition of key terms to conceptual applications using complex calculations. The complete list of retrieval quiz topics is shown in Table 1. For a comparison of DTE, RQ, and final exam questions and variability within a single topic, see the Appendix (ESI).
Table 1 Topic list for the 12 retrieval quizzes^a
Retrieval quiz, exam number | Short name | Topics
RQ 1, E1 | Dimensional | Dimensional analysis, density, significant figures, scientific notation
RQ 2, E1 | Properties | Chemical vs. physical properties, states of matter
RQ 3, E1 | Isotopes | Isotopes, protons, electrons, neutrons
RQ 4, E2 | Naming | Nomenclature and chemical formulas
RQ 5, E2 | Moles | Empirical/molecular formulas, molecular mass, moles, molecules
RQ 6, E2 | Solutions | Solutions and molarity calculations
RQ 7, E3 | Trends | Periodic table trends, electron configurations and orbital notation
RQ 8, E3 | Redox | Redox reactions and oxidation numbers
RQ 9, E3 | Solubility | Solubility and reaction types
RQ 10, E4 | Geometry | Geometry and formal charges
RQ 11, E4 | Acids | Acids, bases, buffers, electrolyte strength
RQ 12, E4 | Polarity | Intermolecular forces, polarity, functional groups
^a Categorized list of post-exam retrieval quiz (RQ) topics ordered chronologically by exam number, E1–4, including the short name of each quiz, for Introduction to Chemistry. RQ 1–3 are represented by RQ set 1 in Fig. 1, RQ 4–6 by RQ set 2, RQ 7–9 by RQ set 3, and RQ 10–12 by RQ set 4.


Originally, 11 quizzes with 12 questions each were written before the semester began. However, due to variations in exam dates and in the material covered in each unit, one RQ was split into two RQs. As labelled in Table 1, RQ 6 – solutions and RQ 9 – solubility are the two quizzes that resulted from the split; RQ 6 – solutions was left with 2 questions, and RQ 9 – solubility with the remaining 10 questions. All other retrieval quizzes had 12 questions as originally planned. Because the retrieval quizzes were open for multiple days, immediate feedback was restricted to marking questions as correct or incorrect. Additionally, because the retrieval quizzes contributed less than 1% in bonus points and there was no immediate exam to prepare for, the correct answers were not released. However, students were encouraged to consult with their peers or teaching assistants, or to email the study designer if they had questions. No direct safeguards against cheating were implemented other than the time constraints. The pattern of during-term exams and retrieval quizzes was repeated after each of the four DTEs, as depicted in the timeline in Fig. 1.


Fig. 1 Introduction to Chemistry – Exam and Retrieval Quiz Timeline. Timeline for the Introduction to Chemistry course showing the cycle of during-term exams (DTEs) and post-exam retrieval quiz sets (RQ), which repeat four times before the cumulative final exam at the end of the semester.

Ethics

Protocol approval from the University of Florida's Institutional Review Board (IRB) was obtained before beginning the study in Introduction to Chemistry (CHM1025), which is a first-year course for undergraduate students. Entering students without a strong background in chemistry are advised to take this course to prepare for the regular General Chemistry 1 course, CHM2045. At any point during the study, students could withdraw their participation by contacting the course instructor. All identifying information, i.e., students’ names, email addresses, and student IDs, was replaced by random numbers created by the instructor, who was not part of the research team. The identity of the students remained anonymous to the researchers.

Participant grouping

Students could complete up to twelve retrieval quizzes throughout the semester. The percentage of completed quizzes was used to categorize participants into high, moderate, or low participation groups. Participants were grouped after the semester and study had been completed; the participation groups were not set while the study was active, nor were participants grouped based on the percentage of quizzes completed prior to each subsequent DTE. The cut-offs for the three participation levels were chosen to provide comparison levels that made conceptual sense but also contained enough participants for statistical analysis. The low participation group completed 50% or fewer of the 12 retrieval quizzes (n = 35). The moderate participation group completed more than 50% but fewer than 75% of the quizzes (n = 31). The high participation group completed 75% or more of the 12 retrieval quizzes (n = 80). Of the total students enrolled in the course (n = 184), 146 agreed to participate (79%).
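For readers who wish to replicate the grouping step, the sketch below (Python) is a minimal illustration of the cut-offs stated above; the function name and the example completion counts are our own illustration, not the study's actual records.

```python
def participation_group(completed: int, total: int = 12) -> str:
    """Assign a participation level from the number of completed
    retrieval quizzes, using the study's cut-offs:
    low = 50% or fewer; moderate = more than 50% but fewer than 75%;
    high = 75% or more."""
    fraction = completed / total
    if fraction <= 0.50:
        return "low"
    elif fraction < 0.75:
        return "moderate"
    return "high"

# Hypothetical examples (not actual participant data):
print(participation_group(6))   # 6/12 = 50%  -> "low"
print(participation_group(8))   # 8/12 = 67%  -> "moderate"
print(participation_group(9))   # 9/12 = 75%  -> "high"
```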

Results

Data analysis

The data collected for each participant included the following: each of the four DTE scores, the cumulative final exam score, and the post-exam retrieval quiz participation level. After separating students into the high, moderate, or low participation groups, all exam scores were analyzed in the Statistical Package for the Social Sciences (SPSS) to obtain exam means, standard deviations, and standard errors of the mean. A one-way between-subjects omnibus ANOVA and two follow-up two-tailed t tests were performed. The one-way between-subjects ANOVA was chosen because three participation groups were compared using one independent variable – the number of completed post-exam retrieval quizzes. The two-tailed t tests were used to compare specific groups’ cumulative final exam means based on the research hypothesis. The p values from our analyses were used to determine whether participation level had an impact on cumulative final exam performance. Statistical significance was set at p < 0.05 for all tests.
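Although our analyses were run in SPSS, the same tests can be expressed in a few lines of open-source code. The sketch below (Python with SciPy) is a minimal illustration of the analysis pipeline; the per-student scores are simulated from the group summary statistics in Table 3 because the raw records cannot be shared, so its output will only approximate the reported values.

```python
import numpy as np
from scipy import stats

# Simulated per-student cumulative final exam scores, drawn from the
# group means/SDs/ns reported in Table 3 (placeholder for the real data).
rng = np.random.default_rng(0)
low = rng.normal(70.6, 13.9, 35)
moderate = rng.normal(78.4, 11.2, 31)
high = rng.normal(76.6, 14.7, 80)

# Omnibus one-way between-subjects ANOVA across the three groups
f_val, p_val = stats.f_oneway(low, moderate, high)
print(f"ANOVA: F = {f_val:.3f}, p = {p_val:.3f}")

# Planned two-tailed independent-samples t tests against the low group
for name, grp in [("high", high), ("moderate", moderate)]:
    t, p = stats.ttest_ind(grp, low)
    print(f"{name} vs. low: t = {t:.4f}, p = {p:.3f}")
```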

Participation group control results

To assess whether implementing the quizzing system introduced any sample bias, a one-way between-subjects ANOVA was conducted comparing the three participation groups’ scores on each of the four DTEs. Exam 1 served as an initial control to ensure that the groups’ exam means were not significantly different before receiving the treatment of the retrieval quizzes, which occurred approximately one week after each during-term exam. For the first during-term exam (i.e., the sample control condition), we found no significant effect at the p < 0.05 level for the three participation groups (Table 2). Despite the groups receiving some of the treatment before exam 2, there were no significant differences in the exam 2 means across the three participation groups; this trend continued for exams 3 and 4. That is, scores on during-term exams did not differ among the three groups, implying that any difference would appear not on the incremental assessments, which tend to focus on a specific unit of material, but rather on a true cumulative assessment. This suggests that students’ selected participation was not tied to their ability or effort, providing evidence that potential differences in the cumulative exam scores are not attributable to sample bias.
Table 2 One-way between-subjects ANOVA results (F and p values)^a
Assessment | F value | p value
Exam 1 | 0.956 | 0.387
Exam 2 | 0.656 | 0.520
Exam 3 | 0.436 | 0.647
Exam 4 | 0.809 | 0.448
Final | 3.111 | 0.048*
^a The three participation levels’ exam means were analyzed to determine whether the groups were statistically different on any of the five major assessments. The asterisk indicates the statistically significant p value. All values were calculated in SPSS.


Participation group condition results

The results of a one-way between-subjects ANOVA comparing the effect of participation level in retrieval quizzes on the cumulative final exam revealed that the difference among the three participation groups was significant at the p < 0.05 level [F(2,143) = 3.11, p = 0.048] (Table 2). That is, the three participation groups’ cumulative final exam means were statistically different from each other, suggesting that the retrieval practice quiz system produced a forward testing effect: learning was improved when summative knowledge of chemistry concepts was assessed. Table 2 shows the p values comparing the three participation groups’ performance on each of the four during-term exams and the cumulative final exam. Fig. 2 shows the exam means for each participation group with the standard error as the error bars. Exam means, standard deviations, standard error values, and the number of participants in each group are given in Table 3.
Fig. 2 Exam means by participation level. Graphical expression of the exam means for the four during-term exams and the cumulative final exam for each of the participation levels – low in green, moderate in blue, high in yellow. The standard error of the mean is used as the error bars.
Table 3 Descriptive statistics^a for the five major assessments at each participation level
Assessment | Group^b | Exam mean | Standard deviation | Standard error
Exam 1 | Low^c | 79.5 | 14.3 | 2.4
Exam 1 | Moderate^d | 82.2 | 11.8 | 2.1
Exam 1 | High^e | 77.6 | 17.6 | 2.0
Exam 1 | Total^f | 79.1 | 15.8 | 1.3
Exam 2 | Low^c | 67.1 | 13.7 | 2.3
Exam 2 | Moderate^d | 71.1 | 15.6 | 2.8
Exam 2 | High^e | 70.0 | 15.4 | 1.7
Exam 2 | Total^f | 69.5 | 15.0 | 1.2
Exam 3 | Low^c | 73.1 | 15.3 | 2.6
Exam 3 | Moderate^d | 75.2 | 16.2 | 2.9
Exam 3 | High^e | 72.0 | 17.3 | 1.9
Exam 3 | Total^f | 72.9 | 16.6 | 1.4
Exam 4 | Low^c | 71.1 | 14.4 | 2.4
Exam 4 | Moderate^d | 70.8 | 16.0 | 2.9
Exam 4 | High^e | 67.7 | 16.0 | 1.8
Exam 4 | Total^f | 69.2 | 15.6 | 1.3
Cumulative final | Low^c | 70.6 | 13.9 | 2.4
Cumulative final | Moderate^d | 78.4 | 11.2 | 2.0
Cumulative final | High^e | 76.6 | 14.7 | 1.6
Cumulative final | Total^f | 75.6 | 14.0 | 1.2
^a The exam means, standard deviations, and standard errors of the mean for each group. All values were calculated in SPSS. ^b Grouping variable: number of completed retrieval quizzes. ^c N = 35. ^d N = 31. ^e N = 80. ^f N = 146, representing all participants, not the entire cohort.
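Because a one-way ANOVA is fully determined by each group's n, mean, and standard deviation, the omnibus F for the cumulative final can be checked directly from Table 3. A minimal sketch (Python; the small discrepancy from Table 2 reflects rounding of the tabled values):

```python
# (n, mean, SD) for the low, moderate, and high groups on the final exam
groups = [(35, 70.6, 13.9), (31, 78.4, 11.2), (80, 76.6, 14.7)]

n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * m for n, m, _ in groups) / n_total

ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

df_between = len(groups) - 1        # 2
df_within = n_total - len(groups)   # 143
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between},{df_within}) = {f_stat:.2f}")  # ~3.1, cf. 3.111 in Table 2
```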


Planned comparisons

To evaluate our specific research hypothesis – that the two higher participating groups (high and moderate participation levels) would score better on the cumulative final exam than the low participating group – two planned two-tailed t tests were conducted. These t tests revealed significant differences in final exam means for the high participation group compared to the low participation group [t(113) = 2.0469, p = 0.042, d = 0.42] and for the moderate participation group compared to the low participation group [t(64) = 2.4757, p = 0.016, d = 0.62]; means and descriptive statistics are presented in Table 3. These results supported rejection of the null hypothesis, meaning that the extent to which students participated in the post-exam review and retrieval quiz system did influence cumulative final exam scores. However, beyond the 50% participation mark, additional participation did not yield further improvement in final exam scores; that is, the difference between the moderate and high participation groups was not significant.
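The reported effect sizes can likewise be recovered from the Table 3 values. A short check (Python), assuming Cohen's d with a pooled standard deviation; small differences from the reported values reflect rounding in the table:

```python
import math

def cohens_d(n1, m1, sd1, n2, m2, sd2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                       / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Cumulative final values from Table 3: (n, mean, SD) per group
print(round(cohens_d(80, 76.6, 14.7, 35, 70.6, 13.9), 2))  # ~0.41 (reported: 0.42)
print(round(cohens_d(31, 78.4, 11.2, 35, 70.6, 13.9), 2))  # ~0.61 (reported: 0.62)
```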

Discussion and limitations

Students who engaged in more than half of the online quizzes saw significant learning benefits on their cumulative final exam. Implementing sets of brief, topic-specific quizzes after each exam, as a means of engaging students in retrieval practice, improved their performance (N.B., for those who completed more than 50% of the retrieval quizzes). This increase was 7–8 points on a 100-point cumulative final exam, which equates to slightly less than a full letter grade improvement on the final exam. These effect sizes would be characterized as small to medium, d = 0.42 and 0.62 (Cohen, 1988; Sawilowsky, 2009).

Our results demonstrate that implementing forward testing is feasible in chemistry and can work in a natural classroom setting. Nearly 80% of the class opted to be part of the quizzing system, and student participation levels were exceptionally high. While the overall performance increases on the final exam are not as robust as those traditionally reported in laboratory studies or in forward testing with pictures, it is important to contextualize them within the domain of chemistry and our quasi-experimental design decisions (Roediger III and Karpicke, 2006; Yang et al., 2018).

Chemistry is unique in the layers of conceptual, algorithmic, computational, comprehension, and logic skills needed to successfully navigate the problem space (BouJaoude et al., 2004). As such, we think that our exploration of a quizzing system is an important first step in demonstrating that forward testing studies in psychology which use narrative materials or picture learning can be successfully extended into the realm of the natural sciences, specifically Chemistry.

Concerning our design, we believe it likely that our effect sizes under-represent the potential gains that could be obtained with similar quizzing systems, because of constraints arising from self-grouping and the subsequent lack of power. Approximately half of the participants fell into the high participation group; the other half were split evenly between the low and moderate participation groups. These divisions produced unequal sample sizes across the three groups. It is important to note that if an additional, conservative familywise correction had been applied, the follow-up effects would not have been significant. Considering the medium effect sizes observed in the pairwise comparisons, a lack of statistical power is a plausible cause of this loss of significance. Replication of the study with larger sample sizes is recommended to further substantiate the observed impact. In addition, only eight participants completed zero retrieval quizzes, too few for statistical analysis.

Another limitation is that we did not have an equivalent activity for students to complete if they chose not to engage in the retrieval practice quizzes. It is possible that students in this business-as-usual condition did not engage in any other study activities. If so, then our results could be attributable to time spent on task.

Finally, it should be noted that the retrieval quizzes were not tied to the students’ performance on the during-term exams. One might expect that identifying areas of weak performance on an exam and focusing the retrieval quizzes on those areas would lead to greater learning gains. This would be an interesting direction in which the current work could be expanded (Grimaldi and Karpicke, 2014).

Conclusions and implications

In our study, we explored a method to quantify the benefit of intentionally readdressing exam material shortly after each of an introductory class's major assessments. Reexamining and reassessing exam materials allows students both to correct misunderstandings about course material and to strengthen retention of it, both of which are vital to success in future courses. This was accomplished by engaging students in retrieval practice throughout a semester of a typical introductory chemistry course. The retrieval quizzes offered students an additional, low-stress assessment after each during-term exam and prior to the cumulative final. The practice quizzes were presented in a low-stakes environment, i.e., online and open for multiple days, and contributed only marginally to the students’ grades while still requiring students to engage in retrieval processes. Our results demonstrate that students who engaged in more than half of the online quizzes saw significant learning benefits on their cumulative final exam.

The post-exam review and practice retrieval system may be applicable beyond Introduction to Chemistry. This post-exam retrieval quiz system could be applied to general chemistry and upper-level chemistry courses for which there is a strong cumulative focus on the domain. The system is easily implemented using an online course page and does not require a significant amount of additional effort from instructors. Consequently, retrieval practice may be a more efficient solution to the issue of STEM attrition than providing supplemental courses.

Conflicts of interest

There are no conflicts to declare.

References

  1. Agarwal P. K., Bain P. M. and Chamberlain R. W., (2012), The value of applied research: retrieval practice improves classroom learning and recommendations from a teacher, a principal, and a scientist, Educ. Psychol. Rev., 24(3), 437–448.
  2. Bae C. L., Therriault D. J. and Redifer J. L., (2019), Investigating the testing effect: retrieval as a characteristic of effective study strategies, Learn. Instr., 60, 206–214.
  3. Baker S. C., (2016), The problem with weed out classes, The Observer.
  4. Bäuml K. H. T. and Kliegl O., (2013), The critical role of retrieval processes in release from proactive interference, J. Mem. Lang., 68(1), 39–53.
  5. BouJaoude S., Salloum S. and Abd-El-Khalick F., (2004), Relationships between selective cognitive variables and students' ability to solve chemistry problems, Int. J. Sci. Educ., 26(1), 63–84.
  6. Carrier M. and Pashler H., (1992), The influence of retrieval on retention, Mem. Cogn., 20(6), 633–642.
  7. Chan J. C., Meissner C. A. and Davis S. D., (2018), Retrieval potentiates new learning: a theoretical and meta-analytic review, Psychol. Bull., 144(11), 1111–1146.
  8. Cohen J., (1988), Statistical power analysis for the behavioral sciences, 2nd edn, Hillsdale, NJ: Lawrence Erlbaum Associates.
  9. Dobson J. L. and Linderholm T., (2015), Self-testing promotes superior retention of anatomy and physiology information, Adv. Health Sci. Educ., 20(1), 149–161.
  10. Dunlosky J., Rawson K. A., Marsh E. J., Nathan M. J. and Willingham D. T., (2013), Improving students' learning with effective learning techniques: promising directions from cognitive and educational psychology, Psychol. Sci. Public Interest, 14(1), 4–58.
  11. Gates A. I., (1917), Experiments on the relative efficiency of men and women in memory and reasoning, Psychol. Rev., 24(2), 139.
  12. Grimaldi P. J. and Karpicke J. D., (2014), Guided retrieval practice of educational materials using automated scoring, J. Educ. Psychol., 106(1), 58.
  13. Higgins T., (2014), Are “Weed out” STEM classes real? The Flat News.
  14. Karpicke J. D., (2012), Retrieval-based learning: active retrieval promotes meaningful learning, Curr. Direct. Psychol. Sci., 21(3), 157–163.
  15. Karpicke J. D., (2017), Retrieval-based learning: a decade of progress, in Byrne J. H., (ed.), Learning and Memory: A Comprehensive Reference, 2nd edn, Elsevier Ltd, pp. 487–514.
  16. Karpicke J. D. and Blunt J. R., (2011), Retrieval practice produces more learning than elaborative studying with concept mapping, Science, 331(6018), 772–775.
  17. Karpicke J. D., Blunt J. R., Smith M. A. and Karpicke S. S., (2014), Retrieval-based learning: the need for guided retrieval in elementary school children, J. Appl. Res. Mem. Cogn., 3(3), 198–206.
  18. Koebler J., (2012), Experts: “Weed Out” classes are killing STEM achievement, US News & World Report.
  19. Lee H. S. and Ahn D., (2018), Testing prepares students to learn better: the forward effect of testing in category learning, J. Educ. Psychol., 110(2), 203–217.
  20. McDaniel M. A., Anderson J. L., Derbish M. H. and Morrisette N., (2007), Testing the testing effect in the classroom, Eur. J. Cogn. Psychol., 19(4–5), 494–513.
  21. McDaniel M. A., Thomas R. C., Agarwal P. K., McDermott K. B. and Roediger H. L., (2013), Quizzing in middle-school science: successful transfer performance on classroom exams, Appl. Cogn. Psychol., 27(3), 360–372.
  22. Mervis J., (2010), Better intro courses seen as key to reducing attrition of STEM majors, Science, 330(6002), 306.
  23. Pashler H., Rohrer D., Cepeda N. J. and Carpenter S. K., (2007), Enhancing learning and retarding forgetting: Choices and consequences, Psychon. Bull. Rev., 14(2), 187–193.
  24. Pastötter B. and Bäuml K. H. T., (2014), Retrieval practice enhances new learning: the forward effect of testing, Front. Psychol., 5, 286.
  25. Pyburn D. T., Pazicni S., Benassi V. A. and Tappin E. M., (2014), The testing effect: an intervention on behalf of low-skilled comprehenders in general chemistry, J. Chem. Educ., 91(12), 2045–2057.
  26. Roediger III H. L. and Karpicke J. D., (2006), Test-enhanced learning: taking memory tests improves long-term retention, Psychol. Sci., 17(3), 249–255.
  27. Roediger III H. L. and Marsh E. J., (2005), The positive and negative consequences of multiple-choice testing, J. Exp. Psychol.: Learn., Mem., Cogn., 31(5), 1155.
  28. Rowland C. A., (2014), The effect of testing versus restudy on retention: a meta-analytic review of the testing effect, Psychol. Bull., 140(6), 1432.
  29. Sawilowsky S., (2009), New effect size rules of thumb, J. Mod. Appl. Stat. Methods, 8(2), 467–474.
  30. Smith M. A. and Karpicke J. D., (2014), Retrieval practice with short-answer, multiple-choice, and hybrid tests, Memory, 22(7), 784–802.
  31. Smith W. and Zhang P., (2010), The impact of key factors on the transition from high school to college among first-and second-generation students, J. First-Year Exp. Stud. Transit., 22(2), 49–70.
  32. Spitzer H. F., (1939), Studies in retention, J. Educ. Psychol., 30, 641–656.
  33. Szpunar K. K., Khan N. Y. and Schacter D. L., (2013), Interpolated memory tests reduce mind wandering and improve learning of online lectures, Proc. Natl. Acad. Sci. U. S. A., 110(16), 6313–6317.
  34. Thomas A. K., Bulevich J. B. and Chan J. C., (2010), Testing promotes eyewitness accuracy with a warning: implications for retrieval enhanced suggestibility, J. Mem. Lang., 63(2), 149–157.
  35. Tulving E., (1967), The effects of presentation and recall of material in free-recall learning, J. Verb. Learn. Verb. Behav., 6(2), 175–184.
  36. University of Florida, (2020), Chemistry Course Catalog, retrieved from https://catalog.ufl.edu/UGRD/courses/chemistry/, accessed on 06/01/2020.
  37. Witasek S., (1907), Über Lesen und Rezitieren in ihren Beziehungen zum Gedächtnis, Z. Psychol., 44, 161–185.
  38. Woodworth R. S. and Schlosberg H., (1954), Experimental psychology, Oxford and IBH Publishing.
  39. Wooldridge C. L., Bugg J. M., McDaniel M. A. and Liu Y., (2014), The testing effect with authentic educational materials: a cautionary note, J. Appl. Res. Mem. Cogn., 3(3), 214–221.
  40. Yang C., Chew S. J., Sun B. and Shanks D. R., (2019), The forward effects of testing transfer to different domains of learning, J. Educ. Psychol., 111(5), 809.
  41. Yang C., Potts R. and Shanks D. R., (2018), Enhancing learning and retrieval of new information: a review of the forward testing effect, npj Sci. Learn., 3(1), 1–9.

Footnote

Electronic supplementary information (ESI) available. See DOI: 10.1039/d0rp00185f
