Jamie L. Schneider*(a), Suzanne M. Ruder(b) and Christopher F. Bauer(c)
(a) Department of Chemistry and Biotechnology, University of Wisconsin – River Falls, River Falls, WI 54022, USA. E-mail: jamie.schneider@uwrf.edu
(b) Department of Chemistry, Virginia Commonwealth University, Richmond, VA 23284, USA
(c) Department of Chemistry, University of New Hampshire, Durham, NH 03824, USA
First published on 8th January 2018
Feedback is an important aspect of the learning process. The immediate feedback assessment technique (IF-AT®) form allows students to receive feedback on their answers during a testing event. Studies with introductory psychology students supported both perceived and real student learning gains when this form was used for testing. Knowing that negative student perceptions of innovative classroom techniques can create roadblocks, this research focused on gathering student responses to using IF-AT® forms for testing in general chemistry 1 and organic chemistry 2 classes at several institutions. Students’ perceptions of using the IF-AT® forms, and of how the forms influenced their thinking, were gathered with a 16-item survey. The results of the student surveys are detailed, and implementation strategies for using IF-AT® forms for chemistry testing are outlined in this article.
One form of answer-until-correct (AUC) format that can be used with traditional paper and pencil tests is the Immediate Feedback Assessment Technique (IF-AT®) (Epstein et al., 2001; Epstein et al., 2002). This manufactured multiple-choice answer sheet contains a row of selections where each answer choice is covered with a waxy coating like that of a lottery ticket. Students scratch off the coating of their answer choice and obtain immediate feedback, because the correct answer is marked with a star while incorrect answer spaces are blank. If the correct answer is not chosen on the first attempt, students may continue to scratch until the correct answer is obtained, thus allowing partial credit to be assigned based on the number of spaces that were scratched. An example of how a student might complete an IF-AT® form is shown in Fig. 1. In this example, full credit (4 points) was awarded for obtaining the correct answer after one scratch, half credit (2 points) after two scratches, and 1 point after three scratches. Instructors may choose how many points to award for answers obtained after an incorrect attempt. Slepkov et al. (2016) explored the validity of offering partial credit when AUC formats were used with multiple-choice testing in general chemistry II. With a scoring scheme [1, 0.5, 0.1, 0.0] that awards the highest fractions of credit for the first or second scratches, they found test scores increased by about 6–7% compared to a dichotomously administered MC test. Additionally, they found that the partial credit was awarded in a discriminating way, such that student performance and correction of initial errors were highly correlated. They concluded that these data support the view that students using AUC formats are less likely to be just guessing on subsequent answers; however, they did not survey or interview the students to verify these practices.
In introductory psychology courses, DiBattista found that students had a very positive reaction to using IF-AT® forms for testing (DiBattista et al., 2004). In particular, students thought the forms were easy to use and expressed a strong desire to continue using IF-AT® forms for all multiple-choice tests. DiBattista describes the Immediate Feedback Assessment Technique as a learner-centered multiple-choice response form (DiBattista, 2005). A large majority of these psychology students noted that using the IF-AT® forms made taking the test seem something like a game, while still contributing to their learning. Additionally, the level of students’ test anxiety while using the IF-AT® forms was investigated in introductory psychology courses (DiBattista and Gosse, 2006). The authors found that use of the IF-AT® form did not increase test anxiety in general. In fact, student test anxiety was reduced when they got an answer correct. Students who reported high levels of test anxiety continued to have similar test anxiety levels with IF-AT® testing as in regular testing situations.
Although our paper is focused on student response to IF-AT® forms for testing, other articles have raised instructor concerns of cost, test prep time, and test grading time (DiBattista et al., 2004; DiBattista, 2005; Slepkov et al., 2016). These are additional considerations when choosing this type of assessment response form over machine graded response forms for multiple-choice tests.
Several classroom studies using the IF-AT® forms for testing in introductory psychology courses showed improved student learning upon repeat testing (Epstein et al., 2002; Dihoff et al., 2003; Dihoff et al., 2004; Brosvic et al., 2005). In a more extensive study involving 611 introductory psychology students, students answered course test items with a bubble form or an IF-AT® form (Brosvic and Epstein, 2007). The control group was given their scored answer sheets a day after the test. Performance on novel or repeated items on the final exam, as well as performance at 3–12 months after the final exam, was then studied. The immediate feedback using IF-AT® forms showed significant gains in repeat test items on the final exam compared to the control group. Closer examination of the response patterns showed that students were more likely to get answers correct on both exams and to change incorrect answers to correct answers upon repeat testing using the IF-AT® form. The 3–12 month post-final exam testing revealed similar patterns, albeit with smaller gains due to attrition.
In addition to the view that feedback is useful for correction of errors, a more holistic viewpoint of feedback is summarized by Mulliner and Tucker (2017). They cite numerous studies that emphasize the importance of feedback that feeds forward, encourages further motivation for learning, and helps students identify gaps between desired and actual performance (Mulliner and Tucker, 2017). Effective feedback involves student engagement and action toward future work.
Results of student surveys from participants in the aforementioned courses at four institutions taught by five different faculty will be discussed. We were particularly interested in determining if students reported negative test taking behaviors like guessing or negative morale issues with obtaining immediate feedback during chemistry testing. We also share some insights gathered from the faculty as they wrote multiple-choice exams for use with IF-AT® forms for chemistry testing.
 | Institution A | Institution B | Institution C | Institution C | Institution D |
---|---|---|---|---|---|
Institution size/type | Medium sized comprehensive | Medium sized PUI | Small sized comprehensive | Small sized comprehensive | Large sized R1 |
Course | General Chemistry | Organic Chemistry | General Chemistry | Organic Chemistry | Organic Chemistry |
Class size | 110, 110, 79 | 42 | 34 | 40 | 110 |
Instructor | Green | Purple | Red | Yellow | Blue |
Dominant pedagogy | POGIL® | POGIL® | Interactive lecture | Interactive lecture | POGIL® |
Testing type used with IF-AT® forms | Tests, few group quizzes | Tests | Tests | Tests, few group and individual quizzes | Tests, regular group quiz |
Participants | 96, 98, 65 | 32 | 26 | 38 | 100 |
At the beginning of the term under study, all instructors assumed that students had no prior knowledge of the IF-AT® form so class time was used to describe the format in detail. In order to help ease student anxiety over a new testing format, the IF-AT® forms were first used for a group multiple-choice quiz in a peer-supported environment. These were counted for credit, most often for a combination of participation credit and credit for correct responses. Students were told to scratch their answers after answering each question and to rethink the question if they were wrong before selecting their next answer choice. This preparation was completed prior to administration of the first test using the IF-AT® forms.
Some instructors in this study used the IF-AT® forms only for testing after that initial group quiz introduction, while others used the forms for testing, individual quizzing and group quizzing. For each unit test, multiple-choice questions (answered using IF-AT® forms) made up about 20–60% of points with the remainder of the test consisting of short answer problems. Point distributions for correct answers varied between the institutions, but all instructors gave some partial credit for not obtaining the correct answer on the first try. A common point distribution was 4, 2, 1, or 0 points awarded for correct answers given on the first, second, third, or fourth attempt, respectively. In the organic chemistry course at Institution D, the same distribution was used on tests, but no credit was awarded for the third or fourth attempts on quizzes.
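The partial-credit scheme described above can be illustrated with a short sketch. The point schedule (4, 2, 1, 0) is the common distribution reported in the text; the function names and the example scratch counts are hypothetical and are not taken from any instructor's actual grading procedure.

```python
# Illustrative sketch only: the names below are hypothetical helpers.
# The schedule (4, 2, 1, 0) is the "common point distribution" from the text.
POINTS_BY_ATTEMPT = (4, 2, 1, 0)


def score_item(scratches, points=POINTS_BY_ATTEMPT):
    """Points earned for one item, given how many boxes were scratched
    before the star marking the correct answer was uncovered."""
    if not 1 <= scratches <= len(points):
        raise ValueError("scratches must be between 1 and the number of choices")
    return points[scratches - 1]


def score_form(scratch_counts, points=POINTS_BY_ATTEMPT):
    """Return (points earned, fraction of maximum) for a whole form."""
    earned = sum(score_item(n, points) for n in scratch_counts)
    possible = points[0] * len(scratch_counts)
    return earned, earned / possible


# Made-up example: five items answered in 1, 2, 1, 3 and 4 scratches.
earned, fraction = score_form([1, 2, 1, 3, 4])
print(earned, fraction)  # 11 of 20 possible points
```

A different schedule can be passed as the `points` argument; for example, `(4, 2, 0, 0)` would correspond to awarding no credit for the third or fourth attempt, as described for quizzes at Institution D.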
The frequency of quizzing (both individual and team-based) with IF-AT® forms varied between the different courses in this study. Most of the institutions gave quizzes using the IF-AT® forms irregularly, often reporting fewer than five quizzing events. The organic chemistry course at Institution D had the most systematic approach to quizzing. Quizzes were given daily at the beginning of the class following completion of each class activity. Quizzes were either short clicker quizzes (4 points), individual paper quizzes (10 points) or group quizzes (10 points) using the IF-AT® forms. Half of the 10 point quizzes were group quizzes, given to groups or teams of students. In group quizzes, each group received one printout of the quiz and one IF-AT® answer sheet. As is typical in a POGIL® classroom, roles were assigned to all group members. During a group quiz, the student assigned the role of manager was given the quiz and was asked to read the quiz question to the other group members. The student assigned the role of presenter made sure that each group member had a say in the discussion about the quiz, and the student assigned the role of recorder was responsible for recording (or scratching) the answer on the IF-AT® form. Using this technique, groups received full credit for getting the correct answer on the first attempt and half credit for getting the correct answer on the second attempt.
The other courses used group quizzing at the beginning of the term to introduce the IF-AT® forms and occasionally during the term. The dominant use of the IF-AT® forms in all the courses in this study was for testing (higher stakes unit exams). All students in this study used the IF-AT® forms for testing, and some used them for both testing and quizzing. Surveys were based on overall student perceptions of the forms.
# | Question content |
---|---|
1 | The IF-AT® form helped reinforce material that I knew. |
2 | The IF-AT® form helped me learn material I was unsure of. |
3 | With this type of exam I don’t have to study as much to do well. |
4 | The IF-AT® form had a positive effect on my morale because I gained confidence with each correct answer I scratched off. |
5 | The IF-AT® form had a positive effect on my morale because I knew I could get partial credit even if I didn’t get it right the first time. |
6 | Giving partial credit is unfair to students who really know the material. |
7 | The IF-AT® form helped me on the tests because knowing the right answer to some questions helped to guide me to the right answer in other questions. |
8 | The IF-AT® form had a negative effect on my morale because I got more and more anxious with each question, knowing that I’d already gotten some wrong. |
9 | When I got a question wrong on the first scratch, I often gave up and guessed at the correct answer. |
10 | The IF-AT® form had a negative effect on my morale because I lost confidence when I discovered I was wrong. I would rather not have known. |
11 | I would have rather used Scantron forms for the multiple choice portions of the exam. |
12 | I like the fact that I can score the multiple choice portion of the exam right away to see how I did. |
13 | I usually do better on the problem/fill in the blank portions of the exams. |
14 | Before I was willing to mark an answer, I found myself rechecking answers more often with IF-AT® forms compared to scantron forms. |
15 | Getting a question wrong challenged me to rethink about the question and try to logically choose a better answer. |
16 | The IF-AT® form helped me retain correct information. |
All procedures in this study were reviewed and approved by Institutional Review Boards at all of the institutions. Statistical analyses were carried out using a current package of SPSS and Excel for Windows.
Principal components analysis (PCA) with Varimax rotation was used to examine the structure of the 16-item survey data from the general chemistry students. Three components had eigenvalues exceeding 1, explaining 40.0%, 7.6%, and 6.9% of the variance, respectively. Upon examining a 3-component solution, the rotated matrix had the loading patterns shown in Table 3. Items 3, 6, and 13 did not load well with any of these components. Upon closer examination of the items, item 11 loaded across components and had a unique topic compared with other items. Therefore, item 11 was not included. Item 15 was included in the third component because it had a stronger loading there than in the other components.
Factor | Items loading above 0.4 | Other items with strong loading |
---|---|---|
1 Negative morale | 8, 9, 10, 11 | 15 (negative below −0.4) |
2 Positive consequences | 4, 5, 7, 12 | 11 (negative below −0.4) |
3 Thinking support | 1, 2, 15, 16 | 14 (almost 0.4) |
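The retention rule used above (keep components whose eigenvalues exceed 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: it works on the item correlation matrix, omits the Varimax rotation step, and runs on synthetic placeholder data. The function name `kaiser_components` is hypothetical. For reference, with 16 standardized items the eigenvalues sum to 16, so a component explaining 40.0% of the variance has an eigenvalue of about 0.40 × 16 = 6.4.

```python
import numpy as np


def kaiser_components(responses):
    """Eigenvalue-greater-than-1 rule on the item correlation matrix.

    responses: 2-D array, rows = students, columns = survey items.
    Returns (eigenvalues in descending order, fraction of variance
    explained by each component, number of components retained).
    """
    corr = np.corrcoef(responses, rowvar=False)
    # eigvalsh returns ascending eigenvalues of a symmetric matrix.
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    explained = eigenvalues / eigenvalues.sum()
    n_retained = int((eigenvalues > 1.0).sum())
    return eigenvalues, explained, n_retained


# Synthetic placeholder data (NOT the study data): 200 simulated students,
# 6 items driven by two independent latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
responses = latent @ loadings.T + 0.5 * rng.normal(size=(200, 6))

eigenvalues, explained, n_retained = kaiser_components(responses)
print(n_retained, np.round(explained[:n_retained], 3))
```

Rotation changes how the retained variance is distributed across components and how items load on them, so loadings such as those in Table 3 cannot be read from this sketch alone.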
The final structure is shown in Table 4. Names were generated for each component based on the semantic sense of its items. “Negative Morale” expresses a sense of negative responses to incorrect answers. “Positive Consequences” describes morale-boosting effects. The last component was called “Thinking Support” because the items suggest an effect on thinking or remembering.
Factor | Items loading above 0.4 | Cronbach alpha value |
---|---|---|
1 Negative morale | 8, 9, 10 | 0.762 |
2 Positive consequences | 4, 5, 7, 12 | 0.802 |
3 Thinking support | 1, 2, 15, 16 | 0.829 |
As shown in Table 4, each factor has good internal consistency, with Cronbach alpha coefficients greater than 0.76 reported for the general chemistry data. The organic chemistry survey data from Institution D showed similar Cronbach alpha values for these three components (note: the thinking support factor was missing Q16). The organic chemistry paper/pencil surveys from Institutions B and C gave Cronbach alpha values that were slightly lower (0.58–0.75), which could be a result of smaller sample size. The factor structure was assumed to be the same based on reasonable Cronbach alpha values; however, exploratory factor analysis was not performed on the organic samples due to the smaller sample sizes. The component scores (averaging the responses for the items in those components) for each group of students are illustrated in Fig. 3. In general, students mildly to moderately disagreed with negative morale effects. Students agreed with positive consequences and thinking support.
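The two quantities used above, Cronbach's alpha for a set of items and the component score obtained by averaging those items, can be sketched as follows. The formula is the standard definition, alpha = k/(k − 1) × (1 − sum of item variances / variance of the summed score); the function names and the five rows of responses are made up for illustration and are not study data.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a 2-D array (rows = students, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of summed score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


def component_score(items):
    """Per-student component score: the mean of that student's item responses."""
    return np.asarray(items, dtype=float).mean(axis=1)


# Hypothetical responses of five students to three items on a 1-5 scale.
example_items = np.array([
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 1],
    [3, 3, 4],
    [2, 1, 2],
])

alpha = cronbach_alpha(example_items)
scores = component_score(example_items)
print(round(alpha, 3), np.round(scores, 2))
```

Listwise handling of missing responses is left out of this sketch; a real analysis would need to decide how to treat students who skipped an item.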
A one-way between-groups multivariate analysis of variance (MANOVA) was performed to investigate differences between the three student groups (Gen Chem 1, Org Chem 2 (online), and Org Chem 2) across the three survey factors. There was a main effect of student group, F(3,440) = 6.83, p < 0.001; Wilks’ Lambda = 0.91; partial eta squared = 0.44. Pairwise comparisons revealed that negative morale differed significantly among the three data sets (p < 0.05). The organic chemistry (paper/pencil PUI) group was significantly higher (p < 0.05) in positive consequences and thinking support than the other two groups, which were not statistically different from each other.
We also investigated whether students who reported high negative morale effects (>3.5) would have lower positive consequences and thinking support scores than students with lower negative morale scores (<2.5). We were particularly interested in whether students who agreed that the IF-AT® forms contributed to negative morale effects would disagree with positive consequences and thinking supports from using IF-AT® forms. Stated differently, we hypothesized that students who reported negative effects from getting an answer incorrect would report less positive effects from correct answers and less positive test taking strategies. This hypothesis is supported by the work of several researchers who connect student failure to feelings of hopelessness that ultimately result in loss of student effort (Crooks, 1988). For this analysis, we pooled the general and organic chemistry students together, then separated the students into high and low negative morale groups. As shown in Table 5, students with high negative morale effects had lower positive consequences and thinking support scores than those who reported less negative morale effects. This was further supported by a MANOVA, in which there was a main effect on the three factor scores for high and low negative morale groups, F(3,342) = 254.36, p < 0.001; Wilks’ Lambda = 0.309; partial eta squared = 0.691. Pairwise comparisons revealed significant differences in the positive consequences and thinking support scores between the high and low negative morale groups. However, the high negative morale group expressed a neutral rather than negative average response regarding thinking support and perception of positive consequences, which contrasted with our original hypothesis.
Negative morale group | Negative morale | Positive consequences | Thinking support |
---|---|---|---|
High (>3.5) | 4.14 (0.45) | 3.35 (0.83) | 3.08 (0.83) |
Low (<2.5) | 1.88 (0.42) | 4.38 (0.46) | 4.17 (0.54) |
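The grouping rule and the effect size reported for the high versus low negative morale comparison can be checked with a short sketch. It assumes the commonly used definition of partial eta squared for Wilks' Lambda, 1 − Λ^(1/s), where s is the smaller of the number of dependent variables and the hypothesis degrees of freedom; with two groups s = 1, so the value reduces to 1 − Λ. The function names are hypothetical, and the sketch does not reproduce the SPSS analysis itself.

```python
def morale_group(negative_morale_score):
    """Classify a student by negative morale factor score.

    Scores above 3.5 are 'high', scores below 2.5 are 'low'; students in
    between are left out of the comparison (returned as None).
    """
    if negative_morale_score > 3.5:
        return "high"
    if negative_morale_score < 2.5:
        return "low"
    return None


def partial_eta_squared(wilks_lambda, s=1):
    """Partial eta squared from Wilks' Lambda, assuming 1 - Lambda**(1/s)."""
    if not 0 < wilks_lambda <= 1:
        raise ValueError("Wilks' Lambda must be in the interval (0, 1]")
    if s < 1:
        raise ValueError("s must be at least 1")
    return 1 - wilks_lambda ** (1 / s)


# Two groups (high vs. low negative morale), so s = 1.
print(round(partial_eta_squared(0.309), 3))  # 0.691
print(morale_group(4.14), morale_group(1.88), morale_group(3.0))
```

With Λ = 0.309 the sketch returns 0.691, which matches the partial eta squared reported for this comparison.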
Item | Institution A: N | Institution A: Average | Institution A: Std dev. | Institution A: Median | Institution C: N | Institution C: Average | Institution C: Std dev. | Institution C: Median |
---|---|---|---|---|---|---|---|---|
Q1 | 259 | 3.75 | 0.899 | 4.00 | 26 | 3.65 | 1.294 | 4.00 |
Q2 | 259 | 3.51 | 1.047 | 4.00 | 26 | 3.42 | 1.301 | 3.00 |
Q3 | 259 | 1.82 | 0.743 | 2.00 | 26 | 2.23 | 1.243 | 2.00 |
Q4 | 256 | 3.77 | 1.085 | 4.00 | 26 | 3.58 | 1.362 | 4.00 |
Q5 | 259 | 4.03 | 0.908 | 4.00 | 26 | 3.96 | 1.216 | 4.00 |
Q6 | 259 | 1.89 | 0.782 | 2.00 | 26 | 1.77 | 0.908 | 1.50 |
Q7 | 259 | 3.80 | 0.791 | 4.00 | 25 | 3.64 | 1.221 | 4.00 |
Q8 | 258 | 3.45 | 1.080 | 4.00 | 26 | 3.35 | 1.198 | 4.00 |
Q9 | 259 | 2.30 | 1.008 | 2.00 | 26 | 2.73 | 1.151 | 2.00 |
Q10 | 259 | 2.62 | 1.157 | 2.00 | 26 | 2.54 | 1.240 | 2.00 |
Q11 | 259 | 2.25 | 1.155 | 2.00 | 26 | 2.08 | 1.294 | 1.50 |
Q12 | 259 | 4.15 | 0.886 | 4.00 | 26 | 4.12 | 1.003 | 4.00 |
Q13 | 259 | 3.17 | 1.059 | 3.00 | 26 | 2.54 | 0.989 | 3.00 |
Q14 | 259 | 3.81 | 0.952 | 4.00 | 26 | 3.77 | 1.177 | 4.00 |
Q15 | 259 | 3.93 | 0.772 | 4.00 | 26 | 3.85 | 1.008 | 4.00 |
Q16 | 259 | 3.44 | 0.940 | 3.00 | 26 | 3.62 | 1.203 | 4.00 |
Neg. morale factor | 258 | 2.79 | 0.886 | 2.67 | 26 | 2.87 | 1.046 | 2.67 |
Pos. conseq. factor | 256 | 3.93 | 0.715 | 4.00 | 26 | 3.79 | 1.093 | 4.00 |
Thinking support factor | 259 | 3.65 | 0.737 | 3.75 | 26 | 3.63 | 1.065 | 3.75 |
Item | Institution D: N | Institution D: Average | Institution D: Std dev. | Institution D: Median |
---|---|---|---|---|
Q1 | 100 | 3.47 | 1.414 | 4.00 |
Q2 | 100 | 3.45 | 1.192 | 4.00 |
Q3 | ||||
Q4 | 100 | 3.70 | 1.176 | 4.00 |
Q5 | 99 | 3.97 | 0.974 | 4.00 |
Q6 | 100 | 1.90 | 1.078 | 2.00 |
Q7 | 100 | 3.80 | 0.841 | 4.00 |
Q8 | 98 | 3.45 | 1.245 | 4.00 |
Q9 | 99 | 2.69 | 1.192 | 2.00 |
Q10 | 98 | 2.90 | 1.272 | 3.00 |
Q11 | 98 | 2.34 | 1.192 | 2.00 |
Q12 | 99 | 3.95 | 0.941 | 4.00 |
Q13 | 98 | 3.17 | 1.219 | 3.00 |
Q14 | 99 | 3.86 | 0.904 | 4.00 |
Q15 | 99 | 3.93 | 0.824 | 4.00 |
Q16 | ||||
Neg. morale factor | 97 | 3.02 | 1.076 | 3.00 |
Pos. conseq. factor | 98 | 3.87 | 0.775 | 4.00 |
Thinking support factor | 99 | 3.61 | 0.896 | 3.67 |
Item | Institution B: N | Institution B: Average | Institution B: Std dev. | Institution B: Median | Institution C: N | Institution C: Average | Institution C: Std dev. | Institution C: Median |
---|---|---|---|---|---|---|---|---|
Q1 | 32 | 4.22 | 0.832 | 4.00 | 38 | 4.32 | 0.620 | 4.00 |
Q2 | 32 | 4.19 | 0.859 | 4.00 | 38 | 3.97 | 0.636 | 4.00 |
Q3 | 32 | 1.94 | 0.914 | 2.00 | 38 | 1.89 | 0.764 | 2.00 |
Q4 | 32 | 4.28 | 0.772 | 4.00 | 38 | 4.42 | 0.722 | 5.00 |
Q5 | 32 | 4.41 | 0.798 | 5.00 | 38 | 4.47 | 0.830 | 5.00 |
Q6 | 32 | 1.50 | 0.718 | 1.00 | 38 | 1.71 | 0.694 | 2.00 |
Q7 | 32 | 4.41 | 0.756 | 5.00 | 38 | 4.16 | 0.679 | 4.00 |
Q8 | 32 | 2.84 | 1.298 | 3.00 | 38 | 2.61 | 1.079 | 2.50 |
Q9 | 32 | 2.31 | 1.120 | 2.00 | 38 | 1.92 | 0.784 | 2.00 |
Q10 | 32 | 1.88 | 0.942 | 2.00 | 38 | 1.74 | 0.950 | 1.50 |
Q11 | 32 | 1.59 | 0.837 | 1.00 | 38 | 1.58 | 0.889 | 1.00 |
Q12 | 32 | 4.28 | 0.792 | 5.00 | 38 | 4.39 | 0.790 | 5.00 |
Q13 | 31 | 2.71 | 1.006 | 3.00 | 38 | 2.79 | 0.741 | 3.00 |
Q14 | 32 | 4.00 | 1.191 | 4.00 | 38 | 3.71 | 0.984 | 4.00 |
Q15 | 32 | 4.44 | 0.716 | 5.00 | 38 | 4.18 | 0.801 | 4.00 |
Q16 | 32 | 3.97 | 0.933 | 4.00 | 38 | 4.11 | 0.649 | 4.00 |
Neg. morale factor | 32 | 2.34 | 0.940 | 2.50 | 38 | 2.08 | 0.750 | 2.00 |
Pos. conseq. factor | 32 | 4.37 | 0.462 | 4.50 | 38 | 4.36 | 0.559 | 4.50 |
Thinking support factor | 32 | 4.20 | 0.597 | 4.25 | 38 | 4.14 | 0.417 | 4.25 |
Instructor Green found that the IF-AT® form was helpful for addressing and confronting conceptual chemical understanding. Fig. 5 shows an example of a conceptual question used on a general chemistry exam using IF-AT® forms. Out of 225 students, 70% answered the question correctly in one attempt, 24% used two attempts and 6% used three or more attempts. Of the students who took two attempts to answer correctly, 96% chose choice d, which could indicate a conceptual misunderstanding of cation/anion charge labeling, namely the positive and negative assignments were reversed. What is unclear at this time is whether this conflict/resolution created during the exam was enough to create long-term conceptual change for some students. This question will be grounds for future study.
Instructor Blue found the ability to link questions within the multiple-choice portion particularly useful, especially when asking a student to explain why a previous answer was chosen or for multistep synthesis questions in organic chemistry. Similar types of linked questions in multiple-choice format have been reported: linked questions referred to as integrated testlets in physics (Slepkov, 2013) and two-tiered questions where the first question refers to the content and the second question asks for a reason (Treagust, 1986; Treagust, 1988). An example pair of linked questions from organic chemistry testing is depicted in Fig. 6. Students are not penalized twice for getting the first part of the question wrong. Instead, they know the answer to the first part of the question (whether they got full credit or not), which helps them to solve the remaining questions and therefore learn from their initial error.
These data come from a different semester than the survey data included herein; the survey did not ask about linked questions, so the results are provided only as an example of their use. Of the 240 students who answered the linked questions, 73% got the answer for the first question correct on the first try, 13% got it right on the second try and 14% on the third or fourth attempt. For the second question, 86% of the students got the answer correct on the first try, while 9% got it on the second attempt and 5% on the third or fourth attempt. This seems to suggest that students who did not identify the tertiary alkyl halide (a) as the answer in one attempt were able to use the knowledge of the correct answer to successfully complete the second question, which asks why. Of the students who took more than one attempt to correctly answer the first question, 82% improved, taking fewer attempts to answer the second question than the first question. Students appear at the very least to be paying attention to the previous question when answering the following linked question and are likely learning from their mistakes in order to improve their performance on the second question. Using IF-AT® forms allows more effective use of linked questions, as students can answer the second part after discovering the answer to the first part. More research is necessary before strong conclusions can be made about student learning from IF-AT® feedback on linked questions.
From our studies, we would encourage others to consider using IF-AT® testing over traditional Scantron testing. The ability to provide immediate feedback during a testing event parallels the focus of most active learning environments. We also plan to further investigate whether student “perception” of learning is actually linked to enhancement of student performance on future exam questions.
This journal is © The Royal Society of Chemistry 2018