Jeffrey A. Webb a and Andrew G. Karatjas *b
aChemistry Department, Southern Connecticut State University, New Haven, CT 06515, USA
bJohnson and Wales University, Science Department, Providence, RI 02903, USA. E-mail: andrew.karatjas@jwu.edu
First published on 2nd February 2018
Poor student performance in physical science courses is attributed to various causes, such as lack of motivation, lack of ability, and/or the overall difficulty of these courses. One overlooked cause is a lack of self-awareness regarding level of preparation. In a study conducted over a two-year period, students at all levels (freshman through M.S.) of a chemistry program were surveyed and asked to predict their scores on examinations. At every level, strong evidence of the Kruger–Dunning effect was observed: higher-performing students tended to underpredict their examination scores, while the lowest-performing students tended to grossly overpredict theirs.
Until recently, studies of the Kruger–Dunning effect were conducted predominantly in the field of psychology (Ehrlinger and Dunning, 2003). Limited work has also been done in geography (Grimes, 2002), statistics (Jordan, 2007), biology (Bowers et al., 2005), geology (Wirth and Perkins, 2014), economics (Grimes, 2002), and pharmacy (Austin and Gregory, 2007). In chemistry, little work was done until very recently (Potgieter et al., 2010; Bell and Volckmann, 2011; Karatjas, 2013; Pazicni and Bauer, 2014; Karatjas and Webb, 2015). Given the limited diversity of study in chemistry before the initiation of our work, we were curious about the application of Kruger and Dunning's work in chemistry for two reasons: (1) a comprehensive study spanning all course levels had never been performed in any field (with one exception, all prior studies in chemistry were at the introductory level, and no previous study looked at students above the 200-level), and (2) numerous studies indicate that chemistry is generally thought by students to be one of the most difficult subjects studied (Fitz-Gibbon and Vincent, 1994; Sparkes, 2000; Mallow, 2006; Coe et al., 2008). Given this assumed difficulty, one might expect poorer-performing students to be more accurate, since expectations in a more difficult subject might be lower. We therefore wondered whether studies within chemistry, especially in upper-level courses, would show different results based both on the level of the student and on the perceived difficulty of the subject (i.e., would predictions generally be lower, and therefore more accurate, given that students perceive the field as more difficult?).
Presumably, as students move toward upper-level courses, the weakest students (and those who predict least accurately) are no longer enrolled, although some weaker students will always reach the upper levels because a D− is sufficient to move on to upper-level courses. Given that the upper-level courses therefore likely contain a stronger subgroup of students than the 100-level courses, would this make their predictive ability more accurate? Or is the only relevant factor performance in a specific course? That is, a student who performs well in a 100-level (freshman) course is likely to predict accurately; if that same student then performs poorly in a 400-level (senior) course, does the student remain as highly self-aware as previously found, or is the student redistributed based on performance in the specific course being taken?
The first reported study in chemistry was published in 2011, when Bell and Volckmann used a Bloom-taxonomy-based knowledge survey to compare students' self-reported understanding of specific topics with their performance on the final exam in a general chemistry course (Bell and Volckmann, 2011). They found strong evidence for the Kruger–Dunning effect on the final examination in general chemistry courses. More recently, Pazicni and Bauer reported on a large study of general chemistry students in which they found the Kruger–Dunning effect to be a robust phenomenon in their student population. Pazicni and Bauer also examined the phenomenon over time (exam 1 → exam 2 → exam 3, etc.), providing some insight into how instructors give feedback; they found no change in prediction accuracy over time. However, it is important to note that this study used the following procedure: “A sheet of paper was attached to the last page of each of the three course exams.” Since this placement would logically lead students to complete the survey after finishing the examination, their self-assessment is more of a postdiction (a prediction made after the assessment) than a prediction, making it different from the work presented here (Pazicni and Bauer, 2014). Brandriet and Bretz recently published a study examining the relationship between student understanding and student confidence on several topics in general chemistry I courses. Looking at specific concepts in the area of redox chemistry, they found that students who hold misconceptions about these topics are unaware of their lack of knowledge, another illustration of the Kruger–Dunning effect in general chemistry (Brandriet and Bretz, 2014). A recent study by Hawker et al. also explored students' exam postdictions and found that most general chemistry students were not accurate in their examination postdictions (Hawker et al., 2016).
Karatjas published a study in 2013 examining this effect specifically in organic chemistry courses, the first chemistry study to look beyond the freshman level (Karatjas, 2013). This study used an examination reflection to assess students' predictions and postdictions in organic chemistry. It found that the highest-performing students generally underpredicted their grades, the middle students were the most accurate, and the poorer-performing students tended to overpredict their performance. In addition, this study showed more accurate postdictions than predictions for many groups, although students scoring <50% on exams still grossly overestimated in their postdictions. We previously published a study looking specifically at gender differences in 100-level chemistry courses (Karatjas and Webb, 2015). This work showed a significant Kruger–Dunning effect in all 100-level courses, but with some important distinctions by gender: male and female students showed almost no difference in performance, yet male students at almost all performance levels made higher predictions (and showed greater overconfidence). The primary exception was among the poorest-performing students, where there was very little difference in self-perception by gender. More recently, we published a study examining whether there was any relationship between a student's major and their ability to accurately predict their examination scores (Karatjas and Webb, 2017). Chemistry majors tended to predict lower scores than biology majors and, overall, students majoring in the natural sciences tended to predict lower examination scores than those majoring in fields outside the natural sciences.
Hacker et al. previously studied both predictions and postdictions in the psychology classroom, where students were asked to predict on the first page of their exams and postdict on the last page (Hacker et al., 2000). They found that the higher-performing students were accurate and that their accuracy improved over multiple exams. The low-performing students in Hacker's study showed moderate prediction accuracy and good postdiction accuracy, while the lowest performers, exhibiting the Kruger–Dunning effect, gave poor predictions and postdictions alike. The work presented here uses self-reported predictions, with students instructed to fill out the survey before attempting the exam. In addition, this study is a more complete, large-scale study of the entire chemistry curriculum (freshman through graduate), including all of the following courses: 100-level: General Chemistry I, General Chemistry II, Chemistry in Contemporary Issues (populated by non-science majors), Crime Scene Chemistry (populated by non-science majors), and Principles and Application of General, Organic, and Biochemistry (populated by nursing students), which primarily focuses on organic chemistry; 200-level: Organic Chemistry I, Organic Chemistry II, and Quantitative Analysis; 300-level: Physical Chemistry I, Physical Chemistry II, and Environmental Chemistry; 400-level: Biochemistry I, Biochemistry II, Chemical Hazards and Laboratory Safety, Instrumental Methods, and Medicinal Chemistry; and 500-level: Advanced Organic Chemistry, Advanced Physical Chemistry, and Advanced Analytical Chemistry.
With all of these prior studies in mind, we wanted to examine some additional factors in the field of chemistry. First, this work is, to our knowledge, the only study of student perception and prediction across multiple course levels: freshman, sophomore, junior, senior, and graduate. We were interested in whether the Kruger–Dunning effect stays consistent through all levels of chemistry courses, or whether the effect lessens as the worst-performing students (in the lower-level courses) drop out of the population in the higher-level courses. We also wanted to study the Kruger–Dunning effect within a field that is reputed to be difficult (i.e., would chemistry be perceived differently than other fields?). Second, with the exception of our previous work, studies in chemistry have focused exclusively on freshman-level courses; as stated above, we were interested in whether this effect continues at higher levels or is only apparent in introductory-level courses. Finally, we wanted to explore examination predictions rather than postdictions, for the following reason: a student's perception of how well prepared they are for an examination could play a significant role in their preparation. A student who believes they are well prepared is likely to predict that they will perform well and, logically, would find no reason to change their study habits (or to continue their examination preparation) until it is too late. While examination postdictions would also be interesting to examine in this context, we focused on predictions because they should directly reflect students' self-perception of their own preparedness, whereas postdictions reflect students' perception of the exam they just took.
Van Etten et al. studied students' beliefs about exam preparation, collecting interview data on four aspects: motivation to study, strategies for exam preparation, affect surrounding exam preparation, and the effect of external factors on studying (Van Etten et al., 1997). All of these factors should have some effect on students' predictions, from motivation (a student's drive to learn the material, whether to obtain a good grade, impress a teacher, or get a job after graduation) to the effects of external factors on exam preparation (demands of friends, physical environment, nature of the content studied). Two of these, affect and external factors, are fairly unique to individual students and, given their nature, difficult to account for. For example, a student's mood while preparing for an exam will affect their perception of their preparedness, but mood should be largely irrelevant with a large sample size (i.e., the number of students in a good mood as they prepare should be mirrored by the number in a bad mood). It is important to note that, while a student's prediction would ideally be an amalgam of all of these factors, the data examined here (a student's predicted exam score) are the best reflection we have of students' preparedness (as well as perceived difficulty). The other factors affecting exam preparation (such as motivation and external factors) are tied to individual students and vary greatly between them, placing them outside the scope of this study.
Recently, Willson-Conrad and Kowalske examined the self-efficacy beliefs of students taking an introductory chemistry course (Willson-Conrad and Kowalske, 2018). They analyzed interview data from this group to understand students' experiences of the summative exam process typical of introductory chemistry courses. Their results allowed them to categorize the group by examination performance: students earning higher exam scores reported mastery-level self-efficacy beliefs, while the lower exam performers reported much lower self-efficacy beliefs than the rest of the participants. Although their sample was much smaller than that of the present study, the results reveal that high exam performance (at least at the introductory level of chemistry courses) is tied to mastery-experience self-efficacy beliefs and, similarly, that lower exam performance is associated with lower self-efficacy beliefs.
The survey (Fig. 1) consisted of some demographic data (name, major, etc.) as well as several additional questions.
Data were collected over a period of one and one-half years (Spring 2013, Summer 2013, Fall 2013, and Spring 2014) throughout the entire chemistry program at the university in the study. While the authors are continuing to examine data from the other questions, the focus of this study is the student exam predictions from question #3. The accuracy of each student's prediction was found by subtracting the actual examination grade from the predicted examination grade, so a positive difference indicates overprediction.
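The accuracy metric described above can be sketched in a few lines of Python. The (predicted, actual) pairs below are illustrative values only, not the study's raw data:

```python
# Minimal sketch of the accuracy metric: accuracy = predicted - actual.
# Positive differences indicate overprediction, negative underprediction.
# The pairs below are made-up (predicted %, actual %) values.
students = [(85, 95), (80, 84), (78, 74), (76, 64), (72, 55), (68, 37)]

differences = [pred - actual for pred, actual in students]

mean_difference = sum(differences) / len(differences)
pct_overpredicted = 100 * sum(d > 0 for d in differences) / len(differences)

print(round(mean_difference, 2))    # mean of differences (%)
print(round(pct_overpredicted, 1))  # percentage of students who overpredicted
```

These two quantities, the mean of differences and the percentage of students who overpredicted, are the columns reported per score bin in Tables 1–6.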
Statistical analysis used a single-factor ANOVA as well as two-sample t-tests assuming unequal variances, with an alpha value of 0.05. Bonferroni alpha corrections were also performed for comparison.
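A minimal sketch of the unequal-variance (Welch) t statistic and the Bonferroni correction follows. The software used in the study is not stated, so this is only an illustration of the named tests; the sample values in the example call are hypothetical:

```python
from statistics import mean, variance

def welch_t(a, b):
    """t statistic and Welch-Satterthwaite df for two independent
    samples with unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Bonferroni: 5 course levels give C(5,2) = 10 pairwise comparisons,
# so the corrected alpha is 0.05 / 10 = 0.005 (the value reported later).
alpha_corrected = 0.05 / 10
print(alpha_corrected)

# Illustrative call on made-up per-student prediction-accuracy samples
t_stat, df = welch_t([4.4, 11.6, 17.6, 31.2], [1.1, 10.5, 15.0, 26.3])
```

The p value for each comparison is then obtained from the t distribution with the Welch degrees of freedom.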
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
100–113% | 82 | 88.27 | 102.88 | −14.60 | 0 |
90–99% | 423 | 85.00 | 93.89 | −8.89 | 9.2 |
>90% | 505 | 85.53 | 95.35 | −9.81 | 7.7 |
80–89% | 500 | 81.48 | 84.36 | −2.87 | 43.4 |
70–79% | 712 | 78.83 | 74.40 | 4.43 | 60.8 |
60–69% | 503 | 76.17 | 64.58 | 11.60 | 89.5 |
50–59% | 345 | 72.31 | 54.76 | 17.55 | 92.2 |
<50% | 502 | 67.81 | 36.57 | 31.24 | 96.0 |
40–49% | 246 | 70.34 | 44.84 | 25.50 | 96.3 |
30–39% | 139 | 67.23 | 34.88 | 32.35 | 95.7 |
20–29% | 76 | 63.61 | 25.45 | 38.16 | 97.4 |
10–19% | 29 | 64.91 | 15.21 | 49.71 | 93.1 |
0–9% | 11 | 55.90 | 5.82 | 50.09 | 100 |
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 402 | 86.09 | 95.64 | −9.56 | 8.9 |
80–89% | 424 | 82.62 | 84.37 | −1.75 | 42.0 |
70–79% | 431 | 79.60 | 74.33 | 5.27 | 73.3 |
60–69% | 373 | 76.76 | 64.66 | 12.11 | 90.4 |
50–59% | 259 | 72.77 | 54.64 | 18.13 | 93.4 |
<50% | 397 | 69.39 | 36.79 | 32.60 | 97.5 |
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 47 | 82.76 | 94.04 | −11.28 | 2.1 |
80–89% | 87 | 78.26 | 84.53 | −6.27 | 17.2 |
70–79% | 94 | 76.02 | 74.92 | 1.10 | 59.6 |
60–69% | 90 | 74.99 | 64.55 | 10.45 | 86.7 |
50–59% | 56 | 70.33 | 55.29 | 15.04 | 89.3 |
<50% | 86 | 60.85 | 34.60 | 26.26 | 91.8 |
Standard deviations for all groups in Tables 1–6 can be found in Table 7. Not surprisingly, the standard deviations increase as the scores go down. The highest-scoring students, those scoring better than 90%, have the least room between their predictions and their scores; these students are generally more accurate than others, so there is less variance among their examination predictions. The worst-performing students have the highest standard deviations of any group. As these students have consistently been shown to have the lowest level of self-awareness, it is not unexpected that there would be more variation in their predictions.
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 17 | 80.18 | 95.64 | −15.46 | 0 |
80–89% | 24 | 80.13 | 83.54 | −3.42 | 16.7 |
70–79% | 16 | 78.19 | 73.70 | 4.49 | 81.3 |
60–69% | 11 | 72.09 | 65.34 | 6.75 | 72.7 |
50–59% | 9 | 73.67 | 54.72 | 18.94 | 88.9 |
<50% | 9 | 63.33 | 41.89 | 21.45 | 100 |
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 22 | 81.73 | 93.07 | −11.34 | 0 |
80–89% | 50 | 79.20 | 84.34 | −5.14 | 30.0 |
70–79% | 61 | 77.62 | 74.18 | 3.44 | 68.8 |
60–69% | 27 | 73.07 | 63.17 | 9.91 | 85.2 |
50–59% | 18 | 70.00 | 55.00 | 15.00 | 83.3 |
<50% | 9 | 68.33 | 39.22 | 29.11 | 88.9 |
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 17 | 90.35 | 94.71 | −4.35 | 11.8 |
80–89% | 15 | 77.87 | 84.40 | −6.53 | 13.3 |
70–79% | 10 | 80.10 | 74.80 | 5.30 | 70 |
60–69% | 5 | 79.00 | 64.90 | 14.10 | 100 |
Group of students | All | 100-level | 200-level | 300-level | 400-level | 500-level |
---|---|---|---|---|---|---|
100–113% | 6.1 | | | | | |
90–99% | 8.6 | | | | | |
>90% | 8.5 | 8.5 | 8.8 | 6.15 | 8.78 | 5.0 |
80–89% | 9.8 | 9.7 | 8.8 | 5.56 | 12.7 | 7.7 |
70–79% | 9.4 | 9.4 | 9.1 | 10.0 | 9.3 | 8.0 |
60–69% | 11.0 | 11.1 | 9.5 | 14.3 | 13.8 | 4.9 |
50–59% | 12.8 | 12.9 | 12.6 | 11.0 | 14.1 | N/A |
<50% | 16.6 | 15.6 | 20.3 | 9.86 | 17.9 | N/A |
40–49% | 13.2 | | | | | |
30–39% | 16.3 | | | | | |
20–29% | 14.6 | | | | | |
10–19% | 20.3 | | | | | |
0–9% | 24.2 | | | | | |
Our data show consistent evidence of the Kruger–Dunning effect throughout the program. Every course level exhibited the same trend: the higher-scoring students (>90%; 80–89%) underpredicted their exam scores, and very small percentages of these students overpredicted. The middle-scoring students (70–79%; 60–69%) overpredicted by a moderate amount, and a majority of them (60–90%, depending on the course level and grouping of students) overpredicted their performance. Among the lowest-scoring students (50–59%; <50%), more than 90% generally overpredicted their examination scores, most of them grossly so, topping out at an average overprediction of roughly 30 percentage points. Clearly, it is more difficult for a higher-performing student to overpredict an examination score, but this is not a focus of this study. The highest-performing students are not the ones who need guidance, so our focus here is on the lower-performing students, where an intervention might make a significant difference. The same pattern was seen in the graduate-level courses, although the sample size was very small (and too small among the lower-performing students) to draw any meaningful conclusions.
Additionally, the data collected here indicate that the high perceived difficulty of chemistry courses does not have a significant impact on the accuracy of predictions. This was unexpected: we believed that students, expecting chemistry courses to be difficult, would have less confidence in their ability to do well on exams. If that were the case, we would not see such a strong Kruger–Dunning effect in chemistry courses at any level, let alone all of them. Therefore, we can say with confidence that, even in a subject perceived as difficult, student self-predictions still show a strong Kruger–Dunning effect.
There is always some concern with self-reported data. One study has shown that even when a monetary incentive was provided, the accuracy of predictions did not significantly improve (Ehrlinger et al., 2008). Therefore, while it is possible that some students might be predicting a best-case (or even dream) scenario, attempts to obtain more accurate predictions by focusing only on the prediction and not the outcome have not yielded significantly different results. This is also the most significant limitation of the study: it relies completely on honest answers from students. If students, instead of giving an accurate prediction, report the score they would like (or hope) to earn, then some of the data is of limited value.
Statistical analysis of the data groups can be seen in Table 7. While all levels of students show the same general trends (a summary of all of the differences of means can be found in Table 8), the t-test comparisons show some interesting, though not terribly surprising, patterns. The 100-level has the greatest diversity of students, since it includes general chemistry (whose students continue on to upper-level courses) as well as the nursing chemistry and non-majors courses (CHE 101, 103), whose students do not typically take any additional chemistry courses. These students tend to have some of the highest overpredictions (and smallest underpredictions) of any group examined, which is consistent with our earlier finding that non-science majors generally predicted higher scores than science majors (Karatjas and Webb, 2017). The t-tests show the 100-level group to be statistically different from all other data groups, which, given its diverse population, is not surprising. The same is true of the 200-level students: the small p values show the significance and uniqueness of this group. Although the 200-level students are a subset of the 100-level students, a significant number of 100-level students take no further chemistry courses (non-science majors take CHE 101/103 and only nursing majors take CHE 125), making the 200-level a very different sample in terms of student academic backgrounds.
In addition, a number of students take these 200-level courses (organic chemistry) but do not continue to the 300- and 400-level courses. For comparisons among the 300- through 500-level students, the p values are very high, indicating that these groups are not statistically different. These courses are taken almost exclusively by chemistry majors (with the exception of biochemistry, which draws a significant number of biology majors), so their populations are the most similar of any levels, and the breakdowns of their data at the 300-level and higher are highly similar. Table 7 also includes the effect sizes, given by the Hedges' g value for each pair of groups. These show a moderate difference between the 100- and 200-level groups and the 300-, 400-, and 500-level groups, while at the upper levels (300 and above) the effect sizes are extremely small. This coincides well with the t-test results and the makeup of each sample population: the p values showed that the 100- and 200-level students were statistically different from each other and from the 300-, 400-, and 500-level students, whereas the 300-, 400-, and 500-level students were not statistically different. Therefore, we can say with confidence that the Kruger–Dunning effect is independent of course level and is strongly present in all courses surveyed. A Bonferroni correction was also applied to the data; the corrected alpha was calculated to be 0.005. This correction does not significantly alter the significance of the data: the only change is that the 100-level versus 200-level comparison falls slightly above this value, indicating a small statistical similarity between these populations. For all of the remaining comparisons, the correction has no significant effect.
Finally, an ANOVA test was run comparing the five groups of students (100-level, 200-level, 300-level, 400-level, and 500-level). A p value of 3.34 × 10⁻⁸ was found using an alpha value of 0.05. As a result, a high degree of certainty exists that at least two of the group means are not equal.
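The effect sizes reported in Table 7 can be illustrated with the standard definition of Hedges' g (Cohen's d multiplied by a small-sample bias-correction factor); the study's exact computation is not shown, so this is a sketch with made-up samples:

```python
from statistics import mean, variance

def hedges_g(a, b):
    """Hedges' g: Cohen's d with the standard small-sample bias correction."""
    n1, n2 = len(a), len(b)
    # Pooled standard deviation across the two groups
    pooled_sd = (((n1 - 1) * variance(a) + (n2 - 1) * variance(b))
                 / (n1 + n2 - 2)) ** 0.5
    d = (mean(a) - mean(b)) / pooled_sd   # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)       # bias-correction factor
    return d * j

# Illustrative call on hypothetical per-student accuracy values
g = hedges_g([10.0, 12.0, 14.0], [11.0, 13.0, 15.0])
print(round(abs(g), 3))
```

Conventionally, |g| near 0.2 is a small effect and near 0.5 a moderate one, which matches the reading of Table 7 given above: moderate differences between the lower and upper levels, and very small differences among the 300-, 400-, and 500-levels.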
Data groups compared | P(T ≤ t) two-tailed | Hedges' g value |
---|---|---|
100-level vs. 200-level | 2.54 × 10⁻² | 0.1128 |
100-level vs. 300-level | 8.63 × 10⁻⁵ | 0.3804 |
100-level vs. 400-level | 7.68 × 10⁻⁷ | 0.3385 |
100-level vs. 500-level | 2.59 × 10⁻⁵ | 0.4295 |
200-level vs. 300-level | 9.26 × 10⁻³ | 0.2779 |
200-level vs. 400-level | 4.23 × 10⁻³ | 0.2358 |
200-level vs. 500-level | 2.69 × 10⁻³ | 0.3306 |
300-level vs. 400-level | 0.693 | 0.0510 |
300-level vs. 500-level | 0.704 | 0.0630 |
400-level vs. 500-level | 0.407 | 0.1124 |
Group of students | All | 100-level | 200-level | 300-level | 400-level | 500-level |
---|---|---|---|---|---|---|
>90% | −9.81 | −9.56 | −11.28 | −15.46 | −11.34 | −4.35 |
80–89% | −2.87 | −1.75 | −6.27 | −3.42 | −5.14 | −6.53 |
70–79% | 4.43 | 5.27 | 1.10 | 4.49 | 3.44 | 5.30 |
60–69% | 11.60 | 12.11 | 10.45 | 6.75 | 9.91 | 14.10 |
50–59% | 17.55 | 18.13 | 15.04 | 18.94 | 15.00 | N/A |
<50% | 31.24 | 32.60 | 26.26 | 21.45 | 29.11 | N/A |
This journal is © The Royal Society of Chemistry 2018