Jeffrey A. Webb a and Andrew G. Karatjas *b
aChemistry Department, Southern Connecticut State University, New Haven, CT 06515, USA
bJohnson and Wales University, Science Department, Providence, RI 02903, USA. E-mail: andrew.karatjas@jwu.edu
First published on 2nd February 2018
Poor student performance in physical science courses is attributed to various causes, such as lack of motivation, lack of ability, and/or the overall difficulty of these courses. One overlooked cause is a lack of self-awareness regarding level of preparation. In a study conducted over a two-year period, students at all levels (freshman through M.S.) of a chemistry program were surveyed and asked to predict their scores on examinations. At every level, strong evidence of the Kruger–Dunning effect was observed: higher-performing students tended to underpredict their examination scores, while the lowest-performing students tended to grossly overpredict theirs.
Until recently, studies of the Kruger–Dunning effect were conducted predominantly in the field of psychology (Ehrlinger and Dunning, 2003). Limited work has also been done in geography (Grimes, 2002), statistics (Jordan, 2007), biology (Bowers et al., 2005), geology (Wirth and Perkins, 2014), economics (Grimes, 2002), and pharmacy (Austin and Gregory, 2007). In chemistry, little work was done until very recently (Potgieter et al., 2010; Bell and Volckmann, 2011; Karatjas, 2013; Pazicni and Bauer, 2014; Karatjas and Webb, 2015). Given the limited diversity of study in chemistry before the initiation of our work, we were curious about the application of Kruger and Dunning's work in chemistry for two reasons: (1) a comprehensive study spanning all course levels had never been performed in any field (with one exception, all prior studies in chemistry were at the introductory level, and no previous study looked at students above the 200-level), and (2) numerous studies indicate that chemistry is generally thought by students to be one of the most difficult subjects studied (Fitz-Gibbon and Vincent, 1994; Sparkes, 2000; Mallow, 2006; Coe et al., 2008). Given this assumed difficulty, one might expect poorer-performing students to be more accurate, since expectations in a more difficult subject might be lower. We therefore wondered whether studies within chemistry, especially in upper-level courses, would show different results based both on the level of the student and on the perceived difficulty of the subject (i.e., would predictions generally be lower, and therefore more accurate, given that students perceive the field as more difficult?).
Presumably, as students move toward upper-level courses, the weakest students (and those who predict least accurately) are no longer enrolled, although some weaker students will always reach the upper levels because a D− is sufficient to move on to upper-level courses. Given that the upper-level courses therefore likely contain a stronger subgroup of students than the 100-level courses, would this make their predictive ability more accurate? Or is the only relevant factor performance in a specific course? That is, a student who performs well in a 100-level (freshman) course is likely to predict accurately; if that same student then performs poorly in a 400-level (senior) course, does the student remain as highly self-aware as previously found, or is the student redistributed based on performance in the specific course being taken?
The first reported study in chemistry was published in 2011, when Bell and Volckmann used a Bloom-taxonomy-based knowledge survey to compare students' self-reported understanding of specific topics with their performance on the final exam in a general chemistry course (Bell and Volckmann, 2011). They found strong evidence for the Kruger–Dunning effect on the final examination in general chemistry courses. More recently, Pazicni and Bauer reported on a large study of general chemistry students in which they found the Kruger–Dunning effect to be a robust phenomenon in their student population. Pazicni and Bauer also examined the phenomenon over time (exam 1 → exam 2 → exam 3, etc.), providing some insight into how instructors give feedback; they found no change in prediction accuracy over time. However, it is important to note that this study used the following procedure: “A sheet of paper was attached to the last page of each of the three course exams.” Since this placement would logically lead students to complete the survey after finishing the examination, their self-assessment is more of a postdiction (a prediction made after the assessment) than a prediction, making it different from the work presented here (Pazicni and Bauer, 2014). Brandriet and Bretz recently published a study examining the relationship between student understanding and student confidence on several topics in general chemistry I courses. Looking at specific concepts in the area of redox chemistry, they found that students who hold misconceptions about these topics are unaware of their lack of knowledge, another illustration of the Kruger–Dunning effect in general chemistry (Brandriet and Bretz, 2014). A recent study by Hawker et al. also explored students' exam postdictions and found that most general chemistry students were not accurate in their examination postdictions (Hawker et al., 2016).
Karatjas published a study in 2013 examining this effect specifically in organic chemistry courses, the first chemistry study to look beyond the freshman level (Karatjas, 2013). This study used an examination reflection to assess students' predictions and postdictions in organic chemistry. It found that the highest-performing students generally underpredicted their grades, the middle students were the most accurate, and the poorer-performing students tended to overpredict their performance. In addition, this study showed more accurate postdictions than predictions for many groups, although students scoring <50% on exams still grossly overestimated in their postdictions. We previously published a study looking specifically at gender differences in 100-level chemistry courses (Karatjas and Webb, 2015). This work showed a significant Kruger–Dunning effect in all 100-level courses, but with some important distinctions by gender: male and female students showed almost no difference in performance, yet male students at almost all performance levels made higher predictions (and showed greater overconfidence). The primary exception was among the poorest-performing students, where there was very little difference in self-perception by gender. More recently, we published a study examining whether there was any relationship between a student's major and their ability to accurately predict their examination scores (Karatjas and Webb, 2017). Chemistry majors tended to predict lower scores than biology majors and, overall, students majoring in the natural sciences tended to predict lower examination scores than those majoring in fields outside the natural sciences.
Hacker et al. previously studied both predictions and postdictions in the psychology classroom, where students were asked to predict on the first page of their exams and postdict on the last page (Hacker et al., 2000). They found that the higher-performing students were accurate and that their accuracy improved over multiple exams. The low-performing students in Hacker's study showed moderate prediction accuracy and good postdiction accuracy, while the lowest performers, exhibiting the Kruger–Dunning effect, gave poor predictions and postdictions alike. The work presented here uses self-reported predictions, with students instructed to fill out the survey before attempting the exam. In addition, this study is a more complete, large-scale study of the entire chemistry curriculum (freshman through graduate), including all of the following courses: 100-level: General Chemistry I, General Chemistry II, Chemistry in Contemporary Issues (populated by non-science majors), Crime Scene Chemistry (populated by non-science majors), and Principles and Application of General, Organic, and Biochemistry (populated by nursing students), which primarily focuses on organic chemistry; 200-level: Organic Chemistry I, Organic Chemistry II, and Quantitative Analysis; 300-level: Physical Chemistry I, Physical Chemistry II, and Environmental Chemistry; 400-level: Biochemistry I, Biochemistry II, Chemical Hazards and Laboratory Safety, Instrumental Methods, and Medicinal Chemistry; and 500-level: Advanced Organic Chemistry, Advanced Physical Chemistry, and Advanced Analytical Chemistry.
With all of these prior studies in mind, we wanted to examine some additional factors in the field of chemistry. First, this work is, to our knowledge, the only study of student perception and prediction across multiple course levels: freshman, sophomore, junior, senior, and graduate. We were interested in whether the Kruger–Dunning effect stays consistent through all levels of chemistry courses, or whether the effect lessens as the worst-performing students (in the lower-level courses) drop out of the population in the higher-level courses. We also wanted to study the Kruger–Dunning effect within a field that is reputed to be difficult (i.e., would chemistry be perceived differently than other fields?). Second, with the exception of our previous work, studies in chemistry have focused exclusively on freshman-level courses; as stated above, we were interested in whether this effect continues at higher levels or is only apparent in introductory-level courses. Finally, we wanted to explore examination predictions rather than postdictions, for the following reason: a student's perception of how well prepared they are for an examination could play a significant role in their preparation. A student who believes they are well prepared is likely to predict that they will perform well and, logically, would find no reason to change their study habits (or to continue their examination preparation) until it is too late. While examination postdictions would also be interesting to examine in this context, we focused on predictions because they should directly reflect students' self-perception of their own preparedness, whereas postdictions reflect students' perception of the exam they just took.
Van Etten et al. studied students' beliefs about exam preparation, collecting interview data on four aspects: motivation to study, strategies for exam preparation, affect surrounding exam preparation, and the effect of external factors on studying (Van Etten et al., 1997). All of these factors should have some effect on students' predictions, from motivation (a student's drive to learn the material, whether to obtain a good grade, impress a teacher, or get a job after graduation) to the effects of external factors on exam preparation (demands of friends, physical environment, nature of the content studied). Two of these, affect and external factors, are fairly unique to individual students and, given their nature, difficult to account for. For example, a student's mood while preparing for an exam will affect their perception of their preparedness, but mood should be largely irrelevant with a large sample size (i.e., the number of students in a good mood as they prepare should be mirrored by the number in a bad mood). It is important to note that, while a student's prediction would ideally be an amalgam of all of these factors, the data examined here (a student's predicted exam score) are the best reflection we have of students' preparedness (as well as perceived difficulty). The other factors affecting exam preparation (such as motivation and external factors) are tied to individual students and vary greatly between them, placing them outside the scope of this study.
Recently, Willson-Conrad and Kowalske examined the self-efficacy beliefs of students taking an introductory chemistry course (Willson-Conrad and Kowalske, 2018). They analyzed interview data from this group to understand students' experiences of the summative exam process typical of introductory chemistry courses. Their results allowed them to categorize the group by examination performance: students earning higher exam scores reported mastery-level self-efficacy beliefs, while the lower exam performers reported much lower self-efficacy beliefs than the rest of the participants. Although their sample was much smaller than that of the present study, the results reveal that high exam performance (at least at the introductory level of chemistry courses) is tied to mastery-experience self-efficacy beliefs and, similarly, that lower exam performance is associated with lower self-efficacy beliefs.
The survey (Fig. 1) consisted of some demographic data (name, major, etc.) as well as several additional questions.
Data were collected over a period of one and one-half years (Spring 2013, Summer 2013, Fall 2013, and Spring 2014) throughout the entire chemistry program at the university in the study. While the authors are continuing to examine data from the other questions, the focus of this study is the student exam predictions from question #3. The accuracy of each student's prediction was found by subtracting the actual examination grade from the predicted examination grade, so a positive difference indicates overprediction.
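The accuracy metric described above can be sketched in a few lines of Python. The (predicted, actual) pairs below are illustrative values only, not the study's raw data:

```python
# Minimal sketch of the accuracy metric: accuracy = predicted - actual.
# Positive differences indicate overprediction, negative underprediction.
# The pairs below are made-up (predicted %, actual %) values.
students = [(85, 95), (80, 84), (78, 74), (76, 64), (72, 55), (68, 37)]

differences = [pred - actual for pred, actual in students]

mean_difference = sum(differences) / len(differences)
pct_overpredicted = 100 * sum(d > 0 for d in differences) / len(differences)

print(round(mean_difference, 2))    # mean of differences (%)
print(round(pct_overpredicted, 1))  # percentage of students who overpredicted
```

These two quantities, the mean of differences and the percentage of students who overpredicted, are the columns reported per score bin in Tables 1–6.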
Statistical analysis used a single-factor ANOVA as well as two-sample t-tests assuming unequal variances, with an alpha value of 0.05. Bonferroni alpha corrections were also performed for comparison.
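A minimal sketch of the unequal-variance (Welch) t statistic and the Bonferroni correction follows. The software used in the study is not stated, so this is only an illustration of the named tests; the sample values in the example call are hypothetical:

```python
from statistics import mean, variance

def welch_t(a, b):
    """t statistic and Welch-Satterthwaite df for two independent
    samples with unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Bonferroni: 5 course levels give C(5,2) = 10 pairwise comparisons,
# so the corrected alpha is 0.05 / 10 = 0.005 (the value reported later).
alpha_corrected = 0.05 / 10
print(alpha_corrected)

# Illustrative call on made-up per-student prediction-accuracy samples
t_stat, df = welch_t([4.4, 11.6, 17.6, 31.2], [1.1, 10.5, 15.0, 26.3])
```

The p value for each comparison is then obtained from the t distribution with the Welch degrees of freedom.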
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
100–113% | 82 | 88.27 | 102.88 | −14.60 | 0 |
90–99% | 423 | 85.00 | 93.89 | −8.89 | 9.2 |
>90% | 505 | 85.53 | 95.35 | −9.81 | 7.7 |
80–89% | 500 | 81.48 | 84.36 | −2.87 | 43.4 |
70–79% | 712 | 78.83 | 74.40 | 4.43 | 60.8 |
60–69% | 503 | 76.17 | 64.58 | 11.60 | 89.5 |
50–59% | 345 | 72.31 | 54.76 | 17.55 | 92.2 |
<50% | 502 | 67.81 | 36.57 | 31.24 | 96.0 |
40–49% | 246 | 70.34 | 44.84 | 25.50 | 96.3 |
30–39% | 139 | 67.23 | 34.88 | 32.35 | 95.7 |
20–29% | 76 | 63.61 | 25.45 | 38.16 | 97.4 |
10–19% | 29 | 64.91 | 15.21 | 49.71 | 93.1 |
0–9% | 11 | 55.90 | 5.82 | 50.09 | 100 |
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 402 | 86.09 | 95.64 | −9.56 | 8.9 |
80–89% | 424 | 82.62 | 84.37 | −1.75 | 42.0 |
70–79% | 431 | 79.60 | 74.33 | 5.27 | 73.3 |
60–69% | 373 | 76.76 | 64.66 | 12.11 | 90.4 |
50–59% | 259 | 72.77 | 54.64 | 18.13 | 93.4 |
<50% | 397 | 69.39 | 36.79 | 32.60 | 97.5 |
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 47 | 82.76 | 94.04 | −11.28 | 2.1 |
80–89% | 87 | 78.26 | 84.53 | −6.27 | 17.2 |
70–79% | 94 | 76.02 | 74.92 | 1.10 | 59.6 |
60–69% | 90 | 74.99 | 64.55 | 10.45 | 86.7 |
50–59% | 56 | 70.33 | 55.29 | 15.04 | 89.3 |
<50% | 86 | 60.85 | 34.60 | 26.26 | 91.8 |
Standard deviations for all groups in Tables 1–6 can be found in Table 7. Not surprisingly, the standard deviations increase as the scores go down. The highest-scoring students, those scoring better than 90%, have the least room between their predictions and their scores; these students are generally more accurate than others, so there is less variance among their examination predictions. The worst-performing students have the highest standard deviations of any group. As these students have consistently been shown to have the lowest level of self-awareness, it is not unexpected that there would be more variation in their predictions.
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 17 | 80.18 | 95.64 | −15.46 | 0 |
80–89% | 24 | 80.13 | 83.54 | −3.42 | 16.7 |
70–79% | 16 | 78.19 | 73.70 | 4.49 | 81.3 |
60–69% | 11 | 72.09 | 65.34 | 6.75 | 72.7 |
50–59% | 9 | 73.67 | 54.72 | 18.94 | 88.9 |
<50% | 9 | 63.33 | 41.89 | 21.45 | 100 |
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 22 | 81.73 | 93.07 | −11.34 | 0 |
80–89% | 50 | 79.20 | 84.34 | −5.14 | 30.0 |
70–79% | 61 | 77.62 | 74.18 | 3.44 | 68.8 |
60–69% | 27 | 73.07 | 63.17 | 9.91 | 85.2 |
50–59% | 18 | 70.00 | 55.00 | 15.00 | 83.3 |
<50% | 9 | 68.33 | 39.22 | 29.11 | 88.9 |
Group of students | N | Predicted exam grade (mean) (%) | Actual exam grade (mean) (%) | Mean of differences (%) | Percentage of students that overpredicted exam grade |
---|---|---|---|---|---|
>90% | 17 | 90.35 | 94.71 | −4.35 | 11.8 |
80–89% | 15 | 77.87 | 84.40 | −6.53 | 13.3 |
70–79% | 10 | 80.10 | 74.80 | 5.30 | 70 |
60–69% | 5 | 79.00 | 64.90 | 14.10 | 100 |
Group of students | All | 100-level | 200-level | 300-level | 400-level | 500-level |
---|---|---|---|---|---|---|
100–113% | 6.1 | | | | | |
90–99% | 8.6 | | | | | |
>90% | 8.5 | 8.5 | 8.8 | 6.15 | 8.78 | 5.0 |
80–89% | 9.8 | 9.7 | 8.8 | 5.56 | 12.7 | 7.7 |
70–79% | 9.4 | 9.4 | 9.1 | 10.0 | 9.3 | 8.0 |
60–69% | 11.0 | 11.1 | 9.5 | 14.3 | 13.8 | 4.9 |
50–59% | 12.8 | 12.9 | 12.6 | 11.0 | 14.1 | N/A |
<50% | 16.6 | 15.6 | 20.3 | 9.86 | 17.9 | N/A |
40–49% | 13.2 | | | | | |
30–39% | 16.3 | | | | | |
20–29% | 14.6 | | | | | |
10–19% | 20.3 | | | | | |
0–9% | 24.2 | | | | | |
Our data show consistent evidence of the Kruger–Dunning effect throughout the program. Every course level exhibited the same trend: the higher-scoring students (>90%; 80–89%) underpredicted their exam scores, and very small percentages of these students overpredicted. The middle-scoring students (70–79%; 60–69%) overpredicted by a moderate amount, and a majority of them (60–90%, depending on the course level and grouping of students) overpredicted their performance. Among the lowest-scoring students (50–59%; <50%), more than 90% generally overpredicted their examination scores, most of them grossly so, topping out at an average overprediction of roughly 30 percentage points. Clearly, it is more difficult for a higher-performing student to overpredict an examination score, but this is not a focus of this study. The highest-performing students are not the ones who need guidance, so our focus here is on the lower-performing students, where an intervention might make a significant difference. The same pattern was seen in the graduate-level courses, although the sample size was very small (and too small among the lower-performing students) to draw any meaningful conclusions.
Additionally, the data collected here indicate that the high perceived difficulty of chemistry courses does not have a significant impact on the accuracy of predictions. This was unexpected: we believed that students, expecting chemistry courses to be difficult, would have less confidence in their ability to do well on exams. If that were the case, we would not see such a strong Kruger–Dunning effect in chemistry courses at any level, let alone all of them. Therefore, we can say with confidence that, even in a subject perceived as difficult, student self-predictions still show a strong Kruger–Dunning effect.
There is always some concern with self-reported data. One study has shown that even when a monetary incentive was provided, the accuracy of predictions did not significantly improve (Ehrlinger et al., 2008). Therefore, while it is possible that some students might be predicting a best-case (or even dream) scenario, attempts to obtain more accurate predictions by focusing only on the prediction and not the outcome have not yielded significantly different results. This is also the most significant limitation of the study: it relies completely on honest answers from students. If students, instead of giving an accurate prediction, report the score they would like (or hope) to earn, then some of the data is of limited value.
Statistical analysis of the data groups can be seen in Table 7. While all levels of students show the same general trends (a summary of all of the differences of means can be found in Table 8), the t-test comparisons show some interesting, though not terribly surprising, patterns. The 100-level has the greatest diversity of students, since it includes general chemistry (whose students continue on to upper-level courses) as well as the nursing chemistry and non-majors courses (CHE 101, 103), whose students do not typically take any additional chemistry courses. These students tend to have some of the highest overpredictions (and smallest underpredictions) of any group examined, which is consistent with our earlier finding that non-science majors generally predicted higher scores than science majors (Karatjas and Webb, 2017). The t-tests show the 100-level group to be statistically different from all other data groups, which, given its diverse population, is not surprising. The same is true of the 200-level students: the small p values show the significance and uniqueness of this group. Although the 200-level students are a subset of the 100-level students, a significant number of 100-level students take no further chemistry courses (non-science majors take CHE 101/103 and only nursing majors take CHE 125), making the 200-level a very different sample in terms of student academic backgrounds.
In addition, a number of students take these 200-level courses (organic chemistry) but do not continue to the 300- and 400-level courses. For comparisons among the 300- through 500-level students, the p values are very high, indicating that these groups are not statistically different. These courses are taken almost exclusively by chemistry majors (with the exception of biochemistry, which draws a significant number of biology majors), so their populations are the most similar of any levels, and the breakdowns of their data at the 300-level and higher are highly similar. Table 7 also includes the effect sizes, given by the Hedges' g value for each pair of groups. These show a moderate difference between the 100- and 200-level groups and the 300-, 400-, and 500-level groups, while at the upper levels (300 and above) the effect sizes are extremely small. This coincides well with the t-test results and the makeup of each sample population: the p values showed that the 100- and 200-level students were statistically different from each other and from the 300-, 400-, and 500-level students, whereas the 300-, 400-, and 500-level students were not statistically different. Therefore, we can say with confidence that the Kruger–Dunning effect is independent of course level and is strongly present in all courses surveyed. A Bonferroni correction was also applied to the data; the corrected alpha was calculated to be 0.005. This correction does not significantly alter the significance of the data: the only change is that the 100-level versus 200-level comparison falls slightly above this value, indicating a small statistical similarity between these populations. For all of the remaining comparisons, the correction has no significant effect.
Finally, an ANOVA test was run comparing the five groups of students (100-level, 200-level, 300-level, 400-level, and 500-level). A p value of 3.34 × 10⁻⁸ was found using an alpha value of 0.05. As a result, a high degree of certainty exists that at least two of the group means are not equal.
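The effect sizes reported in Table 7 can be illustrated with the standard definition of Hedges' g (Cohen's d multiplied by a small-sample bias-correction factor); the study's exact computation is not shown, so this is a sketch with made-up samples:

```python
from statistics import mean, variance

def hedges_g(a, b):
    """Hedges' g: Cohen's d with the standard small-sample bias correction."""
    n1, n2 = len(a), len(b)
    # Pooled standard deviation across the two groups
    pooled_sd = (((n1 - 1) * variance(a) + (n2 - 1) * variance(b))
                 / (n1 + n2 - 2)) ** 0.5
    d = (mean(a) - mean(b)) / pooled_sd   # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)       # bias-correction factor
    return d * j

# Illustrative call on hypothetical per-student accuracy values
g = hedges_g([10.0, 12.0, 14.0], [11.0, 13.0, 15.0])
print(round(abs(g), 3))
```

Conventionally, |g| near 0.2 is a small effect and near 0.5 a moderate one, which matches the reading of Table 7 given above: moderate differences between the lower and upper levels, and very small differences among the 300-, 400-, and 500-levels.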
Data groups compared | P(T ≤ t) two-tailed | Hedges' g value |
---|---|---|
100-level vs. 200-level | 2.54 × 10⁻² | 0.1128 |
100-level vs. 300-level | 8.63 × 10⁻⁵ | 0.3804 |
100-level vs. 400-level | 7.68 × 10⁻⁷ | 0.3385 |
100-level vs. 500-level | 2.59 × 10⁻⁵ | 0.4295 |
200-level vs. 300-level | 9.26 × 10⁻³ | 0.2779 |
200-level vs. 400-level | 4.23 × 10⁻³ | 0.2358 |
200-level vs. 500-level | 2.69 × 10⁻³ | 0.3306 |
300-level vs. 400-level | 0.693 | 0.0510 |
300-level vs. 500-level | 0.704 | 0.0630 |
400-level vs. 500-level | 0.407 | 0.1124 |
Group of students | All | 100-level | 200-level | 300-level | 400-level | 500-level |
---|---|---|---|---|---|---|
>90% | −9.81 | −9.56 | −11.28 | −15.46 | −11.34 | −4.35 |
80–89% | −2.87 | −1.75 | −6.27 | −3.42 | −5.14 | −6.53 |
70–79% | 4.43 | 5.27 | 1.10 | 4.49 | 3.44 | 5.30 |
60–69% | 11.60 | 12.11 | 10.45 | 6.75 | 9.91 | 14.10 |
50–59% | 17.55 | 18.13 | 15.04 | 18.94 | 15.00 | N/A |
<50% | 31.24 | 32.60 | 26.26 | 21.45 | 29.11 | N/A |
This journal is © The Royal Society of Chemistry 2018