Ozcan Gulacar*a and Charles R. Bowmanb
a Sam Houston State University, Huntsville, Texas, USA. E-mail: ogulacar@shsu.edu; Tel: +1 936-294-1532
b Drexel University, Philadelphia, Pennsylvania, USA. E-mail: bowmancr@drexel.edu
First published on 28th May 2014
If the goal of teaching is to help students understand a subject, teaching cannot begin until student difficulties with a subject are understood. In order to create a guide for assessing student difficulties with chemistry material, students were asked to rate exam questions on three factors: problem difficulty, familiarity, and self-confidence. These surveys were then compared to difficulty ratings of the same questions as determined by chemistry professors. Students' ratings of problem difficulty, problem familiarity, and self-confidence correlated, as expected, with their success on exam problems. There was also some agreement between students and faculty on the difficulty of exam problems, though students were much more accurate at judging problem difficulty than were chemistry professors. Students were surveyed in two separate cohorts to test year-to-year reliability.
“We may begin to understand student difficulties in chemistry if we understand the ways in which their perceptions of the context of our chemistry courses differ from our perceptions. Otherwise, students and faculty are living in different worlds and speaking different languages (Carter and Brickhouse, 1989).”
Carter and Brickhouse (1989) observed in their study that students and faculty did, indeed, live in two different worlds. Whereas students thought that doing homework and attending lecture were more important, faculty thought that student interest in the subject was more important. Interestingly, students seemed to view mastery of chemistry as within their reach, while the faculty thought chemistry was just too difficult a subject. Grove and Bretz (2007) created the CHEMX survey to measure expectations for learning chemistry (i.e., knowing how to learn chemistry). Both students' and chemistry teachers' expectations about learning chemistry were measured; expectations between general chemistry students and faculty were significantly different, but that difference largely disappeared by the students' third year.
Most of the studies looking at how accurately students assess their success on exams have been performed in psychology or related courses (Sinkavich, 1995; Smith, 2002; de Carvalho Filho, 2009; Rosenthal et al., 2010). While it is a popular topic for psychologists studying how students learn, rarely are the students' perceptions compared with those of the faculty. The two studies above (Carter and Brickhouse, 1989; Grove and Bretz, 2007) are two of the few done on student perceptions in chemistry. However, neither was done at the question level of an exam; rather, both studied overall perceptions of students. They were unique in that they compared those perceptions with faculty perceptions.
Further studies by Symington and Kirkwood (1996) in general chemistry and Sözbilir (2004) in physical chemistry are the only other such studies known to the authors, though Symington and Kirkwood did not compare the faculty perceptions with those of the students. Recently, a rubric has been developed to help chemistry faculty assess the complexity of their exams, which may help instructors better predict student success (Knaus et al., 2011). These studies all point to discrepancies between teacher and student perception of various aspects of chemistry. However, these studies did not analyse perception differences on specific chemistry topics, or the differences in perceived difficulty of those topics.
In general, students have been observed to have varying abilities at predicting their own success. A number of variables may affect a student's ability to accurately gauge their own success. For example, when asked to predict their performance on a test, higher-performing students were more accurate at predicting their success; lower-performing students tended to be overconfident in their success (Hacker et al., 2000). Self-confidence in correct answers, too, has been shown to be correlated with success on exams (Smith, 2002).
Metacognition, too, is assumed to be related to students' accuracy in predicting their success. Any evaluation one makes on whether or not they were successful is a form of metacognition (i.e., thinking about your thinking). It should not be surprising, then, that students with higher metacognitive skills, as measured by the Metacognitive Awareness Inventory (MAI) (Schraw and Dennison, 1994), were better able to predict their success on an exam than were classmates with lower metacognitive awareness (de Carvalho Filho, 2009). Metacognition may even help students achieve more by compensating for lower aptitude (Cooper and Sandi-Urena, 2009). It has been shown that students in psychology courses can accurately predict their success on individual problems, but have a more difficult time predicting overall test scores on exams (Rosenthal et al., 2010).
Given the lack of research on how students and teachers in chemistry perceive the difficulty of exams, the following research questions were investigated:
1. How accurately do students and teachers assess the difficulty of chemistry problems?
2. Who is more accurate in their assessment of chemistry problem difficulty – students or teachers?
3. Does metacognition affect a student's ability to assess the difficulty of chemistry problems?
Three professors from the same university, with between 5 and 15 years of experience each, also participated in the study. None of these professors taught the general chemistry course at the time of the study, though all have at various times taught general chemistry, and none are authors of this paper. The study was approved by Texas State IRB.
To mitigate item-order bias, the tests given to the 2013 cohort were reordered. The test questions were identical to those given to the previous cohort, but were presented in a different order within each section. The sections of each exam (fill-in-the-blank, multiple choice, and long answer) were not reordered. The survey instruments used were unchanged. The instructor for the 2013 cohort was not the same as that of the 2011 cohort.
The surveys asked the student to rate each question of the exam they had just taken on the difficulty of that question, how confident they were on their answer, and how familiar they were with the problem. Each question was rated on a scale of 1 to 3, with 3 being most difficult. The scales and the descriptions given to the students can be seen in Table 1. For exams 1 and 2, a rating of 3 was also most familiar and most confident. However, because students expressed confusion about the self-confidence and familiarity scales, a rating of 3 was changed to least familiar and least confident; guessing was considered “very little confidence.” Analysis of ratings before and after the switch showed no observable differences; the data from before and after the switch were both included in the study as a result. The ratings shown in Table 1 were used in all surveys for the 2013 cohort.
Difficulty (D) | Familiarity (F) | Self confidence (SC) |
---|---|---|
1 – Easy (solution requires remembering some basic definitions or facts) | 1 – Very familiar (have seen and done several similar examples) | 1 – Extremely confident (know how to check and it is correct) |
2 – Medium (solution requires formulas and some application) | 2 – Somehow familiar (have seen before but have very little experience) | 2 – Somehow confident (think I got it right but not sure and do not know how to check) |
3 – Difficult (solution requires linking among different concepts and a lot of calculations) | 3 – Not familiar (have not seen at all before) | 3 – Very little confidence (have no idea if I got it right) |
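Because the familiarity and self-confidence scales changed direction after exams 1 and 2, ratings collected under the two conventions have to be put on a common footing before they can be pooled. The sketch below shows one plausible way to do this on a 1 to 3 scale; the function names and the sample responses are hypothetical and are not taken from the study.

```python
def reverse_code(rating: int, low: int = 1, high: int = 3) -> int:
    """Mirror a rating on a low..high scale (1 <-> 3, 2 stays 2)."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} is outside {low}..{high}")
    return low + high - rating


def harmonize(rating: int, three_means_most: bool) -> int:
    """Return the rating oriented so that 3 = most familiar / most confident."""
    return rating if three_means_most else reverse_code(rating)


# Hypothetical responses: (rating, True if that survey used 3 = "most")
responses = [(3, True), (3, False), (2, False), (1, False)]
harmonized = [harmonize(rating, flag) for rating, flag in responses]
print(harmonized)  # [3, 1, 2, 3]
```

With every rating oriented the same way, a higher familiarity or self-confidence number always means "more", which keeps the sign of later correlations easy to read.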
The instrument asked students to rate each question on the exam with the scales shown in Table 1. Three-point scales were used to simplify the responses; having fewer response options was expected to reduce the time needed to complete the survey. Students were also asked for any additional comments they had on each question and to identify the chemistry topic(s) they felt were being tested by that problem. A sample survey is available as Appendix 1 (ESI†). There were a total of 25 questions on exam 1, 23 questions on exam 2, 40 questions on exam 3, and 38 questions on exam 4. These exams were a mixture of fill-in-the-blank/short response, multiple-choice, and long answer/free response questions (Table 2). Students entered their responses on paper, and student volunteers then hand-entered them into a computer database for analysis. Professors were given copies of the exams and asked to rate only the difficulty of each problem and to identify the topic(s) involved in each problem; they entered their responses directly into a computer.
Exam | Short response | Multiple choice | Long answer |
---|---|---|---|
Exam 1 | 5 | 14 | 6 |
Exam 2 | 4 | 15 | 4 |
Exam 3 | 0 | 40 | 0 |
Exam 4 | 0 | 33 | 5 |
Students in the 2013 cohort were also asked to fill out the MAI for bonus points on their own time, outside of the exams. Students filled out the MAI online using a 10-point scale, rather than the original 100-point scale. Fifty-five students filled out the MAI. This number of surveys was not adequate for running a factor analysis, so the original loadings were used to calculate the two MAI factors for each student (Schraw and Dennison, 1994); Cronbach's alpha for each of the factors was above 0.900.
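For readers unfamiliar with the reliability statistic quoted above, Cronbach's alpha for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The following is a minimal sketch of that textbook formula; the response matrix is invented for illustration and does not reproduce the MAI data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and rows of equal length")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers from five students to four items on a 10-point scale
mai_like = [
    [8, 7, 9, 8],
    [5, 6, 5, 4],
    [9, 9, 8, 10],
    [3, 4, 2, 3],
    [6, 6, 7, 6],
]
print(round(cronbach_alpha(mai_like), 3))
```

Values close to 1 indicate that the items within a factor move together, which is the sense in which alphas above 0.900 support the two MAI factors.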
Statistical analysis was done using SPSS 20. Most of the tests run were correlations calculated as Spearman's rho, rs. Each test was calculated via bias-corrected accelerated (BCa) bootstrapping using 1000 samples (Preacher and Hayes, 2008; Singh and Xie, 2010). BCa bootstrapping is itself a non-parametric method (Efron, 1981, 1987); the 95% confidence intervals generated are in brackets. In the cases where the correlations were with the students' success on a problem, point bi-serial correlations were used. Positive correlations for the point bi-serial correlations in this paper indicate that the higher the rating (difficulty, self-confidence, or familiarity), the more likely that a student was correct on that problem; negative correlations indicate the reverse. For rs, the effect size is equal to the coefficient (Field, 2009). To bring clarity to the analysis, the self-confidence and familiarity numbers were reversed prior to statistical analysis. The result was that a positive correlation between self-confidence and success on a multiple-choice problem meant that higher self-confidence correlated with higher success.
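The study's analysis was run in SPSS 20, so the following is only an illustrative sketch of the kind of calculation described above. It computes Spearman's rho and a point-biserial correlation in plain Python and attaches a bootstrap confidence interval. Note that it uses a simple percentile bootstrap as a stand-in; the BCa correction used in the paper is not implemented here. All data and function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson's r, clamped to [-1, 1] to absorb floating-point drift."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return max(-1.0, min(1.0, sxy / sqrt(sxx * syy)))


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def point_biserial(binary, scores):
    """Point-biserial r is Pearson's r with one variable coded 0/1."""
    return pearson(binary, scores)


def percentile_bootstrap_ci(x, y, stat, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI (simpler than BCa) for a paired statistic."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # correlation is undefined for a constant resample
        estimates.append(stat(xs, ys))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical data: 1 = answered correctly, with the difficulty rating given
correct = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
difficulty = [1, 1, 2, 3, 2, 3, 2, 1, 3, 3, 2, 1]

r_pb = point_biserial(correct, difficulty)
ci_low, ci_high = percentile_bootstrap_ci(correct, difficulty, point_biserial)
print(f"r_pb = {r_pb:.3f} [{ci_low:.3f}, {ci_high:.3f}]")
```

In this toy data set the coefficient is negative, matching the sign convention in the text: a higher difficulty rating goes with a lower chance of a correct answer.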
As can be seen in Table 3, there was a significant, negative correlation between students' difficulty rating on the problems and their success (i.e., correct or incorrect) on the problems (rs = −.200 in 2011 and rs = −.237 in 2013; p < .001 for both). This result indicates that students did have a somewhat accurate assessment of which problems were difficult (i.e., the higher the student difficulty rating, the less likely a student was to answer the question correctly). However, the accuracy was less than might be expected. A correlation coefficient of −.200 only accounts for about 4% of the variance in the data (i.e., .040 = (−.200)²). The result was repeated in both cohorts with nearly the same effect size, even after applying bootstrapping methods, which suggests that this was not a one-time event.
Rating | Cohort | Correlation coefficient | Sig. (2 tailed) | N |
---|---|---|---|---|
Difficulty | 2011 | −.200 [−.219, −.181] | <.001 | 10… |
Difficulty | 2013 | −.237 [−.255, −.219] | <.001 | 9955 |
Familiarity | 2011 | .155 [.137, .175] | <.001 | 10… |
Familiarity | 2013 | .222 [.204, .240] | <.001 | 9955 |
Self-confidence | 2011 | .186 [.166, .208] | <.001 | 10… |
Self-confidence | 2013 | .249 [.230, .267] | <.001 | 9955 |
In order to see if one exam was skewing the results, the correlations in Table 3 were run for each individual exam's questions. The results showed that the correlation remained largely consistent over each exam, with coefficients ranging from −.110 for exam 1 to −.195 for exam 4. The significance levels were likewise unchanged from the overall results, and year-to-year results remained unchanged. In general, the result was expected; students were able to judge the difficulty of chemistry problems with consistent accuracy. The surprise was that the accuracy of that judgment was much lower than might be expected (i.e., only about 4% of the variance). In other words, one cannot simply assume that students will rate as difficult the problems that they are unlikely to get right on an exam.
Of interest, too, was how students' feelings of familiarity with a question, and their confidence that their answer was correct, were related to their success on a problem. Here, too, significant correlations were observed between student perception and performance (Table 3). These correlations were positive, meaning that as students felt more familiar with a question, they were more likely to answer correctly (rs = .155 in 2011 and rs = .222 in 2013; p < .001 for both). The same was true of students' confidence in their answers; higher confidence was correlated with higher success (rs = .186 in 2011 and rs = .249 in 2013; p < .001 for both). Like the correlation with problem difficulty, these correlations were expected, but rather small in size, only accounting for 2–6% of the variance.
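The variance figures quoted for Table 3 follow from squaring each correlation coefficient. The short check below simply repeats that arithmetic for the six reported coefficients; the dictionary layout and variable names are illustrative only.

```python
# Coefficients as reported in Table 3, keyed by (rating, cohort year)
table3 = {
    ("Difficulty", 2011): -0.200,
    ("Difficulty", 2013): -0.237,
    ("Familiarity", 2011): 0.155,
    ("Familiarity", 2013): 0.222,
    ("Self-confidence", 2011): 0.186,
    ("Self-confidence", 2013): 0.249,
}

# Shared variance in percent: 100 * r squared, rounded to one decimal place
shared_variance = {key: round(100 * r * r, 1) for key, r in table3.items()}

for (rating, year), pct in shared_variance.items():
    print(f"{rating:<16} {year}: {pct}%")
```

The results run from roughly 2.4% to 6.2%, in line with the "about 4%" and "2–6%" descriptions in the text.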
Again, correlations were also run for each individual exam. The results were much the same as for the difficulty ratings. None of the exams appeared to be exceptional with regard to correlations between student perceptions and student performance. Note that the correlations here are opposite in sign to those observed for the difficulty ratings. This means that students were more confident of and more familiar with problems they answered correctly.
With such consistent findings between the three ratings, it is reasonable to ask how closely related the ratings are. To determine this, correlations were run between each of the three student rating variables. As suspected from the above analysis, familiarity and self-confidence were found to be strongly, positively correlated (rs = .707 in 2011 and rs = .719 in 2013; p < .001 for both), which accounts for about 50% of the variance between the two ratings. Furthermore, both familiarity and self-confidence were negatively correlated with difficulty. For difficulty and familiarity, rs = −.478 in 2011 and rs = −.708 in 2013 (p < .001 for both); similar effect sizes were seen with self-confidence. These correlations account for a large portion of the variance in each pairing.
As a result of the consistent pattern shown above and these strong correlations, a multiple linear regression was run to determine how interdependent the three variables were. (A logistic regression failed to converge on these data.) As expected, difficulty accounted for 5.6% of the model's variance (adjusted r² value). The addition of self-confidence accounted for an additional 1.8% of the model's variance, also a significant addition. However, the addition of familiarity did not account for any additional variance, and further analyses suggest that familiarity and self-confidence are collinear (i.e., they describe the same thing).
Predictor | b | SE B | β | Sig. |
---|---|---|---|---|
Constant | 0.26 [0.14, 0.37] | 0.063 | | <.001 |
Difficulty rating | −0.07 [−0.10, −0.05] | 0.012 | −.106 | <.001 |
Self-confidence rating | 0.11 [0.08, 0.13] | 0.012 | .166 | <.001 |
Familiarity rating | 0.02 [−0.01, 0.04] | 0.014 | .027 | .181 |
Knowledge of cognition | 0.04 [0.02, 0.06] | 0.011 | .081 | .0012 |
Regulation of cognition | −0.03 [−0.05, −0.01] | 0.009 | −.088 | <.001 |
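For readers less familiar with the adjusted r² quoted for the regression above, the standard definition and the stepwise bookkeeping can be written out as follows. The symbols n (number of observations) and p (number of predictors) are introduced here only for illustration and are not values reported in the study; the cumulative figure simply adds the two reported increments.

```latex
\begin{align}
  r^2_{\text{adj}} &= 1 - \left(1 - r^2\right)\frac{n - 1}{n - p - 1} \\
  r^2_{\text{adj}}(\text{difficulty}) &= .056 \\
  \Delta r^2_{\text{adj}}(\text{self-confidence}) &= .018 \\
  r^2_{\text{adj}}(\text{difficulty} + \text{self-confidence}) &= .056 + .018 = .074
\end{align}
```

Under this reading, the two ratings together explain a little over 7% of the variance in success on a problem.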
In general, students and teachers agreed on the difficulty of problems. There was a large, positive correlation between students' and teachers' difficulty ratings (rs = .468 [.303, .612] with the 2011 cohort and rs = .351 [.173, .519] with the 2013 cohort; p < .001 for both). This accounted for roughly 21% of the variance between the faculty and students in 2011 and 12% in 2013. In all, there was a low to moderate degree of agreement, and this agreement varied by year. A much stronger agreement on difficulty was measured between the two cohorts of students (rs = .765 [.668, .828], p < .001). Even with this higher degree of agreement, there does appear to be some variability in which problems students find difficult from year to year.
As such, it is important to know not just whether the students and teachers agree, but who is generally more accurate in their assessment of difficulty (research question 2). Both student and faculty difficulty ratings were compared with each question's performance index (Table 5). Students in both cohorts had moderately strong, significant correlations between their average difficulty ratings and the performance index. As can be seen, the correlation was generally stronger when the cohort year matched the year of the performance index. That is, students were better able to assess their own performance than that of another student cohort. The faculty, on the other hand, did not have significant correlations between their average difficulty rating of a problem and any of the performance indices. Overall, students, when averaged as a class, were much more accurate at determining the difficulty of a chemistry problem than were teachers.
Rater | Performance index 2011 | Performance index 2013 | Performance index combined | N |
---|---|---|---|---|
Faculty | .178 [−.011, .363] | .191 [.002, .386] | .184 [−.004, .360] | 102 |
Students 2011 | .561a [.378, .702] | .441a [.259, .619] | .516a [.325, .669] | 102 |
Students 2013 | .341a [.129, .520] | .479a [.310, .631] | .414a [.224, .582] | 102 |
a Correlation significant at 99.9% level (p < 0.001).
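The question-level comparison above can be sketched as follows: ratings are averaged per question, a per-question index is computed, and the two are compared with Spearman's rho. The performance index is not defined in this excerpt, so the sketch assumes, purely for illustration, that it is the fraction of students who missed a question (so that higher values mean harder questions). The records and names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def ranks(values):
    """Average ranks (1..n), with ties sharing their mean rank."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical survey records: (question, difficulty rating 1-3, 1 if correct)
records = [
    ("Q1", 1, 1), ("Q1", 1, 1), ("Q1", 2, 1),
    ("Q2", 2, 1), ("Q2", 2, 0), ("Q2", 3, 1),
    ("Q3", 3, 0), ("Q3", 3, 0), ("Q3", 2, 1),
    ("Q4", 1, 1), ("Q4", 2, 0), ("Q4", 1, 1),
]

ratings_by_q = defaultdict(list)
correct_by_q = defaultdict(list)
for question, rating, correct in records:
    ratings_by_q[question].append(rating)
    correct_by_q[question].append(correct)

questions = list(ratings_by_q)
mean_rating = [sum(ratings_by_q[q]) / len(ratings_by_q[q]) for q in questions]
# Assumed index: share of students who answered the question incorrectly
miss_rate = [1 - sum(correct_by_q[q]) / len(correct_by_q[q]) for q in questions]

rho = spearman(mean_rating, miss_rate)
print(f"rho = {rho:.3f}")
```

Averaging first and correlating second is what distinguishes this class-level comparison from the response-level correlations in Table 3.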
There is a body of research that has aimed to increase the accuracy of both student and teacher perceptions of exam question difficulty for both general chemistry and organic chemistry (Knaus et al., 2009, 2011; Raker et al., 2013). These methods focus on cognitive complexity as a major factor in determining how difficult a given problem is; the more demand put on a student's cognitive processes (e.g., memory capacity), the more difficult a problem is. Thinking about the cognitive load required does help faculty more accurately gauge problem difficulty. The problem, as shown above, is that most chemistry faculty do not think in these terms. The faculty who participated in this study were not chemical education faculty; their selections of difficulty were based on whatever criteria they developed for themselves, criteria that were clearly not accurate. This difference between chemical education faculty and other chemistry faculty has been observed before (Grove and Bretz, 2007).
Similarly, the students were not given a more rigorous framework. The comparisons in this study are between students' and faculty's perceptions of difficulty by whatever criteria they determine for themselves (within the rough framework shown in Table 1). Without any guidance, students appear to be much better at making judgements about chemistry problem difficulty.
It seems, too, that students and teachers agree somewhat on what is difficult (or easy) on chemistry exams, with an agreement that is in the range of 12–21% of the total variance. However, this does not mean that both teachers and students are equally good at assessing what is difficult. As was shown, students generally have a better idea than do teachers of what problems are most difficult (research question 2). This is likely because teachers, as experts in chemistry, have a very different view of the classroom than do students, who are novices in chemistry (Carter et al., 1988). It seems that students and teachers still live in two different worlds (Carter and Brickhouse, 1989).
The primary purpose in surveying a second cohort was to verify that the correlations observed with the first cohort were not specific to that group. The repetition of the correlations seen in Table 3 supports the idea that these correlations are likely to be seen in various classrooms and were not isolated correlations. The differences in which problems students in each year found most difficult, however, suggest that the types of problems found most difficult will vary from year to year and teacher to teacher.
Since classrooms change from year to year, one survey of difficulties cannot determine the best course of action for every classroom. Students from the 2011 and 2013 cohorts did not rate the same problems as most difficult. There are a number of potential learning environments that will allow on-the-spot evaluation, such as clickers (King, 2011), flipped classrooms (Tucker, 2012; Johnson, 2013), or peer-led team learning (Woods, 2013), just to name a few. Because students generally have a better idea of what is difficult for them, getting them involved is key, and these learning environments are designed to involve students. Surveying students with the survey used in this study for a sample of chemistry problems would be another way for teachers to find out what problems their students find difficult.
The strength of the correlations between the three student ratings (i.e., problem difficulty, familiarity, and confidence in a correct answer) of the exam questions suggests that improving one may help the other two. While reducing the difficulty of the questions – which would undoubtedly make students more confident – is not desirable, increasing self-confidence and familiarity with the problems may have a positive influence on student success. However, improving self-confidence will not directly improve a student's performance in chemistry; the interaction between the two is complex (Ajzen, 2002; Bauer, 2008; Bowman, 2012), and the correlations measured in this study cannot determine causation. In the end, improvement in the correlations observed will have a limit.
Although it would also be important to determine the areas in which there are the largest disagreements (as measured by the difference in average student and faculty rating), such an assessment cannot be made here. As seen above, the problems students find most difficult vary from year to year, though the statistical patterns persisted. Only a large, longitudinal study could determine what disagreements persist between years, if any. Of course, the exact topics are likely to vary from classroom to classroom, as well. Therefore, it is much more important and useful for teachers to determine which topics are proving difficult in each individual class.
This study can, however, act as one potential method for evaluating current courses (they need not be chemistry) and their needed foci. Carter and Brickhouse (1989) believed their general findings to be transferable to other universities; the same is likely true here. Whether one chooses to measure the gap between faculty and student perception of difficulty by the method in this paper or by other means, the fundamental question, as always, is “are we addressing students' needs?” As this study showed, the best people to ask are the students.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c4rp00055b
This journal is © The Royal Society of Chemistry 2014