Casandra Koevoets-Beach, Karen Julian and Morgan Balabanoff*
Department of Chemistry, University of Louisville, Louisville, KY, USA. E-mail: morgan.balabanoff@louisville.edu
First published on 15th August 2023
Two-tiered assessment structures with paired content and confidence items are frequently used within chemistry assessments to stimulate and measure students’ metacognition. The confidence judgment is designed to promote students’ reflection on their application of content knowledge and can be characterized as calibrated or miscalibrated based on its accuracy. Previous studies have often attributed students’ miscalibrated confidence rankings to metaignorance. In this qualitative study, interviews with general chemistry students were thematically analysed to provide a more robust understanding of the processes and factors students use when engaging with these metacognitive prompts in a chemistry assessment. Both calibrated and miscalibrated confidence judgments were observed independent of accuracy. Students who provided miscalibrated confidence judgments often relied on unreliable metrics, such as processing fluency, which can mimic content mastery, whereas students who provided more accurate evaluations of their confidence relied more heavily on a stable understanding of chemistry concepts. Many students cited previous experiences, underlying self-efficacy beliefs, and/or the use of test-taking strategies which negatively or positively impacted their confidence. These findings suggest that the confidence tier does capture students’ self-assessment; however, students’ confidence judgments are based on a range of factors independent of content knowledge, which may impede the utility of this metacognitive tool for students, researchers, and instructors.
In these two-tier assessments, confidence judgments have been paired with content questions and used to investigate the relationship between students’ chemistry content knowledge and their confidence in that knowledge. In addition to investigating this relationship, the confidence tier has also served as a way for students to practice self-assessment and strengthen metacognitive skill by providing an opportunity to reflect on their conceptual understanding. Having a strong grasp of their level of conceptual understanding is a critical skill for students as they continually encounter new topics over the course of their STEM education (NGSS Lead States, 2013).
While the confidence tier engages students in this important practice and the relationship between confidence and content knowledge has been used to make claims about students’ ability to self-assess, the range of factors students use to evaluate their confidence has yet to be investigated in the context of chemistry assessments. Without a robust understanding of how students are engaging in the confidence tier, there may be limitations on the claims that can be made regarding the relationship between students’ chemistry knowledge and confidence.
In the present study, semi-structured interviews allowed students to describe their thought processes and the factors which impacted their confidence rankings. This work aims to elicit the variation in students’ decision-making processes when evaluating confidence in their own content knowledge. Equipped with a more complete understanding of the factors that students are using in these contexts, instructors can more ably promote reflective learning and provide targeted support. This work seeks to underpin improved measurement of the construct of confidence so that claims can be better understood and utilized by researchers.
Students’ metacognitive skill when monitoring their learning processes has been shown to improve effective regulation of study and assessment performance (Thiede et al., 2003). Other studies have shown that learning outcomes improve when students’ metacognition is improved through in-class interventions (Thiede et al., 2011; Lavi et al., 2019) and that high performance on problem-solving tasks is more closely related to performance on metacognitive measures than aptitude measures (Swanson, 1990).
Growing literature in science education has strengthened claims that improved metacognition can enhance science literacy, teaching, and learning (Adey et al., 1989; Davis, 1996; Blank, 2000; Georghiades, 2000, 2004). The expectation for students to consciously interrelate chemical phenomena to solve problems in chemistry courses (Mahaffy, 2004) has prompted research on interventions and assessments targeting metacognitive skills in chemistry learners specifically. Studies of undergraduate chemistry students have found that explicit metacognitive regulation can improve learning performance in chemistry courses (Cook et al., 2013; Casselman and Atwood, 2017; Dori et al., 2018). However, implicit metacognitive monitoring alone was not found to improve chemistry students’ learning performance (Hawker et al., 2016), reinforcing the need to explicitly prompt metacognitive regulation.
Students engaging in metacognition are expected to reflect on the techniques and processes employed to engage in problem solving and make final judgments. Heuristics are frequently used tools which students employ to reduce their information-processing load during problem solving (Gigerenzer and Todd, 1999; Gilovich et al., 2002). Specific heuristics, or forms of intuitive reasoning, relevant to chemistry learners have been cited as particularly threatening to the deep, conceptual chemical thinking that is essential in chemistry education (Talanquer, 2014).
An affect heuristic is used to make judgments and decisions based on feelings evoked from the information at hand (Finucane et al., 2000). Evaluation of chemistry students’ conceptual understanding and the factors students use to self-evaluate have also been shown to correlate with affective variables including self-concept and situational interest (Nieswandt, 2007). The affect heuristic is operationalized as a decision-making tool which impacts choices made by college students regarding synthetic substances and chemical processes (Rozin, 2005). Problem-solving and reasoning in chemistry is often hindered by the use of this type of heuristic (Talanquer, 2014).
Another heuristic often used is processing fluency, a metacognitive experience determined by the relative ease or difficulty demanded by a cognitive process. Use of this mental shortcut often leads students to choose responses based on features which are processed fastest (i.e., most fluently) (Heckler and Scaife, 2015).
The use of test-taking strategies independent of content knowledge in chemistry problem-solving can result in applications of a cognitive rigidity heuristic. This heuristic, which is often applied by novice learners who utilize rigid problem-solving algorithms, leads students to fall back on processes or solutions that have worked for them in the past without recognizing how to use the presented information (Gabel and Bunce, 1994).
Explicit metacognitive regulation through meaningful engagement with students in their chemistry learning environments may mitigate the effects of these heuristics on chemistry students’ problem-solving processes by prompting the development of more productive chemical thinking (Talanquer, 2014).
In assessment development, calibration or miscalibration of conceptual knowledge is often measured through self-assessments of individuals’ accuracy in cognitive domains by prompting students to assess the correctness of their responses (Pallier et al., 2002). The agreement between self-assessments and actual measured accuracy are then utilized to make claims regarding calibration and metacognitive skill.
Metacognitive self-assessment outcome scores may take different forms, from subjective probabilities via rating confidence on a 1–100 scale, to dichotomous predictions of whether a performance was successful or unsuccessful, to a Likert-style scale coded to reflect degrees of confidence (Schraw, 2009). Quantitatively, these outcome scores have been used to measure the goodness of fit of these judgments, which may be done through measurement of accuracy, scatter, discrimination, and/or bias scores (Yates, 1990; Keren, 1991; Nelson, 1996; Allwood et al., 2005; Burson et al., 2006).
Each measure provides a different type of information; the measures complement one another, and no single measure is well suited for all situations (Schraw, 2009). For example, bias scores reflect the correspondence between personal assessments of accuracy and an empirical result: a positive bias score represents overconfidence, a negative bias score represents underconfidence, and a bias score of zero indicates accurate self-assessment (Pallier et al., 2002). Investigators have attempted to explain such over- and underconfidence phenomena through two prominent approaches: the heuristics and biases approach and the ecological approach.
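As an illustrative sketch only (not an analysis performed in this study), the bias score described above can be operationalized as the difference between a student's mean confidence and mean accuracy over a set of paired items, following the convention in Pallier et al. (2002):

```python
def bias_score(confidences, correct):
    """Compute a calibration bias score from paired item data.

    confidences: per-item confidence expressed in [0, 1]
    correct: per-item correctness coded as 0 (wrong) or 1 (right)

    Returns mean(confidence) - mean(accuracy): positive values
    indicate overconfidence, negative values underconfidence,
    and zero indicates accurate self-assessment.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("paired lists must be non-empty and equal length")
    mean_conf = sum(confidences) / len(confidences)
    mean_acc = sum(correct) / len(correct)
    return mean_conf - mean_acc

# Hypothetical example: a student averaging 80% confidence while
# answering only 50% of items correctly shows a +0.30 bias.
conf = [0.9, 0.8, 0.7, 0.8]
acc = [1, 0, 0, 1]
print(round(bias_score(conf, acc), 2))  # 0.3
```

Likert-style confidence tiers, as used in the present study, would first need to be mapped onto a numeric scale before such a score could be computed.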
The heuristics and biases approach claims that confidence miscalibration occurs due to general cognitive biases, heuristics, or a combination of both, which facilitate intuitive judgments (Tversky and Kahneman, 1996). The ecological approach, on the other hand, suggests that cues which are used to solve cognitive problems are provided by environmental knowledge (Gigerenzer, 1991). This approach posits that both the individual's response and confidence in that response are generated by the same environmental cue (Gigerenzer, 1991). These two approaches may be interpreted as attributing miscalibration to either environmental or personal influences, however the reciprocal interaction of these factors and behavioural influences as outlined by the social cognitive theory (SCT) (Fig. 1) provides a more robust and holistic explanation for students’ metacognitive calibration (Bandura, 1986).
Fig. 1 Reciprocal model of influences which impact learning outcomes as outlined by Bandura's social cognitive theory (SCT).
Miscalibration between accuracy and self-judgment is a well-documented psychological occurrence termed the Dunning–Kruger effect which characterizes the phenomenon of illusory competence, or metaignorance, a cognitive bias where individuals’ perceived ability exceeds their actual ability (Kruger and Dunning, 1999). This example of miscalibration has been recognized across disciplines and identified as a robust phenomenon in introductory chemistry courses (Bell and Volckmann, 2011; Pazicni and Bauer, 2014). In introductory chemistry, students with mistakenly high self-ratings were resistant to traditional forms of feedback, i.e., low exam scores, and were likely unaware of the need to take corrective steps to improve their performance (Pazicni and Bauer, 2014). Strengthening students’ metacognitive skills through providing opportunities to practice self-assessment has been shown to have a positive effect on student performance calibration (Nietfeld et al., 2005).
Framed by SCT, self-efficacy (SE) refers to “beliefs in one's capabilities to organize and execute the courses of action required to produce given attainments” (Bandura, 1977). SE can affect students’ motivational outcomes (Bandura, 1997), as learners who feel efficacious about learning are likely to engage in cognitive and behavioural activities that improve their learning. SE judgments are not, however, generalized feelings of success, but rather assess the level, generality, and strength of one's ability to act in pursuit of a designated goal. SE judgments can be measured using questionnaire items that are task specific, vary in difficulty, and capture degrees of confidence (Zimmerman, 2000). For example, students’ SE on a set of twenty cognitive puzzles could be assessed via the prompt: “Judge how many items of the previous task that you think you were capable of solving” (Cervone and Peake, 1986) which encompasses all three of Bandura's requirements.
Alternatively, confidence judgments are prompts which specifically target an individual's accuracy in a cognitive domain and are often presented as subjective probabilities of a successful outcome (Zimmerman et al., 1977). Where SE judgments more broadly encompass one's belief in their ability to act in pursuit of successfully completing a task, confidence judgments more closely focus on accuracy within a single domain. These question types both present opportunities for students to engage in targeted metacognitive reflection and may be used in two-tier assessments for chemistry students, the distinction merely lies in the focus of the self-assessment.
Two-tier assessments initially took shape as diagnostic instruments utilizing two-tier multiple-choice items to identify students’ knowledge of a scientific concept in tier 1 and their reasoning for that concept in tier 2 (Treagust, 1986). These assessment items have been presented as an alternative to more arduous qualitative processes for determining students’ understanding and identifying alternate conceptions in limited and clearly defined ways (Treagust, 1988, 1995). The two-tier assessment format was adapted to include metacognitive tools with the certainty of response index (CRI), developed to gauge individuals’ certainty in their response as a paired, follow-up question. CRI was developed in light of growing literature in the social sciences which promoted the higher effectiveness of immediate vs. delayed feedback on skill performance (Webb et al., 1994). CRI was utilized by Hasan et al. in a physics education assessment development study which sought to identify strongly held alternate cognitive structures regarding classical mechanics (Hasan et al., 1999). The “Hasan hypothesis” posits that high certainty-incorrect response miscalibration may identify students with strongly held alternate conceptions, and that low certainty-correct response miscalibration may identify guessing on assessment items (Hasan et al., 1999).
As more robust metacognitive judgment literature emerged and the construct of response confidence was clearly defined (Pallier et al., 2002; Allwood et al., 2005; Burson et al., 2006; Schraw, 2009), the Dunning–Kruger “metaignorance” phenomenon was identified and broadly documented across groups (Kruger and Dunning, 1999). In response to the NGSS call for improvement of metacognitive skill in science learners (NGSS Lead States, 2013) and work which identified a significant presence of the Dunning–Kruger phenomenon in introductory chemistry students (Pazicni and Bauer, 2014), two-tier assessment formats pairing confidence judgments with content questions have been increasingly used by researchers who seek to target chemistry learners’ metacognition.
Assessments and concept inventories targeting representational competence (Connor et al., 2021); fundamental pH concepts (Watson et al., 2020); redox reactions (Brandriet and Bretz, 2014); reaction coordinate diagrams (Atkinson et al., 2020); as well as enthalpy and entropy (Abell and Bretz, 2019) all utilize two-tier confidence judgments. Confidence/calibration data have been used to distinguish clusters of students whose data are similar to one another and dissimilar to those in other clusters (Han et al., 2012), which allows developers to group students based on calibration categories.
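A minimal sketch of how such grouping might look, assuming bias scores have already been computed for each student (the thresholds and student labels here are hypothetical, not taken from Han et al., 2012 or the present study):

```python
def calibration_category(bias, tol=0.1):
    """Assign a calibration category from a bias score.

    bias: mean confidence minus mean accuracy, in [-1, 1]
    tol: hypothetical tolerance band treated as "calibrated"
    """
    if bias > tol:
        return "overconfident"
    if bias < -tol:
        return "underconfident"
    return "calibrated"

# Hypothetical students with pre-computed bias scores.
students = {"A": 0.30, "B": -0.25, "C": 0.05}
print({name: calibration_category(b) for name, b in students.items()})
# {'A': 'overconfident', 'B': 'underconfident', 'C': 'calibrated'}
```

Cluster-analytic approaches would derive such groupings from the data rather than from fixed cut-offs, but the output serves the same purpose: flagging which students might benefit from calibration-targeted feedback.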
Improvement of student self-efficacy has been identified as a target to promote well-rounded and conceptualized chemistry knowledge (Dalgety and Coll, 2006; Kan and Akbaş, 2006; Villafañe et al., 2014; Ferrell and Barbera, 2015; Avargil, 2019), however two-tier assessments utilizing confidence tiers have largely been used as tools to identify metaignorance in chemistry learners who may benefit from interventions (McClary and Bretz, 2012; Brandriet and Bretz, 2014; Connor et al., 2021). The confidence judgment tool provides a measure of chemistry learners’ calibration of accuracy to confidence, delivering feedback for instructors to identify whether students are calibrated. It also allows for targeted instruction and provides students the opportunity to practice the skill of metacognitive monitoring and engage in comparative benchmarking with peers (Farh and Dobbins, 1989; Clinchot et al., 2017).
RQ1. Which strategies, factors, and thought processes do students describe while ranking their confidence on a General Chemistry assessment?
RQ2. What relationships can be observed between correctness of students’ answers and reasoning for their confidence judgments?
Beginning in Spring 2020 and for all subsequent interviews, the research team's RPV interview protocol was expanded to explicitly include probes exploring students’ response processes for their confidence judgments. Similar to the preliminary interviews, students were shown items from the assessment which served as a frame of reference for confidence judgments. For this study, the data for both content and confidence response processes were analysed, and relevant data from preliminary interviewees who discussed factors impacting their confidence without explicit prompting were included.
The interview protocol utilized from Spring 2020 forward included questions targeting responses to content items as well as additional interview items directed at participants’ perspectives and reasons for their confidence rankings. The interviews were semi-structured and designed to prompt students to describe the techniques they used to answer questions. Students were presented with a content question first and asked to describe the process they used to answer. They were then presented with the Likert scale-ranked confidence tier and asked to explain their chosen confidence ranking (Fig. 2). Prompts in the interview protocol asked students to expand on, describe, or connect strands presented between their content and confidence response processes. Students repeated this process for 3–6 items, depending on the length of the problems, to keep interview lengths consistent.
Interviews were audio- and video-recorded, transcribed using Otter.ai, and cleaned by the research team to confirm accuracy.
The qualitative codebook was structured to incorporate evolving codes as a result of the research team's recognition of new patterns and themes during data analysis (Miles et al., 2014). This resulted in segments being adjusted and recoded. As central themes were identified by the researchers, codes were grouped and recategorized utilizing constant comparative techniques (Glaser, 1965). Knowledge-related codes, confidence-related codes, and question-related codes were categories that emerged during this open coding process (Fig. 3).
Codes and categories which emerged from open coding were then further refined during the subsequent axial coding process. The impact of increasingly frequent codes on confidence was assessed, which led to the alteration of certain codes for clarity. For example, the Confidence category was separated into Self-Reflection and Time Dependence to distinguish instances in which students described their level of confidence or ability to respond in terms of their own judgments or experiences from instances in which they described their processing fluency.
Connections and relationships observed between various codes in one transcript were documented and evaluated by analysing their recurrence in the other transcripts. These relationships were frequently identified between codes pertaining to confidence level and correctness and codes associated with test-taking strategies and/or heuristics. This process was streamlined through use of MAXQDA's Complex Coding Query feature.
The final step of the coding process involved defining overarching themes which were used to categorize the set of codes from the previous axial coding process. These emergent themes focused on content knowledge, self-reflection, time dependence, and test-taking strategies.
Throughout the process, the first and second authors independently coded the same transcripts before meeting to discuss and refine until agreement was reached. Intermittent check-ins with the third author to confirm and clarify emerging codes allowed consensus to be reached by the research team. Each final code in the codebook was confirmed to be present in at least two transcripts at each institution. The lowest-frequency code observed was “Peer Comparison”, present in the transcripts of four students (two at each institution), and the highest-frequency code was “Correctly Applied Conceptions”, present in all twenty-two interviews analysed. The final codebook was agreed upon by the entire research team, and any transcripts utilizing old codes were re-coded by the first and second authors to align with the final codebook and revised until agreement among all authors was reached (Table 1).
| Code | Description |
|---|---|
| **Content knowledge** | |
| Correctly applied conceptions | Correctly applied a learned concept which was well-conceptualized |
| Incorrectly applied conceptions | Utilized and described an alternate, incorrect conception |
| Self-identified gaps | Describes a concept that they believe they should understand or know but do not |
| **Personal influences** | |
| Past experiences | Description of a past experience when describing confidence |
| Self-judgment | Provides an overarching, big-picture evaluation of own abilities |
| Comparing confidence across items | Compared confidence on a specific item to confidence on previous item(s) in the assessment |
| Peer comparison | Compares own performance to perception of their classmates’ abilities |
| **Time dependence** | |
| Retrieval ease | Describes their own efficient recall of information to answer a question |
| Retrieval difficulty | Describes longer time needed to retrieve information to be able to answer |
| **Test-taking strategies** | |
| Identifying key words | Identified key words for easily activated concepts and/or other relevant information |
| Elimination | Describes process of eliminating options to select a final answer |
| Guessing | Mentions a selection choice based on a guess rather than content |
Analysis resulted in the characterization of a broad scope of factors used by students to make metacognitive confidence judgments for chemistry assessment items. Students’ application of content knowledge was indeed identified as a factor used to rank confidence; however, extensive use of affective judgments, time-based judgments, and deployment of test-taking strategies were also observed in interviews by the research team. Selected student quotes that exemplify emergent themes are discussed across each research question.
“I would say I can answer this one well… My ranking is more so based on the concept that the question was asking me about this time. Phase changes have been covered since seventh-grade chemistry, I think it's been hammered. It got a bit specific talking about the bond lengths and density, temperature, but that's all pretty familiar to me.”
(Ted)
Ted's high confidence ranking was attributed to a strong conceptual understanding of chemistry content which he developed over time because of repeat exposures. He explicitly stated that his confidence ranking for this question was “based on the concept that the question was asking”, indicating that he considered how well he understood the target concept when reflecting on his accuracy. Ted's application of his conceptual understanding to answer this specific question led to a higher confidence judgment.
Despite many observable examples of students using content knowledge as a measure of confidence, the ability to correctly apply general chemistry concepts did not consistently correspond to high confidence judgments. When students answered correctly but did not describe content knowledge as impacting their confidence, their rankings were consistently average or low. Factors such as requiring extended time to answer and/or dependence on test-taking strategies often led students to express doubt and ultimately rank their confidence as average or low despite answering correctly. For example, Sharon correctly applied chemistry concepts to eliminate multiple response options designed as distractors, yet she ranked her confidence as average. Here, Sharon considered dispersion forces:
“Dispersion forces, I know that those are typically found in basically all molecules and compounds. So, they’re not permanent. I crossed out everything that said the word ‘permanent’ in here because dispersions are temporary. They’re not the same as induced dipole which changes based on the orientation of the molecules, I think.”
(Sharon)
After discussing the question content, Sharon went on to judge her confidence as average, explaining:
“I remember the big topic things. Whether I remember which definition goes to which, that's what's kind of up in the air. That's why I put average, because I might have gotten this right and I probably know what I'm talking about, but maybe I just picked the wrong one.”
(Sharon)
Sharon's doubt regarding how well she remembered a specific chemistry concept was linked to her use of elimination strategies, which lengthened her time to decide on an answer. When explaining the reasons for her confidence judgment, Sharon did not self-identify the working understanding of intermolecular forces which provided the foundation for her response. In this example, Sharon's metacognitive confidence judgment based on her knowledge was hindered by self-doubt and other interfering factors.
“I think it's [Option] B, probably. The reason that things form bonds is because they want to satisfy their octet, to be more stable […] The other choices were not as correct to me, like [Option] C says, ‘H and O attract because the valence electrons of each atom are attracted to the nucleus of the other’. I know that's true, but I feel like B is more true because that's why they want to bond.”
(Gretchen)
Gretchen correctly identified that the correct response (Option C) was a valid explanation, demonstrating a degree of content knowledge, but misattributed the octet rule as the driving reason for bond formation. She went on to make a confidence judgment which reflected her struggle to fully reject either Option B or C:
“I guess average, but I’m still a little confused. I don't know. If I reviewed it before taking the test, I would probably be better at answering this. I generally know the concept and would be comfortable explaining it to someone, but I could also imagine saying ‘Oh, I’m not sure, let me check the book.’”
(Gretchen)
Alternate conceptions like the previous example, in which students attribute bond formation to the teleological explanation of atoms “wanting to fulfil an octet”, are often held due to instructors’ use of simplified language when introducing concepts for the first time (Bodner, 1986; Talanquer, 2014). These conceptions can be strongly held, resistant to change, and often persist over time (Bodner, 1991), which makes them even more difficult to metacognitively assess for those with lower metacognitive skill.
A deficit in metacognitive skill was observed in students who confidently applied such alternate conceptions to assessment items. Students were able to construct thorough arguments for incorrect response options using alternate conceptions which often resulted in more confident judgments. Gloria explained her thought process when she was asked to choose which free energy diagram most appropriately described the formation of water from its constituent elements.
“In the [chemical] equation there's hydrogen and oxygen separately and then they come together so that would be a decrease in entropy. Then we just have to think decreasing entropy means nonspontaneous reaction and then nonspontaneous means the ΔG is going to be greater than zero which would mean it's an endothermic reaction. Which means products will be higher energy than reactants. That's how I picked C.”
(Gloria)
This explanation demonstrates another common alternate conception frequently held by students, that a decrease in entropy indicates that a reaction is non-spontaneous under all conditions (Teichert and Stacy, 2002). Gloria went on to rank her confidence as high based on her perception of piecing multiple ideas together.
“I feel pretty confident in my answer being right. I just think about how much information I could remember, what the teacher said about it, or that I wrote it down in my notes to decide if I felt confident about my answer or not.”
(Gloria)
Alternate conceptions regarding the relationships between chemical bonding, spontaneity, and entropy, have been shown to relate to confusion and unwillingness to reconcile contradictory information in chemistry students (Teichert and Stacy, 2002). Ultimately, Gloria integrated an alternate conception seamlessly into her train of thought which resulted in a metacognitive judgment of high confidence.
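For reference, the step in Gloria's chain of inference that fails is the move from negative ΔS directly to nonspontaneity: spontaneity is governed by the Gibbs energy, which for the formation of liquid water at 298 K (using approximate standard tabulated values) is strongly negative despite the entropy decrease:

```latex
% H2(g) + 1/2 O2(g) -> H2O(l) at 298 K
\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ
\qquad
\Delta H^\circ \approx -286\ \mathrm{kJ\,mol^{-1}},
\quad
\Delta S^\circ \approx -163\ \mathrm{J\,mol^{-1}\,K^{-1}}

\Delta G^\circ \approx -286\ \mathrm{kJ\,mol^{-1}}
  - (298\ \mathrm{K})(-0.163\ \mathrm{kJ\,mol^{-1}\,K^{-1}})
  \approx -237\ \mathrm{kJ\,mol^{-1}} < 0
```

The reaction is thus spontaneous (and exothermic) even though the system's entropy decreases, because the enthalpy term dominates at this temperature.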
“I feel like if I were to know what I could use that [constant] for it would have been helpful for sure, because when I take chemistry exams or in my homework if I was given a constant, I always felt a lot better.”
(Lily)
Lily's low confidence in her incorrect response indicated a degree of metacognitive skill. While Lily was unable to correctly apply content knowledge to respond to assessment questions, her ability to reflect and identify weaknesses in her own knowledge structures was captured through the metacognitive tier.
“I'm gonna put well for this one, just because I feel like I'm remembering these things correctly. We've talked about this topic in, not only in Gen Chem 1, but we also talked about it in the second one. So, I feel like it's been very reinforced and that's why I put well instead of average. I would say out of 100% I feel like 75%. During the school year, my friend was like ‘Oh, I don't get this’ so I would frequently have to explain this topic to her.”
(Sharon)
Sharon framed repeated exposures to the concept in a variety of settings as experiences which influenced her high confidence ranking, rather than strictly applying content knowledge. When positive past experiences were related to confidence rankings, they were frequently applied in these topic-specific contexts related to repetition by instructors or peer-mediated interactions focused on one content area. These repeated cues regarding a specific content area can be considered environmental influences which impact metacognition when framed by SCT (Bandura, 1986).
Alternatively, negative past experiences more generally described students’ broader experiences with chemistry courses and interactions with their professors. One student who brought up negative interactions with instructors during the interview explained how these experiences led to a decline in her engagement with the online homework and with her professor's office hours:
“I'd go with low confidence. I didn’t really get much help from the online homework for this, it was more about getting the points as opposed to actually doing it to ‘get’ it. I struggle with online homework in chemistry because I like to physically work stuff out and it was hard because [my professor] was the only professor teaching this course and I had a really full schedule last semester. I would go into [my professor's] office to talk to [them] about homework questions, but there would be 15 people in front of me so I couldn't get them answered. Then I’d find myself referring to online websites just to get the right answers as opposed to actually working through and understanding them.”
(Celeste)
Similar to Celeste, Lily related her overall experience in chemistry to her confidence on a specific item stating that she generally did not feel confident in the course.
“I definitely don't have any confidence in the answer that I chose. In general, the entire time I was taking chemistry, it felt like every time we were on a new part, I was still trying to figure out the last part.”
(Lily)
For these students, their ability to metacognitively engage with the confidence tier was overridden by powerful feelings evoked by their previous experiences, either specifically with a content area or more broadly in their learning environments. In these cases, strong affective associations impacted students’ perceptions of their content knowledge.
This feeling state response demonstrates the mental shortcut of an affect heuristic which students used to guide their confidence judgments. The use of affect heuristics leads to choices that are consciously or unconsciously guided by affective feelings (positive or negative) tied to experiences with the content area (Finucane et al., 2000). When affect heuristics were observed, students described the feelings evoked as directly influencing their confidence judgments. When students choose confidence rankings based on emotional reactions to the assessment item, it can be reasonably assumed that they are not effectively engaging in metacognitive reflection. Confidence judgments are then based less on the experience of answering the content question and more on shortcuts in response to emotions triggered by the assessment item.
Generalized positive self-judgments reflected a strong identity in STEM as a successful student. These judgments allowed students to overcome doubt or even a lack of content knowledge by buoying confidence rankings. When interviewed, Gus consistently described his confidence as high even when he acknowledged struggle or extended time required to select an answer. At the end of the interview, the research team prompted Gus to reflect on his overall interaction with the confidence tier and his consistently high rankings:
“I believe in myself, so I if I take an ‘L’, I take an ‘L’. If I get a ‘W’, I get a ‘W’. I’m still going to be confident in myself when approaching the answers, so I will usually give myself ‘well’ and I'll always be ranking myself high. Whenever it comes to each specific question… it'll probably be a range, but overall, [my ranking would be] well. Whether I get questions right or wrong… if I feel like I'm at least on the right track, and if I need to study a little bit harder for something then I guess I will. Overall, I'd say it would always be well.”
(Gus)
Gus demonstrated a strong, underlying positive judgment of his abilities in chemistry which translated into all high and average confidence rankings despite acknowledging that he may be answering some questions incorrectly. One possible explanation for these generally positive rankings is that Gus's confidence judgment process exemplifies the affect heuristic: his positive feelings towards his overall ability to succeed on a general chemistry assessment dictated his decision to rank himself as highly confident.
Not all self-judgments students described were positive, however. Students who exhibited negative judgments did so in reference to their overall ability in chemistry or in general assessment settings. These static negative evaluations were pervasive during the interview regardless of the concept being assessed. Lily described her confidence judgment process as a response to her assessment anxiety:
“I feel like the majority of time I start to go into panic mode and it's like, ‘Okay I have to answer, I have to pick something, it has to be right.’ Because I do that, I definitely answer a lot of things incorrectly because I don't let myself have time to think it through and work it out. I always answer questions incorrectly and the way that I answer the questions I could probably be doing something differently.”
(Lily)
Lily's descriptions of her confidence judgments used particularly severe language such as “panic mode”, “it has to be right”, and “I always answer questions incorrectly”. Based on these thought processes, extreme language, and sweeping generalizations, Lily's ability to metacognitively engage with the confidence tier was consistently overridden by critical self-judgment and static evaluations of her understanding. The reasons she cited when ranking her confidence were not directly tied to her performance on the items at hand but were rather a product of underlying negative self-judgments.
These examples of high and low baseline confidence due to self-judgments could be students’ use of self-efficacy judgments as a basis on which to rank themselves rather than confidence in their understanding. Self-efficacy beliefs are an example of personal influences which interplay with environmental and behavioural influences to affect students’ learning and metacognition (Bandura, 1986). If students identify themselves as particularly efficacious or inefficacious, these feelings of self-efficacy may be translated to their confidence rankings as fixed overall confidence which lacks sensitivity to each individual response provided. If the target purpose of the confidence tier is to promote metacognition, students with extreme positive or negative self-efficacy beliefs who interpret the prompt as a general reflection on their ability to answer chemistry assessment items may not be benefiting.
“I would say I was able to do this one well. And earlier… I'd say it'd be well instead of average. I'd say well, not very well, because immediately I was like ‘Oh, I know this concept. I know what neutrons are, I know what protons are’ so I was able to take this off. And I know that electrons and protons are not what determines an isotope. But I was not able to confidently remove a second option.”
(Roy)
As they were exposed to metacognitive practice over the course of the interview, students exhibited the ability to align their response processes within a set of personal criteria and described an understanding of their abilities with more detail and rigour. This has been demonstrated on a larger temporal scale in a longitudinal study which showed that students with more experience in chemistry learning environments exhibited better calibration than those with less experience (Atkinson and Bretz, 2021).
“I want to put average because I feel like I said some things that were true but I'm gonna put poorly because I probably got it wrong to be honest. I know some people I was in class with definitely know the answer to this because it was on our tests, and it was something we had to understand.”
(Sharon)
At another point in the interview, Sharon would shift and describe her confidence resulting from a stronger conceptual understanding relative to her peers:
“I feel like other people probably don't remember this, but I'd like to say that I remembered the big topics… that's why I put average.”
(Sharon)
Whether she felt she had less skill or more robust knowledge than her peers, Sharon described a comparison between her own understanding and that of her classmates. If students interpret the confidence tier primarily as a tool for peer comparison, it may limit their ability to engage in reflection on their own understanding, ultimately diminishing both the intended effect of improving and engaging students’ metacognition and the conclusions an instructor could draw about students’ perceptions of their conceptual understanding.
“I’d pick very well. I didn't have a lot of trouble with this question because it was a concept I was familiar with. And [the ranking] is based on how much I thought about it. Which was not very much. When I was doing the test it was just like, ‘that's water’. It was pretty straightforward.”
(Gretchen)
In instances where a quick response reflected the use of a processing fluency heuristic rather than correct conceptual knowledge, students still identified speed as an indicator of high confidence. In this example, Cady was asked to identify a bonding model representing a water molecule and chose a distractor that did not account for hybridization. Her confidence was based on quick identification, and no rethinking was observed.
“I was the most confident with that question so far, I would say I answered it very well. When I was reading the question, I was already able to already visualize how I have learned and remembered a water molecule to look like. So, matching that up was pretty straightforward and that's why I feel that way.”
(Cady)
Students who described quick thought processes resulting in rapid response times discussed how this retrieval ease improved confidence in their response. In addition, these students were not observed to go back and revisit other options to confirm their initial choice. When the first idea that came to mind was provided as an option, they often viewed the rapid response time as confirmation of a correct understanding.
“I would say average because it took me a while and I had to think about more. I guess it was more than just my general knowledge of chemistry. I just had to think a little bit more so there's more room to second guess myself.”
(Gretchen)
Gretchen went as far as identifying that when she took longer, she engaged in rethinking and even began to doubt her response, resulting in an average confidence ranking. This experience echoes previous literature using FOR judgments which relate extended rethinking times to an increased probability of answer change (Thompson et al., 2011). For both retrieval difficulty and retrieval ease, time was an unreliable signal for students as demonstrated by the varying degrees of success in their confidence rankings.
Students cited this phenomenon as a test-taking strategy by describing the process of identifying key words wherein they would quickly scan the question and response options for easily activated concepts and immediately eschew other relevant information provided. Regina stated that she was able to apply this strategy which led to an average confidence ranking despite the fact that she was not familiar with the concept of dispersion forces. She also discussed her past experiences specifically with how she applies test-taking strategies:
“I’m going to go with average since I didn't know the content super well to be able to answer it. But I used what I did know to try to eliminate some. I grew up doing academic team and taking standardized tests, so I’m used to using the strategies. Like, for this one, since I didn't really know the content, I picked apart the key words that were different in each of the answers, and then answered it based on the content. The first step in answering this question was just a standardized test-taking strategy rather than a chemistry strategy. I think that really helps you in taking questions like this and answering them.”
(Regina)
Regina frequently applied strategies to answer when content knowledge alone was not enough to choose a response, and her use of strategy directly informed her confidence ranking. Regina, like many other students across interviews, relied on both past experiences and test-taking strategies, demonstrating the complex overlap and intersection of these factors.
“It really made me think a lot, evaluating myself and keeping it as the same scale when I’m comparing yourself between questions. Like if I was struggling between two choices, maybe I could put well. And maybe if I knew the right answer immediately, I would put very well. If I could only cross off one choice, maybe I could put average, but I’d have to make sure that I do that on every other question too. Having the same scale when it comes to all the other questions is what I was trying to do. Just making sure I was comparing them the same way, so that way I could look back at it and know, ‘oh yeah, I didn't know this one because I didn't know what the choices meant’.”
(Haley)
Haley identified that she had developed criteria for her confidence judgment and carefully based it on how many response options she was able to eliminate. She explained why the use of defined criteria was useful for her:
“I just feel like it'd be easier for me to compare them together when it comes to studying. Like if I had an exam coming up, I would know that isotopes are something I really need to review because compared to the other questions, I struggled with them the most. So, having them be similar to each other when it comes to scaling would be helpful so that you can compare them and look back at it to decide how much time you really need to spend studying each topic.”
(Haley)
Haley was able to clearly outline how she utilized her content knowledge within her application of test-taking strategies, ultimately to further target her study efforts. This criteria development indicates significant metacognitive skill and awareness and may be a useful framing when students are learning about metacognition. The successful application of this strategy, however, seems more indicative of the stability of the content knowledge used to eliminate response options than of robust metacognitive skill. That is not to say that students who utilize elimination strategies are not engaging in metacognition; the above quote from Haley exemplifies a circumstance where chemistry content knowledge, the use of a test-taking strategy, acknowledgement of processing fluency, and metacognitive reflection are all present within the confidence judgment. Without the supporting conceptual understanding, elimination strategies can be an unreliable basis for confidence rankings.
The Hasan hypothesis posits that students who answer correctly but rank their confidence as low may be identified by instructors and researchers as possible “guessers” (Hasan et al., 1999). While this was the case in some instances, students’ low confidence rankings for correct responses were generally more nuanced and often based on affective factors or stemmed from a potential self-identified gap in content knowledge rather than on true guessing. Generalizing all responses which fall into the low confidence and correct category as guesses neglects several dimensions of student understanding and metacognition.
Interestingly, students whose ability to correctly guess in the absence of chemistry knowledge has been reinforced through past experiences may more closely fall under the “metaignorant” calibration category. Regina cited guessing as a factor which actually improved her confidence based on her successful history of guessing on assessments.
“It was my gut guess. I've always been pretty good at guessing on standardized tests, so I eliminated those two pretty quickly. And then out of the two left, I was just like, I'll pick that one. But doesn't mean it's right, but that's just what I was thinking… my first guess.”
(Regina)
Regina explained that her previous experiences in successfully guessing increased her confidence while other students aligned with the Hasan hypothesis, highlighting variation from student to student. Clustering groups based on perceived metacognitive calibration in two-tiered assessments may have the unintended consequence of inadvertently mislabelling students like Regina due to the complex interweaving of factors which impact their confidence rankings.
“So now since I've said ‘well’, I would say that not having reasons for the other answers to be wrong would be the distinction between ‘very well’ and ‘well’. So ‘very well’ would be when I could argue why each wrong choice is wrong and know I’m right because I have that information. But ‘well’ would be when I know which one is right, but I don't know why some of the other choices aren't right. I feel like whenever I started these ranking things I just picked ‘very well’ but then as it went on it became like ‘compared to the last question, it's not ‘very well’, it's just ‘well’’. It just gets more defined as I go along. Like in the beginning, like the first question, if it was hard for me then I'd be less inclined to say, ‘average’ and more inclined to say, ‘very poorly’, because I don't know what's coming. I guess I'm comparing it to the other questions.”
(Gretchen)
This indicates that significant cognitive resources may be devoted to structuring and restructuring a personal rubric for confidence judgments across an entire assessment.
Students responded to the confidence tier positively, indicating that the tool was helpful for them and could be applied in their future studies. Cady discussed how she would engage in metacognitive reflection and self-judgment to guide her study efforts.
“I think this is great… I'm actually going to take this with me in my studies this coming year. I feel like it's really important to just see how you feel about certain questions because it allows you to have a starting point for things you need to work on and things that you can skip over. I know when studying sometimes I go over things that I'm already good at just for a confidence booster, and honestly that's kind of a waste of time. So, I like this method a lot actually.”
(Cady)
When the utility of the tool was verbalized by the student within the interview, it was well-conceptualized and described as being applicable in contexts outside of chemistry and outside of summative assessments. Cady was able to describe in detail how she would use metacognitive confidence judgments in the future while studying to direct her focus and attention and ultimately improve her performance on assessment tasks. This exemplifies the interaction observed between environmental and personal influences (i.e., feedback and self-reflection) and behavioural factors (i.e., targeted study tasks) guided by the metacognitive tier when considered within the theoretical framework of this study.
Possible response patterns are outlined in Fig. 4. Student responses were categorized as correct or incorrect, and the calibration of their confidence was based on their selection of a high or low confidence ranking. Response patterns outlined in the dashed box represent miscalibrated confidence judgments.
Fig. 4 Classification of calibration categories based on students’ response patterns. The dashed box represents miscalibrated response patterns for both high and low confidence.
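For readers who prefer a procedural statement of the Fig. 4 scheme, the four response patterns can be sketched as a simple decision rule. This is a minimal illustration, not the authors' coding procedure; the function name is hypothetical, and the mapping of the HC and LI labels to the calibrated cells follows the abbreviations used later in the text.

```python
def classify_calibration(correct: bool, high_confidence: bool) -> str:
    """Classify a two-tier response into one of the four Fig. 4 categories."""
    if correct == high_confidence:
        # Accuracy and confidence agree: a calibrated judgment.
        return "calibrated (HC)" if correct else "calibrated (LI)"
    # Accuracy and confidence disagree: the dashed-box (miscalibrated) patterns.
    return ("miscalibrated: overconfident" if high_confidence
            else "miscalibrated: underconfident")
```

Framed this way, the dashed box in Fig. 4 simply collects the two cells where the content tier and the confidence tier disagree.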
Two main observations regarding calibrated, highly confident students can be made based on HC interview responses. First, students who rank their confidence as high are often ranking their confidence based on how well they performed within the constraints of an assessment rather than how well they conceptually understood and applied their chemistry knowledge. This is likely a result of the language used in the stem of the confidence tier, which asks students to rank how well they felt they answered the question. The resulting measurement therefore captures how well students feel they navigate exams rather than their confidence in their chemistry knowledge, lessening the benefit of engaging in metacognition within a chemistry assessment context. Secondly, the prevalence of students basing their confidence on test-taking strategies independent of correct conceptual understanding may be considered a limitation specific to multiple-choice assessments, which are uniquely suited to strategic thinking. For the confidence tier to function as intended, and for it to accurately measure the construct it is designed to, students may require priming or more targeted language to outline the purpose of engaging in metacognitive practices and direct their self-reflection towards content mastery rather than assessment competence.
Successfully engaging in metacognitive calibration generally serves students well as they engage in both formative and summative assessments. For students whose responses fall under this LI category, however, their rankings are influenced by the emotions evoked by the experience of being assessed, both by the assessment and by themselves. These affective domains are not directly targeted by the confidence tier, but they are captured as students revert to their mindset from the learning process. While the data being gathered accurately reflects the low confidence of students with lesser chemistry knowledge, the negative focus and self-talk occurring in response to this tool can be, at best, distracting and, at worst, damaging to this population. This may have unintended consequences which could impede their ability to self-assess in the future.
In interviews, students who selected high confidence when incorrect were often observed to hold deep-seated alternate conceptions which prompted a high level of confidence. Students with alternate conceptions cited processing fluency because the information they drew on was readily available or easily activated, and these signals served as indicators that they knew the concept well. However, these parameters produced a false positive which incorrectly signalled conceptual understanding to students. Possessing any alternate conception indicates that a student is not a complete novice, as they have some level of understanding of chemical concepts. At this stage in their education, students have begun to master some topics, making them feel confident in their content knowledge. Because students are enrolled in an introductory chemistry course and are relatively new to chemistry content at large, they may be at the stage where they have enough working knowledge to engage with the material but are still misunderstanding key concepts. False-positive signals of processing fluency corresponding to an alternate conception consequently made students feel overconfident in their knowledge. In some instances, quick activation of a concept can appropriately signal correct conceptual understanding; however, it can be particularly deceptive when the student holds a deep-seated alternate conception. As such, these signals should be used with caution, and this caveat is an important discussion point when asking students to engage in metacognitive tasks.
Students who are miscalibrated through overconfidence do not recognize crossing the boundary between the concepts they correctly and incorrectly understand. Knowing when this boundary is crossed is the target metric of metacognition, and improving awareness of such crossings is central to engaging in metacognitive tasks. The challenge remains that even experts are sometimes unaware of their unknown unknowns, that is, unaware of the boundary crossing. Improving metacognitive skill therefore remains a target for chemistry learners.
The students in this category often possessed correct conceptions regarding the chemistry content but were not confident in that knowledge. In some cases, students described past experiences of initially struggling with the material but did not consider whether they had since developed a better understanding of that concept. Their past experience of struggle overrode any present reflection on conceptual understanding. In other cases, students doubted their answer because of their inability to eliminate all but one response option. Due to the nature of multiple-choice assessments and the prevalence of test-taking strategies, many students cited being stuck between two options, or being unable to confidently eliminate the remaining options, as metrics which cast doubt on their confidence. Many of these students were able to correctly explain a concept aloud to interviewers but then felt doubtful as they engaged with the multiple-choice question. In these instances, test-taking strategies prevented students from considering their content knowledge alone. For students in this underconfident miscalibration category, it may be helpful to prompt them to generate an answer to the stem prior to reading the response options and then select the option that best matches their initial thought.
In many cases, selecting average indicated that a secondary dimension may be encroaching on a student's ability to engage in true metacognition. For example, Ted held substantial content knowledge but expressed doubt in many of his responses, and ultimately ranked half of his responses as average. In another case, Gus possessed high underlying self-efficacy which may have overinflated confidence in his application of content knowledge on specific questions. In these scenarios, overriding self-efficacy perceptions can interfere with effective metacognition.
In other instances, a tendency to compare one's performance to peers’ may lead students to rank themselves as average in an effort to place themselves within the group. Again, this indicates less engagement with metacognition and more with ancillary thought processes separate from their accuracy on the assessment.
During interviews, accuracy-confidence calibration was elicited as a point-in-time measurement, meaning that claims cannot be made about the stability of confidence in content knowledge over time based on the findings. While the interviews collected came from two different higher-education institutions, demographics and subsequent trends were not accounted for within the confines of this study.
Future work includes investigating the intersection of demographic, environmental, and/or identity factors and their role regarding student confidence and calibration, as well as exploring the possible influence of social desirability on students engaging in metacognition while speaking with an interviewer.
If the goal is to measure confidence to make more comprehensive claims about students’ content knowledge, there is significant evidence from response process validity interviews that external factors are impeding the assessment from capturing metacognitive skill and content-centred confidence. There are several instances where students are engaging in calibrated metacognition. However, the processes that students outline in their responses show a host of influences beyond content mastery when a confidence ranking is chosen. The issue is not that the confidence tier does not capture confidence judgments, but rather that it captures additional information.
If the goal of including a confidence tier in a two-tier assessment is to promote metacognition, assessment designers in both research and instructional settings should consider that students are considering factors beyond strict metacognition. This tier may serve students who already possess strong metacognitive skill to engage in the practice but may prompt those without significant experience with metacognitive monitoring to spend valuable cognitive energy thinking about external factors.
For students who rank themselves based on their perceived understanding relative to their peers, the confidence tier is providing incomplete information about individual students’ metacognitive abilities to those analysing its output. Ranking oneself relative to the class is likely a symptom of the U.S. higher education system where curving grades is a common practice. This means that effort must be put into priming students on the utility of the confidence tier to better contextualize confidence as relative to the material rather than relative to their peers. Explicitly priming students by explaining the target construct of the confidence tier before they take a two-tiered assessment may provide them with more clarity and purpose to engage in metacognitive practice.
Restructuring both the testing stem and response options to clarify the purpose of the confidence tier may improve both the promotion and measurement of students’ metacognition. One technique that could assist in clarifying the metacognitive target of the confidence tier for students would be using more targeted stems in prompts. Many assessments using confidence tiers utilize language such as “how well do you feel you answered the previous question?” and may benefit from rephrasing. Specifically, directing students’ attention to reflect on how effectively they were able to apply their content knowledge would assist in narrowing their focus. Additional clarity can also come with restructuring response options. A more thorough understanding of students’ calibration may come if an average option is not provided in a confidence tier, as for many students this option is chosen as a null response (Chyung et al., 2017) based on peer comparison or an underlying baseline self-efficacy feeling. Constraining students to select either high or low confidence has the potential to promote targeted metacognitive engagement as well as provide richer data for analysts. In some studies, the removal of a mid-point option resulted in respondents selecting either a more positive or negative option (Worcester and Burns, 1975; Garland, 1991), suggesting context-specific engagement with Likert-scale items. As such, the removal of average from confidence items warrants further investigation in the context of chemistry assessments.
For continued use of confidence tiers, these findings highlight the need for assessment designers to consider what the intended construct of interest is and design item stems to specifically capture that construct. As previously posited, miscalibrated high confidence can be an indicator of deep-seated alternate conceptions. Miscalibrated low confidence, which has previously been cited as a hallmark of guessing, more accurately provides insight into environmental factors such as students’ preconceived notions about their own chemistry abilities. These inferences may become more reliable as the clarity of the testing stems and the structure of response options are improved, and with the inclusion of explicit priming.
The interpretation of this assessment output by instructors to focus instructional changes or target individual miscalibrated students may be most useful if paired with explicit instruction regarding how results will be used. A recent study found that students’ confidence rankings in two-tier question formats were highly calibrated across all performance levels when surveyed using clickers during class periods (Bunce et al., 2023). This presents a compelling argument for the use of two-tier assessment items in low-stakes formative assessments, allowing instructors to provide immediate feedback on miscalibrated concept areas. This may be especially beneficial because timely feedback could reduce the interference from external factors observed in the current study, which stemmed in part from collecting interviews after the assessment. Ongoing interaction with confidence rankings in formative assessment settings during a course gives students practice to better engage in metacognition, providing higher quality feedback for instructors and stronger metacognitive skill development for students. When asked to engage with the confidence tier in interviews, students responded positively, and some went as far as to indicate that they would engage in their own confidence reflections independently in future assessments. This benefit would be further supported by follow-up instruction and discussion of the utility and importance of metacognitive monitoring.
It is imperative in promoting metacognition to move away from a deficit mindset when students are misinformed or uninformed. An alternative framing for miscalibration of confidence rankings is as an opportunity to improve metacognitive skill and better prepare students for future educational endeavours rather than as an opportunity to identify students with misaligned perceived accuracy for the sake of informing them that they are misaligned. Metacognition is a powerful and important tool in a chemistry student's toolbox, so assessment tools which target this skill offer more holistic student feedback. It is up to those who use these tools in assessment development, both in research and instructional contexts, to decide what kind of metacognitive task is most beneficial for that particular context. If the confidence judgment is the best fit for the assessment through targeting accuracy in a specific cognitive domain, some adjustment may be warranted to better capture strict metacognition of content application. This may mean utilizing confidence rankings more frequently in formative assessment settings and/or improving and narrowing prompts and response options. Alternatively, well-designed self-efficacy prompts may provide the information desired by the assessment developer which would call for its own specific set of criteria to be met. Ultimately, assisting students in developing an accurate understanding of the utility of the metacognitive tier chosen will allow for it to be more reliably used to identify and combat alternate conceptions and improve chemistry instruction.
This journal is © The Royal Society of Chemistry 2023