Stephanie A. Berg and Alena Moon*
Department of Chemistry, University of Nebraska, Lincoln, Nebraska, USA. E-mail: amoon3@unl.edu
First published on 13th October 2021
To develop competency in science practices, such as data analysis and interpretation, chemistry learners must develop an understanding of what makes an analysis and interpretation “good” (i.e., the criteria for success). One way that individuals extract the criteria for success in a novel situation is through making social comparisons, which is often facilitated in education as peer review. In this study, we explore using a simulated peer review as a method to help students generate internal feedback, self-evaluate, and revise their data analysis and interpretation. In interviews, we tasked students with interpreting graphical data to determine optimal conditions for an experiment. Students then engaged in social comparisons with three sample responses that we constructed, comparing each sample to their own response. We present a model informed by social comparison theory that outlines the different processes students went through to generate internal feedback on their own analysis and response. We then discuss the different ways students use this internal feedback to determine if and how to improve their response. Our study uncovers the underlying mechanism of self-evaluation in peer review and describes the processes that led students to revise their work and develop their analysis. This work provides insight for both practitioners and researchers seeking to leverage students’ internal feedback from comparisons to support self-evaluation and revision of their performance.
Many of the documented challenges begin when students work with empirical data. Students may fail to differentiate important data from unimportant data (Jeong et al., 2007). Students may also focus on surface features of data and ignore salient features that target the given phenomenon (Kanari and Millar, 2004; Heisterkamp and Talanquer, 2015). This can lead students to uncover less relevant patterns in the data that do not effectively target the phenomenon (Zagallo et al., 2016). Focusing on these surface-level patterns in a dataset may lead students to miss the relevant scientific concepts (Lai et al., 2016). Students also face challenges when connecting patterns back to the target phenomenon. Many students will form conclusions with misconstrued reasoning or neglect using scientific reasoning entirely when connecting uncovered patterns from datasets to the target phenomenon (Heisterkamp and Talanquer, 2015; Becker et al., 2017).
To overcome these challenges and support students in developing competency in data analysis and interpretation, we propose peer review as a method to help students develop evaluative judgment in their data analysis and interpretation. In our study, we simulate peer review to explore how critiquing peers’ work helps learners develop evaluative judgment. Understanding how students evaluate their own work when giving feedback to others can inform and improve peer review practices in the classroom. Additionally, it can offer a practical approach to supporting undergraduate students’ development of competency in science practices.
Similar challenges have also been identified for data analysis and interpretation in chemistry contexts. Like in many other domains, chemistry students will often rely on surface features or less relevant features of data representations and models to form conclusions or construct explanations (Heisterkamp and Talanquer, 2015; Becker et al., 2017). In addition to this, many chemistry students will use misconstrued reasoning or neglect to use reasoning entirely when engaging in data analysis and interpretation (Heisterkamp and Talanquer, 2015; Becker et al., 2017). In a case study investigating the major types of reasoning general chemistry students use when engaging in data analysis and interpretation, participants relied on “hybridized” reasoning and mixed intuitive knowledge with their chemical knowledge when producing explanations (Heisterkamp and Talanquer, 2015). In another study investigating how students construct mathematical models to describe rate laws from empirical data, many of the students did not connect the mathematical model they had produced to the actual trends in the data (Becker et al., 2017). Becker and colleagues also found that some of the participants engaging in the data analysis and interpretation had produced conclusions without even consulting the kinetic data given to them. This is perhaps the most problematic approach to data analysis and interpretation, as the Next Generation Science Standards states that students must “present data as evidence to support their conclusions” when engaging in data interpretation and analysis (National Research Council, 2012).
The current literature in psychology, science education, and chemistry education has described how students engage in the practice of data analysis and interpretation and documented common challenges for students; however, little work has explored how to support the development of students’ data analysis and interpretation skills (Zagallo et al., 2016), and no work that we know of has done this in chemistry.
The process of receiving feedback in peer review has received much of the attention within peer review literature. Receiving feedback from multiple peers can help students evaluate their work and make changes to improve the quality of their work more so than only receiving feedback from an instructor (Cho and MacArthur, 2011); however, receiving feedback from peers does not guarantee a student will make necessary revisions to their work (Finkenstaedt-Quinn et al., 2019). Students must recognize the value of the feedback they are given and make judgements on what feedback must be incorporated, while also managing affect surrounding the feedback (Carless and Boud, 2018). This process of enabling feedback uptake takes time and labour to develop for both instructors and students.
Recent work has found that the gains from receiving feedback are less than the gains from giving feedback in peer review (Lundstrom and Baker, 2009; Cho and MacArthur, 2011; Anker-Hansen and Andrée, 2015; Ion et al., 2019; Nicol and McCallum, 2021). Giving feedback appears to engage students differently than receiving feedback from others. When giving feedback, students make comparisons with their own work (Nicol et al., 2014; McConlogue, 2015; van Popta et al., 2017). The student's own product will often serve as a reference to compare against. The comparison process allows for students to engage in active reflection on the task criteria and their own work (Nicol et al., 2014; McConlogue, 2015; Nicol and McCallum, 2021). Through producing feedback for others, students can generate internal feedback to inform and revise their draft to be in better compliance with their understanding of the task criteria. Students have reported that revising their draft in this way reduces the need for receiving feedback from peers, as they had already made the changes suggested to them by reflecting on their own work (Nicol et al., 2014; Anker-Hansen and Andrée, 2015; Nicol and McCallum, 2021).
To better understand how students evaluate and revise their own work when giving feedback to others, we can first consider the process of revising a written draft. Previous studies in college writing have found that when making revisions, students engage in a four-step process (Flower et al., 1986). First, they define the task, gaining a deeper understanding of what must be done. This part of the review process is further supported by students self-reporting that they are able to take the perspective of an assessor and better understand the given standards for the task when providing feedback (Nicol et al., 2014). Second, students detect any problems that might be present in the work. To detect a problem, students must recognize differences between the given work and an ideal work that follows the standards defined in the first step. Students will often use their own work as a standard to compare against (Nicol et al., 2014). The differences that students find between the works are likely the problems they detect. Once the problems have been detected, they can be further identified in the third step: diagnosis of the problem. Flower states that the diagnosis of a problem “brings new information to the task” (Flower et al., 1986, p. 41). The problem diagnosis is not necessarily essential for the revision process; however, identifying and articulating the nature of a given problem is associated with more sophisticated revisions (Patchan and Schunn, 2015). Finally, a solution strategy is offered as the final step in the revision process. A strategy may involve removing a problematic portion or revising and rewriting the given work.
The cognitive processes of making revisions overlap with many of the cognitive processes associated with providing feedback for others in peer review (Patchan and Schunn, 2015). Students must be able to detect a problematic part of a work, diagnose what makes that part problematic, and then determine a solution strategy to improve the work. In addition to these processes, current peer review literature has also outlined how peer review can act as a vehicle to generate internal feedback for students (Nicol, 2020; Nicol and McCallum, 2021). Because students use their own work as a benchmark to make comments on others’ work, the resulting comparisons promote active reflection on one's own work and help generate internal feedback about one's performance. Generating this internal feedback is one way students are able to make improvements to their own work (Butler and Winne, 1995; Nicol et al., 2014; Nicol, 2020; Nicol and McCallum, 2021). A key step in this comparison is the explicit differentiation between a peer's perspective and one's own, or decentring (Teuscher et al., 2016; Moon et al., 2017). Decentring itself has been shown to be productive in supporting one's own reasoning and interactions with others (Teuscher et al., 2016; Moon et al., 2017).
Social comparison theory was originally developed by social psychologist Leon Festinger in 1954. He theorized that when placed in ambiguous environments that produce uncertainty about how to think or behave, individuals will compare themselves with others in the same situation to reduce that uncertainty (Festinger, 1954). Later research in social psychology has found that people will often engage in social comparison in situations where there are specific criteria and standards (Levine, 1983; Martin, 2000; Smith and Arnkelsson, 2000; Alicke, 2007; Pomery et al., 2012; Miller et al., 2015; Greenwood, 2017). These comparisons serve to gauge an individual's performance and ability relative to others.
When engaging in social comparison, an individual compares themself to a “target” (Martin, 2000; Smith and Arnkelsson, 2000; Alicke, 2007; Pomery et al., 2012; Miller et al., 2015; Greenwood, 2017). The target is simply the subject(s) to whom the individual compares themself, and these subjects can be real or imaginary as long as they exist in a similar environment or situation. The individual's perception of the target's performance determines the kind of social comparison being made. If the target's performance is perceived as superior in some way, it is considered an upward comparison. If the target's performance is perceived as inferior in some way, the comparison is considered downward. Performances that are perceived as similar are considered lateral comparisons. The direction of the social comparison is often influenced by the motivation for the social comparison, beyond reducing uncertainty.
Further research in social comparison theory has identified two additional primary motivations for engaging in social comparisons: self-improvement and self-enhancement. Self-improvement is associated with upward comparisons (Dijkstra et al., 2008). By comparing one's work to a “better” model, an individual has the chance to gain inspiration or learn how to improve their own work. On the other hand, self-enhancement is associated with downward comparisons (Dijkstra et al., 2008). Individuals will engage in a downward comparison with a target that they perceive to be worse. This helps the individual improve their perception of their own work, easing the anxiety and low self-esteem surrounding their performance or ability (Dijkstra et al., 2008).
Social psychologists have argued that the classroom creates the ideal conditions for engaging in social comparisons (Pepitone, 1972). Students are motivated to improve their learning and the act of learning new material in the classroom often generates cognitive uncertainty. Therefore, students are motivated to engage in social comparison as a method to evaluate and obtain internal feedback on their performance (Levine, 1983). Some, however, have hesitated to use social comparison in the classroom due to negative connotations of comparing oneself to others. There are underlying assumptions that engaging in social comparison could potentially cause feelings of inferiority, competitiveness amongst peers, and decreased motivation for some students (Levine, 1983). To minimize this possibility, we propose adjusting the conditions of social comparison within the peer review process by lowering the stakes of the comparison and having students review anonymous, preconstructed responses (Beach and Tesser, 2000).
Using social comparison theory to investigate peer review narrows our focus to the thoughts and perspective of the reviewer. We propose using this theory as a lens to explore the mechanisms by which a student evaluates their own work and generates internal feedback while giving feedback in a simulated peer review. This specifically guided the study as we seek to answer this central research question:
How do chemistry students evaluate their own data interpretations when critiquing hypothetical peers’ data interpretations?
Fig. 1 Graph modified from Doidge et al. (2016).
In the second stage of the interview, participants evaluated three sample responses. They were told that these sample responses had been generated by students participating in the same study (i.e., interpreting the same data). The three responses corresponded to the three concentrations of hydrochloric acid considered in the first stage of the task. Importantly, we constructed each sample response to include potential epistemic errors that could be made in this context (e.g., only considering one variable). Each sample response contained accurate information from the graph but used different reasoning to support one of the three hydrochloric acid concentrations (Appendix). Students were presented with one sample response at a time to review. Students often began by identifying points of strength and weakness for the sample. If they did not explicitly bring up their own response at this point in the interview, they were directly asked to compare it to the sample response. This point served as the social comparison of the interview. Students generally brought up differences between the content and the quality of the sample responses and their own. Follow-up questions were asked as needed to elicit comparisons of both the content and quality types. After the comparison, students shared their feelings about their own response and analysis. This point served to gauge the student's confidence from engaging in the social comparison and providing feedback to the sample. If the student stated they felt less confident or had low confidence, the interviewer asked the student what kinds of changes they would make to their own response to improve their confidence. Students also shared why they felt their confidence was affected by reading the sample.
To begin the analysis, we used a combination of process coding and open coding to find patterns in students’ responses (Miles et al., 2014). Process coding is a form of open coding that uses gerunds to describe observable and conceptual actions performed by the participants in the study (Miles et al., 2014). All process codes and other open codes were grounded in students’ own words describing their actions and confidence throughout the task. Additionally, some codes were developed a priori to describe gaps students identified within each written sample. These codes were weaknesses we had purposefully constructed into each response, and we anticipated students would identify them at some point in their interviews.
The process codes that we developed were used to describe students’ actions throughout the interview (Miles et al., 2014). We began by reading through each interview to identify how students responded to each sample response. As certain actions repeated within interviews and across interviews, codes were generated to describe the specific action. These codes related to both how students reacted towards the written samples and their own responses. Some examples of process codes from this point in the analysis include “offering constructive criticism”, “dismissing sample”, and “changing claim.”
To investigate students’ confidence, each interview was read through to see how students gauged their confidence when responding to different written samples. We coded the points when students stated an overall level of confidence or change in confidence, specifically noting if the student had stated they had higher or lower confidence. In addition to coding students’ confidence, we had noticed many students with lower confidence making statements such as “I don’t know” or “I don’t know about…” while engaging in the social comparison with the written sample. We considered these to be instances of students expressing cognitive uncertainty surrounding some element of the task. Mitigating uncertainty is one of the motivations people may have to engage in social comparison (Festinger, 1954; Martin, 2000; Smith and Arnkelsson, 2000; Pomery et al., 2012; Miller et al., 2015; Greenwood, 2017); therefore, by accounting for students’ expressions of uncertainty and documenting the specific elements students expressed uncertainty about, we better followed students’ thought process throughout the social comparison.
In the next iteration of analysis, we used axial coding to see how the different codes generated from open coding related to each other. This mainly involved relating the different process codes together to describe the general actions that students engaged in when giving feedback to the samples. We first used constant comparative analysis to sort students’ responses to each sample response based on whether or not they found gaps in their own response. The gaps were indicative of critical internal feedback the student had generated regarding their own work. From there, students were further sorted based on any changes in confidence they expressed after engaging in the social comparison with a sample response. This sorting accounted for increases, decreases, or no notable changes in confidence. Finally, we further sorted students based on how they responded to their change in confidence. This first consisted of sorting students based on whether they made changes to their response. Students who did make changes were then further sorted according to how they modified or planned to modify their response.
In the final stage of analysis, selective coding was done to piece together the general actions from our axial coding to outline the processes involved in giving feedback in peer review. Actions were put in sequential order to develop a model of obtaining internal feedback from peer review with the four categories from the axial coding stage as potential paths that could be taken. Student confidence and uncertainty of their own response were also incorporated into the model as observable events to track which path a student might end up taking when engaging in peer review during the interviews.
Fig. 2 Model of different paths through social comparison, internal feedback generation, self-evaluation, and revision.
In addition to outlining the processes associated with generating and using internal feedback from peer review, our model also considers how a student's confidence and uncertainty change and influence how they use any internal feedback. After engaging in a social comparison, students’ confidence often changed, which we infer is related to the internal feedback they had generated from the comparison. We observed students with lower confidence and more uncertainty in their original response re-evaluate their original analysis. When the social comparison might have caused some uncertainty surrounding the quality of their work, many students were motivated to address their uncertainty by making changes to their answer.
We observed four different types of responses to the social comparison, illustrated in Fig. 2. Each response category is distinguished by the kind of internal feedback students generated from the comparison, their resulting confidence after the comparison, and how they responded to that internal feedback. The response categories were also tied back to the different motivations that have been identified in social comparison theory.
“Minimal doesn't mean the same thing as minimum, if I'm not mistaken… I would make the assumption that 1% is a minimal amount of waste, but it's not the minimum amount of waste. So 1% is a really small amount of waste, but it's not the smallest amount of waste.” (Bruce)
Here, Bruce explains part of the criteria for his own response, noting the differences between a criterion of minimum waste and minimal waste within the context of the task. His answer was chosen and constructed to reflect his definition of minimal as a component of his criteria. Because these criteria were largely implicit within a student's analysis, they often did not surface until students engaged in the social comparison. Bruce, like many of the other students, did not fully explain what ‘minimal’ meant until he encountered another interpretation of the same prompt. It was through encountering an alternative interpretation in a sample and comparing it to their own that we observed the standards most students held for their own responses.
“They [0 M response] considered the impurity as the end all be all, however much gold we extract in the end, it is what it is. I kinda met or I started with at four and then worked my way down to two. I had the maximize gold approach and then the minimizing the impurity was kind of second hand to that.” (Evander)
Here, Evander recognizes that the 0 M response had interpreted the prompt differently than he had and is able to identify how the response differs from his own. He then provides his own approach to fulfilling the task, demonstrating that his own response acted as a benchmark for the comparison. Importantly, Evander very specifically uncovers the difference between the criteria being used in the sample and his own. Evander argues that the sample author considered only one criterion, eliminating impurity, whereas Evander prioritized maximizing gold followed by considering the impurity. Evander's quote illustrates the decentring that served as the first step in comparing a sample response to one's own. While all students used their own response as a benchmark for a comparison to the sample, some students also made additional comparisons to previous samples they had encountered in the interview. These comparisons were similar in nature to ones in which students used their own response as a benchmark; they simply included more targets to compare against and occurred later in the interviews, after students had encountered multiple sample responses.
“They [4 M response] focused purely on the amount of [gold] extracted and they didn't take into account the potential for impurities as the concentration [of HCl] increased. So I guess starting from zero and going to four, like when they talked about that 65 to 95, they didn't, I guess not understand, but they didn't take into account the other two compounds that are classified as waste within the question.” (Evander)
Evander began his evaluation by identifying a gap in the 4 M response: the response only included information on gold. He recognized that the prompt classified two of the metals included in the task as waste and that they could be extracted along with the gold, resulting in an impure extraction. Having a pure extraction was a criterion that informed Evander's own response for the task, so encountering a sample response that was not aligned with this criterion resulted in a negative evaluation of the sample response.
Interviewer: “Okay. And how does this response [2 M response] compare to yours?”
Ben: “I think it's kind of on the same level. I think we're saying the same thing. I don't really see it as false. We both do the same kind of analysis and like we compare both of them while acknowledging the maximum and the minimum amounts.”
Interviewer: “How are you feeling about your response after reading this one?”
Ben: “I'm feeling good because I see that someone did the same thing I did. They analyzed it the same way without any – like it doesn't differ from mine. If this differed from mine and the conclusion was different, that would make me less confident because I can see I had an error in mine, which makes mine not correct.”
Although Ben had identified some argumentative gaps in the 2 M sample response earlier in the interview and had suggested that the response include more evidence to support its conclusion, he still viewed it as similar in quality to his own. He found that his own response and the sample had similar analyses and interpretations of the prompt, which in turn validated the internal criteria for his own response. Seeing that his own internal criteria and analyses were mirrored in the 2 M sample response gave Ben positive internal feedback. Experiencing validation and higher confidence from positive internal feedback like this was indicative of a student experiencing self-enhancement from the social comparison. Students who experienced self-enhancement from the social comparison did not make changes to their response in any way; therefore, we consider them to not have been motivated to change their response. The validation they gained from the social comparison helped them to feel confident enough in the strength of their response that they likely did not feel an incentive to revise it.
“My answer [2 M] made sense to me when it was just me thinking it through. And then getting the perspective of these other two students and what they think—it just makes more sense to have absolutely zero waste and have 65% of the gold. Versus my answer you're having 90% of the gold but you have a little bit of waste… And in the paragraph, they want to use the maximum amount of gold with minimum amounts of waste. So, it just makes more sense to have absolutely zero waste and then you know that it is just the 65% of the gold going through.” (Violet)
Upon making the social comparison with the 0 M sample response, Violet generated critical internal feedback for her own response. Even though her original response seemed to fit her original internal criteria at the time, it did not seem to align with her new internal criteria as much as the 0 M sample response did after the social comparison. Violet's internal criteria seemed to shift after being exposed to the perspective of the 0 M response. She then identifies that the amount of waste at 2 M in her original response did not satisfy the “minimum amounts of waste” criterion as well as the 0 M sample response did after the comparison.
After engaging in the social comparison, students who generated critical internal feedback generally expressed doubt in the quality of their original response. They then were given an opportunity to make changes to their response to address any of the gaps they identified in their own response. By addressing their uncertainty in the quality of their response, students demonstrated that they were motivated by self-improvement. To begin improving their response, students first evaluated the alignment between their internal criteria and their response. The results of this evaluation then went on to affect what kinds of changes students made to their response.
Interviewer: “What is your confidence in your own response after reading this?”
Fernald: “I think that it's a little bit lower because it shows a weakness that I may not have explored in its entirety. And because I don't know the details, I could end up being wrong with my answer.”
Interviewer: “Okay. What changes would you make after reading this to your answer?”
Fernald: “I would probably use, I would ask to see the specific numbers because just guessing kind of off of a graph is not very effective. I'd try to find the ratio that would show that two molarity would be better than zero molarity, unless of course the reverse is true.”
When asked, Fernald began questioning his original response, stating that there was a gap in his response that, if not addressed, could make his response wrong. The potential “weakness” he mentioned had to do with whether or not 2 M had an appropriate amount of gold relative to other metals, something he had relied on in his reasoning in the first stage of the task. Fernald felt there was a gap in his response because he did not include numerical evidence to support his claim. To address the gap, Fernald sought new empirical evidence that would improve and adapt his response. Fernald's response still fit his internal criteria for the task (i.e., amounts of metals in the extraction), but adapting it by incorporating new numerical evidence would more definitively align it with those criteria.
Interviewer: “How has this affected your thinking about your own response?”
Hector: “It kind of made me realize that I didn't account for the single extraction test part. It also enforced that I talked about the gold yield on mine…So it sort of pointed out the things that I liked about mine while also, you know, showing the big point that I ended up missing.”
Interviewer: “Okay. Is it making you want to change your response at all?”
Hector: “Yeah, a bit.”
Interviewer: “Okay. How would you change your response?”
Hector: “If I ended up changing it? I would say I would switch to zero molarity HCl, just because I would want to get out as much gold as I can in a single extraction.”
Although Hector had generated some positive internal feedback by identifying a gap in the 0 M sample response that his own response addressed, he generated more critical feedback overall from the social comparison. His original internal criteria were fulfilled by his response, but Hector ended up modifying his internal criteria after the simulated peer review. Hector directly notes that his original response (and his internal criteria) did not account for a single extraction, something that was mentioned as a parameter for the experiment in the prompt. He then incorporated the single extraction criterion into his internal criteria and presumably noted that his original response did not fulfil his full set of criteria. To address this gap, Hector adopted a new claim that better satisfied the criteria he had updated through the social comparison.
“Just because I guess it would have changed my perspective if I hadn't seen that four molar was the highest extraction. But I already kind of knew that the four molar was the highest extraction going into reading the answer. It didn't really propose anything different or any new information that I hadn't considered.” (Hal)
Earlier in his interview, Hal considered 4 M as a choice for his original response, but ultimately decided on 2 M as his final choice for his response. Reading this response exposed Hal to the same evidence and reasoning he had considered before. The lack of new information in the 4 M sample response did little to help generate internal feedback for Hal's own response; therefore, he did not feel motivated to make any changes to his response.
Some students within this category had different reactions to sample responses and would recognize the sample responses as valid. At times, they could even identify what informed that sample response. Consider Jo's (2 M) comparison to the 0 M sample response:
Interviewer: “Okay. How does this response compare to yours?”
Jo: “Like I said, different conclusion, most of the same reasoning. Um, I think they're both pretty strong and just have different opinions on the best way to do it.”
Interviewer: “Okay. What is your confidence in your own response right now?”
Jo: “Yeah, it's still the same. I considered all those factors too. I just came to a different conclusion.”
In her response, Jo recognized that her response and the 0 M sample response had similar reasoning, and even considered the 0 M sample response to be a strong argument. She also recognized that its perspective was informed differently than hers, hence the “different opinions” of the responses. Like Hal, she had already considered the information that the 0 M response used and did not feel any differently towards her own response after the social comparison. The social comparison produced no change in confidence and did not seem to offer Jo any critical internal feedback. Without critical internal feedback, students likely felt no incentive to revise their response to improve it.
To develop their internal criteria, students constructed certain plans to accomplish the task at hand and meet specific goals. To form these goals and plans for any given task, students must consider external information such as instructors’ comments, task prompts, and instructions (Nicol, 2020). Nicol found that the goals that students end up forming to accomplish a task are informed by their prior knowledge, beliefs, experience with similar tasks, and their overall interpretation of instructions given to them. Once students had formed an interpretation of the instructions and constructed goals for their internal criteria, the goals shaped how students evaluated and interacted with all responses for the task, including their own response. Nicol has also reported that the criteria students form for a given task influence how they interact with all external products (i.e., their own response and others’ responses for the task) (Nicol et al., 2014; Nicol, 2020).
After producing internal criteria for the task, some students showed evidence of going through the process of decentring, which is the process of recognizing and understanding perceptions and reasoning different from one's own (Piaget, 1955). Decentring has been shown to lead to more productive discourse within the classroom. Physical chemistry students engaging in discourse in a process-oriented guided inquiry classroom demonstrated decentring when they recognized where their peer's response stemmed from, allowing them to consider alternative reasoning and reflect on their own as well (Moon et al., 2017). In our study, students showed evidence of decentring during their social comparison when they could identify the internal criteria that informed the sample response they were reviewing. For example, when comparing the 0 M sample response to his own (2 M), Evander stated that the 0 M sample response weighed the impurities present more heavily in its analysis. Although his own response involved accounting for impurities as well, that goal was weighed along with the other goal of obtaining a larger amount of gold. This demonstrates that Evander was able to recognize the perspective of the 0 M sample response and identify the internal criteria and reasoning that informed it. It was through the act of decentring that some students were able to develop their analysis and change their response. Students such as Hector or Violet adjusted their own criteria in some way after identifying the other internal criteria that informed the sample response they were reviewing.
Students who changed their internal criteria or other aspects of their response did so because they gathered new information from the social comparison. According to Nicol (2020), students can use comparisons to gather external information to re-evaluate and modify their interpretations of instructions and therefore modify the strategies and tactics they use to accomplish the task. Students participating in our simulated peer review had generally engaged in multiple social comparisons before they adjusted their internal criteria or any part of their response. Yan and Brown (2017) also noted this in their investigation of student self-assessment. Students used multiple sources of external information to “calibrate” their own performance and their evaluations of others’ performances. Students generated internal feedback from multiple external sources of information that corroborated each other and then made changes to their work to address the abundance of internal feedback.
Even though students engaged in multiple social comparisons to generate some sort of internal feedback, there were some instances in which students did not generate any observable internal feedback to evaluate their own work. This can be interpreted as a limitation of the interview setting, as students might have had unconscious internal feedback that we were unable to elicit through our interview protocol; however, Nicol's work on internal feedback suggests that providing external information for students to compare against does not guarantee they will make meaningful comparisons to produce internal feedback (Nicol, 2020). Instead, it is possible that students will “monitor” this external information without using it to evaluate their own work. This “monitoring” could also explain why some students do not make changes to their own work when receiving explicit feedback from reviewers in traditional peer review settings (Finkenstaedt-Quinn et al., 2019). If they are not meaningfully engaging with external information such as peers’ constructive criticisms, there is no reason for them to generate internal feedback and revise their work.
Students who do end up generating internal feedback and revising their work are likely motivated to act in this way. Our results suggest that students are acting to address their critical internal feedback and mitigate their uncertainty about meeting their internal criteria as part of this motivation. By addressing their critical internal feedback and working to meet higher standards, students are attempting to improve their work. According to social comparison theory, these students seem to be motivated by self-improvement (Dijkstra et al., 2008). Previous work in peer review has also found that students report being motivated to improve the quality of their own work after being exposed to others’ work (Nicol et al., 2014). Although our model may not capture this motivation wholly, we can consider it another motivation for self-improvement through adoption and adaptation.
Our findings, which were grounded in real-time data analysis and interpretation and review, are echoed in other peer review studies that investigated similar processes retrospectively (i.e., through focus group interviews following completion of peer review) (Nicol et al., 2014). Follow-up studies need to be conducted to ensure that the processes we identified occur similarly in an actual peer review setting and that they also lead to the four different outcomes we observed. Real peer review settings are not always anonymous, nor are students guaranteed to see a variety of answers as they did in our study. Students also tend to receive feedback in traditional peer review, something we did not investigate in our study. In addition, our task for the simulated peer review was designed to target two specific performance expectations for the science practice of data analysis and interpretation: analyse data using tools, technologies, and/or models in order to determine an optimal design solution, and analyse data to identify design features or characteristics of the components of a proposed system to optimize it relative to criteria for success (National Research Council, 2012). Future research can and should consider using students’ internal feedback to regulate and develop other performance expectations within data analysis and interpretation and consider it for the seven other science practices outlined by the Next Generation Science Standards.
Internal feedback that students generate can also be leveraged in classroom settings. Offering students the opportunity to evaluate preconstructed sample responses allows students to generate internal feedback and evaluate their performance. Findings showed that students were adept at uncovering what ideas and distinctions were contained in a sample response, and how those ideas differed from their own. This means that instructors can leverage preconstructed sample responses to convey ideas, criteria, and nuances in a way that students can likely understand and use. Internal feedback can be generated from a comparison with many different external sources of information, so this practice could potentially be expanded to include comparisons with exemplar works or even a rubric for a given task (Nicol, 2020). Facilitating comparisons with sample responses or other sources of external information can be implemented in the chemistry classroom or in homework assignments. In chemistry contexts, facilitating comparison can potentially support the teaching of criteria for science practices that have been historically difficult to teach in a lecture or class setting, but are necessary for using chemical knowledge. For example, how to consider all data, weigh variables, and connect data to an assertion are rather difficult features of data analysis and interpretation to teach. This is the case for many of the science practices. Providing more opportunities for students to compare, evaluate, reflect on, and revise their own work is a relatively low-labour instructional method that could help to develop certain practices and foster students’ own evaluative judgement. These opportunities could serve as a vehicle for having students extract and generate these criteria themselves.