Prompting hypothetical social comparisons to support chemistry students’ data analysis and interpretations

Stephanie A. Berg and Alena Moon *
Department of Chemistry, University of Nebraska, Lincoln, Nebraska, USA. E-mail: amoon3@unl.edu

Received 6th August 2021, Accepted 12th October 2021

First published on 13th October 2021


Abstract

To develop competency in science practices, such as data analysis and interpretation, chemistry learners must develop an understanding of what makes an analysis and interpretation “good” (i.e., the criteria for success). One way that individuals extract the criteria for success in a novel situation is through making social comparisons, which is often facilitated in education as peer review. In this study, we explore using a simulated peer review as a method to help students generate internal feedback, self-evaluate, and revise their data analysis and interpretation. In interviews, we tasked students with interpreting graphical data to determine optimal conditions for an experiment. Students then engaged in social comparisons with three sample responses that we constructed, comparing these samples to their own response. We present a model, informed by social comparison theory, that outlines the different processes students went through to generate internal feedback for their own analysis and response. We then discuss the different ways students used this internal feedback to determine if and how to improve their response. Our study uncovers the underlying mechanism of self-evaluation in peer review and describes the processes that led students to revise their work and develop their analysis. This work provides insight to help both practitioners and researchers leverage students' internal feedback from comparisons to self-evaluate and revise their performance.


Introduction

Reforms in science education have called for the integration of science practices (i.e., the ways scientific knowledge is generated) into science instruction (National Research Council, 2012). Although there is consensus on the need for integrating these practices in the classroom (National Research Council, 2012; Singer et al., 2012; Cooper et al., 2015), the science education community continues to investigate methods to support students' competency in science practices. Recent research has found critique to be an essential component of competency in science practices (Ford, 2012; Osborne et al., 2016; González-Howard and McNeill, 2020). Among the eight science practices, research on student engagement in data analysis and interpretation has uncovered a multitude of challenges that many students face.

Many of these documented challenges arise when students work with empirical data. Students may fail to differentiate important data from unimportant data (Jeong et al., 2007). Students may also focus on surface features of data and ignore salient features that target the given phenomenon (Kanari and Millar, 2004; Heisterkamp and Talanquer, 2015). This can lead students to uncover less relevant patterns in the data that do not effectively target the phenomenon (Zagallo et al., 2016). Focusing on these surface-level patterns in a dataset may lead students to miss the relevant scientific concepts (Lai et al., 2016). Students also face challenges when connecting patterns back to the target phenomenon. Many students will form conclusions with misconstrued reasoning or neglect to use scientific reasoning entirely when connecting uncovered patterns from datasets to the target phenomenon (Heisterkamp and Talanquer, 2015; Becker et al., 2017).

To overcome these challenges and support students in developing competency in data analysis and interpretation, we propose peer review as a method to help students develop evaluative judgement in their data analysis and interpretation. In our study, we simulate peer review to explore how critiquing peers' work helps learners develop evaluative judgement. Understanding how students evaluate their own work when giving feedback to others can inform and improve peer review practices in the classroom. Additionally, it can offer a practical approach to supporting undergraduate students' development of competency in science practices.

Background

Data analysis and interpretation

Data analysis and interpretation is one of eight science practices outlined in the Next Generation Science Standards (National Research Council, 2012). Practicing data analysis and interpretation often involves making sense of a visual representation, such as a graph or a table, and using it to form a conclusion. There are several processes students engage in to interpret data. First, the visual representation of data must be “decoded” where the information that is embedded is extracted (Carpenter and Shah, 1998; Shah and Hoeffner, 2002; Glazer, 2011; Zagallo et al., 2016). The difficulty of this step can vary depending on the kind of representation that is being used and the amount of information that is embedded within it (Shah and Hoeffner, 2002; Glazer, 2011). From here, relevant patterns within the data must be identified (Carpenter and Shah, 1998; Shah and Hoeffner, 2002; Glazer, 2011; Zagallo et al., 2016). This step can prove difficult for many students as studies have shown that students may selectively use data or struggle to differentiate important features of data from unimportant ones (Kanari and Millar, 2004; Jeong et al., 2007). By focusing on the less important information, students may uncover irrelevant patterns within the data (Zagallo et al., 2016). Focusing on less relevant data and patterns ultimately proves problematic when students must then tie the patterns back to the target phenomenon to form a conclusion or explanation (Carpenter and Shah, 1998; Shah and Hoeffner, 2002; Glazer, 2011). When the claims and explanations constructed from less relevant data do not effectively target the phenomenon for which the data has been collected, students may risk missing the relevant scientific concept entirely (Lai et al., 2016; Zagallo et al., 2016). For example, Zagallo and colleagues found that some groups of undergraduate biology students in a transformed Cell and Developmental Biology course became distracted by less relevant data during a classroom problem set (Zagallo et al., 2016). Although the students did eventually shift their focus to the relevant data, they did lose valuable class time and needed guidance from an instructor to lead them to the relevant scientific concept.

Similar challenges have also been identified for data analysis and interpretation in chemistry contexts. Like in many other domains, chemistry students will often rely on surface features or less relevant features of data representations and models to form conclusions or construct explanations (Heisterkamp and Talanquer, 2015; Becker et al., 2017). In addition to this, many chemistry students will use misconstrued reasoning or neglect to use reasoning entirely when engaging in data analysis and interpretation (Heisterkamp and Talanquer, 2015; Becker et al., 2017). In a case study investigating the major types of reasoning general chemistry students use when engaging in data analysis and interpretation, participants relied on “hybridized” reasoning and mixed intuitive knowledge with their chemical knowledge when producing explanations (Heisterkamp and Talanquer, 2015). In another study investigating how students construct mathematical models to describe rate laws from empirical data, many of the students did not connect the mathematical model they had produced to the actual trends in the data (Becker et al., 2017). Becker and colleagues also found that some of the participants engaging in the data analysis and interpretation had produced conclusions without even consulting the kinetic data given to them. This is perhaps the most problematic approach to data analysis and interpretation, as the Next Generation Science Standards states that students must “present data as evidence to support their conclusions” when engaging in data interpretation and analysis (National Research Council, 2012).

The current literature in psychology, science education, and chemistry education has described how students engage in the practice of data analysis and interpretation and documented common challenges for students; however, little work has explored how to support the development of their data analysis and interpretation skills (Zagallo et al., 2016), and no work that we know of has done this in chemistry.

Peer review

Peer review offers a unique opportunity to expose students to their peers' work. The potential benefit is especially promising for tasks that require students to generate a product, as many of the science practices do (in this case, an evidence-based decision). Reviewing peers' work can help students evaluate their own work and potentially make changes to improve it. Making improvements to their work could entail incorporating new evidence or reasoning they had encountered in their peer's work, or it could even involve producing an entirely new conclusion if their peer's work is more compelling than their own. On the other hand, if their peer's work is similar to their own, students may develop confidence in their conclusion. In this way, engaging students in peer review has been shown to develop evaluative judgement (Nicol et al., 2014). Evaluative judgement includes an understanding of the criteria for success and quality work within a given domain (Sadler, 2010). Therefore, developing evaluative judgement is key to learning what makes data analysis and interpretations good. For the science practices, evaluative judgement is a part of what has been referred to as deep understanding, or understanding of the epistemic criteria of science (Kuhn et al., 2017). Deep understanding is a key learning objective for engaging learners in science practices.

The process of receiving feedback in peer review has received much of the attention within peer review literature. Receiving feedback from multiple peers can help students evaluate their work and make changes to improve the quality of their work more so than only receiving feedback from an instructor (Cho and MacArthur, 2011); however, receiving feedback from peers does not guarantee a student will make necessary revisions to their work (Finkenstaedt-Quinn et al., 2019). Students must recognize the value of the feedback they are given and make judgements on what feedback must be incorporated, while also managing affect surrounding the feedback (Carless and Boud, 2018). This process of enabling feedback uptake takes time and labour to develop for both instructors and students.

Recent work has found that the gains from receiving feedback are less than the gains from giving feedback in peer review (Lundstrom and Baker, 2009; Cho and MacArthur, 2011; Anker-Hansen and Andrée, 2015; Ion et al., 2019; Nicol and McCallum, 2021). Giving feedback appears to engage students differently than receiving feedback from others. When giving feedback, students make comparisons with their own work (Nicol et al., 2014; McConlogue, 2015; van Popta et al., 2017). The student's own product will often serve as a reference to compare against. The comparison process allows students to engage in active reflection on the task criteria and their own work (Nicol et al., 2014; McConlogue, 2015; Nicol and McCallum, 2021). Through producing feedback for others, students can generate internal feedback to inform and revise their draft so that it better aligns with their understanding of the task criteria. Students have reported that revising their draft in this way reduces the need for receiving feedback from peers, as they had already made the changes suggested to them by reflecting on their own work (Nicol et al., 2014; Anker-Hansen and Andrée, 2015; Nicol and McCallum, 2021).

To better understand how students evaluate and revise their own work when giving feedback to others, we can first consider the process of revising a written draft. Previous studies in college writing have found that when making revisions, students engage in a four-step process (Flower et al., 1986). First, they define the task, gaining a deeper understanding of what must be done in the task. This part of the review process is further supported by students self-reporting that they are able to take the perspective of an assessor and better understand the given standards for the task when providing feedback (Nicol et al., 2014). Second, students detect any problems that might be present in the work. To detect a problem, students must recognize differences between the given work and an ideal work that follows the standards defined in the first step. Students will often use their own work as a standard to compare against (Nicol et al., 2014). The differences that students find between the works will likely be the problems they detect. Once the problems have been detected, they can be further identified in the third step: diagnosis of the problem. Flower and colleagues state that the diagnosis of a problem “brings new information to the task” (Flower et al., 1986, p. 41). The problem diagnosis is not necessarily essential for the revision process; however, identifying and articulating the nature of a given problem is associated with more sophisticated revisions (Patchan and Schunn, 2015). Finally, a solution strategy is offered as the final step in the revision process. A strategy may involve removing a problematic portion or revising and rewriting it.

The cognitive processes involved in making revisions overlap with many of the cognitive processes associated with providing feedback for others in peer review (Patchan and Schunn, 2015). Students must be able to detect a problematic part of a work, diagnose what makes that part problematic, and then determine a solution strategy to improve the work. In addition to these processes, current peer review literature has also outlined how peer review can act as a vehicle to generate internal feedback for students (Nicol, 2020; Nicol and McCallum, 2021). Because students use their own work as a benchmark when commenting on others' work, the resulting comparisons promote active reflection on their own work and help generate internal feedback about their performance. Generating this internal feedback is one way students are able to make improvements to their own work (Butler and Winne, 1995; Nicol et al., 2014; Nicol, 2020; Nicol and McCallum, 2021). A key step in this comparison is the explicit differentiation between a peer's perspective and one's own, or decentring (Teuscher et al., 2016; Moon et al., 2017). Decentring itself has been shown to be productive in supporting one's own reasoning and interactions with others (Teuscher et al., 2016; Moon et al., 2017).

Social comparison theory

Peer review is an inherently social process in which students typically engage in comparison with others' work. These comparisons are affected by how the student perceives themself in relation to others. This, in turn, shapes the internal feedback that students generate from evaluating their own work while giving feedback in peer review. We propose using social comparison theory to investigate how chemistry students generate internal feedback and evaluate their own work when giving feedback in a peer review setting.

Social comparison theory was originally developed by the social psychologist Leon Festinger in 1954. He theorized that when placed in ambiguous environments that produce uncertainty about how to think or behave, individuals will compare themselves with others in the same situation to reduce that uncertainty (Festinger, 1954). Later research in social psychology has found that people will often engage in social comparison in situations where there are specific criteria and standards (Levine, 1983; Martin, 2000; Smith and Arnkelsson, 2000; Alicke, 2007; Pomery et al., 2012; Miller et al., 2015; Greenwood, 2017). These comparisons serve to gauge an individual's performance and ability relative to others.

When engaging in social comparison, an individual compares to a “target” (Martin, 2000; Smith and Arnkelsson, 2000; Alicke, 2007; Pomery et al., 2012; Miller et al., 2015; Greenwood, 2017). The target is simply the subject(s) to whom the individual compares themself, and these subjects can be real or imaginary as long as they exist in a similar environment or situation. The individual's perception of the target's performance determines the kind of social comparison being made. If the target's performance is perceived as superior in some way, it is considered an upward comparison. If the target's performance is perceived as inferior in some way, the comparison is considered downward. Performances that are perceived as similar are considered lateral comparisons. The direction of a social comparison is often influenced by the motivation for the comparison, beyond reducing uncertainty.

Further research in social comparison theory has found that there are two primary additional motivations for engaging in social comparisons: self-improvement and self-enhancement. Self-improvement is associated with upward comparisons (Dijkstra et al., 2008). By comparing one's work to a “better” model, an individual has the chance to gain inspiration or learn how to improve their own work. On the other hand, self-enhancement is associated with downward comparisons (Dijkstra et al., 2008). Individuals will engage in a downward comparison with a target that they perceive to be worse. This helps the individual improve their perception of their own work, easing anxiety and low self-esteem surrounding their performance or ability (Dijkstra et al., 2008).

Social psychologists have argued that the classroom creates the ideal conditions for engaging in social comparisons (Pepitone, 1972). Students are motivated to improve their learning and the act of learning new material in the classroom often generates cognitive uncertainty. Therefore, students are motivated to engage in social comparison as a method to evaluate and obtain internal feedback on their performance (Levine, 1983). Some, however, have hesitated to use social comparison in the classroom due to negative connotations of comparing oneself to others. There are underlying assumptions that engaging in social comparison could potentially cause feelings of inferiority, competitiveness amongst peers, and decreased motivation for some students (Levine, 1983). To minimize this possibility, we propose adjusting the conditions of social comparison within the peer review process by lowering the stakes of the comparison and having students review anonymous, preconstructed responses (Beach and Tesser, 2000).

Using social comparison theory to investigate peer review narrows our focus to the thoughts and perspective of the reviewer. We propose using this theory as a lens to explore the mechanisms by which a student evaluates their own work and generates internal feedback while giving feedback in a simulated peer review. This lens guided the study as we sought to answer the central research question:

How do chemistry students evaluate their own data interpretations when critiquing hypothetical peers’ data interpretations?

Methods

Overview

The central phenomenon we have chosen to investigate is how students evaluate their own work when giving feedback in peer review, specifically focusing on the cognitive processes students go through to make changes to their work (Creswell and Poth, 2016). Semi-structured interviews were used to allow for a systematic approach with the added flexibility of probing questions.

Interview protocol

The interviews consisted of two stages. The first stage of the interview engaged the students in data analysis using a modified line graph shown in Fig. 1. The data was taken from a report of an experiment performed to extract gold from waste electrical and electronic equipment (Doidge et al., 2016). Students were tasked with finding an optimal concentration of hydrochloric acid to obtain a maximal extraction of gold with a minimal extraction of the waste metals, tin and iron. Students were given relevant experimental details to aid in their analysis and interpretation of the graph and to help them choose between three different concentrations of hydrochloric acid (0 M, 2 M, and 4 M). At the end of this stage, students produced a written response to convince someone else that their chosen concentration was the best and to explain why.
Fig. 1 Graph modified from Doidge et al. (2016).

In the second stage of the interview, participants evaluated three sample responses. They were told that these sample responses had been generated by students participating in the same study (i.e., interpreting the same data). The three responses corresponded to the three concentrations of hydrochloric acid considered in the first stage of the task. Importantly, we constructed each sample response to include potential epistemic errors that could be made in this context (e.g., only considering one variable). Each sample response contained accurate information from the graph but used different reasoning to support one of the three hydrochloric acid concentrations (Appendix). Students were presented with one sample response at a time to review. Students often began by identifying points of strength and weakness in the sample. If they did not explicitly bring up their own response at this point in the interview, they were directly asked to compare it to the sample response. This point served as the social comparison of the interview. Students generally brought up differences between the content and the quality of the sample responses and their own. Follow-up questions were asked as needed to elicit comparisons of both content and quality. After the comparison, students shared their feelings about their own response and analysis. This point served to gauge the student's confidence after engaging in the social comparison and providing feedback on the sample. If the student stated they felt less confident or had low confidence, the interviewer asked what kinds of changes they would make to their own response to improve their confidence. Students also shared why they felt their confidence was affected by reading the sample.

Sample selection

The study took place at a large, Midwestern university in the fall of 2020. The researchers obtained approval for the study from the Institutional Review Board before recruiting for interviews, and students consented immediately prior to participating in the interviews. Participating students (N = 18) were recruited for semi-structured interviews from both first-semester and second-semester general chemistry courses near the end of the semester. All interviews took place remotely over Zoom a week after the semester had ended.

Data collection

During the interviews, students used the chat feature to write their own response for the data analysis task, and later to review their response and read sample responses. The first author presented the student with one sample response at a time. Interviews lasted between 45 and 80 minutes. The resulting audio recordings were transcribed via Temi.com or Zoom, and the video recordings were kept for reference in the case that students made comments referring to visual features in the graph. All data collected from the interviews were deidentified and pseudonyms were assigned to each interview participant. Participants who completed the interview were compensated with $20 gift cards.

Data analysis

For this study, we analysed interview transcripts from the second stage of the interview. In each interview, a student responded to three sample responses; thus, a total of 54 social comparisons to samples were collected. Two of the social comparisons were excluded from the analysis and results because the students did not show evidence of engaging in the social comparison.

To begin the analysis, we used a combination of process coding and open coding to find patterns in students' responses (Miles et al., 2014). Process coding is a form of open coding that uses gerunds to describe observable and conceptual actions performed by the participants in the study (Miles et al., 2014). All process codes and other open codes were derived from students' own words describing their actions and confidence throughout the task. Additionally, some codes were developed a priori to describe gaps students identified within each written sample. These codes corresponded to weaknesses we had purposefully constructed into each response, and we anticipated students would identify them at some point in their interviews.

The process codes that we developed were used to describe students' actions throughout the interview (Miles et al., 2014). We began by reading through each interview to identify how students responded to each sample response. As certain actions recurred within and across interviews, codes were generated to describe each specific action. These codes related both to how students reacted towards the written samples and to how they reacted towards their own responses. Some examples of process codes from this point in the analysis include “offering constructive criticism”, “dismissing sample”, and “changing claim.”

To investigate students' confidence, we read through each interview to see how students gauged their confidence when responding to different written samples. We coded the points when students stated an overall level of confidence or a change in confidence, specifically noting whether the student stated they had higher or lower confidence. In addition to coding students' confidence, we noticed many students with lower confidence making statements such as “I don't know” or “I don't know about…” while engaging in the social comparison with the written sample. We considered these to be instances of students expressing cognitive uncertainty surrounding some element of the task. Mitigating uncertainty is one of the motivations people may have to engage in social comparison (Festinger, 1954; Martin, 2000; Smith and Arnkelsson, 2000; Pomery et al., 2012; Miller et al., 2015; Greenwood, 2017); therefore, by accounting for students' expressions of uncertainty and documenting the specific elements they expressed uncertainty about, we could better follow students' thought processes throughout the social comparison.

In the next iteration of analysis, we used axial coding to examine how the different codes generated from open coding related to each other. This mainly involved relating the different process codes together to describe the general actions that students engaged in when giving feedback to the samples. We first used constant comparative analysis to sort students' responses to each sample response based on whether or not they found gaps in their own response. The gaps were indicative of critical internal feedback the student had generated regarding their own work. From there, students were further sorted based on any changes in confidence they expressed after engaging in the social comparison with a sample response. This sorting accounted for increases, decreases, or no notable changes in confidence. Finally, we further sorted students based on how they responded to their change in confidence. This first consisted of sorting students based on whether they made changes to their response. Students who did make changes were then further sorted according to how they modified or planned to modify their response.

In the final stage of analysis, selective coding was done to piece together the general actions from our axial coding and outline the processes involved in giving feedback in peer review. Actions were put in sequential order to develop a model of obtaining internal feedback from peer review, with the four categories from the axial coding stage as potential paths that could be taken. Students' confidence and uncertainty about their own responses were also incorporated into the model as observable events to track which path a student might take when engaging in peer review during the interviews.

Trustworthiness of analysis

Through these iterative cycles of coding, we developed a coding scheme to characterize the internal feedback students generated from engaging in peer review and the changes students made to their original responses. A researcher from outside of the project was trained on the coding and then independently coded 10% of the students' peer reviews from the interviews using the coding scheme. The coded subset of data was then compared and discussed between two of the researchers until they reached consensus. The main points of discussion concerned internal feedback for students who did not make any changes to their response and also stated they did not have any change in confidence for a given social comparison. Originally, this response category only included students who identified gaps in the sample responses but did not state a notable change in confidence or make changes to their response. The researchers discussed how to categorize a small sample of students who did not appear to actively engage in comparison with the sample responses. These students identified gaps in the sample responses but also recognized the alternative responses as valid ways to approach the task. These students did not make any changes to their response, nor did they state that they felt differently about their response after the social comparison. Because of the lack of change to their response and lack of change in confidence, we inferred that they did not gain any observable internal feedback from the comparison. To capture this type of reaction, the “No Internal Feedback” category was expanded to include it more explicitly. Discussions such as these refined the coding scheme and working model. The remaining data was coded by the first author, but the other trained researcher was consulted on two interviews for an additional perspective on coding.

Results

Overview of results

From the semi-structured interviews, we constructed a model that describes how students generated internal feedback from giving feedback in a one-sided peer review setting and how they then used that internal feedback to evaluate their work (Fig. 2). All students began with the same process of forming their own response and continued on to comparing and evaluating it against a written sample, but from there they diverged into different paths depending on the kind of internal feedback they had developed from the social comparison. These paths further diverged based on how students used and responded to the internal feedback they had generated from the social comparison.
Fig. 2 Model of different paths through social comparison, internal feedback generation, self-evaluation, and revision.

In addition to outlining the processes associated with generating and using internal feedback from peer review, our model also considers how a student's confidence and uncertainty change and influence how they use any internal feedback. After engaging in a social comparison, students' confidence often changed, which we infer is related to the internal feedback they had generated from the comparison. We observed students with lower confidence and more uncertainty in their original response re-evaluate their original analysis. When the social comparison caused uncertainty surrounding the quality of their work, many students were motivated to address that uncertainty by making changes to their answer.

We observed four different types of responses to the social comparison, illustrated in Fig. 2. Each response category is distinguished by the kind of internal feedback students generated from the comparison, their resulting confidence after the comparison, and how students responded to their internal feedback. The response categories were also tied back to the different motivations that have been identified in social comparison theory.

Form internal criteria

Students first formed a response to fulfil the task. To do this, students needed to interpret the prompt from the first phase of the interview and define criteria for fulfilling the task. For this task specifically, students needed to form criteria surrounding what minimal and maximal meant within the context of an extraction, and then translate that interpretation to the graph to find an appropriate answer for the task. These terms were ambiguous enough that there was no universal definition for students to use, so students were required to ascribe some sort of meaning to them. In constructing meaning for his criteria, Bruce (2 M), like many of the other participants, defined minimal as closely related to minimum, but not the same:

Minimal doesn't mean the same thing as minimum, if I'm not mistaken… I would make the assumption that 1% is a minimal amount of waste, but it's not the minimum amount of waste. So 1% is a really small amount of waste, but it's not the smallest amount of waste.” (Bruce)

Here, Bruce explains part of the criteria for his own response, noting the difference between a criterion of minimum waste and one of minimal waste within the context of the task. His answer was chosen and constructed to reflect his definition of minimal as a component of his criteria. Because the criteria were often implicit within a student's analysis, they frequently did not surface until students engaged in the social comparison. Bruce, like many of the other students, did not fully explain what ‘minimal’ meant until he encountered another interpretation of the same prompt. It was through encountering an alternative interpretation in a sample and comparing it to their own that we observed the standards most students held for their own responses.

Comparison of sample response

After constructing their own response according to their criteria, students then encountered an alternative response and compared it against their own. Students often made comparisons of their interpretation of the prompt to the sample's interpretation, using their own interpretation as a benchmark. For example, consider Evander's comparison of his own interpretation (2 M) to the 0 M sample:

They [0 M response] considered the impurity as the end all be all, however much gold we extract in the end, it is what it is. I kinda met or I started with at four and then worked my way down to two. I had the maximize gold approach and then the minimizing the impurity was kind of second hand to that.” (Evander)

Here, Evander recognizes that the 0 M response interpreted the prompt differently than he had and is able to identify how the response differs from his own. He then provides his own approach to fulfilling the task, demonstrating that his own response acted as a benchmark for the comparison. Importantly, Evander very specifically uncovers the difference between the criteria being used in the sample and his own. Evander argues that the sample author considered only one criterion, eliminating impurity, whereas Evander prioritized maximizing gold followed by considering the impurity. Evander's quote illustrates the decentring that served as the first step in comparing a sample response to one's own. While all students used their own response as a benchmark for a comparison to the sample, some students also made additional comparisons to previous samples they had encountered in the interview. These comparisons were similar in nature to ones in which students used their own response as a benchmark; they simply included more targets to compare against and occurred later in the interviews, after students had encountered multiple sample responses.

Evaluation of sample response

Students' comparisons with the sample response served as a basis for the evaluation step of the model. During this step students assessed the sample to see how well it fit the internal criteria they had formed during the first step. When assessing the sample, students would identify different strengths and weaknesses of the sample response. Once these were identified, students would go on to determine how well the response aligned with the internal criteria they had formed from the prompt. For instance, Evander's (2 M) evaluation of the 4 M response was heavily informed by his motivation for a pure gold extraction.

They [4 M response] focused purely on the amount of [gold] extracted and they didn't take into account the potential for impurities as the concentration [of HCl] increased. So I guess starting from zero and going to four, like when they talked about that 65 to 95, they didn't, I guess not understand, but they didn't take into account the other two compounds that are classified as waste within the question.” (Evander)

Evander began his evaluation by identifying a gap in the 4 M response: the response only included information on gold. He recognized that the prompt classified two of the metals in the task as waste that could be extracted along with the gold, resulting in an impure extraction. Having a pure extraction was a criterion that informed Evander's own response for the task, so encountering a sample response that was not aligned with this criterion resulted in a negative evaluation of the sample.

Self-enhancement route

How well the sample response fit a student's internal criteria influenced the kind of internal feedback obtained from the social comparison. Students who found that the sample response did not fit their internal criteria typically found gaps in the response that made it weaker. This often gave students favourable internal feedback from the social comparison, as their own responses did not have these gaps and were therefore relatively stronger; however, some students were able to obtain favourable internal feedback from social comparisons with sample responses that were similar in strength to their own, as was the case with Ben (2 M).

Interviewer: “Okay. And how does this response [2 M response] compare to yours?

Ben: “I think it's kind of on the same level. I think we're saying the same thing. I don't really see it as false. We both do the same kind of analysis and like we compare both of them while acknowledging the maximum and the minimum amounts.

Interviewer: “How are you feeling about your response after reading this one?

Ben: “I'm feeling good because I see that someone did the same thing I did. They analyzed it the same way without any – like it doesn't differ from mine. If this differed from mine and the conclusion was different, that would make me less confident because I can see I had an error in mine, which makes mine not correct.

Although Ben had identified some argumentative gaps in the 2 M sample response earlier in the interview and had suggested that the response include more evidence to support its conclusion, he still viewed it as similar in quality to his own. He found that his own response and the sample had similar analyses and interpretations of the prompt, which in turn validated the internal criteria for his own response. Seeing his own internal criteria and analyses mirrored in the 2 M sample response gave Ben positive internal feedback. Experiencing validation and higher confidence from positive internal feedback like this was indicative of a student experiencing self-enhancement from the social comparison. Students who experienced self-enhancement from the social comparison did not change their response in any way; therefore, we consider them not to have been motivated to change their response. The validation they gained from the social comparison helped them feel confident enough in the strength of their response that they likely did not feel an incentive to revise it.

Self-improvement route

In contrast, students who experienced critical internal feedback often lost confidence in their own response, as we observed in Ben's explanation. Had the sample response offered a different conclusion and analysis (and presumably fit his internal criteria), Ben would have lost confidence in his own response. This is what occurred with other students in their social comparisons, as Violet (2 M) noted after her comparison with the 0 M sample response:

My answer [2 M] made sense to me when it was just me thinking it through. And then getting the perspective of these other two students and what they think—it just makes more sense to have absolutely zero waste and have 65% of the gold. Versus my answer you're having 90% of the gold but you have a little bit of waste… And in the paragraph, they want to use the maximum amount of gold with minimum amounts of waste. So, it just makes more sense to have absolutely zero waste and then you know that it is just the 65% of the gold going through.” (Violet)

Upon making the social comparison with the 0 M sample response, Violet generated critical internal feedback for her own response. Even though her original response had seemed to fit her internal criteria at the time, after the social comparison it did not align with her new internal criteria as well as the 0 M sample response did. Violet's internal criteria seemed to shift after being exposed to the perspective of the 0 M response. She then identified that the amount of waste at 2 M in her original response did not satisfy the “minimum amounts of waste” criterion as well as the 0 M sample response did.

After engaging in the social comparison, students who generated critical internal feedback generally expressed doubt in the quality of their original response. They then were given an opportunity to make changes to their response to address any of the gaps they identified in their own response. By addressing their uncertainty in the quality of their response, students demonstrated that they were motivated by self-improvement. To begin improving their response, students first evaluated the alignment between their internal criteria and their response. The results of this evaluation then went on to affect what kinds of changes students made to their response.

Self-improvement: adaptation

Students who found that their original response still mostly fit their internal criteria maintained the essence of their response but made smaller changes. This kind of response to critical internal feedback was considered self-improvement through adaptation. Students within this response category were motivated to address their critical internal feedback by maintaining their original claim and adapting their response through minor revisions. Students typically proposed and made changes to their response by incorporating new evidence or reasoning. Fernald (2 M) did this after engaging in social comparison with the 0 M sample response.

Interviewer: “What is your confidence in your own response after reading this?

Fernald: “I think that it's a little bit lower because it shows a weakness that I may not have explored in its entirety. And because I don't know the details, I could end up being wrong with my answer.

Interviewer: “Okay. What changes would you make after reading this to your answer?

Fernald: “I would probably use, I would ask to see the specific numbers because just guessing kind of off of a graph is not very effective. I'd try to find the ratio that would show that two molarity would be better than zero molarity, unless of course the reverse is true.

When asked, Fernald began questioning his original response, stating that there was a gap that, if not addressed, could make his response wrong. The potential “weakness” he mentioned had to do with whether or not 2 M gave an appropriate amount of gold relative to the other metals, something he had relied on in his reasoning in the first stage of the task. Fernald felt there was a gap in his response because he did not include numerical evidence to support his claim. To address the gap, Fernald sought new empirical evidence that would improve and adapt his response. Fernald's response still fit his internal criteria for the task (i.e., amounts of metals in the extraction), but by adapting it to incorporate new numerical evidence he could align it with those criteria more definitively.

Self-improvement: adoption

Students who found their original response did not fit their internal criteria after the social comparison often made other changes to their response. These changes consisted of adopting a new response with a different claim from their original. This kind of response to critical internal feedback was considered self-improvement through adoption. Students were motivated to wholly address their critical internal feedback surrounding the gaps in their original response and did so by adopting a new claim. Take Hector's social comparison of his 2 M response to the 0 M sample response:

Interviewer: “How has this affected your thinking about your own response?

Hector: “It kind of made me realize that I didn't account for the single extraction test part. It also enforced that I talked about the gold yield on mine…So it sort of pointed out the things that I liked about mine while also, you know, showing the big point that I ended up missing.

Interviewer: “Okay. Is it making you want to change your response at all?

Hector: “Yeah, a bit.

Interviewer: “Okay. How would you change your response?

Hector: “If I ended up changing it? I would say I would switch to zero molarity HCl, just because I would want to get out as much gold as I can in a single extraction.

Although Hector had generated some positive internal feedback by identifying a gap in the 0 M sample response that his own response addressed, he generated more critical feedback overall from the social comparison. His original internal criteria were fulfilled by his response, but Hector ended up modifying his internal criteria after the simulated peer review. Hector directly notes that his original response (and his internal criteria) did not account for a single extraction, something that was mentioned as a parameter for the experiment in the prompt. He then incorporated the single extraction criterion into his internal criteria and presumably noted that his original response did not fulfil his full set of criteria. To address this gap, Hector adopted a new claim that better aligned with, and satisfied, the criteria he had updated through the social comparison.

No internal feedback observed

Some of the students in our study did not seem to generate internal feedback from a social comparison. These students stated that they did not have any significant change in confidence after their comparison, and they did not make any changes to their response. Some of the students in this category mentioned that they did not gain any new information from reviewing the sample response, as Hal (2 M) did with the 4 M response:

Just because I guess it would have changed my perspective if I hadn't seen that four molar was the highest extraction. But I already kind of knew that the four molar was the highest extraction going into reading the answer. It didn't really propose anything different or any new information that I hadn't considered.” (Hal)

Earlier in his interview, Hal considered 4 M as a choice for his original response, but ultimately decided on 2 M as his final choice for his response. Reading this response exposed Hal to the same evidence and reasoning he had considered before. The lack of new information in the 4 M sample response did little to help generate internal feedback for Hal's own response; therefore, he did not feel motivated to make any changes to his response.

Some students within this category had different reactions to the sample responses and recognized them as valid. At times, they could even identify what informed a given sample response. Consider Jo's (2 M) comparison to the 0 M sample response:

Interviewer: “Okay. How does this response compare to yours?

Jo: “Like I said, different conclusion, most of the same reasoning. Um, I think they're both pretty strong and just have different opinions on the best way to do it.

Interviewer: “Okay. What is your confidence in your own response right now?

Jo: “Yeah, it's still the same. I considered all those factors too. I just came to a different conclusion.

Here, Jo recognized that her response and the 0 M sample response relied on similar reasoning, and she even considered the 0 M sample response to be a strong argument. She also recognized that its perspective was informed differently than hers, hence the “different opinions” of the responses. Like Hal, she had already considered the information that the 0 M response used and did not feel any differently towards her own response after the social comparison. The social comparison produced no change in confidence and did not seem to offer Jo any critical internal feedback. Without critical internal feedback, students likely did not feel any incentive to revise their response to improve it.

Discussion

The current study explored how students evaluate their own work in a simulated peer review and what affective changes arose during students' self-evaluations. The reported results advance our understanding of the underlying mechanisms of peer review and the self-evaluation that can accompany it. In doing so, we outlined the cognitive processes students went through to evaluate their own work and identified four distinct outcomes from a simulated peer review. The outcomes we identified differ based on how students used internal feedback from a social comparison to evaluate how well their response met certain internal criteria. The nature of the internal criteria students generated had a deep impact on the social comparisons they made.

To develop their internal criteria, students constructed plans to accomplish the task at hand and meet specific goals. To form these goals and plans for any given task, students must consider external information such as instructors' comments, task prompts, and instructions (Nicol, 2020). Nicol found that the goals students form to accomplish a task are informed by their prior knowledge, beliefs, experience with similar tasks, and their overall interpretation of the instructions given to them. Once students had formed an interpretation of the instructions and constructed goals for their internal criteria, those goals shaped how students evaluated and interacted with all responses for the task, including their own. Nicol has also reported that the criteria students form for a given task influence how they interact with all external products (i.e., their own response and others' responses for a given task) (Nicol et al., 2014; Nicol, 2020).

After producing internal criteria for the task, some students showed evidence of going through the process of decentring, which is the process of recognizing and understanding perceptions and reasoning different from one's own (Piaget, 1955). Decentring has been shown to lead to more productive discourse within the classroom. Physical chemistry students engaging in discourse in a process-oriented guided inquiry classroom demonstrated decentring when they recognized where their peers' responses stemmed from, allowing them to consider alternative reasoning and reflect on their own as well (Moon et al., 2017). In our study, students showed evidence of decentring during their social comparison when they could identify the internal criteria that informed the sample response they were reviewing. For example, when comparing the 0 M sample response to his own (2 M), Evander stated that the 0 M sample response weighed the impurities present more heavily in its analysis. Although his own response accounted for impurities as well, that consideration was weighed along with the other goal of obtaining a larger amount of gold. This demonstrates that Evander was able to recognize the perspective of the 0 M sample response and identify the internal criteria and reasoning that informed it. It was the act of decentring that allowed some students to develop their analysis and change their response. Students such as Hector and Violet adjusted their own criteria in some way after identifying the internal criteria that informed the sample response they were reviewing.

Students who changed their internal criteria or other aspects of their response did so because they gathered new information from the social comparison. According to Nicol (2020), students can use comparisons to gather external information to re-evaluate and modify their interpretations of instructions, and therefore to modify the strategies and tactics they use to accomplish the task. Students participating in our simulated peer review had generally engaged in multiple social comparisons before they adjusted their internal criteria or any part of their response. Yan and Brown (2017) also noted this in their investigation of student self-assessment: students used multiple sources of external information to “calibrate” their own performance and their evaluations of others' performances. Students generated internal feedback from multiple external sources of information that corroborated each other and then made changes to their work to address the accumulated internal feedback.

Even though students engaged in multiple social comparisons to generate some sort of internal feedback, there were some instances in which students did not generate any observable internal feedback to evaluate their own work. This can be interpreted as a limitation of the interview setting, as students might have had unconscious internal feedback that we were unable to elicit through our interview protocol; however, Nicol's work on internal feedback suggests that providing external information for students to compare against does not guarantee they will make meaningful comparisons that produce internal feedback (Nicol, 2020). Instead, it is possible that students will “monitor” this external information without using it to evaluate their own work. This “monitoring” could also explain why some students do not make changes to their own work when receiving explicit feedback from reviewers in traditional peer review settings (Finkenstaedt-Quinn et al., 2019). If they are not meaningfully engaging with external information, such as peers' constructive criticisms, there is no reason for them to generate internal feedback and revise their work.

The students who do generate internal feedback and revise their work are likely motivated to act in this way. Our results suggest that, as part of this motivation, students act to address their critical internal feedback and mitigate their uncertainty about meeting their internal criteria. By addressing their critical internal feedback and working to meet higher standards, students are attempting to improve their work. According to social comparison theory, these students seem to be motivated by self-improvement (Dijkstra et al., 2008). Previous work in peer review has also found that students report being motivated to improve the quality of their own work after being exposed to others' work (Nicol et al., 2014). Although our model may not capture this motivation wholly, we can consider it another facet of the self-improvement motivation underlying both adoption and adaptation.

Implications for research and practice

Our investigation of a simulated peer review suggests that peer review settings can help produce internal feedback that helps students evaluate their own performance. So long as students engage in a purposeful social comparison with another response, they are likely to generate helpful internal feedback for themselves. Further, our findings show that students generated internal feedback from reviewing both similar and different others. The internal feedback from the comparison can validate the student's work, in which case students maintain their approach and do not make changes to improve their performance. The internal feedback can also provide an incentive for students to revise their performance in an effort to improve it. Although our research setting only simulated peer review, the kind of internal feedback we observed can potentially be reproduced in actual peer review settings. Many of our findings mirror those of traditional peer review work (Nicol et al., 2014; Nicol, 2020), which suggests that the cognitive processes and mechanisms leading to internal feedback in our study also occur in traditional peer review settings.

Our findings, which were grounded in real-time data analysis, interpretation, and review, are echoed in other peer review studies that investigated similar processes retrospectively (i.e., through focus group interviews following completion of peer review) (Nicol et al., 2014). Follow-up studies are needed to establish that the processes we identified occur similarly in an actual peer review setting and that they lead to the four outcomes we observed. Real peer review settings are not always anonymous, nor are students guaranteed to see a variety of answers as they were in our study. Students also tend to receive feedback in traditional peer review, something we did not investigate in our study. In addition, our task for the simulated peer review was designed to target two specific performance expectations for the science practice of data analysis and interpretation: analyse data using tools, technologies, and/or models in order to determine an optimal design solution, and analyse data to identify design features or characteristics of the components of a proposed system to optimize it relative to criteria for success (National Research Council, 2012). Future research can and should use students’ internal feedback to regulate and develop other performance expectations within data analysis and interpretation, and should consider it for the seven other science practices outlined by the Next Generation Science Standards.

The internal feedback that students generate can also be leveraged in classroom settings. Offering students the opportunity to evaluate preconstructed sample responses allows them to generate internal feedback and evaluate their own performance. Our findings showed that students were adept at uncovering what ideas and distinctions a sample response contained and how those ideas differed from their own. This means that instructors can leverage preconstructed sample responses to convey ideas, criteria, and nuances in a way that students can likely understand and use. Internal feedback can be generated from comparisons with many different external sources of information, so this practice could potentially be expanded to include comparisons with exemplar works or even a rubric for a given task (Nicol, 2020). Facilitating comparisons with sample responses or other sources of external information can be implemented in the chemistry classroom or in homework assignments. In chemistry contexts, facilitating comparison can potentially support the teaching of criteria for science practices that have been historically difficult to teach in a lecture or class setting but are necessary for using chemical knowledge. For example, how to consider all the data, weigh variables, and connect data to an assertion are difficult features of data analysis and interpretation to teach; the same is true for many of the science practices. Providing more opportunities for students to compare, evaluate, reflect on, and revise their own work is a relatively low-labour instructional method that could help develop these practices and foster students’ own evaluative judgement. Such opportunities could serve as a vehicle for having students extract and generate these criteria themselves.

Conflicts of interest

There are no conflicts to declare.

Appendix

Below are the three sample responses students evaluated during the second phase of the interview.

4 M sample response

Starting at 0 M HCl, Au is only 65% extracted. As the concentration of HCl increases, so does the percent extraction of Au. The percent extraction of Au at 4 M HCl has reached its maximum and is 95% extracted. Because there is a large percent of Au in the organic layer at 4 M, the best concentration of HCl to use is 4 M.

0 M sample response

The best concentration of HCl to use is 0 M HCl. The percent of waste extracted increases as the concentration of HCl is increased. Because the waste extracted is not minimal at 2 M or 4 M, 0 M is a better option. 0 M HCl also has 65% Au extracted, so there is a large amount of Au that can be gathered with no waste present.

2 M sample response

From 0 M HCl to 4 M HCl, both Au and waste extraction increase as the concentration of HCl increases. Therefore, the best concentration of HCl to use is 2 M HCl. The increase in Au extraction from 2 M to 4 M does not justify the increase in waste. But the increase in Au extraction from 0 M to 2 M does justify the increase in waste.

Acknowledgements

We would like to thank the members of the Moon group for offering feedback on the model and helping establish trustworthiness.

References

  1. Alicke M. D., (2007), In defense of social comparison, Rev. Int. Psychol. Soc., 20(1), 11–29, available at: https://www.cairn.info/load_pdf.php?ID_ARTICLE=RIPSO_201_0011.
  2. Anker-Hansen J. and Andrée M., (2015), More blessed to give than receive – A study of peer-assessment of experimental design, Procedia – Soc. Behav. Sci., 167, 65–69 DOI:10.1016/j.sbspro.2014.12.643.
  3. Beach S. R. H. and Tesser A., (2000), Self-evaluation maintenance and evolution: Some speculative notes, in Suls J. and Wheeler L. (ed.), Handbook of Social Comparison, New York: Kluwer Academic.
  4. Becker N. M., Rupp C. A. and Brandriet A., (2017), Engaging students in analyzing and interpreting data to construct mathematical models: An analysis of students’ reasoning in a method of initial rates task, Chem. Educ. Res. Pract., 18(4), 798–810 DOI:10.1039/c6rp00205f.
  5. Butler D. L. and Winne P. H., (1995), Feedback and self-regulated learning: A theoretical synthesis, Rev. Educ. Res., 65(3), 245–281 DOI:10.3102/00346543065003245.
  6. Carless D. and Boud D., (2018), The development of student feedback literacy: Enabling uptake of feedback, Assess. Eval. High. Educ., 43(8), 1315–1325 DOI:10.1080/02602938.2018.1463354.
  7. Carpenter P. A. and Shah P., (1998), A model of the perceptual and conceptual processes in graph comprehension, J. Exp. Psychol.: Appl., 4(2), 75–100 DOI:10.1037/1076-898X.4.2.75.
  8. Cho K. and MacArthur C., (2011), Learning by Reviewing, J. Educ. Psychol., 103(1), 73–84 DOI:10.1037/a0021950.
  9. Cooper M. M. et al., (2015), Challenge faculty to transform STEM learning, Science, 350(6258), 281–282 DOI:10.1126/science.aab0933.
  10. Council N. R., (2012), A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas, Washington, DC: The National Academies Press DOI:10.17226/13165.
  11. Creswell J. W., Poth C. N., (2016), Qualitative Inquiry and Research Design: Choosing Among Five Approaches, Los Angeles, CA: SAGE Publications.
  12. Dijkstra P. et al., (2008), Social comparison in the classroom: A review, Rev. Educ. Res., 78(4), 828–879 DOI:10.3102/0034654308321210.
  13. Doidge E. D. et al., (2016), A simple primary amide for the selective recovery of gold from secondary resources, Angew. Chem., Int. Ed., 55(40), 12436–12439 DOI:10.1002/anie.201606113.
  14. Festinger L., (1954), A theory of social comparison processes, Hum. Relat., 7(2), 117–140 DOI:10.1177/001872675400700202.
  15. Finkenstaedt-Quinn S. A. et al., (2019), Characterizing peer review comments and revision from a writing-to-learn assignment focused on Lewis structures, J. Chem. Educ., 96(2), 227–237 DOI:10.1021/acs.jchemed.8b00711.
  16. Flower L. et al., (1986), Detection, diagnosis, and the strategies of revision, Coll. Compos. Commun., 37(1), 16–55 DOI:10.2307/357381.
  17. Ford M. J., (2012), A dialogic account of sense-making in scientific argumentation and reasoning, Cogn. Instruct., 30(3), 207–245 DOI:10.1080/07370008.2012.689383.
  18. Glazer N., (2011), Challenges with graph interpretation: A review of the literature, Stud. Sci. Educ., 47(2), 183–210 DOI:10.1080/03057267.2011.605307.
  19. González-Howard M. and McNeill K. L., (2020), Acting with epistemic agency: Characterizing student critique during argumentation discussions, Sci. Educ., 104(6), 953–982 DOI:10.1002/sce.21592.
  20. Greenwood D., (2017), Social Comparison Theory, in The International Encyclopedia of Media Effects (Major Reference Works), pp. 1–9 DOI:10.1002/9781118783764.wbieme0089.
  21. Heisterkamp K. and Talanquer V., (2015), Interpreting data: The hybrid mind, J. Chem. Educ., 92(12), 1988–1995 DOI:10.1021/acs.jchemed.5b00589.
  22. Ion G., Sánchez Martí A. and Agud Morell I., (2019), Giving or receiving feedback: Which is more beneficial to students’ learning?, Assess. Eval. High. Educ., 44(1), 124–138 DOI:10.1080/02602938.2018.1484881.
  23. Jeong H., Songer N. B. and Lee S. Y., (2007), Evidentiary competence: Sixth graders’ understanding for gathering and interpreting evidence in scientific investigations, Res. Sci. Educ., 37(1), 75–97 DOI:10.1007/s11165-006-9014-9.
  24. Kanari Z. and Millar R., (2004), Reasoning from data: How students collect and interpret data in science investigations, J. Res. Sci. Teach., 41(7), 748–769 DOI:10.1002/tea.20020.
  25. Kuhn D. et al., (2017), Can engaging in science practices promote deep understanding of them?, Sci. Educ., 101(2), 232–250 DOI:10.1002/sce.21263.
  26. Lai K. et al., (2016), Measuring graph comprehension, critique, and construction in science, J. Sci. Educ. Technol., 25(4), 665–681 DOI:10.1007/s10956-016-9621-9.
  27. Levine J. M., (1983), Social comparison and education, in Levine J. M. and Wang M. C. (ed.), Teacher and Student Perception: Implications for Learning, Hillsdale, NJ: Lawrence Earlbaum and Associates, Inc., pp. 29–55.
  28. Lundstrom K. and Baker W., (2009), To give is better than to receive: The benefits of peer review to the reviewer's own writing, J. Second Lang. Writ., 18(1), 30–43 DOI:10.1016/j.jslw.2008.06.002.
  29. Martin R., (2000), “Can I do X?”: Using the proxy comparison model to predict performance, in Suls J. and Wheeler L. (ed.), Handbook of Social Comparison, New York: Kluwer Academic.
  30. McConlogue T., (2015), Making judgements: investigating the process of composing and receiving peer feedback, Stud. High. Educ., 40(9), 1495–1506 DOI:10.1080/03075079.2013.868878.
  31. Miles M. B., Michael Huberman A. and Saldaña J., (2014), Qualitative Data Analysis: A Methods Sourcebook, 3rd edn, Los Angeles, CA: SAGE.
  32. Miller M. K., Reichert J. and Flores D., (2015), Social comparison, in The Blackwell Encyclopedia of Sociology, (Major Reference Works) DOI:10.1002/9781405165518.wbeoss140.pub2.
  33. Moon A. et al., (2017), Decentering: A characteristic of effective student–student discourse in inquiry-oriented physical chemistry classrooms, J. Chem. Educ., 94(7), 829–836 DOI:10.1021/acs.jchemed.6b00856.
  34. National Research Council, (2012), A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas, Washington, DC: The National Academies Press DOI:10.17226/13165.
  35. Nicol D., (2020), The power of internal feedback: exploiting natural comparison processes, Assess. Eval. High. Educ., 1–23 DOI:10.1080/02602938.2020.1823314.
  36. Nicol D. and McCallum S., (2021), Making internal feedback explicit: Exploiting the multiple comparisons that occur during peer review, Assess. Eval. High. Educ., 1–19 DOI:10.1080/02602938.2021.1924620.
  37. Nicol D., Thomson A. and Breslin C., (2014), Rethinking feedback practices in higher education: A peer review perspective, Assess. Eval. High. Educ., 39(1), 102–122 DOI:10.1080/02602938.2013.795518.
  38. Osborne J. F. et al., (2016), The development and validation of a learning progression for argumentation in science, J. Res. Sci. Teach., 53(6), 821–846 DOI:10.1002/tea.21316.
  39. Patchan M. M. and Schunn C. D., (2015), Understanding the benefits of providing peer feedback: How students respond to peers’ texts of varying quality, Instruct. Sci., 43(5), 591–614 DOI:10.1007/s11251-015-9353-x.
  40. Pepitone E. A., (1972), Comparison behavior in elementary school children, Am. Educ. Res. J., 9(1), 45–63 DOI:10.3102/00028312009001045.
  41. Piaget J., (1955), The Language and Thought of the Child, Cleveland, OH: Meridian Books.
  42. Pomery E. A., Gibbons F. X. and Stock M. L., (2012), Social comparison, in Encyclopedia of Human Behavior: Second Edition, pp. 463–469 DOI:10.1016/B978-0-12-375000-6.00332-3.
  43. Sadler D. R., (2010), Beyond feedback: Developing student capability in complex appraisal, Assess. Eval. High. Educ., 35(5), 535–550 DOI:10.1080/02602930903541015.
  44. Shah P. and Hoeffner J., (2002), Review of graph comprehension research: Implications for instruction, Educ. Psychol. Rev., 14(1), 47–69 DOI:10.1023/A:1013180410169.
  45. Singer S. R., Nielsen N. R. and Schweingruber H. A., (2012), Discipline-based education research, National Academies Press.
  46. Smith W. P. and Arnkelsson G. B., (2000), Stability of related attributes and the inference of ability through social comparison, in Suls J. and Wheeler L. (ed.), Handbook of Social Comparison, New York: Kluwer Academic.
  47. Teuscher D., Moore K. C. and Carlson M. P., (2016), Decentering: A construct to analyze and explain teacher actions as they relate to student thinking, J. Math. Teach. Educ., 19(5), 433–456 DOI:10.1007/s10857-015-9304-0.
  48. van Popta E., et al., (2017), Exploring the value of peer feedback in online learning for the provider, Educ. Res. Rev., 20, 24–34 DOI:10.1016/j.edurev.2016.10.003.
  49. Yan Z. and Brown G. T. L., (2017), A cyclical self-assessment process: Towards a model of how students engage in self-assessment, Assess. Eval. High. Educ., 42(8), 1247–1262 DOI:10.1080/02602938.2016.1260091.
  50. Zagallo P., Meddleton S. and Bolger M. S., (2016), Teaching real data interpretation with models (TRIM): Analysis of student dialogue in a large-enrollment cell and developmental biology course, CBE Life Sci. Educ., 15(2), 1–18 DOI:10.1187/cbe.15-11-0239.
