Elon Langbeheim,*a Einat Ben-Eliyahu,a Emine Adadan,b Sevil Akaygunb and Umesh Dewnarain Ramnarainc
aDepartment of Science and Technology Education, Ben-Gurion University of the Negev, Beer-Sheva, Southern, Israel
bDepartment of Mathematics and Science Education, Bogazici Universitesi, Istanbul, Turkey
cUniversity of Johannesburg, Auckland Park, Gauteng, South Africa
First published on 28th July 2022
Learning progressions (LPs) are novel models for the development of assessments in science education that often use a scale to categorize students’ levels of reasoning. Pictorial representations are important in chemistry teaching and learning, and also in LPs, but the differences between pictorial and verbal items in chemistry LPs are unclear. In this study, we examined an Ordered Multiple Choice (OMC) LP assessment of explanations of physical properties and processes in matter that included equivalent verbal and pictorial items. A cohort of 235 grade 7 students who had learned the particle model of matter responded to these assessments, and the data were analyzed in terms of their apparent levels of reasoning. We employed two analyses to examine the role of pictorial items in the level-based model of the LP: a polytomous Rasch analysis of the multiple-choice responses, and a verbal analysis of the students’ explanations of their choices. We found that our data do not fit a fine-grained, four-level model, but that they do fit a coarse-grained, three-level model. In addition, when fitting the data to the three-level model, the pictorial items placed more students in the midlevel than their verbal counterparts. The verbal analysis showed that explanations of selections of pictures representing a partial, midlevel understanding were significantly less aligned with the details in the picture than explanations of the correct, upper-level selections. Finally, the proportions of student explanations of both upper-level and midlevel choices that were aligned with the information in the pictures were correlated with the overall difficulty of the items. This suggests that complex pictorial representations of processes are less likely to reveal coherent reasoning.
While some studies provide empirical validation of sequential, level-based LPs, others find these models controversial. Critics of the sequentially ordered model of learning progressions argue that LPs are, at best, constructs imposed by the investigators and not a valid description of learning (Sikorski and Hammer, 2010). Instead, they suggest a complex view of the learning process consisting of hosts of possible pathways that are sensitive to curriculum, instruction, social status of the learners, and other contextual characteristics (Battista, 2011). Moreover, the non-unitary view of LPs expects that the highest-level performances in the progression – its “upper anchors” – should represent rich and sophisticated ways of reasoning about a topic that resemble those of experts (Sikorski, 2019). So, for example, in the context of the structure of matter, upper anchors should include visual representations of molecular structures and processes – like those used by chemists to communicate their ideas (Kozma et al., 2000).
The context dependence of LPs also applies to their validation. The empirical adequacy (or inadequacy) of a level-based model for a particular LP in one content topic (e.g., force and motion) does not necessarily imply that LP assessments in others will be empirically adequate (Alonzo and Elby, 2019). Even within the same content area, changing the context of the items can influence the validity of the LP instrument in assessing students’ levels of reasoning (Fulmer, Liang and Liu, 2014). The context of the items can be related to wording, the number of multiple-choice options, or the accompanying visual/symbolic representations.
Visual representations such as simulations and pictures are common conceptual scaffolds when learning and teaching chemistry (Kozma et al., 2000). But despite the central utility of visual depictions of atoms and molecules in teaching, and their appearance in LP assessments (e.g., Johnson, 2013), their role as a specific context in revealing students’ levels of reasoning has not been studied in depth. Thus, the purpose of this study is to explore how changing the context of assessment items from verbal to pictorial influences the validity of the level-based structure in an LP of the atomic/molecular model of matter. To do so, we use Rasch analysis, and verbal analysis of students’ explanations of their responses to pictorial items.
The main argument in favor of the level-based model is its utility for formative assessment, since it “portrays less sophisticated understandings in terms of progress toward – rather than only as deficient” (Alonzo, 2018). That is, the level-based characterization provides teachers a more nuanced sense of students’ reasoning than a simplistic “gets it/doesn’t get it” (Alonzo and Elby, 2019). The “context-based” perspective argues that modeling learning along a level-based pathway is not necessarily helpful for formative assessment, since there are multiple such pathways that are inherently embedded in curricular, cultural, and social contexts (Sikorski and Hammer, 2010). Instead, the “context-based” perspective suggests that since student thinking is inherently fragmented, responding to students’ specific ideas is more productive than responding to their diagnosed “level” of reasoning (Alonzo and Elby, 2019). Despite this difference, both the level-based perspective and the context-based perspective advocate for instruction that builds upon students’ intuitive ideas, rather than treating these ideas as “wrong”, or as misconceptions that need to be replaced.
Another controversy in the level-based model concerns the relation of the upper anchors to the levels below. In the “cumulative” perspective, every level in the LP includes some productive idea from the level below it (Sikorski, 2019). For example, level 1 in the force and motion LP entails the concept of force as a push or a pull, but does not connect force to motion (Alonzo and Steedle, 2009; Fulmer et al., 2014). Students who are at level 2 think that motion implies a force in the direction of motion and that no movement implies no force. Like those in level 1, the students in level 2 also hold the productive idea that force is a push or a pull, so that level 2 represents an accumulation of knowledge from level 1. A different perspective implies that lower-level student responses reflect a repertoire of ideas that are qualitatively different from responses that represent higher levels. This approach diagnoses student reasoning based on common alternative mental models or ways of thinking that they develop during teaching, and not based on the amount of detail or sophistication of their reasoning. For example, many students who have acquired the idea that matter is made of particles infer (incorrectly) that the properties of particles must resemble those of the macroscopic object, and therefore that particles in solids are not moving (Pozo and Gomez Crespo, 2005). This error is not apparent among students who hold a cruder, “hybrid” mental model – that particles are embedded in a continuous substance – which is considered a lower level (e.g., Johnson, 1998). While the cumulative perspective organizes the levels according to the sophistication of ideas, and is expected to separate learners at different age levels who were exposed to differing portions of instruction, the “repertoire” perspective may be more adequate for diagnosing learners in a single age level who were exposed to the same instruction or information.
The differences in constructs also influence the characteristics of the levels within a single dimension/construct. For example, in Hadenfeldt et al. (2016), explanations of physical properties of matter that correctly describe particle motion were ranked as the upper anchors. Conversely, in Morell et al. (2017), only explanations that combine accounts of particle motion and arrangements were scored as upper anchors. In the Morell et al. (2017) study, students who wrote that gas particles move more freely than particles in liquids and solids, but did not mention the space-filling, random arrangement of the gas particles, reflect a partial understanding. Thus, the development of ideas about the structure of matter and its changes depends on its division into sub-dimensions, and on the characterization of the levels of those dimensions. The order of appearance of ideas about the properties of particles and their motion differs considerably among students, and reflects several different limiting assumptions (Talanquer, 2009). For example, novice learners may reject the idea that particles move in solids because of a solid's static appearance, while learners at a more advanced stage may accept the idea that particles move, but only when forced to do so by an external causal agent such as heat (Talanquer, 2009).
In learning progression assessments that utilize the Ordered Multiple Choice (OMC) method (e.g., Alonzo and Steedle, 2009), each of the choices in a multiple-choice item is intended to represent one of the hypothesized levels. OMC assessments are scored easily, and if they work as intended, they can provide information for teachers about the levels of their students. Let us consider the following OMC item, adapted from Merritt and Krajcik (2013):
While playing in his room, Danny smelled the flowers in the other room. How does the smell of the flowers reach Danny's nose?
A. The air particles absorb the smell of the flowers and carry it to the nose
B. Danny inhaled and thereby drew the smell particles from the flowers to his nose
C. The air particles collide with the smell particles so that some of them reach Danny's nose
D. The smell of the flowers expands like smoke
According to the OMC method, each one of the responses is designed to represent a reasoning level, and is scored accordingly. Choice D (the smell of the flowers expands like smoke) represents the “naïve” concept because it does not contain the idea of particles, and its score is 0. Choice A represents “basic” level reasoning, since the air particles carry macroscopic properties such as the ability to “absorb” smell and carry it around, and its score is 1. Choice B represents a different “basic” model that views the smell as particles, but incorrectly represents their motion in the room as being caused by inhaling. According to Hadenfeldt et al. (2014), adequate descriptions of particles with an incorrect representation of motion are at a higher level than “basic” reasoning, so its score is 2. Choice C represents a correct, “systemic” understanding, and is therefore the upper anchor, with a score of 3.
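The OMC scoring logic described above can be sketched in a few lines of code. This is an illustrative sketch of the choice-to-score mapping from the text, not the authors' scoring script; the names `OMC_SCORES` and `score_response` are our own.

```python
# Illustrative sketch of the fine-grained OMC scoring described above.
# The choice-to-score mapping follows the text; the names are hypothetical.
OMC_SCORES = {
    "D": 0,  # naive: no particle ideas ("expands like smoke")
    "A": 1,  # basic: air particles "absorb" smell (a macroscopic property)
    "B": 2,  # smell as particles, but motion incorrectly caused by inhaling
    "C": 3,  # systemic: collisions spread the smell particles (upper anchor)
}

def score_response(choice: str) -> int:
    """Return the fine-grained (four-level) score for a multiple-choice selection."""
    return OMC_SCORES[choice.strip().upper()]
```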
Some studies show that pictorial items are easier than equivalent text-based items, especially among children of elementary school age (Saß et al., 2012; Lindner et al., 2017). However, visual representations are not always clear, and may mislead students (Langbeheim, 2015), so that navigating them requires representational competence (Kozma et al., 2000). Students who are competent in their use of representations are able to identify, describe, and analyze features of representations, and select the most appropriate representation for explaining phenomena (Russel and Kozma, 2005). Students with lower representational competence may be confused by visual representations. Thus, when assessing students’ levels in a learning progression of the structure of matter, we need to account for their ability to interpret pictorial and verbal information.
In our previous study, we found that patterns of responses to equivalent verbal and pictorial assessment items were significantly different (Langbeheim et al., 2018); however, we did not examine the differences between learners’ response patterns as representing a level-based progression model, nor do we know of other studies that did. As mentioned above, the level-based analysis has important implications for formative assessment in helping teachers to evolve from the right/wrong dichotomy towards a more nuanced analysis of student understanding (Alonzo, 2018). Thus, this study has three goals: first, to examine whether the progression in particle-based explanations of physical changes in matter fits a level-based model in a cohort of students of the same age; second, to explore the differences and similarities in students’ response patterns to verbal and pictorial items; and third, to examine the characteristics of students’ performance on pictorial items by analyzing their explanations of their choices. We ask:
1. Is a level-based model valid for describing the progression in particle-based explanations of physical properties and changes in matter in pictorial and verbal items?
2. What are the similarities and differences in students’ response patterns to pictorial and verbal items?
3. To what extent are students’ explanations of their choices in pictorial items aligned with the conceptual model represented by the pictures?
We then distributed each version of the assessment randomly among students (i.e., half of the students received one version of the form and the other half received the other version). Four identical items were included in both versions for Rasch linking (Haebara, 1980). A total of 235 Grade 7 students from three Israeli schools (mean age = 12.8, boys = 99, girls = 123, NA = 13) responded to the survey: 112 responded to form I, containing the pictorial format of the item in Fig. 1, and 123 responded to form II, with the verbal item. One item that portrays intermolecular forces (item 3 – see ESI†) was removed from the analysis, because most students did not understand its pictorial representation of forces.
In addition to the multiple-choice component of the questionnaire, seven of the items had a second-tier component: if the item was pictorial, the students were asked to explain their choice, and if the item was verbal, the students were asked to draw a picture that represents their choice. Students were given ample time (40 minutes) to complete the questionnaire.
Fine-grained, 4-level model (Hadenfeldt et al., 2016) | Coarse-grained, 3-level model
---|---
Level 3 – describing matter as made of particles with collective behavior in terms of arrangement and movement – e.g., “the air molecules collide with the smell particles so that some of them reach Danny's nose” | Level 2 – describing matter as made of particles with collective behavior in terms of arrangement and movement
Level 2 – describing matter as made of particles with properties that differ from its macroscopic appearance, but with particle motion incorrectly related to macroscopic properties – e.g., “as the butter melts, its particles start to vibrate” (indicating that particles in solids do not move) | Level 1 – describing matter as made of particles embedded in a substance, or as particles that resemble its macroscopic properties, or with particle motion described incorrectly
Level 1 – describing particles as entities embedded in a substance, or as entities that resemble its macroscopic properties – e.g., “the balloon expands because the air particles in it become larger” |
Level 0 – describing matter without particle concepts – e.g., “a water drop does not contain particles” | Level 0 – describing matter without particle concepts
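The recoding from the fine-grained to the coarse-grained model in the table above amounts to merging levels 1 and 2 into a single midlevel. A minimal sketch of this recoding (the names are ours, not from the authors' analysis scripts):

```python
# Collapse fine-grained scores (0-3) into the coarse-grained model (0-2):
# the two intermediate levels merge into a single "messy" midlevel.
FINE_TO_COARSE = {0: 0, 1: 1, 2: 1, 3: 2}

def collapse_to_coarse(fine_score: int) -> int:
    """Recode a fine-grained four-level score to the three-level model."""
    return FINE_TO_COARSE[fine_score]
```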
After assigning a score to each choice, a polytomous Rasch model calculates “difficulty thresholds” (“Thurstonian thresholds”) according to the difference between the average abilities of students who scored 0 and 1, 1 and 2, or 2 and 3 on an item (Andrich and Marais, 2019). We expect students with an overall higher ability (a higher overall score on the instrument) to score higher than their peers on each of these items (Briggs et al., 2006). This will be reflected in an ordered set of difficulty thresholds. For example, if the threshold between levels/scores 0 and 1 is τ0–1 = (−0.71), the threshold between 1 and 2 is τ1–2 = 0.11, and the threshold between 2 and 3 is τ2–3 = 0.78, then the thresholds are in an ordered sequence, since τ0–1 < τ1–2 < τ2–3. In this way, the polytomous Rasch analysis validates the hypothetical learning progression, since it shows that more able students choose the higher-level options.
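The ordering check described above can be sketched as follows; the threshold values are the example figures from the text, and the function name is illustrative.

```python
def thresholds_ordered(thresholds):
    """Return True if the Thurstonian thresholds form a strictly increasing
    sequence, as the level-based model predicts."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

# Example values from the text: tau_01 = -0.71, tau_12 = 0.11, tau_23 = 0.78
print(thresholds_ordered([-0.71, 0.11, 0.78]))  # True: the sequence is ordered
```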
We note that while the fine-grained level-based model has n − 1 thresholds, where n is the number of levels, the coarse-grained model (with the single “messy” midlevel) expects only two thresholds in an ordered sequence: one threshold between the lower anchor and the midlevel, and one between the midlevel and the upper-anchor students.
We used polytomous Rasch analysis with the TAM package (Robitzsch et al., 2020) to examine the reliability and empirical validity of each model. The EAP/PV reliabilities were 0.698 for the fine-grained model and 0.688 for the coarse-grained model in the analysis of the responses to form I, and 0.619 for the fine-grained model and 0.605 for the coarse-grained model for form II. These values are acceptable and are similar to the EAP/PV reliabilities of Hadenfeldt et al. (2016). Although the four-level model reliabilities are slightly higher, they are fairly similar to the three-level model reliabilities. In addition, Table 2 shows that the item outfit and infit MNSQ values were all within the acceptable range of 0.7 to 1.3.
Table 2 Upper-anchor difficulty estimates and fit statistics for Form I (left four columns) and Form II (right four columns)

Item | Upper-anchor difficulty estimate | Outfit MNSQ | Infit MNSQ | Item | Upper-anchor difficulty estimate | Outfit MNSQ | Infit MNSQ
---|---|---|---|---|---|---|---
Item1V | 0.387 | 1.00 | 1.00 | Item1P | −0.02 | 0.99 | 1.00 |
Item2P | −0.668 | 0.98 | 1.01 | Item2V | −0.72 | 1.01 | 1.00 |
Item4P | −3.83 | 0.74 | 1.00 | Item4V | −2.56 | 0.96 | 1.03 |
Item5V | −2.1 | 1.09 | 0.99 | Item5P | −1.132 | 0.99 | 0.99 |
Item8P | 0.303 | 1.00 | 1.02 | Item8V | 0.126 | 0.99 | 1.00 |
Item9V | −0.29 | 1.01 | 1.00 | Item9P | 0.52 | 1.00 | 1.00 |
Item12V | 0.73 | 1.00 | 1.00 | Item12P | 0.431 | 1.01 | 1.03 |
Item13P | −0.375 | 1.01 | 0.99 | Item13V | −0.499 | 1.00 | 1.00 |
The second part of the analysis concerned students’ explanations of their choices. We defined an explanation as aligning with the picture chosen by the student when it did not add information that was not shown in the picture and did not disregard important information included in the picture. For example, Table 3 below shows the explanation of student 217, who chose option C – a midlevel choice. This student explained that “gas spreads throughout space”, whereas in the picture of option C (see Fig. 1 above) the gas particles are located at the top of the balloon. Such an explanation does not fully align with the pictorial choice.
Table 3 Examples of explanations that align, or do not fully align, with the pictorial choice

Category | Example of explanations that align with the pictorial choice | Example of explanations that do not fully align with the pictorial choice
---|---|---
Upper-anchor choice (D) | “The particles inside the balloon disperse randomly, unevenly and occupy the entire volume in which they are” (Student 207) | “Since the particles maintain the same size and shape but their quantity decreases” (Student 209) |
Midlevel choices | “After closing the balloon, all the particles go down” (Student 443, choice A) | “Because air is a gas and gas spreads throughout space” (Student 217, choice C) |
The coding of the alignment of students’ explanations with their selections was conducted by two independent coders (two of the authors), who reached an initial agreement of 85% and resolved the disagreements in further discussions until reaching 100% agreement.
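Percent agreement between two coders, as reported above, is a simple ratio of matching codes to total items. A minimal sketch with hypothetical codes (the data and names are illustrative, not the authors' coding records):

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("coders must rate the same set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

# Hypothetical codes: 1 = aligned with the picture, 0 = not aligned
coder1 = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
coder2 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(percent_agreement(coder1, coder2))  # 80.0 (agreement on 8 of 10 items)
```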
Fig. 2 Between-levels thresholds produced by the polytomous Rasch model of the fine-grained coding of student levels for pictorial (left) and verbal (right) items.
Fig. 3 Between-levels thresholds produced by the polytomous Rasch model of the coarse-grained coding of student levels for pictorial items (left) and verbal ones (right).
The comparison reveals that correspondence with the level-based model is better for the pictorial items and for the coarse-grained, two-threshold model: the thresholds produced by the coarse-grained, three-level model (Fig. 3) fit the hypothesized order better.
Fig. 3 shows that only two verbal items and one pictorial item do not fit the expected order of thresholds in the coarse-grained model. Both of the misfitting verbal items have lower-anchor statements that resemble the upper-anchor statements – which seemed to attract high-ability students. For example, item 4 asked how the air in a balloon arranges itself after some of it was released (see Fig. 1): the lower-anchor option was “the air scatters to all parts of the balloon, but is not made of particles”. It is likely that some of the students who chose this option read only the first part of the sentence and disregarded the rest.
The only pictorial item that showed an inversion of the sequence (item 4) had only one student who chose the lower-anchor (level 0) option, so this result is likely a random effect of the small sample and not an actual feature of the item.
When examining the explanations of those who chose the mid-level representations, we found that only 18% of the 49 students who chose a mid-level representation provided explanations that were aligned with their choice. Twenty percent of the students did not provide an explanation, and the rest seemed to misinterpret or disregard some of the details in the picture. Table 4 shows that in all of the items, a larger proportion of explanations of upper-anchor choices were aligned with the information in the picture than of mid-level choices. Chi-squared tests indicate that the alignment of the explanations of the upper-anchor choices was significantly different from the alignment of the explanations of the mid-level options in almost all items. Also, the alignment of the explanations of the correct/upper-anchor choices was higher for less difficult items, such as item 4 (shown in Fig. 1), that were answered correctly by more students. Interestingly, a similar trend is reflected in the alignment of the mid-level explanations and the difficulty of the item. That is, there is higher alignment between students’ explanations and their choices of pictures representing midlevel reasoning in simpler items, such as item 4, than in more difficult items, such as item 12, as shown in Fig. 4.
Table 4 Alignment of explanations with upper-anchor and mid-level pictorial choices

Item number | Proportion of correct/upper-anchor responses | Proportion of aligned explanations (%) | Proportion of mid-level responses | Proportion of aligned explanations (%) | Chi-squared statistic, sig.
---|---|---|---|---|---
1 | 67/123 | 62.7 | 49/123 | 18.4 | 22.6, p < 0.0001 |
2 | 66/112 | 51.5 | 33/112 | 30.3 | 3.58, p = 0.058 |
4 | 95/112 | 70.5 | 11/112 | 54.5 | 0.23, p = 0.63 |
5 | 76/123 | 61.8 | 27/123 | 25.9 | 10.3, p = 0.0013 |
8 | 46/112 | 71.7 | 52/112 | 40.4 | 9.70, p = 0.0018 |
12 | 40/123 | 35.0 | 67/123 | 9.0 | 11.2, p = 0.0008 |
13 | 55/112 | 56.4 | 44/112 | 20.5 | 13.1, p = 0.0003 |
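The chi-squared statistics in Table 4 can be reproduced from the reported proportions. As a check, the sketch below reconstructs the 2 × 2 contingency table for item 1 (the cell counts are our back-calculation from the percentages, not the authors' raw data) and computes the Pearson statistic without continuity correction:

```python
def pearson_chi2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table
    (no Yates continuity correction)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Item 1, back-calculated from Table 4: 67 upper-anchor responses, of which
# 62.7% (42) had aligned explanations; 49 mid-level responses, of which
# 18.4% (9) were aligned.
item1 = [[42, 67 - 42],   # upper anchor: aligned, not aligned
         [9, 49 - 9]]     # mid-level:    aligned, not aligned
print(round(pearson_chi2(item1), 1))  # 22.6, matching the reported statistic
```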
Fig. 4 A scatterplot of the proportion of aligned explanations vs. the proportion of upper-anchor responses, which decreases with the difficulty of the item.
Nevertheless, our study shows that the level-based assumption works fairly well in the coarse-grained model that assumes a “messy middle” without distinct levels. Thus, a fine-grained multi-level model of LPs may be valid for a multi-age cohort of students with different levels of exposure to curricular content, as described in Hadenfeldt et al. (2016). But a single age group that encountered the same curricular unit about physical changes in matter could not be separated into two (or more) mid-levels, and a coarse-grained model is a better fit for this situation. The coarse-grained model resembles the context-based perspective of learning progressions, since it collapses at least two forms of partial conceptualization of physical states and changes in matter into the same midlevel, thereby admitting that the progression reflects at least two (and probably more) different developmental pathways. That is, the coarse-grained model reflects a limited version of the level-based perspective that leans towards the context-based perspective. Nevertheless, a three-level model has been used to organize the implicit assumptions that constrain student thinking about the structure, properties and dynamics of particles on the continuum between novice and expert (Talanquer, 2009), and it provides more information about students’ reasoning than a traditional correct/incorrect multiple-choice assessment (Briggs et al., 2006).
When comparing the pattern of responses to verbal and pictorial items using the coarse-grained model, we found that reaching the correct, upper-anchor responses was similar for verbal and pictorial items. In addition, the distances between the computed threshold values shown in Fig. 3 indicate that the midlevel range was significantly larger for pictorial items than for verbal ones. This means that the pictorial items place more students in the midlevel range than verbal items do, and fewer in the lower-anchor range. Thus, while in our study the pictorial items are not easier than the verbal items, as they are in studies with younger children and other topics (Saß et al., 2012; Lindner et al., 2017), they recognize more students as mid-levelers who are progressing towards understanding. We note that the pictorial items in our study contained various representations of particles, such as single dots, colored ovals, and space-filling molecular models. This multitude of representations might have caused confusion and increased the difficulty of the items. It is likely that a uniform representation of particles would have made the pictorial items even easier, allowing more lower-anchor students to move into the mid-level.
Our findings provide evidence for a potential benefit of pictorial items, but we note that they emerged from the forced selection of answers in a multiple-choice format. Multiple-choice items often yield response patterns that differ from questions posed in an open-ended format (e.g., Osborne and Cosgrove, 1983). So, in order to examine how the multiple-choice response patterns reflect student reasoning, we analyzed students’ verbal explanations of their choices. The analysis of the verbal explanations indicates two main findings regarding the relationship between the verbal explanation and the intended reasoning level represented by the picture. The first finding is the significantly greater alignment between the verbal explanations and the pictures representing the correct/upper-anchor model than between the explanations and the chosen pictures representing the midlevel model (see Table 4). The lower alignment of midlevel choices supports the notion that their selection is not rooted in a stable mental model or level of reasoning, as claimed by proponents of the context-based perspective (Sikorski and Hammer, 2010). These choices more likely originated in temporary, dynamic mental constructs that activate a set of knowledge resources in response to the specific item (Taber and Garcia-Franco, 2010). The higher alignment of explanations of upper-anchor responses echoes findings from other learning progression studies that showed greater coherence in the responses of students with higher levels of understanding, and lower coherence in responses of midlevel and struggling students (e.g., Steedle and Shavelson, 2009).
While the lower coherence of midlevel responses is not a new finding, the correlation between the alignment of the explanations of pictures representing the midlevel and the difficulty of the items, as indicated in Fig. 4, has not, to the best of our knowledge, been reported before. This trend is somewhat counterintuitive, since difficult items usually reflect attractive distractors that represent robust lower-level ideas, for which we could expect more consistent explanations. The similarity of the linear correlations between the alignment of middle-level and upper-anchor explanations and the proportion of correct responses (Fig. 4) may imply a confounding factor that influences both the difficulty of reaching the upper anchor (i.e., answering correctly) and the difficulty of explaining the choice of midlevel distractors. The confounding factor can be a representational ambiguity in the pictures that is common to the upper-anchor picture and the midlevel pictures, or the cognitive load imposed by the number of details in the pictures. Indeed, items with pictures including more visual details (e.g., more than one type of particle, such as item number 9 – see ESI†) and items with ambiguous symbols, such as arrows that represent motion, were more difficult and produced lower alignment of explanations with the mid-level choice than items with less detailed representations, such as item 4, shown in Fig. 1.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2rp00119e |
This journal is © The Royal Society of Chemistry 2022 |