Elon Langbeheim,*a Einat Ben-Eliyahu,a Emine Adadan,b Sevil Akaygunb and Umesh Dewnarain Ramnarainc
aDepartment of Science and Technology Education, Ben-Gurion University of the Negev, Beer-Sheva, Southern, Israel
bDepartment of Mathematics and Science Education, Bogazici Universitesi, Istanbul, Turkey
cUniversity of Johannesburg, Auckland Park, Gauteng, South Africa
First published on 28th July 2022
Learning progressions (LPs) are novel models for the development of assessments in science education that often use a scale to categorize students’ levels of reasoning. Pictorial representations are important in chemistry teaching and learning, and also in LPs, but the differences between pictorial and verbal items in chemistry LPs are unclear. In this study, we examined an Ordered Multiple Choice (OMC) LP assessment of explanations of physical properties and processes in matter that included equivalent verbal and pictorial items. A cohort of 235 grade 7 students who had learned the particle model of matter responded to these assessments, and the data were analyzed in terms of their apparent levels of reasoning. We employed two analyses to examine the role of pictorial items in the level-based model of the LP: a polytomous Rasch analysis of the multiple-choice responses, and a verbal analysis of the students’ explanations of their choices. We found that our data do not fit a fine-grained, four-level model, but that they do fit a coarse-grained, three-level model. In addition, when fitting the data to the three-level model, the pictorial items placed more students in the midlevel than their verbal counterparts. The verbal analysis showed that explanations of selections of pictures representing a partial, midlevel understanding were significantly less aligned with the details in the picture than explanations of the correct, upper-level selections. Finally, the proportions of student explanations of both upper-level and midlevel choices that were aligned with the information in the pictures were correlated with the overall difficulty of the items. This suggests that complex pictorial representations of processes are less likely to reveal coherent reasoning.
While some studies provide empirical validation of sequential, level-based LPs, others find these models controversial. Critics of the sequentially ordered model of learning progressions argue that LPs are, at best, constructs imposed by the investigators and not a valid description of learning (Sikorski and Hammer, 2010). Instead, they suggest a complex view of the learning process consisting of hosts of possible pathways that are sensitive to curriculum, instruction, social status of the learners, and other contextual characteristics (Battista, 2011). Moreover, the non-unitary view of LPs expects that the highest-level performances in the progression – its “upper anchors” – should represent rich and sophisticated ways of reasoning about a topic that resemble those of experts (Sikorski, 2019). So, for example, in the context of the structure of matter, upper anchors should include visual representations of molecular structures and processes – like those used by chemists to communicate their ideas (Kozma et al., 2000).
The context dependence of LPs also applies to their validation. The empirical adequacy (or inadequacy) of a level-based model for a particular LP in one content topic (e.g., force and motion) does not necessarily imply that LP assessments in others will be empirically adequate (Alonzo and Elby, 2019). Even within the same content area, changing the context of the items can influence the validity of the LP instrument in assessing students’ levels of reasoning (Fulmer, Liang and Liu, 2014). The context of the items can be related to wording, the number of multiple-choice options, or the accompanying visual/symbolic representations.
Visual representations such as simulations and pictures are common conceptual scaffolds when learning and teaching chemistry (Kozma et al., 2000). But despite the central utility of visual depictions of atoms and molecules in teaching, and their appearance in LP assessments (e.g., Johnson, 2013), their role as a specific context in revealing students’ levels of reasoning has not been studied in depth. Thus, the purpose of this study is to explore how changing the context of assessment items from verbal to pictorial influences the validity of the level-based structure in an LP of the atomic/molecular model of matter. To do so, we use Rasch analysis, and verbal analysis of students’ explanations of their responses to pictorial items.
The main argument in favor of the level-based model is its utility for formative assessment, since it “portrays less sophisticated understandings in terms of progress toward – rather than only as deficient” (Alonzo, 2018). That is, the level-based characterization provides teachers a more nuanced sense of students’ reasoning than a simplistic “gets it/doesn’t get it” (Alonzo and Elby, 2019). The “context-based” perspective argues that modeling learning along a level-based pathway is not necessarily helpful for formative assessment, since there are multiple such pathways that are inherently embedded in curricular, cultural, and social contexts (Sikorski and Hammer, 2010). Instead, the “context-based” perspective suggests that since student thinking is inherently fragmented, responding to students’ specific ideas is more productive than responding to their diagnosed “level” of reasoning (Alonzo and Elby, 2019). Despite this difference, both the level-based perspective and the context-based perspective advocate for instruction that builds upon students’ intuitive ideas, rather than treating these ideas as “wrong”, or as misconceptions that need to be replaced.
Another controversy in the level-based model concerns the relation of the upper anchors to the levels below. In the “cumulative” perspective, every level in the LP includes some productive idea from the level below it (Sikorski, 2019). For example, level 1 in the force and motion LP entails the concept of force as a push or a pull, but does not connect force to motion (Alonzo and Steedle, 2009; Fulmer et al., 2014). Students who are at level 2 think that motion implies a force in the direction of motion and that no movement implies no force. Like those in level 1, the students in level 2 also hold the productive idea that force is a push or a pull, so that level 2 represents an accumulation of knowledge from level 1. A different perspective implies that lower-level student responses reflect a repertoire of ideas that are qualitatively different from responses that represent higher levels. This approach diagnoses student reasoning based on common alternative mental models or ways of thinking that they develop during teaching, and not based on the amount of detail or sophistication of their reasoning. For example, many students who have acquired the idea that matter is made of particles infer (incorrectly) that the properties of particles must resemble those of the macroscopic object, and therefore that particles in solids are not moving (Pozo and Gomez Crespo, 2005). This error is not apparent among students who hold a cruder, “hybrid” mental model – that particles are embedded in a continuous substance – which is considered a lower level (e.g., Johnson, 1998). While the cumulative perspective organizes the levels according to the sophistication of ideas, and is expected to separate learners at different age levels who were exposed to differing portions of instruction, the “repertoire” perspective may be more adequate for diagnosing learners in a single age level who were exposed to the same instruction or information.
The differences in constructs also influence the characteristics of the levels within a single dimension/construct. For example, in Hadenfeldt et al. (2016), explanations of physical properties of matter that correctly describe particle motion were ranked as the upper anchors. Conversely, in Morell et al. (2017), only explanations that combine accounts of particle motion and arrangements were scored as upper anchors. In the Morell et al. (2017) study, students who wrote that gas particles move more freely than particles in liquids and solids, but did not mention the space-filling, random arrangement of the gas particles, reflect a partial understanding. Thus, the development of ideas about the structure of matter and its changes depends on its division into sub-dimensions, and on the characterization of the levels of those dimensions. The order of appearance of ideas about the properties of particles and their motion differs considerably among students, and reflects several different limiting assumptions (Talanquer, 2009). For example, novice learners may reject the idea that particles move in solids because of a solid's static appearance, while learners at a more advanced stage may accept the idea that particles move, but only when forced to do so by an external causal agent such as heat (Talanquer, 2009).
In learning progression assessments that utilize the Ordered Multiple Choice (OMC) method (e.g., Alonzo and Steedle, 2009), each of the choices in a multiple-choice item is intended to represent one of the hypothesized levels. OMC assessments are scored easily, and if they work as intended, they can provide information for teachers about the levels of their students. Let us consider the following OMC item, adapted from Merritt and Krajcik (2013):
While playing in his room, Danny smelled the flowers in the other room. How does the smell of the flowers reach Danny's nose?
A. The air particles absorb the smell of the flowers and carry it to the nose
B. Danny inhaled and thereby drew the smell particles from the flowers to his nose
C. The air particles collide with the smell particles so that some of them reach Danny's nose
D. The smell of the flowers expands like smoke
According to the OMC method, each one of the responses is designed to represent a reasoning level, and is scored accordingly. Choice D (the smell of the flowers expands like smoke) represents the “naïve” concept because it does not contain the idea of particles, and its score is 0. Choice A represents “basic” level reasoning, since the air particles carry macroscopic properties such as the ability to “absorb” smell and carry it around, and its score is 1. Choice B represents a different “basic” model that views the smell as particles, but incorrectly represents their motion in the room as being caused by inhaling. According to Hadenfeldt et al. (2014), adequate descriptions of particles with an incorrect representation of motion are at a higher level than “basic” reasoning, so its score is 2. Choice C represents a correct, “systemic” understanding, and is therefore the upper anchor, with a score of 3.
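The OMC scoring logic described above can be sketched in a few lines of code. This is an illustrative sketch of the choice-to-score mapping from the text, not the authors' scoring script; the names `OMC_SCORES` and `score_response` are our own.

```python
# Illustrative sketch of the fine-grained OMC scoring described above.
# The choice-to-score mapping follows the text; the names are hypothetical.
OMC_SCORES = {
    "D": 0,  # naive: no particle ideas ("expands like smoke")
    "A": 1,  # basic: air particles "absorb" smell (a macroscopic property)
    "B": 2,  # smell as particles, but motion incorrectly caused by inhaling
    "C": 3,  # systemic: collisions spread the smell particles (upper anchor)
}

def score_response(choice: str) -> int:
    """Return the fine-grained (four-level) score for a multiple-choice selection."""
    return OMC_SCORES[choice.strip().upper()]
```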
Some studies show that pictorial items are easier than equivalent text-based items, especially among children of elementary school age (Saß et al., 2012; Lindner et al., 2017). However, visual representations are not always clear, and may mislead students (Langbeheim, 2015), so that navigating them requires representational competence (Kozma et al., 2000). Students who are competent in their use of representations are able to identify, describe, and analyze features of representations, and select the most appropriate representation for explaining phenomena (Russel and Kozma, 2005). Students with lower representational competence may be confused by visual representations. Thus, when assessing students’ levels in a learning progression of the structure of matter, we need to account for their ability to interpret pictorial and verbal information.
In our previous study, we found that patterns of responses to equivalent verbal and pictorial assessment items were significantly different (Langbeheim et al., 2018); however, we did not examine the differences between learners’ response patterns as representing a level-based progression model, nor do we know of other studies that did. As mentioned above, the level-based analysis has important implications for formative assessment in helping teachers to evolve from the right/wrong dichotomy towards a more nuanced analysis of student understanding (Alonzo, 2018). Thus, this study has three goals: first, to examine whether the progression in particle-based explanations of physical changes in matter fits a level-based model in a cohort of students of the same age; second, to explore the differences and similarities in students’ response patterns to verbal and pictorial items; and third, to examine the characteristics of students’ performance on pictorial items by analyzing their explanations of their choices. We ask:
1. Is a level-based model valid for describing the progression in particle-based explanations of physical properties and changes in matter in pictorial and verbal items?
2. What are the similarities and differences in students’ response patterns to pictorial and verbal items?
3. To what extent are students’ explanations of their choices in pictorial items aligned with the conceptual model represented by the pictures?
We then distributed each version of the assessment randomly among students (i.e., half of the students received one version of the form and the other half received the other version). Four identical items were included in both versions for Rasch linking (Haebara, 1980). A total of 235 Grade 7 students from three Israeli schools (mean age = 12.8, boys = 99, girls = 123, NA = 13) responded to the survey: 112 responded to form I, containing the pictorial format of the item in Fig. 1, and 123 responded to form II, with the verbal item. One item that portrays intermolecular forces (item 3 – see ESI†) was removed from the analysis, because most students did not understand its pictorial representation of forces.
In addition to the multiple-choice component of the questionnaire, seven of the items had a second-tier component: if the item was pictorial, the students were asked to explain their choice, and if the item was verbal, the students were asked to draw a picture that represents their choice. Students were given ample time (40 minutes) to complete the questionnaire.
Fine-grained, 4-level model (Hadenfeldt et al., 2016) | Coarse-grained, 3-level model
---|---
Level 3 – describing matter as made of particles with collective behavior in terms of arrangement and movement – e.g., “the air molecules collide with the smell particles so that some of them reach Danny's nose” | Level 2 – describing matter as made of particles with collective behavior in terms of arrangement and movement
Level 2 – describing matter as made of particles with properties that differ from its macroscopic appearance, but with particle motion incorrectly related to macroscopic properties – e.g., “as the butter melts, its particles start to vibrate” (indicating that particles in solids do not move) | Level 1 – describing matter as made of particles embedded in a substance, or as particles that resemble its macroscopic properties, or with particle motion described incorrectly
Level 1 – describing particles as entities embedded in a substance, or as entities that resemble its macroscopic properties – e.g., “the balloon expands because the air particles in it become larger” |
Level 0 – describing matter without particle concepts – e.g., “a water drop does not contain particles” | Level 0 – describing matter without particle concepts
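The recoding from the fine-grained to the coarse-grained model in the table above amounts to merging levels 1 and 2 into a single midlevel. A minimal sketch of this recoding (the names are ours, not from the authors' analysis scripts):

```python
# Collapse fine-grained scores (0-3) into the coarse-grained model (0-2):
# the two intermediate levels merge into a single "messy" midlevel.
FINE_TO_COARSE = {0: 0, 1: 1, 2: 1, 3: 2}

def collapse_to_coarse(fine_score: int) -> int:
    """Recode a fine-grained four-level score to the three-level model."""
    return FINE_TO_COARSE[fine_score]
```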
After assigning a score to each choice, a polytomous Rasch model calculates “difficulty thresholds” (“Thurstonian thresholds”) according to the difference between the average abilities of students who scored 0 and 1, 1 and 2, or 2 and 3 on an item (Andrich and Marais, 2019). We expect students with an overall higher ability (a higher overall score on the instrument) to score higher than their peers on each of these items (Briggs et al., 2006). This will be reflected in an ordered set of difficulty thresholds. For example, if the threshold between levels/scores 0 and 1 is τ0–1 = (−0.71), the threshold between 1 and 2 is τ1–2 = 0.11, and the threshold between 2 and 3 is τ2–3 = 0.78, then the thresholds are in an ordered sequence, since τ0–1 < τ1–2 < τ2–3. In this way, the polytomous Rasch analysis validates the hypothetical learning progression, since it shows that more able students choose the higher-level options.
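The ordering check described above can be sketched as follows; the threshold values are the example figures from the text, and the function name is illustrative.

```python
def thresholds_ordered(thresholds):
    """Return True if the Thurstonian thresholds form a strictly increasing
    sequence, as the level-based model predicts."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

# Example values from the text: tau_01 = -0.71, tau_12 = 0.11, tau_23 = 0.78
print(thresholds_ordered([-0.71, 0.11, 0.78]))  # True: the sequence is ordered
```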
We note that while the fine-grained level-based model has n − 1 thresholds, where n is the number of levels, the coarse-grained model (with the single “messy” midlevel) expects only two thresholds in an ordered sequence: one threshold between the lower anchor and the midlevel, and one between the midlevel and the upper-anchor students.
We used polytomous Rasch analysis with the TAM package (Robitzsch et al., 2020) to examine the reliability and empirical validity of each model. The EAP/PV reliabilities were 0.698 for the fine-grained model and 0.688 for the coarse-grained model in the analysis of the responses to form I, and 0.619 for the fine-grained model and 0.605 for the coarse-grained model for form II. These values are acceptable and are similar to the EAP/PV reliabilities of Hadenfeldt et al. (2016). Although the four-level model reliabilities are slightly higher, they are fairly similar to the three-level model reliabilities. In addition, Table 2 shows that the item outfit and infit MNSQ values were all within the acceptable range of 0.7 to 1.3.
Table 2 Upper-anchor difficulty estimates and fit statistics for Form I (left four columns) and Form II (right four columns)

Item | Upper-anchor difficulty estimate | Outfit MNSQ | Infit MNSQ | Item | Upper-anchor difficulty estimate | Outfit MNSQ | Infit MNSQ
---|---|---|---|---|---|---|---
Item1V | 0.387 | 1.00 | 1.00 | Item1P | −0.02 | 0.99 | 1.00 |
Item2P | −0.668 | 0.98 | 1.01 | Item2V | −0.72 | 1.01 | 1.00 |
Item4P | −3.83 | 0.74 | 1.00 | Item4V | −2.56 | 0.96 | 1.03 |
Item5V | −2.1 | 1.09 | 0.99 | Item5P | −1.132 | 0.99 | 0.99 |
Item8P | 0.303 | 1.00 | 1.02 | Item8V | 0.126 | 0.99 | 1.00 |
Item9V | −0.29 | 1.01 | 1.00 | Item9P | 0.52 | 1.00 | 1.00 |
Item12V | 0.73 | 1.00 | 1.00 | Item12P | 0.431 | 1.01 | 1.03 |
Item13P | −0.375 | 1.01 | 0.99 | Item13V | −0.499 | 1.00 | 1.00 |
The second part of the analysis concerned students’ explanations of their choices. We defined an explanation as aligning with the picture chosen by the student when it did not add information that was not shown in the picture and did not disregard important information included in the picture. For example, Table 3 below shows the explanation of student 217, who chose option C – a midlevel choice. This student explained that “gas spreads throughout space”, whereas in the picture of option C (see Fig. 1 above) the gas particles are located at the top of the balloon. Such an explanation does not fully align with the pictorial choice.
Table 3 Examples of explanations that align, or do not fully align, with the pictorial choice

Category | Example of explanations that align with the pictorial choice | Example of explanations that do not fully align with the pictorial choice
---|---|---
Upper-anchor choice (D) | “The particles inside the balloon disperse randomly, unevenly and occupy the entire volume in which they are” (Student 207) | “Since the particles maintain the same size and shape but their quantity decreases” (Student 209) |
Midlevel choices | “After closing the balloon, all the particles go down” (Student 443, choice A) | “Because air is a gas and gas spreads throughout space” (Student 217, choice C) |
The coding of the alignment of students’ explanations with their selections was conducted by two independent coders (two of the authors), who reached an initial agreement of 85% and resolved the disagreements in further discussions until reaching 100% agreement.
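Percent agreement between two coders, as reported above, is a simple ratio of matching codes to total items. A minimal sketch with hypothetical codes (the data and names are illustrative, not the authors' coding records):

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("coders must rate the same set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

# Hypothetical codes: 1 = aligned with the picture, 0 = not aligned
coder1 = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
coder2 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(percent_agreement(coder1, coder2))  # 80.0 (agreement on 8 of 10 items)
```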
Fig. 2 Between-levels thresholds produced by the polytomous Rasch model of the fine-grained coding of student levels for pictorial (left) and verbal (right) items.
Fig. 3 Between-levels thresholds produced by the polytomous Rasch model of the coarse-grained coding of student levels for pictorial items (left) and verbal ones (right).
The comparison reveals that correspondence with the level-based model is better for the pictorial items and for the coarse-grained, two-threshold model: the thresholds produced by the coarse-grained, three-level model (Fig. 3) fit the hypothesized order better.
Fig. 3 shows that only two verbal items and one pictorial item do not fit the expected order of thresholds in the coarse-grained model. Both of the misfitting verbal items have lower-anchor statements that resemble the upper-anchor statements – which seemed to attract high-ability students. For example, item 4 asked how the air in a balloon arranges itself after some of it was released (see Fig. 1): the lower-anchor option was “the air scatters to all parts of the balloon, but is not made of particles”. It is likely that some of the students who chose this option read only the first part of the sentence and disregarded the rest.
The only pictorial item that showed an inversion of the sequence (item 4) had only one student who chose the lower-anchor (level 0) option, so this result is likely a random effect of the small sample and not an actual feature of the item.
When examining the explanations of those who chose the mid-level representations, we found that only 18% of the 49 students who chose a mid-level representation provided explanations that were aligned with their choice. Twenty percent of the students did not provide an explanation, and the rest seemed to misinterpret or disregard some of the details in the picture. Table 4 shows that in all of the items, a larger proportion of explanations of upper-anchor choices were aligned with the information in the picture than of mid-level choices. Chi-squared tests indicate that the alignment of the explanations of the upper-anchor choices was significantly different from the alignment of the explanations of the mid-level options in almost all items. Also, the alignment of the explanations of the correct/upper-anchor choices was higher for less difficult items, such as item 4 (shown in Fig. 1), that were answered correctly by more students. Interestingly, a similar trend is reflected in the alignment of the mid-level explanations and the difficulty of the item. That is, there is higher alignment between students’ explanations and their choices of pictures representing midlevel reasoning in simpler items, such as item 4, than in more difficult items, such as item 12, as shown in Fig. 4.
Table 4 Alignment of explanations with upper-anchor and mid-level pictorial choices

Item number | Proportion of correct/upper-anchor responses | Proportion of aligned explanations (%) | Proportion of mid-level responses | Proportion of aligned explanations (%) | Chi-squared statistic, sig.
---|---|---|---|---|---
1 | 67/123 | 62.7 | 49/123 | 18.4 | 22.6, p < 0.0001 |
2 | 66/112 | 51.5 | 33/112 | 30.3 | 3.58, p = 0.058 |
4 | 95/112 | 70.5 | 11/112 | 54.5 | 0.23, p = 0.63 |
5 | 76/123 | 61.8 | 27/123 | 25.9 | 10.3, p = 0.0013 |
8 | 46/112 | 71.7 | 52/112 | 40.4 | 9.70, p = 0.0018 |
12 | 40/123 | 35.0 | 67/123 | 9.0 | 11.2, p = 0.0008 |
13 | 55/112 | 56.4 | 44/112 | 20.5 | 13.1, p = 0.0003 |
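The chi-squared statistics in Table 4 can be reproduced from the reported proportions. As a check, the sketch below reconstructs the 2 × 2 contingency table for item 1 (the cell counts are our back-calculation from the percentages, not the authors' raw data) and computes the Pearson statistic without continuity correction:

```python
def pearson_chi2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table
    (no Yates continuity correction)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Item 1, back-calculated from Table 4: 67 upper-anchor responses, of which
# 62.7% (42) had aligned explanations; 49 mid-level responses, of which
# 18.4% (9) were aligned.
item1 = [[42, 67 - 42],   # upper anchor: aligned, not aligned
         [9, 49 - 9]]     # mid-level:    aligned, not aligned
print(round(pearson_chi2(item1), 1))  # 22.6, matching the reported statistic
```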
Fig. 4 A scatterplot of the proportion of aligned explanations vs. the proportion of upper-anchor responses, which decreases with the difficulty of the item.
Nevertheless, our study shows that the level-based assumption works fairly well in the coarse-grained model that assumes a “messy middle” without distinct levels. Thus, a fine-grained multi-level model of LPs may be valid for a multi-age cohort of students with different levels of exposure to curricular content, as described in Hadenfeldt et al. (2016). But a single age group that encountered the same curricular unit about physical changes in matter could not be separated into two (or more) mid-levels, and a coarse-grained model is a better fit for this situation. The coarse-grained model resembles the context-based perspective of learning progressions, since it collapses at least two forms of partial conceptualization of physical states and changes in matter into the same midlevel, thereby admitting that the progression reflects at least two (and probably more) different developmental pathways. That is, the coarse-grained model reflects a limited version of the level-based perspective that leans towards the context-based perspective. Nevertheless, a three-level model has been used to organize the implicit assumptions that constrain student thinking about the structure, properties and dynamics of particles on the continuum between novice and expert (Talanquer, 2009), and it provides more information about students’ reasoning than a traditional correct/incorrect multiple-choice assessment (Briggs et al., 2006).
When comparing the pattern of responses to verbal and pictorial items using the coarse-grained model, we found that reaching the correct, upper-anchor responses was similar for verbal and pictorial items. In addition, the distances between the computed threshold values shown in Fig. 3 indicate that the midlevel range was significantly larger for pictorial items than for verbal ones. This means that the pictorial items place more students in the midlevel range than verbal items do, and fewer in the lower-anchor range. Thus, while in our study the pictorial items are not easier than the verbal items, as they are in studies with younger children and other topics (Saß et al., 2012; Lindner et al., 2017), they recognize more students as mid-levelers who are progressing towards understanding. We note that the pictorial items in our study contained various representations of particles, such as single dots, colored ovals, and space-filling molecular models. This multitude of representations might have caused confusion and increased the difficulty of the items. It is likely that a uniform representation of particles would have made the pictorial items even easier, allowing more lower-anchor students to move into the mid-level.
Our findings provide evidence for a potential benefit of pictorial items, but we note that they emerged from the forced selection of answers in a multiple-choice format. Multiple-choice items often yield response patterns that differ from questions posed in an open-ended format (e.g., Osborne and Cosgrove, 1983). So, in order to examine how the multiple-choice response patterns reflect student reasoning, we analyzed students’ verbal explanations of their choices. The analysis of the verbal explanations indicates two main findings regarding the relationship between the verbal explanation and the intended reasoning level represented by the picture. The first finding is the significantly greater alignment between the verbal explanations and the pictures representing the correct/upper-anchor model than between the explanations and the chosen pictures representing the midlevel model (see Table 4). The lower alignment of midlevel choices supports the notion that their selection is not rooted in a stable mental model or level of reasoning, as claimed by proponents of the context-based perspective (Sikorski and Hammer, 2010). These choices more likely originated in temporary, dynamic mental constructs that activate a set of knowledge resources in response to the specific item (Taber and Garcia-Franco, 2010). The higher alignment of explanations of upper-anchor responses echoes findings from other learning progression studies that showed greater coherence in the responses of students with higher levels of understanding, and lower coherence in responses of midlevel and struggling students (e.g., Steedle and Shavelson, 2009).
While the lower coherence of midlevel responses is not a new finding, the correlation between the alignment of the explanations of pictures representing the midlevel and the difficulty of the items, as indicated in Fig. 4, has not, to the best of our knowledge, been reported before. This trend is somewhat counterintuitive, since difficult items usually reflect attractive distractors that represent robust lower-level ideas, for which we could expect more consistent explanations. The similarity of the linear correlations between the alignment of middle-level and upper-anchor explanations and the proportion of correct responses (Fig. 4) may imply a confounding factor that influences both the difficulty of reaching the upper anchor (i.e., answering correctly) and the difficulty of explaining the choice of midlevel distractors. The confounding factor can be a representational ambiguity in the pictures that is common to the upper-anchor picture and the midlevel pictures, or the cognitive load imposed by the number of details in the pictures. Indeed, items with pictures including more visual details (e.g., more than one type of particle, such as item number 9 – see ESI†) and items with ambiguous symbols, such as arrows that represent motion, were more difficult and produced lower alignment of explanations with the mid-level choice than items with less detailed representations, such as item 4, shown in Fig. 1.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2rp00119e |
This journal is © The Royal Society of Chemistry 2022 |