Adam G. L. Schafer, Victoria M. Borland and Ellen J. Yezierski*
Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio 45056, USA. E-mail: yeziere@miamioh.edu
First published on 1st February 2021
Even when chemistry teachers’ beliefs about assessment design align with literature-cited best practices, barriers can prevent teachers from enacting those beliefs when developing day-to-day assessments. In this paper, the relationship between high school chemistry teachers’ self-generated “best practices” for developing formative assessments and the assessments they implement in their courses is examined. Results from a detailed evaluation of several high school chemistry formative assessments, learning goals, and learning activities reveal that assessment items are often developed to require well-articulated tasks but either lack alignment regarding representational level or employ only one representational level for nearly all items. Implications for the development of a chemistry-specific method for evaluating alignment are presented, as well as implications for high school chemistry assessment design.
The study described herein follows a cohort of high school chemistry teachers participating in a professional development program designed to follow the process of data-driven inquiry (Fig. 1). The teachers started this professional development by generating a set of “best practices” for designing and interpreting formative assessments (Schafer and Yezierski, 2020a) before employing these “best practices” to guide the design of planned, formative assessments for their classrooms. Here, “best practices” appears in quotation marks to reflect that these practices are the participating teachers’ proposed ideas about how to best design and interpret formative assessments, not necessarily evidence-based best practices from the literature, although the teacher-generated “best practices” were found to align with relevant literature for formative assessment design (Schafer and Yezierski, 2020a). This investigation uses the teacher-generated “best practices” to better understand the relationship between chemistry teacher assessment goals and design practices. Following data-driven inquiry, the “best practices” specified by the teachers should inform the design of formative assessments (i.e., the tools used to collect evidence) alongside any lesson-specific learning goals. What follows is a brief primer on formative assessments from the literature, a summary of the “best practices” generated by teachers in the earlier work (Schafer and Yezierski, 2020a), and literature that contextualizes and/or reinforces the high school chemistry teachers’ ideas.
When Schafer and Yezierski (2020a) examined the discourse between a group of high school chemistry teachers generating a set of “best practices” for formative assessment design, they learned that the teachers collaboratively proposed and revised practices as critical friends (i.e., peers who were comfortable and had experiences collegially contributing to discourse and resolving disagreements) (Curry, 2008; Loughran and Brubaker, 2015; Schafer and Yezierski, 2020a). Among the teachers’ “best practices” were the considerations that an assessment should:
• articulate clearly what the student should do when responding to the assessment item,
• address a variety of conceptual and representational levels, and
• align assessment items to the instructional materials (e.g., learning goals).
Although this is not an exhaustive list of the “best practices” generated by the teachers, the stated practices represent those specific to barriers high school chemistry teachers may face when designing planned, formative assessments. The same chemistry teachers who generated the practices in the bulleted list from Schafer and Yezierski are the participants in the study described herein. While generating the “best practices”, teachers had multiple opportunities to contribute, revise, and remove practices before agreeing on a final draft. As such, the “best practices” represent consensus ideas from the participating teachers. Employing the “best practices” generated by these teachers presents a unique opportunity to examine the relationship between chemistry teacher beliefs and practices about planned, formative assessment design. Before investigating the relationship between the chemistry teachers’ beliefs about best practices and the practices they enact when designing planned, formative assessments, each of the teacher-generated “best practices” is described in further detail.
Assessing student knowledge within a variety of representational levels was a “best practice” generated by the chemistry teachers in the study by Schafer and Yezierski, but it is not a commonly cited practice in domain-general literature (Schafer and Yezierski, 2020a). Establishing an appropriate representational level has been identified as a practice chemistry teachers engage in when developing formative assessment items (Schafer and Yezierski, 2020b). Although some teachers explicitly focus on representational level when designing assessment items, several investigations have documented student difficulties navigating between different representational levels (Gabel et al., 1987; Nakhleh, 1992; Russell et al., 1997; Gkitzia et al., 2020). As such, literature guidance suggests carefully scaffolding the number of representational levels per item when the goal is to assess student knowledge and abilities about representational level (Taber, 2013).
While representational levels have been accepted within chemistry as important descriptors of chemical information, other disciplines likely do not have the same need for describing chemical phenomena (Johnstone, 1991; Taber, 2013; Vilardo et al., 2017). Inevitably, many assessment practices are domain-general, while others are more discipline specific (Coffey et al., 2011). Few investigations focus on the role representational level plays in conjunction with other assessment item components (such as task); however, better understanding as to how these components are developed in unison could improve understanding of chemistry teacher assessment design practices.
Since assessment and instruction do not exist in isolation, improvements in learning are dependent upon the quality of alignment among assessments, curriculum, and instruction (Bell and Cowie, 2001; National Research Council, 2001, 2014; Broman et al., 2015). However, criteria for alignment specific to the discipline of chemistry (such as representational level) can increase the difficulty of designing and implementing formative assessment tasks that are tightly aligned to instructional materials. The interrelatedness between assessment and instruction implies that teachers cannot make valid inferences from assessment data without alignment to instruction (Datnow et al., 2007; Hamilton et al., 2009; Sandlin et al., 2015; Harshman and Yezierski, 2017). Although several investigations recognize the importance of alignment, there is no consensus on the number of aligned criteria necessary for adequate measurement of student competency. Typically, studies suggest that between six and eight observations of students performing a task are sufficient for a reliable measurement (Webb, 2006; Martone and Sireci, 2009; Praetorius et al., 2014; Briggs and Alzen, 2019). Although the study presented herein does not seek to confirm or refute these values, we recognize that teachers should consider how many items are necessary to evidence a given learning goal when designing a planned, formative assessment. Inevitably, what is considered a sufficient number of ways and instances of assessing a learning goal must be decided by the assessment designer based on the purpose of the assessment (Kane, 2006; Webb, 2006; American Educational Research Association et al., 2014; Harshman and Yezierski, 2017). Processes such as data-driven inquiry can support teachers’ considerations of alignment by establishing goals which inform the design of tools to collect evidence and the conclusions drawn from the evidence collected (Harshman and Yezierski, 2017). Better understanding the relationship between teachers’ perceived “best practices” (such as alignment to instruction) and the assessments they design is essential for helping teachers improve formative assessment design practices.
This study addresses the following research questions:

(1) How can the tasks and representational levels of planned, formative assessment items be diagrammed to allow for comparison with corresponding instructional materials (i.e., learning goals and learning activities)?
(2) What is the nature of the relationship between teachers’ self-generated “best practices” for formative assessments and the assessments they generate?
| Task code | Example tasks |
|---|---|
| Retrieval | Identify, recognize, calculate, complete, apply, demonstrate |
| Explanation | Explain, summarize |
| Representation | Draw, use models, represent, show |
| Analysis | Sort, categorize, differentiate, assess, critique, evaluate, diagnose |
| Knowledge utilization | Test, how would you determine, generate and test |
| Representational level code^a | Representational level descriptions |
|---|---|
| ^a Mixtures of representational level codes are possible and are written as the two representational levels present in the item (e.g., macroscopic/symbolic). | |
| Macroscopic | Representation of species/events on a visible scale to communicate chemical ideas/events/species |
| Symbolic | The use of descriptive words, symbols, or values to communicate chemical ideas/events/species |
| Particle-level | Representation of species/events on an invisible scale to communicate chemical ideas/events/species |
| Ambiguous | No representational level communicated |
It is essential to establish trustworthiness of coding for any qualitative investigation (Patton, 2002). As such, evidence of trustworthiness was established by addressing the credibility, dependability, and transferability of the findings. All investigators have previous experience teaching at the high school level and facilitating professional development for high school chemistry teachers, granting credibility to the codes generated. Evidence of dependability was established through interrater agreement and frequent debriefings. Interrater agreement of task and representational level codes was established by having two researchers independently code items from one learning activity. Code applications were compared, with an agreement of 79%. Disagreements in code application were negotiated, code descriptions were collaboratively revised, and the codes were reapplied to the data set once complete agreement was established. Codes for chemistry content were generated by an individual researcher and were independent for each group of artifacts. Once chemistry content codes were generated, the descriptions were shared with other members of the research team and collaboratively revised until agreement was reached. Debriefings between the authors were held weekly. In addition, monthly debriefings were held with several graduate students and another chemistry education research faculty member who were not involved in the data collection. Transferability of the findings presented is limited but carefully defined: the participants are a small group of high school chemistry teachers with several years of professional development experience. More generally, the findings have the capacity to generate discussion and avenues for future studies as well as guidance for high school chemistry teachers and education researchers.
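To make the agreement statistic concrete, here is a minimal sketch of a percent-agreement calculation, assuming each coder assigned exactly one task code per item; the coded lists are hypothetical and do not reproduce the study data.

```python
# Minimal sketch of percent interrater agreement, assuming two coders
# each assigned one task code per item (hypothetical codes, not study data).
coder_1 = ["retrieval", "explanation", "retrieval", "representation", "analysis"]
coder_2 = ["retrieval", "explanation", "retrieval", "analysis", "analysis"]

matches = sum(a == b for a, b in zip(coder_1, coder_2))
agreement = matches / len(coder_1)
print(f"Percent agreement: {agreement:.0%}")  # 4 of 5 codes match -> 80%
```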
Assessment items were compared to items in the corresponding learning activity and learning goal to investigate code co-occurrence. This comparison was facilitated through the use of a novel diagram called an “alignment plot.” These alignment plots illustrated each of the codes applied to the items within teacher artifacts. For this work, alignment is defined as the presence of code co-occurrence among all artifacts with regard to task and/or representational level within a particular chemistry content category. The alignment plots served to visualize the teachers’ enacted practices when designing planned, formative assessments. By organizing the features of the artifacts in the alignment plot, we were able to characterize how teachers designed their assessments as compared to how they thought assessments should be designed.
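As a minimal sketch of this definition, assuming each item is reduced to a (content, task, representational level) triple, alignment can be checked by intersecting the triples across artifact types; the triples below are invented for illustration and do not reproduce the study data.

```python
# Sketch of alignment as code co-occurrence: a (content, task, level) triple
# counts as aligned only if it appears in the learning goals, the assessment,
# AND the learning activity. All triples are hypothetical.
artifacts = {
    "learning_goals": {("static chemical system", "retrieval", "symbolic"),
                       ("static chemical system", "representation", "particle")},
    "assessment": {("static chemical system", "retrieval", "symbolic"),
                   ("static chemical system", "explanation", "macroscopic")},
    "learning_activity": {("static chemical system", "retrieval", "symbolic"),
                          ("static chemical system", "representation", "particle")},
}

aligned = set.intersection(*artifacts.values())
print(aligned)  # {('static chemical system', 'retrieval', 'symbolic')}
```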
Table 4 communicates the total number of items that include each task, representational level, and chemistry content topic for Claude's artifacts.
| Category | Code | Learning goals | Assessment | Learning activity | Total |
|---|---|---|---|---|---|
| Task | Retrieval | 1 | 13 | 22 | 36 |
| | Explanation | — | 1 | 13 | 14 |
| | Representation | 1 | 2 | 5 | 8 |
| | Analysis | 1 | — | 2 | 3 |
| | Knowledge utilization | — | — | — | 0 |
| Representational level | Symbolic | 1 | 14 | 26 | 41 |
| | Symbolic/particle | — | — | — | 0 |
| | Particle | 1 | 1 | 3 | 5 |
| | Particle/macroscopic | — | — | — | 0 |
| | Macroscopic | — | 1 | 11 | 12 |
| | Macroscopic/symbolic | — | — | 1 | 1 |
| | Ambiguous | 1 | — | 1 | 2 |
| Chemistry content | Static chemical system | 3 | 7 | 28 | 38 |
| | Chemical phenomena | — | 4 | 7 | 11 |
| | Electron count and movement | — | 5 | 2 | 7 |
| | Observations | — | — | 4 | 4 |
| | All content areas | — | — | 1 | 1 |
| Total | | 3 | 16 | 42 | 61 |
The learning goals for Claude's lesson were that the student would be able to:

1. Identify a redox reaction based on symbolic representations.
2. Represent particulate level representations of redox reactions.
3. Predict products of redox reactions.
Claude generated one assessment and one learning activity to address these learning goals. Students in Claude's classes were provided one day to complete the learning activity, taking the assessment the following day.
Claude's artifacts addressed three chemistry topics within oxidation and reduction reactions with an additional code for items that call for in-lab observations. One item in Claude's learning activity asked students to summarize what happens during a redox reaction. This item addressed all content codes and thus was given its own content code (shown in Table 4 as “All Content Areas”). Example items from Claude's artifacts are shown in Fig. 3. The tasks articulated in Claude's items were mainly retrieval (36 total) with fewer explanation (14 total), representation (8 total), and analysis (3 total) tasks. Additionally, Claude's artifacts addressed a variety of representational levels; however, the symbolic representational level was emphasized much more than the others, with 41 items out of 61 total. The items in Claude's artifacts met the self-imposed requirements of articulating specific tasks to complete while also incorporating a variety of conceptual and representational levels, although the symbolic representational level was disproportionately emphasized.
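For readers interested in reproducing tallies like those in Table 4, a minimal sketch follows, assuming each coded item is stored as an (artifact, task code) pair; the pairs shown are hypothetical, not Claude's data.

```python
# Sketch of tallying per-artifact task-code frequencies (cf. Table 4),
# assuming coded items are stored as (artifact, task code) pairs.
from collections import Counter

coded_items = [  # hypothetical coded items
    ("assessment", "retrieval"), ("assessment", "explanation"),
    ("learning activity", "retrieval"), ("learning activity", "retrieval"),
    ("learning goal", "representation"),
]
for (artifact, task), n in sorted(Counter(coded_items).items()):
    print(f"{artifact:<18} {task:<15} {n}")
```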
Table 5 communicates the total number of items that include each task, representational level, and chemistry content topic found in Celine's artifacts.
| Category | Code | Learning goals | Assessment | Learning activity | Total |
|---|---|---|---|---|---|
| Task | Retrieval | 2 | 6 | 40 | 48 |
| | Explanation | 1 | — | 3 | 4 |
| | Representation | — | 1 | — | 1 |
| | Analysis | — | — | — | 0 |
| | Knowledge utilization | — | — | — | 0 |
| Representational level | Symbolic | 3 | 5 | 42 | 50 |
| | Symbolic/particle | — | — | — | 0 |
| | Particle | — | — | — | 0 |
| | Particle/macroscopic | — | 1 | — | 1 |
| | Macroscopic | — | — | 1 | 1 |
| | Macroscopic/symbolic | — | 1 | — | 1 |
| | Ambiguous | — | — | — | 0 |
| Chemistry content | Element and number relation | 3 | 3 | 27 | 33 |
| | Element comparison | — | 4 | 2 | 6 |
| | Periodic table | — | — | 7 | 7 |
| | Math knowledge | — | — | 6 | 6 |
| | Observations | — | — | 1 | 1 |
| Total | | 3 | 7 | 43 | 53 |
The learning goals for Celine's lesson were that students will be able to:

1. Successfully employ ratios and proportions to obtain relative mass for particles of imaginary elements.
2. Successfully explain how Avogadro's law allows scientists to assign mass to particles as tiny as atoms.
3. Use ratio relationships and reasoning to assign a relative mass to an unknown particle, based on given information.
Celine generated one assessment and three learning activities to address these learning goals. Students in Celine's classes were provided one day to complete each learning activity, taking the assessment the day after the third learning activity.
Celine's artifacts addressed four chemistry topics within a lesson about isotopes and atomic mass, with an additional code for items about recording observations. Celine has multiple learning activities because she made in-class decisions about student progress, deciding that her students needed more in-class instruction before taking the assessment. Example items from Celine's artifacts are shown in Fig. 4. The tasks articulated in Celine's items were nearly all retrieval (48 total) with far fewer explanation tasks (4 total) and only one representation task. Similarly, nearly all items in Celine's artifacts incorporated the symbolic representational level (50 out of 53 total items). Celine's items were disproportionately retrieval tasks at the symbolic representational level, meaning that the self-imposed requirement of addressing a variety of conceptual and representational levels was not met.
Table 6 communicates the total number of items that include each task, representational level, and chemistry content topic for Emmerson's artifacts.
| Category | Code | Learning goals | Assessment | Learning activity | Total |
|---|---|---|---|---|---|
| Task | Retrieval | 1 | 10 | 43 | 54 |
| | Explanation | — | — | 29 | 29 |
| | Representation | 1 | — | — | 1 |
| | Analysis | 2 | — | 5 | 7 |
| | Knowledge utilization | — | — | 6 | 6 |
| Representational level | Symbolic | — | 3 | 6 | 9 |
| | Symbolic/particle | — | 2 | — | 2 |
| | Particle | 2 | 3 | 5 | 10 |
| | Particle/macroscopic | — | — | 1 | 1 |
| | Macroscopic | — | — | 54 | 54 |
| | Macroscopic/symbolic | 1 | 2 | 12 | 15 |
| | Ambiguous | 1 | — | 5 | 6 |
| Chemistry content | Reaction type | 1 | 6 | 17 | 24 |
| | Reaction representation | 3 | 4 | — | 7 |
| | Lab knowledge | — | — | 1 | 1 |
| | Observations | — | — | 65 | 65 |
| Total | | 4 | 10 | 83 | 97 |
The learning goals for Emmerson's lesson were that the learner will be able to:

1. Classify reactions as a synthesis, decomposition, double displacement, single displacement, or combustion reaction.
2. Interpret symbolic representations of equations to make predictions of observable behaviors to link the macroscopic and symbolic levels of understanding according to Johnstone's triangle of chemical levels of thinking.
3. Analyze and develop sub-microscopic representations of reactions.
Emmerson generated one assessment and one learning activity to address these learning goals. Students in Emmerson's classes were provided two days to complete the learning activity, taking the assessment electronically the day after the learning activity.
Emmerson's artifacts included two chemistry content codes for a lesson about types of chemical reactions, with additional codes to represent items that asked students to make observations or employ their laboratory knowledge. Example items from Emmerson's artifacts are shown in Fig. 6. The tasks articulated in Emmerson's items were largely retrieval (54 total), with at least one learning activity item incorporating each of the explanation, analysis, and knowledge utilization tasks. Although the learning activity incorporated a variety of tasks, the assessment included only retrieval tasks. The emphasis on retrieval tasks is likely a result of the number of observations in the learning activity, which also likely contributes to the emphasis on the macroscopic representational level (54 items). Even though the macroscopic representational level was disproportionately emphasized, the learning activity included at least one item from all other possible representational levels (excluding items including both symbolic and particle levels) and the assessment spanned a variety of representational levels. Emmerson's items met the self-imposed requirements of articulating a task for each item and incorporating a variety of conceptual and representational levels, although retrieval items at the macroscopic level were emphasized.
Although investigating individual code occurrences provided insight about overall trends in task and representational level use, the complex nature of the items is not illustrated. For example, an item coded as a retrieval task may include a different representational level than other retrieval tasks within the same chemistry content. Highlighting the co-occurrence of the tasks and representational levels of items within each chemistry content code allows for a more thorough characterization of alignment to instructional materials which is described in the following section.
| Task code | Shape on alignment plot |
|---|---|
| Retrieval | Circle |
| Explanation | Triangle |
| Representation | Square |
| Analysis | Pentagon |
| Knowledge Utilization | Star |
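To make the plotting convention concrete, below is a minimal matplotlib sketch that draws one marker per item using the shape-for-task mapping above and the color-for-representational-level mapping described in the next paragraph (and tabulated in Appendix B). The example items, axis layout, and marker sizes are illustrative assumptions, not the authors' plotting code.

```python
# Sketch of drawing alignment-plot markers: marker shape encodes the task,
# marker color encodes the representational level. Items are hypothetical.
import matplotlib.pyplot as plt

TASK_MARKER = {"retrieval": "o", "explanation": "^", "representation": "s",
               "analysis": "p", "knowledge utilization": "*"}
LEVEL_COLOR = {"symbolic": "red", "macroscopic": "yellow", "particulate": "blue",
               "symbolic/macroscopic": "orange", "symbolic/particulate": "purple",
               "macroscopic/particulate": "green", "all": "white", "ambiguous": "black"}

items = [  # (content column, artifact row, task, representational level)
    (0, 2, "retrieval", "symbolic"),          # learning goal
    (0, 1, "retrieval", "symbolic"),          # assessment item
    (0, 0, "representation", "particulate"),  # learning activity item
]

fig, ax = plt.subplots()
for x, y, task, level in items:
    ax.scatter(x, y, marker=TASK_MARKER[task], c=LEVEL_COLOR[level],
               edgecolors="grey", s=400)
ax.set_yticks([0, 1, 2])
ax.set_yticklabels(["Learning activity", "Assessment", "Learning goals"])
ax.set_xticks([0])
ax.set_xticklabels(["Static chemical system"])
plt.show()
```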
Example items from Claude's artifacts are shown in Fig. 3. The items in Fig. 3 align with the “Static Chemical System” chemistry content code because they ask the student about a system that is assumed to be static (i.e., unmoving). The learning goal's task is depicted by a circle to show that students are asked to identify. The learning goal sets the requirement that, to demonstrate competency, students need to recognize the correct answer when critical information is provided. The circle is colored red to show the symbolic representational level communicated in the learning goal. The assessment item's task is depicted by a triangle to show that the item asks the student to explain. When responding to this assessment item, students are provided with the information that a portion of a magnesium strip is placed in a solution of chromium(III) iodide. From this information, the student would need to discern critical information from noncritical information to demonstrate that they understand where the reaction occurred and what observations would be evident. Since the item asks for a macroscopic description, the item's shape is colored yellow. The task of the learning activity item in Fig. 3 is depicted as a square because the student is asked to draw. To demonstrate competency for this item, the student would need to generate a representation. The color of the shape is blue, since a particle-level representation is to be generated by the student. Example items for each task and representational level can be found in Appendices A and B.
Inspection for matching shapes and colors within content categories of the alignment plots (vertically) allows for a more thorough investigation of alignment than comparison of tasks and representational levels separately. A guide for reading the alignment plot is provided in Fig. 4.
Inspecting only the shapes within Claude's alignment plot (Fig. 5) reveals that the artifacts exhibit instances of both alignment and misalignment regarding assessment tasks. The “Static Chemical System” content in Fig. 5 shows that retrieval (circles) and representation (squares) tasks are present in all artifacts, demonstrating alignment of tasks. However, analysis (pentagons) and explanation (triangles) tasks are not present in all artifacts, indicating misalignment of tasks. The other chemistry content areas shown in Fig. 5 do not contain learning goals and thus cannot have alignment among assessment and instructional materials.
A similar evaluation can be performed for representational level alignment by comparing the colors present in Claude's artifacts. Fig. 5 shows two learning goals with specified representational levels, one particle-level (blue) and the other symbolic (red), within the “Static Chemical System” content. These learning goals are aligned with items in the assessment and learning activity, as indicated by the presence of red and blue shapes in both the assessment and learning activity for that content. Alternatively, the lone macroscopic-level (yellow) item in Claude's assessment for the “Static Chemical System” content is not matched with any learning goals or items in the learning activity for that content, indicating misalignment of representational level. Again, the lack of learning goals for the other content areas means that there is not an opportunity for alignment among the assessment and instructional materials.
Although evaluating task and representational level separately reveals valuable insights about Claude's assessment design practices, the alignment plot allows for these criteria to be evaluated synchronously. For example, one of Claude's learning goals is a symbolic/retrieval item (red/circle). This learning goal is exactly matched with four assessment items and 13 learning activity items within the same content. Similarly, another of Claude's learning goals is a particle-level/representation item (blue/square) that is exactly matched with one assessment item and three learning activity items. These exact matches represent the highest degree of alignment within Claude's artifacts.
Beginning with the tasks (shapes) of Celine's artifacts in Fig. 7, nearly all of the items require a retrieval (circle) task. Although retrieval tasks (circles) are present throughout all content categories, alignment is only observed for the “Element and Number Math Relation” content, since this is the only content with learning goals. Additionally, the lone explanation (triangle) learning goal is misaligned with items in the assessment and learning activity since there are no corresponding shapes within the same chemistry content. Evaluating the representational level (color) of the artifacts in Fig. 7 reveals a similar trend. Nearly all the items are at the symbolic level (red color). However, since the learning goals are only within the “Element and Number Math Relation” content, only these are considered aligned.
Again, the alignment plot's true value stems from the ability to compare task (shape) and representational level (color) synchronously. Celine's artifacts show complete alignment within the “Element and Number Math Relation” content for items that involve symbolic/retrieval (red/circles) tasks as these items are present throughout all artifacts for this content.
Fig. 9 shows that the tasks (shapes) in Emmerson's artifacts only exhibit alignment within the chemistry content “Reaction Type” for retrieval (circle) tasks, since all artifacts in that content include a circle. Emmerson's learning goals also incorporate representation (square) and analysis (pentagon) tasks within the “Reaction Representation” content; however, the assessment and learning activity do not include corresponding shapes, indicating misalignment of tasks. Although Emmerson's artifacts incorporate a variety of representational levels (colors), the colors are not consistent throughout the artifacts, leading to no instances of representational level alignment within any chemistry content category (i.e., no one color is present in each of the learning goals, assessment, and learning activity for any particular content). Without any instances of alignment in regard to representational level, Emmerson's artifacts do not exhibit any instances of matching task and representational level (shape and color) for the evaluated lesson.
The alignment plots allow for a synchronous evaluation of the items across instructional and assessment artifacts; however, it is unrealistic for a teacher to generate an alignment plot when designing every planned, formative assessment. Nonetheless, these alignment plots may be useful for teachers who wish to evaluate the quality of select planned, formative assessments.
The first teacher-generated “best practice” states that an assessment item should clearly articulate a task for the student to perform. Other investigations have found that teachers often consider the specific action to be performed by students when designing and selecting assessment items (Tomanek et al., 2008; Kang et al., 2016; Schafer and Yezierski, 2020b). Teachers have a significant body of literature to draw from when considering how to design productive tasks for their learning environment (e.g., Harris et al., 2016). A task could be assigned to every assessment item generated by this group of teachers, as well as to every learning goal and learning activity item. As such, few barriers seem to exist preventing teachers from enacting the “best practice” of stating a specific task for students to perform.
The second listed “best practice” for formative assessments by these teachers was to assess a variety of conceptual and representational levels. Assessing content in a variety of ways can help the teacher triangulate student competency (Sadler, 1989; National Research Council, 2001; Means et al., 2011). Of the teachers in this study, Claude's assessment included the greatest variety of tasks, which were considered synonymous with “conceptual levels” for the purpose of this study. Claude included retrieval, explanation, and representation tasks in his assessment, as shown in Fig. 5 and Table 4. Celine included retrieval and representation tasks in her assessment, as shown in Fig. 7 and Table 5. Emmerson included only retrieval tasks in his assessment, as shown in Fig. 9 and Table 6. It is important to recognize that any particular task is not by default more conceptually difficult or challenging than another (National Research Council, 1999, 2014; Wauters et al., 2012). For example, students may have opportunities to consistently practice generating representations for a chemical phenomenon without being able to reason about the atomic-level events that explain why the phenomenon occurs. Additionally, the assessments provided were formative, meaning that each serves the purpose of informing continued instruction. As such, the assessments may be situated near the beginning of a unit of instruction, limiting the opportunity to include several conceptually challenging items. Either way, the greater variety of tasks on Claude's assessment more likely addresses a range of conceptual difficulties and potentially better equips him to judge the upper limit of his students’ understanding.
Also included in the second “best practice” is the suggestion to include a variety of representational levels. Several investigations document student struggles navigating among representational levels (Gabel et al., 1987; Nakhleh, 1992; Russell et al., 1997; Gkitzia et al., 2020). Thus, when triangulating student understanding, teachers could benefit from assessing competency using a variety of ways of perceiving chemical information (National Research Council, 1999). However, literature guidance suggests that carefully scaffolding items assessing representational level and limiting the number of representational levels per task may be beneficial (Taber, 2013). Table 6 and Fig. 9 reveal that Emmerson addressed the greatest variety of representational levels by including four separate levels in his assessment, while Claude and Celine both included three separate representational levels throughout their assessment items. All teachers seemed to meet the self-generated “best practice” of addressing a variety of representational levels in their assessments, and few barriers seem to exist preventing teachers from enacting the “best practice” of incorporating a variety of representational and conceptual levels. However, incorporating a variety of representational levels requires teachers to consider their students’ prior knowledge and experience, making enactment a more complex process compared to “clearly state a task.” Such considerations are commensurate with the process of data-driven inquiry, as teachers employing data-driven methods must consider the needs of their students and learning environment when designing tools for gathering evidence that are aligned with their goals (Harshman and Yezierski, 2017).
The final relationship evaluated to address the second research question was that of the “best practice” of aligning assessment and instructional materials. Assessment and instruction do not exist in isolation and are intertwined throughout the learning process; this is especially true of formative assessment. Assessing tasks and concepts not addressed during instruction can unjustly increase the cognitive demand of an item (National Research Council, 2014; Kang et al., 2016). While the assessments in this study exhibited instances of both alignment and misalignment regarding task or representational level, none were completely aligned or completely misaligned. The results from evaluating the alignment of each teacher's assessment are shown in Table 8.
| Teacher | Tasks aligned | Representational levels aligned | Tasks and representational levels synchronously aligned (shown as representational level/task) |
|---|---|---|---|
| Claude | Retrieval, representation | Symbolic, particle | Symbolic/retrieval, particle/representation |
| Celine | Retrieval | Symbolic | Symbolic/retrieval |
| Emmerson | Retrieval | — | — |
For this investigation, an assessment was considered “aligned” if one instance of a specified criterion was present in each of the included artifacts. This one-to-one definition of alignment is commonly employed by methods used to evaluate alignment for state-level artifacts (Martone and Sireci, 2009), but may not be suitable for evaluating formative assessments. Indeed, several works recognize that a teacher may need to make between six and eight observations of student behavior to reliably judge student competency (Webb, 2006; Martone and Sireci, 2009; Praetorius et al., 2014; Briggs and Alzen, 2019). The recommended six to eight observations come from investigations of assessment or instructional tasks, but not both. So, teachers may need to exercise caution when considering the number of assessment items necessary to reliably judge student knowledge. Teachers in this study generally included several assessment items to evaluate a single learning goal and even more learning activity items to address the assessment items. For example, Fig. 5 shows that Claude's artifacts included one learning goal that was symbolic/retrieval and matched to four symbolic/retrieval assessment items and 13 symbolic/retrieval learning activity items. There are currently no literature-based guidelines on the ratio of learning goals to assessment items to learning activity items recommended for a reliable judgement of student learning, and a specific ratio is unlikely to be generalizable across all learning environments. However, the previously mentioned guidelines about the number of student observations suggest that a structure similar to Claude's is more favorable than a 1-learning goal to 1-assessment item to 1-learning activity item ratio of co-occurrence.
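As a trivial illustration of this tally, the goal-to-item counts reported above for Claude's symbolic/retrieval match can be expressed as a ratio; the dictionary structure is an assumption for illustration only.

```python
# Tallying the learning goal : assessment item : learning activity item
# ratio for one matched task/level pair (Claude's symbolic/retrieval case).
counts = {"learning goals": 1, "assessment": 4, "learning activity": 13}
ratio = " : ".join(str(counts[k]) for k in ("learning goals", "assessment", "learning activity"))
print(f"goals : assessment : activity = {ratio}")  # 1 : 4 : 13
```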
Missing from the chemistry education literature is how representational level factors into evaluations of alignment. Results of synchronously evaluating task and representational level are included in Table 8. While Claude's assessment was aligned along the synchronous criteria of both symbolic-level/retrieval tasks and particle-level/representation tasks, Celine's assessment showed synchronous alignment regarding only symbolic-level/retrieval tasks and Emmerson's assessment showed no synchronous alignment of representational level and task. Considering students’ documented struggles navigating among representational levels (Gabel et al., 1987; Nakhleh, 1992; Russell et al., 1997; Gkitzia et al., 2020), developing a method for evaluating the alignment of chemistry formative assessments that incorporates task and representational level synchronously may be beneficial for research, teacher education, and teacher professional development. As such, significant barriers exist for chemistry teachers when evaluating alignment between assessment and instruction in terms of both (1) an appropriate ratio of learning goals-to-assessment items-to-learning activity items that can support reliable inferences from evidence; and (2) how to incorporate alignment criteria (such as task and representational level).
Findings from the second research question revealed relationships between high school chemistry teachers’ stated “best practices” for formative assessment design and the assessments they generated. The “best practices” developed by these teachers state that a formative assessment should articulate a specific task to be performed by the student, address a variety of conceptual and representational levels, and align assessment items to instructional materials. Results showed that teachers met the practices of articulating a task and including a variety of representational levels. However, only Claude's assessment incorporated a variety of conceptual levels (i.e., tasks). Teachers’ ability to consistently meet their “best practice” of articulating a task implies that there are few barriers to enacting this goal. Direct literature guidelines are available for designing learning activity and assessment tasks for science classrooms (Harris et al., 2016; Laverty et al., 2016; Penuel et al., 2019). Although teachers met the “best practice” of incorporating a variety of representational levels, the emphasis on symbolic items could indicate that barriers still hinder enactment of this goal. Overall, assessment items generally required retrieval tasks at the symbolic representational level, potentially limiting the amount of information about student competency available to the teacher for effectively interpreting student understanding (Stiggins, 2001; Towns, 2014b; Schafer and Yezierski, 2020a).
The third “best practice” generated by teachers required evaluating the extent to which assessments aligned to instruction. For this study, a one-to-one ratio between items in instructional artifacts and assessment artifacts was used. While teachers were generally able to align some assessment tasks to instructional materials, representational levels of assessment items were frequently misaligned with instruction, or only one representational level was employed for nearly all items. Synchronous alignment of tasks and representational levels to instructional materials was infrequently observed; however, Claude's assessment included several items that synchronously aligned task and representational level to instructional materials (while still employing a variety of tasks and representational levels). The instances of alignment revealed by the alignment plots indicate that even experienced teachers with several years of professional development can still encounter significant barriers to aligning assessments to instructional materials. Although there is some literature-based guidance available to teachers for aligning assessment and instruction, most studies result in tools for teachers to use for evaluating alignment (e.g., Webb, 2007; Kaderavek et al., 2015; Achieve, 2016), which can be challenging for teachers to employ given variations in learning environments and teacher goals. Additionally, literature about evaluating alignment offers mixed guidance on what qualifies as “aligned” (Fulmer et al., 2018), variations in the criteria to use when considering alignment (Martone and Sireci, 2009; Fulmer et al., 2018), and disagreements about how many aligned criteria are necessary to assume a suitably reliable interpretation of student knowledge (Webb, 2006; Martone and Sireci, 2009; Praetorius et al., 2014; Briggs and Alzen, 2019). The barriers these teachers face in enacting their “best practice” of aligning assessment to instruction imply that teachers need more than tools for evaluating alignment. Chemistry teachers need guidance in understanding the methodological and conceptual underpinnings of the available tools, interpreting the results from the tools employed, and appropriately adjusting classroom materials to better evaluate the success of the learning environment.
Certain limitations exist within the measures as well. For example, Marzano and Kendall's original framework treats the knowledge categories as a hierarchy of knowledge levels (Marzano and Kendall, 2008). Given the complexities surrounding item difficulty, this study does not employ the knowledge tasks hierarchically. Future studies could investigate the extent to which individual knowledge categories are representative of item difficulty and depth of understanding.
A limitation of the analysis described herein is that each learning goal, learning activity, and assessment item was investigated according to what is asked of the student in the artifact, not the student's actual response to the item. This purposeful bounding of the investigation around the artifacts does not include response process validation from students. Additionally, the items generated by these teachers may not be deemed “high-quality” items by chemistry content experts. The items generated by the teachers were part of an ongoing professional development program and reflect what was implemented in chemistry teacher classrooms during early sessions. During later professional development sessions, teachers collaboratively interpreted student responses to the assessment items presented and posited potential changes to planned, formative assessment design practices. Future studies may investigate the tasks that are asked of the student versus the tasks the student actually performs.
The “best practices” examined in this work were generated by high school chemistry teachers and align with high-quality practices stated in relevant literature. The connection between these chemistry teachers’ goals and literature-backed practices implies that high school chemistry teachers may benefit from incorporating the stated “best practices” when developing formative assessments. The “best practices” were generated as goals for designing and interpreting formative assessments by a group of teachers knowingly using the process of data-driven inquiry (Schafer and Yezierski, 2020a). Chemistry teachers may similarly benefit from using processes like data-driven inquiry to scaffold their progression from designing goals, to collecting classroom evidence, to forming an evidence-based instructional response. Additionally, we hope chemistry teachers are inspired to reflect on the alignment of the purpose and design of the assessments employed in their own classrooms and to employ formal measures such as the alignment plot in their practice.
| Code | Description (demonstration of competency requires…) | Example task | Plot depiction |
|---|---|---|---|
| Retrieval | Identification or stating of standalone information without synthesis or analysis. | Identify, recognize, math functions, complete, apply, demonstrate | Circle (shape) |
| Explanation | Communication of critical/essential information from noncritical/nonessential information. | Explain, summarize | Triangle (shape) |
| Representation | Generation of a depiction of a chemical idea, phenomenon, or relationship | Draw, use models, represent, show | Square (shape) |
| Analysis | Processes that involve examining knowledge/content/observations with the intent of generating new conclusions (extending knowledge). | Sort, categorize, differentiate, assess, critique, evaluate, diagnose | Pentagon (shape) |
| Knowledge utilization | Processes that require the application or use of knowledge in a novel situation. | Test, how would you determine, generate and test | Star (shape) |
| Code | Description | Example | Plot depiction |
|---|---|---|---|
| Symbolic | The use of descriptive words, symbols, or values to communicate chemical ideas/events/species | Chemical equations | Red (shape) |
| Macroscopic | Representation of species/events on a visible scale to communicate chemical ideas/events/species | Observations | Yellow (shape) |
| Particulate | Representation of species/events on an invisible scale to communicate chemical ideas/events/species | Atomic-level particles in solution | Blue (shape) |
| Symbolic/macroscopic | Communication of chemical ideas/events/species includes combinations of individual representational levels | Equation paired with observation | Orange (shape) |
| Symbolic/particulate | Communication of chemical ideas/events/species includes combinations of individual representational levels | Equation paired with atomic-level particle model | Purple (shape) |
| Macroscopic/particulate | Communication of chemical ideas/events/species includes combinations of individual representational levels | Atomic-level particles paired with observation | Green (shape) |
| All | Communication of chemical ideas/events/species occurs at all representational levels | Answer incorporates all levels | White (shape) |
| Ambiguous | No representational level communicated | Representational level not specified | Black (shape) |
| Code | Description |
|---|---|
| Static chemical system | Content includes information about what atoms/molecules/ions present in a system, their features, and location within the system |
| Chemical phenomena | Content includes information about changes that occur between provided species in a chemical system (including hypothetical, or predicted changes) |
| Electron count and movement | Content includes information regarding electrons in a specified system |
| Observations | Content is dependent upon an in-lab observation |
| Code | Description |
|---|---|
| Mathematical relation between an element and a number | Content involves mathematical relationships and operations between values and their meanings in a chemistry context |
| Element comparison | Content requires the consideration of features/ideas/information of multiple elements (includes hypothetical elements) |
| Periodic table information | Content is sourced from general periodic table knowledge |
| Mathematical knowledge | Content includes knowledge about mathematical operations and principles |
| Observations | Content is dependent upon an in-lab observation |
| Code | Description |
|---|---|
| Reaction type | Content includes tasks specific to the type of reaction included in the item |
| Reaction representation | Item requires student to use the information embedded within a representation |
| Laboratory knowledge | Content includes general laboratory knowledge at the high school level |
| Observations | Content is dependent upon an in-lab observation |