Guiding teaching with assessments: high school chemistry teachers’ use of data-driven inquiry

Jordan Harshman and Ellen Yezierski *
Department of Chemistry and Biochemistry, Miami University, Hughes Hall, 501 East High Street, Oxford, OH 45056, USA. E-mail: yeziere@miamioh.edu

Received 8th September 2014, Accepted 21st October 2014

First published on 30th October 2014


Abstract

Data-driven inquiry (DDI) is the process by which teachers design and implement classroom assessments and use student results to inform/adjust teaching. Although much has been written about DDI, few details exist about how teachers conduct this process on a day-to-day basis in any specific subject area, let alone chemistry. Nineteen high school chemistry teachers participated in semi-structured interviews regarding how they used assessments to inform teaching. Results are reported for each step in the DDI process. Goals – learning and instructional goals lacked specificity, and the goals stated were not conducive to informing instruction. Evidence – at least once, every teacher determined student learning based solely on scores and/or subscores, suggesting an over-reliance on these measures. Conclusions – many teachers claimed that students either did or did not “understand” a topic, but did not adequately describe what this means. Actions – very few teachers listed a specific action as a result of analyzing student data, while the majority offered multiple broad pedagogical strategies to address content deficiencies. The results of this study reveal the limitations of teachers' DDI practices and suggest that professional development on DDI should have a finer grain size and be grounded in disciplinary content.


Introduction

High school chemistry teachers have daily access to assessment data from their students. Whether from homework assignments, quizzes, informal questions, classroom discussions, or lab activities, teachers have virtually endless ways of eliciting student feedback and using the information gained in the process to evaluate and adjust their instruction if necessary. Throughout the process of informing their teaching via assessment results, teachers make multiple inquiries about teaching and learning, using data to drive any decisions made, which is why we refer to it as data-driven inquiry (DDI). This process goes by many other names in the literature (Calfee and Masuda, 1997; Brunner et al., 2005; Datnow et al., 2007; Halverson et al., 2007; Gallagher et al., 2008; Hamilton et al., 2009), but all describe a very similar framework (Harshman and Yezierski, in press).

DDI is carried out daily in the teaching of chemistry, and teachers are expected to “use student data as a basis for improving the effectiveness of their practice” (Means et al., 2011). All decisions that teachers make about what they will do differently or similarly next year, next topic, next class, and even next minute are in some way dictated by information that teachers consider (Calfee and Masuda, 1997; Suskie, 2004; Hamilton et al., 2009). Even so, previous research has failed to characterize this process at a level sufficient to inform advances teachers can make in their own DDI processes (Harshman and Yezierski, in press). Herein lies the purpose of our study: to thoroughly investigate how teachers use (or do not use) their classroom data to inform their teaching. Addressing this problem will allow teachers to tailor their instruction to meet the needs of their specific classroom environments and content areas to promote lasting learning (Suskie, 2009). By enhancing assessment analysis and interpretation skills, teachers are better able to accommodate their individual students (Means et al., 2011). Further, the National Research Council reported in 2001 that classroom assessments “are not being used to their fullest potential” (National Research Council, 2001). Eight years later, the Institute of Education Sciences remarked that despite a keen interest in DDI, “questions about how educators should use data to make instructional decisions remain mostly unanswered” (Hamilton et al., 2009), and three extensive studies from the Department of Education (Gallagher et al., 2008; Means et al., 2010, 2011) suggest a similar need for investigation. Characterizing DDI practices in chemistry classrooms is a necessary first step in determining what reforms in assessment and teacher learning might be needed to optimize DDI practices, which have been deemed critical to effective teaching and student learning.

For any one unit, topic, or concept, teachers provide instruction to students and then assess students via a variety of assessment types (e.g. informal, diagnostic, formative, summative, etc.). We focus here on formative assessments, the definition of which revolves more around what one does with the results than around what the assessment physically is (Bennett, 2011; Wiliam, 2014). When we use the word “assessment,” we are referring to anything that elicits student responses, regardless of whether it is a tangible artifact (informal questioning, for example). Throughout this manuscript, we will break the process of DDI into four distinct steps as described by general education research, present science education research specific to DDI, describe our methods, and report results from the study according to each of the four DDI steps. The research presented here was born from an extensive literature review we conducted (Harshman and Yezierski, in press), covering over 80 sources on the topic of using assessment results to inform instructional practice. In a general sense, DDI is simple to explain because it closely mimics the familiar scientific process of inquiry. Specifically applied to high school chemistry education, we describe DDI in terms of four steps: determining goals, collecting evidence, making conclusions, and taking actions.

Goals are what teachers are hoping to find out via the results of their assessments. Goals can range from strictly content goals (e.g. “Students will be able to explain why electrons transfer in a single displacement reaction.”) to more abstract ones (e.g. “I want my students to better appreciate the energy loss associated with internal combustion engines.”). Similarly, teachers can set the goal to evaluate the effectiveness of an instructional strategy (e.g. “Does my implementation of the s'mores lab promote understanding of mole-to-mole ratios?”). Although this particular aspect of aligning assessments with goals is important to the entire process of assessment in chemistry, it is largely outside the scope of this study but investigated using the same interviews presented here (Sandlin et al., in preparation).

Once the goal(s) for an assessment is (are) set, teachers design an assessment that will elicit the data needed to evaluate the goals. After implementing and collecting this assessment, teachers seek out the relevant information that will inform their goal(s), referred to as evidence. Like goals, sources of evidence range widely from simple, objective scores and sub-scores (e.g. “90% of students responded correctly”) to complex affective characteristics such as the disposition of specific students on the day of assessment (e.g. “this student didn't really try”). Much as in research fields (Linn et al., 1991; American Educational Research Association, 1999; Schwartz and Barbera, 2014), the validity and reliability of sources of evidence used in educational environments should be considered (Popham, 2002; Datnow et al., 2007; McMillan, 2011; Witte, 2012). Often in conjunction with choosing valid and reliable evidence, teachers make conclusions relevant to the goals of the assessment. Conclusions are any declarative statements that a teacher makes about teaching and learning, the assuredness of which depends on what can be supported by the evidence examined. Of note, it is mostly unavoidable that personal and professional experience play a role in making conclusions about teaching and learning, which is not always a bad thing (Suskie, 2004). Once data are analyzed and interpreted, teachers address issues, if any arise, through proposed actions. From the teachers' perspective, all actions are instructional actions. Finally, that action (or inaction) can also be assessed with the same or a different assessment, making the DDI process cyclical.

The impetus for the investigation lies in two crucial ideas. Firstly, teachers will not adopt new pedagogical ideas if the task of translating those ideas into practice is the primary responsibility of the teachers (Black and Wiliam, 1998). Secondly, previous formative assessment research has been largely generalized, and the specifics of a given content area have not been adequately addressed when considering design and analysis of formative assessments (Coffey et al., 2011). Both ideas suggest the necessity to study formative assessment practices while the content, chemistry in this case, is fully integrated into the investigation. As an example of this integration of chemistry, instead of just asking why a teacher designed an assessment as they did, this study probed chemistry-specific considerations, such as why the teacher chose a certain mole-to-mole ratio, why s/he included unit conversions or not, and what the rationale was for providing additional information such as molar masses and chemical equations. Thus, the findings presented here were generated because the study was designed around high school chemistry teachers' course-specific assessments and the central tenet of data analysis was chemistry content learning.

Background

In our literature review, we found that research on DDI is not specific enough to guide day-to-day practice and that a vast majority of the research does not specify a discipline or education level (Harshman and Yezierski, in press). Although they do not directly call out a DDI process, a handful of projects have investigated analogous data-analytic frameworks or various aspects of DDI in chemistry/science. Focusing on teachers' informal, formative assessment practices, Ruiz-Primo and Furtak (2007) reported the number of complete and incomplete ESRU cycles (teacher Elicits a question, Student responds, teacher Recognizes the response, and teacher Uses the information) as well as individual components of each step in the process (e.g. “teacher revises students' words” was one specific thing the researchers coded for). In this study, three middle school science teachers' assessment practices were described according to steps of the ESRU framework. While this method provides valuable information about how these teachers implement ESRU in their classrooms, focusing only on theorized steps of ESRU neglects specific aspects not predicted by the ESRU model.

More specific to the secondary science setting, Tomanek et al. (2008) investigated several factors that explained the reasoning used to select tasks for students in science courses. This directly relates to the “goal” step of DDI as it dictates the alignment between the goals and the task. The major finding of this study was a detailed framework that delineates how teachers (experienced and preservice) choose a specific assessment task. While these two studies from Ruiz-Primo and Furtak (2007) and Tomanek et al. (2008) add valuable understanding to teachers' assessment process, much more is left to be investigated.

Research questions

Although assessment also serves the very important purposes of evaluating and providing feedback to students, the focus of our research is how teachers use evaluation of students to inform their teaching. The research questions are aimed at closing a gap in the knowledge base about specific ways in which chemistry teachers inform their teaching with assessment results:

1. How do high school chemistry teachers guide their teaching via the design, implementation, and analysis of classroom assessments?

a. What goals do teachers hope to achieve throughout the assessment process in general and in one specific assessment?

b. What sources of evidence do teachers consider when determining the effect of their teaching and/or evaluating learning?

c. What types of conclusions do teachers make about students and about themselves as a result of analyzing student assessment data?

d. What kind of actions do teachers take to address the results of assessments?

Methods

Sample

A total of 19 (12 female, 7 male) high school chemistry teachers from 10 different states participated in an interview conducted either in person or via Skype. The sampling method used is difficult to label definitively, but had aspects of criterion and random purposeful sampling techniques (Miles and Huberman, 1994). The list of teachers invited to participate in interviews was compiled by selecting states from the low (approx. 140), middle (approx. 150), and high (approx. 160) performance categories on the science portion of the National Assessment of Educational Progress standardized tests. Using a random number generator, 3 states from each category were chosen, followed by 10 high schools in each state (found by randomizing a list of each state's counties available online and then searching for schools within those counties), and finally 1 high school chemistry teacher from each school was selected using school web directories. In addition to the 90 teachers chosen this way, more were added from acquaintances of the authors of this paper for a total of 126 teachers invited (15% response rate). Science Teacher Efficacy Beliefs Instrument (STEBI, Bleicher, 2004) score distributions and years of teaching experience are available in Appendix A (ESI). We purposefully did not collect any information about previous professional development programs on chemistry-specific DDI for two reasons. First, we assumed that few professional development programs (and no chemistry-specific ones) directly address DDI, and it is even less likely that our sampled teachers had gained skills or knowledge from such a program (and our results support this). Second, we took great pains to discourage teachers from responding based on “what they thought the interviewer wanted to hear” rather than “what they actually do in their classes,” which could be prompted by even a rudimentary description of DDI.
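As a rough illustration only (not the authors' actual procedure or script), the sketch below shows how the stratified random state selection described above could be carried out in Python; the performance bands are taken from the text, but the state names and county list are placeholders, and the school and teacher lookups were done by hand via online directories rather than in code.

```python
# Illustrative sketch of the stratified random selection described above.
# All state and county names are placeholders, not the study's actual lists.
import random

random.seed(0)  # seed shown only so the sketch is reproducible

naep_strata = {                      # states grouped by NAEP science score band
    "low (~140)":    ["State A", "State B", "State C", "State D"],
    "middle (~150)": ["State E", "State F", "State G", "State H"],
    "high (~160)":   ["State I", "State J", "State K", "State L"],
}

# Randomly choose 3 states per performance band (9 states total).
chosen_states = {band: random.sample(states, 3)
                 for band, states in naep_strata.items()}

def counties_to_search(counties):
    """Randomize a state's county list; schools were then sought county by
    county (by hand, via online directories) until 10 schools were found."""
    shuffled = list(counties)
    random.shuffle(shuffled)
    return shuffled

print(chosen_states)
print(counties_to_search(["County 1", "County 2", "County 3", "County 4"]))
```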

Interviews

After informed consent was collected (the research was fully compliant with IRB requirements), each of the 19 teachers participated in a semi-structured, three-phase interview that lasted between 30 and 90 minutes. Prior to the interview, each teacher provided a homework assignment, quiz, laboratory experiment, or in-class activity (collectively referred to as an assessment) that had already been administered, collected, and evaluated by the teacher. This assessment was sent a day before the interview so that the interviewer could customize questions for specific items on the assessment provided. In Phase 1 of the interviews, teachers were asked about the definitions and purposes of assessment in general and in their own classrooms. For each assessment as a whole (Phase 2) and for two to four individual items on the assessment (Phase 3), teachers were asked (1) what the goal(s) of the assessment/item was (were), (2) what conclusion(s) could be made about those goals, (3) how the teacher determined these conclusions, (4) what the teacher would do in light of the conclusions, (5) how the material was originally taught, and (6) any questions specific to the content area assessed by the item or assessment. For the complete interview protocol with an example assessment, see Appendix B (ESI).

Coding

All interviews were transcribed verbatim and all coding was carried out in NVivo 9.2 software (QSR International, 2011). Analysis of the data went through multiple iterations within each of five methodological steps: (1) data were open coded (as described by Creswell, 2007) into 13 major categories; (2) horizonalization was used to develop clusters of meaning (Moustakas, 1994), which were transformed into specific codes; (3) data were coded according to these codes; (4) individual codes and associated references were organized and coded into the four main aspects of the DDI framework (closed coding); and (5) the final organization of the codes was created by aligning the clusters of meaning hierarchically under the four main steps of DDI. See Appendix C (ESI) for an illustration of this entire process.

Reliability

Two separate stages of inter-rater reliability were conducted to determine the reliability of the coding performed. In both stages, Krippendorff's alpha (Krippendorff, 2004a) was used. Unlike percent agreement, Krippendorff's alpha corrects for chance agreement among raters. Although kappa statistics also correct for chance agreement (Cohen, 1960; Fleiss, 1971), Krippendorff's alpha calculates chance agreement from the proportions of individual codes applied globally rather than from each rater's proportions of agreements and disagreements. This attenuates the differences that individual raters bring to coding and emphasizes the reliable use of codes given any rater (Krippendorff, 2004b). Also, kappa statistics have been shown to suffer from a base rate problem (Uebersax, 1987), which is to say that the value of kappa depends on the marginal proportions (base rates) of the categories coded. In the first stage, approximately one third of each of two different interviews was coded for the four main steps of DDI (goals, evidence, conclusions, actions) by three chemistry education researchers plus the first author of this paper (α = 0.889). In the second stage, one full transcript was coded by the same four researchers in accordance with the full list of codes (81 total, α = 0.787). In addition, three researchers applied these 81 codes to three other full transcripts (one transcript each, with the first author coding all three); the three pairwise alpha values were 0.848, 0.874, and 0.729. Recommended cutoffs for Krippendorff's alpha range from 0.667 to 0.800 (Krippendorff, 2004b). These results indicate that the codes were reliably applied throughout the data. Nonetheless, the 81 individual codes were revised as needed based on this process and the entire data set was recoded according to these revisions.
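For readers unfamiliar with the statistic, the sketch below illustrates how Krippendorff's alpha for nominal codes can be computed from a coincidence matrix (Krippendorff, 2004a). It is a minimal illustration, not the analysis script used in this study, and the two raters' codes at the bottom are invented for the example; in practice, dedicated software or Krippendorff's published routines would typically be used.

```python
# Minimal sketch of Krippendorff's alpha for nominal data (illustrative only;
# not the analysis script used in this study).
import numpy as np

def krippendorff_alpha_nominal(ratings):
    """ratings: one row per rater, one column per coded unit (None = missing).
    Units with fewer than two ratings are dropped, as in the standard method."""
    # Regroup values by unit, discarding missing ratings and lone ratings.
    units = [[r for r in unit if r is not None] for unit in zip(*ratings)]
    units = [u for u in units if len(u) >= 2]

    categories = sorted({r for u in units for r in u})
    idx = {c: i for i, c in enumerate(categories)}
    k = len(categories)

    # Coincidence matrix: each ordered pair of values within a unit, weighted
    # by 1/(m_u - 1) so a unit with m_u ratings contributes m_u values.
    o = np.zeros((k, k))
    for u in units:
        m = len(u)
        for i, a in enumerate(u):
            for j, b in enumerate(u):
                if i != j:
                    o[idx[a], idx[b]] += 1.0 / (m - 1)

    n_c = o.sum(axis=1)          # marginal frequency of each category
    n = n_c.sum()                # total number of pairable values

    # Nominal metric: any mismatch counts equally as a disagreement.
    d_observed = o.sum() - np.trace(o)
    d_expected = (n ** 2 - (n_c ** 2).sum()) / (n - 1)
    return 1.0 - d_observed / d_expected

# Invented example: two raters assigning ten interview excerpts to DDI steps.
rater_1 = ["goal", "evidence", "evidence", "conclusion", "action",
           "goal", "conclusion", "evidence", "action", "goal"]
rater_2 = ["goal", "evidence", "conclusion", "conclusion", "action",
           "goal", "conclusion", "evidence", "action", "evidence"]
print(round(krippendorff_alpha_nominal([rater_1, rater_2]), 3))  # ≈ 0.745
```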

Results and discussion

Results will be presented in terms of the DDI steps, since this framework informed the interview protocol and analysis. Within each DDI step, major themes and representative quotations are given. To more efficiently present the results in the narrative, the number of teachers (out of 19) coded for a particular idea is shown in boldface, followed in parentheses by the total number of times that idea was coded across all interviews (NVivo terms “sources” and “references,” respectively). Fig. 1 shows a graphical representation of the ideas coded along with the frequencies with which they manifested in the data. Appendix D (ESI) contains a list of all codes along with their descriptions.
Fig. 1 Coding organization chart.

Goals

Given the importance of delineating a goal for each item on any assessment, it is surprising that many teachers did not clearly articulate goals when asked (see Fig. 1). Rather than stating objectives of the kind those in teacher education might be accustomed to seeing (i.e. “students will be able to…”), many teachers listed the knowledge, equations, and concepts required to solve the item or used colloquial terms like “just see if they get it” to describe their goals for an assessment or assessment item. Anecdotally, it seemed as if the teachers were unaccustomed to describing goals for specific assignments despite the heavy emphasis on lesson planning in high schools. Just because teachers did not delineate their goals clearly does not mean that they do not have well-defined goals for their assessments. However, it does indicate that they may not be consciously thinking about the specific reason for assessing students on a regular basis. Fig. 2 shows an example problem provided by Mandisa:
Fig. 2 Example of one of Mandisa's assessment items with goals.

Interviewer : So what were the goals for this specific item?

Mandisa : Number one, they had to remember that delta H was the sum of the products minus the sum of the reactants and they had to multiply those delta H values by the coefficients, they forget that… Then they had to understand how to take that data, and plug it into the Gibbs free energy… they don't look at units. That delta S was in joules, not kilojoules…

The goals given by the teacher represent the steps that students must complete to arrive at the correct solution. Analysis against this goal, therefore, will only ever tell whether students are or are not following the correct steps to reach the right answer, a conclusion that does not readily inform instruction or the degree to which students have learned chemistry concepts.
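For context, the algorithmic steps Mandisa lists correspond to the standard thermodynamic relations sketched below; her actual reaction and tabulated values are not reproduced here, and the unit issue she raises arises because standard entropies are typically tabulated in J K⁻¹ mol⁻¹ while enthalpies are given in kJ mol⁻¹.

```latex
% Relations underlying the item (illustrative; Mandisa's actual reaction and
% data are not shown).
\Delta H^{\circ}_{\mathrm{rxn}}
   = \sum_{\text{products}} n \,\Delta H^{\circ}_{f}
   - \sum_{\text{reactants}} n \,\Delta H^{\circ}_{f}
\qquad
\Delta G^{\circ} = \Delta H^{\circ} - T\,\Delta S^{\circ}
\quad \text{(with } \Delta S^{\circ} \text{ converted from J to kJ before combining)}
```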

In Phase 1 of the interviews (which did not refer to any specific assessment), 11 (13) teachers expressed at least one goal that was directly tied to their instruction. Examples of this were teachers who wished to use results of assessments to “know if what I'm doing is working” (Adie), “make the teaching better” (Chris), or “explain this better the next time” (Mark). However, in Phases 2 and 3 of the interview, which probed the goals of specific assessments, only one teacher explicitly stated that she designed and used her specific assessment to inform her instruction:

Alyssa : But almost every day I give, it's kind of a video quiz, an assessment of what they saw last night… Sometimes that informs me as a teacher, did they pick up what I wanted them to get out of that video. If not then it's my job to fill the gaps… am I doing what I need to do to support them?

None of the teachers stated instructional improvement as a goal for an individual assessment item, but just because this goal was not explicitly stated does not mean the results did not inform instruction. However, other evidence presented suggests that these teachers spent little time thinking about their instruction in light of assessment results.

While no specific instructional goals were reported by the teachers, two types of broad instructional goals can be seen in Fig. 1. First, 9 (14) teachers expressed interest in using assessments to determine how to “make the teaching better” (Chris) or “see what areas I need to improve” (Mandisa). Second, Mark and 9 (12) others discussed using their assessments to decide whether to continue with the curriculum or go back and reteach in some way. Although these goals reflect a growth mindset, their lack of specificity means they do not point toward any particular conclusion or action. Being able to conclude only one or two broad things about teaching led to extremely vague suggested actions (detailed in later sections). For example, when Adie wanted to use assessments to “know if what [she's] doing is working,” there are many inquiries that naturally follow: How did you teach it originally? How have students responded to that teaching style previously? How does what is measured (student understanding) translate to what you wish to conclude (effectiveness of teaching)? How do you measure student understanding of the chemistry phenomenon? Yet, these questions were not discussed by Adie or any of the other teachers who had similar instructional goals. In the data, setting general instructional goals led to broad and oftentimes not very helpful actions.

Not surprisingly, every teacher at some point defined goals for their specific assessment and assessment items pertaining to their students' understanding of certain chemistry and math concepts. This is consistent with a DDI model because, in order to make decisions about instruction, evaluations of student-level aspects must be made. However, just because a goal is about chemistry content does not mean that a meaningful interpretation can be gained from results. Surprisingly, 2 (2) teachers described their content-oriented goals in a manner that neither leads to a clearly assessable outcome nor yields valuable information, exemplified by Chris and Jess:

Chris : So the goal is basically to expose them to every type of chemical reaction that the CollegeBoard could possibly ask them and to try to give them a format exactly as they're going to see it.

Jess : Um, and so that was kind of the intent. So to give them practice with a variety of the problems and then in kind of a structured way.

“Exposing” and “giving” are verbs associated with teachers' actions that require no student feedback whatsoever. These teachers may not literally mean that their only goal is to expose content or give practice. However, their elaboration provides evidence that these are teacher-centric goals because the teachers do not discuss anything related to students' learning when making conclusions. Because such goals do not depend on student performance, the results cannot directly inform teaching and learning, thus bringing the process of DDI to a halt.

Finally, 10 (18) teachers' goals for items and assessments revolved around the idea of “understanding” content. Unfortunately, the word “understand” appears to have many different meanings to many different teachers, as discussed below.

Evidence

Chronologically, after teachers design and implement an assessment, evidence must be considered in order to make conclusions about teaching and learning. To make these conclusions, teachers used both evidence that can be measured with a high degree of certainty (calculated scores, attendance records, etc.) and evidence that is measured with a lesser degree of certainty (effort, confidence, paying attention, etc.). We refer to these as suppositional (lesser certainty) and non-suppositional (greater certainty) evidence. The primary source of non-suppositional evidence was the percent correct on an item or whole assessment. By this logic, teachers concluded that because a high percentage of the class got an item correct, the students must understand the content. The problem with this logic is that it relies heavily on the assumptions that (1) the assessment is designed to appropriately elicit chemistry understanding; and (2) the responses from students represent those students' mental models. Neither researchers nor teachers can guarantee that responses to assessment items absolutely reflect students' mental models (Pellegrino, 2012). According to Fig. 1, this over-simplified view of assessment was observed in 16 (40) of the teachers – the three teachers who did not exhibit this code either never collected their assessment or awarded completion points and never calculated percent correct. Bart provides an example of this code:

Interviewer : …do the assessment results seem to support or refute this goal, that you've met this goal, or your students have?

Bart : It seems to support it.

Interviewer : And then why would you believe that?

Bart : Because the vast majority of the students got most of that problem right if not completely right.

Beyond simple descriptive statistics of scores on an assessment, the only non-suppositional measures of evidence used by teachers were the current students' performance on future (13 (20)) and past (5 (6)) assessments with similar content and the performance of previous classes on the same or similar assessments (9 (12)). That is, teachers would draw on assessments already given or look ahead to assessments not yet given to students:

Matthew : … are the kids really paying attention or do they not know, and I typically find that out on the final assessment for that unit.

Nichole : Our final in this unit, one of my classes the average was 89% and the other was 80%, and last year it was down in the lower 70's. And I'll be honest with you, these kids struggle more than the kids from last year. Just, they're not as bright, and they're not as motivated…

In the two examples, both teachers use performance on a final exam as a source of evidence to make their conclusions, but Nichole also compares current students' performance to previous years' performance (a second source of evidence). Teachers also used the same sources of evidence in very different ways. For instance, consider how Natalie used the previous performance of her students compared to Britt:

Natalie : [The students] definitely have met [my goals]. I did have a wide range from A's to F's, but knowing the student's background, of course the students that made the A's are the ones I anticipate getting extremely high scores on the national exams, those that made F's are thinking they are getting out of the class by transfer anyways so they have not been trying, so I did not expect my two F's to do well…

Interviewer : …was [conclusion that students could not construct particulate drawings from scratch] based on what you saw in this assessment?

Britt : No, just previous knowledge of because I've done this, I've used these particle drawings before so my expectation is ya know I, from previous assessments, I know that if I were, would have thrown something more complicated on there, I ask them to um, draw something that we haven't seen yet, they might not be able to do that… I think it's uh, ya know they've seen this before, but they've never had to draw them themselves…

Natalie assumes that if students performed well in the past they should perform well in the future, and vice versa for those who did not perform well. While arguments can be made for the validity of this claim, its use as evidence in data analysis rests on faulty logic, as a wide variety of factors that affect student performance on her assessment were not taken into account by Natalie. In contrast, Britt not only realizes that her assessment (which requires students to recognize features of particulate drawings but does not require them to draw them) cannot answer the question posed by the interviewer, but she also relies on the performance of previous classes and notes that students' lack of familiarity with this type of problem may limit the conclusions she can make about her students' understanding of particulate images.

In contrast, suppositional sources of evidence were used abundantly throughout all portions of the interviews. Comparisons using non-suppositional evidence generally led to more thoughtful conclusions in the sample of teachers, whereas the use of more suppositional measures led to greater uncertainty in conclusions. Of these suppositional sources, the most prevalent were observations from classroom discussions (13 (22)), teacher experience or instinct (9 (14)), and the level to which students were paying attention during classes (6 (8)). Teachers also used evidence that has both suppositional and non-suppositional characteristics. Examples of these sources of evidence were the previous math and chemistry courses of students and what those imply (7 (8)) and the motivation of students (3 (4)). In this category of codes, special attention is given to teachers who made conclusions about students based on how much practice the teacher perceived the students had completed. This idea can be seen in Fig. 1 as a goal (purpose of assessment or item was to give students practice, 4 (8)), a source of evidence (conclusions made were based on amount of practice, 10 (16)), a conclusion (students did not get enough practice, 9 (17)), and an action (give students more practice opportunities, 12 (18)). Many times, it seemed as though the teachers were entirely focused on students getting practice with solving a particular type of problem rather than on anything to do with understanding the chemistry concepts:

Alyssa : Yeah, I think that for the most part, um, I think that students were able to do exactly what I had asked them to do because they had that practice in advance.

Britt : Um, but I'm happy with where we're at so far because they're going to be getting more practice on it.

Bart : …they usually from enough practice can tell me what the limiting reagent is, they can differentiate from the limiting and actual and find the percent yield…

The use of suppositional evidence in making instructional decisions is as unavoidable as it is encouraged (Hamilton et al., 2009). However, when these teachers used more suppositional sources of evidence, it was often in isolation from, as opposed to in tandem with, non-suppositional evidence. Making decisions in this way can be detrimental to the process of DDI.

Conclusions

By far, codes that represent conclusions that teachers made about students were the most abundant throughout the interviews. Looking at the blue regions in Fig. 1, it is clear that student-centered conclusions (any conclusions about students or learning) grossly outnumber teacher-centered conclusions (any conclusions about teachers or teaching). This suggests that no matter how much the teachers set out to use the results of their assessments for instructional modification, few actually concluded anything about their instruction, and instead made many declarative statements about their students or their students' learning. Not all questions can be appropriately used to inform instruction, but the high number of conclusions that the teachers made about students that were unrelated to their understanding of chemistry content was of interest. These conclusions included making affective judgments and determining student characteristics:

Michael : Yeah, because as I was saying these are capable students and highly motivated, they're very math oriented… these are students who have a very high math capability and so once they understand the methods here, they can just go run by themselves after that.

Britt : …ya know it really wasn't anything new as far as the actual content or discussion but that they actually went and took some measurements, just engaged them more so that they, um, they understood it better.

The two examples from Britt and Michael above could just as easily be displayed in the evidence section of this paper, as the teachers made conclusions to be used as evidence of understanding and learning. Michael informs his instructional decision (deciding that he will “let them run by themselves”) by stating that his students are capable, motivated, and math-oriented. Similar to the previous discussion, these may not be the best sources of evidence on which to base what a teacher should do next, as they could lead to false-positive (teacher believes instruction promoted learning when it did not) or false-negative (teacher believes instruction did not promote learning when it did) results. Beyond representing evidence, these and other quotes like them reveal what teachers were aiming to conclude with their assessments. Conclusions about aspects of students not tied to the chemistry content were comparable in abundance to conclusions about aspects that were tied to the content. This finding is very interesting because it demonstrates the weight teachers gave to evidence that can only be obtained by looking outside of the responses to the assessment and outside of considerations from the chemistry content. It was difficult to know whether the teachers were determining these characteristics based on the results of the assessment or simply based on their experiences as teachers. Regardless, these conclusions regarding students came up during interview prompts that specifically asked teachers to consider what information could be gleaned from the results, indicating that ideas not drawn directly from assessment results were a large part of the teachers' data analysis process.

Conclusions tied to the chemistry content are integral to the DDI process, as they pinpoint the specific content-area weaknesses of students, which in turn helps identify the pedagogical strategy used so that it, too, may be evaluated. As alluded to previously, a majority of the teachers mentioned conclusions revolving around the idea of student understanding (10 (18)). As an example, Mark used the root “understand” and its various forms 37 times in a 64 minute interview (not counting colloquial phrases such as “get it” and “on the same page”). Below is a variety of ways in which teachers concluded that their students “understood” something:

Adie : … if they can represent the situation in more than one way, it shows me that they actually understand what's going on… they could participate in discussion, which tells me that at least they understood enough to discuss it.

Laura : …I don't think they're really understanding the difference between that that does not represent necessarily two molecules of hydrogen and two molecules of oxygen and then making the mole thing…

Nichole : …we just were not getting any clear understanding, what was the difference? What were the differences and what were the similarities of the four models [of bonding], we weren't getting those.

Amy : …they did pretty successfully, they understood the question, they understood the concept, they understood the math, and like I said that's what it's all about.

Here, a range of meanings of “understand” can be observed. To Adie, understanding means that students can represent phenomena in more than one way, with an emphasis on being able to talk about it, although she does not specify how the students talk about it, only that they do. Laura differs by equating understanding with demonstrating rote knowledge (of atomic versus molecular hydrogen and oxygen). Nichole likens understanding to being able to identify differences and similarities in different models of bonding. Lastly, Amy provides scarce detail on what it means to understand something, stating only that students do or do not understand it. The ambiguity of “understanding” something is just as detrimental to conclusions as it is to goals: without the specification dictated by detailed models of chemistry phenomena, the teachers gathered very little useful information from assessments.

Actions

In contrast to conclusions, actions that teachers would take in light of assessment results were scarce. This is most likely a result of not having specific instructional goals and conclusions, because any prescribed action needs to address the conclusions made from assessment data. Similar to conclusions, teachers' proposed actions were often vague and marked with indecision:

Adie : Well, if they can't do those things then I need to go back and present the material again or do it in a different way.

Laura : So obviously they didn't get the concept, so I need to reteach it somehow, so if it doesn't work by just lecturing and putting an example on the board and doing a couple of them with it, I'll have to come up with a manipulative something…

Adie and Laura both desire to change their instruction, but do not detail what changes would be made or how the content deficiencies would be addressed. These were both coded as ambiguous actions (13 (36)) because it is unclear what the teachers will actually do. In addition, 10 (14) teachers responded with multiple options for instructional adjustments, a sort of laundry list of suggestions, but never committed to one even after follow-up questions trying to pinpoint which action(s) the teacher deemed necessary. While coming up with multiple options for how to adjust instruction is a good thing, teachers must eventually decide on one or more actions, or else no changes will be made. Because so few actions were brought up, the data corpus offers limited findings on how teachers decide on a course of action.

Another noteworthy action exhibited by the teachers was the notion of “reteaching” a concept. While many more teachers used the words “reteach” or “re-cover,” only 7 (10) gave evidence to suggest they would physically do or say the same exact things to address a content deficiency. Alternatively, many teachers implied that they would actually teach with a different pedagogy even though they used the word “reteach,” and were thus coded as ambiguous as described above. As an example of the former group of teachers, Michael stated that he originally taught stoichiometry to his students using an analogy of turkey sandwiches and bicycles. When the assessment yielded less-than-expected results from his students, Michael's response indicated that he would do exactly as he did previously:

Interviewer: …what exactly are you going to do or what have you done to address the mole to mole ratio?

Michael: Well I go back to bicycles and turkey sandwiches. How many seats do you need for each bike, so that's a one to one ratio. How many wheels do you need for each bike, two, that's a two to one ratio, so now let's come back to the chemical substances and it's the same methodology.

Our hypothesis to explain this finding is that these teachers adhere to the belief that students need to hear things multiple times before being able to understand the content and/or meet the teacher's goal(s). We posit this because all but one of the teachers who said they would reteach exactly as originally taught also placed heavy emphasis on practice inside and outside of class, indicating the value they place on repetition in student work.

Exemplary DDI

Throughout this project, it was never our intention to criticize high school teachers or to actively search for things they were doing “wrong.” So far, we have purposefully shown places where teachers' DDI practices limit their ability to inform their instruction so that future research can focus on interventions in these areas. Despite the previous results, we are happy to report a sample of the promising aspects seen throughout the data. For example, despite a heavy emphasis on mathematical assessments and items, 9 (17) teachers noted that just because students were able to complete algorithmic problems did not mean that they understood the concepts behind them:

Laura: I consider chemistry and application so I feel like, hey, doing math problems is applying it but again, everybody can plug and chug if they know how to do it, but do they understand why they're doing it, that's, I've been asking a lot of why questions lately, which I think is helping.

Quotes like the one above from Laura indicate at least partial alignment between goals and items on assessments because the teachers demonstrate that they interpret results knowing that correct mathematical answers do not necessarily indicate (mis)understanding of chemistry concepts.

Additionally, 5 (7) teachers specifically sought feedback from their students by use of metacognitive questions beyond “what are you struggling with?” When teachers invest in collecting more information from their students, they are gathering more data than just the results of the assessment and are better able to inform appropriate instructional modifications. Lastly, a fair number (approximately half, although this was never specifically coded) of the sample of teachers mentioned writing notes for the next time they would implement an assessment or activity. This common practice aligns well with DDI as the teacher uses data to drive what changes (if any) should be made to instruction.

Conclusions

In response to specific research questions 1a–1d, major themes were identified under each subheading above. The high school chemistry teachers in this study set goals that were not very conducive to instructional improvement, largely because of their ambiguous nature, as determined by analyzing the chemistry content the teachers stated as goals. In evaluating these goals, teachers often used suppositional or non-suppositional evidence, but not in conjunction with one another, thus missing opportunities to cross-reference the sources of evidence upon which they drew conclusions. As a result, many of the conclusions about student learning were based on potentially faulty evidence, compromising the integrity of the DDI process. In addition, conclusions regarding the teachers themselves as effective educators were not reported in the same quantity as the goals set for instructional improvement. This led to an even smaller number of tangible, specific actions that teachers said they would take to improve their instructional effectiveness.

Our study also shows several overall themes associated with how teachers enact the DDI process (response to broad research question 1). First, in every step teachers demonstrated a lack of elaboration when responding to interview prompts. Teachers may be oversimplifying the processes of learning and measuring learning, which are both extremely complex, by reducing complex chemical phenomena to algorithms for which understanding can be determined by simple correct/incorrect dichotomies. Secondly, in several of the specific assessment items that teachers provided for the interviews, the DDI process was stunted by dichotomous goals (e.g. “see if the students got it or not”) or by teachers' dismissal of assessment results because of affective aspects. Lastly, based on the results discussed, the sample of high school chemistry teachers in this study demonstrated limited DDI practice. We label it limited because the teachers' implementation of DDI constrains what information they could draw from their assessment results and what pedagogical strategies could be employed to support effective pedagogy and improve upon weaker pedagogy.

Limitations

To be transparent about our work, we discuss some limitations of our study. First, we asked several questions in a format that implies a yes/no answer so that teachers would be prompted to commit to explicitly stated conclusions rather than supplying responses that leave it unclear what the teacher actually thinks. This encouraged shorter responses to our prompts and, as a result, could have increased the prevalence of codes that targeted dichotomous responses. To mitigate this, the interviewer always asked unscripted follow-up questions to elicit more detail, and codes were given more stringent descriptions so that what was coded was a result of what the teachers expressed rather than a product of the prompt format. Another limitation was the variety of teacher-provided assessment content. To rule out differences and find similarities in DDI, we originally limited the content to a few topics, but ended up with nine chemistry topics throughout the interviews due to participant availability. While this provides breadth, we recognize that the DDI process may look different depending on the topic being assessed because pedagogical strategies and types of assessment may also change with these topics. This limitation is grounded in the theory that content knowledge interacts with pedagogical knowledge to engender pedagogical content knowledge (Shulman, 1987). In response to this issue, we sought information specific to the topic and later generalized the findings into themes that cut across multiple topics, such as conceptual vs. mathematical understanding. Lastly, the data used in this project were self-reported by the teachers who participated, and the findings are not based on observational evidence. We see this less as a limitation and more as a reflection of our focus on what teachers were thinking during assessment interpretation – an aim that observational data would not be able to validly measure.

Implications

This study generated many implications for chemistry teachers, professional development providers, and chemistry education researchers. The implications for teachers and professional development providers are organized by the steps of DDI, followed by the implications for researchers.

Goals

When teachers set goals for assessments, caution should be used when trying to assess students' “understanding” of a certain topic. As seen in our study, many vague goals of understanding did not lead to fruitful conclusions regarding the quality of learning or instruction. A great way for teachers to avoid this is to take extra time to ask “what does it mean to understand this topic and what evidence would show this?” and then ensure that assessment items can actually elicit such evidence. High quality assessments available to teachers such as Page Keeley's assessment probe books from NSTA Press (Keeley and Harrington, 2010), concept inventories (a sample of inventories in redox, Brandriet and Bretz, 2014; bonding, Luxford and Bretz, 2014; general chemistry, Mulford and Robinson, 2002), and the American Association for the Advancement of Science's Assessment web site (http://assessment.aaas.org/pages/home) can alleviate the burden on teachers to generate items so that teachers can focus more on aligning assessments with specific learning goals. Secondly, determining how effective a lesson was in encouraging learning should be a goal of specific assessments and items, not just a general concern, as many teachers treated it. Teachers who actively consider how their teaching has impacted learning, which is at least to a degree displayed in student responses, will have a heightened ability to reflect on their practice (for reflection theory, Schon, 1983; Cowen, 1998; for an example implementation in science education, Danielowich, 2007). We wish to encourage teachers to make specific conclusions tied to their instruction; without this reflection, very few changes to teaching will occur, as was the case in our sample.

Evidence

The most pressing issue, observed at least once from every teacher (except the three who did not collect their students' assessments), is the unquestioned assumption that student performance on assessments unequivocally correlates with understanding of the content. At a minimum, the use of scores and/or sub-scores on assessments needs to be considered in light of the design of the assessment and alongside other sources of evidence. For example, a teacher should consider more than a class average score to determine student understanding and should gather evidence beyond scores, such as similar assessments given previously and/or students' questions during work time and discussions. At the other extreme, some teachers overly favored suppositional evidence. While professional experience certainly has a place within DDI (Suskie, 2004), judgments of the affective characteristics of students do not come without bias, and caution is warranted when making such judgments. We suggest that teachers have at least 2–3 sources of evidence when making an affective judgment. For example, considering in conjunction a large absence/tardy record, multiple conversations revealing apathy, and unsuccessful attempts to get a certain group of students to participate strongly suggests a lack of motivation, where any one of these sources of evidence alone would only weakly suggest motivational issues. In many cases, use of only one source of evidence (student performance data) will not enable the teacher to determine whether a goal is fully met. The obvious solution is to collect more evidence, but this practice was rarely mentioned by the teachers. Scaffolding problems is one way to access more information while providing feedback to students throughout the learning process (see Chapter 5: scaffolding learning in chemistry, in Taber, 2002).

Conclusions

As observed in the data, without conclusions about teaching, few teaching actions can be specified. Therefore, we reiterate the importance of teachers considering the impact that pedagogical strategies have on learning as measured by assessments. Also, whether students' results are “good” or “bad,” teachers should attempt to specify the why behind what was observed. Teachers can ask clarifying questions such as “what was it specifically about this lesson that worked so well in helping them learn?” or pose specific hypotheses such as “I think that showing a guided example of the whole stoichiometry process followed by an explanation of each step did not work as well as first explaining each step and then showing the whole process.” In these examples, ideas directly tied to teaching are evaluated with the assessment, as opposed to solely assessing student learning. Also, when student learning is being assessed, we suggest (as we did in the goals section) avoiding conclusions such as “the students understand this content.” Teachers should clarify both what chemistry the students specifically “understand” and what it means to “understand” that specific chemistry content.

Actions

Consistent with our theme of calling for detail, we find it important that chemistry teachers choose a specific action for a specific purpose. If a teacher suspects that one type of activity did not promote learning, it is better to uncover (or at least hypothesize) a reason why it did not than to try something different simply because it is different. Targeting a reason for less-than-expected performance directly informs the choice of a new pedagogical strategy that addresses why the original strategy was less effective.

Research

In addition to the above implications for teachers, this study also has implications for researchers studying high school chemistry teachers and instruction. Because the teachers in our sample spoke ambiguously about their assessment processes, other studies investigating DDI or general assessment characteristics may find a similar lack of specificity. As we have reported, this does not mean that teachers neglect to think about their assessment practices, but a carefully constructed data solicitation method is required to prompt teachers to elaborate on them. We found that discussing assessment practices in situ with teachers' chemistry-specific curriculum, assessments, and instruction worked well and would serve future inquiries. This outcome emphasizes the necessity of studying assessment in a disciplinary context.

Future work

While this study investigates the DDI process of 19 high school chemistry teachers, its findings do not necessarily translate to the majority of high school chemistry teachers. Future studies should focus on characterizing which of these traits exist in a representative sample of high school chemistry teachers to see how widespread these themes are. After characterization of a large sample, professional development and pre-service educational programs should be developed to address limitations in the process found in such larger scale studies. Also, our qualitative study focused on the bigger picture of the entire DDI process with details at each step. Additionally, investigations into the alignment between stated goals and assessment items, teachers' processes while developing assessments, and the comparison of assessment beliefs to assessment practices would be very valuable for both researchers and practitioners.

Acknowledgements

We would like to thank the teachers that participated in this study as well as the Yezierski and Bretz Research Groups at Miami University. Specifically, we thank Justin Carmel, Sara Nielsen, and Sarah Erhart for their assistance in qualitative coding for this project.

References

  1. American Educational Research Association, American Psychological Association, National Council on Measurement in Education and Joint Committee on Standards for Educational and Psychological Testing (U.S.), (1999), Standards for educational and psychological testing, Washington, DC: American Educational Research Association.
  2. Bennett R. E., (2011), Formative assessment: a critical review, Assess. Educ.: Princ., Pol., Pract., 18(1), 5–25.
  3. Black P. and Wiliam D., (1998), Inside the black box: Raising standards through classroom assessment, Phi Delta Kappan, 80(2), 139–144, 146–148.
  4. Bleicher R., (2004), Revisiting the STEBI-B: Measuring Self-Efficacy in Preservice Elementary Teachers, Sch. Sci. Math., 104(8), 383–391.
  5. Brandriet A. R. and Bretz S. L., (2014), The development of the Redox Concept Inventory as a measure of students' symbolic and particulate redox understandings and confidence, J. Chem. Educ., 91, 1132–1144.
  6. Brunner C., Fasca C., Heinze J., Honey M., Light D., Mandinach E., et al., (2005), Linking data and learning: The Grow Network study, Journal of Education for Students Placed at Risk, 10(3), 241–267.
  7. Calfee R. C. and Masuda W. V., (1997), Classroom assessment as inquiry, in Phye G. D. (ed.), Handbook of classroom assessment. Learning, adjustment, and achievement, San Diego: Academic Press.
  8. Coffey J. E., Hammer D., Levin D. and Grant T., (2011), The missing disciplinary substance of formative assessment, J. Res. Sci. Teach., 48(10), 1109–1136.
  9. Cohen J., (1960), A coefficient of agreement for nominal scales, Educ. Psychol. Meas., 20, 37–46.
  10. Cowen J., (1998), On Becoming an Innovative University Teacher, Buckingham: Open University Press.
  11. Creswell J. W., (2007), Qualitative inquiry and research design: Choosing among five traditions, 2nd edn, Thousand Oaks, CA: Sage.
  12. Danielowich R., (2007), Negotiating the conflicts: Reexamining the structure and function of reflection in science teacher learning, Sci. Educ., 91(4), 629–663.
  13. Datnow A., Park V. and Wohlstetter P., (2007), Achieving with data: How high-performing school systems use data to improve instruction for elementary students, Los Angeles, CA: University of Southern California, Center on Educational Governance.
  14. Fleiss J. L., (1971), Measuring nominal scale agreement among many raters, Psychol. Bull., 76(5), 378–382.
  15. Gallagher L., Means B. and Padilla C., (2008), Teachers' use of student data systems to improve instruction, 2005 to 2007, U.S. Department of Education, Office of Planning, Evaluation and Policy Development, Policy and Program Studies Service.
  16. Halverson R., Prichett R. B. and Watson J. G., (2007), Formative feedback systems and the new instructional leadership, Madison, WI: University of Wisconsin.
  17. Hamilton L., Halverson R., Jackson S., Mandinach E., Supovitz J. and Wayman J., (2009), Using student achievement data to support instructional decision making (NCEE 2009-4067), Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
  18. Harshman J. and Yezierski E., (in press), Assessment data-driven inquiry: A review of how to use assessment results to inform chemistry teaching, Sci. Educ., Summer, 2015.
  19. Keeley P. and Harrington R., (2010), Uncovering Student Ideas in Physical Science, Washington, DC: National Science Teachers Association Press, vol. 1.
  20. Krippendorff K., (2004a), Content Analysis: An Introduction to its Methodology, 2nd edn, Thousand Oaks: Sage.
  21. Krippendorff K., (2004b), Reliability in content analysis: some common misconceptions and recommendations, Hum. Commun. Res., 30(3), 411–433.
  22. Linn R. L., Baker E. L. and Dunbar S. B., (1991), Complex, performance-based assessment: Expectations and validation criteria, Educ. Res., 20(8), 15–21.
  23. Luxford C. J. and Bretz S. L., (2014), Development of the Bonding Representations Concept Inventory to Identify Student Misconceptions about Covalent and Ionic Bonding Representations, J. Chem. Educ., 91, 312–320.
  24. McMillan J. H., (2011), Classroom assessment: principles and practice for effective standards-based instruction, 5th edn, Pearson.
  25. Means B., Chen E., DeBarger A. and Padilla C., (2011), Teachers' ability to use data to inform instruction: challenges and supports, Office of Planning, Evaluation and Policy Development, U.S. Department of Education.
  26. Means B., Padilla C. and Gallagher L., (2010), Use of education data at the local level: From accountability to instructional improvement, Office of Planning, Evaluation and Policy Development, U.S. Department of Education.
  27. Miles M. B. and Huberman A. M., (1994), Qualitative Data Analysis: An Expanded Sourcebook, Thousand Oaks, CA: Sage.
  28. Moustakas C., (1994), Phenomenological Research Methods, Thousand Oaks, CA: Sage.
  29. Mulford D. and Robinson W., (2002), An inventory for alternate conceptions among first-semester general chemistry students, J. Chem. Educ., 79, 739–744.
  30. National Research Council, (2001), Knowing what students know: The science and design of educational assessment, Committee on the Foundations of Assessment, in Pellegrino J., Chudowsky N. and Glaser R. (ed.), Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education, Washington, DC: National Academy Press.
  31. Pellegrino J., (2012), Assessment of science learning: living in interesting times, J. Res. Sci. Teach., 49(6), 831–841.
  32. Popham W. J., (2002), Classroom Assessment: What Teachers Need to Know, 3rd edn, Allyn and Bacon.
  33. QSR International (2011), NVivo qualitative data analysis software. Version 9.2.
  34. Ruiz-Primo M. A. and Furtak E. M., (2007), Exploring teachers' informal formative assessment practices and students' understanding in the context of scientific inquiry, J. Res. Sci. Teach., 44(1), 57–84.
  35. Sandlin B., Harshman J. and Yezierski E., (2014), Formative assessment in high school chemistry teaching: investigating the alignment of teachers' goals with their items, J. Chem. Educ. Res., in preparation.
  36. Schon D. A., (1983), The Reflective Practitioner: How Professionals Think in Action, USA: Basic Books.
  37. Schwartz P. and Barbera J., (2014), Evaluating the Content and Response Process Validity of Data from the Chemistry Concepts Inventory, J. Chem. Educ., 91(5), 630–640.
  38. Shulman L. S., (1987), Knowledge and teaching: Foundations of the new reform, Harvard Educ. Rev., 57(1), 1–22.
  39. Suskie L., (2004), What is assessment? Why assess? In Assessing student learning: A common sense guide, San Francisco: Jossey-Bass Anker Series, pp. 3–17.
  40. Suskie L., (2009), Using assessment results to inform teaching practice and promote lasting learning, in Joughin G. (ed.), Assessment, Learning, and Judgement in Higher Education, Springer Science.
  41. Taber K., (2002), Chemical Misconceptions – Prevention, Diagnoses, and Cure Volume I: Theoretical Background, Piccadilly, London: Royal Society of Chemistry, pp. 67–84.
  42. Tomanek D., Talanquer V. and Novodvorsky I., (2008), What do science teachers consider when selecting formative assessment tasks?, J. Res. Sci. Teach., 45(10), 1113–1130.
  43. Uebersax J. S., (1987), Diversity of decision-making models and the measurement of interrater agreement, Psychol. Bull., 101(1), 140–146.
  44. Wiliam D., (2014), Formative assessment and contingency in the regulation of learning processes, paper presented at Annual Meeting of American Educational Research Association, Philadelphia, PA.
  45. Witte R. H., (2012), Classroom assessment for teachers, McGraw-Hill.

Footnote

Electronic supplementary information (ESI) available. See DOI: 10.1039/c4rp00188e

This journal is © The Royal Society of Chemistry 2015