Molly A. Undersander a, Travis J. Lund *b, Laurie S. Langdon *c and Marilyne Stains *a
aDepartment of Chemistry, University of Nebraska-Lincoln, Lincoln, USA. E-mail: mstains2@unl.edu
bDepartment of Natural Sciences, Oregon Institute of Technology, USA. E-mail: Travis.Lund@oit.edu
cSchool of Education, University of Colorado, Boulder, USA. E-mail: laurie.langdon@colorado.edu
First published on 27th September 2016
The design of assessment tools is critical to accurately evaluate students' understanding of chemistry. Although extensive research has been conducted on various aspects of assessment tool design, few studies in chemistry have focused on the impact of the order in which questions are presented to students on the measurement of students' understanding and students' performance. This potential impact has been labeled the question order effect in other literature and may be considered as a threat to the construct validity of the assessment tool. The set of studies described in this article tested whether question order effects were present within a concept inventory on acid-base chemistry. In particular, we tested whether the order of two conceptually isomorphic questions, one pictorial and one verbal, affected students' performance on the concept inventory. Two different versions of the inventory were developed and administered to students enrolled in the second semester of first-year university chemistry courses (general chemistry; N = 774) at two different institutions and to students enrolled in the first semester of organic chemistry (N = 163) at one of the two institutions. Students were further divided into two groups based on their self-reported level of effort in answering the concept inventory. Interviews were also conducted with a total of 19 students at various stages of the studies. Analyses of differences in students' responses to the two versions of the inventory revealed no question order effect in any setting. Implications for instructors and researchers are provided.
In developing the concept inventory, efforts were made to ask several questions related to each sub-concept of interest. One pair of questions, and the focus of this study, asked students to consider relationships among strength, concentration, and pH of acidic solutions. As shown in Fig. 1, one question (labelled P, pictorial) required students to interpret molecular-level representations of acidic solutions, and one question was based on text (verbal, V).
Mid-semester cognitive interviews, in which students thought aloud as they worked through the entire 20-item instrument, revealed that some students seemed to approach the molecular-level “picture questions” differently than other questions, leading investigators to wonder whether encountering the molecular-level picture question (P) early in the inventory might help students with the verbal question (V) later in the inventory, or vice versa. Thus two slightly different versions of the instrument were administered at the end of the semester, in which the pictorial and verbal items related to acid strength/concentration/pH were reordered to probe a possible question order effect. Initial analyses of the post-assessment indicated that students performed better on the verbal question when it was ordered after the pictorial question. Due to some limitations with how the post-assessment was administered, we repeated the study at a different institution with an online version of the instrument. Analyses of results from both institutions will be presented below. The overall goals of this study are to examine whether altering the order of pictorial and verbal items related to acid concentration/strength/pH relationships affects student performance on those particular items, and if so, to also determine the extent to which these effects are consistent across institutional and instrument administration contexts.
It is useful to note that an assessment or test is never itself “validated.” Rather, validation studies are conducted to provide evidence for interpreting the meaning of test scores, which themselves are a function of the items, respondents, and contexts in which the assessment is given (Messick, 1995). Threats to validity also need to be considered. For instance, construct-irrelevant variance may be an important factor when items become more difficult or easier for some individuals or groups based on features that are irrelevant to the construct under measure (Messick, 1995). This is of interest in the current study as we investigate whether features of either a verbal-based question or pictorial question help or hinder respondents in subsequent related items, depending on the order they encounter them. If one question order produces an advantageous outcome over the other, that has implications for instrument design and score interpretations, especially in cases where item order is varied across administrations.
Leary and Dorans (1985) created a summary of the results of major studies on question order effect completed to date. The results were inconclusive as to whether or not a question order effect existed when questions are ordered from difficult to easy. However, they did not find a question order effect based on content order (i.e., sequenced versus randomly ordered tests). Overall, they found as many studies with significant test results as studies with insignificant results. They concluded that no firm conclusion could be reached as to the conditions under which the question order effect exists. Table 1 includes several additional studies on question order effect either completed since the summary by Leary and Dorans or not included in their summary. The results corroborate the conclusions made in Leary and Dorans that while some studies show significance for different factors, there is considerable disagreement in whether the question order effect exists.
| Item order | Investigator | Test conditions^a | Significance of results |
|---|---|---|---|
| Sequenced/content-ordered vs. Random content | Balch (1989) | Power | s |
| | Gohmann and Spector (1992) | Power | ns |
| | Neely et al. (1994) | Power | ns |
| | Pettijohn and Sacco (2007) | Power | ns |
| | Tal et al. (2008) | Power | ns |
| Variations in placement of a section of a questionnaire | Bradburn and Mason (1964) | Power | ns |
| Sequenced/content-ordered and Easy to Hard vs. Random content and Random difficulty | Carlson and Ostrosky (1992) | Power | s |
| Easy to Hard vs. Hard to Easy | Coniam (1993) | Power | ns |
| | Hodson (1984) | Speed | ns |

^a A power test assesses a student's ability with no regard for how long it takes to complete the test; an ideal power test would give students all the time necessary for them to finish the entire test. A speeded test ideally would contain homogeneously simple tasks so that if students were given an unlimited amount of time, they should be able to get 100%, but because of the time limit, it is testing the students' processing speed. Most tests, such as the SAT, ACT, and GRE, fall somewhere in between these two and are commonly categorized as “timed power tests,” because not all speed tests can have items that are “trivially easy” and most power tests are restricted to some kind of time constraint (Mead and Drasgow, 1993).
Researchers have also been concerned about how question order could affect a test's reliability, validity, and difficulty, as well as students' motivation and post-exam evaluation (e.g., Monk and Stallings, 1970; Hodson, 1984; Balch, 1989; Carlson and Ostrosky, 1992; Coniam, 1993; Pettijohn and Sacco, 2007). Pettijohn and Sacco (2007) and Balch (1989) found no significant differences in the time it took students to complete different versions of their exams. Monk and Stallings (1970) and Carlson and Ostrosky (1992) found that question order did not seem to affect question validity or exam reliability.
Studies conducted to date on the question order effect thus provide inconclusive results. However, these studies had limitations regarding their testing methods and participants which render their comparisons difficult. For instance, although random assignment of participants to different versions of the test is preferred, certain studies could only “randomize” the test population by handing out different testing packets as students walked through the test centre door, or by pre-assigning test packets “randomly” following alphabetical order of the class roster (Plake, 1980; Balch, 1989). Participants' demographics also differed: some studies have been conducted with high school students (Mollenkopf, 1950), others with undergraduate students (Gohmann and Spector, 1989), and still others with adults in the workforce (Bradburn and Mason, 1964). Perhaps two of the most critical differences limiting the comparability of these studies are in the physical testing conditions and the subject being tested. Until tests began to be administered digitally, these studies conducted their tests on paper, and as such the researchers could not explicitly control whether students were answering questions in the order expected, and thus were not strictly testing the question order effect (Dean, 1973). In addition, these studies cover a wide range of topics including psychology (Balch, 1989; Neely et al., 1994), math (Mollenkopf, 1950; Leary and Dorans, 1985), verbal skills (Mollenkopf, 1950; Coniam, 1993), geography (Monk and Stallings, 1970), psychiatric nursing (Plake, 1980), physics (Gray et al., 2002), general social science (Crano, 1977), job related interviews (Bradburn and Mason, 1964), business and economics (Dean, 1973; Gohmann and Spector, 1989; Carlson and Ostrosky, 1992) and relatively few in chemistry (Hodson, 1984).
As Bradburn and Mason stated: “it is impossible to generalize with any degree of confidence to other situations…the effect of a particular question or topic on later questions can only be determined empirically within the context of a particular questionnaire” (Bradburn and Mason, 1964, p. 61).
In summary, the literature across several disciplinary domains is inconclusive on whether or not a question order effect actually exists; many types of ordering and other factors have been tested, yet the empirical evidence shows mixed results. It is thus not yet clear whether researchers are introducing unwanted variability to their data if they utilize alternate versions of a concept inventory.
• The visuals must be relevant and not distracting to the text.
• The content of the visuals is more important than color, simplicity, or realism.
• The point of using visuals is to supplement, not replace text.
It is especially important to keep these guidelines in mind in the design of assessment tools because “the main risk of including images in the context of examining is that an image may lead to the formations of a mental representation of a question that does not match the meaning intended by the question setter” (Phillips et al., 2010, p. 141). Indeed, studies have found that students do not all use the same visuals in the same way (Angeli and Valanides, 2004; Crisp and Sweiry, 2006). For example, Duran and Balta (2014) found that having visuals within test questions had a significant effect on student scores for students who did not already excel at science, but no effect for students who already did well in science. In chemistry, studies have also found that students struggle in their analysis of visual questions (Nurrenbern and Pickering, 1987; Sanger and Phelps, 2007), in part because of the limitations associated with static representations (e.g., the velocities of atoms and molecules are typically not represented even though this information may be critical to selecting the correct answer) (Sanger and Phelps, 2007). Duran and Balta (2014) thus suggested that it is not necessarily always better to have a visual for every test question but that some questions would be more effective with just text.
The balance between visuals and text has been an important topic in the assessment literature. Studies by Holliday (1975), Kapıcı and Savaşcı–Açıkalın (2015), Mayer (1989), Mayer and Anderson (1991), Mayer et al. (1996), and Phillips et al. (2010) have all concluded that pictures would actually be almost useless without an appropriate, small amount of text to accompany them in the form of captions or supporting text. Of course, the text must also be relevant to the visual to have any impact (Mayer and Anderson, 1991). In other words, with either too much or too little accompanying text or instructions, students tend to undervalue and ignore the visuals, in which case a “good picture” can actually fail to serve its purpose (Weidenmann, 1989, p. 163).
One of the most widely used theories to explain the cognitive processing of visuals and text is dual coding theory (Paivio, 1990; Clark and Paivio, 1991; Paivio, 2013). This theory states that we are cognitively capable of encoding both visual and verbal forms of information. Verbal information is only coded verbally, while imagery can be coded using both verbal and visual encoding. The dual coding theory suggests that using both representational and referential processing can aid in recall and recognition of learned information (Mayer and Anderson, 1991). Many researchers use dual coding theory to explain the benefits of using visuals in education (Winn, 1987; Weidenmann, 1989; Clark and Paivio, 1991; Mayer and Anderson, 1991). However, some are more skeptical. For example, Schnotz and Bannert (2003) claimed that dual coding theory is not sufficient because it does not take into account that students can encode visuals incorrectly, and therefore the visuals can have a negative effect on the learning. They concluded that it was not appropriate to assume that pictures have a generally beneficial effect on learning. In fact, Schnotz (2002) suggested that when presenting pictorial and verbal test questions, it may be better to present the visuals first because they require less working memory space, and then the verbal portion should follow.
In summary, the literature largely agrees that the judicious use of visuals in test questions is beneficial, particularly in science education. The goal of the current study is to test for a question order effect within the realm of verbal versus pictorial questions.
Following the pre-test, researchers conducted think-aloud interviews with seven students on the entire 20-item instrument. In this version, students encountered pictorial questions regarding acid concentration/strength/pH relationships early and responded to the verbal question on the same concept later. Upon noticing that some students approached the pictorial question differently than the related verbal question during the interview, the researchers created two versions of the instrument to administer at the end of the semester. One version (labelled PV for purposes of this study) was identical to the pre-test, with pictorial question (P) as Question 9, and the verbal question (labelled V) as Question 18. The second version (labelled VP for purposes of this study) switched the placement of these questions. The first 8 questions remained the same across both versions and were followed by either P or V, depending on the inventory version. The remaining questions were kept in the same order between the two versions.
Post-test versions were administered in the last week of the semester to students based on their regular recitation section; the PV version was given to students who attended their weekly recitation sections on Monday (five sections) and Tuesday (nine sections), while the VP version was given to students in recitation sections on Wednesday (five sections), Thursday (eleven sections), and Friday (two sections). Students were allotted 3 points out of 1000 total course points for completing the inventory, regardless of their score. Of 640 students in the class, a total of 553 post-tests were completed.
All Graduate Teaching Assistants were instructed on how to administer the inventory to their recitation sections. During the weekly lab/recitation preparation meeting, the researcher emphasized the importance of collecting good data, meaning that all students needed to be encouraged to give their best effort while also knowing that their score would not affect their grades in any way. Students were allowed to take as long as they needed to complete the instrument, and most students finished within 25 minutes of the 50-minute recitation period.
Qualitative data were collected through 19 student interviews conducted after the collection of the inventories. Students received the same version of the concept inventory that they took online. Participants were selected based on how they answered P and V. If students took version PV, they were contacted if they answered P correctly and V correctly or incorrectly. If students took version VP, they were contacted if they answered V correctly and P correctly or incorrectly. Each interview consisted of two parts. First, students were asked to think aloud as they solved questions P and V. This type of interview was chosen as it is one of the most effective strategies to capture students' thinking processes while they perform a task (Ericsson and Simon, 1980). Second, students were engaged in a semi-structured interview, in which they were probed about their preference of whether P or V was presented first, whether or not the initial question was helpful in answering the following question, and whether a change in the order of the two questions would have helped them answer the second question. Each interview transcript was read and annotated by the first author. The first and last authors independently classified each transcript based on the interviewee's preference for seeing question V or P first and coded the reasons for their choices. Upon comparison of codes, few inconsistencies were found, and these were resolved through discussion (Saldaña, 2015).
| Student population | Question | PV version (%) | VP version (%) | p value | Φ | 1 − β |
|---|---|---|---|---|---|---|
| GCII post-instruction, moderate/high effort (PV n = 205; VP n = 291) | P | 46.3 | 57.0 | 0.024 | 0.106 | 0.446 |
| | V | 38.5 | 30.6 | 0.081 | 0.083 | 0.260 |
| GCII post-instruction, high effort (PV n = 165; VP n = 235) | P | 47.9 | 60.4 | 0.017 | 0.124 | 0.498 |
| | V | 40.0 | 34.5 | 0.306 | 0.057 | 0.090 |
| Student population | Question | PV version (%) | VP version (%) | p value | Φ | 1 − β |
|---|---|---|---|---|---|---|
| GCII post-instruction, moderate/high effort (PV n = 101; VP n = 101) | P | 39.6 | 49.5 | 0.157 | 0.100 | 0.144 |
| | V | 43.6 | 53.5 | 0.159 | 0.099 | 0.141 |
| GCII post-instruction, high effort (PV n = 70; VP n = 61) | P | 47.1 | 57.4 | 0.242 | 0.102 | 0.094 |
| | V | 38.6 | 57.4 | 0.032 | 0.188 | 0.370 |
| Student population | Question | PV version (%) | VP version (%) | p value | Φ | 1 − β |
|---|---|---|---|---|---|---|
| OCI post-instruction, moderate/high effort (PV n = 82; VP n = 81) | P | 29.3 | 30.9 | 0.824 | 0.017 | 0.015 |
| | V | 35.4 | 38.3 | 0.701 | 0.030 | 0.020 |
| OCI post-instruction, high effort (PV n = 39; VP n = 40) | P | 35.9 | 40.0 | 0.707 | 0.042 | 0.019 |
| | V | 38.5 | 42.5 | 0.715 | 0.041 | 0.019 |
The Midwestern data were cleaned by removing the scores of participants who self-reported using resources even though they were instructed not to at the beginning of the online concept inventory. No strict data cleaning could be done based on the amount of time spent taking the concept inventory online because only a start and stop time were included in the derived Qualtrics data. If the two time stamps indicated an unreasonably short time between opening the inventory and submission, these data were excluded, although this applied to very few participants. Because of the nature of the online medium, we could not control how long students left the inventory open on their computer. Technically, students could have left the inventory open on their computer for the whole week while working on it a little bit at a time, making the Midwestern data collection resemble an almost ideal power test. The Western university's data collection more closely resembles a timed power test, since the questions are not trivially easy but the inventory time was limited to the written lab quiz time (Mead and Drasgow, 1993). We were not concerned about the difference in mediums because Mead and Drasgow (1993) found that, for power tests (not speeded), online versus paper-and-pencil medium did not affect participants' performance.
Concept inventory data were analysed using the statistics software SPSS. This software was used to compute a 2 × 2 contingency test based on whether the students' answer choices were correct or incorrect. We applied the Bonferroni correction to minimize Type I error; this led to a threshold level of significance of 0.013 for each population. In addition, SPSS was used to calculate t-tests for the total inventory scores and for the students' total scores on the first 8 questions of the inventory.
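The 2 × 2 contingency test and the Bonferroni threshold can be illustrated with a short Python sketch. This is not the SPSS procedure itself; it is a minimal standard-library reconstruction under stated assumptions: the counts of correct answers are back-calculated from the reported percentages for question P in the GCII moderate/high effort group (PV n = 205, 46.3%; VP n = 291, 57.0%), the p value uses the Yates continuity correction, Φ uses the uncorrected chi-square, and the correction divides α = 0.05 by four comparisons. The function and variable names are hypothetical.

```python
from math import erfc, sqrt


def two_by_two_test(correct_a, n_a, correct_b, n_b):
    """Chi-square test on a 2 x 2 table (test version x correct/incorrect).

    Returns the continuity-corrected chi-square, its p value (df = 1),
    and the phi effect size computed from the uncorrected chi-square.
    """
    a, b = correct_a, n_a - correct_a   # version A: correct, incorrect
    c, d = correct_b, n_b - correct_b   # version B: correct, incorrect
    n = n_a + n_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    cross = a * d - b * c
    chi2 = n * cross ** 2 / denom
    # Yates continuity correction (assumption, see the lead-in)
    chi2_yates = n * max(abs(cross) - n / 2, 0) ** 2 / denom
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2_yates / 2))
    phi = sqrt(chi2 / n)
    return chi2_yates, p_value, phi


ALPHA = 0.05
N_COMPARISONS = 4  # assumption: 2 questions x 2 effort groups per population
threshold = ALPHA / N_COMPARISONS  # 0.0125, reported as 0.013

# Counts reconstructed from the reported percentages (assumption)
correct_pv = round(0.463 * 205)
correct_vp = round(0.570 * 291)

chi2_yates, p_value, phi = two_by_two_test(correct_pv, 205, correct_vp, 291)
print(f"p = {p_value:.3f}, phi = {phi:.3f}, significant: {p_value < threshold}")
```

Under these assumptions the sketch gives values close to the tabulated p value and Φ for this comparison, and the p value stays above the corrected threshold.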
Since the first 8 questions of the inventory were always presented in an identical order on all versions of the inventory, they functioned as a set of control questions that enabled the student performance on the first portion of the instrument to be compared universally. Student performance on these initial control questions was compared across all inventory sections, and no statistically significant differences were observed (see Appendix).
We found no statistically significant differences for any of the comparisons at either institution, indicating an absence of the question order effect in our populations. After no differences were found, we calculated the statistical power for each test using the observed effect sizes (shown in Table 3). These results indicate relatively low probabilities (between 0.015 and 0.498) that the sample sizes used in this study would be large enough to detect the observed effect sizes even if such effects existed (Type II error rate between 0.502 and 0.985). Therefore, we identified two trends in our data set that warrant further investigation with larger sample sizes. First, at both institutions, GCII students who took version VP, in which they saw the pictorial question P after the verbal question V, performed approximately 10 percentage points better on P than the students who took version PV, in which they saw P prior to V. Second, at the Midwestern university, the high effort GCII student population performed almost 20 percentage points better on the verbal question when they saw it first (inventory version VP) than when they saw it second (inventory version PV). This trend was also observed in the moderate/high effort group. This suggests that students who take the inventory more seriously may be more affected by the order of questions. All these trends should be investigated further since currently they can only be interpreted as noise.
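The post hoc power values can be approximated with the following Python sketch. It is an illustration rather than the software actually used for the reported values, and it rests on three assumptions: the effect size is the rounded Φ from the results tables, the significance level is the Bonferroni-corrected 0.05/4, and the test has one degree of freedom, in which case the noncentral chi-square power reduces to a two-sided normal calculation. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi2_power_df1(effect_size, n_total, alpha):
    """Power of a one-degree-of-freedom chi-square test.

    With df = 1 the test statistic is a squared normal deviate, so the
    power is the probability that a normal variable with mean
    effect_size * sqrt(n_total) falls beyond either critical value.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_total)
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


alpha = 0.05 / 4  # assumed Bonferroni-corrected significance level

# Question P, GCII moderate/high effort (phi = 0.106, N = 205 + 291)
power = chi2_power_df1(0.106, 205 + 291, alpha)
type_ii_rate = 1 - power
print(f"power = {power:.3f}, Type II error rate = {type_ii_rate:.3f}")
```

For this comparison the sketch lands near the tabulated 1 − β. Small differences for other rows are to be expected because the tabulated Φ values are rounded to three decimals.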
| Interview question | Choice provided by interviewee | GCII (n = 7), PV | GCII (n = 7), VP | OCI (n = 12), PV | OCI (n = 12), VP | Example of quote providing justification for choice |
|---|---|---|---|---|---|---|
| Did you prefer seeing the [verbal/pictorial] question first? Did it matter to you? | Pictorial | 2 | 0 | 1 | 0 | “I thought it [P1] kind of helped to visualize the dissociatedness because you can tell the stronger ones and they tell you here that's undissociated and you know it's a weak acid. So yeah that kind of helped me to see [P1] first.” |
| | Verbal | 0 | 1 | 0 | 3 | “Yeah because [V] was kind of like a definition almost and that kind of thing and [the diagram] was kind of more applied so it built off of it… [having V first] made me more sure of my answers.” |
| | No preference | 2 | 0 | 4 | 4 | “…they kind of work in a package where like no matter which order you put them in they all kind of influence the other one the ones that follow…” |
| | N/A | 1 | 1 | 0 | 0 | |
| As you moved to each successive question, were you thinking about previous questions to help you, or were they just separated in your mind? | Yes | 2 | 1 | 1 | 3 | “I did think that [V] influenced my answer because I, if I wasn’t 100% certain on the behavior of the strong and weak acid, I leaned back on my answer for [V] to answer [P], so by choosing an answer here in [V], I carried that information forward to [P].” |
| | No | 2 | 1 | 4 | 4 | “I just kind of went through them. I didn’t really think about the other questions. I guess that's just kind of how I take tests.” |
| | Subconscious | 1 | 1 | 1 | 1 | “I guess subconsciously it did [influence my answer], but like I wasn’t aware of it.” |
| | N/A | 1 | 0 | 0 | 0 | |
“Yeah because [V] was kind of like a definition almost and that kind of thing and [the diagram] was kind of more applied so it built off of it…[having V first] made me more sure of my answers.”
Three out of the seven students preferred seeing the pictorial question (P) first because it helped them “visualize what was going on with the strong and weak acids.” Interestingly, the benefits advanced by these students for these preferences did not always materialize. Indeed, three out of the seven students who claimed that seeing one question before the other helped them answer the second question provided an incorrect answer to this latter question.
Several reasons were advanced by the 11 students who did not use the first question to answer the second question. Five indicated dealing with questions independently of each other when taking any kind of tests. Each question is thus treated by these students in isolation. Four believed that there may have been some kind of subconscious effect having seen similar questions previously but did not consciously use the first question. Three indicated that they felt they knew the concept well enough that they could answer the questions regardless of their order. Another three students felt that P and V worked together as a package and that it was helpful to see them next to each other regardless of which question came first.
From a practical perspective, the lack of ordering effects observed in this study indicates to instructors that generation of alternate test versions for use in the classroom should not favour a particular group of students. However, the implications are more cautionary for educational researchers using concept inventories as a research tool. This study indicates that specific questions on an inventory may exhibit an ordering effect and that this effect may be more prevalent among some types of students than others. Researchers planning to run question-level psychometric analyses on concept inventory responses should use particular care in generating differently-ordered inventory versions. At times, question order effects may disrupt analyses of validity, reliability, or difficulty of particular questions.
Some studies suggest that factors other than question order (e.g., answer choice order, item difficulty) can have as much or more impact on student performance; for instance, having several complex questions in a row could cause cognitive overload as the student proceeds to successive questions (Tellinghuisen and Sulikowski, 2008; Schroeder et al., 2012). There is thus a critical need to further study which factors matter most when developing assessment tools intended to measure students' learning as accurately as possible.
First, this study investigated the question order effect among second semester general chemistry and first semester organic chemistry students in the United States. It is possible that students enrolled in lower-level undergraduate chemistry courses did not show any question order effect with pictorial and verbal questions, but upper-level students, students from other science disciplines, or students from other countries might experience this effect. We are currently conducting a similar study with students enrolled in lower-level geoscience courses to test whether a disciplinary effect exists.
Second, students from the two different institutions took the concept inventories under different conditions and incentives. For example, the students at the Western university were required to take the inventory in person and on paper during recitation and were offered a certain number of points towards their grade for taking the inventory. On the other hand, the Midwestern students took the inventory online under voluntary circumstances and were offered extra points toward their laboratory or lecture grade at the instructor's discretion. These differences in testing conditions could have affected how the students performed overall on the inventory. We did attempt to control for these differences in implementation by separating out students who indicated providing a high effort in answering the inventory from those indicating providing moderate effort.
Third, the sample size of the Midwestern university study was not as large as the one from the Western university, which limited our ability to identify statistically significant results and resulted in low statistical power. The lower response rate at the Midwestern university may be due to two factors. First, the voluntary nature of the study at the Midwestern school could have dissuaded students from investing extra time and effort if they did not feel that they needed the extra credit; in addition, data had to be cleaned extensively based on self-reported effort levels and usage of external resources. Second, since names were attached to the inventory collected at the Western university and the collection of the inventory was conducted in class, students may have felt more compelled to take the task seriously.
| Course | Student effort level | Average score on the first eight questions, PV version | Average score on the first eight questions, VP version | p-value | Eta-squared | 1 − β |
|---|---|---|---|---|---|---|
| GCII post | Moderate/high effort | 3.91 | 4.19 | 0.298 | 0.0054 | 0.013 |
| | High effort | 4.17 | 4.34 | 0.588 | 0.0023 | 0.013 |
| OCI post | Moderate/high effort | 3.66 | 3.74 | 0.787 | 0.0005 | 0.013 |
| | High effort | 4.05 | 4.35 | 0.512 | 0.0056 | 0.013 |
| Course | Student effort level | Average score on the first eight questions, PV version | Average score on the first eight questions, VP version | p-value | Eta-squared | 1 − β |
|---|---|---|---|---|---|---|
| GCII post | Moderate/high effort | 5.17 | 5.24 | 0.610 | 0.0231 | 0.026 |
| | High effort | 5.19 | 5.43 | 0.156 | 0.0703 | 0.140 |
| Course | Student effort level | Average total score on the concept inventory (/18)^a, PV version | Average total score on the concept inventory (/18)^a, VP version | p-value | Eta-squared | 1 − β |
|---|---|---|---|---|---|---|
| GCII post | Moderate/high effort | 7.50 | 8.05 | 0.196 | 0.0083 | 0.013 |
| | High effort | 7.93 | 8.43 | 0.332 | 0.0073 | 0.013 |
| OCI post | Moderate/high effort | 6.90 | 7.12 | 0.652 | 0.0013 | 0.013 |
| | High effort | 7.74 | 8.38 | 0.405 | 0.0090 | 0.013 |

^a Two questions are not included in the calculations of the total score because they were not distributed to all courses because of a programming error in Qualtrics.
| Course | Student effort level | Average total score on the concept inventory (/20), PV version | Average total score on the concept inventory (/20), VP version | p-value | Eta-squared | 1 − β |
|---|---|---|---|---|---|---|
| GCII post | Moderate/high effort | 10.49 | 9.90 | 0.0574 | 0.0860 | 0.285 |
| | High effort | 10.62 | 10.35 | 0.4318 | 0.0399 | 0.046 |
This journal is © The Royal Society of Chemistry 2017