Open Access Article
Alexander Krause *a, Sebastian Polarz b, Anett Hoppe cd, Ralph Ewerth cde and Andreas Nehring a
a Institute of Science Education, Leibniz Universität Hannover, Hannover, Germany. E-mail: krause@idn.uni-hannover.de
b Institute of Inorganic Chemistry, Leibniz Universität Hannover, Hannover, Germany
c TIB – Leibniz Information Centre for Science and Technology, Hannover, Germany
d Department of Mathematics and Computer Science, Philipps-Universität Marburg & Hessian Center for Artificial Intelligence (hessian.AI), Marburg, Germany
e L3S Research Center, Leibniz Universität Hannover, Hannover, Germany
First published on 17th September 2025
Stoichiometry is a significant yet challenging topic in chemistry education. While extensive research has explored students’ conceptions, difficulties, and learning approaches, this study adopts a competency-based approach to introduce a new model defining three competency levels in stoichiometry. The stoichiometry competency level model (StoiCoLe model) offers a framework for evaluating students’ performance in algorithmic stoichiometry. To test the assumptions of the StoiCoLe model, a 40-item test was developed to measure and categorise students’ competencies according to the model's levels. Using data from 289 students enrolled in an introductory chemistry course across three semesters, psychometric properties and model assumptions were analysed through Rasch analysis and item processing times. The results indicate sufficient psychometric reliability in the categorisation of students according to the StoiCoLe model. However, both item difficulty and processing times are only partially consistent with the assumptions of the model and suggest that the model should be adapted. In line with prior studies, the majority of students exhibited lower competency levels. These findings are discussed in terms of how the competency-based approach can enhance relevant competencies and contribute to the literature on chemistry education in stoichiometry.
Stamovlasis et al. (2004, 2005) demonstrated that the competencies to solve algorithmic and conceptual tasks are two independent dimensions: the mathematical steps required to solve algorithmic problems are not acquired solely by understanding the concepts, nor is a conceptual understanding acquired by solving algorithmic problems (Nurrenbern and Pickering, 1987; Sawrey, 1990; Mason et al., 1997; Cracolice et al., 2008).
Nevertheless, the use of mathematics is essential in achieving accurate stoichiometric outcomes when planning reactions or interpreting laboratory results. In the context of industrial processes, precise stoichiometric calculations can be of paramount importance. When large quantities of reactants are used and large amounts of energy are released, conceptual understanding alone is not sufficient, as deficits in algorithmic skills can lead to catastrophic consequences. Moreover, the competency to solve stoichiometric calculations constitutes the basis for solving algorithmic tasks in advanced areas such as acid–base reactions, kinetics, and chemical equilibrium. In particular, the transfer of algorithmic skills from familiar stoichiometric examples to novel quantitative challenges is critical in contemporary chemical research.
Consider, as an example of a complementary research method, thermogravimetric analysis (TGA), which is used to monitor mass changes of a material sample under a controlled temperature programme and atmosphere (Saadatkhah et al., 2020). The mass loss data provide insights into the composition and decomposition pathways of the sample, which may otherwise remain largely unknown. However, accurate interpretation of a TGA curve requires a high level of stoichiometric competency. Thus, both algorithmic and conceptual tasks are essential, and proficient learners must be capable of solving both types effectively (Bodner, 1987; Frank et al., 1987; Smith et al., 2010).
Stoichiometry has been a focal point in chemistry education research for over forty years. A significant emphasis of past research has been on identifying specific learning difficulties in stoichiometry and providing explanations for these challenges (Dierks, 1981; Schmidt, 1990; Huddle and Pillay, 1996; BouJaoude and Barakat, 2000; Arasasingham et al., 2004; BouJaoude et al., 2004; Agung and Schwartz, 2007; Dahsah and Coll, 2008; Gulacar and Fynewever, 2010; Scott, 2012; Gulacar et al., 2013a, 2013b; Shadreck and Enunuwe, 2018). Building on this foundation, various methods and approaches have been developed to support learners in stoichiometry learning processes. These approaches range from visual representations, such as flow diagrams or matrices (Tyndall, 1975; Kauffman, 1976; Berger, 1985; Cameron, 1985; Poole, 1986; Koch, 1995; Krieger, 1997; Olmsted, 1999; Ault, 2001; Wagner, 2001), visualisations and hands-on experience through models (Kashmar, 1997; Witzel, 2002; Molnar and Molnar-Hamvas, 2011; Ramesh et al., 2020) and analogies from everyday life (Umland, 1984; Cain, 1986; Haim, 2003) to experiments (Arlotto, 1974; Figueira et al., 1988; Martínez and Ibanez, 2020), mathematical methods (Garst, 1974; Tykodi, 1987; Mousavi, 2019; Carlson, 2022), problem-solving strategies (Schmidt, 1997; Schmidt and Jignéus, 2003; Hand et al., 2007; Okanlawon, 2008; Rosenberg et al., 2016; Gulacar et al., 2022) and new, especially digital, teaching formats (Cotes and Cotuá, 2014; Gayeta, 2017; Nufus et al., 2020; Rasmawan, 2022).
In this paper, we contribute to the existing literature by applying a competency-based approach to the field of stoichiometry. We propose and test a model that delineates various competency levels. Drawing on the framework of algorithmic questions from Smith et al. (2010), the framework of sub-problems from Gulacar and Fynewever (2010) and the framework of the multidimensional analysis system from Dori and Hameiri (2003), our model is based on content knowledge of the central concepts of algorithmic stoichiometry. It outlines the competency levels of learners and potential trajectories for their learning progression, particularly concerning complex, multi-step algorithmic tasks. This model offers a comprehensive perspective on the abilities students should develop and demonstrate as they advance in their competency in algorithmic stoichiometry. However, this means that the model does not allow any conclusions to be drawn about students' conceptual understanding of stoichiometry, as this would require items that capture concepts, such as free text explanations, multiple choice questions or drawings. In our approach, we measure the competency in algorithmic stoichiometry that relates to the outcome and the process-related application of the sub-competencies. The model therefore represents a significant synthesis of existing approaches and an essential advancement. Specifically, it refines certain sub-problems identified by Gulacar and Fynewever (2010) and structures them based on the framework of the multidimensional analysis system by Dori and Hameiri (2003). Our objective is to facilitate the development of effective instructional strategies and learning methods, as indicated above, through the precise delineation of (sub-)competencies. This endeavor also aims to address the research gap identified by Blömeke et al. (2015) concerning the definition and assessment of competencies in higher education for the particular topic of chemistry.
By modelling competency levels, our aim is to provide a framework for summarising and assessing the learning level and progress. This model facilitates the evaluation of competencies both at individual student level and across entire groups. Given the robust psychometric test results obtained in this study, the StoiCoLe model presents a promising avenue for assessing individual student competencies in the future.
Overall, the StoiCoLe model could contribute to a clear depiction of the next steps in competency development and thus assist in determining the types of support learners require to advance to higher competency levels, particularly in the algorithmic aspects of stoichiometry.
The competency classification in the dimension “interconnectedness” is based on the increasing interconnectedness of knowledge elements that are required for solving problems. As the complexity of stoichiometric problems increases, so does the need for interconnected knowledge elements to solve them. This increased number of elements is associated with increased mental demand, which contributes to an increase in task difficulty (Niaz, 1988, 1989).
This dimension is divided into two sub-dimensions (Commons et al., 1998; Vorholzer and von Aufschnaiter, 2020). The sub-dimension “number of elements to be connected” reflects that an increase in the required content – such as facts, relationships or concepts – leads to enhanced complexity (Kauertz et al., 2010; Vorholzer and von Aufschnaiter, 2020). The sub-dimension “connection character” attributes the increase in complexity to a change in the relationship between the elements. For instance, a shift from isolated facts to interconnected relationships represents a change in character, thereby increasing complexity. Furthermore, the way in which the elements are connected – whether unconnected, monocausal or interdependent – also represents a change in character (Vorholzer and von Aufschnaiter, 2020). Calculating the amount of substance, masses, volumes or other important properties of substances requires connected and interdependent thinking.
The second dimension “cognitive processes” describes the cognitive processes required (Schecker and Parchmann, 2006; Kauertz et al., 2010; Vorholzer and von Aufschnaiter, 2020). Drawing on the sequential processes of information processing defined by Weinstein and Mayer (1986), Kauertz et al. (2010) describe four cognitive processes of increasing complexity: reproducing, selecting, organising and integrating. This aligns well with the demands in stoichiometry: students must reproduce and select central concepts of stoichiometry as well as corresponding calculations (such as the stoichiometric equations). They must also organise and integrate the concepts and calculations according to the specific property to be calculated.
The identification of the competencies necessary for solving algorithmic problems in stoichiometry was based on the sub-problems defined by Gulacar and Fynewever (2010), which have been further refined for this study. In the StoiCoLe model, these essential competencies are termed “sub-competencies” and are split into three distinct categories: “Basic equations,” “Extension equations,” and “Reaction equations”. The first competency level is characterised by the ability to apply the sub-competencies within each category. For instance, demonstrating competency at this level includes the ability to determine the molar mass from a given chemical formula (see Fig. 1).
The following competency levels were defined on the basis of the dimensions “interconnectedness” and “cognitive processes”. In accordance with the framework proposed by Vorholzer and von Aufschnaiter (2020), the task complexity related to the interconnectedness dimension is influenced by both the number of sub-competencies required to solve the task and the character of their connection.
The sub-dimension “connection character” is reflected in the fact that an understanding of the connections between the sub-competencies becomes necessary. Furthermore, the increasing interconnectedness changes the required cognitive processes. While simple stoichiometric tasks require a single sub-competency or the selection of a few sub-competencies, more complex tasks demand the selection and organisation of several sub-competencies. Consequently, learners must possess not only the requisite content knowledge and sub-competencies but also the procedural knowledge needed to effectively select and organise them. In the example task for the second competency level (see Fig. 2), four sub-competencies from two categories and their organisation are required.
In order to solve this task, students have to be able to
1. determine the molar mass of glucose (C6H12O6)
2. determine the amount of substance (C6H12O6)
3. set up the amount of substance ratio
4. determine the volume (CO2)
Additionally, students must integrate these sub-competencies into a coherent problem-solving process, which is often (mono)causal; thus, a lack of content knowledge or a missing sub-competency cannot be compensated for. For instance, determining the amount of substance of glucose is impossible without knowing how to determine molar mass. However, tasks can often be approached through various solution paths utilising different sub-competencies, depending on the available information and the learners' preferred methods. For example, tasks can be resolved using ratios (proportionality method) or different quantity equations and the amount of substance, such as the mole method (Schmidt and Jignéus, 2003). In the example task at the second competency level, the volume of carbon dioxide can be calculated either using the general gas equation or the molar volume. If the density were provided, a third approach would be possible.
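The four steps listed above, and the two alternative solution paths for the final step, can be illustrated with a minimal Python sketch. The concrete numbers are assumptions for illustration only and are not taken from the example task in Fig. 2: the sketch assumes the complete combustion of 10.0 g of glucose (C6H12O6 + 6 O2 → 6 CO2 + 6 H2O), ideal-gas behaviour, and conditions of 0 °C and 101 325 Pa.

```python
# Hypothetical worked example (assumed values, not the task from the article):
# volume of CO2 formed by the complete combustion of 10.0 g glucose.

ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # g/mol
R = 8.314462618  # molar gas constant in J/(mol K)


def molar_mass(composition):
    """Sum atomic masses weighted by the number of atoms in the formula."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


glucose = {"C": 6, "H": 12, "O": 6}
mass_glucose = 10.0  # g, assumed

# Step 1: determine the molar mass of glucose.
M_glucose = molar_mass(glucose)

# Step 2: determine the amount of substance of glucose (n = m / M).
n_glucose = mass_glucose / M_glucose

# Step 3: set up the amount of substance ratio.
# Assumed reaction: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O, so n(CO2) = 6 * n(glucose).
n_co2 = 6 * n_glucose

# Step 4, path A: volume via the molar volume (22.414 L/mol at 0 degC, 101325 Pa).
MOLAR_VOLUME = 22.414  # L/mol
v_co2_molar_volume = n_co2 * MOLAR_VOLUME

# Step 4, path B: volume via the gas equation V = n R T / p (converted m^3 -> L).
T, p = 273.15, 101325.0
v_co2_gas_equation = n_co2 * R * T / p * 1000.0

print(f"M(glucose) = {M_glucose:.3f} g/mol")
print(f"V(CO2), molar volume path = {v_co2_molar_volume:.2f} L")
print(f"V(CO2), gas equation path = {v_co2_gas_equation:.2f} L")
```

Each step consumes the result of the previous one, which mirrors the (mono)causal chain described above: if step 1 fails, no later step can be completed, whereas step 4 can be reached by either path.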
The third competency level is distinguished by the requirement to select and organise an even greater number of sub-competencies across all categories to solve the problem effectively. This is illustrated by the example task provided for the third competency level (see Fig. 3).
With each competency level, stoichiometric tasks increasingly reflect real-world problems encountered in laboratory or industry settings. At each level, problem-solving necessitates that students select and organise a larger number of sub-competencies into coherent steps:
1. convert substance names into chemical formulae
2. set up the reaction equation
3. balance the reaction equation
4. determine the molar mass of glucose (C6H12O6)
5. determine the amount of substance (C6H12O6)
6. set up the amount of substance ratio
7. determine the volume (CO2)
In summary, the StoiCoLe model is grounded in the idea that students demonstrate a higher stoichiometry competency as they progressively integrate sub-competencies into coherent solutions for increasingly complex stoichiometric problems. The model comprises three competency levels, as outlined in Fig. 4. According to this model, content knowledge forms the foundation for being competent in stoichiometry. This includes essential quantity equations, relevant stoichiometric quantities (e.g. mass, volume or amount of substance) and their associated units, as well as components of the chemical formula language (e.g. element symbols or chemical formula of substances with trivial names).
The first competency level contains the sub-competencies, which are divided into the three categories “Basic equations”, “Extension equations” and “Reaction equations”, with the exception of the sub-competency “Determine the amount of substance ratios”. Students at the first competency level are therefore able to successfully apply the sub-competencies independently of each other. The sub-competency categories serve to form the second and third competency levels.
Students at the second competency level are able to successfully use sub-competencies from two out of the three categories. This results in three competency categories for the second competency level: basic equation and extension equation, basic equation and reaction equation as well as extension equation and reaction equation.
Accordingly, at the third competency level, sub-competencies of all three categories are necessary to solve a stoichiometric task. Based on these considerations, we assume that reaching higher competency levels is more difficult for students. In addition, we assume an increase in processing time due to the higher task difficulty caused by the larger number of sub-competencies and their organisation.
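The level definitions above can be read as a simple rule: the competency level a task addresses follows from how many of the three sub-competency categories it draws on. The following Python sketch illustrates that reading. The short category labels and the function name `task_level` are hypothetical, and treating a task with no sub-competency as the content knowledge level (0) is an assumption of this sketch.

```python
from itertools import combinations

# Short labels for the three sub-competency categories (labels are assumptions).
CATEGORIES = ("basic", "extension", "reaction")


def task_level(required_categories):
    """Return 0 (content knowledge only) or 1-3, the number of distinct
    sub-competency categories a task requires."""
    cats = set(required_categories)
    unknown = cats - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return len(cats)


# The second competency level combines two of the three categories,
# which yields three possible pairings.
level_two_pairs = list(combinations(CATEGORIES, 2))

print(level_two_pairs)
print(task_level({"basic", "reaction"}))
print(task_level({"basic", "extension", "reaction"}))
```

The three pairings correspond to the three competency categories named for the second level; a task that requires all three categories is classified at the third level.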
Due to the broad subject area of stoichiometry, it must be clarified that this competency model “does not capture all attributes […] of the original [subject area of stoichiometry], but only those […] that appear [in this first approach as most] relevant” (Stachowiak, 1973, p. 132). For example, the limiting reagent, the percentage yield and the mathematical determination of chemical formulas were not taken into account in the StoiCoLe model.
1. What psychometric properties are revealed by a test designed to assess the stoichiometric competencies outlined in the StoiCoLe model?
2. To what extent do item difficulty and processing time confirm the competency levels of the StoiCoLe model?
3. At which levels of the StoiCoLe model do students demonstrate competency?
In total, the test comprises 40 items. To ensure that all students address at least one task at the third competency level, the sequence of items was adjusted after the first competency level. Specifically, the linear sequence was modified to an alternating pattern, where two items from the second competency level are followed by one item from the third competency level.
The implementation was carried out in accordance with local data protection regulations as required by the European Union (General Data Protection Regulation). Participation was voluntary and all students signed a consent form. Data collection was conducted as part of a regular classroom exercise, using the open-source online survey tool LimeSurvey (Limesurvey GmbH, 2003) in the first and second semester and a paper-and-pencil test in the third semester. By switching the data collection from digital to analogue format, we wanted to gain additional insight into the processing steps and documentation of the students. In the first and second semester, participants had to use a mobile device with internet access to carry out the test in LimeSurvey and a calculator to carry out the stoichiometric calculations. In the third semester, participants only needed a pen and a calculator. All participants were allowed to take handwritten notes. Other aids, such as recipe sheets or the use of the internet, were prohibited. The processing time was set at 105 minutes for the full stoichiometry test and 45 minutes for the shortened version. The processing time of individual items was only recorded via LimeSurvey in the first and second data collection.
| Factor | Sample | |
|---|---|---|
| Age | N = 280 | |
| M = 20.6 | SD = 2.6 | |
| Gender | N = 289 | |
| Male | 147 | (50.9%) |
| Female | 134 | (46.4%) |
| Diverse | 3 | (1.0%) |
| No response | 5 | (1.7%) |
| Semester | N = 285 | |
| 1st semester | 223 | (78.3%) |
| 3rd semester | 45 | (15.8%) |
| 5th and higher semester | 11 | (3.9%) |
| Study program | N = 289 | |
| Subject-related | 219 | (75.8%) |
| Bachelor of Chemistry | 126 | (43.6%) |
| Bachelor of Biochemistry | 76 | (26.3%) |
| Bachelor of Nanotechnology | 14 | (4.8%) |
| Interdisciplinary Bachelor of Science | 2 | (0.7%) |
| Bachelor of Physics with Chemistry | 1 | (0.4%) |
| Teaching-related | 68 | (23.5%) |
| Interdisciplinary Bachelor's degree | 57 | (19.7%) |
| Bachelor's degree in Technical Education | 9 | (3.1%) |
| Certificate program for a third subject | 2 | (0.7%) |
| Special program at the authors' university | 2 | (0.7%) |
Item parameters were estimated using the marginal maximum likelihood method (MML), while person parameters and their reliability were determined through the Expected A Posteriori method (EAP). Parameter visualisation was done via a Wright Map, which displays the distribution of both person and item parameters, providing a comparative overview of the data (Wilson, 2023). The visualisation was created with the package WrightMap (Torres Irribarra and Freund, 2024).
In line with the StoiCoLe model's assumption, item difficulty is expected to increase across the competency levels. To verify this assumption, the mean item difficulties for each competency level were determined and compared. Furthermore, the students' processing time was measured for each item in the first and second data collection. According to the competency classification, the mean processing time is expected to increase from the content knowledge level across the competency levels. To prevent distortion from items that were skipped, only items with a response or a processing time of more than 60 seconds (especially for items of the 2nd and 3rd competency level) were included in the analysis. The mean processing time for each level was then determined and compared.
In order to verify whether the items can distinguish between different levels of difficulty, item reliability, item separation and item strata value were calculated (Wright and Masters, 1982). The separation and the strata value provide insights into how many distinct strata (groups) the continuously measured values (item and person parameters) can be divided into (Wright and Masters, 1982; Boone et al., 2014). While the separation value is suitable for large normal distributions, the strata values are suitable for distributions that are heavy-tailed and should include extreme performance levels (Bond et al., 2020). The person reliability, the person separation and the person strata value were also calculated in order to verify how many groups the sample of students can be divided into (Wright and Masters, 1982).
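As a rough illustration of how reliability, separation and strata relate, the following Python sketch uses the relations commonly given in the Rasch literature: separation G = sqrt(R / (1 − R)) and strata H = (4G + 1) / 3. That the study computed its values in exactly this way is an assumption; the function names are hypothetical. Because reported reliabilities are rounded, results obtained from them only approximate the reported separation values.

```python
import math


def separation_index(reliability):
    """Separation G from a reliability coefficient R: G = sqrt(R / (1 - R))."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def number_of_strata(separation):
    """Number of statistically distinct strata H = (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


# Person side: a reliability of 0.78 gives a separation of about 1.88.
person_separation = separation_index(0.78)

# Item side: a separation of 8.24 gives about 11.32 strata.
item_strata = number_of_strata(8.24)

print(round(person_separation, 2), round(item_strata, 2))
```

Under these relations, a person separation just below 2 corresponds to slightly fewer than three distinguishable ability strata.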
Finally, the competency level of each student was assigned based on their person parameters and the median of the item difficulty as the threshold value for the respective competency level. We have based the probability of success (65%) on other assessments, such as PISA, in order to determine with greater certainty whether learners can answer the items correctly (Boone et al., 2014). The median of item difficulty was chosen for this purpose because (a) it is less susceptible to outliers and (b) with person parameters at the median, at least half of the tasks at that competency level should be solvable (see Hartig, 2007).
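The level assignment can be sketched in Python as follows. The sketch assumes a dichotomous Rasch model, in which raising the success probability from 50% to p shifts the required person parameter by ln(p / (1 − p)) logits, and it assumes that a student is assigned the highest level whose threshold the person parameter reaches. The thresholds are the median item difficulties at 65% success probability reported in the results; the function names are hypothetical.

```python
import math


def difficulty_at(beta_50, p_success):
    """Person parameter needed to solve an item of difficulty beta_50
    (50% location) with probability p_success under a dichotomous Rasch model."""
    return beta_50 + math.log(p_success / (1.0 - p_success))


# Median item difficulties at 65% success probability, in logits.
THRESHOLDS_65 = [
    ("content knowledge level", -1.26),
    ("1st competency level", -0.50),
    ("2nd competency level", 2.12),
    ("3rd competency level", 3.10),
]


def highest_level(theta, thresholds=THRESHOLDS_65):
    """Return the highest level whose threshold the person parameter reaches
    (assumed assignment rule; thresholds must be in ascending order)."""
    reached = "below content knowledge level"
    for name, cut in thresholds:
        if theta >= cut:
            reached = name
    return reached


print(round(difficulty_at(-1.88, 0.65), 2))  # 50% median shifted to 65%
print(highest_level(0.0))
```

The shift of roughly 0.62 logits explains why every 65% threshold lies about 0.62 above the corresponding 50% value.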
The final model validation followed an argument-based approach (Cronbach and Meehl, 1955; Kane, 1992). This type of “validation is the process of gathering supporting evidence for the intended test score interpretations” and checking it (Hartig et al., 2020, p. 535). Consequently, the validity of test score interpretations was evaluated based on the discussed arguments related to item difficulty and processing time.
| Model features | Test values |
|---|---|
| Person reliability | 0.78 |
| Observed variance | 1.39 |
| Separation index | 1.88 |
| Number of person strata | 2.83 |
| Person parameters | |
| Max | 3.12 |
| Min | −3.78 |
| Average | 0.00 |
| SD | 1.04 |
| SE | 0.53 |
| Item reliability | 0.99 |
| Observed variance | 3.06 |
| Separation index | 8.24 |
| Number of item strata | 11.32 |
| Task parameters (65%) | |
| Max. | 3.83 |
| Min. | −3.07 |
| Average | 0.06 |
| SD | 1.74 |
| SE | 0.20 |

Note: Items with infit and outfit values outside the range between 0.7 and 1.3 are shown in bold. In addition, the items with a standardized z-score outside the range between −2 and 2 are shown in bold.

| Item | Responses | Infit MNSQ | Infit ZSTD | Outfit MNSQ | Outfit ZSTD | Item | Responses | Infit MNSQ | Infit ZSTD | Outfit MNSQ | Outfit ZSTD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 185 | 0.89 | −0.71 | 0.92 | −0.55 | 21 | 219 | 1.06 | 0.95 | 1.18 | 2.55 |
| 2 | 220 | 1.00 | 0.07 | 1.09 | 0.76 | 22 | 228 | 1.00 | −0.03 | 1.02 | 0.21 |
| 3 | 185 | 1.00 | 0.05 | 0.96 | −0.16 | 23 | 185 | 0.88 | −0.86 | 0.81 | −1.52 |
| 4 | 223 | 1.14 | 1.98 | 1.28 | 3.78 | 24 | 184 | 0.86 | −1.76 | 0.78 | −2.86 |
| 5 | 220 | 0.95 | −0.25 | 0.81 | −1.23 | 25 | 208 | 0.94 | −0.64 | 0.94 | −0.63 |
| 6 | 214 | 0.97 | −0.30 | 0.99 | −0.09 | 26 | 210 | 1.10 | 1.38 | 1.15 | 1.99 |
| 7 | 220 | 1.07 | 0.83 | 1.05 | 0.59 | 27 | 184 | 0.99 | −0.36 | 1.00 | 0.06 |
| 8 | 206 | 1.14 | 2.11 | 1.32 | 4.36 | 28 | 184 | 0.98 | −0.19 | 1.11 | 0.97 |
| 9 | 218 | 1.00 | 0.01 | 0.91 | −0.88 | 29 | 219 | 0.96 | −0.57 | 0.89 | −1.48 |
| 10 | 237 | 1.02 | 0.17 | 1.33 | 1.95 | 30 | 203 | 1.08 | 1.15 | 1.15 | 2.17 |
| 11 | 185 | 1.01 | 0.11 | 1.38 | 1.23 | 31 | 183 | 1.08 | 0.65 | 1.27 | 1.91 |
| 12 | 185 | 1.17 | 2.16 | 1.42 | 4.94 | 32 | 282 | 0.90 | −1.60 | 0.83 | −2.86 |
| 13 | 213 | 0.97 | −0.43 | 0.95 | −0.74 | 33 | 183 | 0.92 | −0.52 | 0.71 | −2.05 |
| 14 | 215 | 1.08 | 0.85 | 1.24 | 2.43 | 34 | 153 | 0.89 | −1.14 | 0.71 | −3.15 |
| 15 | 185 | 1.01 | 0.16 | 1.05 | 0.55 | 35 | 145 | 1.02 | 0.19 | 1.08 | 0.71 |
| 16 | 220 | 0.99 | −0.09 | 1.01 | 0.18 | 36 | 108 | 1.01 | −0.13 | 0.75 | −1.26 |
| 17 | 185 | 1.02 | 0.28 | 1.04 | 0.61 | 37 | 183 | 0.89 | −0.83 | 0.72 | −2.40 |
| 18 | 185 | 0.96 | −0.36 | 1.02 | 0.21 | 38 | 132 | 0.85 | −0.48 | 0.51 | −2.13 |
| 19 | 207 | 0.94 | −0.48 | 1.02 | 0.14 | 39 | 193 | 0.92 | −0.35 | 0.66 | −1.91 |
| 20 | 185 | 0.87 | −1.70 | 0.78 | −3.07 | 40 | 122 | 0.98 | −0.18 | 0.96 | −0.45 |
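| Level | Items (n) | Item difficulty 50% (M) | Item difficulty 50% (Mdn) | Item difficulty 50% (SD) | Item difficulty 65% (M) | Item difficulty 65% (Mdn) | Item difficulty 65% (SD) | Processing time [s] (M) | Processing time [s] (Mdn) | Processing time [s] (SD) |
|---|---|---|---|---|---|---|---|---|---|---|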
| Content knowledge level | 9 | −1.82 | −1.88 | 0.86 | −1.20 | −1.26 | 0.86 | 41.2 | 37.6 | 14.7 |
| 1st competency level | 22 | −1.07 | −1.13 | 1.17 | −0.45 | −0.50 | 1.17 | 115.7 | 99.5 | 76.5 |
| 2nd competency level | 5 | 1.72 | 1.50 | 0.66 | 2.34 | 2.12 | 0.66 | 295.4 | 286.0 | 86.4 |
| 3rd competency level | 4 | 2.25 | 2.49 | 1.11 | 2.87 | 3.10 | 1.11 | 418.3 | 431.8 | 166.2 |
Post hoc tests using a Bonferroni correction showed no significant difference in item difficulty or processing time between the content knowledge level and the first competency level, or between the second and third competency levels (see Table 5).
| Level | Item difficulty: 1st competency level | Item difficulty: 2nd competency level | Item difficulty: 3rd competency level | Processing time: 1st competency level | Processing time: 2nd competency level | Processing time: 3rd competency level |
|---|---|---|---|---|---|---|
| Content knowledge level | 0.49 | 0.001 | 0.001 | 0.16 | <0.001 | <0.001 |
| 1st competency level | – | 0.001 | 0.001 | – | <0.001 | <0.001 |
| 2nd competency level | – | – | 1.00 | – | – | 0.18 |
| Level | 50% success probability: Mdn | 50% success probability: Number of students | 50% success probability: Achieved [%] | 65% success probability: Mdn | 65% success probability: Number of students | 65% success probability: Achieved [%] |
|---|---|---|---|---|---|---|
| Content knowledge level | −1.88 | 278 | [96.2] | −1.26 | 250 | [86.5] |
| 1st competency level | −1.13 | 247 | [85.5] | −0.50 | 205 | [70.9] |
| 2nd competency level | 1.50 | 16 | [5.5] | 2.12 | 5 | [1.7] |
| 3rd competency level | 2.49 | 2 | [0.7] | 3.10 | 1 | [0.3] |
The distribution of the competency achieved by the students (person parameters) and the item difficulty (item parameters) with a 65% success probability is represented in a Wright Map (see Fig. 7).
Rasch analysis indicated that the collected data exhibit adequate psychometric properties. The evaluation was based on accepted infit and outfit value ranges, i.e. the tolerated deviation of the responses from the Rasch model. The value range depends on the test type, the sample size and the desired measurement accuracy (Bond et al., 2020). Item fit values for the mean square (MNSQ) infit and outfit values in a range of 0.7 to 1.3 are acceptable (Wright and Linacre, 1994; Bond et al., 2020). The infit values for all items are within this acceptable range and thus show that the items have a good fit for people in the medium ability range. In contrast, there are six mean square outfit values outside the acceptable range, indicating that there are outliers within items when considering all students that require closer examination.
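The screening rule described above can be sketched in a few lines of Python. The outfit mean square values are taken from a subset of the items in the fit table; the labels "underfit" (MNSQ above 1.3) and "overfit" (MNSQ below 0.7) follow the usual reading of the acceptable range, and the function name is hypothetical.

```python
ACCEPTABLE_MNSQ = (0.7, 1.3)  # acceptable mean square range

# Outfit MNSQ values for a subset of items (item number -> value).
outfit_mnsq = {4: 1.28, 8: 1.32, 10: 1.33, 11: 1.38, 12: 1.42,
               33: 0.71, 38: 0.51, 39: 0.66}


def classify_fit(mnsq, bounds=ACCEPTABLE_MNSQ):
    """Label a mean square value relative to the acceptable range."""
    low, high = bounds
    if mnsq > high:
        return "underfit"  # more variation than the Rasch model expects
    if mnsq < low:
        return "overfit"   # less variation than the Rasch model expects
    return "acceptable"


flags = {item: classify_fit(value) for item, value in outfit_mnsq.items()}
misfitting = sorted(item for item, flag in flags.items() if flag != "acceptable")

print(misfitting)
print({item: flags[item] for item in misfitting})
```

Applied to these values, the rule flags six items: four with outfit above 1.3 and two with outfit below 0.7.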
The items that show an underfit of the model are items 8, 10, 11 and 12. Accordingly, some students with competencies at a higher level show difficulties in rather easy tasks, such as naming the quantity equation of amount of substance and number of particles (N = n × NA), determining the molar mass and determining the chemical formula.
The difficulties with the quantity equation relating amount of substance and number of particles are probably due to the fact that this equation is rarely, if ever, used in stoichiometry, because the amount of substance itself serves as a measure of the number of particles. This relationship, which is relevant to an understanding of the mole concept, poses a difficulty for learners and is criticized for its lack of explication (Steiner, 1986; Karp, 1988). In the case of determining the molar mass, it is possible that some careless mistakes were made. Determining chemical formulas requires content knowledge, depending on whether it is a molecular or ionic compound. While both types of compounds require knowledge of the symbols associated with the elements, molecular compounds require knowledge of the Greek and Latin prefixes and the numbers they represent. For ionic compounds, the charge numbers of the atomic and molecular ions as well as the composition of molecular ions based on the name are necessary, e.g., for the sulphate ion (SO₄²⁻). This mirrors the difficulties with chemical formula language described by Gulacar and Fynewever (2010) and Taskin and Bernholt (2014).
The items that show an overfit of the model, and thus a significantly lower variance than expected from the Rasch model, are items 38 and 39 (3rd competency level). However, it should be borne in mind that the majority of students in the sample are at the content knowledge level and the 1st competency level, whereas the group of students who are at the 2nd and 3rd competency levels, and for whom a distinction would be less strict, is significantly smaller. If students are unable to solve tasks at a lower level, it is highly likely that they will also be unable to solve tasks at a higher level. Given this background, the relevance of the items for the StoiCoLe model, and the generally good infit values, we see no reason to remove items from the test.
However, we revised the items with regard to the standardized z-value (ZSTD) of the infit and outfit, which indicates how likely the misfit of the Rasch model is (Bond et al., 2020). Standardized z-values within the acceptable range of −2 to 2 can be neglected (Boone et al., 2014; Bond et al., 2020). While the mean square values show a decreasing deviation with increasing sample size, the standardized z-values show a significant deviation. Against this background, it is worth looking at the deviating standardized z-values and checking the associated items (Bond et al., 2020). A revised version of the stoichiometry test contains adaptations of the conspicuous items (see ESI†). These include item 3, as hydrogen peroxide (H2O2) is not, strictly speaking, a trivial name, and items 14 and 15, which do not clearly reflect the sub-competency “Setting up the amount of substance ratio”. Furthermore, item 35 was revised because it was possibly problematic that the reaction equation was not explicitly addressed.
The observed variance in person parameters indicates at first glance that the test effectively differentiates between students. A closer look at the person reliability, the person separation index and the number of person strata indicates the existence of three different person ability levels (Fisher, 1992; Boone et al., 2014). This might contradict the assumption of the StoiCoLe model with four levels and will be discussed later with regard to model adjustments. In contrast, item reliability, the item separation index and the number of item strata show that the items are capable of depicting and distinguishing between different levels of difficulty. However, item reliability should be viewed critically in view of the large sample size (Bond et al., 2020).
Both item difficulty and processing time were utilised to validate the model assumptions. Notably, processing time partially confirms the competency level assumptions, although no significant difference was observed between the content knowledge level and the first competency level as well as between the second and third competency level. Additionally, some items, such as “Balance the reaction equation” (1st competency level) and item 33 (2nd competency level), had processing time differences of Δ399.5 seconds and Δ435.3 seconds, respectively, which were significantly longer than those for most items at the respective competency levels.
Item difficulty also partially supports the StoiCoLe model. Specifically, within the content knowledge level and the 1st competency level, there is substantial variance and considerable overlap among the items. The notably higher item difficulty in the content knowledge level suggests that essential content, such as stoichiometric quantity equations, may not have been adequately learned or internalised. The differences between the sub-competencies are consistent with Gulacar and Fynewever (2010), who demonstrated that sub-problems in stoichiometry vary in difficulty. For example, the sub-competency “Determine the molar mass” has a notably low item difficulty (M = −3.17), whereas “Set up the reaction equation” has an item difficulty (M = 0.76) comparable to that of items at the 2nd and 3rd competency levels. However, it must be pointed out that the chemical formulas were not given in the two tasks for setting up the reaction equation and therefore the sub-competency “Determining the chemical formula” was also required for the solution. These items were, as already mentioned, adapted in the revised version of the stoichiometry test. Furthermore, there is considerable variance among items within the same sub-competency, such as “Determine the oxidation number” (Δ2.71) and “Balance the reaction equation” (Δ2.52). This variance between and in the sub-competencies has a corresponding effect on the difficulty of the higher competency levels, regardless of the number of sub-competencies and cognitive processes. This interaction is discussed later in relation to an adaptation of the StoiCoLe model.
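To make the logit values above more tangible, the dichotomous Rasch model can be used to convert a difference between person ability and item difficulty into a success probability. The following sketch uses the two mean item difficulties reported above; the person ability of 0 logits is an assumption chosen for illustration.

```python
import math


def rasch_probability(theta, difficulty):
    """Success probability in the dichotomous Rasch model (both values in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Mean item difficulties as reported in the text.
mean_difficulty = {
    "Determine the molar mass": -3.17,
    "Set up the reaction equation": 0.76,
}

theta = 0.0  # hypothetical person ability, not a value from the study
probabilities = {
    name: rasch_probability(theta, b) for name, b in mean_difficulty.items()
}
for name, p in probabilities.items():
    print(f"{name}: {p:.2f}")
```

For this hypothetical person, the expected success probability is about 0.96 for an item of average difficulty in “Determine the molar mass” and about 0.32 for one in “Set up the reaction equation”, which illustrates how far apart two sub-competencies of the same level can lie.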
There are three possible explanations for the high variance in item difficulty within the sub-competencies. First, it is likely that various factors influence item difficulty within a given sub-competency. Regarding the sub-competency “Balance the reaction equation”, Segedinac et al. (2018) have shown that balancing a reaction equation becomes more challenging with an increasing number of substances or when forming a compound from elements obtained from different sources. Accordingly, other influential factors may affect item difficulty across different sub-competencies. In the sub-competency “Determine the oxidation number”, increased difficulty could result from factors such as a higher number of different atoms, the presence of the same atoms with multiple oxidation states, or the type of chemical representation used (chemical formula in item 28 versus structural formula in item 29).
Second, the variation in item difficulty may be attributed to differences in item type and the approaches or methods employed by learners. For example, four different methods can be applied to balance reaction equations: trial and error (inspection), the algebraic method, the oxidation number method and the half-reaction method (Herndon, 1997). While for small numbers of reactants and products (e.g. item 31 or 40) a trial-and-error approach (i.e. inspection method) might be the most effective, for larger numbers (e.g. item 30 or 38), the algebraic, oxidation number or half-reaction method might be more effective. Supporting evidence for the varying effectiveness of these methods is provided by Charnock (2016) and Chibuye and Mupela (2019), who compare the inspection and algebraic methods in terms of their effect on performance and indicate that students taught the algebraic method performed significantly better. In contrast, Kolb (1981), Tóth (1997) and Guo (1997) showed that even complex chemical equations can be solved using a systematic inspection method. In summary, the methods are sometimes described as superior or inferior (see Yde, 1989; Herndon, 1997). However, Yalman (1959) and Kolb (1979) assert that there is no single superior method and that only the outcome is important, regardless of whether one, several or a combination of methods is used. We propose that the variation in item difficulty within the sub-competency “Balance the reaction equation” may be attributed to the use of unsystematic trial-and-error approaches by students, rather than systematic inspection or other systematic balancing methods. This idea is supported by the observed lower item difficulty of items with reactions involving fewer reactants and products, as well as the longer processing times required for more complex reactions.
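To illustrate what distinguishes the algebraic method from trial and error, the sketch below treats balancing as a system of linear equations, one per element, and solves it with exact fractions. It is a minimal illustration under stated assumptions: the function name and the representation of species as element-count dictionaries are hypothetical, the combustion of propane is not an item of the stoichiometry test, and the sketch only covers reactions with exactly one independent solution.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def balance(reactants, products):
    """Return the smallest positive integer coefficients for the given species.

    Each species is a dict mapping an element symbol to its atom count.
    """
    species = list(reactants) + list(products)
    signs = [1] * len(reactants) + [-1] * len(products)
    elements = sorted({el for sp in species for el in sp})
    # One equation per element: atoms on the left minus atoms on the right = 0.
    rows = [
        [Fraction(sign * sp.get(el, 0)) for sp, sign in zip(species, signs)]
        for el in elements
    ]
    n = len(species)
    pivots = []
    r = 0
    # Gauss-Jordan elimination to reduced row echelon form.
    for c in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    free = [c for c in range(n) if c not in pivots]
    if len(free) != 1:
        raise ValueError("expected exactly one free coefficient")
    f = free[0]
    # Set the free coefficient to 1 and read the others from the reduced rows.
    solution = [Fraction(0)] * n
    solution[f] = Fraction(1)
    for row_index, c in enumerate(pivots):
        solution[c] = -rows[row_index][f]
    # Scale the fractions to the smallest whole numbers.
    scale = reduce(lambda a, b: a * b // gcd(a, b), (x.denominator for x in solution), 1)
    ints = [int(x * scale) for x in solution]
    common = reduce(gcd, ints)
    ints = [v // common for v in ints]
    if any(v <= 0 for v in ints):
        raise ValueError("no positive integer solution found")
    return ints


# Illustrative example (not a test item): C3H8 + O2 -> CO2 + H2O
propane = balance(
    reactants=[{"C": 3, "H": 8}, {"O": 2}],
    products=[{"C": 1, "O": 2}, {"H": 2, "O": 1}],
)
print(propane)
```

In contrast to unsystematic trial and error, the effort of this procedure grows in a predictable way with the number of species and elements, which is one possible reading of why systematic methods may pay off for larger equations.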
Third, competencies are acquired through concrete chemical content. Consequently, task difficulty may depend on two factors: whether the competency was acquired using the same content (Walpuski and Ropohl, 2014) and the extent to which learning transfer, including reasoning processes such as analogical reasoning, is involved (Alfieri et al., 2013). Future research could explore the aspects of learning transfer and its impact on stoichiometric competencies. It is possible that the sub-competency “Determine the oxidation number” was acquired using potassium permanganate (item 28) as learning content, resulting in lower item difficulty for this content. For a competency assessment, it is therefore important that students complete several items of a competency (level) in order to prevent misattribution of their abilities. However, accommodating the necessary time poses a challenge. To address this, we included at least two tasks per sub-competency or level in this study. In the future, further improvements should be made to shorten the stoichiometry test, for example by using a booklet design, as was already done in the third data collection.
Between the 1st and 2nd competency level, the results indicate a clear gradation between the majority of the items, aligning with the model assumptions. In particular, the item difficulties at the higher competency levels illustrate that, in addition to the increasing number of sub-competencies, the change in cognitive processes from the reproduction and application of sub-competencies to the selection and organisation of these to solve a stoichiometric problem could be the decisive factor causing difficulties, as otherwise significantly more students would have reached the higher competency levels.
However, there is an overlap between items of the 2nd and 3rd competency levels, suggesting a lack of clear progression and contradicting the model's predictions. In this context, it should be taken into account that the share of students who are able to solve items of the 2nd and 3rd competency levels with a 65% success probability is below 2% of the sample. A better differentiation between the 2nd and 3rd competency levels would therefore have required more students who are competent at these levels. Nevertheless, clear differences can be observed in the items of the two competency levels. With regard to the 2nd competency level, items with a provided or simple reaction equation show lower item difficulty. This result is in line with observations from Tang et al. (2014) and Schmidt (1990) that stoichiometric tasks with a given chemical formula or reaction equation are easier. In addition, items 33 and 38 required the translation of a systematic substance name, such as iron(II,III)-oxide, into the corresponding chemical formula. Based on the students' responses, we can confirm the difficulties in dealing with chemical formula language described by Gulacar and Fynewever (2010) and Taskin and Bernholt (2014). On the other hand, items that required connecting stoichiometric quantity equations generally exhibited lower item difficulty. The minimal variation in item difficulty among items applying stoichiometric quantity equations at the 1st competency level may reflect students' difficulties with some mathematical operations (Lazonby et al., 1982; Scott, 2012).
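The 65% criterion mentioned above can be made concrete: in the dichotomous Rasch model, a success probability p requires a person ability that exceeds the item difficulty by ln(p / (1 − p)) logits. The sketch below applies this to a single item; the function names, the person parameters and the item difficulty are hypothetical, and how the criterion was aggregated over all items of a level is not specified in this section.

```python
import math


def logit_offset(success_probability):
    """Logits by which ability must exceed item difficulty for the given success probability."""
    return math.log(success_probability / (1.0 - success_probability))


def share_reaching(person_parameters, item_difficulty, success_probability=0.65):
    """Share of persons expected to solve the item with at least the given probability."""
    required = item_difficulty + logit_offset(success_probability)
    reached = sum(1 for theta in person_parameters if theta >= required)
    return reached / len(person_parameters)


offset = logit_offset(0.65)

# Invented person parameters and item difficulty, for illustration only.
thetas = [-1.5, -0.8, -0.2, 0.3, 1.9]
share = share_reaching(thetas, item_difficulty=1.0)
print(round(offset, 3), share)
```

A 65% criterion therefore places the required ability about 0.62 logits above the item difficulty, so that only persons clearly above an item count as having mastered it.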
At the 3rd competency level, item 40 in particular displayed a notably lower item difficulty than the other items. This is likely because item 40 featured an easier reaction equation, as suggested by Segedinac et al. (2018), relative to the other items at this level.
The result confirms the high dependence of the success rate of algorithmic tasks on the task type, as shown by Stamovlasis et al. (2004, 2005). While simpler algorithmic tasks, comparable to sub-competencies of the 1st competency level in our framework, show high success rates, more demanding algorithmic tasks, which align with higher competency levels, exhibit lower success rates. This challenges the notion that algorithmic tasks generally have a higher success rate compared to conceptual tasks, as reported in several studies (Nurrenbern and Pickering, 1987; Sawrey, 1990; Mason et al., 1997; Cracolice et al., 2008). Algorithmic tasks typically have higher success rates for two main reasons. First, traditional chemistry education has emphasised these types of problems. Second, students often memorise solution paths without fully understanding the underlying chemical concepts (Lythcott, 1990; Pickering, 1990; Nakhleh, 1993; Smith and Metz, 1996; Smith et al., 2010). Although Stamovlasis et al. (2004, 2005) differentiate between conceptual and algorithmic competencies, and a high algorithmic competency does not necessarily indicate high conceptual understanding, there is evidence that conceptual understanding positively influences algorithmic problem-solving abilities (Niaz, 1989, 1995a, 1995b; BouJaoude and Barakat, 2003). This is shown by the fact that conceptual learners solve algorithmic problems correctly and with fewer steps (Mason and Crawley, 1994; BouJaoude and Barakat, 2003). We agree that the primary goal in chemistry education is to build conceptual understanding, from which algorithms should be derived (Bodner, 1987; Niaz, 1989; BouJaoude and Barakat, 2000; Arasasingham et al., 2004). However, this also means that both algorithmic and conceptual tasks are important and competent learners should be able to solve both types of tasks (Bodner, 1987; Frank et al., 1987; Smith et al., 2010).
The difficulties students face in stoichiometry, as identified in other studies (Dierks, 1981; Schmidt, 1990; Huddle and Pillay, 1996; BouJaoude and Barakat, 2000, 2003; Arasasingham et al., 2004; Dahsah and Coll, 2008; Shadreck and Enunuwe, 2018; Rosa et al., 2022), emphasise the need for improved stoichiometric instruction, even for algorithmic tasks.
As described in other fields of competency-based research (Neumann, 2020; Krell et al., 2022), locating our work in existing literature shows that linking students’ cognitive processes to their learning results, as measured in our stoichiometry test, remains a subject of future research. Specific questions arise in this context, such as “Which reasoning processes are used to be ‘competent’ on a certain level of the StoiCoLe model?”, “To what extent is transfer of learning needed to become more competent?”, “Which strategies are used by competent learners?”, “What are concrete difficulties leading to not reaching a competency level?” or “Which learning processes have to be initiated in order to reach the next competency level?”. While the estimated person parameters via the test answers (item output) make it possible to locate the learners in the model and identify learning difficulties and learning goals, the analysis of processes with the help of learning analytics represents an approach for future research.
1. the high variance within the sub-problems.
2. the overlap between the competency levels.
In view of the high variance within the sub-competencies, one approach would be to further differentiate the sub-competencies, but this would also increase the complexity of the StoiCoLe model and make it more difficult for learners and teachers to use. A sensible balance needs to be struck here.
In particular, the overlap between the 2nd and 3rd competency levels suggests that there is no clear gradation. However, this finding should be interpreted cautiously, as it may be influenced by the small number of presumably more competent students who engaged with items at these higher levels. A notable issue is the higher difficulty of tasks requiring the sub-competencies “Determining the chemical formula” and “Balancing the reaction equation” of the “Reaction equation” category. Conversely, tasks that required establishing connections between stoichiometric equations were found to be less difficult.
Based on this, the categories “Basic equation” and “Expansion equation” of the 1st competency level should be combined in a new category for mathematical–chemical calculations in the future StoiCoLe model. This category should also include the mathematical determination of the ratio formula and sum formula. Consequently, students at the 2nd competency level would be able to solve stoichiometric tasks using only sub-competencies from the mathematical–chemical category. Considering the clear difficulty of the tasks in the category “Reaction equation”, the 3rd competency level should be characterized by the students being able to solve stoichiometric problems using sub-competencies from the mathematical–chemical category and sub-competencies from the “Reaction equation” category. Additionally, incorporating limiting reagent and percentage yield as advanced competency levels should be considered in future versions of the StoiCoLe model.
1. Using the StoiCoLe model, educators can make a priori assumptions about the difficulty of stoichiometric tasks.
2. It maps the terrain of algorithmic competencies in this topic: it helps educators to define goals of instruction based upon the different competency levels. It can also be used for designing a curriculum across study programs that aims at the highest competency level as the final stage of competency development.
3. Compared to the framework of sub-problems from Gulacar and Fynewever (2010) and the framework of the multidimensional analysis system from Dori and Hameiri (2003), our model can provide learners and teachers with a more comprehensible orientation and concretise learning objectives. Locating students at a concrete competency level has content-oriented implications for what students are able to carry out in stoichiometry, for example:
• Learners who fall below the content knowledge level need to focus on fundamental stoichiometric content, such as stoichiometric quantity equations, units, and the chemical formula language.
• Learners who have not achieved the 1st competency level struggle with sub-competencies essential for progressing to higher levels.
• Learners who have not achieved the 2nd or 3rd competency level struggle with selecting and organising the correct sub-competencies. Additionally, this can point to difficulties with individual sub-competencies.
Due to the interconnected and hierarchical structure of stoichiometric tasks, mastering these individual sub-competencies is crucial for advancement and cannot be bypassed. This approach highlights the significant challenge in teaching stoichiometry: ensuring that foundational knowledge and sub-competencies are thoroughly mastered and integrated before advancing to more complex tasks.
To make effective use of time, we recommend using a shortened version of the revised stoichiometry test instrument. However, one must be aware that this reduces the reliability of locating students in the StoiCoLe model. Further development of the stoichiometry test instrument by improving validity and reliability according to Arjoon et al. (2013) and Lazenby et al. (2023) should be considered in future research. To search for existing instruments, we recommend the Chemistry Instrument Review and Assessment Library (CHIRAL) (Barbera et al., 2022).
In the field of chemistry education research, we see, in line with existing literature, great potential in the use of Item Response Theory approaches, such as the Rasch model, to test a priori assumptions of competency levels or learning progressions (Boone et al., 2014). This procedure allows us to identify deviations from the model assumptions, which can then be investigated in more detail, for instance as part of a qualitative study. In the absence of any a priori assumptions, it is possible to conduct an explorative Rasch analysis (see for example Taskin et al., 2015). Furthermore, Rasch analysis techniques can and should be used to determine and improve the quality of assessment instruments such as concept inventories (see Barbera, 2013), especially since the Rasch-based assessment evaluation, like the learning gain, is direct, comprehensible and interpretable (Pentecost and Barbera, 2013). In the case of concept inventory evaluation, we see the potential to re-analyze existing test developments via rapid scaling to see if there are any thresholds in the response that indicate levels or fundamental advances in understanding. The idea of threshold concepts (e.g., Meyer, 2016) suggests that easy items, which have no reference to threshold concepts, can be answered correctly early on, whereas difficult items are only answered correctly once a threshold concept (e.g., chemical equilibrium in the context of understanding chemical reactions) has been mastered. It may be possible to prove the existence of specific threshold concepts through post hoc Rasch analysis.
† Electronic supplementary information (ESI) available: Stoichiometry test instrument, results of the stoichiometry test, revised stoichiometry test instrument. See DOI: https://doi.org/10.1039/d5rp00077g
This journal is © The Royal Society of Chemistry 2025