Philip Nahlik* and Patrick L. Daubenmire
Department of Chemistry and Biochemistry, Loyola University Chicago, Chicago, IL 60660, USA. E-mail: pnahlik@luc.edu
First published on 23rd May 2022
A method is adapted for calculating two measures of entropy for gaze transitions to summarize and statistically compare eye-tracking data. A review of related eye-tracking studies sets the context for this approach. We argue that entropy analysis captures rich data that allows for robust statistical comparisons and can be used for more subtle distinctions between groups of individuals, expanding the scope and potential for eye-tracking applications and complementing other analysis methods. Results from two chemistry education studies help to illuminate this argument and areas for further research. The first experiment compared the viewing patterns of twenty-five undergraduate students and seven instructors across word problems of general chemistry topics. The second experiment compared viewing patterns for eighteen undergraduate students divided into three intervention groups with a pre- and post-test of five problems involving periodic trends. Entropy analysis of the data from these two experiments revealed significant differences between types of questions and groups of participants that complement both visualization techniques like heat maps and quantitative analysis methods like fixation counts. Finally, we suggest several considerations for other science education researchers to standardize entropy analyses including normalizing entropy terms, choosing between collapsed sequences or transitions within areas of interest, and noting if fixations in blank spaces are included in the analysis. These results and discussion help to make this powerful analysis technique more accessible and valuable for eye-tracking work in the field of science education research.
Eye-tracking methods have been developed for and applied to a wide variety of areas, including interpreting mechanical diagrams (Graesser et al., 2005), completing repetitive visual tasks (Manelis and Reder, 2012), grouping flight paths by air traffic controllers (Kang and Landry, 2015), learning Kanji symbols (Roderer and Roebers, 2014), solving arithmetic word problems (Hegarty et al., 1995), measuring physiological responses to viewing garden scenes (Liu et al., 2020), and using algebraic equations (Susac et al., 2014).
Several key terms from these studies will help make sense of our own research. Early work began to describe eye behavior as a combination of fixations—focused attention on an individual point or area—and saccades—movements between points during which information is not perceived (Rayner, 1998). Scanpaths are one method of describing the sequence of saccades across visual data (Kang and Landry, 2015; Day et al., 2018; Peysakhovich and Hurter, 2018). More recent analyses have considered researcher-defined areas of interest (AOIs) and the transitions or collective saccades between them (Kang and Landry, 2015; Susac et al., 2014). The various ways of describing eye movement data allow for statistical analyses that differ widely between applications. We will take a more focused look at science education research to frame the purpose of our study.
Several Chemistry Education studies show the breadth of eye-tracking methods within a single discipline. Topczewski et al. (2017) used pattern analysis of AOIs to study how students with varying levels of organic chemistry experience worked through NMR problems. Cullipher and Sevian (2015) similarly presented students with spectral data and compared the sequences of their fixations with their verbal explanations of the problem. Stieff et al. (2011) relied on fixation duration and diagrams of transitions between AOIs to study which types of representations students use to answer questions about bonding and molecular shape. Cook et al. (2008) worked with a large group of high school students to study their eye gaze patterns across micro- and macro-representations of molecular diffusion scenarios. Tang and Pienta (2012) investigated the complexity of gas law problems via eye-tracking to show differences between successful and unsuccessful problem-solvers. Tang et al. (2014) showed how eye-tracking can be used to help develop stoichiometry problems of varying difficulty by changing complexity factors like the number format. Chen et al. (2015) studied how teaching with dynamic or static representations of orbital diagrams affected students’ eye fixation patterns based on their spatial abilities. Williamson et al. (2013) studied student fixation data in their use of ball-and-stick or electrostatic potential images to answer questions about molecular features like charge or electron density. Nehring and Busch (2018) studied differences in fixation counts and durations for video demonstrations using different Gestalt principles for instrumentation arrangements.
In a recent example, Rodemer et al. (2020) introduced the fixation/transition ratio to distinguish viewing behavior for advanced and beginner chemistry students. In their analysis, a lower ratio indicated more comparative viewing of AOIs, and a higher ratio indicated more focused viewing. This ratio allows for more sophisticated comparisons such as a time-based analysis to distinguish stages of problem solving.
Another potential solution has been developed by Krzysztof Krejtz and colleagues (Krejtz et al., 2014, 2015, 2016). Their entropy analyses treat eye-tracking data as Markov chains (Ciuperca and Girardin, 2007), which test the extent to which sequences of gaze transitions are ordered or random. Two statistical terms can be assigned based on these analyses to describe the randomness or entropy of these transition sequences. Fig. 1 presents a graphical representation of these two terms for dramatized cases of transitions between four AOIs.
Fig. 1 Illustration of entropy terms with example AOIs (shown as gray boxes) and the transitions (black arrows) within or between them.
Transition entropy (Ht) correlates with the randomness of the sequence of transitions between visual data. A lower Ht indicates a more predictable, less random series of transitions between AOIs. For example, the two cases on the left side of Fig. 1 show transitions between AOIs that are highly structured and predictable and therefore have low Ht. Stationary entropy (Hs) correlates with the distribution of attention or fixations across the AOIs on a display. A higher Hs indicates a more equal distribution of attention across AOIs. For example, the two cases at the top of Fig. 1 show more equally distributed attention across the four AOIs and, therefore, high Hs, as might be expected for visual images with equally interesting or useful AOIs. Considered together, Ht and Hs can quickly summarize complex viewing behaviors and provide another option for statistical comparisons.
A recent set of publications has employed this type of entropy analysis to study strings of AOI transitions. Day et al. (2018) describe their use of scanpath analysis and transition entropy calculations to distinguish significant differences between groups of students. Tang and Pienta (2018) provide a detailed explanation of these types of analyses for chemistry education researchers. A related article from Tang et al. (2018) gives a more technical description of the GrpString package in R that can facilitate the entropy calculations. This type of pattern analysis helps to capture more detail about the way participants view problems, yet it relies on various methods of collapsing the data into isolated patterns of transitions, losing the overall view of fixations and heat maps.
Of course, no single analysis will capture all nuances of eye-tracking data, but we argue that stationary and transition entropy terms can be a powerful option for Science Education researchers to study and describe subtle differences between groups. Next, we consider two example studies to illustrate how entropy analyses might be used.
1. Will entropy measures help reveal statistically significant differences between participants or groups across types of questions, placement of AOIs, pre- and post-tests, or major?
2. If so, what additional insights does entropy analysis provide compared to visualization techniques like heat maps and quantitative analysis techniques like fixation counts?
We then use the results of this study to comment on the kinds of information that can be described by entropy analysis and some potential future applications of entropy analysis in science education research.
Because these projects involved human subjects, Institutional Review Board approval was obtained through Loyola University Chicago for both projects before any data collection began (application numbers #1483 and #3905 for experiments 1 and 2, respectively). Informed consent was obtained from each participant with a signed form at the beginning of each session. All procedures and data were handled in compliance with the approved protocols.
For experiment 1, a total of twenty-five students volunteered from a second-semester general chemistry course. As compensation, they were entered into a raffle to win a restaurant gift card. Seven instructors who taught general chemistry (n = 5) or Advanced Placement chemistry (n = 2) within the past five years were also recruited to participate. After calibrating the eye-tracker on the computer monitor, each participant was presented with four word problems involving general chemistry topics such as calculating concentrations and bond enthalpy. They were asked to think aloud as they solved each problem (Bowen, 1994), and a researcher took notes to record their answers. The data were collected in two separate rounds over two years. The first round, with 3 instructors and 15 students, was designed with four single-answer problems, while the second round, with 4 additional instructors and 10 additional students, included the first two single-answer problems along with two additional open-ended problems.
For experiment 2, eighteen students were recruited from introductory science classes or by reference from other participants. Nine of the students were science majors, and nine were non-science majors. A computer randomizer placed participants into three separate groups with three majors and three non-majors in each group. Each group was provided with a set of video modules about periodic trends and a corresponding set of diagrams to use when solving a set of problems. Group 1 was only provided with a standard periodic table. Group 2 was provided a standard periodic table and a color-coded periodic table with no instruction on it. Group 3 was provided a standard periodic table, a color-coded periodic table with instruction about its use, and an electron shell diagram. A 30-minute pre- and post-test of five questions was given to each participant while their eye movements were recorded. The first three questions focused on using periodic trends in general chemistry, and the last two questions focused on using the trends in organic chemistry. Two example slides are shown in Fig. 2. The students were asked to watch their assigned videos before returning for the post-test up to two months later. During the pre- and post-tests, participants were told to keep their eyes on the screen, read the question in their head, and verbally explain their answer to the question. Every answer was written down by a researcher for reference.
First, AOIs are defined for each slide (Fig. 3 and 4), either by equally dividing the slide with a grid overlay or by intentionally placing them through semantic designation (Holmqvist et al., 2011; Day et al., 2018). Second, transition matrices are calculated for each participant and each slide using software such as OGAMA (Voßkühler et al., 2008). Sample data are shown in Table 1 for an instructor participant from experiment 1 with transitions n counted across three AOIs (A, B, and C) placed via grid overlay on question 4 (the left side of Fig. 4).
Fig. 3 Example slides for experiment 1 showing question 1 with AOIs outlined in purple and placed via grid overlay (left image) or semantic designation (right image).
Fig. 4 Example slides from experiment 1 showing question 4 with AOIs outlined in purple and placed via grid overlay (left image) or semantic designation (right image).
AOI | To A | To B | To C |
---|---|---|---|
From A | 53 | 43 | 1 |
From B | 36 | 132 | 43 |
From C | 8 | 35 | 36 |
Some analyses exclude transitions within a single AOI or collapse the scanpath sequences to exclude repeated letters (Day et al., 2018), equivalent to omitting the cells found diagonally in the matrix from top left to bottom right.
Additionally, when using semantic designation, an AOI can be defined for the blank space not covered by any other AOI, or transitions can be counted only between the more meaningful AOIs. Our analyses follow Krejtz et al. (2014) to include transitions within an AOI to account for the relative interest of each AOI. With semantic designation, we also included transitions to or from the blank space on a slide under the assumption that these transitions also provide meaningful data about viewing behavior. Future work is needed to compare the advantages and disadvantages of each approach.
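The counting step and the choice between full and collapsed sequences can be sketched in Python. This is a minimal illustration rather than the OGAMA workflow: the function names `collapse_repeats` and `transition_counts` and the short fixation sequence are hypothetical, and the sequence is assumed to be a list of the AOI labels fixated in order.

```python
from itertools import groupby


def collapse_repeats(sequence):
    """Drop consecutive repeats, e.g. A A B B A -> A B A."""
    return [label for label, _ in groupby(sequence)]


def transition_counts(sequence, labels):
    """Count transitions n_ij from AOI i (row) to AOI j (column)."""
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    # Each consecutive pair of fixations is one transition.
    for src, dst in zip(sequence, sequence[1:]):
        counts[index[src]][index[dst]] += 1
    return counts


# Hypothetical fixation sequence over three AOIs (not data from the study).
fixations = list("AABBBCBAAB")
labels = ["A", "B", "C"]

full = transition_counts(fixations, labels)
collapsed = transition_counts(collapse_repeats(fixations), labels)
```

Collapsing the sequence leaves the diagonal of the matrix at zero, which is the choice that discards the relative weight of fixations within an AOI.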
Third, marginal distributions ni, proportional maximum likelihood estimators (MLEs) of the transition probabilities pij, and MLEs of the stationary distribution πi are calculated using the transition matrices in a program like Excel via the formulas below. The transition matrix P for the sample data gives each MLE of the theoretical probability pij of transitions nij from initial AOI i to final AOI j as a proportion of total transitions ni from each AOI. The vector π gives the stationary probabilities needed to calculate each entropy term.
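One way to write these quantities is sketched here as a hedged reconstruction, assuming the standard Markov chain entropy definitions, s AOIs, and normalization by the maximum entropy log s. The estimator shown for the stationary probability of AOI j (the share of all transitions that arrive in AOI j) is an assumption; it is chosen because it is consistent with the sample values reported for Table 1.

```latex
\begin{align}
n_i &= \sum_{j=1}^{s} n_{ij}, \qquad \hat{p}_{ij} = \frac{n_{ij}}{n_i} \\
\hat{\pi}_j &= \frac{\sum_{i=1}^{s} n_{ij}}{\sum_{i=1}^{s}\sum_{k=1}^{s} n_{ik}} \\
H_s &= -\frac{1}{\log s}\sum_{i=1}^{s} \hat{\pi}_i \log \hat{\pi}_i \\
H_t &= -\frac{1}{\log s}\sum_{i=1}^{s} \hat{\pi}_i \sum_{j=1}^{s} \hat{p}_{ij} \log \hat{p}_{ij}
\end{align}
```

Terms with a probability of zero are taken to contribute zero, and with this normalization both Hs and Ht fall between 0 and 1 regardless of the base of the logarithm.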
For the sample data in Table 1, Hs = 0.914 and Ht = 0.801.
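As a check on the sample values, the calculation can be reproduced with a few lines of Python instead of a spreadsheet. This is a sketch under stated assumptions: the function name `normalized_entropies` is hypothetical, entropies are normalized by the log of the number of AOIs, and the stationary probabilities are estimated from the share of transitions arriving in each AOI (column totals). That last choice matches the reported Hs; using row totals instead gives a slightly lower Hs of about 0.912.

```python
import math


def normalized_entropies(counts):
    """Return (Hs, Ht) for a square matrix of transition counts n_ij."""
    s = len(counts)
    total = sum(sum(row) for row in counts)
    # Assumption: stationary probability of AOI j = share of transitions into j.
    pi = [sum(row[j] for row in counts) / total for j in range(s)]
    # Row-normalized transition probabilities p_ij = n_ij / n_i.
    # Assumes every AOI has at least one outgoing transition.
    p = [[n / sum(row) for n in row] for row in counts]

    def plogp(x):
        return x * math.log(x) if x > 0 else 0.0  # 0 * log 0 taken as 0

    max_entropy = math.log(s)
    hs = -sum(plogp(x) for x in pi) / max_entropy
    ht = -sum(pi[i] * sum(plogp(x) for x in p[i]) for i in range(s)) / max_entropy
    return hs, ht


# Transition counts from Table 1 (rows: from A, B, C; columns: to A, B, C).
table_1 = [[53, 43, 1], [36, 132, 43], [8, 35, 36]]
hs, ht = normalized_entropies(table_1)
print(f"Hs = {hs:.3f}, Ht = {ht:.3f}")
```

Because both terms are divided by the maximum entropy, the same function can be applied to slides with different numbers of AOIs and the results compared on a common 0 to 1 scale.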
We present an example of our process for two experiments to facilitate other researchers’ reflection on and use of these methods. Tang et al. (2018) developed an R package GrpString for importing and analyzing eye-tracking data as groups of strings including options for transition entropy. Tang and Pienta (2018) applied this approach specifically to a Chemistry Education Research study. Our analyses highlight some of the necessary choices in this entropy approach to illuminate it for other researchers. We will revisit these methods in more depth in the Results and discussion section.
All statistical tests for this article were run using SPSS with data organized in Excel. We highlight some of our results in the discussion to illustrate how these tests might be used in future studies. Future work is needed to define how these values might contribute to eye-tracking work in science education research and to create more stringent criteria for evaluating them.
Question 1 was a text-based problem that asked participants to calculate the percent by mass concentration of a perchloric acid solution (Fig. 3). AOIs were placed semantically around types of information in the question, including any numbers, the phrase “perchloric acid,” the word “water,” any other nouns, and an AOI covering any remaining space on the slide. Question 4 included a data table of bond enthalpies and asked participants to arrange four compounds in order of increasing heat of vaporization (slide shown with AOI arrangement in Fig. 4). Results of the entropy analyses for these two questions are shown in Table 2.
Question and AOI placement | Student average stationary entropy (±SD) | Instructor average stationary entropy (±SD) | Student average transition entropy (±SD) | Instructor average transition entropy (±SD) |
---|---|---|---|---|
Question 1 with semantic AOIs | 0.8929 ± 0.0446 (n = 25) | 0.9132 ± 0.0240 (n = 7) | 0.8066 ± 0.0446 (n = 25) | 0.8374 ± 0.0435 (n = 7) |
Question 1 with grid overlay AOIs | 0.9111 ± 0.0490 (n = 25) | 0.9066 ± 0.0307 (n = 7) | 0.7748 ± 0.0634 (n = 25) | 0.8276 ± 0.0399 b (n = 7) |
Question 4 with semantic AOIs | 0.9189 ± 0.0759* (n = 10) | 0.7415 ± 0.2001* (n = 4) | 0.5369 ± 0.1026 (n = 10) | 0.4913 ± 0.1335 (n = 4) |
Question 4 with grid overlay AOIs | 0.9520 ± 0.0245 (n = 10) | 0.8897 ± 0.0513 a (n = 4) | 0.7489 ± 0.0627 (n = 10) | 0.7606 ± 0.0494 (n = 4) |

Significance values for these comparisons are (a) 0.008 and (b) 0.047. *Hs for question 4 with semantic AOIs failed Levene's test (F(1,12) = 5.023, p = 0.045) with Welch statistic F(1,3.352) = 2.972, p = 0.174, so the ANOVA result cannot be taken as significant.
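The Welch statistic reported for question 4 with semantic AOIs can be recovered from the summary values alone, which is a useful sanity check when group variances differ. The sketch below assumes the two-group case, where Welch's F is the square of Welch's t; the function name `welch_from_summary` is hypothetical, and small discrepancies arise because the table reports rounded means and standard deviations.

```python
def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's F (= t squared) and denominator df for two groups."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    f_stat = (mean1 - mean2) ** 2 / (v1 + v2)
    # Welch-Satterthwaite approximation for the denominator df.
    df2 = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return f_stat, df2


# Hs for question 4 with semantic AOIs: students vs. instructors (Table 2).
f_stat, df2 = welch_from_summary(0.9189, 0.0759, 10, 0.7415, 0.2001, 4)
print(f"F(1, {df2:.3f}) = {f_stat:.3f}")
```

The instructor group's much larger standard deviation dominates the denominator, which is why the corrected degrees of freedom drop to roughly 3 and the apparent difference in means is not significant.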
For question 1, instructors had a significantly higher transition entropy than students across grid overlay AOIs. This result suggests that instructors generally viewed the slide more randomly or in a less predictable order than students. One explanation might be that students follow a more rote process for solving standard problems such as percent by mass calculations. Their scanning through the slide might appear more ordered because they carefully read line by line through a problem while solving it. On the other hand, instructors are more likely to adapt their problem-solving strategy, easily identifying the type of problem and then finding the information needed to solve it. Because the averages differed for grid overlay AOIs but not for semantic AOIs, the data suggest that instructors viewed the slide in a more spatially random way rather than viewing the types of information differently.
Interestingly, the stationary entropy did not differ significantly for either placement of AOIs on question 1. Similar to comparing fixation counts, this result suggests that both groups placed the same relative visual weight on each type of information or area of the slide. Fig. 5 shows the composite heat maps which visualize relative fixation counts on a slide. The numbers in the problem seemed to receive the most attention from each group, reflected by the density of warm colors over those values, and the high average values of Hs. Stationary entropy allows for statistical comparisons between groups for those data. In the case of question 1, the visual differences in the heat maps are not accompanied by statistical differences in stationary entropy values between students and instructors.
Fig. 5 Composite heat maps for question 1 of experiment 1 for students (left image) and instructors (right image).
For question 4, the stationary entropy terms differed significantly between instructors and students for grid overlay AOIs. Students had significantly higher stationary entropy values, suggesting that their attention was more equally distributed across the slide. Based on the verbal responses to this question, we know that most instructors ignored the bond enthalpy data, which is unnecessary to order compounds by their heats of vaporization, whereas all the students used this information to answer the question. Fig. 6 shows the composite heat maps for this question where students had the highest fixation density in the bond enthalpy table while instructors had very little density in the same location. The stationary entropy term highlights the statistical difference reflected by these strategies and visualized in the heat maps.
Fig. 6 Composite heat maps for question 4 of experiment 1 for students (left image) and instructors (right image).
The transition entropy terms were not significantly different for either placement of AOIs on question 4, which implies that the scanpaths of instructors and students were equally structured, even though Hs comparisons revealed differences in the overall focus of visual attention. Additionally, Ht means for semantic AOIs were relatively lower in value (0.5369 and 0.4913), suggesting more structured, less random scanpaths throughout question 4 as compared to question 1.
Several insights became clear through our work with these data. First is that intentionally and equally placed AOIs can reveal different types of information. For problems with more visual information, like question 4, semantically placed AOIs can reveal differences in the use of different types of data. Especially for students, including distracting information can clearly distinguish levels of problem-solving ability through eye tracking. For problems that are more text-based or for which clear AOIs are difficult to identify, like question 1, equally placed AOIs are a useful option that still allow for statistical comparisons of entropy terms. Another insight that answers our second research question is that the stationary entropy term can be used to complement visualization methods like heat maps or transition diagrams with statistical comparisons (Stieff et al., 2011). The differences suggested visually by heat maps can be statistically confirmed or nuanced by entropy analysis.
Variable | Type III sum of squares | df | Mean square | F | Sig. | Partial eta squared |
---|---|---|---|---|---|---|
Question Ht (within subject)a | 0.539 | 1.948, 17.532 | 0.277 | 18.943 | <0.001* | 0.678 |
Time Ht (within subject) | 0.001 | 1, 9 | 0.001 | 0.042 | 0.843 | 0.005 |
Major Ht (between subjects) | 0.069 | 1, 9 | 0.069 | 3.106 | 0.112 | 0.257 |
Treatment Group Ht (between subjects) | 0.016 | 2, 9 | 0.008 | 0.364 | 0.705 | 0.075 |
Question Hs (within subject)a | 0.229 | 1.874, 16.865 | 0.122 | 8.201 | 0.004* | 0.477 |
Time Hs (within subject) | 0.003 | 1, 9 | 0.003 | 0.613 | 0.454 | 0.064 |
Major Hs (between subjects) | 0.000 | 1, 9 | 0.000 | 0.084 | 0.779 | 0.009 |
Treatment Group Hs (between subjects) | 0.027 | 2, 9 | 0.014 | 2.871 | 0.109 | 0.390 |

a Mauchly's test of sphericity was violated, so Greenhouse–Geisser corrected values were used instead.
Question had a significant main effect for both transition entropy (F(1.948,17.532) = 18.943, p < 0.001) and stationary entropy (F(1.874,16.865) = 8.201, p = 0.004). Both effect sizes were quite large (partial eta squared > 0.14; see Richardson, 2011). To better visualize these data, mean values are displayed for Ht (Fig. 7) and Hs (Fig. 8) across each question with averages for majors and non-majors displayed as dotted lines.
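The partial eta squared values can be cross-checked from each F statistic and its degrees of freedom using the identity partial eta squared = (F × df_effect) / (F × df_effect + df_error), which follows from the definition SS_effect / (SS_effect + SS_error). A short Python sketch follows; the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom."""
    return f_stat * df_effect / (f_stat * df_effect + df_error)


# Main effect of question on Ht and Hs (Greenhouse-Geisser corrected df).
eta_ht = partial_eta_squared(18.943, 1.948, 17.532)
eta_hs = partial_eta_squared(8.201, 1.874, 16.865)
print(f"{eta_ht:.3f}, {eta_hs:.3f}")
```

Both values are well above the 0.14 benchmark for a large effect.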
Fig. 7 Graph of transition entropy means for questions 1–5 of experiment 2 with standard error shown with brackets and averages for majors and non-majors shown with lines.
Fig. 8 Graph of stationary entropy means for questions 1–5 of experiment 2 with standard error shown with brackets and averages for majors and non-majors shown with lines.
No other variables had a significant main effect. Additionally, there were no significant interaction effects for transition entropy. The significant interaction effects for stationary entropy were question × major (F(4,36) = 3.603, p = 0.014) and major × group (F(2,9) = 4.691, p = 0.040).
Our experience with data from experiment 2 solidified several lessons about working with entropy terms and statistical comparisons. The first lesson was the question-dependence of transition and stationary entropy. The variety of types of statistical comparisons that can be used helps to emphasize the value of these entropy terms as simple values that capture and help tell the story of complex data. The second lesson was the surprise that sometimes groups that were expected to be more visually efficient problem-solvers, like chemistry majors, could have higher transition entropy values on certain questions although no overall statistical differences were found for major. This counter-intuitive result also occurred in our first experiment, where instructors had higher average transition entropy than students, and emphasized the need to understand more about participants’ problem-solving strategies to avoid simplistic understandings of the entropy terms. The third lesson was that entropy terms can help bridge the gap between small and large groups of participants in eye-tracking studies. Because of the limitation of methodologies for visualizing eye-tracking data, like heat maps or transition diagrams, many studies use very small participant groups. Simpler methods like fixation duration or counts allow for larger group comparisons but sacrifice some complexities of eye-tracking data. Entropy terms offer another approach that captures details about the distribution and randomness of participants’ gaze patterns and is appropriate for both small and large groups.
Question | Type III sum of squares | df | Mean square | F | Sig. | Partial eta squared |
---|---|---|---|---|---|---|
1a | 64405.053 | 1.467, 13.205 | 43895.642 | 12.635 | 0.002* | 0.584 |
2a | 854171.655 | 1.298, 12.979 | 658098.486 | 53.080 | <0.001* | 0.841 |
3a | 229547.155 | 2.114, 21.141 | 108576.716 | 44.197 | <0.001* | 0.815 |
4 | 217683.396 | 4, 36 | 54420.849 | 34.709 | <0.001* | 0.794 |
5 | 51519.516 | 5, 45 | 10303.903 | 5.494 | <0.001* | 0.379 |

a For measures that failed Mauchly's test (questions 1–3), Greenhouse–Geisser corrected values are displayed.
The MANOVAs for each question revealed a significant and large effect for AOI on fixation count. For questions 1–4, one AOI had significantly more fixations compared with at least one other AOI. The areas with the most fixations included important visual data like tables on which the question was based and were often located in the center of the slide. This result is unsurprising from a visual and problem-solving perspective but helps to confirm the visual and question-dependence of fixations. No other significant main or interaction effects were found.
The entropy terms carry several advantages over more traditional methods of analyzing or visualizing eye-tracking data. First, Ht correlates with the sequence of transitions rather than only the total number of fixations or gaze density. Quantifying the entropy of these sequences can tell us how participants compare AOIs to each other rather than only which AOI receives the most attention, as visualized in heat maps or compared through fixation counts. Second, the entropy measures assign a value to summarize entire sequences of eye-tracking data, rather than needing complicated diagrams or multiple measures to compare. The simplicity of these terms makes studies with larger groups of participants more feasible. Hs conveys information similar to fixation counts because it is based on the cumulative transitions between AOIs. However, summarizing this information in a single, normalized value like Hs can allow for clearer comparisons between different stimuli or experiments.
String analysis requires identifying windows or phases of analysis for scanpaths, which allows for deep comparisons of shorter time frames or stages of problem-solving. Strings are often represented as series of letters to identify the order of AOI fixations, either including repeated letters for transitions within an AOI or only including unique transitions between AOIs. The typical practice is to collapse strings, which removes the relative weight of fixations within AOIs. These three articles used semantically placed AOIs and transition entropy, although grid overlay AOIs and stationary entropy offer another set of data to consider. Using an R package like GrpString helps to automate the analysis, especially for researchers who are less familiar with the methods. Our exploration of entropy analysis in this article is provided to facilitate deeper reflection and discussion among researchers about these methods.
We propose several considerations for other researchers. First is to normalize entropy terms as we did to allow for easier comparisons between experiments and stimuli. Second is to consider the value of using collapsed sequences or transitions within AOIs. Third is to clarify if an analysis uses fixations within blank space on a stimulus, especially with semantic AOIs. Future work with problem-solving should include questions designed to separate groups more clearly, such as including visual elements, potentially distracting information, and uncluttered visual arrangements. Statistical analyses would also benefit from comparing similar types of questions and AOIs as a repeated measure of entropy terms.
Our discussion helped to unpack some of the nuance of interpreting entropy terms, such as the surprise of higher transition entropy for more advanced groups. To interpret entropy well, researchers must understand more about how participants are solving the problems, through previous research or other data collection. Finally, the visual cues in these experiments must be designed in a way that significant visual differences are expected. Examples from our studies were the use of visual data like diagrams or data tables. Purely text-based problems were not as appropriate for this analysis.
Overall, transition and stationary entropy simplify large amounts of data while preserving significant differences in the distribution of visual attention. These terms allow for statistical comparisons between similar or disparate groups at a single point or over time. Entropy terms are powerful tools to explain how participants use visual data in complex ways, distributing attention and developing transition sequences to solve problems. There is great potential to develop the use of these tools for different applications like videos, websites, and visualizations where assigning AOIs can be consistent.
This journal is © The Royal Society of Chemistry 2022