CREOLab: A Procedure Captioning Dataset for Understanding Creative Tool Use in Object-Rich Laboratory Videos

Abstract

“Creative tool use” refers to the flexible application of tools beyond their intended purpose. In scientific experiments, this behavior is described as a “lab hack,” and its automatic documentation is valuable for accumulating experimental knowledge. Recently, vision-language models (VLMs) have shown promise for generating procedural descriptions from experimental videos. However, VLMs typically rely more on object-based knowledge than on understanding the manipulations. This issue is often overlooked in existing laboratory video datasets as tools are typically used in standard, prescribed ways. Thus, the extent to which these models can interpret and describe actions that extend beyond object-based knowledge, such as creative tool use, remains uncertain. Moreover, laboratory environments often contain numerous items unrelated to the operation (i.e., decoy objects), which can divert the model’s attention and further complicate the accurate identification of creative manipulations. To address this limitation, we developed an evaluation dataset called “CREOLab” (CREative tool use in Object-rich Laboratories), consisting of 65 videos from 13 experimental scenarios featuring creative tool use, each recorded across five levels of decoy object density. Using a state-of-the-art, cloud-based VLM captioning system, we evaluated model performance. As the number of decoy objects increased, the model tended to insert redundant procedural steps or omit essential ones. As a result, it failed to document scenarios involving creative tool use accurately. These findings suggest that enhancing the reliability of automatic experimental recording with VLMs requires mechanisms for automated verification of generated outputs, as well as recording protocols that reduce the influence of decoy objects.
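The dataset layout and the two error types named in the abstract (redundant inserted steps and omitted essential steps) can be illustrated with a minimal sketch. It assumes that a reference procedure and a generated caption are each available as an ordered list of step strings and that steps are compared by exact match; the abstract does not state the authors' actual scoring method, so the function names, identifiers, and example steps below are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

N_SCENARIOS = 13     # stated in the abstract
N_DECOY_LEVELS = 5   # stated in the abstract


def video_grid():
    """Enumerate hypothetical (scenario, decoy_level) identifiers, one per video."""
    return list(product(range(1, N_SCENARIOS + 1), range(1, N_DECOY_LEVELS + 1)))


def step_errors(reference, generated):
    """Return (inserted, omitted) step counts from an exact-match alignment.

    inserted: generated steps with no counterpart in the reference.
    omitted:  reference steps with no counterpart in the generated caption.
    This is an illustrative assumption, not the metric used in the paper.
    """
    inserted = omitted = 0
    matcher = SequenceMatcher(a=reference, b=generated, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            omitted += i2 - i1
        if tag in ("insert", "replace"):
            inserted += j2 - j1
    return inserted, omitted


# Hypothetical example: the creative step is replaced by an unrelated one.
reference = [
    "pick up the pipette tip box",
    "use the box lid as a tube rack",
    "place the tubes in the lid",
]
generated = [
    "pick up the pipette tip box",
    "open the notebook",
    "place the tubes in the lid",
]

print(len(video_grid()))
print(step_errors(reference, generated))
```

Exact matching is deliberately strict: a paraphrased but correct step counts as one insertion plus one omission, so a practical comparison would likely need a semantic matching step before alignment.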

Article information

Article type: Paper
Submitted: 06 Dec 2025
Accepted: 08 May 2026
First published: 12 May 2026
This article is Open Access
Creative Commons BY-NC license

Digital Discovery, 2026, Accepted Manuscript


S. Goto and T. Hasebe, Digital Discovery, 2026, Accepted Manuscript, DOI: 10.1039/D5DD00542F

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
