Open Access Article
Uriel Garcilazo-Cruz,*ab Joseph O. Okeme ab and Rodrigo A. Vargas-Hernández *abc
aDepartment of Chemistry and Chemical Biology, McMaster University, Hamilton, ON, Canada. E-mail: garcilau@mcmaster.ca
bSchool of Computational Science and Engineering, McMaster University, Hamilton, ON, Canada. E-mail: vargashr@mcmaster.ca
cBrockhouse Institute for Materials Research, McMaster University, Hamilton, ON, Canada
First published on 13th January 2026
The lack of flexible annotation tools has hindered the deployment of AI models in some scientific areas. Most existing image annotation software requires users to upload a precollected dataset, which limits support for on-demand pipelines and introduces unnecessary steps to acquire images. This constraint is particularly problematic in laboratory environments, where on-site data acquisition from instruments such as microscopes is increasingly common. In this work, we introduce LivePyxel, a Python-based graphical user interface that integrates with imaging systems such as webcams and microscopes to enable on-site image annotation. LivePyxel is designed to be easy to use, with a simple interface that lets users precisely delimit areas for annotation using tools commonly found in commercial graphics-editing software. Of particular interest are its Bézier splines and binary masks, and its capacity to work with non-destructive layers that enable high-performance editing. LivePyxel also offers wide compatibility across video devices and is optimized for object detection operations through OpenCV in combination with NumPy for efficient matrix and linear-algebra operations. LivePyxel facilitates seamless data collection and labeling, accelerating the development of AI models in experimental workflows. LivePyxel is freely available at https://github.com/UGarCil/LivePyxel.
In the fields of object detection and image segmentation, accessible annotation tools must be compatible with a wide range of imaging devices. We argue that many scientific workflows, particularly those involving microscopes or specimen curation, would benefit greatly from the ability to capture and annotate images in real time. For instance, navigating a microscope slide or processing large biological collections often involves domain experts who serve simultaneously as annotators. Separating the capture and annotation steps can interrupt this workflow and reduce efficiency. In these contexts, the ability to annotate during image acquisition improves both efficiency and accuracy.
An image annotation tool focused on usability should also prioritize simple graphical user interface (GUI) components with a shallow learning curve, minimizing the need for technical support or costly training for specialized personnel. Although existing tools like LabelMe,19 VGG image annotator (VIA),20 and COCO Annotator21 offer pixel-level annotations for segmentation tasks, they lack live camera integration, a critical gap for workflows requiring immediate feedback or iterative labeling. These tools, though flexible in formatting (e.g., COCO JSON, Pascal VOC), often require users to navigate feature-heavy interfaces or offline workflows, posing a barrier for non-specialist users such as biologists and field technicians. Moreover, most annotation software supports pixel-level annotation only through polygons or rectangular boxes. While effective for rigid geometries, these primitives are poorly suited for organic or curved structures. Approximating a smooth contour with polygons requires a high number of vertices, introducing annotation inefficiency and potential geometric bias that can propagate into downstream vision models. By contrast, graphic design software routinely uses Bézier splines to capture curves with minimal control points and high precision. This approach offers a more natural representation for biological or irregular shapes, making splines a compelling alternative to conventional polygon-based labeling. Commercial platforms such as Labelbox and Supervisely emphasize collaboration features but omit both live annotation and spline support. RectLabel (macOS-only) supports Bézier curves but does not allow on-site input. Web-based tools like CVAT22 and Label Studio23 provide scalability yet remain restricted to pre-recorded media. Together, these limitations highlight the absence of an integrated solution that combines live annotation, spline-based precision, and lightweight usability.
We present LivePyxel, an open-source Python GUI developed in this work that integrates on-site video device input with Bézier spline-based segmentation, enabling precise pixel-level annotation of curved structures. LivePyxel was initially developed for on-demand annotation of microscopy images, but it also supports any video device accessible via OpenCV and Python. LivePyxel combines a lightweight, accessible interface with flexible annotation tools, including Bézier splines, polygons, and threshold-based masks, making it suitable for segmentation workflows across diverse research domains. The software is freely available at https://github.com/UGarCil/LivePyxel and can be installed through PyPI; installation details, tutorials, examples, and additional code are found in the latest version of the repository.
The paper is structured as follows: Section 2 describes the specification and deployment details of LivePyxel, developed in this work. Section 3 demonstrates its application through an image segmentation task, where annotated images are used to train a U-Net24 in two scenarios: (1) segmenting eight different microorganisms and (2) performing data engineering with binary masks. LivePyxel is designed to streamline data collection and labeling, accelerating the development of AI models in experimental workflows that require on-site data manipulation or large-scale batch processing.
The broad compatibility of LivePyxel with imaging devices is achieved through the use of the OpenCV25 library, which enables on-site video input from virtually any camera and streams it directly to the annotation canvas. We chose OpenCV for its efficient handling of images as numerical matrices, leveraging a compiled back-end for performance while maintaining flexibility through seamless integration with NumPy26 for fast manipulation in Python. The GUI itself is built using the Qt framework, which allows NumPy arrays to be rendered directly as images, resulting in a responsive and user-friendly interface. Fig. 1 is an overview of the LivePyxel architecture.
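The hand-off between OpenCV and Qt hinges on one detail: OpenCV delivers frames as BGR NumPy arrays, while Qt renders RGB. The snippet below is an illustrative sketch of that conversion, with a synthetic array standing in for a `cv2.VideoCapture(...).read()` frame; the function name `frame_to_rgb` is hypothetical, not part of the LivePyxel API.

```python
import numpy as np

def frame_to_rgb(frame_bgr):
    """Convert an OpenCV-style BGR frame into a C-contiguous RGB array,
    the memory layout a QImage (Format_RGB888) expects."""
    return np.ascontiguousarray(frame_bgr[:, :, ::-1])

# Synthetic 480x720 "frame" standing in for cv2.VideoCapture(...).read()
frame = np.zeros((480, 720, 3), dtype=np.uint8)
frame[..., 0] = 255                      # pure blue in BGR ordering
rgb = frame_to_rgb(frame)
# In Qt: QImage(rgb.data, 720, 480, 3 * 720, QImage.Format_RGB888)
```

Because the channel reversal is a NumPy view made contiguous only once, per-frame overhead stays minimal, which matters for a responsive live canvas.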
The LivePyxel GUI integrates several interactive components to streamline the annotation workflow. Mask display properties, such as opacity and binary threshold, can be adjusted using the sliders in the control section (Fig. 2A). Annotation categories are managed in the labels panel (Fig. 2B), where users can add, edit, or delete classes. The annotation panel (Fig. 2C) provides high-level controls for switching input sources, capturing frames, and toggling annotation mode. Drawing and editing actions are performed using the toolbar (Fig. 2D), which offers tools for creating, modifying, or erasing masks. The canvas (Fig. 2E) displays either a live video feed or uploaded images, over which masks are layered non-destructively in real time. Navigation buttons (Fig. 2F) allow users to cycle through frames or dataset entries. Once annotations are complete, LivePyxel saves both the annotated images and corresponding masks into automatically generated subfolders within a user-specified directory. In addition to on-site webcam annotation, LivePyxel also supports traditional workflows by allowing users to upload pre-existing image datasets that follow the required folder structure.
Each Bézier unit is defined by three control points: two end points and a central handle that determines curvature; see Fig. 3B. By connecting multiple units, users can construct complex outlines with high fidelity to natural forms. This modular structure offers fine control over both curvature and sharp transitions. The advantage of splines for delimiting contours is illustrated by SplineDist,28 which extends the popular StarDist framework by modeling objects as planar parametric spline curves, allowing more flexible and smooth segmentation boundaries and solving issues with non-convex geometries. Moreover, the use of splines in the preparation of annotated data for segmentation tasks has been documented in clinical applications.29,30 In biomedical imaging, splines enable users to trace smooth cell membranes with highly organic contours and tightly coiled or angular biological features, something that would require excessive effort and precision with polygon tools. The ability to produce accurate, pixel-level masks with fewer interactions not only improves annotation speed but also reduces user fatigue and annotation bias. As a result, Bézier splines in LivePyxel improve both the efficiency and the quality of segmentation in tasks where precision is critical, such as microscopy, radiography, and digital morphology.
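The geometry of a single unit follows directly from its three control points: with end points p0 and p2 and a central handle p1, the quadratic Bézier formula B(t) = (1 − t)²p0 + 2(1 − t)t p1 + t²p2 traces the curve for t in [0, 1]. The snippet below is an illustrative sketch, not LivePyxel's internal implementation:

```python
import numpy as np

def quad_bezier(p0, p1, p2, n=51):
    """Sample a quadratic Bezier unit defined by two end points (p0, p2)
    and one central handle (p1) that controls curvature."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

p0, p1, p2 = np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([2.0, 0.0])
curve = quad_bezier(p0, p1, p2)   # (51, 2) array of x, y samples
# Endpoints are interpolated exactly; the handle pulls the midpoint up to y = 1.
```

Chaining such units end-to-end, as described above, yields smooth closed outlines with far fewer control points than an equivalent polygon.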
Binary masks significantly reduce annotation time by providing an initial segmentation that can be manually refined, in contrast to beginning an annotation from scratch. This is particularly useful when many structures must be annotated in a controlled environment with a static background (see Section 3.2 for an example using snail shells); in such settings, LivePyxel can dramatically accelerate the collection of image/mask pair data. The generated dataset can be further augmented or 'engineered' into a much larger dataset (Fig. 7).
Finally, binary masks can also be used in combination with vector-based tools such as Bézier splines. Once a region is roughly defined using thresholding, the user can trace or refine its boundaries using spline functions to achieve pixel-level precision. This hybrid workflow enhances segmentation quality while minimizing manual effort, offering a powerful solution for datasets where both speed and anatomical accuracy are crucial.
LivePyxel employs a stack-based compositing system, in which each annotation exists on an independent layer logically stacked from bottom to top. This layered architecture facilitates non-destructive editing: individual segments or anatomical regions can be added, modified, or removed without affecting neighboring annotations; see Fig. 4. Each layer retains its own shape, color, and mask information, granting users fine-grained control over the annotation workflow. When the Annotate button is pressed, these layers are rasterized and composited in order, from bottom to top, into a single merged mask, encoding the labels in the form of RGB colors and overwriting every pixel with the top color (ignoring black). This final image can be saved or exported in formats compatible with most deep learning frameworks.
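The bottom-to-top compositing rule described above can be sketched in a few lines of NumPy (an illustrative rendition, not the actual LivePyxel code): each layer's non-black pixels overwrite everything beneath them.

```python
import numpy as np

def composite(layers):
    """Rasterize a bottom-to-top stack of RGB mask layers into one merged
    mask: each higher layer overwrites lower ones wherever it is non-black."""
    out = np.zeros_like(layers[0])
    for layer in layers:                        # bottom to top
        drawn = np.any(layer != 0, axis=-1)     # black pixels are ignored
        out[drawn] = layer[drawn]
    return out

red = np.zeros((4, 4, 3), np.uint8)
red[:, :2] = (255, 0, 0)                 # layer 1 covers columns 0-1
green = np.zeros((4, 4, 3), np.uint8)
green[:, 1:3] = (0, 255, 0)              # layer 2 covers columns 1-2
merged = composite([red, green])         # green wins where layers overlap
```

Because each layer stays intact until the final rasterization, deleting or recoloring one annotation never corrupts its neighbors.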
Importantly, this integrated pipeline also facilitates instance segmentation tasks, where models must not only classify each pixel but also associate it with a specific object instance. By storing each layer separately, LivePyxel can export instance-specific masks, with each layer representing a distinct object. This capability enables seamless integration with training workflows for models such as Mask R-CNN and SAM.
We trained a U-Net model24 for image segmentation across eight categories, initializing it with VGG-19 pretrained weights.39 Each image in the dataset had an original resolution of 720 × 480 pixels. Due to the dataset's limited size, certain categories were underrepresented (see Fig. S2 in the SI). To mitigate this imbalance, we applied categorical weighting to the cross-entropy loss function and employed data augmentation techniques. Additional architectural and training details are provided in the SI. Training was carried out over 131 epochs using two NVIDIA V100 16 GB GPUs, with a total training time of approximately 8 hours. For evaluation, we reserved an independent test set of 200 images that were excluded from both training and validation.
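As an illustration of the class-weighting strategy mentioned above, the snippet below implements a weighted categorical cross-entropy in plain NumPy; in practice a framework's built-in loss (e.g., the `weight` argument of PyTorch's `CrossEntropyLoss`) would be used, and the weights shown here are hypothetical.

```python
import numpy as np

def weighted_ce(probs, labels, class_weights):
    """Mean categorical cross-entropy with per-class weights: rare classes
    receive larger weights so their errors cost more in the loss.
    probs: (N, C) softmax outputs; labels: (N,) integer class indices."""
    picked = probs[np.arange(len(labels)), labels]   # prob of the true class
    return float(np.mean(-class_weights[labels] * np.log(picked + 1e-9)))

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
loss_uniform = weighted_ce(probs, labels, np.ones(2))
loss_rare = weighted_ce(probs, labels, np.array([1.0, 5.0]))  # upweight class 1
```

Upweighting the rare class makes its mistakes dominate the gradient, which is the intended counterweight to the imbalance shown in Fig. S2.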
Fig. 5 presents the final F1 scores of the trained U-Net model, which performed well in identifying the most abundant classes despite the limited dataset size. These scores reflect consistent accuracy for dominant taxa, but also highlight the challenges of detecting rare or visually ambiguous classes such as tardigrades, diatoms, and rotifers, an issue aligned with the class imbalance of the dataset, where these taxa collectively account for only about 10% of samples (see Fig. S2 in the SI). As shown in Fig. 6, the model also struggled to detect specimens that exhibit transparent bodies, highly variable morphologies, and poor representation in training data, such as Vorticella sp., resulting in incorrect contour recognition.
The environmental microorganisms dataset served as an example of LivePyxel's capacity to quickly produce quality annotations of a rapidly changing community of species. The U-Net attained reliable F1 scores for the most abundant classes, but performance degraded for rare, transparent, or morphologically variable organisms (e.g., Vorticella sp., diatoms, rotifers), reflecting the dataset's class imbalance. These outcomes underscore two complementary needs for ecological vision models deployed in dynamic microbial communities: (i) capturing morphological variability in the training data; and (ii) training strategies that are robust to scarcity and ambiguous boundaries, including categorical weighting in the cross-entropy loss, data augmentation, and data engineering techniques that boost the network's ability to recognize rare classes.
In certain scenarios, a dataset can be constructed under highly controlled lighting and background conditions to semi-automate the annotation process. For example, a uniform background (e.g., a solid white or green backdrop) can be combined with simple threshold-based image operations to obtain binary masks for the foreground objects. In a controlled setup where the background is a known uniform color (such as white), images can be converted to grayscale (or another color space) and a threshold applied that classifies each pixel as background (0) or object (1) based on its intensity or hue.41 This effectively binarizes the image, separating the objects from the backdrop. For example, researchers have captured organic shapes on a plain white background under consistent lighting, which reduces the complexity of background removal and allows the object to be isolated by simple intensity thresholding.42 In fields like medical imaging, placing surgical instruments in front of a green screen has been used to automatically extract tool masks: the monochromatic green background makes it easy to isolate the instrument by hue thresholding.41 Under such controlled conditions, the threshold can be tuned so that any pixel brighter (or darker) than a set value is labeled as background, while the rest are labeled as foreground (object). This semi-automated mask generation dramatically speeds up dataset creation, since the bulk of the annotation is handled by an algorithm, with minimal human correction needed for errors.
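The thresholding rule described above reduces to a single comparison per pixel. A minimal NumPy sketch follows (the function name and threshold value are illustrative; real setups would tune the threshold to the lighting):

```python
import numpy as np

def threshold_mask(gray, thresh=200):
    """Binarize a grayscale image shot against a bright uniform backdrop:
    pixels brighter than `thresh` become background (0), the rest
    foreground (1)."""
    return (gray < thresh).astype(np.uint8)

gray = np.full((4, 4), 255, np.uint8)   # white backdrop
gray[1:3, 1:3] = 60                     # a dark object in the centre
mask = threshold_mask(gray)             # 1 only where the object sits
```

For a green-screen setup the same one-liner would compare hue instead of intensity after converting the frame to HSV.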
We used LivePyxel to test its capabilities in the automation and generation of image/mask pairs in a highly controlled environment using an assortment of snail shells purchased at a dollar store in Toronto, Canada. The dataset is composed of 4 different classes, believed to correspond to different species of mollusks (see Fig. S3 in the SI).
The original training data set was collected by placing a large number of shells from the same category at the same time, using the binary mask tool to discriminate the background from the shells, and assigning a color to the final annotation (Fig. 7A). This resulted in 1400 individual annotations, each containing only a single class per image. To prepare these data, we used each binary mask to isolate the pixels of the object class within the original image. This yielded a trimmed version of the object with transparency, where all background pixels were removed and only the object pixels remained (Fig. 7B). These trimmed images were then stored along with their masks.
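The trimming step can be sketched as a mask-driven RGBA conversion (an illustrative rendition, not LivePyxel's internal code): the binary mask becomes the alpha channel, so background pixels end up fully transparent while object pixels keep their color.

```python
import numpy as np

def cutout_rgba(image, mask):
    """Trim an object from `image` with its binary mask: the mask becomes
    the alpha channel, making background pixels fully transparent."""
    rgba = np.dstack([image, (mask * 255).astype(np.uint8)])
    rgba[mask == 0, :3] = 0             # also zero the hidden colour values
    return rgba

img = np.full((3, 3, 3), 120, np.uint8)
mask = np.zeros((3, 3), np.uint8)
mask[1, 1] = 1                          # a single object pixel
rgba = cutout_rgba(img, mask)           # shape (3, 3, 4)
```

Storing the cutout together with its mask keeps the pair ready for the compositing stage described next.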
We implemented a randomized compositing strategy. For each of the 10 000 engineered samples, a base background image was selected, and the class folders were shuffled to introduce variation. A random number of classes were sampled, and for each selected class, a transparent image was randomly chosen and placed on the background (Fig. 7C). The same transformation, such as flipping or rotation, was applied to both the image cutout and its corresponding mask. This ensured that pixel-level alignment was preserved within an image/mask pair. Multiple instances from different classes were overlaid in succession, resulting in composite scenes that mimic realistic configurations with precise pixel-accurate masks for each object (Fig. 7D).
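A minimal sketch of this compositing strategy follows (all names, shapes, and the tiny arrays are illustrative): the same random transform is applied to a cutout and its mask, and pasting writes both the pixels and the class label at the same location.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_pair(cutout, mask):
    """Apply the SAME random flip and rotation to a cutout and its mask,
    preserving pixel-level alignment between the pair."""
    if rng.random() < 0.5:
        cutout, mask = cutout[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))
    return np.rot90(cutout, k), np.rot90(mask, k)

def paste(scene, scene_mask, cutout, mask, y, x, label):
    """Overlay a cutout on the scene where its mask is set, and write the
    class label into the composite mask at the same pixels."""
    h, w = mask.shape
    scene[y:y + h, x:x + w][mask > 0] = cutout[mask > 0]
    scene_mask[y:y + h, x:x + w][mask > 0] = label

scene = np.zeros((6, 6, 3), np.uint8)        # blank background
scene_mask = np.zeros((6, 6), np.uint8)      # its composite label mask
cut = np.full((2, 2, 3), 200, np.uint8)      # a "shell" cutout
msk = np.array([[1, 1], [1, 0]], np.uint8)   # its binary mask
cut2, msk2 = augment_pair(cut, msk)          # identical transform for both
paste(scene, scene_mask, cut, msk, 2, 2, label=3)
```

Repeating `paste` for several randomly sampled classes per background yields the composite scenes of Fig. 7D, with the label mask guaranteed to align pixel-for-pixel with the image.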
This pipeline produced a total of 10 000 synthetic image/mask pairs, significantly enriching the dataset and introducing diverse combinations of object instances, orientations, and overlaps. These engineered samples were subsequently used to train the same U-Net segmentation model used in Section 3.1, keeping the image size at 512. The model was trained for 24 hours on 4 NVIDIA A100 GPUs. In contrast with the water-tank dataset (Section 3.1), the F1 scores for the snail-shells dataset were consistently high across all classes, with minimal variation between training and validation sets (Fig. 8). The model also exhibited a rapid decline in the loss function, accompanied by a steep rise in F1 scores within the first three epochs (Fig. S5 and S6 in the SI). Fig. 9 showcases the masks predicted by the trained U-Net for some of the images in the dataset.
Fig. 8 F1 scores achieved by a U-Net model with VGG-19 backbone during training and validation. The plot highlights performance across 5 different classes.
The snail-shells study demonstrates LivePyxel's capacity to automate the creation of masks under highly controlled setups, yielding fast, stable optimization and uniformly high F1 scores across classes. Compared to the environmental microscopy setting, the engineered dataset reduces annotation costs in time and user effort. The U-Net architecture, initialized with VGG-19 and trained on 10 000 composite image/mask pairs, converged within a few epochs and exhibited a minimal train–validation gap. These results underscore that, with accurate binary masks and controlled backgrounds, segmenting small, round objects becomes straightforward under laboratory conditions with the use of LivePyxel.
Our results show that the accuracy achieved with LivePyxel is comparable to, or slightly higher than, other annotation software packages, while maintaining similar annotation times; see Section 5 in the SI. As highlighted in Section 3.2, a key advantage of LivePyxel is its capability to perform Boolean operations directly within the mask. This feature, uncommon among existing tools, enables rapid and flexible labeling of complex regions, including those with internal holes, like the central object in Fig. 10.
In terms of performance, LivePyxel exhibited a balanced trade-off between false positives (5.7%) and false negatives (0.5%), comparable to other tools (Fig. 10). CVAT achieved the lowest overall error (2.5% false positives, 1.3% false negatives) through AI-assisted segmentation, but at the cost of a more complex setup, internet dependency, and non-local data handling. VIA offered the simplest installation (a standalone HTML file), whereas LabelMe required manual dependency management on some systems. COCO Annotator had the most challenging setup, involving Docker and SQL-based database configuration. Among these, only LivePyxel integrates Bézier-spline support, providing smoother boundary representation than polygon-based tools such as VIA, LabelMe, and COCO Annotator.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d5dd00421g.
This journal is © The Royal Society of Chemistry 2026