Timur Alievᵃ, Ilya Korolevᵃ, Mikhail Yasnovᵃ, Michael Nosonovsky*ᵇ and Ekaterina V. Skorbᵃ
ᵃ Infochemistry Scientific Center, ITMO University, 9 Lomonosov St., St. Petersburg, 191002, Russia
ᵇ Department of Mechanical Engineering, University of Wisconsin-Milwaukee, N Cramer St. 3200, Milwaukee, WI 53211, USA. E-mail: nosonovs@uwm.edu
First published on 17th February 2025
This study presents a machine learning (ML)/Artificial Intelligence (AI) approach to classify types of sparkling wines (champagnes) and their respective containers using image data of bubble patterns. Sparkling wines are supersaturated with dissolved CO2, which results in extensive bubbling when the wine bottle is uncorked. The nucleation and properties of bubbles depend on the chemical composition of the wine, the properties of the glass, and the concentration of CO2. For carbonated liquids supersaturated with CO2, the interaction of natural and cavitation bubbles is a non-trivial matter. We study ultrasonic cavitation bubbles in two types of sparkling wines and two types of glasses with computer vision (CV) analysis of video images and clustering using an artificial neural network (NN) approach. By integrating a segmentation NN to filter out irrelevant frames and applying the Contrastive Language-Image Pre-Training (CLIP) NN for feature embedding, followed by TabNet for classification, we demonstrate a novel application of ML/AI for distinguishing champagne characteristics. The results show that the bubbles are sufficiently different to be classified by the ML techniques for different types of wine and glasses. Consequently, our study demonstrates that CV/AI/ML analysis of ultrasound cavitation bubbles can be used to analyze carbonated liquids.
Bubbles in Champagne and sparkling wine have been studied by many researchers.2–10 Sparkling wine is supersaturated with carbon dioxide (CO2), whose partial pressure in the bottleneck is about P = 5–7 atm at 20 °C. The total amount of dissolved CO2 in a standard 0.75 L bottle is close to 9 g, which corresponds to about 5 L volume of gaseous CO2 under standard conditions. Due to the presence of alcohol, wine's surface tension is about γ ≈ 50 mN m−1 (pure water has γ ≈ 72 mN m−1) and its viscosity is about 50% larger than that of pure water.4,5 The critical radius of bubble nucleation is given by the Laplace equation as
$r_{\mathrm{c}} = \dfrac{2\gamma}{P}$ | (1) |
The analysis of visual images of ultrasonically induced cavitation bubbles is a relatively new method of non-destructive quality control of liquids, which has been applied to various liquids including water–ethanol solutions and petroleum products.11–16
During ultrasonic cavitation, bubbles of dissolved gas and of solvent vapor form due to localized pressure changes caused by the ultrasound. The size of the bubbles oscillates under the ultrasonic acoustic excitation, which causes alternating compressive and tensile stress. In the compressive phase the bubbles shrink, while in the tensile phase they expand by an amount that exceeds the shrinkage, so that the average bubble radius grows.
While behavior close to the instability is difficult to predict by traditional deterministic methods, oscillating and collapsing bubbles can provide large amounts of data (e.g., visual images), which makes them an almost ideal object for Artificial Intelligence (AI) and Machine Learning (ML) analyses and searches for correlations in data. For water–ethanol solutions, an Artificial Neural Network (ANN) was trained to determine the composition (alcohol concentration) of the solutions based on the bubble images.11 Besides ultrasonic cavitation, ML/AI and other novel computational methods are widely used to analyze the taste quality of food products, such as the bitter taste in wines17 and the umami taste in various products.18
In water–ethanol solutions, the shape and evolution of microbubbles are sensitive to the viscosity and surface tension, which depend on the ethanol concentration. Using a large amount of data (bubble images obtained from video recordings), it is possible to determine solution concentrations by Machine Learning (ML) algorithms.11
Applying a similar approach to carbonated beverages such as sparkling wine is more challenging for two reasons. First, alcohol concentration does not vary significantly among different sorts of wine. The difference in chemical composition between types of wine is often a fraction of a percent and therefore does not always affect the properties that matter for bubbles, such as surface tension and viscosity. Second, sparkling wine is supersaturated with carbon dioxide and generates a large number of CO2 bubbles even without ultrasonic cavitation. The behavior of these natural bubbles may differ from that of cavitation bubbles, and the interaction of the bubbles formed in the supersaturated carbonated beverage with the bubbles induced by ultrasonic cavitation is a non-trivial matter. It is therefore desirable to study the feasibility of the ultrasonic cavitation method for sparkling wine classification. It is also known that the behavior of bubbles in sparkling wine depends on the surface properties of the glass, and it may also depend on the time since uncorking. These factors introduce additional variables that should be considered.
In this study, we will apply Computer Vision (CV) methods to visual recordings of cavitation bubbles in two sorts of sparkling wine kept in glass and plastic cups for certain time intervals after the bottle is uncorked. After that, ML methods will be applied by training a Contrastive Language-Image Pre-Training (CLIP) Artificial Neural Network (ANN) to cluster and classify data points corresponding to different samples. The data for sparkling wines will be compared with the data for water–ethanol solutions with similar alcohol concentrations.
Compound | Concentration |
---|---|
Ethanol (C2H5OH) | ≈12.5% |
Sugars (e.g. C6H12O6) | 10–50 g L−1 |
CO2 | 10–12 g L−1 |
Glycerol (C3H8O3) | ≈5 g L−1 |
Tartaric acid (C4H6O6) | 2.5–4 g L−1 |
Lactic acid (C3H6O3) | ≈4 g L−1 |
Volatile organic compounds | ≈0.7 g L−1 |
K+ | 0.2–4.5 g L−1 |
Ca2+ | 0.06–0.12 mg L−1 |
SO42− | 0.2 mg L−1 |
Polysaccharides | ≈0.2 mg L−1 |
Polyphenols | ≈0.1 g L−1 |
Mg2+ | 0.05–0.09 g L−1 |
C6H12O6 → 2CH3CH2OH + 2CO2 | (2) |
The second stage of fermentation, referred to as prise de mousse, occurs in cool cellars at temperatures between 12 °C and 14 °C. Wine is kept in tightly closed bottles along with added sugar and yeast. At this stage, wine becomes saturated with CO2. The concentration of dissolved CO2 is proportional to its partial pressure in the vapor phase, which, in turn, is proportional to the amount of added sugar. Sparkling wine is supersaturated with CO2, whose partial pressure in the bottleneck is about 5–7 atm at 20 °C. The total amount of dissolved CO2 in a standard 0.75 L bottle is close to 9 g, which corresponds to about 5 L volume of gaseous CO2 under standard conditions for temperature and pressure.
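The conversion from 9 g of dissolved CO2 to roughly 5 L of gas can be checked with the ideal gas law. The sketch below is only an illustration of that arithmetic; the molar mass of CO2 (44.01 g mol−1) and the choice of 1 atm with either 20 °C or 0 °C as the reference conditions are assumptions, since the text does not specify which standard conditions are meant.

```python
# Ideal-gas check of the gaseous CO2 volume released by one bottle.
R = 8.314462618          # J mol^-1 K^-1, universal gas constant
M_CO2 = 44.01            # g mol^-1, molar mass of CO2 (assumed value)

def co2_gas_volume_litres(mass_g, temperature_k=293.15, pressure_pa=101_325.0):
    """Volume in litres of `mass_g` grams of CO2 treated as an ideal gas."""
    moles = mass_g / M_CO2
    volume_m3 = moles * R * temperature_k / pressure_pa
    return volume_m3 * 1000.0

volume_20c = co2_gas_volume_litres(9.0)                        # 20 °C, 1 atm
volume_0c = co2_gas_volume_litres(9.0, temperature_k=273.15)   # 0 °C, 1 atm
print(f"{volume_20c:.2f} L at 20 °C, {volume_0c:.2f} L at 0 °C")
```

Both reference temperatures give a volume between 4.5 and 5 L, in line with the "about 5 L" figure.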
During the secondary fermentation, the concentration c of CO2 dissolved in the wine is in equilibrium with its partial pressure in the vapor phase, PCO2, as given by Henry's law
$c = k_{\mathrm{H}} P_{\mathrm{CO_2}}$ | (3) |
![]() | (4) |
The temperature dependence of the gas pressure in the bottle can be calculated by combining Henry's law with the ideal gas law, and it was estimated by Liger-Belair et al.3 as
![]() | (5) |
The rapid pressure drop facilitates an adiabatic expansion governed by
$P^{(1-\Gamma)} T^{\Gamma} = \mathrm{const}$ | (6) |
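As a rough illustration of eqn (6), the sketch below estimates the temperature of the gas after it expands adiabatically from the bottle pressure to ambient pressure. The heat-capacity ratio Γ ≈ 1.3 for CO2, the starting temperature of 20 °C, and the starting pressure of 6 atm (the middle of the 5–7 atm range quoted above) are assumptions chosen for illustration, not values taken from the study.

```python
def adiabatic_final_temperature(t_initial_k, p_initial, p_final, gamma=1.3):
    """Solve P**(1 - gamma) * T**gamma = const for the final temperature.

    Rearranged: T_final = T_initial * (p_initial / p_final) ** ((1 - gamma) / gamma).
    Pressures may be in any unit as long as both use the same one.
    """
    return t_initial_k * (p_initial / p_final) ** ((1.0 - gamma) / gamma)

# Assumed conditions: 20 °C gas at 6 atm expanding to 1 atm.
t_final_k = adiabatic_final_temperature(293.15, p_initial=6.0, p_final=1.0)
print(f"Final gas temperature: {t_final_k:.0f} K ({t_final_k - 273.15:.0f} °C)")
```

Under these assumptions the gas cools by roughly 100 K, which illustrates how strongly the rapid pressure drop affects the gas temperature.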
The physicochemical parameters relevant to bubbles in pure water, pure ethanol, water–ethanol 12.5% solution, and sparkling wine are presented in Table 2.
Property | Water | Ethanol | Water–ethanol 12.5% mixture | Champagne2 |
---|---|---|---|---|
Density, kg m−3 | 1000 | 789 | 975 | 998 |
Viscosity, mPa s | 1.002 | 1.14 | 1.48 | 1.48 |
Surface tension, mN m−1 | 72 | 22 | 50 | 48 |
CO2 pressure, atm | 0 | 0 | 0 | 5–7 |
Critical radius of bubbles, μm | 1.44 | 0.44 | 1.0 | 0.2 |
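The last row of Table 2 can be reproduced from the tabulated surface tensions with the Laplace relation r = 2γ/P. This is a sketch of the arithmetic only. Taking P as 1 atm for the non-carbonated liquids and 5 atm (the lower end of the quoted range) for champagne, and reading the tabulated radii in micrometres, are assumptions inferred from the numbers rather than statements from the text.

```python
ATM_PA = 101_325.0  # pascals per standard atmosphere

def critical_radius_um(surface_tension_mn_per_m, pressure_atm):
    """Laplace critical radius r = 2*gamma/P, returned in micrometres."""
    gamma_n_per_m = surface_tension_mn_per_m * 1e-3
    radius_m = 2.0 * gamma_n_per_m / (pressure_atm * ATM_PA)
    return radius_m * 1e6

# (surface tension in mN/m from Table 2, assumed pressure in atm)
liquids = {
    "water": (72.0, 1.0),
    "ethanol": (22.0, 1.0),
    "water-ethanol 12.5%": (50.0, 1.0),
    "champagne": (48.0, 5.0),
}
radii = {name: critical_radius_um(g, p) for name, (g, p) in liquids.items()}
for name, radius in radii.items():
    print(f"{name}: {radius:.2f} um")
```

The computed radii agree with the tabulated values to within a few percent; the small offset disappears if 1 atm is rounded to 100 kPa.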
Bubbles tend to stick to the surface. Artificial effervescence refers to bubbles nucleated on imperfections that the glassmaker introduces intentionally to promote nucleation or to compensate for a deficit of natural nucleation sites. In plastic cups, gas desorption happens through heterogeneously nucleated CO2 bubbles.7
In this section, we have seen that physical and physicochemical models of bubble formation can provide insights into such characteristics as density, viscosity, and carbon dioxide concentration. However, these models do not make it possible to directly distinguish between different types of wine, such as rosé and white, or between types of glass. We hypothesize that CV and ML/AI methods allow for distinguishing these characteristics. Since the rate of natural bubbling decreases within minutes after uncorking, we will also study the effect of the time for which the wine was exposed to air within a given type of glass (cup).
Fig. 1 (A) Schematic of the experimental procedure. (B) Two types of wine (C) were kept in glass and plastic cups and then (D) treated with ultrasound (US).
The working hypothesis was that bubble formation differs between glass and plastic cups because nucleation centers, such as scratches, are present on the plastic cup surface, whereas the glass is relatively smooth.7 The intensity of bubble formation also decreases with time after the bottle is uncorked. To account for this, different time intervals, up to 20 minutes, were used.
To generate cavitation bubbles, the ultrasonic generator UZG 55-22 (BSUIR, Belarus) with a nominal frequency of 22 kHz and a nominal maximum power of 100 W was used. Ultrasonic oscillations were delivered by a titanium sonotrode shaped as a truncated cone with a disk (15 mm diameter, 2 mm thickness) at its end. The sonotrode was positioned at an angle of 45° to the surface of a glass Petri dish filled with the wine being studied. The immersion depth was adjusted so that the entire disk was submerged in the sample liquid.
The images of cavitation bubbles were recorded with the high-speed camera Phantom Miro C110, connected to the microscope Mikmed-6 (LOMO, Russia) with a 10× objective. The frame rate was 700 fps at a resolution of 768 × 768 px. The frames were automatically assembled into a single video file. The video file was edited with the application Phantom CV 3.3 to identify sections containing bubble formation, evolution, and collapse for each sample. Previously, the 12.5% water–ethanol solution was studied with the same methodology.11
The computations were carried out on an AMD Ryzen Threadripper 3960X processor. To accelerate the training of neural networks, an NVIDIA GeForce RTX 3090 GPU was used, enabling fast computation of complex models and handling large volumes of data. This configuration facilitated the efficient execution of the entire data analysis cycle, from preliminary image processing to training and testing of advanced machine learning architectures.
Data Processing Stages:
(1) Video segmentation
For preliminary video processing, the YOLOv8 segmentation model was employed to isolate frames containing cavitation bubbles. We used pretrained YOLOv8 weights from an earlier study,12 which significantly accelerated the integration of the model into the pipeline. After the bubbles were identified, their contours were outlined as polygons generated by the model. This step was added to the processing workflow after a series of experiments demonstrated that contouring the bubbles with polygons improved the classification model's accuracy by enhancing the quality of the extracted features.
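The frame-filtering idea can be sketched without the segmentation model itself. The example below does not call YOLOv8; it assumes the model's output has already been collected into a hypothetical dictionary that maps a frame id to a list of bubble polygons, and it keeps only frames with at least one polygon above an area threshold. The data layout, the function names, and the `min_area_px` threshold are all illustrative assumptions.

```python
def polygon_area(vertices):
    """Area of a simple polygon given as [(x, y), ...] via the shoelace formula."""
    area2 = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        area2 += x1 * y2 - x2 * y1
    return abs(area2) / 2.0

def frames_with_bubbles(detections, min_area_px=0.0):
    """Keep frame ids whose detections include a polygon larger than `min_area_px`."""
    return [frame_id for frame_id, polygons in detections.items()
            if any(polygon_area(p) > min_area_px for p in polygons)]

# Hypothetical segmentation output: frame id -> list of bubble polygons (pixels).
detections = {
    0: [],                                            # no bubbles detected
    1: [[(10, 10), (30, 10), (30, 30), (10, 30)]],    # one 20 x 20 px contour
    2: [[(0, 0), (2, 0), (2, 2)]],                    # tiny contour, likely noise
}
kept = frames_with_bubbles(detections, min_area_px=10.0)
print(kept)
```

An area threshold is one simple way to discard spurious detections; the study does not state whether such a threshold was used.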
(2) Data splitting
After segmentation, the data was divided into training (60%), validation (20%), and test (20%) sets. This data distribution was chosen to ensure balanced model training and objective performance evaluation. This step minimizes the risk of overfitting and ensures that the test data remains completely independent of the training phase.
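A 60/20/20 split of this kind can be sketched as follows. The example assumes that the split is made over whole video clips before frame extraction, which keeps frames from one clip out of more than one subset; the clip identifiers, the fixed random seed, and the function name are hypothetical.

```python
import random

def split_items(items, train=0.6, val=0.2, seed=0):
    """Shuffle `items` reproducibly and split them into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Hypothetical clip identifiers; the real file names are not given in the text.
clips = [f"clip_{i:03d}" for i in range(50)]
train_set, val_set, test_set = split_items(clips)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed makes the partition reproducible, so the test subset stays the same across training runs.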
(3) Frame extraction
To reduce the data volume and improve processing efficiency, the videos were split into individual frames at a rate of 1 frame per 10 seconds. This approach preserved key temporal patterns, ensuring the data's representativeness for subsequent analysis.
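Sampling one frame per fixed time interval amounts to keeping every N-th frame, where N depends on the frame rate of the video file. The sketch below shows that index arithmetic; the frame rate and clip length in the example are assumed values, since the playback rate of the exported video files is not stated.

```python
def sampled_frame_indices(n_frames, fps, interval_s=10.0):
    """Indices of the frames kept when sampling one frame every `interval_s` seconds."""
    step = max(1, round(fps * interval_s))
    return list(range(0, n_frames, step))

# Hypothetical clip: 1800 frames at a 30 fps playback rate (both values assumed).
indices = sampled_frame_indices(n_frames=1800, fps=30)
print(indices)
```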
Final dataset volume
- Train: 12895 images
- Validation: 3744 images
- Test: 3314 images
- Total: 19953 images
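A quick tally confirms the total and gives the realized share of each subset. The shares come out near 65/19/17% rather than exactly 60/20/20%, which would be consistent with splitting whole videos of unequal length before frame extraction; that explanation is an assumption, not something the text states.

```python
# Image counts as listed above.
counts = {"train": 12895, "validation": 3744, "test": 3314}

total = sum(counts.values())
fractions = {name: n / total for name, n in counts.items()}

print(total)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```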
The use of YOLOv8 with pre-trained weights,12 along with the added step of polygon contouring for detected bubbles, improved the data quality for classification. This approach enabled the extraction of more precise visual features, positively impacting the overall performance of the model. The general pipeline is presented in Fig. 2.
In the initial stages of the study, classical CNNs, such as VGG16, VGG19, ResNet18, and ResNet50, were employed for feature extraction. These models demonstrated good results but were limited in capturing complex relationships due to the lack of semantic context. This limitation motivated the integration of more advanced methods, such as CLIP.
Contrastive Language-Image Pretraining (CLIP)20 marked the next stage in the evolution of the pipeline. CLIP enables the extraction of embeddings that combine both visual and semantic features of images, making the model more robust to noise in the data. Paired with the TabNet classifier,21 CLIP demonstrated superior performance. TabNet was chosen for its ability to efficiently process embeddings while preserving model interpretability.
To improve classification quality, a TabNet Pretrainer was added, which was trained in an unsupervised mode to uncover hidden structures in the data. This allowed the model to better adapt to the characteristics of the embeddings, resulting in increased accuracy on the test dataset.
To further enhance feature quality, CLIP was replaced with its improved version, SigLIP.22 The primary advantage of SigLIP lies in its expanded embedding space, enabling the model to capture more information from the images. Combined with TabNet and the TabNet Pretrainer, this approach proved to be the best among all tested methods.
At the final stage, all images containing cavitation bubbles were processed through the segmentation model, followed by embedding extraction using SigLIP (Fig. 2). The embeddings served as input for the TabNet Pretrainer, which structured the data, and the TabNet Classifier, which performed the final classification. This approach achieved maximum accuracy and classification stability.
Fig. 3 (A) Typical bubbles in the control series (no ultrasound) and in the cavitation series. (B) Bubbles in various experimental series.
Visual inspection alone makes it very difficult to distinguish between the different series in the cavitation experiments.
The classification accuracy evaluation is presented in Fig. 4(A) and (B) as normalized confusion matrices. These matrices depict the results of two primary classification tasks: the type of champagne (rosé or white) and the type of glass (glass or plastic).
Fig. 4(A) shows the confusion matrix for the classification of rosé (pink) and white champagne. The overall classification accuracy for this task was 84%. While the model exhibits a strong ability to distinguish between the two types of champagne, an error rate of 14–15% persists due to visual similarities in the bubble patterns characteristic of different wine types. This highlights the limitations of classification tasks based solely on visual data.
Fig. 4(B) presents the confusion matrix for the classification of container types: glass or plastic. The overall accuracy for this task was 82%. While the model demonstrated a good ability to differentiate between containers, the error rate of 15–20% suggests that the surface properties of the containers, which influence bubble behavior, may require more detailed analysis or additional data processing to improve the results.
The obtained confusion matrices demonstrate the high efficiency of the proposed pipeline for both classification tasks. However, the accuracy rates of 84% for champagne classification and 82% for container type classification indicate potential areas for improvement. For instance, incorporating temporal dynamics from the videos or enhancing the data preprocessing stage (e.g., contrast enhancement or additional normalization techniques) could help reduce the error rate.
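For reference, a normalized confusion matrix and the overall accuracy are related as in the sketch below. The labels and predictions are made-up numbers chosen only to illustrate the calculation; they are not the study's data, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with rows = true class and columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def normalize_rows(matrix):
    """Divide each row by its sum so that every row adds up to 1."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in matrix]

def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

# Hypothetical predictions, not the study's data.
labels = ["rose", "white"]
y_true = ["rose"] * 50 + ["white"] * 50
y_pred = ["rose"] * 43 + ["white"] * 7 + ["white"] * 41 + ["rose"] * 9
cm = confusion_matrix(y_true, y_pred, labels)
cm_norm = normalize_rows(cm)
print(cm_norm, accuracy(cm))
```

With balanced classes the overall accuracy equals the mean of the diagonal of the row-normalized matrix, so per-class error rates and overall accuracy can be read off the same figure.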
The use of SigLIP22 as a feature extractor played a key role in achieving these results. Its enhanced embedding space provided richer and more informative features compared to previously employed methods. Additionally, the integration of TabNet21 as the classifier enabled effective processing of complex embedding and tabular data. These innovations significantly improved classification accuracy compared to earlier versions of the pipeline, affirming the validity of the chosen approach.
It is interesting to compare bubble dynamics in sparkling wines with those of other carbonated beverages, such as beer and sodas. According to Bossaerts et al., the properties of bubbles in beer depend strongly on the concentration of alcohol (non-alcoholic beers have fewer bubbles) and are affected strongly by the CO2 concentration and the surface tension, which, in turn, depends on the alcohol, protein, and iso-alpha-acid content of the beer.23 An explicit comparison of bubbling dynamics in champagne wines and beers24 shows that the critical radius of bubbles in beers is about twice that in champagne due to the higher concentration of dissolved CO2 in champagne. As for other carbonated liquids, bubble generation is promoted by hydrophilic structures on the container surface and suppressed by hydrophobic structures.25 These insights suggest that different carbonated beverages are characterized by different bubbling behavior, which is consistent with our observations on rosé and white sparkling wine.
This journal is © The Royal Society of Chemistry 2025