An image-based food quality analysis framework driven by large language models
Abstract
Although computer vision approaches based on machine learning (ML) and deep learning (DL) have been applied to image-based food quality analysis, they typically require large labeled datasets, intensive training, and expert knowledge. Recent advances in large language models (LLMs) have brought enhanced reasoning capabilities and flexible multimodal perception. In this work, an LLM-driven vision inspection framework featuring two inference routes was developed. The text-only inference (TOI) route converts pre-extracted image features into structured textual prompts for LLM-based reasoning. The vision-language inference (VLI) route directly processes raw food images using LLMs with built-in visual encoders, enabling end-to-end multimodal inference. A lightweight graphical user interface (GUI) was integrated to automate feature extraction and structured data generation with one-click operation. Several state-of-the-art LLMs were evaluated on food ripeness, freshness, and authenticity datasets. With GPT-5, the TOI route attained an accuracy of 1.000 on low-complexity datasets and maintained strong performance (0.875) on more complex tasks, while the VLI route achieved consistently high accuracy (0.964–1.000) with substantially fewer training images, matching or surpassing traditional ML and DL baselines. The proposed framework relies solely on prompt-based in-context learning, eliminating the need for task-specific fine-tuning. These results demonstrate the feasibility and practicality of efficient, fine-tuning-free, LLM-driven vision inspection for food quality analysis.
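
To illustrate the two inference routes, the following Python sketch shows how TOI and VLI calls might be structured, assuming the OpenAI Python client; the prompt templates, the "gpt-5" model identifier string, the feature names, and the helper functions toi_infer and vli_infer are illustrative assumptions rather than the authors' implementation.

import base64
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment
MODEL = "gpt-5"    # model identifier string is an assumption

def toi_infer(features, labels, examples=()):
    # Text-only inference (TOI): pre-extracted image features are rendered
    # into a structured textual prompt; optional (features, label) pairs
    # supply prompt-based in-context learning, with no fine-tuning.
    shots = "\n".join(
        "Example features:\n"
        + "\n".join(f"- {n}: {v}" for n, v in ex.items())
        + f"\nLabel: {lab}"
        for ex, lab in examples
    )
    prompt = (
        "You are a food quality inspector.\n"
        + (shots + "\n" if shots else "")
        + "Classify the sample below as one of " + ", ".join(labels) + ".\n"
        + "\n".join(f"- {name}: {value}" for name, value in features.items())
        + "\nAnswer with the label only."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def vli_infer(image_path, labels):
    # Vision-language inference (VLI): the raw image is passed directly to a
    # multimodal LLM with a built-in visual encoder (end-to-end inference).
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Classify this food sample as one of "
                         + ", ".join(labels) + ". Answer with the label only."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/jpeg;base64," + b64}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

For example, a hypothetical ripeness query could look like toi_infer({"hue_mean": 42.1, "firmness_index": 0.73}, ["unripe", "ripe", "overripe"]), where the feature names are placeholders for whatever the GUI's feature-extraction step produces.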
