Open Access Article
Yihang Feng,ab Yi Wang,a Xinhao Wang,a Bo Zhao,b Jinbo Bi,b Song Han*b and Yangchao Luo*a
aDepartment of Nutritional Sciences, University of Connecticut, Storrs, CT 06269-4017, USA. E-mail: yangchao.luo@uconn.edu; Web: https://yangchao-luo.uconn.edu/ Fax: +860-486-3674; Tel: +860-486-2180
bSchool of Computing, University of Connecticut, Storrs, CT 06269-4017, USA. E-mail: song.han@uconn.edu; Web: https://cps.cse.uconn.edu/ Tel: +860-486-8771
First published on 18th February 2026
Consumer concerns about food additives have intensified amid widespread misinformation, with the 2024 IFIC survey revealing that 35% of consumers actively avoid artificial ingredients despite authoritative safety data existing in FDA and USDA databases. This work investigates whether on-device artificial intelligence can effectively translate complex regulatory information into accessible consumer education while maintaining scientific accuracy and privacy. This paper presents Food Additive Lens (FAL), an iOS application implementing a three-agent architecture: (1) a food category classifier achieving 87.2% top-3 accuracy across 257 categories, (2) a hybrid additive identifier combining database lookup with AI extraction (F1-score: 0.757), and (3) an explanation generator producing contextualized, consumer-friendly descriptions. The system deploys Meta's Llama 3.2 3B model quantized to 1.8 GB through 4-bit compression, achieving a generation speed of 13–30 tokens/second while operating entirely offline. Integration of FDA's Substances Added to Food Inventory (3971 substances) and USDA's Global Branded Food Products Database enables comprehensive coverage with direct links to the Code of Federal Regulations for professional users. The Retrieval-Augmented Generation workflow grounds AI responses in authoritative sources, reducing hallucination while maintaining accessibility. Performance evaluation on iPhone 14 and MacBook Air M1 demonstrated stable memory usage (peak: 2.36 GB) with complete offline functionality, ensuring user privacy. The application transforms complex ingredient lists into accessible information through camera-based OCR scanning, progressive disclosure interfaces, and context-aware explanations tailored to specific food products. 
This work demonstrates the feasibility of deploying sophisticated AI for science communication on consumer devices, offering a scalable model for combating food-related misinformation while preserving privacy and accessibility.
000 different additives globally, with regulatory frameworks varying significantly across jurisdictions.1,2 In the United States alone, the FDA's Substances Added to Food Inventory (SAFI) catalogs nearly 4000 approved substances, each serving specific technological functions ranging from antimicrobial preservation to texture modification, color stabilization, and nutritional fortification.3 Despite rigorous safety assessments and regulatory oversight, public perception of food additives has become increasingly negative over the past decade, creating a significant disconnect between scientific evidence and consumer beliefs that has profound implications for public health policy and food industry practices.4
Recent consumer research reveals the depth of this perception gap. The 2024 International Food Information Council Food and Health Survey5 found that 35% of consumers actively try to limit or avoid artificial ingredients/colors and that 25% rank “limited or no artificial ingredients or preservatives” as their top definition of healthy food, with younger demographics showing even higher avoidance rates despite having less nutritional knowledge overall. This widespread apprehension is not merely a preference but often manifests as food anxiety, with studies documenting that concerns about additives contribute to disordered eating patterns and unnecessary dietary restrictions.6 The phenomenon has been further amplified by social media, where misinformation about food ingredients spreads rapidly through viral posts and influencer content. A recent study7 found that nutritional misinformation, particularly on platforms like TikTok, was not only prevalent but also significantly more engaging than accurate content, with inaccurate posts receiving more likes and comments. This dynamic fosters a digital environment where fear-based narratives about food additives dominate public discourse, often outpacing factual, science-based communication. Social media platforms tend to amplify emotionally charged and sensational content, including misinformation about food ingredients.
For instance, negative claims about additives, especially those framed around health risks, tend to receive significantly more engagement than posts presenting scientific evidence or regulatory context.8 This amplification contributes to a broader phenomenon known as chemophobia, a disproportionate fear of chemicals in food, which can distort public understanding and hinder informed decision-making.9 As a result, legitimate safety discussions are frequently overshadowed by pseudoscientific claims, making it increasingly difficult for consumers to distinguish between evidence-based concerns and unfounded fears.
The challenge of food additive communication is compounded by the complexity of chemical nomenclature and regulatory language used in ingredient lists. Consumer research demonstrates significant barriers to understanding food ingredient information, with studies revealing that consumers frequently rely on simplified heuristics when interpreting chemical names on food labels. Aschemann-Witzel et al. found that consumers perceived additives as more harmful when the additives had names that were difficult to pronounce, indicating that unfamiliarity creates greater risk perception and leads to avoidance behaviors.10 For instance, the same consumer who might be comfortable with “vitamin C” may experience concern when encountering “ascorbic acid” on a label, despite these being identical substances. This nomenclature barrier represents a fundamental obstacle to informed food choices, as consumers demonstrate a strong preference for ingredients with “familiar or recognizable” names rather than “chemical-sounding” names.11 Eye-tracking studies conducted on food label reading behavior reveal that consumers demonstrate systematic but brief visual attention patterns when examining ingredient information. Research using mobile eye-tracking technology found that participants' visual attention to health labels was significantly reduced under time constraints, with consumers spending limited time processing complex ingredient information.12 The resulting cognitive overload leads many consumers to rely on simplified decision-making strategies, such as avoiding products with longer ingredient lists or unfamiliar chemical names, heuristics that often lead to suboptimal nutritional choices.13
The current landscape of digital tools addressing food additive information reveals significant gaps in meeting consumer needs for accurate, accessible, and actionable information. Analysis of mobile health applications in the nutrition domain shows that most food scanning apps lack comprehensive data validation and evidence-based information sources. Research examining food tracking mobile applications found that only a small percentage incorporate data from authoritative regulatory sources, with many relying on simplified scoring algorithms that may perpetuate misconceptions rather than provide educational value.14 Among applications that do reference nutritional databases, studies indicate significant limitations in providing contextual, product-specific explanations that account for individual dietary needs or knowledge levels.15 Furthermore, existing solutions predominantly operate on client-server architectures, requiring constant internet connectivity and raising substantial privacy concerns. Comprehensive privacy assessments of mobile health applications reveal widespread data collection practices, with research showing that 88.0% of analyzed mHealth apps included code that could potentially collect user data, and 28.1% provided no privacy policy at all.16 Analysis of mHealth app privacy policies demonstrates that a significant proportion collect and share user dietary data with third parties for marketing purposes, with many building behavioral profiles that could potentially be used for discriminatory practices.17 This privacy-functionality trade-off forces consumers to choose between accessing information about their food and protecting their personal health data, a choice that becomes particularly problematic for individuals with specific dietary requirements or stigmatized health conditions.18
The technical challenges of developing effective food additive education tools extend beyond simple database queries. Food additives often serve multiple functions depending on the food matrix, processing conditions, and interactions with other ingredients. Research on food matrices demonstrates that additives exhibit context-dependent functionality, with studies showing that the same additive can perform different roles based on environmental factors such as pH, temperature, and the presence of other compounds.19 For example, citric acid may function as an acidulant in beverages, a chelating agent in canned vegetables, or a flavor enhancer in confectionery products, requiring context-aware explanation systems that can account for these nuances.20 Additionally, the same additive may be derived from different sources or produced through various methods (synthetic, fermentation, and extraction), each with different implications for consumers with specific dietary restrictions or preferences.21 Current database structures and query systems struggle to capture these multidimensional relationships, resulting in oversimplified or potentially misleading information when translated for consumer audiences.22
Recent advances in artificial intelligence, particularly in natural language processing and on-device deployment, offer unprecedented opportunities to address these challenges. The development of transformer-based language models has revolutionized the ability of machines to understand and generate human-like text, with models demonstrating remarkable capability in translating technical information into accessible explanations.23 Studies on transformer models for text simplification have shown significant improvements in converting complex scientific language into plain language formats, particularly for domain-specific applications such as biomedical text.24 However, deploying these models on mobile devices has historically been infeasible due to their massive computational requirements, with popular models requiring gigabytes of memory and server-grade processing power.25 Research on mobile AI deployment challenges reveals that large language models (LLMs) typically have memory footprints of hundreds of megabytes or more, making them challenging to deploy on resource-constrained platforms such as mobile devices and IoT systems.26
This limitation has been dramatically altered by recent breakthroughs in model compression techniques. Quantization methods, which reduce the precision of model weights from 32-bit floating-point to as low as 4-bit integers, have demonstrated the ability to compress LLMs by factors of 8–10x while maintaining over 95% of their original performance on domain-specific tasks.27,28 Studies on deep neural network quantization for mobile deployment show that post-training quantization can achieve up to a 95% reduction in parameters while maintaining model accuracy, making deployment on edge devices feasible.29 These advances, combined with hardware acceleration frameworks specifically designed for mobile devices such as Apple's MLX and Google's MediaPipe, have made it possible to run sophisticated AI models entirely on consumer smartphones.30,31 MLX provides optimized machine learning (ML) inference for Apple silicon through unified memory architecture and efficient computation graphs, while MediaPipe enables cross-platform deployment of ML pipelines with GPU acceleration and multi-threading capabilities.
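A back-of-envelope calculation illustrates why 4-bit quantization brings a 3B-parameter model into mobile range. The sketch below (Python, for illustration; the parameter count of ~3.2 billion and the 20% overhead allowance for per-group scales and unquantized layers are assumptions, not measured values from this work):

```python
# Back-of-envelope estimate of quantized model storage (illustrative only;
# actual on-disk size depends on group-wise scales, embedding precision,
# and serialization metadata).
def quantized_size_gb(n_params, bits_per_weight, overhead_fraction=0.2):
    """Approximate storage for a model stored at `bits_per_weight` precision.

    overhead_fraction is a hypothetical allowance for quantization scales,
    zero points, and layers kept at higher precision.
    """
    raw_bytes = n_params * bits_per_weight / 8
    return raw_bytes * (1 + overhead_fraction) / 1e9

fp32 = quantized_size_gb(3.2e9, 32, overhead_fraction=0.0)  # ~12.8 GB
int4 = quantized_size_gb(3.2e9, 4)                          # ~1.9 GB with overhead
print(f"fp32: {fp32:.1f} GB, 4-bit: {int4:.1f} GB")
```

The ~1.9 GB estimate is consistent with the 1.8 GB bundle size reported for the 4-bit Llama 3.2 3B model.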
The emergence of Retrieval-Augmented Generation (RAG) architectures represents another crucial development for domain-specific AI applications. Unlike traditional language models that rely solely on patterns learned during training, RAG systems dynamically retrieve relevant information from external knowledge bases to ground their responses in authoritative sources.32 In scientific and medical domains, RAG implementations have demonstrated significant reductions in hallucination rates—instances where AI generates plausible but factually incorrect information. Béchard33 demonstrated that RAG systems can dramatically improve the quality of structured outputs while reducing hallucinations in enterprise applications, with their implementation showing substantial improvements in generalization to out-of-domain settings. Similarly, Shuster et al.34 found that retrieval augmentation significantly reduces hallucination in conversational AI systems, particularly when querying based on complex multi-turn dialogue contexts. For food science applications, where accuracy is paramount and misinformation could have health implications, the ability to anchor AI responses in regulatory databases and peer-reviewed literature is essential. Recent systematic reviews of RAG applications in educational contexts have shown that users consistently rate RAG-enhanced responses as more trustworthy and actionable compared to traditional language model outputs, with trust improvements observed when sources are explicitly referenced and retrieved information is validated against authoritative knowledge bases.32
The intersection of privacy concerns and AI deployment has become increasingly critical as consumers become more aware of data collection practices. Traditional cloud-based AI services require transmitting user queries to remote servers, creating permanent records of personal interests and concerns that can be aggregated into detailed behavioral profiles.35 For health-related queries, including those about food and nutrition, this raises significant ethical and legal concerns under frameworks such as The General Data Protection Regulation (GDPR) and The Health Insurance Portability and Accountability Act of 1996 (HIPAA).36 Studies examining privacy in AI healthcare applications have documented widespread data collection practices, with comprehensive assessments revealing that traditional cloud-based systems create substantial privacy risks through data transmission and storage in external servers.37 On-device AI processing eliminates these privacy risks by ensuring that all computation occurs locally on the user's device, with no data transmission required. This approach aligns with the principle of data minimization and provides users with complete control over their information, addressing one of the primary barriers to adoption of digital health tools. Recent research on user acceptance of privacy-preserving AI applications has found empirical evidence supporting increased willingness to use health-related AI tools when guaranteed on-device processing is implemented. Wang et al.38 demonstrated that local data processing significantly enhances user privacy protection in health monitoring applications, while survey studies examining technology acceptance in healthcare contexts have shown that privacy concerns are among the most significant factors influencing user adoption of AI-powered health applications.39
This paper presents Food Additive Lens (FAL), a novel iOS application that synthesizes recent advances in on-device AI, RAG, and food science communication to address the critical gap between scientific knowledge about food additives and consumer understanding. The application implements a three-agent AI architecture, comprising a food category classifier, additive identifier, and explanation generator, that works in concert to provide contextual, accurate, and accessible information about food additives. By deploying a quantized version of Meta's Llama 3.2 3B model (compressed to 1.8 GB through 4-bit quantization) directly on iOS devices, the system achieves processing speeds of 13–30 tokens per second while maintaining complete offline functionality. The integration of SAFI and USDA's Global Branded Food Products Database (GBFPD) through an embedding-based search system enables the application to provide authoritative information for nearly 4000 additives across one million food products, with direct links to relevant Code of Federal Regulations (CFR) sections for professional users. This work demonstrates the feasibility of deploying sophisticated AI systems for food science education on consumer devices while maintaining privacy, accuracy, and accessibility. The implications extend beyond food additives to suggest new paradigms for science communication in an era of information overload and digital misinformation, where the challenge is not the absence of authoritative information but rather its translation and delivery at the point of need.
The application architecture follows a modular design pattern implemented in Swift using SwiftUI for the user interface layer. The core system integrates Apple's MLX framework for on-device ML acceleration, enabling the deployment of a quantized Llama 3.2 3B language model compressed to 1.8 GB through 4-bit quantization. The architecture maintains complete offline functionality by embedding all necessary databases and models within the application bundle, including SAFI, GBFPD samples, and pre-computed embeddings for semantic search capabilities.
The iOS application implements the Model-View-ViewModel (MVVM) architectural pattern40 using SwiftUI's reactive framework. The architecture separates concerns through three distinct layers: (1) models representing data structures (FoodRecord, ClassificationResult, and AdditiveKnowledge), (2) ViewModels managing business logic and state (MLXAdditiveIdentifier, AdditiveKnowledgeManager, CFRManager, and FoodCategoryClassifier) marked with ‘@Observable’ for reactive updates, and (3) views implemented in SwiftUI that bind reactively to ViewModel state changes. This MVVM implementation ensures clear separation of concerns, facilitates unit testing, and maintains efficient UI updates through SwiftUI's declarative binding system.
The RAG workflow forms the backbone of the explanation system as shown in Fig. 2. When processing user queries, the system first generates search embeddings using a custom embedding function that captures both lexical and semantic features of food additive names. These embeddings enable similarity-based retrieval from the knowledge base containing 3971 FDA-approved substances with their technical effects and regulatory information. Retrieved information is then augmented with contextual data from the food category classification before being passed to the language model for explanation generation, ensuring that responses are grounded in authoritative sources while remaining accessible to consumers.
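The retrieve-then-augment step of this workflow can be sketched as follows. This is an illustrative Python reduction (the application itself runs in Swift); the toy 4-dimensional knowledge base, substance names, and prompt wording are invented for demonstration:

```python
import numpy as np

def cosine_top_k(query_vec, kb_vecs, k=3):
    """Indices and scores of the k most similar knowledge-base entries.
    Vectors are assumed L2-normalized, so a dot product equals cosine
    similarity."""
    sims = kb_vecs @ query_vec
    order = np.argsort(sims)[::-1][:k]
    return order, sims[order]

def build_prompt(additive, facts, food_category):
    """Assemble an augmented prompt from retrieved facts plus the food
    category context (wording illustrative, not the production prompt)."""
    context = "\n".join(f"- {f}" for f in facts)
    return (f"Using only the facts below, explain what {additive} does in a "
            f"'{food_category}' product, in plain English.\nFacts:\n{context}")

# Toy pre-normalized 4-d embeddings for three substances (hypothetical values).
kb = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.6, 0.8, 0.0, 0.0]])
names = ["citric acid", "sodium benzoate", "potassium sorbate"]
idx, scores = cosine_top_k(kb[1], kb, k=2)
prompt = build_prompt(names[idx[0]], ["Antimicrobial preservative (FDA SAFI)"],
                      "carbonated beverages")
```

Retrieved facts are injected verbatim into the prompt, which is what grounds the generated explanation in the SAFI entries rather than the model's parametric memory.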
The cleaning pipeline addressed multiple data quality issues inherent in the government database export. HTML entities and formatting artifacts (e.g., special characters and embedded HTML markup tags) were systematically removed using regular expression patterns and HTML unescaping utilities. The “Other Names” field, containing pipe-delimited alternative nomenclature for each substance, required special handling to preserve synonym relationships while removing redundant entries. Technical effect descriptions were normalized from their original all-caps format to sentence case, with multi-effect entries separated by pipe delimiters for structured retrieval. Missing data analysis revealed that 183 substances (4.6%) lacked technical effect descriptions and 26 substances (0.7%) had no alternative names listed, necessitating fallback strategies in the retrieval system.
For efficient on-device search capabilities, we generated semantic embeddings for all substances using a custom embedding algorithm optimized for chemical nomenclature. The embedding generation process created 384-dimensional vectors capturing both character-level patterns common in chemical names (e.g., suffixes like “-ate”, “-ine”, and “-ide”) and word-level semantic features. These embeddings enable fuzzy matching for additive identification, crucial for handling variations in naming conventions and potential OCR errors from ingredient label scanning.
The GBFPD contains 1 048 575 branded food products with detailed compositional data, though memory constraints required selective sampling for mobile deployment. We extracted 10% (104K) representative products across diverse food categories, prioritizing entries with complete ingredient lists exceeding 50 characters and containing multiple ingredients separated by commas. The original dataset contained diverse branded food category labels that were used directly without consolidation, resulting in 257 unique categories for classification. Each product's ingredient list underwent lowercase conversion and special character normalization to create consistent training examples.

The parsing pipeline for CSV data required specialized handling due to encoding issues and complex field structures. The original files used latin-1 encoding with nested quotation marks and comma-delimited fields containing internal commas. We developed a custom CSV parser that correctly handled these edge cases, extracting seven essential fields: FDC ID, brand owner, brand name, sub-brand name, ingredients, branded food category, and product description. These structured data enable the random sampling feature (n = 49k) on-device, allowing users to explore real product examples while learning about food additives in context.
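The parsing and filtering described above can be approximated in a few lines. The sketch below uses Python's standard csv module as a stand-in for the custom parser (the sample row, field names, and the exact selection rule are illustrative assumptions):

```python
import csv
import io

# Hypothetical two-row excerpt; real files would be read with
# open(path, encoding="latin-1"). Doubled quotes ("") inside a quoted field
# are the nested-quotation case mentioned above.
SAMPLE = (
    'fdc_id,brand_owner,brand_name,sub_brand,ingredients,category,description\n'
    '123,"Acme Foods","Acme","","WATER, SUGAR, CITRIC ACID, SODIUM BENZOATE, '
    '""NATURAL FLAVOR""","Soda","Lemon soda"\n'
    '456,"Beta","Bare","","WATER","Water","Plain water"\n'
)

def parse_branded_rows(text):
    """Parse CSV rows whose quoted fields contain internal commas and nested
    quotes, keeping only entries with complete ingredient lists
    (> 50 characters and comma-separated), as described above."""
    reader = csv.DictReader(io.StringIO(text))
    kept = []
    for row in reader:
        ing = (row.get("ingredients") or "").strip().lower()
        if len(ing) > 50 and "," in ing:
            kept.append(row)
    return kept

rows = parse_branded_rows(SAMPLE)
```

Here the soda entry survives the filter while the single-ingredient water entry is dropped, mirroring the prioritization of products with substantive ingredient lists.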
For mobile deployment, the trained PyTorch model underwent conversion to Core ML format using coremltools with post-training quantization. The conversion process included input tokenization compatibility layers to handle the DistilBERT vocabulary of 30 522 tokens within the Core ML framework. The final model package occupies 134 MB in the application bundle, with inference times within 0.5 s on iPhone 14 or newer devices. The tokenization pipeline implements special token handling for [CLS] and [SEP] markers while maintaining a maximum sequence length of 256 tokens to accommodate lengthy ingredient lists.
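The special-token framing and truncation behavior can be sketched as follows (Python, for illustration). This is a whitespace-level simplification: the production pipeline uses DistilBERT's WordPiece tokenizer, and the toy vocabulary below is invented (only the 101/102/100 IDs follow BERT convention):

```python
CLS, SEP, UNK, MAX_LEN = "[CLS]", "[SEP]", "[UNK]", 256

def encode(text, vocab):
    """Frame the input with [CLS]/[SEP] and truncate to MAX_LEN positions,
    mirroring the sequence handling described above (whitespace tokens stand
    in for WordPiece subwords)."""
    words = text.lower().split()
    words = words[: MAX_LEN - 2]          # reserve room for [CLS] and [SEP]
    tokens = [CLS] + words + [SEP]
    return [vocab.get(t, vocab[UNK]) for t in tokens]

# Toy vocabulary; unknown words fall back to [UNK].
vocab = {CLS: 101, SEP: 102, UNK: 100, "water": 7, "sugar": 8}
ids = encode("Water sugar salt", vocab)   # "salt" maps to [UNK]
```

Truncation before adding the markers guarantees that even very long ingredient lists produce sequences of exactly at most 256 tokens with both special tokens intact.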
The AI-powered extraction pathway utilizes the on-device Llama 3.2 3B model42 with carefully crafted prompts optimized for additive identification. The prompt engineering process involved iterative refinement to minimize false positives while maintaining high recall for less common additives. The system prompt explicitly defines inclusion criteria (preservatives, emulsifiers, stabilizers, artificial colors, etc.) and exclusion criteria (basic ingredients like flour, water, and sugar) to guide the model's extraction behavior. Post-processing validates AI-extracted additives against SAFI, filtering results through cosine similarity thresholds (empirically set at 0.244) to balance precision and recall.
Prompt engineering for the explanation generator followed evidence-based principles for technical communication to lay audiences. The system prompt instructs the model to use exact additive names as they appear in the original ingredients, explain each additive's function in 1–2 sentences using plain English, focus on why additives are used rather than chemical properties, and group similar additives when appropriate. The prompt explicitly incorporates food context, adjusting explanations based on product category—for instance, explaining citric acid as a flavor enhancer in beverages versus a chelating agent in canned vegetables. Temperature parameter tuning (set to 0.3) balances creativity with factual consistency, preventing hallucination while maintaining natural language flow.
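The structure of such a system prompt can be sketched as follows (Python, for illustration; the exact production wording is not published, so the rule text here is a paraphrase of the principles listed above, while the temperature of 0.3 and the 1000-token generation limit are taken from the text):

```python
def build_system_prompt(food_category):
    """Illustrative system prompt embodying the explanation-generator rules
    described above; wording is a paraphrase, not the production prompt."""
    return (
        "You explain food additives to consumers.\n"
        "Rules:\n"
        "1. Use the exact additive names from the original ingredient list.\n"
        "2. Explain each additive's function in 1-2 plain-English sentences.\n"
        "3. Focus on why the additive is used, not its chemical properties.\n"
        "4. Group similar additives where appropriate.\n"
        f"5. Tailor explanations to the product category: {food_category}."
    )

# Sampling settings reported in the text: low temperature trades creativity
# for factual consistency; generation is capped at 1000 tokens per request.
GENERATION_CONFIG = {"temperature": 0.3, "max_tokens": 1000}

prompt = build_system_prompt("beverages")
```

Injecting the food category into the prompt is what lets the same additive receive different explanations in different products, such as citric acid in beverages versus canned vegetables.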
The quantized model weights are distributed in MLX's native format, optimized for memory-mapped loading on Apple Silicon devices. This format enables rapid initialization without loading the entire model into RAM simultaneously, leveraging the unified memory architecture of modern iOS devices. The model bundle includes metadata headers specifying quantization parameters, vocabulary mappings, and architectural configuration. Memory mapping reduces application launch time from 12 seconds to 3 seconds on iPhone 14+ devices while maintaining a peak memory footprint of 2.32 GB during inference operations. The bundled model path “Llama-3.2-3B-Instruct-4bit” is embedded directly in the application resources, eliminating the need for runtime downloads.
Performance profiling using Xcode Instruments revealed critical optimization opportunities in the attention mechanism. We implemented Flash Attention-inspired optimizations within MLX constraints, including chunked attention computation to maintain activation tensors within the Neural Engine's 20 MB scratchpad memory. The model operates with a maximum token generation limit of 1000 tokens per request. The implementation achieves 15.2% GPU utilization and 24% CPU utilization during inference, with the display consuming the remaining computational resources for UI updates.
The embedding generation algorithm processes each additive name through text normalization (lowercasing and tokenization) before feature extraction. For character-level features, the algorithm iterates through each character position and applies trigonometric transformations (using sine functions) weighted by word position and character ASCII values to generate embedding activations. Word-level features incorporate position weighting, where words appearing earlier in the additive name receive higher weights, reflecting the convention that primary substances typically appear first. Chemical pattern features activate specific embedding dimensions based on detection of common chemical suffixes including “acid”, “ate”, “ine”, “ium”, “ide”, “oxy”, “meth”, “eth”, and “prop”, enabling the system to recognize chemical families and functional groups.
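The three feature families can be combined as in the following sketch (Python, for illustration; the exact weighting scheme and dimension assignment of the production algorithm are not published, so the index arithmetic here is an invented stand-in that only demonstrates the structure):

```python
import math

DIM = 384
SUFFIXES = ["acid", "ate", "ine", "ium", "ide", "oxy", "meth", "eth", "prop"]

def embed(name):
    """Toy version of the hand-crafted embedding described above: sine-based
    character activations weighted by word position, plus dedicated
    dimensions for common chemical suffixes."""
    vec = [0.0] * DIM
    words = name.lower().split()
    for w_pos, word in enumerate(words):
        word_weight = 1.0 / (1 + w_pos)              # earlier words weigh more
        for c_pos, ch in enumerate(word):
            # Character-level feature: position- and ASCII-weighted sine.
            idx = (ord(ch) * (c_pos + 1)) % (DIM - len(SUFFIXES))
            vec[idx] += word_weight * math.sin(ord(ch) * (c_pos + 1))
        for s_i, suf in enumerate(SUFFIXES):          # chemical-family features
            if suf in word:
                vec[DIM - len(SUFFIXES) + s_i] += word_weight
    return vec

v = embed("sodium benzoate")
```

In this toy version the last nine dimensions act as chemical-family detectors: "sodium" activates the "ium" dimension and "benzoate" the "ate" dimension, so substances sharing functional suffixes land near each other in the embedding space.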
Vector normalization ensures consistent similarity metrics across the embedding space. Each embedding undergoes L2 normalization, projecting vectors onto the unit hypersphere to enable cosine similarity calculations using simple dot products. The normalized embeddings are serialized to JSON format with 16-bit float precision, reducing storage requirements to 18.08 MB for the complete SAFI while maintaining sufficient precision for similarity calculations. The embedding index loads into memory during application initialization, enabling sub-millisecond similarity searches without disk I/O operations.
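The normalization and 16-bit serialization steps can be sketched as follows (Python, for illustration; since JSON has no native float16 type, values are assumed to be stored as decimals after float16 rounding):

```python
import json
import numpy as np

def normalize_and_pack(embeddings):
    """L2-normalize each vector so that dot products equal cosine similarity,
    then round-trip through float16 to mimic 16-bit storage precision."""
    packed = {}
    for name, vec in embeddings.items():
        v = np.asarray(vec, dtype=np.float32)
        v = v / np.linalg.norm(v)                 # project onto unit sphere
        packed[name] = np.float16(v).tolist()     # 16-bit rounding
    return json.dumps(packed)

# Toy 2-d example: (3, 4) normalizes to (0.6, 0.8).
blob = normalize_and_pack({"citric acid": [3.0, 4.0]})
data = json.loads(blob)
```

After this step, similarity search reduces to a single dot product per database entry, and the slight float16 rounding (on the order of 1e-4 per component) is negligible relative to the matching thresholds used downstream.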
The second stage employs embedding-based similarity search for fuzzy matching when exact matches fail. The system computes cosine similarity between query embeddings and all database embeddings using vectorized operations accelerated by the Accelerate framework's vDSP functions. A dynamic thresholding mechanism adjusts the similarity cutoff based on query characteristics: shorter queries (under 10 characters) require higher similarity scores (>0.5) to prevent false positives, while longer chemical names accept lower thresholds (>0.244) to accommodate nomenclature variations. The retrieval system also implements query expansion for common additive categories, automatically searching for related terms when queries match category patterns (e.g., expanding “artificial colors” to include specific FD&C dyes).
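The length-dependent cutoff can be sketched as follows (Python, for illustration; the 0.5 and 0.244 thresholds come from the text, while the candidate names and similarity scores below are invented):

```python
def similarity_threshold(query):
    """Dynamic cutoff described above: short queries (< 10 characters) must
    clear 0.5 to avoid false positives; longer chemical names accept the
    empirically tuned 0.244 threshold."""
    return 0.5 if len(query) < 10 else 0.244

def fuzzy_match(query, candidates):
    """candidates: {name: cosine_similarity}. Returns names clearing the
    dynamic threshold, highest similarity first."""
    cut = similarity_threshold(query)
    hits = [(sim, name) for name, sim in candidates.items() if sim > cut]
    return [name for sim, name in sorted(hits, reverse=True)]

# A short query applies the strict cutoff; a full chemical name is lenient.
short_hits = fuzzy_match("msg", {"monosodium glutamate": 0.45,
                                 "magnesium sulfate": 0.60})
long_hits = fuzzy_match("monosodium glutamat",          # OCR-truncated query
                        {"monosodium glutamate": 0.45})
```

The asymmetry is deliberate: a three-letter abbreviation matches many database entries weakly, so only strong matches are trusted, whereas a long name with one OCR error still carries enough signal to accept a weaker score.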
Post-processing of OCR output addresses common recognition errors specific to food additives. The system implements a domain-specific spell correction algorithm using Levenshtein distance weighted by character confusion probabilities derived from empirical OCR error analysis. Common misrecognitions (e.g., “1” vs. “l” and “0” vs. “O”) receive special handling when occurring within known additive names. The correction algorithm maintains a confidence threshold, flagging uncertain recognitions for user verification rather than silently introducing errors. Ingredient list detection employs keyword spotting for markers like “INGREDIENTS:”, “CONTAINS:”, or “MADE WITH:”, automatically extracting relevant text regions while filtering nutritional information and marketing claims.
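The confusion-weighted distance and thresholded correction can be sketched as follows (Python, for illustration; the 0.2 discount for look-alike glyphs and the acceptance cutoff of 1.0 are invented values standing in for the empirically derived confusion probabilities):

```python
# Known OCR look-alike pairs receive a discounted substitution cost.
CONFUSIONS = {("1", "l"), ("l", "1"), ("0", "o"), ("o", "0")}

def edit_distance(a, b):
    """Levenshtein distance with a cheap substitution cost (0.2, illustrative)
    for character pairs commonly confused by OCR."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)
    for j in range(n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                sub = 0.0
            elif (a[i - 1], b[j - 1]) in CONFUSIONS:
                sub = 0.2
            else:
                sub = 1.0
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d[m][n]

def correct(token, lexicon, max_cost=1.0):
    """Return the closest known additive name, or None when every candidate
    exceeds max_cost (the token is then flagged for user verification
    rather than silently corrected)."""
    best = min(lexicon, key=lambda w: edit_distance(token.lower(), w))
    return best if edit_distance(token.lower(), best) <= max_cost else None
```

Under this weighting, "xy1ito1" (two 1/l confusions) sits at distance 0.4 from "xylitol" and is corrected, while an unrelated token exceeds the cutoff and is surfaced to the user instead.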
The results presentation employs a three-tier information architecture accommodating different user expertise levels. The first tier displays identified additives with single-sentence purpose descriptions suitable for general consumers. The second tier, revealed through expandable sections, provides technical effects, alternative names, and regulatory classifications for users seeking deeper understanding. The third tier, accessed via CFR links, connects to federal regulations for professional users requiring legal documentation. This graduated approach prevents information overload while ensuring comprehensive access for specialized needs. Animation transitions (SwiftUI's withAnimation) provide visual continuity between states, with spring animations (response: 0.3, damping fraction: 0.7) creating responsive feedback for user interactions.
Privacy protection extends to the application's data persistence layer, which employs Core Data with SQLite backing stores that benefit from iOS's default device encryption when the device is locked. User interaction history, including scanned ingredients and generated explanations, remains confined to the device's encrypted storage partition. The application deliberately avoids using UserDefaults for sensitive data, with plans to leverage the Keychain Services API for any authentication tokens that may be required for future premium features. The application implements a privacy-first architecture with no third-party analytics or tracking frameworks integrated into the codebase.
Core Data optimization for history storage implements batch faulting and relationship prefetching to minimize memory overhead when displaying analysis history. Fetch request configurations load only the attributes needed for display, with full additive details retrieved on demand through relationship traversal. The OCR system for ingredient scanning processes images directly without persistent storage, converting captured images immediately to text through the Vision framework's VNRecognizeTextRequest. After text extraction, the original image data are released from memory, maintaining only the extracted ingredient text for analysis. This approach eliminates the need for image caching infrastructure while ensuring efficient memory utilization during the scanning process.
The user interface employs predictive prefetching for likely user actions, pre-warming the OCR pipeline when the camera button becomes visible and pre-generating embeddings for the text field content during typing pauses. SwiftUI's task priority system ensures UI updates receive precedence over background processing, with inference operations running in detached tasks at background priority (Task.detached(priority: .background)) to prevent interface stuttering. Memory pressure responses implement graceful degradation, first clearing image caches, then unloading the classification model, and finally reducing LLM context length, ensuring core functionality remains available even on memory-constrained devices.
To benchmark the specialized DistilBERT classifier against general-purpose large language models, we evaluated GPT-4o (gpt-4o-2024-08-06) on the same 108 samples used for additive identification validation (Section 2.10.2). The GPT-4o API was queried with a system prompt defining it as a food science expert and a user prompt requesting top-3 category predictions based on ingredient lists, with responses structured in JSON format for consistent parsing. GPT-4o then self-evaluated its predictions against ground truth categories using a separate prompt that assessed top-1 and top-3 accuracy. This comparison provides context for the performance of task-specific models versus general-purpose AI agents in food categorization tasks.
To benchmark the hybrid additive identification system against general-purpose large language models, we evaluated GPT-4o (gpt-4o-2024-08-06) on the same 108 samples and ground truth annotations. The GPT-4o API was queried with a system prompt (“Identify all food additives from an ingredient list”) and a user prompt supplying each ingredient list, with responses structured in JSON format containing an array of additive names. For each sample, GPT-4o's identified additives were compared against the manually annotated ground truth using the same evaluation criteria applied to FAL: precision (percentage of GPT-4o-identified additives that are correct), recall (percentage of true additives identified by GPT-4o), and F1-score. Alternative chemical names were accepted as correct matches according to FDA documentation. This comparison contextualizes hybrid database-AI approaches against pure general-purpose AI agents in additive identification tasks.
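Parsing the structured responses might look like the following Foundation sketch; the exact response schema (a top-level `additives` array) is an assumption for illustration, and the lowercasing normalization is one plausible way to make matching case-insensitive.

```swift
import Foundation

// Assumed shape of the JSON we ask GPT-4o to return.
struct AdditiveResponse: Codable {
    let additives: [String]
}

/// Decodes a structured model response into normalized additive names,
/// returning an empty list on malformed output rather than crashing.
func parseAdditives(from json: String) -> [String] {
    guard let data = json.data(using: .utf8),
          let resp = try? JSONDecoder().decode(AdditiveResponse.self, from: data)
    else { return [] }
    return resp.additives.map { $0.lowercased() }
}
```

Failing closed on malformed JSON keeps a single bad API response from aborting the whole evaluation run.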
Fig. 3 Xcode Instrument analysis of the application on MacBook Air. (a) Memory usage patterns. (b) CPU utilization characteristics. (c) Energy impact.
Fig. 4 Xcode Instrument analysis of the application on iPhone 14. (a) Memory usage patterns. (b) CPU utilization characteristics. (c) Energy impact.
During active processing phases involving additive identification and LLM generation, memory usage exhibited a characteristic dual-peak pattern. The first peak corresponded to additive identification operations, while the second, higher peak of 2.36 GB occurred during explanation generation. This generation peak represents the maximum memory demand of the system, encompassing the loaded model, active computation graphs, and intermediate tensor storage. Following generation completion, memory usage stabilized at 1.87 GB, indicating successful cleanup of temporary computational structures.
iPhone 14 performance demonstrated similar patterns with platform-appropriate scaling. Model initialization required 1.84 GB of stable memory (33% of total system memory) with peaks reaching 1.92 GB. The higher percentage utilization on iPhone reflects the device's 6 GB total memory compared to the MacBook's 16 GB, yet the absolute memory requirements remained remarkably consistent. During processing, the iPhone exhibited the same dual-peak pattern with a maximum of 2.32 GB during generation, stabilizing at 1.96 GB post-processing.
The consistency of absolute memory requirements across platforms validates the quantized model's efficiency and the MLX framework's optimization. The peak memory usage of approximately 2.3 GB across both devices demonstrates successful resource management within mobile hardware constraints while maintaining full functionality.
| Component | Mean (seconds) | Std dev (seconds) |
|---|---|---|
| Additive embeddings loading | 0.853 | 0.025 |
| CFR data loading | 0.238 | 0.008 |
| Food category classifier model loading | 2.399 | 0.123 |
| Llama-3.2-3B-Instruct-4bit model loading | 4.893 | 0.135 |
| OCR processing | 0.932 | 0.027 |
| Food category classification | 0.112 | 0.046 |
| Additive identification | 3.488 | 0.407 |
| Explanation generation initialization | 2.847 | 0.106 |
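As a back-of-envelope check on the table above, summing the four one-time loading components gives the cold-start cost, under the assumption that nothing is loaded concurrently.

```swift
// Mean loading latencies (seconds) for the one-time startup components,
// taken from the table above.
let loadingMeans: [String: Double] = [
    "additive embeddings": 0.853,
    "CFR data": 0.238,
    "category classifier": 2.399,
    "Llama-3.2-3B-Instruct-4bit": 4.893,
]

// Sequential worst case: every component loaded one after another.
let coldStart = loadingMeans.values.reduce(0, +)  // ≈ 8.4 s
```

The 4-bit LLM dominates this budget (4.9 s of 8.4 s), which is why pre-warming it during app launch rather than on first query matters for perceived responsiveness.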
iPhone 14 CPU usage showed more conservative patterns, with typical generation loads of 72% against 600% total available capacity (6-core configuration). Initialization peaks reached 129%, higher than generation loads, suggesting that model loading operations place greater instantaneous demands on CPU resources than sustained inference. The lower sustained utilization during generation reflects the iPhone's thermal management optimizations and the efficiency gains from the Neural Engine integration.
The utilization patterns demonstrate successful adaptation to hardware constraints while maintaining performance targets. The MacBook's higher sustained utilization leverages available thermal headroom and power resources, while the iPhone's more conservative approach balances performance with battery life and thermal constraints.
MacBook Air energy traces averaged 46 wake events per second during active generation (9 in the final second of the measurement window). These wake patterns indicate intensive computational activity balanced with efficient scheduling, preventing unnecessary background processing while maintaining responsive user interaction.
The GPU utilization of 15.2% reflects a deliberate optimization choice in memory management. The GPU chunk size was constrained to 20 MB to balance peak memory usage against inference speed. Increasing the GPU memory allocation would improve token generation rates but would result in non-linear increases in peak memory consumption, potentially exceeding device capabilities during complex analyses.
To provide a detailed quantitative characterization of energy consumption, we conducted 60-second Power Profiler measurements in iOS developer mode on iPhone 14 (n = 5 runs), capturing complete FAL workflows from initialization through explanation generation. Fig. 5 illustrates a representative power consumption and CPU profile over the measurement period, showing the temporal dynamics of energy utilization across workflow stages; Table 2 summarizes the component-specific energy metrics. The analysis revealed an average power usage of 40.260 ± 5.141% per hour with display brightness maintained at 71.6 ± 2.5%. On Apple's impact scale, which classifies 0–15 as low, 15–40 as medium, and >40 as high, CPU impact was 6.160 ± 0.559, GPU impact 18.660 ± 2.248, and display impact 5.140 ± 0.371. Note that Apple intentionally reports energy consumption as relative impact metrics rather than absolute units (e.g., watts or milliampere-hours), because power consumption varies significantly across device models, thermal conditions, and usage contexts, making standardized comparisons challenging. The GPU impact of 18.660 places FAL within the medium range despite the computational demands of on-device LLM inference, while the CPU impact of 6.160 indicates efficient computational resource utilization. The display impact of 5.140 corresponds to progressive disclosure interface updates during streaming text generation. Critically, networking impact remained at 0 across all measurements, validating the complete offline operation of the privacy-preserving architecture. The power usage rate of 40.3% per hour implies approximately 2.5 hours of sustained continuous operation, though typical real-world usage involves brief, intermittent queries rather than continuous processing.
Fig. 5 Representative 60-second power monitoring and CPU usage profiles from app initialization through explanation generation completion.
| Metric | Mean | Std dev |
|---|---|---|
| Power usage (% per hour) | 40.260 | 5.141 |
| Display brightness (%) | 71.6 | 2.5 |
| CPU impact | 6.160 | 0.559 |
| GPU impact | 18.660 | 2.248 |
| Display impact | 5.140 | 0.371 |
| Networking impact | 0 | 0 |
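The continuous-runtime estimate quoted above follows directly from the measured drain rate; a one-line sketch makes the arithmetic explicit.

```swift
/// Hours of sustained operation from a full charge, given a measured
/// battery drain rate in percent per hour.
func estimatedRuntimeHours(drainPercentPerHour: Double) -> Double {
    precondition(drainPercentPerHour > 0)
    return 100.0 / drainPercentPerHour
}

// Measured drain of 40.26% per hour yields roughly 2.5 h of
// continuous generation.
let hours = estimatedRuntimeHours(drainPercentPerHour: 40.26)
```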
The iPhone's higher memory utilization percentage (33% vs. 11.4% during initialization) demonstrates successful optimization for resource-constrained environments. Despite operating closer to hardware limits, the iPhone maintains full functionality without performance degradation, indicating robust memory management and efficient model quantization.
CPU utilization differences reflect platform-specific optimization strategies. The MacBook's higher utilization leverages available thermal and power headroom for maximum performance, while the iPhone's conservative approach prioritizes sustained operation and battery efficiency. Both approaches achieve target token generation rates of 13–30 tokens per second, ensuring consistent user experience across devices.
The energy analysis reveals the fundamental trade-offs of on-device AI processing. While sustained generation carries a substantial energy cost, this approach eliminates network dependencies, ensures complete privacy, and provides instantaneous responses. The GPU-dominated energy profile (GPU impact 18.660 versus display impact 5.140 in Table 2) suggests that inference and memory-management optimizations, rather than interface changes, offer the largest remaining efficiency gains without compromising core AI functionality.
These performance characteristics validate the technical feasibility of sophisticated on-device AI for consumer applications while highlighting the platform-specific optimizations necessary for effective deployment across diverse hardware configurations.
Confidence calibration analysis revealed well-calibrated model behavior, with clear separation between correct and incorrect top-1 predictions (Fig. 7). Correct predictions exhibited an average confidence of 0.852, while incorrect predictions averaged 0.549, indicating the model's ability to distinguish between confident correct classifications and uncertain cases. The confidence distribution analysis showed most correct predictions clustered at high confidence scores (0.8–1.0), with a small number of low-confidence correct predictions appearing as outliers. This distribution pattern supports the implementation of confidence-based user confirmation thresholds in the deployed system.
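The calibration gap supports a simple confirmation gate of the following form; the 0.7 threshold is illustrative only, chosen to sit between the mean confidences of correct (0.852) and incorrect (0.549) predictions, and the type names are not the app's actual API.

```swift
// Illustrative types for routing a top-1 classification result.
struct Prediction {
    let category: String
    let confidence: Double
}

enum NextStep {
    case autoAccept        // high confidence: proceed without prompting
    case askUserToConfirm  // low confidence: show top-3 choices instead
}

/// Routes a prediction based on a calibration-informed threshold.
func route(_ top1: Prediction, threshold: Double = 0.7) -> NextStep {
    top1.confidence >= threshold ? .autoAccept : .askUserToConfirm
}
```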
Category-specific performance identified systematic challenges in certain food categories. Detailed accuracy-by-category performance can be found in SI Fig. S1. The highest error rates occurred in “Soda” (104 errors), “Popcorn, Peanuts, Seeds & Related Snacks” (35 errors), and “Yogurt” (34 errors). The high error rate for soda products likely reflects ingredient list similarities with other beverage categories, while snack food errors suggest overlapping ingredient profiles across snack subcategories. Yogurt classification challenges may stem from the diverse range of yogurt-based products that blur category boundaries with desserts and beverages.
Unknown token analysis revealed vocabulary limitations affecting model performance. The validation set contained substantial numbers of out-of-vocabulary ingredients, with most samples (>85%) containing ≥5 unknown tokens. The most frequent unknown tokens included nutritional additives (niacin: 584 occurrences, mononitrate: 471 occurrences, riboflavin: 358 occurrences) and common food processing ingredients (starch: 454 occurrences, citric: 439 occurrences, lecithin: 269 occurrences). This pattern indicates that the DistilBERT vocabulary, trained primarily on general text corpora, lacks comprehensive coverage of food industry terminology, particularly nutritional supplements and specialized food additives.
The validation results demonstrate acceptable performance for consumer-facing food categorization while highlighting areas for potential improvement. The 87.2% top-3 accuracy provides sufficient reliability for the application's use case, where users can select from multiple category suggestions. The clear confidence calibration enables effective implementation of uncertainty-based user confirmation, enhancing system reliability in ambiguous cases.
Comparative evaluation of GPT-4o on the same 108 samples using structured prompts for additive identification yielded overall precision of 0.841, recall of 0.762, and F1-score of 0.800, with per-entry means of 0.838 ± 0.189 (precision), 0.787 ± 0.206 (recall), and 0.793 ± 0.173 (F1-score). GPT-4o correctly identified 635 additives with 120 false positives and 198 false negatives, compared to FAL's hybrid system identifying 652 correct additives with 244 false positives and 175 false negatives. While GPT-4o demonstrated higher precision (0.841 vs. 0.728), FAL achieved slightly higher recall (0.788 vs. 0.762), resulting in comparable F1-scores (FAL: 0.757; GPT-4o: 0.800). The performance differences reflect trade-offs in precision-recall balance, with GPT-4o showing more conservative identification (fewer false positives) while FAL's hybrid approach captures more true additives at the cost of additional false positives. Despite GPT-4o being orders of magnitude larger and requiring cloud-based processing, the comparable performance validates FAL's hybrid database-AI architecture for on-device deployment in consumer applications.
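The corpus-level figures above can be recomputed directly from the reported true-positive, false-positive, and false-negative counts.

```swift
/// Standard corpus-level precision, recall, and F1 from raw counts.
func metrics(tp: Int, fp: Int, fn: Int)
    -> (precision: Double, recall: Double, f1: Double) {
    let p = Double(tp) / Double(tp + fp)
    let r = Double(tp) / Double(tp + fn)
    return (p, r, 2 * p * r / (p + r))
}

// Counts reported above for each system.
let gpt4o = metrics(tp: 635, fp: 120, fn: 198)  // ≈ (0.841, 0.762, 0.800)
let fal   = metrics(tp: 652, fp: 244, fn: 175)  // ≈ (0.728, 0.788, 0.757)
```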
The food category classifier's top-3 predictions were “Processed Cereal Products”, “Cookies & Biscuits”, and “Biscuits/Cookies”, with confidence scores of 21.8%, 20.9%, and 16.6%, respectively. After the user selected the “Cookies & Biscuits” category, the app identified 13 additives from the ingredient list: canola oil, DATEM, disodium pyrophosphate, ferric orthophosphate, folic acid, iron, niacin, niacinamide, pyridoxine hydrochloride, riboflavin, soy lecithin, vitamin B1, and vitamin B2. For each identified additive, the app showed the matched additive name in SAFI, the “used for” technical effects, alternative names, and the CFR link for the additive. For example, disodium pyrophosphate was matched to sodium acid pyrophosphate in SAFI; it has a range of technical effects (e.g., as a leavening agent in cookies), its listed alternative names include diphosphoric acid, and its regulatory reference is 21 CFR 182.1087.
The LLM agent grouped the additives with similar technical effects to provide clear and user-friendly explanations as below: “Let's break down the additives in this cookie recipe. I'll explain what each one does in simple terms. Emulsifiers: (1) CANOLA OIL: acts as an emulsifier, helping mix oil and water-based ingredients together and making the dough easier to work with. (2) SOY LECITHIN: also helps mix oil and water-based ingredients, creating a smooth dough. Stabilizers and thickeners: (1) CANOLA OIL: besides being an emulsifier, it also helps stabilize the dough and keep it moist. (2) BAKING SODA: helps release carbon dioxide gas, causing the dough to rise and giving the cookies their light texture. Coloring agents: (1) RIBOFLAVIN (VITAMIN B2): acts as a coloring agent, giving the cookies a yellow color. (2) DISODIUM PYROPHOSPHATE: can also act as a coloring agent, although its primary function is as a leavening agent. Flavor enhancers: (1) NIACIN: provides a flavor and nutritional boost to the cookies. (2) NICOTINAMIDE: enhances flavor and serves as a nutrient supplement. (3) PYRIDOXINE HYDROCHLORIDE: acts as a flavor enhancer and nutrient supplement. (4) VITAMIN B1: contributes to flavor and serves as a nutrient supplement. (5) RIBOFLAVIN: also serves as a flavor enhancer and nutrient supplement. Nutrient supplements: (1) FERRIC ORTHOPHOSPHATE: provides iron, an essential nutrient. (2) FOLIC ACID: acts as a nutrient supplement, providing folate. (3) IRON: serves as a nutrient supplement, providing iron. (4) NIACIN: provides a nutrient supplement, niacin. (5) PYRIDOXINE HYDROCHLORIDE: acts as a nutrient supplement, providing vitamin B6. (6) RIBOFLAVIN: provides a nutrient supplement, riboflavin. These additives work together to create a delicious and nutritious cookie”.
The case study demonstrates successful end-to-end performance across all three AI agents. The OCR system accurately extracted the complete ingredient list from the biscuit packaging, the food category classifier correctly identified “Cookies & Biscuits” among its top-3 predictions (20.9% confidence), and the additive identification system successfully identified 13 additives with proper SAFI matching and regulatory links. The explanation generator effectively organized these additives into functional categories (emulsifiers, stabilizers, coloring agents, flavor enhancers, and nutrient supplements) with consumer-friendly explanations, validating the system's ability to transform complex ingredient information into accessible, science-based consumer education.
By utilizing a pre-quantized 4-bit version of Llama 3.2 3B, the system achieves a generation speed of 13–30 tokens/second on consumer smartphones, validating the practical deployment of large language models without compromising functionality or privacy.
The integration of authoritative databases (FDA's SAFI and USDA's GBFPD) through retrieval-augmented generation establishes a paradigm for grounding AI responses in regulatory sources, critical for health-related applications where misinformation poses genuine risks. The system's complete offline operation eliminates privacy concerns inherent in cloud-based solutions, addressing a primary barrier to digital health tool adoption. Performance consistency across devices (iPhone 14 and MacBook Air M1) with a peak memory usage of 2.36 GB demonstrates practical deployment feasibility across the iOS ecosystem.
Several limitations warrant acknowledgment. The DistilBERT vocabulary's limited coverage of food-specific terminology (>85% of samples containing ≥5 unknown tokens) suggests potential improvements through domain-specific pre-training. The additive identification system's tendency toward over-identification (precision: 0.728) may require refinement for professional applications. Explanation quality assessment, while encouraging, relied on limited evaluators and would benefit from larger-scale user studies.
Future work should explore expansion to nutrition claims analysis, integration with electronic health records for personalized dietary guidance, and adaptation for international regulatory frameworks. The modular architecture facilitates extension to other food science domains including allergen detection, sustainability metrics, and nutritional optimization. The demonstrated success of on-device AI for complex scientific communication suggests broader applications in combating misinformation across health and science domains while preserving user privacy and autonomy.
Supplementary information: high-resolution food category classification accuracy results and a demo video of Food Additive Lens are provided. See DOI: https://doi.org/10.1039/d5dd00444f.
| This journal is © The Royal Society of Chemistry 2026 |