Inspiration

One of our friends has a nut allergy and uses an app to check that the foods he buys do not contain ingredients that could trigger a reaction. Apps like this require users to scan each item's barcode to determine whether it is safe, which means taking every product off the shelf, turning it over, and scanning the barcode or ingredient list. After he broke his arm playing basketball, this process became especially difficult: scanning items one by one was slow and, at times, nearly impossible with only one hand. We wanted to build a hardware solution to help people like him.

The tools he previously used also relied on static barcode databases, and barcode data can be inaccurate or out of date. When a manufacturing error leads to a recall reported in the news, for example, a barcode scan alone would not reflect it. We wanted something more intelligent: by using generative AI, our solution can search online for recalls, inconsistencies, or special situations to better protect the user. We were also inspired by Cathy Hackl's talk on spatial computing at the HackBI opening ceremony on Saturday morning, and wanted to build a solution that leverages AR to help people like our friend.

Finally, this solution is not just for one person. Over 250 million people worldwide suffer from food allergies, and a fast, accessible tool like this could make everyday grocery shopping significantly safer for them.

What it does

Metabolism lets users scan products on the shelves while walking down grocery store aisles. The user opens the app, selects the ingredients they are allergic to, and then shops as usual. As they walk, Metabolism uses OCR from Cloudflare's Workers AI to capture product names and descriptions from the packaging. Through DigitalOcean Gradient™ AI, we use Llama 3.2 to check whether each scanned product contains any of the selected allergens. It also checks whether the product was made in a facility that processes the allergen, and whether any active recalls or manufacturing errors have made an otherwise safe product unsafe.

Results appear as AR overlays on the app's camera view: green for products that are safe to buy, yellow for products whose risk is uncertain, and red for products that clearly contain an allergen. If the user has questions about a product, such as why it received a certain color, an ElevenLabs voice assistant lets them "chat" with the product itself.

For users with disabilities, or simply for convenience, Metabolism also comes in physical form. Using a Raspberry Pi 4 with an integrated display and camera, users can attach Metabolism to their shopping cart and scan products just by walking through the aisles. Finally, to make recognition faster and more accurate, users can label products with the allergens they contain and earn Solana rewards for doing so, giving people an incentive to contribute and improving trust in the data.

How we built it

Architecture & Core Stack

We built Metabolism as a three-tier system: a React + TypeScript frontend for the AR interface, a FastAPI backend that manages Auth0 authentication and MongoDB user profiles, and a Python WebSocket server that handles real-time camera processing. The camera server captures frames at 30 FPS but runs OCR only on every 30th frame (about once per second) to balance responsiveness against API costs. That cuts OCR calls by roughly 97% without a noticeable loss in user experience.
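The frame-sampling rule is simple enough to sketch. This is a minimal illustration assuming a per-frame counter in the camera server loop; `FrameSampler` and `ocr_call_reduction` are hypothetical names, not the project's actual code. At 30 FPS, sampling every 30th frame skips 29 of every 30 frames, so OCR runs once per second.

```python
from dataclasses import dataclass, field


@dataclass
class FrameSampler:
    """Hypothetical helper: pick which captured frames get sent for OCR."""

    every_n: int = 30  # run OCR on one frame out of every `every_n`
    _count: int = field(default=0, init=False)  # frames seen so far

    def should_process(self) -> bool:
        # True for the first frame and for every `every_n`-th frame after it.
        process = self._count % self.every_n == 0
        self._count += 1
        return process


def ocr_call_reduction(every_n: int) -> float:
    """Fraction of OCR calls avoided versus running OCR on every frame."""
    return 1 - 1 / every_n


if __name__ == "__main__":
    fps = 30
    sampler = FrameSampler(every_n=30)
    # Simulate 10 seconds of capture at 30 FPS.
    calls = sum(sampler.should_process() for _ in range(fps * 10))
    print(calls)                              # 10, i.e. one OCR call per second
    print(f"{ocr_call_reduction(30):.1%}")    # 96.7%
```

Bounding boxes can still be drawn on every frame; only the expensive OCR and LLM calls are throttled.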

Real-Time Text Detection & Analysis

For text detection, we run OpenCV's MSER algorithm client-side to identify text regions and draw colored bounding boxes around them (starting gray, then turning green, yellow, or red based on allergen risk). Sampled frames are sent to Cloudflare Workers AI, which runs the LLaVA 1.5 7B vision model for OCR; we chose it for its sub-200 ms latency on Cloudflare's edge network. The OCR text then feeds into Meta's Llama 3.2 3B via OpenRouter's free tier, which we prompt to analyze ingredients conservatively for allergens, cross-contamination warnings, and alternative ingredient names. The LLM returns structured JSON with a risk level and its reasoning.
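The write-up doesn't give the JSON schema, so the sketch below assumes a hypothetical reply shape such as `{"risk": "safe" | "caution" | "unsafe", "reason": "..."}` and shows one way to turn it into the gray/green/yellow/red box colors. In keeping with the conservative prompting, anything that fails to parse or uses an unknown label falls back to yellow, never green. Colors are BGR tuples, the channel order OpenCV drawing functions such as `cv2.rectangle` expect.

```python
import json
from enum import Enum
from typing import Optional, Tuple


class Risk(Enum):
    # Hypothetical labels; the real JSON schema is not given in the write-up.
    SAFE = "safe"
    CAUTION = "caution"
    UNSAFE = "unsafe"


# Bounding-box colors in BGR order, as OpenCV drawing functions expect.
PENDING_COLOR = (128, 128, 128)  # gray: detected text, not yet analyzed
RISK_COLORS = {
    Risk.SAFE: (0, 200, 0),       # green
    Risk.CAUTION: (0, 255, 255),  # yellow
    Risk.UNSAFE: (0, 0, 255),     # red
}


def parse_risk(llm_output: str) -> Tuple[Risk, str]:
    """Parse the model's JSON reply, failing toward CAUTION rather than SAFE."""
    try:
        data = json.loads(llm_output)
        risk = Risk(str(data["risk"]).strip().lower())
        reason = str(data.get("reason", ""))
    except (ValueError, KeyError, TypeError):
        # Malformed JSON (JSONDecodeError is a ValueError), a missing key,
        # an unknown label, or a non-object reply all end up here.
        return Risk.CAUTION, "Could not parse model output; treat as uncertain."
    return risk, reason


def box_color(risk: Optional[Risk]) -> Tuple[int, int, int]:
    """Gray while analysis is pending, otherwise the color for the risk level."""
    return PENDING_COLOR if risk is None else RISK_COLORS[risk]
```

Failing toward yellow means a model hiccup costs the shopper a second look rather than producing a false green.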

Voice Interface & Optimization

We integrated ElevenLabs text-to-speech so users can ask questions hands-free. LLM responses are converted to MP3 audio client-side. To keep the experience feeling real-time, we added aggressive caching (allergen results are cached for 60 seconds, and duplicate requests within 5 seconds are collapsed into one) along with WebSocket message compression, which reduced bandwidth by 40%.
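Here is a minimal sketch of the two caching rules, assuming results are keyed by normalized OCR text plus the user's allergen set. `AllergenResultCache` and `cache_key` are hypothetical names, and the clock is injectable so the timing can be checked without sleeping.

```python
import time
from collections.abc import Hashable
from typing import Any, Callable, Optional


class AllergenResultCache:
    """Hypothetical cache: keep results for `ttl` seconds and suppress
    repeat requests for the same key within `dedup_window` seconds."""

    def __init__(self, ttl: float = 60.0, dedup_window: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.dedup_window = dedup_window
        self._clock = clock
        self._results: dict = {}    # key -> (stored_at, result); results assumed non-None
        self._in_flight: dict = {}  # key -> time the last request was sent

    def get(self, key: Hashable) -> Optional[Any]:
        """Return a cached result if it is still fresh, else None."""
        entry = self._results.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at > self.ttl:
            del self._results[key]  # expired
            return None
        return result

    def should_request(self, key: Hashable) -> bool:
        """True if the caller should send a new API request for `key`."""
        if self.get(key) is not None:
            return False  # fresh result already cached
        now = self._clock()
        sent_at = self._in_flight.get(key)
        if sent_at is not None and now - sent_at < self.dedup_window:
            return False  # identical request sent moments ago
        self._in_flight[key] = now
        return True

    def put(self, key: Hashable, result: Any) -> None:
        """Store a result and clear the in-flight marker for `key`."""
        self._results[key] = (self._clock(), result)
        self._in_flight.pop(key, None)


def cache_key(ocr_text: str, allergens: set) -> tuple:
    # Normalize case and whitespace so small OCR differences share one entry.
    return (" ".join(ocr_text.lower().split()),
            frozenset(a.lower() for a in allergens))
```

Keying on the allergen set as well as the text matters: the same product can be red for one user and green for another.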

Raspberry Pi Deployment

The physical, cart-mounted version runs a stripped-down Python server on a Raspberry Pi 4 with a 7" touchscreen and a wide-angle camera. All heavy processing is offloaded to edge and cloud services; the Pi only handles frame capture and WebSocket communication. It is powered by a USB battery pack clipped to the shopping cart handle.

Challenges we ran into

Our initial architecture relied heavily on WebSockets to stream camera frames from our backend to our frontend. We encountered frequent connection drops, especially on Wi-Fi, and latency that made the real-time AR overlays feel sluggish. After extensive debugging, we moved to a hybrid approach that combines WebSocket pooling with an HTTP fallback and client-side frame buffering, which significantly improved reliability.

Deploying Gemini through DigitalOcean's platform also presented unexpected challenges, including API authentication and request formatting. The documentation didn't fully cover our use case (frequent, fast analysis requests), and we ran into rate limiting. We solved this by batching requests and caching common product lookups, which reduced API calls by 60% while maintaining response speed.

We also originally planned to use custom computer vision models for product recognition, but quickly discovered they couldn't load efficiently on the Raspberry Pi 4's limited resources: initial load times exceeded 30 seconds. Cloud-hosted CV models, on the other hand, introduced unacceptable latency (2 to 3 seconds per frame). Stepping back, we realized that products needing strict allergen identification almost always have readable text labels, while items that resist OCR (fresh produce, bulk items) typically don't require the same level of scrutiny. This insight led us to pivot entirely to OCR-based recognition, which proved faster, more accurate for our needs, and significantly less resource-intensive.
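The hybrid transport can be sketched as follows, assuming each transport is modeled as a plain callable that raises `ConnectionError` on failure (in a real client these would wrap a WebSocket send and an HTTP POST). `HybridFrameSender`, its round-robin pool, and the 5-frame buffer are hypothetical; the write-up doesn't describe how the pool or buffer were actually built.

```python
from collections import deque
from typing import Callable, List

Send = Callable[[bytes], None]  # assumed contract: raises ConnectionError on failure


class HybridFrameSender:
    """Hypothetical sender: try a pool of WebSocket connections, fall back to HTTP.

    Frames wait in a small bounded buffer, so a brief outage drops the
    oldest frames instead of stalling capture.
    """

    def __init__(self, ws_pool: List[Send], http_post: Send, buffer_size: int = 5) -> None:
        self.ws_pool = ws_pool
        self.http_post = http_post
        self.buffer: deque = deque(maxlen=buffer_size)  # oldest frames drop first
        self._next = 0  # round-robin position in the pool

    def enqueue(self, frame: bytes) -> None:
        self.buffer.append(frame)

    def _send_one(self, frame: bytes) -> str:
        # Try each pooled WebSocket once, starting from the round-robin position.
        for i in range(len(self.ws_pool)):
            idx = (self._next + i) % len(self.ws_pool)
            try:
                self.ws_pool[idx](frame)
                self._next = (idx + 1) % len(self.ws_pool)
                return "ws"
            except ConnectionError:
                continue
        self.http_post(frame)  # last resort; may itself raise ConnectionError
        return "http"

    def flush(self) -> List[str]:
        """Send buffered frames oldest-first; stop if every transport fails."""
        used = []
        while self.buffer:
            try:
                used.append(self._send_one(self.buffer[0]))
            except ConnectionError:
                break  # keep the frame buffered and retry on the next flush
            self.buffer.popleft()
        return used
```

Keeping only the newest few frames suits AR: a stale frame's bounding boxes no longer line up with the live camera view, so dropping it is usually better than replaying it late.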

Accomplishments that we're proud of

In just 36 hours, we integrated seven different platforms and APIs (Auth0, MongoDB Atlas, Cloudflare Workers AI, DigitalOcean Gradient™ AI, Gemini, ElevenLabs, and Solana) into a cohesive, working application. Each integration meant learning new documentation, authentication patterns, and API quirks, and we are especially proud that we managed it without breaking existing features.

What we learned

We learned the importance of flexibility in our architecture. Relying on WebSockets alone for continuous camera frame streaming led to connection instability and latency. Moving to a hybrid approach with WebSocket pooling, an HTTP fallback, and client-side buffering made the real-time AR overlays more reliable and responsive, and taught us to design for real network conditions rather than ideal ones. Experimenting with custom computer vision models also showed us the limits of hardware like the Raspberry Pi 4: on-device load times were prohibitive, and cloud-based models introduced their own latency. Analyzing what we actually needed led us to OCR-based recognition, which was faster and more efficient for our application. The broader lesson was to choose technology based on available resources and the specific requirements of the application, which ultimately gave us a more streamlined and effective solution.

What's next for Metabolism

Our current Raspberry Pi 4 prototype proves the concept, but we're exploring more compact designs using smaller single-board computers such as the Raspberry Pi Zero 2 W, or a custom PCB. The goal is a smartphone-sized device that clips unobtrusively onto any shopping cart. We're also integrating a directional speaker system into the handheld version to provide private audio feedback without earbuds or headphones, which matters for users who need to stay aware of their surroundings while shopping. We also plan to run pilot programs with local grocery chains in live store environments. These tests will help us refine OCR accuracy across diverse lighting conditions, shelf configurations, and packaging styles. Real-world testing will also validate our AR tracking performance and gather early feedback on the interface, especially from people with severe allergies and visual impairments.
