SMART CART

No more scanning product barcodes or reading ingredient lists! Instantly see which products are unsafe for you!
Super-specific suggestions! Want flour for croissants? Want a tea to help you relax? Anything can be suggested!
Use profiles for easy recommendations! Get one recommendation per criterion and make an informed choice!

Disclaimer

This tool is a hackathon prototype trained specifically on the sliced bread aisle of a single grocery store in Paris. The product database is very limited because we did not have time to add more. However, the architecture is designed to scale: the app relies on an embedding model, so covering more products means training it on more products and asking users to scan unknown products to add them to the embedding database. Additionally, please note that the demo video is heavily blurred to comply with hackathon rules prohibiting the display of corporate logos and brand names.

Updates

This project was started at the SensAI Hackathon in Barcelona (end of November). Since then, we have completely changed the approach to product suggestion: the app now recognizes products from their images and deduces their barcodes, instead of reading price tags and relying on ChatGPT. This allows for far more precise recommendations!

Inspiration

We are students living in Paris, and as you probably know, groceries are expensive here. We tried using apps to compare prices, but they are not really efficient. You lose too much time looking at your screen and they often lack information.

We have been discussing this idea for a while now. Grocery shopping is a frequent and boring task, so we wanted to create a product that actually makes it faster. We realized Mixed Reality was the best answer. Instead of checking a phone, you just see the best products directly on the shelves.

What it does

Our app transforms the Meta Quest 3 into an intelligent shopping assistant. First, you select a specific mode like Budget, Vegan, or Eco-friendly. You then simply dictate your grocery list out loud and validate it. As you walk through the store, the headset tracks and analyzes products on the shelves in real time. Finally, it pinpoints the best option directly in your field of view, allowing you to stop reading labels and just grab and go. The app takes the video feed from your Meta Quest and extracts the exact barcodes of the products on the shelf, which makes the suggestions very accurate. Allergies, tastes, everything can be filtered!

How we built it

The app combines local inference and API calls. The flow goes like this:

-Create your list: select your filtering criteria by turning your left palm towards you. Say your list out loud, along with any allergies. Wit.ai will pick up your list. When you are finished, the raw spoken text is sent to OpenAI's API to extract the products and exclusion criteria.

-Detect the products: a YOLOv11n model finds the bounding boxes of products. On its own, detection has three problems: the model is not perfect and sometimes finds products where there are none, products do not persist when you turn your head, and nothing confirms that a detection is real. To address these issues we run a confirmation logic. Each time a bounding box is found, we raycast its center into the world. If an existing product lies within a given radius of the hit point, we consider both bounding boxes to belong to the same product, adjust the product's position if necessary, and save the product's texture from the current inference. A product that gets no confirmation for 2 seconds is discarded. Once a product has been confirmed 7 times, it is considered definitively confirmed and sent to the labelling algorithm.

-Label the products: we run each confirmation's image through a MobileNetV3 embedding model trained on custom data. The output vectors are compared against a database, which returns candidate barcodes. We only keep barcodes with high confidence, then check for a consensus: at least 4 of the high-confidence barcodes need to be the same. If all filters are passed, the product is labelled!

-Fetch product information: based on the product's barcode, we fetch information online: product name, ingredients, weight... This information is cached for later products.

-Check with list: when a product gets its information, a request is sent to OpenAI asking whether the product name corresponds to an item on the list. The result is cached for later products. To save tokens, requests are batched: when a product requests an API call, the app waits briefly for latecomers and sends all of their data at the same time.

-Recommendation: we now have a list of products that match items on our grocery list. For a few select criteria (lowest calories, allergies...), LLM inference is not necessary, and those cases are handled algorithmically on device. For any other criterion, we send a request to OpenAI. OpenAI also returns an "Overall pick" product and gives reasons for its recommendation. Recommendations also wait for latecomers to save tokens.

-Products found later: some products may be identified later. They are also sent for recommendation. The recommendation query then only features the new products and the already recommended ones, for comparison.
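The confirmation and consensus steps described above can be sketched in Python as follows. This is only an illustration, not the code running on the headset: the class and function names, the merge radius, the confidence threshold, and the position-averaging rule are all assumptions. Only the 2-second timeout, the 7 confirmations, and the 4 agreeing barcodes come from the description.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

MERGE_RADIUS_M = 0.10     # assumed radius; the write-up gives no value
TIMEOUT_S = 2.0           # from the text: discard after 2 s without confirmation
CONFIRMATIONS_NEEDED = 7  # from the text: 7 confirmations
MIN_CONFIDENCE = 0.8      # assumed meaning of "high confidence"
MIN_AGREEING = 4          # from the text: at least 4 identical barcodes


@dataclass
class Candidate:
    position: tuple   # world-space hit point of the raycast (x, y, z)
    last_seen: float  # timestamp of the latest confirmation, in seconds
    crops: list = field(default_factory=list)  # one saved crop per confirmation


class ConfirmationTracker:
    """Hypothetical tracker that merges detections landing close together."""

    def __init__(self):
        self.candidates = []

    def observe(self, hit_point, crop, now):
        """Register one detection; return the candidate once fully confirmed."""
        # Drop candidates that received no confirmation for more than 2 seconds.
        self.candidates = [
            c for c in self.candidates if now - c.last_seen <= TIMEOUT_S
        ]
        for cand in self.candidates:
            if math.dist(cand.position, hit_point) <= MERGE_RADIUS_M:
                # Same product: move the stored position halfway to the new
                # hit (an assumed way of "adjusting the position").
                cand.position = tuple(
                    (a + b) / 2 for a, b in zip(cand.position, hit_point)
                )
                cand.last_seen = now
                cand.crops.append(crop)
                if len(cand.crops) >= CONFIRMATIONS_NEEDED:
                    self.candidates.remove(cand)
                    return cand
                return None
        # No nearby candidate: start tracking a new one.
        self.candidates.append(Candidate(tuple(hit_point), now, [crop]))
        return None


def consensus_barcode(predictions):
    """predictions: (barcode, confidence) pairs, one per confirmation crop."""
    confident = [code for code, conf in predictions if conf >= MIN_CONFIDENCE]
    if not confident:
        return None
    barcode, votes = Counter(confident).most_common(1)[0]
    return barcode if votes >= MIN_AGREEING else None
```

In this sketch, each YOLO detection would call `observe` with the raycast hit point and the cropped image. Once a candidate is returned, its crops would be embedded and the resulting `(barcode, confidence)` pairs passed to `consensus_barcode`.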

Challenges we ran into

We encountered lots of challenges!

We were first faced with a performance issue: even YOLO nano models were blocking the main thread when run with Unity's inference engine. The solution was to use NCNN for asynchronous inference. We used AI to generate a C# wrapper for the C++ library.

The second issue was gathering data for the embedding model. We tried several approaches:

-Using Open Food Facts' database of product images. This approach didn't work because most images were either too clean or complete nonsense.

-Using Open Food Facts' best images with heavy data augmentation: keep only the relevant images and try to make the most out of them. This approach didn't work either: 3-4 images per product were too few, even with augmentation. These images were also often too clean, or complete nonsense.

-Using a custom mobile app to label products: the app reused the cropping logic from the YOLO model. It shows each crop and asks the user to scan the product's barcode. This approach is solid but painfully slow.

-Final approach, using videos to label images fast: we run a video through the YOLO model, which cuts out product images every 0.1 s. All of those images are fed to the embedding model, which tries to define groups. Images are shown 50 at a time, and we discard those that don't belong in the group. The model learns from its mistakes and gets better on the fly. This creates unlabeled groups of images, each group corresponding to a single product, which we then label by hand.
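The grouping step of this final approach could look roughly like the sketch below. The write-up does not say how groups are defined, so the greedy cosine-similarity grouping, the 0.9 threshold, and the function names are assumptions chosen for illustration. The on-the-fly retraining from discarded images is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_embeddings(embeddings, threshold=0.9):
    """Greedy grouping: each vector joins the most similar existing group
    if that similarity reaches the threshold, else it starts a new group."""
    groups = []  # each group holds a running centroid and member indices
    for idx, vec in enumerate(embeddings):
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(group["centroid"], vec)
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append({"centroid": list(vec), "members": [idx]})
        else:
            n = len(best["members"])
            # Update the running mean of the group with the new vector.
            best["centroid"] = [
                (c * n + v) / (n + 1) for c, v in zip(best["centroid"], vec)
            ]
            best["members"].append(idx)
    return [group["members"] for group in groups]


def review_batches(members, size=50):
    """Split one group into pages of 50 images for manual review."""
    return [members[i:i + size] for i in range(0, len(members), size)]
```

Each returned group is a list of image indices. A reviewer would page through `review_batches(group)`, discard the images that do not belong, and finally attach one barcode to the whole group.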

Accomplishments that we're proud of

Having something that works really well! We went through a lot, and having an image->barcode model that reaches ~99% accuracy is just crazy. We had a lot of fun developing this software, achieving its current state, and making the video. It was our first time making such a video with a voice-over, and the result is better than expected.

What we learned

Adaptability: We learned not to cling to an initial idea. When our first technical approach (reading price tags) wasn't accurate enough, we pivoted completely to visual barcode recognition, which drastically improved the product.

Data-Centric AI: We discovered that better data beats better models. We built our own custom video-labeling tool to generate a high-quality dataset, rather than trying to force-fit a public dataset that wasn't working.

Mobile Performance Optimization: We learned how to optimize heavy ML workloads for mobile hardware. By bridging C++ (NCNN) and C#, we managed to run real-time object detection on the Quest 3 without causing lag.

Hybrid System Architecture: We learned to balance costs and speed by creating a hybrid flow: doing the heavy lifting (detection) locally on the device and only using Cloud APIs (OpenAI) for the final, high-value logic.
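As a rough illustration of this hybrid flow, the sketch below handles criteria that need no LLM on the device and passes everything else to a cloud callback. The product fields (`ingredients`, `calories_per_100g`), the criterion names, and the `ask_cloud` callback are hypothetical, and the substring-based allergen check is deliberately naive.

```python
def filter_allergens(products, allergens):
    """Drop products whose ingredient text mentions an excluded allergen."""
    blocked = [a.lower() for a in allergens]
    return [
        p for p in products
        if not any(a in p["ingredients"].lower() for a in blocked)
    ]


# Criteria that can be ranked on device without any LLM call (assumed set).
LOCAL_RANKERS = {
    "lowest_calories": lambda p: p["calories_per_100g"],
}


def recommend(products, criterion, allergens=(), ask_cloud=None):
    """Pick one product; only fall back to the cloud when no local rule fits."""
    safe = filter_allergens(products, allergens)
    if not safe:
        return None
    ranker = LOCAL_RANKERS.get(criterion)
    if ranker is not None:
        return min(safe, key=ranker)  # resolved locally, no tokens spent
    if ask_cloud is None:
        raise ValueError("no local rule for %r and no cloud callback" % criterion)
    return ask_cloud(safe, criterion)  # stand-in for the LLM request
```

Keeping the local rules in a dictionary means a new on-device criterion only needs a new entry, while anything unknown is routed to the cloud callback.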

What's next for Smart Cart

Our immediate focus is to intensively train the embedding model to cover a full supermarket inventory. Since our underlying logic is built to be scalable, the system can easily grow alongside the model's knowledge without requiring architectural changes. We also aim to introduce deeper analytical features, such as providing visual tips on how to choose the ripest fruit or filtering for highly specific needs like detergent for white clothes. We believe we have achieved a robust technical foundation that allows us to scale up rapidly from this point forward.