Inspiration
We all have a tradition in our group chats of sharing any unique meals we eat with each other. We wanted to build an app that lets us share these meals in a feed and uses advanced AI tools to make capturing the food easy.
What it does
Foodex allows you to take pictures of your food; using an advanced VLM (vision-language model) powered by Gemini Robotics, we detect the foods present in your photo. Once the VLM finds food, we query our database to assign it a rarity score (relative to what the average consumer eats), and users can add foods to their logbook. Users can also add friends and use the feed to see what their friends are eating. Users earn points based on the rarity of their foods, and each user has a unique profile page with all of their discoveries, as well as constellations: curated lists of their favorite dishes.
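As a rough illustration of the rarity idea, here's a hedged sketch of a lookup against D1; the `foods` table, the `capture_count` column, and the scoring curve are all hypothetical, not our exact schema:

```typescript
// Illustrative sketch: rarity derived from how often a food appears
// across all users' captures. Table/column names are placeholders.
async function getRarityScore(db: D1Database, foodId: string): Promise<number> {
  const row = await db
    .prepare("SELECT capture_count FROM foods WHERE id = ?")
    .bind(foodId)
    .first<{ capture_count: number }>();
  if (!row) return 0;
  // Rarer foods (fewer captures) earn more points; this curve is a guess.
  return Math.round(100 / Math.log2(row.capture_count + 2));
}
```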
How we built it
We used Cloudflare Workers to host our backend. The backend calls the Gemini Robotics VLM API to produce structured JSON describing which foods appear in the image. We also upload the image to Cloudflare R2 and use the Cloudflare D1 database to store application data.
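Here's a minimal sketch of that capture flow as a single Worker handler. The binding names (`BUCKET`, `DB`, `GEMINI_API_KEY`), the model-name placeholder, and the table schema are assumptions for illustration, not our production code:

```typescript
// Sketch of the capture endpoint: store the photo in R2, ask the VLM
// for structured JSON, persist the result in D1.
export interface Env {
  BUCKET: R2Bucket;      // hypothetical R2 binding name
  DB: D1Database;        // hypothetical D1 binding name
  GEMINI_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }

    const imageBytes = await request.arrayBuffer();
    const key = `captures/${crypto.randomUUID()}.jpg`;

    // Store the raw image in R2 so the feed can serve it later.
    await env.BUCKET.put(key, imageBytes, {
      httpMetadata: { contentType: "image/jpeg" },
    });

    // Ask the VLM for structured JSON describing the foods in the photo.
    // Request shape follows the Gemini REST API; MODEL_NAME is a placeholder.
    const geminiRes = await fetch(
      `https://generativelanguage.googleapis.com/v1beta/models/MODEL_NAME:generateContent?key=${env.GEMINI_API_KEY}`,
      {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          contents: [{
            parts: [
              { text: "List the foods in this photo as JSON with bounding boxes." },
              { inline_data: { mime_type: "image/jpeg", data: toBase64(imageBytes) } },
            ],
          }],
        }),
      },
    );
    const detection = await geminiRes.json();

    // Persist the capture; table and columns are illustrative.
    await env.DB
      .prepare("INSERT INTO captures (image_key, detection_json) VALUES (?, ?)")
      .bind(key, JSON.stringify(detection))
      .run();

    return Response.json({ imageKey: key, detection });
  },
};

// Minimal base64 helper for Workers (no Node Buffer available).
function toBase64(buf: ArrayBuffer): string {
  let binary = "";
  for (const byte of new Uint8Array(buf)) binary += String.fromCharCode(byte);
  return btoa(binary);
}
```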
The mobile app was written in React Native, with TanStack Query for data fetching and Tamagui as our UI library. This allowed for fast iteration, since we didn't have to spend much time building and styling UI elements from scratch.
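For example, fetching the feed with TanStack Query looks roughly like this (the endpoint URL and response shape are hypothetical):

```typescript
import { useQuery } from "@tanstack/react-query";

// Illustrative response shape; our real API returns more fields.
type FeedItem = { id: string; imageUrl: string; foodName: string; rarity: number };

// Hook a feed screen can call; TanStack Query handles caching and refetching.
export function useFeed(userId: string) {
  return useQuery({
    queryKey: ["feed", userId],
    queryFn: async (): Promise<FeedItem[]> => {
      const res = await fetch(`https://api.example.com/feed?user=${userId}`);
      if (!res.ok) throw new Error(`Feed request failed: ${res.status}`);
      return res.json();
    },
  });
}
```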
Challenges we ran into
Everyone on our team was using at least one new technology on this project, whether that was the Gemini Robotics API, Cloudflare Workers, or even React Native. In addition, our app had many moving parts, including but not limited to the capture flow, the logbook, and user profiles, so implementing all of this in such a short time period was challenging.
Accomplishments that we're proud of
The VLM output is very consistent and provides accurate bounding boxes for the foods it detects. We were able to give it a list of known foods and have the VLM map what it sees onto those labels. We also developed an OpenAPI-compliant API on Cloudflare Workers, which helped us generate queries and client code.
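Constraining the model to known labels works roughly like this sketch, using Gemini's JSON response schema with an enum; the label list and field names are illustrative, not our exact prompt:

```typescript
// A tiny sample standing in for our real list of known foods.
const KNOWN_FOODS = ["ramen", "pho", "birria tacos", "mochi"];

// Gemini's REST API accepts a responseSchema; enum values force the
// model to pick from our known labels instead of free-form names.
const generationConfig = {
  responseMimeType: "application/json",
  responseSchema: {
    type: "ARRAY",
    items: {
      type: "OBJECT",
      properties: {
        label: { type: "STRING", enum: KNOWN_FOODS },
        // Normalized bounding box; exact format depends on the model.
        box: { type: "ARRAY", items: { type: "NUMBER" } },
      },
      required: ["label", "box"],
    },
  },
};
```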
What we learned
We learned a lot about vision-language models such as Gemini Robotics and how they work. We also improved our skills in React Native and app development in general, building a fully functional mobile app with a backend rather than a frontend-only app. Finally, we learned about the intricacies of Cloudflare's many platforms, including Workers (which powered our entire backend), R2 (for image storage), and D1 (for our database). This was our entire team's first time using Cloudflare, so that was by far our biggest takeaway from this project.
What's next for Foodex
We want to improve the multi-user experience with a better friend system, likes, and an improved feed for viewing activity. We also want to refine the VLM prompt to produce better outputs that can be used to generate new food items not yet in the app.
We also want to finish implementing the "constellation" feature, which lets you group captures together, similar to playlists in a music app. We need to make the constellation-building experience easier and more intuitive, and we want to improve the sharing and discovery functionality for constellations.
