Inspiration:
Ever scroll past mouth-watering food on Instagram or TikTok and instantly wish it could appear on your plate? This app is inspired by that exact moment. It takes the dishes you see online (screenshots, photos, foodie posts) and turns them into real, nearby options you can actually eat. By combining image recognition with smart recommendations, it bridges the gap between “That looks delicious” and “Let’s go get it.” Whether you’re craving something trending, discovering hidden gems, or letting your eyes decide your next meal, the goal is simple: make your cravings real, straight from your feed to your feast.
What it does:
This app lets you upload a food image or caption and instantly finds restaurants that serve that dish at your preferred location using the Yelp API. It uses AI to identify the food, craft a smart Yelp query, and rank the best matches. Users can set a location, date, and time so results fit real-time availability. A multi-agent system then reviews each restaurant, highlighting pros and cons and giving one balanced verdict. Every recommendation comes with an option to call the restaurant, or to order or reserve a table through Yelp.
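To make the "craft a smart Yelp query" step concrete, here is a minimal sketch of how an identified dish plus the user's location, date, and time could be packaged into one structured query. The `FoodQuery` class and its field names are illustrative assumptions, not the app's actual schema.

```python
# Hypothetical sketch: bundle the LLM-identified dish with user preferences
# into one structured query object. Field names are assumptions.
from dataclasses import dataclass

@dataclass
class FoodQuery:
    dish: str      # dish name extracted from the image/caption by the LLM
    location: str  # user-chosen search location
    date: str      # desired date, e.g. "2024-06-01"
    time: str      # desired time, e.g. "19:00"

    def to_prompt(self) -> str:
        # Natural-language form suitable for a chat-style search API.
        return (f"Find restaurants serving {self.dish} near {self.location}, "
                f"open on {self.date} around {self.time}.")
```

A structured object like this keeps the downstream search call deterministic even when the upstream LLM output varies.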
How we built it:
The system is built around two backend pipelines. Pipeline 1 (image -> Yelp discovery): we process uploaded images or captions with a multimodal LLM to generate a structured Yelp query. This query is sent to the Yelp AI Chat API, and the returned businesses are normalized (ratings, reviews, hours, photos, availability) and ranked. Pipeline 2 (multi-agent review): the Optimist–Critic–Judge agent trio debates each candidate restaurant and produces a single balanced verdict.
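The normalize-and-rank step of Pipeline 1 could look roughly like the sketch below. The field names and the ranking heuristic (rating weighted by log review volume) are illustrative assumptions, not the app's exact logic.

```python
# Hypothetical sketch of the normalization step: raw business JSON varies in
# shape, so each record is coerced into a fixed schema before ranking.
import math

def normalize_business(raw: dict) -> dict:
    """Coerce a raw business record into the fields the ranker expects."""
    return {
        "name": raw.get("name", "Unknown"),
        "rating": float(raw.get("rating", 0.0)),
        "review_count": int(raw.get("review_count", 0)),
        "photos": raw.get("photos") or [],
        "is_open_now": bool(raw.get("hours", [{}])[0].get("is_open_now", False)),
    }

def rank_businesses(businesses: list[dict]) -> list[dict]:
    # Illustrative ranking: weight the star rating by log of review volume,
    # so a 4.0 with 500 reviews can beat a 4.5 with 10.
    return sorted(
        businesses,
        key=lambda b: b["rating"] * math.log1p(b["review_count"]),
        reverse=True,
    )
```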
Challenges we ran into:
Image processing and multimodal inference introduced significant latency, requiring aggressive optimization and multithreading to keep responses fast. Limited reviews for certain restaurants also made it harder for the agents to produce accurate summaries, forcing us to build fallback logic and LLM-based review synthesis. We frequently had to switch between free-tier model endpoints, which added instability and required extra engineering to maintain consistent performance across pipelines.
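The two mitigations above (multithreading and endpoint fallback) can be sketched together as follows. `call_model` and the endpoint names are placeholders; a real implementation would make HTTP calls to the model providers.

```python
# Hypothetical sketch: try free-tier endpoints in order, and fan out
# per-restaurant calls so total latency is closer to max() than sum().
from concurrent.futures import ThreadPoolExecutor

ENDPOINTS = ["primary-free-tier", "secondary-free-tier"]

def call_model(endpoint: str, prompt: str) -> str:
    # Placeholder for a real LLM request; here the primary always fails
    # to demonstrate the fallback path.
    if endpoint == "primary-free-tier":
        raise RuntimeError("rate limited")
    return f"[{endpoint}] summary for: {prompt}"

def call_with_fallback(prompt: str) -> str:
    last_err = None
    for endpoint in ENDPOINTS:
        try:
            return call_model(endpoint, prompt)
        except Exception as err:
            last_err = err  # remember the failure and try the next endpoint
    raise RuntimeError("all endpoints failed") from last_err

def summarize_all(prompts: list[str]) -> list[str]:
    # Run the per-restaurant calls concurrently, preserving input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(call_with_fallback, prompts))
```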
Accomplishments that we're proud of:
We’re especially proud of the multi-agent debate system. Instead of blindly picking the highest-rated restaurant, the Optimist–Critic–Judge trio evaluates each option from multiple angles. This creates a more personalized, preference-aware recommendation that reflects real tradeoffs rather than simple rating scores.
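The Optimist–Critic–Judge flow can be sketched as below. Each "agent" here is a stub returning canned text; in the real system each role would be an LLM call with a persona-specific prompt, and the judge would reason over the arguments rather than keyword-match.

```python
# Hypothetical sketch of the three-persona debate over one restaurant.
def optimist(restaurant: dict) -> str:
    return (f"Pros: {restaurant['rating']}-star rating, "
            f"{restaurant['review_count']} reviews.")

def critic(restaurant: dict) -> str:
    cons = []
    if restaurant["review_count"] < 50:
        cons.append("few reviews, rating may be noisy")
    if restaurant["rating"] < 4.0:
        cons.append("below-average rating")
    return "Cons: " + ("; ".join(cons) if cons else "none found")

def judge(pros: str, cons: str) -> str:
    # Weigh both sides and issue a single balanced verdict.
    verdict = "recommended" if "none found" in cons else "recommended with caveats"
    return f"{pros} {cons} Verdict: {verdict}."

def debate(restaurant: dict) -> str:
    return judge(optimist(restaurant), critic(restaurant))
```

Separating the roles forces the cons to be surfaced explicitly instead of being averaged away by a single model's summary.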
What we learned:
We learned how to design end-to-end pipelines that combine multimodal models, structured query generation, and integrating Yelp API in a reliable way. Building the multi-agent debate system taught us how different reasoning personas can produce more balanced recommendations than a single model. We also gained experience handling inconsistent API outputs, normalizing business data, and optimizing latency across chained LLM calls. Most importantly, we learned how to turn an ambiguous user input (a food photo) into a fully contextualized, actionable recommendation.
What's next for WTF (Where is The Food?):
Features we would like to add include user profiles and taste-preference learning, which will enable more personalized ranking of results. On the systems side, we would like to reduce latency even further. Download the APK from the link given below.
Built With
- fastapi
- gemini-2.5-flash
- python
- react-native
- yelp-ai
- yelp-business-api