TL;DR:

Planning a special occasion in Sydney means hours of scrolling blogs, Instagram, and Google, piecing together fragmented information while battling decision fatigue. VibePlan uses Gemini 3 Flash Preview to turn loose vibes into a personalised plan for special occasions and to surface Sydney's best venues.

Inspiration

We knew exactly the vibe we wanted for Valentine's Day - romantic but not cliché, impressive without being pretentious, somewhere we could actually talk - but translating that feeling into a concrete reservation felt impossible.

Google gave us filters. Instagram gave us aesthetics. Blogs gave us lists. But none could bridge the gap between "quiet luxury harbour views" and an actual table we could book.

So instead of spending another 5 hours researching, we built VibePlan - the tool we wished existed. Turns out, when you have Gemini's multimodal reasoning and Google Places' real-world data, you can finally turn vibes into plans.

What it does

VibePlan is Sydney's first vibe-to-venue translator. Users describe their occasion through text, uploaded images, or curated vibe selections - like "anniversary dinner" + rooftop aesthetic photos + "quiet luxury, somewhere impressive." Gemini 3 Flash Preview processes this multimodal input to understand not just keywords, but spatial aesthetics, ambiance, and emotional context. It then:

  1. Intelligently searches Google Places for venues matching that vibe
  2. Cross-references reviews, photos, pricing, and location data
  3. Generates personalized plans with insider tips, booking links, weather-aware backup options, and complete evening itineraries
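The output of those three steps can be sketched as a plan object. This is a minimal, hypothetical shape for illustration - the field names are assumptions, not VibePlan's actual schema:

```typescript
// Hypothetical output shape for a generated plan (illustrative names only).
interface VenuePick {
  name: string;
  whyItMatches: string;   // editorial rationale, not just a tag match
  bookingUrl?: string;
}

interface EveningPlan {
  occasion: string;
  primary: VenuePick;
  weatherBackup: VenuePick; // swapped in if the forecast turns
  insiderTips: string[];
}

const plan: EveningPlan = {
  occasion: "anniversary dinner",
  primary: {
    name: "Harbour-view rooftop bar",
    whyItMatches: "Quiet-luxury fit: dim lighting, low music, skyline views",
    bookingUrl: "https://example.com/book",
  },
  weatherBackup: {
    name: "Indoor wine bar nearby",
    whyItMatches: "Same intimate mood if it rains",
  },
  insiderTips: ["Book the window table", "Arrive before sunset"],
};
```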

The result: 5 hours of fragmented research becomes a 5-minute conversation with AI that actually understands what "lively but not loud" means.

How I built it

Architecture:

  • Frontend: React + Vite with clean UI components (Landing, VibeSelection, ComparisonView, FullPlanView)
  • Backend: Supabase Edge Functions (serverless TypeScript)
  • AI Engine: Gemini 3 Flash Preview (gemini-3-flash-preview)
  • Data Sources: Google Places API, Weather API, Unsplash, Spotify

Key Technical Components:

  1. Multimodal Input Pipeline:
     • generate-vibe-images → curates aesthetic references from Unsplash
     • Users select vibe images and add text descriptions
     • The combined input is sent to Gemini for holistic interpretation
  2. Agentic Venue Search:
     • generate-venues → Gemini constructs Google Places queries based on vibe analysis
     • cluster-venues → groups and categorises results spatially
     • Real-time cross-referencing of photos, reviews, and metadata
  3. Intelligent Recommendation:
     • suggest-activities → Gemini reasons across occasion type, weather, vibe tags, and venue data
     • Generates editorial context ("This 33rd-floor bar has quiet luxury vibes, couples and finance types, perfect for starting a glam evening")
     • Provides insider tips, booking strategy, and contingency plans
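To make the cluster-venues step concrete, here is a rough sketch of one way to group results spatially - coarse grid bucketing by coordinates. The grid size and venue shape are assumptions for illustration, not the production implementation:

```typescript
// Illustrative spatial grouping: bucket venues into grid cells so a plan
// can keep stops within the same neighbourhood. Not the real algorithm.
interface Venue { name: string; lat: number; lng: number }

function clusterVenues(venues: Venue[], cellDeg = 0.01): Map<string, Venue[]> {
  const clusters = new Map<string, Venue[]>();
  for (const v of venues) {
    // Snap coordinates to a grid cell; ~0.01° is roughly 1 km at Sydney's latitude.
    const key = `${Math.floor(v.lat / cellDeg)},${Math.floor(v.lng / cellDeg)}`;
    const bucket = clusters.get(key) ?? [];
    bucket.push(v);
    clusters.set(key, bucket);
  }
  return clusters;
}
```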

Data Flow: User Input (text + images) → venueService.ts orchestration → Edge Functions → Gemini + Google Places APIs → Personalised plan output
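That data flow can be sketched as a small orchestration layer with the edge-function calls stubbed out. Function names mirror the steps above; the signatures and return shapes are assumptions, not the real venueService.ts API:

```typescript
// Minimal sketch of the orchestration flow with stubbed Edge Function calls.
type VibeInput = { text: string; imageUrls: string[] };

async function generateVenues(input: VibeInput): Promise<string[]> {
  // In the real app this would invoke the generate-venues Edge Function,
  // where Gemini constructs Google Places queries from the vibe analysis.
  return ["Rooftop Bar A", "Wine Bar B"];
}

async function suggestActivities(venues: string[]): Promise<string> {
  // Stub for the suggest-activities step (Gemini reasons over venue data).
  return `Plan: start at ${venues[0]}, backup ${venues[1]}`;
}

async function buildPlan(input: VibeInput): Promise<string> {
  const venues = await generateVenues(input);
  return suggestActivities(venues);
}
```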

Challenges we ran into

  1. Making recommendations feel personalised, not generic
     Problem: Early versions returned technically accurate but soulless recommendations.
     Solution: We engineered prompts to have Gemini analyse why a venue matches the vibe, not just that it matches. We feed occasion context, user aesthetic preferences, and spatial data together, forcing Gemini to reason about emotional fit.
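The "why, not just that" nudge can be sketched as a prompt-building helper. The wording below is illustrative, not the production prompt:

```typescript
// Hypothetical sketch: ask the model to justify the match, not just assert it.
function buildRationalePrompt(occasion: string, vibe: string, venueName: string): string {
  return [
    `Occasion: ${occasion}`,
    `Desired vibe: ${vibe}`,
    `Venue: ${venueName}`,
    `In two sentences, explain WHY this venue matches the vibe`,
    `(lighting, noise, crowd, layout) rather than just stating that it matches.`,
  ].join("\n");
}
```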

  2. Prompt engineering for vibe interpretation
     Problem: "Quiet luxury" means different things in different contexts (anniversary vs. business dinner).
     Solution: We built a structured prompt template that includes occasion type, relationship context, aesthetic references (from images), and explicit vibe descriptors. Gemini's multimodal understanding lets it weigh visual mood against text nuance.
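A structured template along those lines might look like this - the field names and phrasing are assumptions for illustration:

```typescript
// Hedged sketch of a structured context template; not the production schema.
interface VibeContext {
  occasionType: string;       // e.g. "anniversary" vs "business dinner"
  relationship: string;       // who the plan is for
  vibeDescriptors: string[];  // explicit text descriptors
  imageNotes: string[];       // aesthetic cues from the selected vibe images
}

function buildContextPrompt(ctx: VibeContext): string {
  return [
    `Occasion: ${ctx.occasionType}`,
    `Relationship context: ${ctx.relationship}`,
    `Vibe descriptors: ${ctx.vibeDescriptors.join(", ")}`,
    `Aesthetic references: ${ctx.imageNotes.join("; ")}`,
    `Interpret "quiet luxury" relative to this occasion before suggesting venues.`,
  ].join("\n");
}
```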

  3. Handling multimodal input effectively
     Problem: Images alone are ambiguous; text alone lacks richness.
     Solution: We designed a sequential flow: users select curated vibe images (reducing noise), then add text to refine. Gemini receives both as a unified context, analysing lighting, layout, and crowd density from images while grounding it with text specificity.
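Sending images and text as one unified context might look like the request body below. The shape follows the public Gemini generateContent REST API (base64 images as inlineData parts alongside a text part); check the current docs before relying on it:

```typescript
// Sketch of a multimodal generateContent request body: curated vibe images
// plus refining text as a single unified context.
function buildMultimodalRequest(text: string, imagesBase64: string[]) {
  return {
    contents: [
      {
        parts: [
          // Images first, then the grounding text, mirroring the sequential flow.
          ...imagesBase64.map((data) => ({
            inlineData: { mimeType: "image/jpeg", data },
          })),
          { text },
        ],
      },
    ],
  };
}
```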

To avoid the "list of top 10" trap, we got Gemini to generate one perfect plan with backups, rather than overwhelming users with options.

Accomplishments that I'm proud of

🎯 First working prototype of vibe-to-venue translation
No existing tool can take "impressive but not pretentious" + rooftop photos and return bookable venues. We built the first one.

🏗️ Clean multimodal pipeline
Text, curated images, and real-time API data combine seamlessly in a way that feels conversational, not transactional.

✅ Real user validation
Tested with friends planning Valentine's dates - they used it for real reservations and said "I'd use this over TikTok browsing and Google Maps."

🚀 Production-ready architecture
Serverless Edge Functions + Gemini + Google Places = scalable, fast, and cost-effective.

💡 Making AI feel human
Recommendations include editorial context and insider tips that sound like a friend's advice, not an algorithm's output.

What I learned

About Gemini:

  • Multimodal reasoning is genuinely different: Gemini doesn't just "see" images - it understands spatial relationships, ambiance cues, and aesthetic coherence in ways that pure text models can't replicate.
  • Prompt context matters more than we expected: feeding occasion type + relationship context + vibe images produces dramatically better results than generic "find restaurants" queries.
  • Agentic workflows unlock real value: having Gemini autonomously construct Google Places queries, reason about results, and cross-reference data feels like having a smart research assistant, not a chatbot.

About building with AI:

  • Multimodal input requires careful UX: we learned to curate image options rather than allow freeform uploads - it reduces noise and guides better outputs.
  • Personalisation comes from context, not memory: even without a learning layer, rich contextual prompts make recommendations feel tailored.
  • The gap between "technically works" and "feels magical" is huge: most of our time went into prompt engineering to make outputs sound human and thoughtful.

What's next for VibePlan

Immediate (Post-Competition):

  • Expand beyond Sydney: Melbourne, Brisbane, then international cities
  • More occasion types: creative activities, nature outings, bachelor/bachelorette parties

Medium-term:

  • Personalisation layer: learn user preferences over time (saved venues, past selections)
  • Social sharing: "Here's the plan for Sarah's 30th" → collaborative planning
  • Venue partnerships: exclusive reservations, insider access for VibePlan users

Long-term Vision:

  • Become the vibe translation layer for all experiences: travel planning, gift recommendations, entertainment - anywhere people know the feeling but can't find the thing
  • B2B applications: white-label for concierge services, wedding planners, corporate event teams

Technical Evolution:

  • Integrate the Gemini Live API for real-time conversational planning
  • Use Gemini's video understanding for venue walk-throughs
  • Build predictive models: "Based on your aesthetic, here are venues you'll love"

Built With

React, Vite, TypeScript, Supabase Edge Functions, Gemini 3 Flash Preview, Google Places API, Weather API, Unsplash, Spotify