About Recipee

🎯 Inspiration

The idea for Recipee came from a common frustration: staring at a fridge full of ingredients but having no idea what to cook. We've all been there—wondering "What can I make with what I have?" Instead of resorting to expensive food delivery or letting ingredients go to waste, we wanted to create an intelligent assistant that transforms whatever you have into delicious, personalized recipes.

🧠 What We Learned

Building Recipee taught us invaluable lessons about:

  • Multimodal AI Integration: Combining Google's Gemini AI for vision and text processing with Apple's Speech Recognition framework to create a seamless, multi-input experience
  • Real-time Audio Processing: Implementing YouTube audio extraction and transcription pipelines using cloud infrastructure (Google Cloud Run) and iOS Speech Recognition APIs
  • SwiftUI State Management: Managing complex app flows with @ObservableObject, @Published, and Swift Concurrency (async/await)
  • API Architecture: Designing a Node.js backend that bridges RapidAPI services with iOS clients, handling streaming audio downloads efficiently
  • Difficulty Scaling Algorithm: Creating a recipe variation system that maintains ingredient coherence while scaling complexity. If $n$ is the number of base ingredients, we generate variations where:
    • Easy: $|I_e| \approx 0.6n$ (minimal ingredients)
    • Intermediate: $|I_m| \approx 0.8n$ (moderate complexity)
    • Advanced: $|I_a| \approx n + k$ (full ingredients + $k$ specialty items)
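The scaling rules above can be sketched as a small Swift helper. This is an illustrative sketch, not the production algorithm: the enum name, the rounding behavior, and the default specialty-item count `k = 2` are assumptions.

```swift
import Foundation

// Hypothetical sketch of the difficulty scaling formulas above.
// `n` is the number of base ingredients; `k` is the number of
// specialty items added at the advanced level.
enum Difficulty {
    case easy, intermediate, advanced
}

func ingredientCount(for difficulty: Difficulty, base n: Int, specialtyItems k: Int = 2) -> Int {
    switch difficulty {
    case .easy:         return max(1, Int((0.6 * Double(n)).rounded()))  // |I_e| ≈ 0.6n
    case .intermediate: return max(1, Int((0.8 * Double(n)).rounded()))  // |I_m| ≈ 0.8n
    case .advanced:     return n + k                                     // |I_a| ≈ n + k
    }
}
```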

🛠️ How We Built It

Frontend (iOS - SwiftUI)

  • Voice Input: Leveraged SFSpeechRecognizer for real-time ingredient capture via speech
  • Computer Vision: Integrated Gemini Vision API to analyze fridge photos and extract ingredients using multimodal prompts
  • YouTube Integration: Built a pipeline that downloads audio from YouTube videos via RapidAPI, then transcribes using iOS Speech Recognition to extract recipe instructions
  • Adaptive UI: Designed a step-by-step flow (Voice → Image → Manual → Recipes) with SwiftUI's declarative syntax
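The voice-input step above can be sketched with Apple's Speech framework. Authorization checks and error handling are elided, and names like `VoiceCapture` and `onTranscript` are illustrative, not from the actual codebase.

```swift
import Speech
import AVFoundation

// Minimal sketch of live ingredient capture with SFSpeechRecognizer.
final class VoiceCapture {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?

    func start(onTranscript: @escaping (String) -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true   // stream partial text as the user speaks
        self.request = request

        // Feed microphone buffers into the recognition request.
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        _ = recognizer?.recognitionTask(with: request) { result, _ in
            if let result { onTranscript(result.bestTranscription.formattedString) }
        }
    }
}
```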

Backend (Node.js + Google Cloud Run)

  • Audio Extraction Service: Created an Express.js API that fetches YouTube audio via RapidAPI's youtube-mp3-audio-video-downloader endpoint
  • Streaming Architecture: Implemented efficient audio streaming using Node.js streams to pipe audio directly to iOS without intermediate storage
  • Cloud Deployment: Containerized the service with Docker and deployed to Google Cloud Run for scalability
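On the iOS side, consuming the streamed audio can be sketched with a `URLSession` download task, which writes to a temporary file rather than buffering the whole response in memory. The endpoint URL and query parameter below are placeholders, not the real Cloud Run route.

```swift
import Foundation

// Hypothetical client for the Cloud Run audio-extraction service.
func downloadAudio(videoID: String) async throws -> URL {
    // Placeholder endpoint; substitute the deployed Cloud Run URL.
    let endpoint = URL(string: "https://example-service.a.run.app/audio?videoId=\(videoID)")!
    let (tempURL, response) = try await URLSession.shared.download(from: endpoint)
    guard (response as? HTTPURLResponse)?.statusCode == 200 else {
        throw URLError(.badServerResponse)
    }
    // Hand the file off to SFSpeechURLRecognitionRequest for transcription.
    return tempURL
}
```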

AI/ML Pipeline

  1. Ingredient Extraction: Gemini Vision parses fridge photos, and speech transcripts supply spoken ingredients, producing a consolidated ingredient list
  2. Recipe Generation: Gemini generates recipe variations from that list at three difficulty levels
  3. Difficulty Mapping:
    • We map API responses ("easy", "intermediate", "advanced") to enum cases
    • Display labels are decoupled: E, M, H for compact UI while preserving semantic meaning
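One way to express this decoupling is an enum whose raw values match the API strings while display labels live in a computed property. A sketch under that assumption (the property names are illustrative):

```swift
// Raw values match the API strings; display labels are decoupled
// so UI copy can change without breaking JSON parsing.
enum Difficulty: String, Codable {
    case easy, intermediate, advanced

    /// Compact label for tight UI layouts.
    var shortLabel: String {
        switch self {
        case .easy:         return "E"
        case .intermediate: return "M"
        case .advanced:     return "H"
        }
    }
}
```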

Mathematical Model for Recipe Scoring

We considered implementing a recipe relevance score based on ingredient overlap:

$$ \text{Relevance}(R, I) = \frac{|R \cap I|}{|R|} \times 100 $$

Where:

  • $R$ = set of recipe ingredients
  • $I$ = set of user's available ingredients
  • $|R \cap I|$ = number of matching ingredients

This would allow sorting recipes by feasibility, prioritizing those requiring fewer missing ingredients.
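The formula maps directly onto Swift's `Set` operations. This is a sketch of the proposed (not yet implemented) scoring, assuming ingredient names are normalized (e.g. lowercased) upstream:

```swift
// Relevance(R, I) = |R ∩ I| / |R| × 100
func relevance(recipe: Set<String>, available: Set<String>) -> Double {
    guard !recipe.isEmpty else { return 0 }
    let matches = recipe.intersection(available).count
    return Double(matches) / Double(recipe.count) * 100
}

// e.g. relevance(recipe: ["egg", "milk", "flour", "sugar"],
//                available: ["egg", "milk", "butter"]) == 50.0
```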

💪 Challenges We Faced

1. YouTube Audio Extraction Complexity

Initially, we attempted direct YouTube scraping, but quickly ran into rate limits and legal concerns. Switching to RapidAPI's licensed service solved this, but required careful handling of audio format conversions (WebM → M4A) and efficient streaming of large files.

2. Speech Recognition Accuracy

iOS Speech Recognition sometimes misheard ingredients (e.g., "leeks" vs "leaks"). We mitigated this by:

  • Allowing manual editing in the review stage
  • Using context-aware prompts with Gemini to validate ingredients
  • Implementing fuzzy matching for common substitutions
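One common way to implement the fuzzy matching mentioned above is a small Levenshtein edit distance, treating transcripts within distance 1–2 of a known ingredient as a match. This is a generic sketch, not the app's actual matcher:

```swift
// Classic two-row Levenshtein distance.
func editDistance(_ a: String, _ b: String) -> Int {
    let a = Array(a), b = Array(b)
    if a.isEmpty { return b.count }
    if b.isEmpty { return a.count }
    var prev = Array(0...b.count)
    for i in 1...a.count {
        var curr = [i] + Array(repeating: 0, count: b.count)
        for j in 1...b.count {
            let cost = a[i - 1] == b[j - 1] ? 0 : 1
            curr[j] = min(prev[j] + 1,          // deletion
                          curr[j - 1] + 1,      // insertion
                          prev[j - 1] + cost)   // substitution
        }
        prev = curr
    }
    return prev[b.count]
}

// editDistance("leaks", "leeks") == 1, so "leaks" resolves to "leeks".
```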

3. Enum Parsing Bug

When we changed the difficulty display labels from "Easy" to "Quick", the JSON parsing broke because it relied on rawValue matching. We fixed this by mapping API strings to enum cases rather than raw values:

switch variation.difficulty.lowercased() {
case "easy": difficulty = .easy                  // Maps to display "Quick"
case "intermediate": difficulty = .intermediate  // Maps to "Mid"
case "advanced": difficulty = .advanced          // Maps to "Pro"
default: difficulty = .easy                      // Fallback for unexpected API values
}
