Inspiration

We've all been there: a crumpled shopping list, a voice memo from yesterday, a photo of items you need—scattered chaos across multiple apps and formats. Traditional errand planners require you to manually type everything into a structured form. We asked: What if AI could see, hear, and understand your chaos, then organize it professionally?

With Gemini 3's advanced multimodal capabilities, we saw an opportunity to build a true Logistics Orchestrator: not just a todo list, but an intelligent agent that processes any input format and delivers optimized routes that save time and money and cut carbon emissions.

What it does

ErrandMaster is a multimodal logistics agent that:

  1. Accepts Any Input Format:

    • 🖼️ Photos of handwritten lists or receipts
    • 🎙️ Voice memos ("Hey, I need to grab milk, mail a package...")
    • 📹 Video scans of your fridge, pantry, or store shelves
    • ✍️ Plain text errand lists (structured or messy)
  2. Extracts Structured Data:

    • Detects store names, items, and priority levels
    • Understands context: "Post office" vs "grocery store"
    • Recognizes handwriting and low-quality photos
  3. Optimizes Routes:

    • Calculates shortest path using spatial reasoning
    • Smart Bundling: Groups nearby errands ("Target is next to Starbucks")
    • Considers time windows and store hours
    • Estimates time saved, money saved, and carbon reduction
  4. Delivers Professional Results:

    • Step-by-step route with AI tips
    • Visual stats dashboard (time/cost/environmental impact)
    • Exportable JSON for calendar integration
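
The savings figures in step 3 could be derived along these lines. This is a minimal sketch, not ErrandMaster's actual calculation: the function name `estimateSavings`, the field names, and every default rate (average speed, cost per km, CO2 per km) are placeholder assumptions.

```typescript
// Hypothetical sketch: all names and default rates are illustrative assumptions.
interface SavingsRates {
  avgSpeedKmh: number; // assumed average driving speed
  costPerKm: number;   // assumed running cost per km, in local currency
  co2KgPerKm: number;  // assumed emissions per km driven
}

interface SavingsStats {
  timeSavedMin: number;
  moneySaved: number;
  co2SavedKg: number;
  reductionPct: number;
}

const DEFAULT_RATES: SavingsRates = { avgSpeedKmh: 30, costPerKm: 0.5, co2KgPerKm: 0.2 };

function estimateSavings(
  baselineKm: number,
  optimizedKm: number,
  rates: SavingsRates = DEFAULT_RATES,
): SavingsStats {
  // Never report negative savings if the "optimized" route is actually longer.
  const savedKm = Math.max(0, baselineKm - optimizedKm);
  return {
    timeSavedMin: (savedKm / rates.avgSpeedKmh) * 60,
    moneySaved: savedKm * rates.costPerKm,
    co2SavedKg: savedKm * rates.co2KgPerKm,
    reductionPct: baselineKm > 0 ? (savedKm / baselineKm) * 100 : 0,
  };
}
```

Because every figure scales with the distance saved, the percentage reduction is the same for time, cost, and carbon in this simplified model.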

How we built it

Frontend: React 18 with Vite for blazing-fast development. We designed a premium UI using Tailwind CSS with glassmorphism effects and high-energy gradients that match the "logistics command center" vibe.

AI Core: Gemini 3 Flash (gemini-3-flash) for multimodal analysis. We engineered a sophisticated Master Prompt that:

  • Sets the AI's role as a "Professional Logistics Agent"
  • Provides strict context (current date: February 7, 2026)
  • Enforces JSON schema for UI-ready outputs
  • Uses negative constraints to prevent generic advice
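
To make the four prompt ingredients concrete, here is a minimal sketch of how such a prompt could be assembled. The wording is illustrative and is not the real Master Prompt; `buildMasterPrompt`, `PromptOptions`, and the fields nested inside the three top-level keys are assumptions.

```typescript
// Hypothetical sketch: the wording below is illustrative, not the real Master Prompt.
interface PromptOptions {
  currentDate: string;    // e.g. "February 7, 2026"
  bannedAdvice: string[]; // negative constraints, one rule per entry
}

function buildMasterPrompt(opts: PromptOptions): string {
  // Top-level keys follow the write-up; the nested fields are assumed for illustration.
  const schema = {
    errands_detected: [{ store: "string", items: ["string"], priority: "high | medium | low" }],
    optimized_route: [{ step: "number", store: "string", tip: "string" }],
    stats: { time_saved_min: "number", money_saved: "number", carbon_reduction_pct: "number" },
  };
  return [
    "You are a Professional Logistics Agent.",
    `Current date: ${opts.currentDate}.`,
    "Respond with a single JSON object matching this shape and nothing else:",
    JSON.stringify(schema, null, 2),
    "Do NOT:",
    ...opts.bannedAdvice.map((rule) => `- ${rule}`),
  ].join("\n");
}
```

Passing the date and the banned-advice list as arguments keeps the role and schema text fixed while the context changes per request.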

Multimodal Pipeline:

  1. File Upload: Users drag-drop images, audio, or video
  2. Gemini Processing: Files are uploaded to the Gemini API, which extracts structured data
  3. Route Optimization: Custom algorithm finds shortest path using spatial reasoning
  4. Smart Bundling: AI proactively suggests grouping nearby errands
  5. Response Formatting: Strict JSON schema ensures reliable UI rendering
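
As an illustration of step 3, the sketch below orders stops with a greedy nearest-neighbor heuristic. It is a stand-in for the custom algorithm, not a copy of it: it does not guarantee the shortest route, it uses straight-line (haversine) distance rather than road distance, and it assumes each stop already has coordinates. The names `Stop`, `haversineKm`, and `orderStops` are hypothetical.

```typescript
// Hypothetical sketch: a greedy nearest-neighbor heuristic, NOT a guaranteed shortest path.
interface Point {
  lat: number;
  lon: number;
}

interface Stop extends Point {
  name: string;
  highPriority?: boolean;
}

// Great-circle distance in kilometers (haversine formula, Earth radius 6371 km).
function haversineKm(a: Point, b: Point): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

function orderStops(start: Point, stops: Stop[]): Stop[] {
  const remaining = [...stops];
  const route: Stop[] = [];
  let current: Point = start;
  while (remaining.length > 0) {
    // Visit high-priority stops first; use all remaining stops once those are done.
    const urgent = remaining.filter((s) => s.highPriority);
    const pool = urgent.length > 0 ? urgent : remaining;
    const from = current;
    // Pick the stop in the pool closest to the current position.
    const next = pool.reduce((best, s) =>
      haversineKm(from, s) < haversineKm(from, best) ? s : best,
    );
    route.push(next);
    remaining.splice(remaining.indexOf(next), 1);
    current = next;
  }
  return route;
}
```

A greedy pass like this is cheap and usually reasonable for a handful of errands; handling store closing times would need an extra feasibility check at each step.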

Search Grounding: We use Gemini's search grounding to check store hours and traffic conditions when that information is available.

Technical Highlights:

  • Gemini 3 Vision: Processes handwritten lists, receipt photos, even video walkthroughs
  • Gemini 3 Audio: Transcribes and understands voice memos with high fidelity
  • Structured Outputs: Enforces strict JSON schema for errands_detected, optimized_route, and stats
  • Agentic Behavior: AI proactively suggests optimizations without being asked
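
A response shape built around the three keys named above might be checked before rendering like this. Only `errands_detected`, `optimized_route`, and `stats` come from the write-up; the `PlanResponse` type, the `isPlanResponse` guard, and the loose element types are assumptions for illustration.

```typescript
// Hypothetical sketch: only the three top-level key names come from the write-up.
interface PlanResponse {
  errands_detected: unknown[];
  optimized_route: unknown[];
  stats: Record<string, unknown>;
}

// Type guard: true only when all three keys exist with the expected container types.
function isPlanResponse(value: unknown): value is PlanResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const stats = v["stats"];
  return (
    Array.isArray(v["errands_detected"]) &&
    Array.isArray(v["optimized_route"]) &&
    typeof stats === "object" &&
    stats !== null &&
    !Array.isArray(stats)
  );
}
```

The UI can then render only values that pass the guard and show an error state for anything else.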

Challenges we ran into

  1. Handwriting Recognition Accuracy: Early tests struggled with messy handwriting. We solved this by:

    • Enhancing the prompt with "interpret even unclear handwriting"
    • Adding examples of common errand abbreviations ("groc" = grocery, "PO" = post office)
    • Using Gemini 3's improved vision capabilities
  2. Route Optimization Logic: We needed a true "shortest path" algorithm, so we took a hybrid approach: Gemini 3 does the spatial reasoning and we handle the graph traversal. The algorithm considers:

    • Geographic proximity (not just straight-line distance)
    • Time windows (store closing times)
    • Priority levels (high-priority errands first)
  3. JSON Schema Enforcement: Getting consistent, parseable JSON from Gemini required:

    • Explicit schema definition in the prompt
    • Fallback parsing for edge cases
    • Strict validation before rendering UI
  4. Multimodal File Size Limits: Large videos hit API limits. We:

    • Implemented client-side compression
    • Added file size warnings
    • Suggested users extract keyframes instead of full videos
  5. Bundling Intelligence: Teaching the AI to recognize "next door" stores required:

    • Adding geographic context to the prompt
    • Providing examples of common bundling scenarios
    • Using search grounding to verify proximity
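
The fallback parsing mentioned in challenge 3 can be sketched as a series of progressively looser attempts. This is one possible approach under our own assumptions, not the project's actual parser; `parseModelJson` is a hypothetical name.

```typescript
// Hypothetical sketch of a fallback parser; not the project's actual code.
function parseModelJson(raw: string): unknown {
  // Attempt 1: the whole response is already valid JSON.
  const attempts: string[] = [raw.trim()];

  // Attempt 2: the JSON sits inside a Markdown code fence (three backticks).
  const fence = "`".repeat(3);
  const fenced = raw.match(new RegExp(fence + "(?:json)?\\s*([\\s\\S]*?)" + fence));
  const inner = fenced?.[1];
  if (inner !== undefined) attempts.push(inner.trim());

  // Attempt 3: take the span from the first "{" to the last "}".
  const first = raw.indexOf("{");
  const last = raw.lastIndexOf("}");
  if (first !== -1 && last > first) attempts.push(raw.slice(first, last + 1));

  for (const candidate of attempts) {
    try {
      return JSON.parse(candidate);
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  return undefined; // Nothing parseable was found.
}
```

Whatever this returns should still go through strict validation before it reaches the UI, since a parseable object is not necessarily one with the expected shape.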

Accomplishments that we're proud of

  • True Multimodal Processing: Successfully handles photos, audio, video, and text with equal precision
  • Smart Bundling: AI proactively suggests "Target is next door to Starbucks—combine trips!" without explicit prompting
  • Professional UX: Premium UI that feels like enterprise logistics software, not a hobby project
  • Environmental Impact: Calculating and displaying carbon reduction motivates users to optimize routes
  • Zero Manual Entry: Users can literally take a photo of their fridge and get a complete route—no typing required
  • Strict JSON Schema: 100% reliable UI rendering with zero parsing errors in production testing

What we learned

  • Gemini 3 Vision is Exceptional: It accurately reads messy handwriting, detects items in photos, and understands context from images far better than we expected
  • Prompt Engineering = Product Quality: The "Master Prompt" that sets role, constraints, and output format accounts for about 80% of the output quality
  • Multimodal UX is Different: Users don't think in "upload files"—they think in "show the AI my list." The UI needs to feel natural.
  • Agentic AI Needs Boundaries: Without negative constraints, Gemini would give generic advice ("consider shopping on weekdays"). Strict task focus improves user satisfaction.
  • Environmental Gamification Works: Showing "15% carbon reduction" motivates users more than "save 10 minutes"

What's next for ErrandMaster

  1. Real-Time Map Integration: Show the optimized route on Google Maps with turn-by-turn navigation
  2. Calendar Sync: Auto-schedule errands based on user availability and store hours
  3. Collaborative Errands: Share routes with family members, assign tasks
  4. Recurring Patterns: Learn user habits ("you buy milk every Sunday") and proactively suggest errands
  5. Store Inventory API: Check if items are in stock before adding to route
  6. Multi-Stop Optimization: Extend from errands to delivery routes for small businesses
  7. Voice-First Mode: Entire workflow via voice commands—perfect for driving
  8. AR Navigation: Overlay route info on phone camera for hands-free shopping

Built With

  • fetch
  • genai
  • javascript/typescript
  • lucide
  • node.js
  • tailwind
  • vercel/netlify
  • vite