Inspiration
We've all been there: a crumpled shopping list, a voice memo from yesterday, a photo of items you need—scattered chaos across multiple apps and formats. Traditional errand planners require you to manually type everything into a structured form. We asked: What if AI could see, hear, and understand your chaos, then organize it professionally?
With Gemini 3's advanced multimodal capabilities, we saw an opportunity to build a true Logistics Orchestrator—not just a todo list, but an intelligent agent that processes any input format and delivers optimized routes that save time, money, and carbon emissions.
What it does
ErrandMaster is a multimodal logistics agent that:
Accepts Any Input Format:
- 🖼️ Photos of handwritten lists or receipts
- 🎙️ Voice memos ("Hey, I need to grab milk, mail a package...")
- 📹 Video scans of your fridge, pantry, or store shelves
- ✍️ Plain text errand lists (structured or messy)
Extracts Structured Data:
- Detects store names, items, and priority levels
- Understands context: "Post office" vs "grocery store"
- Recognizes handwriting and low-quality photos
Optimizes Routes:
- Calculates shortest path using spatial reasoning
- Smart Bundling: Groups nearby errands ("Target is next to Starbucks")
- Considers time windows and store hours
- Estimates time saved, money saved, and carbon reduction
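As a rough illustration of how the three savings figures could be derived from a distance comparison, here is a minimal sketch. The function name, the field names, and every rate passed in are assumptions made for this example; the write-up does not say how ErrandMaster actually computes its stats.

```typescript
// All rates are caller-supplied assumptions, not values taken from ErrandMaster.
interface SavingsRates {
  avgSpeedKmh: number; // assumed average driving speed
  costPerKm: number;   // assumed running cost per kilometre
  kgCo2PerKm: number;  // assumed vehicle emission factor
}

interface SavingsEstimate {
  minutesSaved: number;
  moneySaved: number;
  kgCo2Saved: number;
  percentReduction: number; // share of the baseline distance avoided
}

// Compare an unoptimized baseline distance with the optimized route distance.
function estimateSavings(
  baselineKm: number,
  optimizedKm: number,
  rates: SavingsRates
): SavingsEstimate {
  // Never report negative savings if the "optimized" route is longer.
  const savedKm = Math.max(0, baselineKm - optimizedKm);
  return {
    minutesSaved: (savedKm / rates.avgSpeedKmh) * 60,
    moneySaved: savedKm * rates.costPerKm,
    kgCo2Saved: savedKm * rates.kgCo2PerKm,
    percentReduction: baselineKm > 0 ? (savedKm / baselineKm) * 100 : 0,
  };
}
```

Because emissions scale linearly with distance in this sketch, the percentage carbon reduction equals the percentage distance reduction.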
Delivers Professional Results:
- Step-by-step route with AI tips
- Visual stats dashboard (time/cost/environmental impact)
- Exportable JSON for calendar integration
How we built it
Frontend: React 18 with Vite for blazing-fast development. We designed a premium UI using Tailwind CSS with glassmorphism effects and high-energy gradients that match the "logistics command center" vibe.
AI Core: Gemini 3 Flash (gemini-3-flash) for multimodal analysis. We engineered a sophisticated Master Prompt that:
- Sets the AI's role as a "Professional Logistics Agent"
- Provides strict context (current date: February 7, 2026)
- Enforces JSON schema for UI-ready outputs
- Uses negative constraints to prevent generic advice
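The four ingredients above could be assembled along these lines. This is a sketch only: `buildMasterPrompt` is a hypothetical helper and the wording of each instruction is invented for illustration, not the prompt we actually shipped.

```typescript
// Hypothetical prompt builder; the exact instruction wording is an assumption.
function buildMasterPrompt(currentDate: string): string {
  return [
    // 1. Role
    "You are a Professional Logistics Agent.",
    // 2. Strict context
    `Current date: ${currentDate}.`,
    // 3. Output format (top-level keys as named in this write-up)
    "Respond with a single JSON object containing exactly these keys: " +
      "errands_detected, optimized_route, stats.",
    // 4. Negative constraints
    "Do not give generic lifestyle or shopping advice.",
    "Do not include any text outside the JSON object.",
  ].join("\n");
}
```

Passing the date in as a parameter keeps the prompt reproducible in tests, for example `buildMasterPrompt("February 7, 2026")`.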
Multimodal Pipeline:
- File Upload: Users drag-drop images, audio, or video
- Gemini Processing: Files are uploaded to the Gemini API, which extracts structured data
- Route Optimization: Custom algorithm finds shortest path using spatial reasoning
- Smart Bundling: AI proactively suggests grouping nearby errands
- Response Formatting: Strict JSON schema ensures reliable UI rendering
Search Grounding: We leverage Gemini's real-time knowledge capabilities to validate store hours and traffic conditions (when available).
Technical Highlights:
- Gemini 3 Vision: Processes handwritten lists, receipt photos, even video walkthroughs
- Gemini 3 Audio: Transcribes and understands voice memos with high fidelity
- Structured Outputs: Enforces strict JSON schema for errands_detected, optimized_route, and stats
- Agentic Behavior: AI proactively suggests optimizations without being asked
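To make the structured-output idea concrete, here is a sketch of what a typed response and a runtime check might look like. Only the three top-level keys (errands_detected, optimized_route, stats) come from this write-up; every nested field name below is a hypothetical example.

```typescript
// Top-level keys come from the write-up; all nested fields are assumed for illustration.
type Priority = "high" | "medium" | "low";
interface Errand { store: string; items: string[]; priority: Priority }
interface RouteStep { order: number; store: string; tip?: string }
interface Stats { time_saved_minutes: number; money_saved: number; carbon_reduction_percent: number }
interface ErrandPlan { errands_detected: Errand[]; optimized_route: RouteStep[]; stats: Stats }

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isErrand(v: unknown): v is Errand {
  if (!isRecord(v)) return false;
  const items = v["items"];
  const priority = v["priority"];
  return typeof v["store"] === "string"
    && Array.isArray(items) && items.every((i) => typeof i === "string")
    && (priority === "high" || priority === "medium" || priority === "low");
}

function isRouteStep(v: unknown): v is RouteStep {
  if (!isRecord(v)) return false;
  const tip = v["tip"];
  return typeof v["order"] === "number" && typeof v["store"] === "string"
    && (tip === undefined || typeof tip === "string");
}

function isStats(v: unknown): v is Stats {
  if (!isRecord(v)) return false;
  const fields = [v["time_saved_minutes"], v["money_saved"], v["carbon_reduction_percent"]];
  return fields.every((f) => typeof f === "number" && Number.isFinite(f));
}

// Runtime guard: true only when the whole object matches the assumed shape.
function isErrandPlan(v: unknown): v is ErrandPlan {
  if (!isRecord(v)) return false;
  const errands = v["errands_detected"];
  const route = v["optimized_route"];
  return Array.isArray(errands) && errands.every((e) => isErrand(e))
    && Array.isArray(route) && route.every((s) => isRouteStep(s))
    && isStats(v["stats"]);
}
```

A type guard like this lets the UI refuse to render anything that does not match the expected shape, rather than failing halfway through a component.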
Challenges we ran into
Handwriting Recognition Accuracy: Early tests struggled with messy handwriting. We solved this by:
- Enhancing the prompt with "interpret even unclear handwriting"
- Adding examples of common errand abbreviations ("groc" = grocery, "PO" = post office)
- Using Gemini 3's improved vision capabilities
Route Optimization Logic: Building a true "shortest path" algorithm meant accounting for:
- Geographic proximity (not just linear distance)
- Time windows (store closing times)
- Priority levels (high-priority errands first)
We implemented a hybrid approach: Gemini 3 handles the spatial reasoning, and our code handles the graph traversal.
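One simple way to combine priority, proximity, and closing times is a greedy ordering, sketched below. This is not our production algorithm: it is a nearest-neighbor heuristic rather than a true shortest-path solver, it uses straight-line distance on assumed planar coordinates as a stand-in for real travel cost, and the `Stop` shape and `planRoute` name are hypothetical.

```typescript
type Priority = "high" | "medium" | "low";

interface Stop {
  name: string;
  x: number;           // km east of the starting point (assumed planar coordinates)
  y: number;           // km north of the starting point
  priority: Priority;
  closesAtMin: number; // closing time, in minutes after departure
}

const PRIORITY_RANK: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

// Greedy heuristic: highest priority first, nearest stop as the tie-breaker.
// Stops that would be reached after closing are reported instead of visited.
function planRoute(stops: Stop[], speedKmh: number): { route: string[]; missed: string[] } {
  const remaining = [...stops];
  const route: string[] = [];
  const missed: string[] = [];
  let x = 0;
  let y = 0;
  let clockMin = 0;

  while (remaining.length > 0) {
    let best = 0;
    for (let i = 1; i < remaining.length; i++) {
      const a = remaining[i];
      const b = remaining[best];
      const rankDiff = PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority];
      const distDiff = Math.hypot(a.x - x, a.y - y) - Math.hypot(b.x - x, b.y - y);
      if (rankDiff < 0 || (rankDiff === 0 && distDiff < 0)) best = i;
    }
    const [next] = remaining.splice(best, 1);
    const travelMin = (Math.hypot(next.x - x, next.y - y) / speedKmh) * 60;
    const arrival = clockMin + travelMin;
    if (arrival > next.closesAtMin) {
      missed.push(next.name); // would arrive after closing, so skip it
      continue;
    }
    route.push(next.name);
    x = next.x;
    y = next.y;
    clockMin = arrival;
  }
  return { route, missed };
}
```

A greedy pass like this is cheap and easy to reason about, but it can miss better orderings; for a handful of errands, trying every permutation would also be feasible.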
JSON Schema Enforcement: Getting consistent, parseable JSON from Gemini required:
- Explicit schema definition in the prompt
- Fallback parsing for edge cases
- Strict validation before rendering UI
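The fallback parsing step can be sketched as follows. The helper names are hypothetical, and the two fallbacks shown (a markdown-fenced block, then the outermost braces) are assumptions about which edge cases are worth handling, not a record of what our code does.

```typescript
// Markdown code fence marker, built indirectly so it does not clash with this block's own fence.
const FENCE = "`".repeat(3);

// JSON.parse never returns undefined, so undefined is a safe "failed" sentinel.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Try progressively looser strategies to recover a JSON value from model output.
function extractJson(raw: string): unknown {
  const trimmed = raw.trim();

  // 1. The happy path: the response is already pure JSON.
  const direct = tryParse(trimmed);
  if (direct !== undefined) return direct;

  // 2. Fallback: JSON wrapped in a markdown code fence.
  const fenced = trimmed.match(new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE));
  if (fenced) {
    const inner = tryParse(fenced[1]);
    if (inner !== undefined) return inner;
  }

  // 3. Fallback: take everything between the first "{" and the last "}".
  const start = trimmed.indexOf("{");
  const end = trimmed.lastIndexOf("}");
  if (start !== -1 && end > start) {
    const sliced = tryParse(trimmed.slice(start, end + 1));
    if (sliced !== undefined) return sliced;
  }
  return null;
}

// Validate before rendering: require the three top-level keys named in this write-up.
function parsePlan(raw: string): Record<string, unknown> | null {
  const value = extractJson(raw);
  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;
  const record = value as Record<string, unknown>;
  const required = ["errands_detected", "optimized_route", "stats"];
  return required.every((key) => key in record) ? record : null;
}
```

Returning `null` instead of throwing lets the UI show a retry message without wrapping every call in its own try/catch.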
Multimodal File Size Limits: Large videos hit API limits. We:
- Implemented client-side compression
- Added file size warnings
- Suggested that users extract keyframes instead of uploading full videos
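A client-side size check of the kind described above might look like this sketch. `checkUpload` and its return shape are hypothetical, and the limit is passed in by the caller because the right value depends on how the file is sent to the API.

```typescript
type UploadCheck = { ok: true } | { ok: false; warning: string };

// Hypothetical pre-upload check; maxBytes is supplied by the caller, not hard-coded.
function checkUpload(mimeType: string, sizeBytes: number, maxBytes: number): UploadCheck {
  if (sizeBytes <= maxBytes) return { ok: true };

  const toMb = (bytes: number): string => (bytes / (1024 * 1024)).toFixed(1);
  // Videos get the keyframe suggestion; everything else gets a compression hint.
  const hint = mimeType.startsWith("video/")
    ? "Try extracting a few keyframes instead of uploading the full video."
    : "Try compressing the file before uploading.";

  return {
    ok: false,
    warning: `File is ${toMb(sizeBytes)} MB, over the ${toMb(maxBytes)} MB limit. ${hint}`,
  };
}
```

Running the check before the upload starts means the warning appears immediately instead of after a failed request.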
Bundling Intelligence: Teaching the AI to recognize "next door" stores required:
- Adding geographic context to the prompt
- Providing examples of common bundling scenarios
- Using search grounding to verify proximity
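Once coordinates are known, a proximity check can back up the model's "next door" suggestions. The sketch below groups stops that lie within a distance threshold, using the haversine great-circle formula. The `Place` shape, the function names, and the threshold are assumptions for illustration; in ErrandMaster the bundling suggestions themselves come from Gemini.

```typescript
interface Place {
  name: string;
  lat: number; // degrees
  lon: number; // degrees
}

// Great-circle distance between two points, using a mean Earth radius of 6,371 km.
function haversineMeters(a: Place, b: Place): number {
  const R = 6371000;
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Greedy grouping: each place joins the first bundle that already contains
// a place within maxMeters; otherwise it starts a new bundle.
// Existing bundles are never merged, so the result depends on input order.
function bundleNearby(places: Place[], maxMeters: number): string[][] {
  const bundles: Place[][] = [];
  for (const place of places) {
    const home = bundles.find((bundle) =>
      bundle.some((member) => haversineMeters(member, place) <= maxMeters)
    );
    if (home) {
      home.push(place);
    } else {
      bundles.push([place]);
    }
  }
  return bundles.map((bundle) => bundle.map((p) => p.name));
}
```

Straight-line distance is only a proxy for "next door": two stores 100 m apart across a highway are not a convenient bundle, which is one reason to verify proximity with search grounding as well.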
Accomplishments that we're proud of
- True Multimodal Processing: Successfully handles photos, audio, video, and text with equal precision
- Smart Bundling: AI proactively suggests "Target is next door to Starbucks—combine trips!" without explicit prompting
- Professional UX: Premium UI that feels like enterprise logistics software, not a hobby project
- Environmental Impact: Calculating and displaying carbon reduction motivates users to optimize routes
- Zero Manual Entry: Users can literally take a photo of their fridge and get a complete route—no typing required
- Strict JSON Schema: 100% reliable UI rendering with zero parsing errors in production testing
What we learned
- Gemini 3 Vision is Exceptional: It accurately reads messy handwriting, detects items in photos, and understands context from images far better than we expected
- Prompt Engineering = Product Quality: The "Master Prompt" that sets the role, constraints, and output format accounts for about 80% of the output quality
- Multimodal UX is Different: Users don't think in "upload files"—they think in "show the AI my list." The UI needs to feel natural.
- Agentic AI Needs Boundaries: Without negative constraints, Gemini would give generic advice ("consider shopping on weekdays"). Strict task focus improves user satisfaction.
- Environmental Gamification Works: Showing "15% carbon reduction" motivates users more than "save 10 minutes"
What's next for ErrandMaster
- Real-Time Map Integration: Show the optimized route on Google Maps with turn-by-turn navigation
- Calendar Sync: Auto-schedule errands based on user availability and store hours
- Collaborative Errands: Share routes with family members, assign tasks
- Recurring Patterns: Learn user habits ("you buy milk every Sunday") and proactively suggest errands
- Store Inventory API: Check if items are in stock before adding to route
- Multi-Stop Optimization: Extend from errands to delivery routes for small businesses
- Voice-First Mode: Entire workflow via voice commands—perfect for driving
- AR Navigation: Overlay route info on phone camera for hands-free shopping
Built With
- fetch
- genai
- javascript/typescript
- lucide
- node.js
- tailwind
- vercel/netlify
- vite