Inspiration
The average American spends nearly 18 hours researching a single trip (Go City / Talker Research, 2024), viewing 141 pages of travel content across dozens of websites (Expedia Group / Luth Research, 2023). AI travel planners promise to fix this, but every one on the market is fundamentally a single-prompt chatbot that generates an itinerary in one pass with zero verification.
On the TravelPlanner benchmark — 1,225 planning tasks with real-world constraints — even GPT-4 achieves only a 0.6% success rate (Xie et al., ICML 2024 Spotlight). Language agents struggle to stay on task, use the right tools, or keep track of multiple constraints.
We asked: what if AI travel planning worked the way a real travel agency does — with specialized agents that collaborate, check each other's work, and adapt when constraints collide?
What it does
Sojurn is a 7-agent AI travel planner where you describe a trip in natural language and watch specialized agents plan it in real-time. Type "5 days in Lisbon, food-focused, $3,500 budget" and watch:
- A Scout Agent discovers 20+ places via Google Nearby Search + Tavily editorial discovery — filtering out tourist traps with visible reasoning
- A Logistics Critic validates every recommendation against transit times, budget constraints, and venue hours — rejecting places that don't meet hard criteria and forcing the Scout to find replacements in a live debate loop
- A Flight Finder searches real Amadeus flights, scoring by price, duration, and arrival time
- A Lodging Curator finds hotel options filtered by budget and neighborhood proximity
- A Route Planner clusters places by neighborhood using K-means and optimizes walking routes with nearest-neighbor algorithms
- An Assembler reconciles the budget, generates a climate-aware summary, and produces match scores
- A Vibe Artist generates a unique hero image for your trip using Amazon Nova Canvas
The result: a complete itinerary with real flights, verified hotels, optimized walking routes, seasonal intelligence, and AI-generated destination imagery — in under 30 seconds, for~$0.04 in Amazon Nova inference cost.
Surgical Swap: Click "Swap" on any single stop, and only that slot re-plans while the entire rest of the itinerary stays byte-for-byte identical. 3-5 seconds.
Budget Stress Test: Click "Make it 20% cheaper" and watch agents cascade-negotiate in real-time: the Lodging Curator finds a guesthouse in the same neighborhood, the Scout swaps paid activities for free alternatives, the Assembler confirms you're under budget.
Seasonal Intelligence: The system knows August in Bangkok means monsoon season and April in Tokyo means cherry blossoms — adjusting recommendations and narrating the reasoning.
No other AI travel planner can do any of this.
How we built it
Backend: Python/FastAPI with a custom DAG pipeline orchestrating 7 agents via WebSocket streaming. Each agent has a specialized role — LLMs handle creative and ambiguous tasks (Scout discovery, Manager conversation, Assembler narration), while deterministic algorithms handle optimization and validation (K-means routing, haversine distance checks, budget arithmetic). a Day Skeleton Builder pre-allocates typed activity and meal slots before agents run. This hybrid approach directly addresses the core finding of the TravelPlanner benchmark (Xie et al., ICML 2024 Spotlight): pure LLM approaches achieve only 0.6% success on complex multi-constraint planning, while hybrid approaches that combine LLM reasoning with deterministic validation dramatically outperform them.
Amazon Nova Integration: We use multiple Nova models on Bedrock — the right model for each cognitive task. Nova Lite powers deep reasoning (Scout analysis, Manager conversation, Assembler narration, Critic verdict narration). Nova Canvas generates unique destination imagery. All inference runs through Amazon Bedrock.
Frontend: React/TypeScript with Zustand for state management, Google Maps for visualization, and Framer Motion for animations. The Studio phase shows all agents working simultaneously with streaming thought bubbles and map pin drops. The Canvas phase presents the finished itinerary with day tabs, budget breakdowns, swap controls, and AI-generated hero images.
Key Architectural Decision: The DAG pipeline supports partial re-execution. When a user swaps one stop, the system checkpoints the trip state and re-runs only Scout + Critic + Route Planner for that single slot. This "surgical swap" completes in 3-5 seconds — versus 30+ seconds for a full re-plan — because the architecture was designed for targeted modification from day one. The same partial execution pattern enables the budget stress test, where multiple agents negotiate savings simultaneously without regenerating the full itinerary.
Challenges we ran into
The Logistics Critic debate loop. Adding a cyclic edge (Scout → Critic → back to Scout) inside an otherwise acyclic graph required careful design. We needed pre-clustering before the Critic runs (to check transit distances from cluster centroids), a shared Google Places lookup budget across initial and supplemental rounds, and graceful degradation when the debate doesn't converge after 3 cycles.
Tavily API quota management. Our test suite burned through 1,000 Tavily credits in a single hung run due to retry logic on quota-exhausted responses. We added circuit breakers, separated live-API tests from mock tests, and implemented credit protection guards.
Budget stress test negotiation ordering. When the user asks for a 20% cost reduction, the system needs to prioritize high-impact savings (lodging downgrade) before granular ones (activity swaps). Getting the negotiation priority order right — and making the streaming reasoning feel natural rather than mechanical — required iteration.
Making the invisible visible. The Critic's deterministic checks (haversine distance, budget arithmetic, hours parsing) are sophisticated but inherently invisible. The biggest design challenge was surfacing this reasoning in a way that's immediately comprehensible — streaming rejection verdicts with specific numbers ("3.2km from Day 3 cluster centroid, exceeds 1.5km walk limit") rather than just pass/fail badges.
Accomplishments that we're proud of
- Surgical swap — click one stop, only that slot re-plans, everything else byte-for-byte identical. 3-5 second completion.
- Visible agent debate — the Logistics Critic's rejections stream live with specific reasoning: "Rejecting 'Remote Viewpoint' — 3.2km from Day 3 cluster centroid, exceeds 1.5km walk limit."
- Budget cascade negotiation — "Make it 20% cheaper" triggers multi-agent negotiation that saves real money through intelligent tradeoffs, with honest reporting when the target can't be fully met.
- Seasonal intelligence — the system knows June in Lisbon means Santo António Festival and adjusts recommendations accordingly.
- 600+ tests with zero failures across all agents, the planning graph, swap pipeline, and budget negotiation.
- ~$0.04 in Amazon Nova inference cost per trip using Amazon Nova's pricing — a full 5-day itinerary for under five cents.
- AI-generated hero images via Nova Canvas that capture the specific vibe of each trip — not stock photos.
What we learned
The most important insight: visibility trumps complexity. The Logistics Critic's debate loop is technically sophisticated, but it only became impressive when we made it visible — streaming rejections with specific reasoning, showing pass/fail badges, animating the debate in real-time. Features that judges can't see might as well not exist.
Second: the hybrid LLM-solver pattern is real. Using LLMs for creative tasks (discovery, narration, conversation) and deterministic algorithms for optimization (routing, validation, budget math) produces dramatically better results than pure LLM approaches. The TravelPlanner benchmark (Xie et al., ICML 2024) showed GPT-4 achieves only 0.6% on complex planning — our Critic catches the constraint violations that LLMs consistently miss.
Third: partial graph execution is the killer feature. Designing the DAG for targeted re-execution (not just full re-runs) enabled surgical swap, budget stress test, and gap-filling — three features from one architectural decision.
Fourth: the right Nova model for each task matters. Nova Lite for deep reasoning, Nova Canvas for image generation — matching model capabilities to task requirements keeps latency low and quality high.
What's next for Sojurn
- Photo-to-Location: Upload a vacation photo and Nova Lite Vision identifies the destination and builds a trip matching that aesthetic
- Bedrock Guardrails for PII protection and responsible AI safeguards
- Calendar export (.ics) for one-click itinerary download
- Multi-city trips with inter-city transit optimization
Built With
- amadeus-flight-offers-api
- amazon-bedrock
- amazon-nova-canvas
- amazon-nova-lite
- fastapi
- framer-motion
- google-directions
- google-maps
- google-places
- python
- react
- tailwind-css
- tavily-web-search
- typescript
- websocket
- zustand
Log in or sign up for Devpost to join the conversation.