Inspiration

Wilderness First Responder (WFR) training relies heavily on static paper scenarios or expensive live-action roleplay. Students rarely get dynamic, unpredictable repetitions to truly test their decision-making under pressure. After collaborating with Keenan Grady, President of the Ogden Avalanche Center, I realized that textbook memorization isn't enough for backcountry emergencies. We needed a way to give students infinite, medically accurate repetitions to build real critical thinking.

What it does

Summit-Sim is an AI-powered wilderness rescue simulator that generates dynamic, curriculum-informed emergencies. It provides two distinct user experiences:

  • For Instructors: A creation tool where teachers configure scenarios, review them via a human-in-the-loop (HITL) interrupt, and dynamically inject hidden variables (like a rattlesnake) before publishing.
  • For Students: An interactive game loop where students use free-text natural language to assess the scene and treat the patient. The patient's true medical state stays hidden and is revealed progressively based on the student's decisions, which also update a cumulative PAS score across five WFR milestones.
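
The hidden-state idea can be sketched in a few lines of plain Python. This is only an illustration, not our production code: the milestone names, the `HiddenFinding` and `Session` classes, and the equal-weight scoring are all assumptions made for the example, and in the real system an LLM agent interprets free text rather than receiving a milestone name directly.

```python
from dataclasses import dataclass, field

# Hypothetical milestone names; the actual five WFR milestones are not listed here.
MILESTONES = (
    "scene_size_up",
    "primary_assessment",
    "secondary_assessment",
    "treatment",
    "evacuation_plan",
)


@dataclass
class HiddenFinding:
    description: str
    revealed_by: str  # milestone that uncovers this finding


@dataclass
class Session:
    findings: list[HiddenFinding]
    completed: set[str] = field(default_factory=set)

    def complete(self, milestone: str) -> list[str]:
        """Mark a milestone as done and return the findings it newly reveals."""
        if milestone not in MILESTONES:
            raise ValueError(f"unknown milestone: {milestone}")
        if milestone in self.completed:
            return []  # repeating a milestone reveals nothing new
        self.completed.add(milestone)
        return [f.description for f in self.findings if f.revealed_by == milestone]

    def visible(self) -> list[str]:
        """Everything the student has uncovered so far."""
        return [f.description for f in self.findings if f.revealed_by in self.completed]

    @property
    def score(self) -> float:
        """Cumulative score as the fraction of milestones completed (assumed weighting)."""
        return len(self.completed) / len(MILESTONES)
```

A session starts with nothing visible. Calling `complete("secondary_assessment")` on a session that hides a snakebite behind that milestone returns the snakebite description and raises `score` by one fifth.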

How we built it

We built a production-grade AI stack focused on medical safety, strict schemas, and observability:

  • Orchestration & State: Two interconnected LangGraph workflows manage complex state flows, utilizing LangGraph's interrupt() for the HITL review. We use DragonflyDB for checkpoint persistence.
  • Agent Framework: We used PydanticAI across four specialized agents (Generator, Image Generator, Action Responder, Debrief) to enforce strict Pydantic schemas, ensuring structured outputs and medical safety.
  • Multimodal LLMs: Powered by OpenRouter, leveraging Gemini Flash for text and Gemini Flash Image for unique 16:9 atmospheric scenario visuals.
  • UI & Infrastructure: The frontend is an async, reactive Python UI built in Chainlit. Everything is deployed on a production homelab running Kubernetes (Talos Linux on Proxmox) with ArgoCD for GitOps.
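
The pause-and-resume shape of the HITL review can be illustrated without any framework. The sketch below deliberately does not use LangGraph; it uses a plain Python generator as a stand-in for the interrupt, and the function names and dictionary keys (`approved`, `inject`, `hidden_variables`) are hypothetical.

```python
from typing import Generator


def scenario_review(config: dict) -> Generator[dict, dict, dict]:
    """Draft a scenario, pause for instructor review, then publish or reject."""
    draft = {"title": config["title"], "hidden_variables": []}
    # Pause here: hand the draft to the reviewer and wait for their decision.
    review = yield draft
    if not review.get("approved", False):
        return {**draft, "status": "rejected"}
    # The instructor may inject hidden variables (for example a rattlesnake).
    draft["hidden_variables"].extend(review.get("inject", []))
    return {**draft, "status": "published"}


def run_review(config: dict, review: dict) -> dict:
    """Drive the flow once: run to the pause, then resume with the review."""
    flow = scenario_review(config)
    next(flow)  # runs until the pause; a UI would show the yielded draft here
    try:
        flow.send(review)  # resume with the instructor's decision
    except StopIteration as done:
        return done.value
    raise RuntimeError("flow did not finish after review")
```

In a real deployment the paused state has to survive between the pause and the resume, which is the role the checkpoint store plays; the generator here only lives in memory.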

Note: To accelerate our development timeline, we leveraged opencode and kimi-k2.5 for AI-assisted coding.

Challenges we ran into

Early feedback from subject-matter experts (SMEs) was that our Action Responder agent was "too generous": it let students bypass critical assessments and complete sessions prematurely. We also built a runtime validation system but had to disable it in production because of an open MLflow bug (#20782); the MLflow GenAI features are still very new. To work around both problems, we built four custom MLflow judges (Completion, Feedback, Narrative, Medical) and used them for offline GEPA (Genetic-Pareto) prompt optimization, aligning the agent's prompts with the SME feedback.
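
To give a feel for the "Pareto" part of that idea, here is a toy sketch of filtering prompt candidates by their judge scores. It is not the GEPA algorithm and does not use MLflow's API; the function names are hypothetical and any scores passed in are assumed to be "higher is better" on every judge.

```python
def dominates(a: dict[str, float], b: dict[str, float]) -> bool:
    """True if a is at least as good as b on every judge and strictly better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(candidates: dict[str, dict[str, float]]) -> list[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]
```

Keeping every non-dominated candidate, rather than a single best average, means a prompt that is strongest on one judge (say, Medical) is not discarded just because it is weaker on another.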

Accomplishments that we're proud of

We successfully hid a complex multi-agent architecture behind a clean, intuitive UI. We didn't just build an LLM wrapper; we built a fully observable, evaluated, and optimized ML system. Maintaining a persistent hidden state that progressively reveals itself while tracking milestone completion (PAS scoring) makes for a genuinely novel educational tool. As first-time users of PydanticAI and LangGraph, we're proud to have built a system that uses the two together.

What we learned

We learned that GenAI tracing and observability are still very new, and that some features were missing or broken in MLflow. We also learned how to use PydanticAI alongside LangGraph, and saw the benefits of a Chainlit frontend for async AI applications. Ultimately, we learned how to turn human feedback into measurable improvements in agentic systems through techniques like GEPA prompt optimization.

What's next for summit-sim

We plan to continue sourcing expert feedback by reaching out to NOLS and other non-profit organizations after this event. Planned features include a user profile system, a teacher/student dashboard to track progress, text streaming, more opportunities for HITL feedback (thumbs up/down), and A/B testing of different foundation models.
