Inspiration
University students rarely get real one-on-one teaching outside the classroom, and overworked TAs can only cover so much. Most learning tools (LLMs, websites, and the like) feel like search engines with a chat box bolted on: you type a question, get a wall of text, and are left to figure out the rest yourself. We wanted to flip that. What if your tutor could show you, not just tell you? What if it felt less like Googling and more like sitting across from someone who genuinely adapts to how you learn?
That question became Axon.
What it does
Axon is an adaptive AI learning workspace with a 3D avatar tutor that teaches through live conversation, visual canvas explanations, and an embedded code editor.
- Avatar tutor: A Ready Player Me 3D avatar that reacts in real-time—speaking, thinking, listening, and encouraging based on the session context.
- Visual canvas: Step-by-step animated explanations (diagrams, flowcharts, algorithm visualizations) rendered dynamically as the tutor speaks.
- Live IDE: Write, run, and get instant AI feedback on code without ever leaving the workspace.
- Persistent sessions: Every conversation, canvas state, and code snippet is saved so your learning journey continues seamlessly across sessions.
- Web support: Have it watch an entire video, then ask questions, get summaries, or generate flashcards. The same works for any web page, e.g. Wikipedia.
- Suggestion chips: Contextually generated follow-up prompts ensure you always know what thread to pull next.
How we built it
- Frontend: Built with React and Vite. We used React Three Fiber (`@react-three/fiber` + `@react-three/drei`) to render the 3D avatar, Zustand for state management, and Lucide for iconography. The UI follows a "Modern Dark Cinema" design system featuring ambient light blobs, glassmorphism overlays, expo-out easing (`cubic-bezier(0.16, 1, 0.3, 1)`), and a floating panel layout.
- Backend: Powered by FastAPI with an agentic loop (tool use + structured output) driven by Claude. On every turn the agent produces a structured JSON response containing the speech text, emotional state, canvas actions, follow-up suggestions, and step tracker updates, all in a single round trip.
- Avatar: We loaded a Ready Player Me `.glb` model via `useGLTF`. We lit the scene with three-point cinema lighting (including a striking indigo rim light) and bust-framed it using a custom camera rig.
- Persistence: We used SQLAlchemy and Alembic for robust storage of session, message, and workspace state in the database.
Challenges we ran into
- Structured agent output: Getting the LLM to reliably return parseable JSON on every response without hallucinating extra fields required rigorous prompt engineering and strict output validation schemas.
- Avatar framing: Translating a 3D world-space camera into a natural bust shot that looks good across different viewport sizes was trickier than expected. It required leaning heavily into world-space math—like calculating transformation matrices such as $M = T \cdot R \cdot S$ and projection coordinates—rather than standard screen-space UI intuition.
- Layout architecture: Designing four distinct zones (avatar panel, tab bar, canvas/IDE, chat bar) that coexist without fighting for space. This was especially difficult because the 3D canvas consumes the WebGL context, requiring careful flexbox composition and overflow management.
- Glassmorphism on WebGL: Applying `backdrop-filter` blur overlays on top of a Three.js canvas required isolating the compositing layer correctly so the frosted-glass badges actually blurred the 3D render behind them, rather than just the background color.
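The "no hallucinated extra fields" guarantee mentioned above can be enforced at parse time rather than by prompting alone. A minimal sketch using Pydantic's `extra="forbid"` config (the model and field names are illustrative, not Axon's real schema):

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class AgentReply(BaseModel):
    # Reject any key the schema does not declare, so a hallucinated
    # field fails loudly (and can trigger a retry) instead of
    # silently passing through to the frontend.
    model_config = ConfigDict(extra="forbid")

    speech: str
    emotion: str


# A well-formed reply validates cleanly.
good = AgentReply.model_validate({"speech": "Hi!", "emotion": "happy"})

# A reply with an invented key is rejected.
rejected = False
try:
    AgentReply.model_validate(
        {"speech": "Hi!", "emotion": "happy", "invented_field": 42}
    )
except ValidationError:
    rejected = True
```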
Accomplishments that we're proud of
- Engineering a fully functional agentic tutor loop that synchronizes speech, visuals, and emotion in a single API call.
- Creating a 3D avatar that visually reflects the tutor's cognitive state in real-time.
- Designing a clean, highly-polished UI that feels like a premium, shipped product rather than a weekend hackathon prototype.
- Implementing persistent sessions—you can close the tab, come back, and pick up exactly where you left off.
What we learned
- Schema-first thinking matters: Structured LLM output is only as reliable as the prompt contract you define upfront.
- 3D in React is viable: React Three Fiber makes integrating 3D elements surprisingly ergonomic, though camera and lighting still require solid 3D math fundamentals.
- UI polish is a force multiplier: The exact same features feel twice as impressive when wrapped in a meticulously crafted shell.
- State synchronization is hard: Agentic state machines (mapping emotion $\rightarrow$ avatar $\rightarrow$ canvas $\rightarrow$ steps) need incredibly careful synchronization to avoid screen flicker and race conditions.
What's next for Axon
- Voice input: Letting learners speak their questions naturally while the avatar responds in kind.
- Curriculum mode: Multi-session learning paths featuring spaced repetition and comprehensive progress tracking.
- Collaborative rooms: The ability to study with a friend and share a canvas session in real-time.
- Custom avatars: Allowing learners to bring their own Ready Player Me avatars so the tutor mirrors their personal style.
- Export capabilities: Downloading a session as a structured, Markdown-based study guide complete with diagrams.
Built With
- alembic
- anthropic-claude-api
- fastapi
- javascript
- lucide-react
- pydantic
- python
- react
- react-three-fiber
- ready-player-me
- sqlalchemy
- sqlite
- tailwind-css
- three.js
- uvicorn
- vite
- web-speech-api
- webgl
- zustand