Aria: The Multimodal AI Airport Kiosk

Inspiration

Traditional airport kiosks rely on touchscreens that are unhygienic, awkward to use with luggage in hand, and offer no personalization. We wanted a voice-first, multimodal AI assistant that feels natural and helpful, like talking to a knowledgeable airport employee. By combining conversational AI with a 3D visual interface, we make airport navigation more intuitive and engaging.


What It Does

Aria is a multimodal AI kiosk designed for airports. Passengers authenticate via face recognition or QR code, then interact with a 3D animated assistant through voice or text.

Core Capabilities

  • Answer airport questions using a RAG knowledge base
    (baggage policies, security procedures, facilities)
  • Provide flight information
    (gates, boarding times, delays, cancellations)
  • Show weather for origin and destination cities
  • Display interactive maps
    (gates, restrooms, customer service)
  • Guide travelers to lounges, restaurants, shops, and terminals
  • Handle flight changes
    (delays, cancellations, rebooking, overbooking offers)

Visual Experience

  • A 3D audio-reactive particle sphere that responds to voice input
  • Real-time FFT analysis drives particle movement, creating an immersive and responsive UI

How We Built It

Backend (Python / FastAPI)

  • LangGraph agent orchestration with Vultr Serverless Inference running Kimi-K2-Instruct (1T total parameters, 32B active)
  • Layered architecture:
    Transport → Session Manager → Agent Orchestrator → Tools / RAG / TTS
  • WebSocket streaming for real-time audio and text responses
  • ElevenLabs TTS with character-level alignment for synchronized text reveal
  • MongoDB Atlas Vector Search for RAG (airport FAQ knowledge base)
  • DeepFace (Facenet512) for biometric authentication
  • Tool system:
    • Flight info
    • Weather API
    • Map rendering
    • Destination info
    • RAG search
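The orchestrator's core can be pictured as a tool-calling loop: the model either requests a tool or produces a final answer. This is a minimal sketch, not our actual LangGraph graph; `stub_model`, the tool registry, and the message shapes are all illustrative stand-ins:

```python
# Minimal sketch of the agent orchestration loop. In the real system
# these steps are LangGraph nodes with streaming; here a stubbed model
# shows the control flow: decide -> call tool -> feed result back.

TOOLS = {
    "weather": lambda city: f"Sunny in {city}",
    "flight_info": lambda number: f"Flight {number}: Gate B12, on time",
}

def stub_model(messages):
    """Stand-in for the LLM: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "flight_info", "args": ["AA100"]}
    return {"answer": "Your flight AA100 departs from Gate B12, on time."}

def run_agent(user_text, max_steps=5):
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        decision = stub_model(messages)
        if "answer" in decision:            # model is done
            return decision["answer"]
        tool = TOOLS[decision["tool"]]      # dispatch the requested tool
        result = tool(*decision["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded step budget")
```

The `max_steps` bound is what keeps a confused model from looping forever; the production loop adds retries and streaming on top of the same shape.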

Frontend (React / TypeScript)

  • Three.js 3D scene with 20,000+ particles
  • Web Audio API FFT analysis
    (bass, mid, high frequency bands)
  • TanStack Router for file-based routing
  • TanStack Query for data fetching
  • WebSocket client for real-time communication
  • Performance optimizations:
    • Refs instead of React state for 60fps animations
    • Delta-time–aware animation updates
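The band separation and smoothing behind the particle sphere reduce to two small functions. In the app this runs each frame on Web Audio `AnalyserNode` byte data (values 0–255); the band boundaries and smoothing factor below are illustrative choices, sketched here in Python for clarity:

```python
# Sketch of the per-frame frequency-band split and smoothing that
# drive particle motion. Band split points (10% / 40% of the bins)
# and the smoothing factor are illustrative, not the app's exact values.

def band_levels(fft_bins, splits=(0.1, 0.4)):
    """Average byte FFT magnitudes into bass / mid / high bands (0..1)."""
    n = len(fft_bins)
    bass_end, mid_end = int(n * splits[0]), int(n * splits[1])
    def avg(chunk):
        return sum(chunk) / (len(chunk) * 255) if chunk else 0.0
    return (avg(fft_bins[:bass_end]),
            avg(fft_bins[bass_end:mid_end]),
            avg(fft_bins[mid_end:]))

def smooth(prev, current, alpha=0.15):
    """Exponential smoothing keeps particle motion from jittering."""
    return tuple(p + alpha * (c - p) for p, c in zip(prev, current))
```

Keeping `prev` in a ref (rather than React state) is what lets this run at 60fps without triggering re-renders.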

Architecture Highlights

  • Self-correcting agent loop with retry logic
  • Resilient tool execution with timeouts and error translation
  • Session management with conversation history persistence
  • Event-driven streaming for frontend observability
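Resilient tool execution combines three of the highlights above: bounded retries, a per-call timeout, and error translation. A simplified sketch, with illustrative names and messages rather than our exact exception hierarchy:

```python
# Sketch of resilient tool execution: retries, a timeout, and
# translation of technical failures into user-facing messages.
import asyncio

FRIENDLY = {
    asyncio.TimeoutError: "That's taking longer than expected. Please try again.",
    ConnectionError: "I'm having trouble reaching that service right now.",
}

def translate(exc):
    """Map a technical exception to a user-friendly message."""
    for exc_type, message in FRIENDLY.items():
        if isinstance(exc, exc_type):
            return message
    return "Something went wrong on our end. Please try again."

async def run_tool(tool, *args, retries=2, timeout=5.0):
    last_exc = None
    for _ in range(retries + 1):
        try:
            return await asyncio.wait_for(tool(*args), timeout)
        except tuple(FRIENDLY) as exc:   # known, recoverable failures
            last_exc = exc
    # Retries exhausted: answer gracefully instead of crashing.
    return translate(last_exc)
```

Because the fallback is a plain string, the agent can speak the failure message through the normal TTS path instead of surfacing a stack trace.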

Challenges We Ran Into

  • WebSocket streaming complexity
    Synchronizing audio chunks, alignment data, and component events required careful buffering and ordering logic.
  • 3D performance
    Maintaining 60fps with 20,000+ particles while processing audio in real time. We solved this by storing animation values in refs instead of React state and optimizing the animation loop.
  • Audio-reactive visuals
    Implementing smooth FFT analysis with proper frequency band separation and smoothing to prevent jitter.
  • Error handling
    Designing an error translation system that converts technical failures into user-friendly messages without crashing the app.
  • Architecture refactoring
    Migrating from a monolithic backend to a layered architecture mid-development while maintaining the WebSocket contract.
  • MongoDB Vector Search setup
    Correctly configuring vector dimensions (384, to match our embedding model's output) and similarity metrics.
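The dimension mismatch pitfall comes down to one number in the index definition. A sketch of an Atlas Vector Search index as a Python dict (field and index names are illustrative; `numDimensions` must exactly match the embedding model's output size):

```python
# Sketch of an Atlas Vector Search index definition. Field and index
# names are illustrative; the key constraint is that numDimensions
# matches the embedding size, or every query silently fails to match.
VECTOR_INDEX = {
    "name": "faq_vector_index",
    "type": "vectorSearch",
    "definition": {
        "fields": [{
            "type": "vector",
            "path": "embedding",     # document field holding the vector
            "numDimensions": 384,    # must match the embedding model
            "similarity": "cosine",  # a common choice for text embeddings
        }]
    },
}
# Created once via pymongo, e.g.:
# collection.create_search_index(model=VECTOR_INDEX)
```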

Accomplishments We’re Proud Of

  • Full-stack integration
    Seamless connection between the 3D frontend, WebSocket backend, and AI agent.
  • Audio-reactive 3D UI
    Real-time particle deformation responding smoothly to voice input.
  • Robust error handling
    A comprehensive exception hierarchy that keeps the app from crashing and always surfaces helpful feedback.
  • Clean architecture
    Maintainable, well-structured code with clear separation of concerns.
  • Biometric authentication
    Functional face recognition with fast MongoDB vector matching.
  • RAG integration
    Reliable airport-specific knowledge retrieval.
  • Performance
    Smooth 60fps animations with complex 3D graphics and real-time audio processing.

What We Learned

  • LangGraph patterns for agentic systems with tool calling, state management, and streaming
  • WebSocket streaming with multiple event types
    (audio, alignment, components, errors)
  • Three.js optimization using refs, delta-time calculations, and efficient particle systems
  • MongoDB Vector Search configuration and similarity tuning
  • Error resilience through graceful failure handling
  • Refactoring strategies for evolving architecture without breaking contracts
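The multi-event WebSocket lesson boils down to tagging every message with a discriminating type so the frontend can route it. A minimal sketch; the type names match our event categories, but the payload shapes and helper names are illustrative, not our exact wire format:

```python
# Sketch of a tagged event envelope for the WebSocket stream. Every
# message carries a "type" tag so the client can dispatch audio,
# alignment, component, and error events to separate handlers.
import json

EVENT_TYPES = {"audio", "alignment", "component", "error"}

def make_event(event_type, payload):
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    return json.dumps({"type": event_type, "payload": payload})

def dispatch(raw, handlers):
    """Route an incoming event to its handler by tag."""
    event = json.loads(raw)
    return handlers[event["type"]](event["payload"])
```

Keeping the envelope stable was what let us refactor the backend's internals mid-development without breaking the frontend contract.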

What’s Next for Aria

  • Production deployment with scaling, monitoring, and security
  • Enhanced tools
    • Real flight APIs
    • Expanded map coverage
    • Restaurant menus and reviews
  • Improved RAG
    • Larger airport knowledge bases
    • Multi-document retrieval
  • Accessibility
    • Sign language support
    • Screen readers
    • Multiple languages
  • Personalization
    • Learn user preferences
    • Proactive recommendations
  • Multimodal input
    • Image uploads (“Where is this gate?”)
    • Gesture recognition
  • Analytics dashboard
    • Common questions
    • User satisfaction
    • System performance
  • Mobile companion app
    • Continuous assistance on passengers’ phones throughout their journey
