Aria: The Multimodal AI Airport Kiosk
Inspiration
Traditional airport kiosks rely on touchscreens that are unhygienic, hard to use with luggage, and lack personalization. We wanted a voice-first, multimodal AI assistant that feels natural and helpful—like talking to a knowledgeable airport employee. By combining conversational AI with a 3D visual interface, we make airport navigation more intuitive and engaging.
What It Does
Aria is a multimodal AI kiosk designed for airports. Passengers authenticate via face recognition or QR code, then interact with a 3D animated assistant through voice or text.
Core Capabilities
- Answer airport questions using a RAG knowledge base (baggage policies, security procedures, facilities)
- Provide flight information (gates, boarding times, delays, cancellations)
- Show weather for origin and destination cities
- Display interactive maps (gates, restrooms, customer service)
- Guide travelers to lounges, restaurants, shops, and terminals
- Handle flight changes (delays, cancellations, rebooking, overbooking offers)
Visual Experience
- A 3D audio-reactive particle sphere that responds to voice input
- Real-time FFT analysis drives particle movement, creating an immersive and responsive UI
How We Built It
Backend (Python / FastAPI)
- LangGraph agent orchestration with Vultr Serverless Inference, running Kimi-K2-Instruct (1T total parameters, 32B active)
- Layered architecture: Transport → Session Manager → Agent Orchestrator → Tools / RAG / TTS
- WebSocket streaming for real-time audio and text responses
- ElevenLabs TTS with character-level alignment for synchronized text reveal
- MongoDB Atlas Vector Search for RAG (airport FAQ knowledge base)
- DeepFace (Facenet512) for biometric authentication
- Tool system:
  - Flight info
  - Weather API
  - Map rendering
  - Destination info
  - RAG search
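A minimal sketch of how character-level alignment can drive a synchronized text reveal. It assumes the alignment data has already been reduced to a list of characters and a parallel, sorted list of start times in seconds. The function name `revealed_text` and this data shape are illustrative assumptions, not the project's actual code or the exact TTS payload.

```python
from bisect import bisect_right


def revealed_text(chars, start_times, playback_time):
    """Return the part of the text whose characters have started playing.

    chars:         list of single characters, in spoken order
    start_times:   start time in seconds for each character (sorted ascending)
    playback_time: current audio playback position in seconds
    """
    if len(chars) != len(start_times):
        raise ValueError("chars and start_times must have the same length")
    # Count how many characters start at or before the playback position.
    count = bisect_right(start_times, playback_time)
    return "".join(chars[:count])


# Hypothetical alignment for the phrase "Gate B12", 100 ms per character.
chars = list("Gate B12")
start_times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
print(revealed_text(chars, start_times, 0.35))  # Gate
```

Calling this once per animation frame with the current audio position keeps the displayed text in step with the speech, without the client needing its own timers.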
Frontend (React / TypeScript)
- Three.js 3D scene with 20,000+ particles
- Web Audio API FFT analysis (bass, mid, high frequency bands)
- TanStack Router for file-based routing
- TanStack Query for data fetching
- WebSocket client for real-time communication
- Performance optimizations:
  - Refs instead of React state for 60fps animations
  - Delta-time-aware animation updates
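The idea behind delta-time-aware updates can be sketched as frame-rate-independent exponential smoothing. The frontend itself is TypeScript, so this Python version only illustrates the arithmetic. The function name `smooth` and the `time_constant` parameter are assumptions for illustration, not taken from the project.

```python
import math


def smooth(current, target, dt, time_constant=0.1):
    """Move `current` toward `target` by an amount that depends on elapsed time.

    dt:            seconds since the previous frame
    time_constant: seconds for the value to close about 63% of the gap (assumed)
    """
    # Deriving the blend factor from dt makes the result independent of frame rate.
    alpha = 1.0 - math.exp(-dt / time_constant)
    return current + (target - current) * alpha


# One second of animation at 60 fps and at 30 fps ends at the same value.
at_60fps = 0.0
for _ in range(60):
    at_60fps = smooth(at_60fps, 1.0, 1 / 60)

at_30fps = 0.0
for _ in range(30):
    at_30fps = smooth(at_30fps, 1.0, 1 / 30)
```

A fixed per-frame blend factor would make the particles react faster on high-refresh displays. Scaling by elapsed time avoids that, and the same smoothing also damps jitter in the FFT band levels.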
Architecture Highlights
- Self-correcting agent loop with retry logic
- Resilient tool execution with timeouts and error translation
- Session management with conversation history persistence
- Event-driven streaming for frontend observability
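Resilient tool execution with timeouts, retries, and error translation could look roughly like the sketch below. It uses only the standard library. The exception classes, the `run_tool` helper, and the message strings are hypothetical names chosen for illustration, not the project's actual hierarchy.

```python
import asyncio


class ToolError(Exception):
    """Base class for tool failures; carries a user-facing message."""
    user_message = "Sorry, something went wrong. Please try again."


class ToolTimeoutError(ToolError):
    user_message = "That is taking longer than expected. Please try again in a moment."


class ToolUnavailableError(ToolError):
    user_message = "That service is unavailable right now."


async def run_tool(tool, *args, timeout_s=5.0, max_attempts=2):
    """Run an async tool with a timeout, retrying on failure.

    Returns a dict instead of raising, so one failing tool cannot crash the session.
    """
    last_error = ToolError()
    for _ in range(max_attempts):
        try:
            result = await asyncio.wait_for(tool(*args), timeout=timeout_s)
            return {"ok": True, "result": result}
        except asyncio.TimeoutError:
            last_error = ToolTimeoutError()
        except ConnectionError:
            last_error = ToolUnavailableError()
        except Exception:
            # Any other technical failure maps to the generic message.
            last_error = ToolError()
    return {"ok": False, "message": last_error.user_message}
```

The agent can pass the returned `message` back to the model or straight to the user, so technical details such as stack traces never reach the kiosk screen.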
Challenges We Ran Into
- WebSocket streaming complexity
  Synchronizing audio chunks, alignment data, and component events required careful buffering and ordering logic.
- 3D performance
  Maintaining 60fps with 20,000+ particles while processing audio in real time. Solved by using refs instead of React state and optimizing the animation loop.
- Audio-reactive visuals
  Implementing smooth FFT analysis with proper frequency band separation and smoothing to prevent jitter.
- Error handling
  Designing an error translation system that converts technical failures into user-friendly messages without crashing the app.
- Architecture refactoring
  Migrating from a monolithic backend to a layered architecture mid-development while maintaining the WebSocket contract.
- MongoDB Vector Search setup
  Correctly configuring vector dimensions (384 for embeddings) and similarity metrics.
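The buffering and ordering logic for streamed events can be sketched as a small reorder buffer. This assumes each event carries a monotonically increasing sequence number, which is an assumption made for illustration. The class name `ReorderBuffer` is likewise hypothetical, and the real protocol may order events differently.

```python
class ReorderBuffer:
    """Release streamed events strictly in sequence order."""

    def __init__(self, first_seq=0):
        self._next_seq = first_seq  # next sequence number to deliver
        self._pending = {}          # early arrivals, keyed by sequence number

    def push(self, seq, event):
        """Store an event and return every event that is now deliverable, in order."""
        if seq < self._next_seq or seq in self._pending:
            return []  # duplicate or already delivered
        self._pending[seq] = event
        ready = []
        # Drain the buffer for as long as the next expected event is present.
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready


buf = ReorderBuffer()
first = buf.push(1, "alignment")  # arrives early, so it is held back
second = buf.push(0, "audio")     # fills the gap and releases both events
```

Holding early arrivals until the gap is filled means audio chunks and their alignment data reach the renderer in the order they were produced.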
Accomplishments We’re Proud Of
- Full-stack integration
  Seamless connection between the 3D frontend, WebSocket backend, and AI agent.
- Audio-reactive 3D UI
  Real-time particle deformation responding smoothly to voice input.
- Robust error handling
  A comprehensive exception hierarchy that keeps the app running when something fails and always gives the user helpful feedback.
- Clean architecture
  Maintainable, well-structured code with clear separation of concerns.
- Biometric authentication
  Functional face recognition with fast MongoDB vector matching.
- RAG integration
  Reliable airport-specific knowledge retrieval.
- Performance
  Smooth 60fps animations with complex 3D graphics and real-time audio processing.
What We Learned
- LangGraph patterns for agentic systems with tool calling, state management, and streaming
- WebSocket streaming with multiple event types (audio, alignment, components, errors)
- Three.js optimization using refs, delta-time calculations, and efficient particle systems
- MongoDB Vector Search configuration and similarity tuning
- Error resilience through graceful failure handling
- Refactoring strategies for evolving architecture without breaking contracts
What’s Next for Aria
- Production deployment with scaling, monitoring, and security
- Enhanced tools
  - Real flight APIs
  - Expanded map coverage
  - Restaurant menus and reviews
- Improved RAG
  - Larger airport knowledge bases
  - Multi-document retrieval
- Accessibility
  - Sign language support
  - Screen readers
  - Multiple languages
- Personalization
  - Learn user preferences
  - Proactive recommendations
- Multimodal input
  - Image uploads (“Where is this gate?”)
  - Gesture recognition
- Analytics dashboard
  - Common questions
  - User satisfaction
  - System performance
- Mobile companion app
  - Continuous assistance on passengers’ phones throughout their journey