Aria: The Multimodal AI Airport Kiosk

Inspiration

Traditional airport kiosks rely on touchscreens that are unhygienic, awkward to use with luggage in hand, and offer no personalization. We wanted a voice-first, multimodal AI assistant that feels natural and helpful, like talking to a knowledgeable airport employee. By combining conversational AI with a 3D visual interface, we make airport navigation more intuitive and engaging.


What It Does

Aria is a multimodal AI kiosk designed for airports. Passengers authenticate via face recognition or QR code, then interact with a 3D animated assistant through voice or text.

Core Capabilities

  • Answer airport questions using a RAG knowledge base
    (baggage policies, security procedures, facilities)
  • Provide flight information
    (gates, boarding times, delays, cancellations)
  • Show weather for origin and destination cities
  • Display interactive maps
    (gates, restrooms, customer service)
  • Guide travelers to lounges, restaurants, shops, and terminals
  • Handle flight changes
    (delays, cancellations, rebooking, overbooking offers)

Visual Experience

  • A 3D audio-reactive particle sphere that responds to voice input
  • Real-time FFT analysis drives particle movement, creating an immersive and responsive UI

How We Built It

Backend (Python / FastAPI)

  • LangGraph agent orchestration with Vultr Serverless Inference running Kimi-K2-Instruct (1T total parameters, 32B active)
  • Layered architecture:
    Transport → Session Manager → Agent Orchestrator → Tools / RAG / TTS
  • WebSocket streaming for real-time audio and text responses
  • ElevenLabs TTS with character-level alignment for synchronized text reveal
  • MongoDB Atlas Vector Search for RAG (airport FAQ knowledge base)
  • DeepFace (Facenet512) for biometric authentication
  • Tool system:
    • Flight info
    • Weather API
    • Map rendering
    • Destination info
    • RAG search
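The orchestrator's core can be pictured as a tool-calling loop: the model either requests a tool or produces a final answer. This is a minimal sketch, not our actual LangGraph graph; `stub_model`, the tool registry, and the message shapes are all illustrative stand-ins:

```python
# Minimal sketch of the agent orchestration loop. In the real system
# these steps are LangGraph nodes with streaming; here a stubbed model
# shows the control flow: decide -> call tool -> feed result back.

TOOLS = {
    "weather": lambda city: f"Sunny in {city}",
    "flight_info": lambda number: f"Flight {number}: Gate B12, on time",
}

def stub_model(messages):
    """Stand-in for the LLM: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "flight_info", "args": ["AA100"]}
    return {"answer": "Your flight AA100 departs from Gate B12, on time."}

def run_agent(user_text, max_steps=5):
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        decision = stub_model(messages)
        if "answer" in decision:            # model is done
            return decision["answer"]
        tool = TOOLS[decision["tool"]]      # dispatch the requested tool
        result = tool(*decision["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded step budget")
```

The `max_steps` bound is what keeps a confused model from looping forever; the production loop adds retries and streaming on top of the same shape.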

Frontend (React / TypeScript)

  • Three.js 3D scene with 20,000+ particles
  • Web Audio API FFT analysis
    (bass, mid, high frequency bands)
  • TanStack Router for file-based routing
  • TanStack Query for data fetching
  • WebSocket client for real-time communication
  • Performance optimizations:
    • Refs instead of React state for 60fps animations
    • Delta-time–aware animation updates
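The band separation and smoothing behind the particle sphere reduce to two small functions. In the app this runs each frame on Web Audio `AnalyserNode` byte data (values 0–255); the band boundaries and smoothing factor below are illustrative choices, sketched here in Python for clarity:

```python
# Sketch of the per-frame frequency-band split and smoothing that
# drive particle motion. Band split points (10% / 40% of the bins)
# and the smoothing factor are illustrative, not the app's exact values.

def band_levels(fft_bins, splits=(0.1, 0.4)):
    """Average byte FFT magnitudes into bass / mid / high bands (0..1)."""
    n = len(fft_bins)
    bass_end, mid_end = int(n * splits[0]), int(n * splits[1])
    def avg(chunk):
        return sum(chunk) / (len(chunk) * 255) if chunk else 0.0
    return (avg(fft_bins[:bass_end]),
            avg(fft_bins[bass_end:mid_end]),
            avg(fft_bins[mid_end:]))

def smooth(prev, current, alpha=0.15):
    """Exponential smoothing keeps particle motion from jittering."""
    return tuple(p + alpha * (c - p) for p, c in zip(prev, current))
```

Keeping `prev` in a ref (rather than React state) is what lets this run at 60fps without triggering re-renders.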

Architecture Highlights

  • Self-correcting agent loop with retry logic
  • Resilient tool execution with timeouts and error translation
  • Session management with conversation history persistence
  • Event-driven streaming for frontend observability
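Resilient tool execution combines three of the highlights above: bounded retries, a per-call timeout, and error translation. A simplified sketch, with illustrative names and messages rather than our exact exception hierarchy:

```python
# Sketch of resilient tool execution: retries, a timeout, and
# translation of technical failures into user-facing messages.
import asyncio

FRIENDLY = {
    asyncio.TimeoutError: "That's taking longer than expected. Please try again.",
    ConnectionError: "I'm having trouble reaching that service right now.",
}

def translate(exc):
    """Map a technical exception to a user-friendly message."""
    for exc_type, message in FRIENDLY.items():
        if isinstance(exc, exc_type):
            return message
    return "Something went wrong on our end. Please try again."

async def run_tool(tool, *args, retries=2, timeout=5.0):
    last_exc = None
    for _ in range(retries + 1):
        try:
            return await asyncio.wait_for(tool(*args), timeout)
        except tuple(FRIENDLY) as exc:   # known, recoverable failures
            last_exc = exc
    # Retries exhausted: answer gracefully instead of crashing.
    return translate(last_exc)
```

Because the fallback is a plain string, the agent can speak the failure message through the normal TTS path instead of surfacing a stack trace.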

Challenges We Ran Into

  • WebSocket streaming complexity
    Synchronizing audio chunks, alignment data, and component events required careful buffering and ordering logic.
  • 3D performance
    Maintaining 60fps with 20,000+ particles while processing audio in real time. We solved this by storing animation values in refs instead of React state and optimizing the animation loop.
  • Audio-reactive visuals
    Implementing smooth FFT analysis with proper frequency band separation and smoothing to prevent jitter.
  • Error handling
    Designing an error translation system that converts technical failures into user-friendly messages without crashing the app.
  • Architecture refactoring
    Migrating from a monolithic backend to a layered architecture mid-development while maintaining the WebSocket contract.
  • MongoDB Vector Search setup
    Correctly configuring vector dimensions (384, to match our embedding model's output) and similarity metrics.
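The dimension mismatch pitfall comes down to one number in the index definition. A sketch of an Atlas Vector Search index as a Python dict (field and index names are illustrative; `numDimensions` must exactly match the embedding model's output size):

```python
# Sketch of an Atlas Vector Search index definition. Field and index
# names are illustrative; the key constraint is that numDimensions
# matches the embedding size, or every query silently fails to match.
VECTOR_INDEX = {
    "name": "faq_vector_index",
    "type": "vectorSearch",
    "definition": {
        "fields": [{
            "type": "vector",
            "path": "embedding",     # document field holding the vector
            "numDimensions": 384,    # must match the embedding model
            "similarity": "cosine",  # a common choice for text embeddings
        }]
    },
}
# Created once via pymongo, e.g.:
# collection.create_search_index(model=VECTOR_INDEX)
```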

Accomplishments We’re Proud Of

  • Full-stack integration
    Seamless connection between the 3D frontend, WebSocket backend, and AI agent.
  • Audio-reactive 3D UI
    Real-time particle deformation responding smoothly to voice input.
  • Robust error handling
    A comprehensive exception hierarchy that keeps the app from crashing and always surfaces helpful feedback.
  • Clean architecture
    Maintainable, well-structured code with clear separation of concerns.
  • Biometric authentication
    Functional face recognition with fast MongoDB vector matching.
  • RAG integration
    Reliable airport-specific knowledge retrieval.
  • Performance
    Smooth 60fps animations with complex 3D graphics and real-time audio processing.

What We Learned

  • LangGraph patterns for agentic systems with tool calling, state management, and streaming
  • WebSocket streaming with multiple event types
    (audio, alignment, components, errors)
  • Three.js optimization using refs, delta-time calculations, and efficient particle systems
  • MongoDB Vector Search configuration and similarity tuning
  • Error resilience through graceful failure handling
  • Refactoring strategies for evolving architecture without breaking contracts
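The multi-event WebSocket lesson boils down to tagging every message with a discriminating type so the frontend can route it. A minimal sketch; the type names match our event categories, but the payload shapes and helper names are illustrative, not our exact wire format:

```python
# Sketch of a tagged event envelope for the WebSocket stream. Every
# message carries a "type" tag so the client can dispatch audio,
# alignment, component, and error events to separate handlers.
import json

EVENT_TYPES = {"audio", "alignment", "component", "error"}

def make_event(event_type, payload):
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    return json.dumps({"type": event_type, "payload": payload})

def dispatch(raw, handlers):
    """Route an incoming event to its handler by tag."""
    event = json.loads(raw)
    return handlers[event["type"]](event["payload"])
```

Keeping the envelope stable was what let us refactor the backend's internals mid-development without breaking the frontend contract.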

What’s Next for Aria

  • Production deployment with scaling, monitoring, and security
  • Enhanced tools
    • Real flight APIs
    • Expanded map coverage
    • Restaurant menus and reviews
  • Improved RAG
    • Larger airport knowledge bases
    • Multi-document retrieval
  • Accessibility
    • Sign language support
    • Screen readers
    • Multiple languages
  • Personalization
    • Learn user preferences
    • Proactive recommendations
  • Multimodal input
    • Image uploads (“Where is this gate?”)
    • Gesture recognition
  • Analytics dashboard
    • Common questions
    • User satisfaction
    • System performance
  • Mobile companion app
    • Continuous assistance on passengers’ phones throughout their journey
