Inspiration

Short-term rentals are awesome… until you’re standing in a stranger’s kitchen wondering how the espresso machine works, or digging through a 20-page PDF for the Wi-Fi password. Hosts repeat the same explanations; guests ask the same questions. We wanted a faster, more human way to share “how this home works” without long manuals or back-and-forth messaging.
ARBnB turns any Airbnb into an interactive, mixed-reality guide, so the space teaches you how to use it.

What it does

ARBnB is a shared MR layer for hosts and guests that:

  • Anchors notes and tips to real objects (e.g., “Twist knob left” floating right on the thermostat).
  • Understands the scene using visual recognition to identify appliances, switches, or fixtures.
  • Guides guests with conversational AI—ask out loud, get a spoken answer, see the exact control highlighted in your view.
  • Automates onboarding & troubleshooting with step-by-step flows (“Starter: washer/dryer”, “Fix: tripped breaker”, “Check-out: trash + thermostat”).
  • Syncs host knowledge—hosts drop pins, upload short voice notes, or record micro-tutorials once; future guests benefit instantly.

How we built it

  • Engine & MR: Unity (XR Interaction Toolkit) for spatial anchoring, object placement, and interaction cues.
  • Speech & Dialog: wit.ai (Meta SDK) for speech-to-text, intent parsing, and voice responses.
  • Vision: Florence (vision model) for on-device/image-based object and scene detection to locate appliances and UI elements.
  • Reasoning Agent: A lightweight Gemini-based agent to plan multi-step troubleshooting and generate concise guidance.
  • Unity glue code: C# scripts trigger highlights, tooltips, haptics, and voice prompts; custom state machine for “intro”, “onboarding”, and “resolve issue” modes.
  • Data model: A simple “home graph” linking objects (Oven → Controls → Safety), notes, and host policies; cached locally for privacy, synced to the cloud for updates.
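To make the "intro" / "onboarding" / "resolve issue" modes concrete, here is a minimal state-machine sketch. The real implementation is a C# state machine inside Unity; this is a language-agnostic Python sketch, and the event names are hypothetical.

```python
from enum import Enum, auto

class Mode(Enum):
    INTRO = auto()
    ONBOARDING = auto()
    RESOLVE_ISSUE = auto()

# Allowed (mode, event) -> next-mode transitions; event names are
# illustrative. An unrecognized event leaves the current mode unchanged.
TRANSITIONS = {
    (Mode.INTRO, "start_tour"): Mode.ONBOARDING,
    (Mode.ONBOARDING, "guest_reports_problem"): Mode.RESOLVE_ISSUE,
    (Mode.RESOLVE_ISSUE, "issue_fixed"): Mode.ONBOARDING,
}

def step(mode: Mode, event: str) -> Mode:
    """Advance the guide's mode in response to a voice or UI event."""
    return TRANSITIONS.get((mode, event), mode)
```

Keeping transitions in a flat table made it easy to add new flows without touching the Unity trigger code.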
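The home graph above (Oven → Controls → Safety, plus notes and policies) can be sketched roughly like this. This is an illustrative Python model, not the shipped schema; class and field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    text: str
    anchor_id: str          # spatial anchor the note is pinned to
    author: str = "host"

@dataclass
class HomeObject:
    name: str                                             # e.g. "Oven"
    children: list["HomeObject"] = field(default_factory=list)
    notes: list[Note] = field(default_factory=list)
    policies: dict[str, str] = field(default_factory=dict)

    def find(self, name: str) -> "HomeObject | None":
        """Depth-first lookup so a voice query can resolve an object."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found:
                return found
        return None

# Oven -> Controls -> Safety, as in the write-up
oven = HomeObject("Oven", children=[
    HomeObject("Controls", children=[HomeObject("Safety")]),
])
oven.find("Controls").notes.append(
    Note(text="Twist knob left to preheat", anchor_id="anchor-oven-knob")
)
```

Because notes reference anchors by id, the graph itself stays small and can be cached locally, with only the ids and note text synced to the cloud.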

Challenges we ran into

  • Feature merge in XR: Integrating voice, vision, and spatial anchors so they feel like one experience (timing + UX) was tricky.
  • Balancing product and tech: We iterated on what “guest magic” feels like while keeping the architecture buildable in a weekend.
  • Mixed reality prototyping in Unity: Getting reliable anchors, occlusion, and object highlights across devices took time.
  • Niche stack collisions (XR × AI): Tooling versions and SDKs didn’t always play nicely; we fought build settings, mic permissions, and model formats.

Accomplishments we’re proud of

  • Seamless speech-to-speech interaction that triggers Unity highlights and steps in real time.
  • Live visual recognition → spatial guidance: Point at an appliance, ask a question, watch the correct knob light up.
  • Host → guest knowledge handoff: Drop a note once; every future guest sees it exactly where it matters.

What we learned

  • Unity XR in the real world: Anchors, device quirks, and why small, legible labels beat fancy effects.
  • AI integration patterns: Chaining ASR → NLU → planning → TTS with tight latency budgets.
  • Prompt + product design: How to structure agent outputs so they map cleanly to UI steps and Unity triggers.
  • Privacy-by-design: Keeping most inference local and syncing only minimal, non-sensitive metadata.
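The ASR → NLU → planning → TTS chain with a latency budget can be sketched as below. The stage functions here are stand-ins for the real services (wit.ai for ASR/NLU, the Gemini agent for planning), and the budget value is hypothetical.

```python
import time

# Stub stages standing in for the real services; each returns instantly.
def asr(audio: bytes) -> str:
    return "how do I use the oven"

def nlu(text: str) -> dict:
    return {"intent": "use_appliance", "object": "oven"}

def plan(intent: dict) -> list:
    return ["Highlight oven panel", "Say: twist knob left"]

def tts(step: str) -> bytes:
    return step.encode()

def answer(audio: bytes, budget_ms: float = 1500.0) -> list:
    """Run the ASR -> NLU -> plan -> TTS chain, checking total latency."""
    spoken, start = [], time.perf_counter()
    text = asr(audio)
    intent = nlu(text)
    for step in plan(intent):
        spoken.append(tts(step))
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > budget_ms:
        print(f"over latency budget: {elapsed_ms:.0f} ms")
    return spoken
```

Measuring the whole chain against one budget, rather than per stage, is what surfaced where caching and streaming mattered most.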

What’s next for ARBnB

  • Smart-glasses support: Hands-free onboarding on monocular and binocular devices.
  • Richer scene understanding: Persistent room-scale maps, semantic segmentation, and auto-anchoring to object meshes.
  • Host templates & analytics: One-tap “Washer 101” packs and insights on common questions to improve listings.
  • Offline mode: Ship the home graph + core models for low-connectivity stays.
  • Multi-language voices: On-device translation and localized TTS.


How it works

  1. Detect & Ground: On launch, we build/restore a spatial map; Florence detects candidate objects.
  2. Ask & Understand: Guest speech → wit.ai intents/entities.
  3. Plan: Gemini agent selects a recipe (“Oven: preheat”) and returns step cards + guardrails.
  4. Guide: Unity highlights exact controls, plays TTS, and tracks completion.
  5. Learn: Hosts drop anchored notes; we version them in the home graph for future guests.
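The "step cards + guardrails" from step 3 can be illustrated with a sketch of the structure the planner hands to the Unity client. The exact field names and anchor ids here are hypothetical, not the shipped schema.

```python
# Hypothetical shape of a planner "recipe": step cards plus guardrails,
# structured so the Unity client can map each card to a highlight + TTS cue.
recipe = {
    "recipe_id": "oven_preheat",
    "steps": [
        {"say": "Turn the left knob to 180 degrees Celsius.",
         "highlight_anchor": "anchor-oven-knob"},
        {"say": "Press the confirm button.",
         "highlight_anchor": "anchor-oven-confirm"},
    ],
    "guardrails": ["Never leave the oven unattended while preheating."],
}

def to_unity_triggers(recipe: dict) -> list:
    """Flatten step cards into (anchor, utterance) pairs for the client."""
    return [(s["highlight_anchor"], s["say"]) for s in recipe["steps"]]
```

Constraining the agent to emit this flat, enumerable shape is what lets Unity treat each step as a simple highlight-then-speak trigger.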

Built With

Unity · XR Interaction Toolkit · C# · wit.ai · Florence · Gemini