Inspiration

Short-term rentals are awesome… until you’re standing in a stranger’s kitchen wondering how the espresso machine works, or digging through a 20-page PDF for the Wi-Fi password. Hosts repeat the same explanations; guests ask the same questions. We wanted a faster, more human way to share “how this home works” without long manuals or back-and-forth messaging.
ARBnB turns any Airbnb into an interactive, mixed-reality guide, so the space teaches you how to use it.

What it does

ARBnB is a shared MR layer for hosts and guests that:

  • Anchors notes and tips to real objects (e.g., “Twist knob left” floating right on the thermostat).
  • Understands the scene using visual recognition to identify appliances, switches, or fixtures.
  • Guides guests with conversational AI—ask out loud, get a spoken answer, see the exact control highlighted in your view.
  • Automates onboarding & troubleshooting with step-by-step flows (“Starter: washer/dryer”, “Fix: tripped breaker”, “Check-out: trash + thermostat”).
  • Syncs host knowledge—hosts drop pins, upload short voice notes, or record micro-tutorials once; future guests benefit instantly.

How we built it

  • Engine & MR: Unity (XR Interaction Toolkit) for spatial anchoring, object placement, and interaction cues.
  • Speech & Dialog: wit.ai (Meta SDK) for speech-to-text, intent parsing, and voice responses.
  • Vision: Florence (vision model) for on-device/image-based object and scene detection to locate appliances and UI elements.
  • Reasoning Agent: A lightweight Gemini-based agent to plan multi-step troubleshooting and generate concise guidance.
  • Unity glue code: C# scripts trigger highlights, tooltips, haptics, and voice prompts; custom state machine for “intro”, “onboarding”, and “resolve issue” modes.
  • Data model: A simple “home graph” linking objects (Oven → Controls → Safety), notes, and host policies; cached locally for privacy, synced to the cloud for updates.
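To make the "intro" / "onboarding" / "resolve issue" modes concrete, here is a minimal state-machine sketch. The real implementation is a C# state machine inside Unity; this is a language-agnostic Python sketch, and the event names are hypothetical.

```python
from enum import Enum, auto

class Mode(Enum):
    INTRO = auto()
    ONBOARDING = auto()
    RESOLVE_ISSUE = auto()

# Allowed (mode, event) -> next-mode transitions; event names are
# illustrative. An unrecognized event leaves the current mode unchanged.
TRANSITIONS = {
    (Mode.INTRO, "start_tour"): Mode.ONBOARDING,
    (Mode.ONBOARDING, "guest_reports_problem"): Mode.RESOLVE_ISSUE,
    (Mode.RESOLVE_ISSUE, "issue_fixed"): Mode.ONBOARDING,
}

def step(mode: Mode, event: str) -> Mode:
    """Advance the guide's mode in response to a voice or UI event."""
    return TRANSITIONS.get((mode, event), mode)
```

Keeping transitions in a flat table made it easy to add new flows without touching the Unity trigger code.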
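The home graph above (Oven → Controls → Safety, plus notes and policies) can be sketched roughly like this. This is an illustrative Python model, not the shipped schema; class and field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    text: str
    anchor_id: str          # spatial anchor the note is pinned to
    author: str = "host"

@dataclass
class HomeObject:
    name: str                                             # e.g. "Oven"
    children: list["HomeObject"] = field(default_factory=list)
    notes: list[Note] = field(default_factory=list)
    policies: dict[str, str] = field(default_factory=dict)

    def find(self, name: str) -> "HomeObject | None":
        """Depth-first lookup so a voice query can resolve an object."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found:
                return found
        return None

# Oven -> Controls -> Safety, as in the write-up
oven = HomeObject("Oven", children=[
    HomeObject("Controls", children=[HomeObject("Safety")]),
])
oven.find("Controls").notes.append(
    Note(text="Twist knob left to preheat", anchor_id="anchor-oven-knob")
)
```

Because notes reference anchors by id, the graph itself stays small and can be cached locally, with only the ids and note text synced to the cloud.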

Challenges we ran into

  • Feature merge in XR: Integrating voice, vision, and spatial anchors so they feel like one experience (timing + UX) was tricky.
  • Balancing product and tech: We iterated on what “guest magic” feels like while keeping the architecture buildable in a weekend.
  • Mixed reality prototyping in Unity: Getting reliable anchors, occlusion, and object highlights across devices took time.
  • Niche stack collisions (XR × AI): Tooling versions and SDKs didn’t always play nicely; we fought build settings, mic permissions, and model formats.

Accomplishments we’re proud of

  • Seamless speech-to-speech interaction that triggers Unity highlights and steps in real time.
  • Live visual recognition → spatial guidance: Point at an appliance, ask a question, watch the correct knob light up.
  • Host → guest knowledge handoff: Drop a note once; every future guest sees it exactly where it matters.

What we learned

  • Unity XR in the real world: Anchors, device quirks, and why small, legible labels beat fancy effects.
  • AI integration patterns: Chaining ASR → NLU → planning → TTS with tight latency budgets.
  • Prompt + product design: How to structure agent outputs so they map cleanly to UI steps and Unity triggers.
  • Privacy-by-design: Keeping most inference local and syncing only minimal, non-sensitive metadata.
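The ASR → NLU → planning → TTS chain with a latency budget can be sketched as below. The stage functions here are stand-ins for the real services (wit.ai for ASR/NLU, the Gemini agent for planning), and the budget value is hypothetical.

```python
import time

# Stub stages standing in for the real services; each returns instantly.
def asr(audio: bytes) -> str:
    return "how do I use the oven"

def nlu(text: str) -> dict:
    return {"intent": "use_appliance", "object": "oven"}

def plan(intent: dict) -> list:
    return ["Highlight oven panel", "Say: twist knob left"]

def tts(step: str) -> bytes:
    return step.encode()

def answer(audio: bytes, budget_ms: float = 1500.0) -> list:
    """Run the ASR -> NLU -> plan -> TTS chain, checking total latency."""
    spoken, start = [], time.perf_counter()
    text = asr(audio)
    intent = nlu(text)
    for step in plan(intent):
        spoken.append(tts(step))
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > budget_ms:
        print(f"over latency budget: {elapsed_ms:.0f} ms")
    return spoken
```

Measuring the whole chain against one budget, rather than per stage, is what surfaced where caching and streaming mattered most.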

What’s next for ARBnB

  • Smart-glasses support: Hands-free onboarding on monocular and binocular devices.
  • Richer scene understanding: Persistent room-scale maps, semantic segmentation, and auto-anchoring to object meshes.
  • Host templates & analytics: One-tap “Washer 101” packs and insights on common questions to improve listings.
  • Offline mode: Ship the home graph + core models for low-connectivity stays.
  • Multi-language voices: On-device translation and localized TTS.


How it works

  1. Detect & Ground: On launch, we build/restore a spatial map; Florence detects candidate objects.
  2. Ask & Understand: Guest speech → wit.ai intents/entities.
  3. Plan: Gemini agent selects a recipe (“Oven: preheat”) and returns step cards + guardrails.
  4. Guide: Unity highlights exact controls, plays TTS, and tracks completion.
  5. Learn: Hosts drop anchored notes; we version them in the home graph for future guests.
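The "step cards + guardrails" from step 3 can be illustrated with a sketch of the structure the planner hands to the Unity client. The exact field names and anchor ids here are hypothetical, not the shipped schema.

```python
# Hypothetical shape of a planner "recipe": step cards plus guardrails,
# structured so the Unity client can map each card to a highlight + TTS cue.
recipe = {
    "recipe_id": "oven_preheat",
    "steps": [
        {"say": "Turn the left knob to 180 degrees Celsius.",
         "highlight_anchor": "anchor-oven-knob"},
        {"say": "Press the confirm button.",
         "highlight_anchor": "anchor-oven-confirm"},
    ],
    "guardrails": ["Never leave the oven unattended while preheating."],
}

def to_unity_triggers(recipe: dict) -> list:
    """Flatten step cards into (anchor, utterance) pairs for the client."""
    return [(s["highlight_anchor"], s["say"]) for s in recipe["steps"]]
```

Constraining the agent to emit this flat, enumerable shape is what lets Unity treat each step as a simple highlight-then-speak trigger.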

Built With

Unity · XR Interaction Toolkit · C# · wit.ai · Florence · Gemini