Inspiration
Short-term rentals are awesome… until you’re standing in a stranger’s kitchen wondering how the espresso machine works, or digging through a 20-page PDF for the Wi-Fi password. Hosts repeat the same explanations; guests ask the same questions. We wanted a faster, more human way to share “how this home works” without long manuals or back-and-forth messaging.
ARBnB turns any Airbnb into an interactive mixed-reality guide—so the space explains itself.
What it does
ARBnB is a shared MR layer for hosts and guests that:
- Anchors notes and tips to real objects (e.g., “Twist knob left” floating right on the thermostat).
- Understands the scene using visual recognition to identify appliances, switches, or fixtures.
- Guides guests with conversational AI—ask out loud, get a spoken answer, see the exact control highlighted in your view.
- Automates onboarding & troubleshooting with step-by-step flows (“Starter: washer/dryer”, “Fix: tripped breaker”, “Check-out: trash + thermostat”).
- Syncs host knowledge—hosts drop pins, upload short voice notes, or record micro-tutorials once; future guests benefit instantly.
How we built it
- Engine & MR: Unity (XR Interaction Toolkit) for spatial anchoring, object placement, and interaction cues.
- Speech & Dialog: wit.ai (Meta SDK) for speech-to-text, intent parsing, and voice responses.
- Vision: Florence (vision model) for on-device/image-based object and scene detection to locate appliances and UI elements.
- Reasoning Agent: A lightweight Gemini-based agent to plan multi-step troubleshooting and generate concise guidance.
- Unity glue code: C# scripts trigger highlights, tooltips, haptics, and voice prompts; custom state machine for “intro”, “onboarding”, and “resolve issue” modes.
- Data model: A simple “home graph” linking objects (Oven → Controls → Safety), notes, and host policies; cached locally for privacy, synced to the cloud for updates.
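The "home graph" above can be sketched as a small tree of objects with host notes attached (a minimal Python sketch; the class and field names are illustrative, not the actual data model):

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    text: str        # host-authored tip, e.g. "Twist knob left"
    anchor_id: str   # spatial anchor the note is pinned to
    version: int = 1 # bumped when the host edits the note

@dataclass
class HomeObject:
    name: str  # e.g. "Oven"
    children: list["HomeObject"] = field(default_factory=list)
    notes: list[Note] = field(default_factory=list)

# Oven -> Controls -> Safety, mirroring the example in the text
safety = HomeObject("Safety", notes=[Note("Child lock: hold 3 s", "anchor-oven-lock")])
controls = HomeObject("Controls", children=[safety],
                      notes=[Note("Turn counter-clockwise to 90°C", "anchor-oven-dial")])
oven = HomeObject("Oven", children=[controls])

def all_notes(obj: HomeObject) -> list[Note]:
    """Flatten every note under an object, depth-first."""
    notes = list(obj.notes)
    for child in obj.children:
        notes.extend(all_notes(child))
    return notes
```

A structure like this is easy to cache locally (for privacy) and to diff against a cloud copy when syncing updates.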
Challenges we ran into
- Feature merge in XR: Integrating voice, vision, and spatial anchors so they feel like one experience was tricky, in both timing and UX.
- Bridging product and tech: We iterated on what "guest magic" feels like while keeping the architecture feasible within a weekend.
- Mixed reality prototyping in Unity: Getting reliable anchors, occlusion, and object highlights across devices took time.
- Niche stack collisions (XR × AI): Tooling versions and SDKs didn’t always play nicely; we fought build settings, mic permissions, and model formats.
Accomplishments we’re proud of
- Seamless speech-to-speech interaction that triggers Unity highlights and steps in real time.
- Live visual recognition → spatial guidance: Point at an appliance, ask a question, watch the correct knob light up.
- Host → guest knowledge handoff: Drop a note once; every future guest sees it exactly where it matters.
What we learned
- Unity XR in the real world: Anchors, device quirks, and why small, legible labels beat fancy effects.
- AI integration patterns: Chaining ASR → NLU → planning → TTS with tight latency budgets.
- Prompt + product design: How to structure agent outputs so they map cleanly to UI steps and Unity triggers.
- Privacy-by-design: Keeping most inference local and syncing only minimal, non-sensitive metadata.
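The ASR → NLU → planning → TTS chain above can be sketched as follows (the stage functions are stand-ins for wit.ai and Gemini calls, not real SDK APIs; the budget value is illustrative):

```python
import time

# Stub stages; in the real app these would call wit.ai (ASR + intent
# parsing), the Gemini-based agent (planning), and a TTS engine.
def asr(audio: bytes) -> str:
    return "how do I use the oven"

def nlu(text: str) -> dict:
    return {"intent": "use_appliance", "entity": "oven"}

def plan(parsed: dict) -> list[str]:
    return ["Open the door", "Turn the dial to Bake", "Set 180°C"]

def tts(step: str) -> str:
    return f"<audio:{step}>"

def answer(audio: bytes, budget_ms: float = 1500.0):
    """Run the full chain and report whether it fit the latency budget."""
    start = time.perf_counter()
    steps = plan(nlu(asr(audio)))
    spoken = [tts(s) for s in steps]
    elapsed_ms = (time.perf_counter() - start) * 1000
    return spoken, elapsed_ms <= budget_ms
```

Keeping each stage behind a small function like this made it easy to measure where the latency budget was being spent and to swap stages during the hackathon.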
What’s next for ARBnB
- Smart-glasses support: Hands-free onboarding on monocular or binocular display devices.
- Richer scene understanding: Persistent room-scale maps, semantic segmentation, and auto-anchoring to object meshes.
- Host templates & analytics: One-tap “Washer 101” packs and insights on common questions to improve listings.
- Offline mode: Ship the home graph + core models for low-connectivity stays.
- Multi-language voices: On-device translation and localized TTS.
Bonus: what to include on your Devpost page (quick checklist)
- Tagline: “AR notes that live where you need them—your Airbnb, not a binder.”
- 1–2 min demo video showing: (1) a guest asks "How do I use the oven?", (2) Florence identifies the panel, (3) highlight + step overlay, (4) the host drops a new tip, (5) the next guest sees it.
- Screenshots:
  - Object-anchored tooltip ("Turn counter-clockwise to 90°C").
  - Voice transcript bubble + animated highlight.
  - Host editor view (placing a note).
  - Troubleshooting flow (e.g., "Washer door won't lock").
- Tech stack badges: Unity, wit.ai, Florence, Gemini (lightweight), C#, XR Interaction Toolkit.
- Repo links: Client (Unity project), simple backend (if any), model configs.
- Privacy note: Local processing first; minimal cloud sync of non-sensitive notes/anchors.
Example “How it works” (optional section on Devpost)
- Detect & Ground: On launch, we build/restore a spatial map; Florence detects candidate objects.
- Ask & Understand: Guest speech → wit.ai intents/entities.
- Plan: Gemini agent selects a recipe (“Oven: preheat”) and returns step cards + guardrails.
- Guide: Unity highlights exact controls, plays TTS, and tracks completion.
- Learn: Hosts drop anchored notes; we version them in the home graph for future guests.
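The Plan step above returns step cards plus guardrails; they might be shaped roughly like this (a sketch; the keys, recipe names, and anchor IDs are illustrative, not the actual agent output format):

```python
# Hypothetical recipe store the planning agent selects from.
RECIPES = {
    "oven_preheat": {
        "steps": [
            {"id": 1, "say": "Turn the dial to Bake.", "highlight": "anchor-oven-dial"},
            {"id": 2, "say": "Set 180°C and wait for the beep.", "highlight": "anchor-oven-temp"},
        ],
        # Guardrails the agent attaches so the UI can warn the guest.
        "guardrails": ["Do not use the self-clean mode.",
                       "Keep the door closed while preheating."],
    },
}

def select_recipe(intent: dict) -> dict:
    """Map a parsed intent to a recipe of step cards (empty recipe as fallback)."""
    key = f'{intent.get("entity", "")}_{intent.get("action", "")}'
    return RECIPES.get(key, {"steps": [], "guardrails": []})
```

Each step card carries both what to say (for TTS) and what to highlight (an anchor ID), so Unity can drive the voice prompt and the visual cue from the same object.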