OPSLY — Voice-First, Vision-Powered Property Management

Inspiration

Property management is stuck in the past. Tenants call a front desk, leave voicemails, or fill out forms, then wait days wondering if anyone heard them. Managers juggle spreadsheets, phone calls, and sticky notes. Technicians drive blind with no briefing until they knock on the door.

We asked: What if AI could close every gap in that chain in real time?

Not a chatbot bolted onto a ticketing system, but an AI that can see damage through photos, hear tenants describe problems by voice, and act by creating, triaging, and dispatching work orders, all within a single conversation.

OPSLY was born from that idea: a voice-first, vision-powered property management platform where every participant (tenant, manager, and technician) gets an AI copilot.

What It Does

OPSLY connects three roles through AI:

Tenants report issues by speaking naturally to an AI voice agent. They snap a photo of the damage, and Gemini Vision scores its severity \(0\text{–}10\), recommends priority, and identifies the damage type — all before the conversation ends.
Managers see work orders appear live on a real-time command center dashboard. KPIs, SLA countdowns, escalation feeds, and AI severity scores update via WebSocket — no refresh needed.
Technicians get hands-free voice briefings for their daily schedule. They update job status by voice while working, and tenants are notified instantly.

The entire flow — from "I have a leak" to a dispatched, tracked, SLA-monitored work order — takes under 60 seconds.

Architecture

╔══════════════════════════════════════════════════════════════════════╗
║                         OPSLY  SYSTEM                               ║
╠══════════════════════════════════════════════════════════════════════╣
║                                                                      ║
║   ┌─────────────┐     ┌─────────────┐     ┌─────────────────────┐  ║
║   │   TENANT    │     │   MANAGER   │     │    TECHNICIAN       │  ║
║   │  Browser    │     │  Dashboard  │     │   Voice Briefing    │  ║
║   └──────┬──────┘     └──────┬──────┘     └──────────┬──────────┘  ║
║          │                   │                        │             ║
║  ════════╪═══════════════════╪════════════════════════╪═══════════  ║
║                   FRONTEND  (React 18 + Vite + Tailwind)            ║
║  ════════╪═══════════════════╪════════════════════════╪═══════════  ║
║          │                   │  WebSocket (Socket.IO) │             ║
║          ▼                   ▼                        ▼             ║
║   ┌──────────────────────────────────────────────────────────────┐  ║
║   │               BACKEND  (NestJS + Prisma)                     │  ║
║   │                                                              │  ║
║   │   ┌────────────┐   ┌──────────────┐   ┌─────────────────┐   │  ║
║   │   │  REST API  │   │  WebSocket   │   │  Ephemeral      │   │  ║
║   │   │  (RBAC)    │   │  Gateway     │   │  Token Manager  │   │  ║
║   │   └────────────┘   └──────────────┘   └─────────────────┘   │  ║
║   └──────────────────────────┬───────────────────────────────────┘  ║
║                              │                                       ║
║                              ▼                                       ║
║                   ┌─────────────────────┐                           ║
║                   │    PostgreSQL DB     │                           ║
║                   └─────────────────────┘                           ║
║                                                                      ║
║  ════════════════════════════════════════════════════════════════   ║
║                         AI  LAYER                                    ║
║  ════════════════════════════════════════════════════════════════   ║
║                                                                      ║
║   ┌──────────────────────────────┐  ┌───────────────────────────┐  ║
║   │       GEMINI LIVE API        │  │     GEMINI FLASH VISION    │  ║
║   │  gemini-2.5-flash-native-    │  │                           │  ║
║   │         audio               │  │  · Severity score (0–10)  │  ║
║   │                             │  │  · Damage classification  │  ║
║   │  · Bidirectional audio PCM  │  │  · Repair recommendations │  ║
║   │  · Function calling         │  │  · Structured JSON output │  ║
║   │  · 16kHz in / 24kHz out     │  │                           │  ║
║   └──────────────────────────────┘  └───────────────────────────┘  ║
║                                                                      ║
╚══════════════════════════════════════════════════════════════════════╝

Voice Pipeline

  TENANT MIC
      │  16kHz PCM
      ▼
  AudioWorklet ──────► Gemini Live WebSocket
      │                       │
      │                       │ function call
      │                       ▼
      │              Backend REST Endpoint
      │                       │ tool result
      │                       ▼
      │              Gemini incorporates result
      │                       │
      ◄───────────────────────┘
      24kHz audio response

The voice system uses Gemini's ephemeral token flow for security — the API key never reaches the browser:

1. The backend creates a scoped ephemeral token (single-use, 2-minute connection window).
2. The frontend connects directly to Gemini Live over WebSocket using that token.
3. Audio is captured at \(16\,\text{kHz}\) via AudioWorklet and streamed as PCM.
4. Gemini responds with \(24\,\text{kHz}\) audio and function calls (tool use).
5. Tool calls route through the frontend to authenticated backend REST endpoints.
6. Results stream back to Gemini, which incorporates them into its spoken response.

All of this happens in a single unbroken conversation — the tenant never leaves the voice session.
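
The tool-call routing step in this flow could be sketched roughly as follows. This is an illustration under stated assumptions, not OPSLY's actual code: the `FunctionCall` and `FunctionResponse` shapes follow the id/name/args pattern of Gemini function calls, while the tool names, the endpoint paths in `TOOL_ROUTES`, and the injected `HttpPost` helper are all hypothetical.

```typescript
// A function call as delivered by the live session (id, name, args).
interface FunctionCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

// The matching response that is sent back to the model.
interface FunctionResponse {
  id: string;
  name: string;
  response: Record<string, unknown>;
}

// Hypothetical transport: POSTs JSON to the authenticated backend
// and resolves with the parsed JSON body.
type HttpPost = (path: string, body: unknown) => Promise<Record<string, unknown>>;

// Hypothetical mapping from tool name to backend REST endpoint.
const TOOL_ROUTES: Record<string, string> = {
  create_work_order: "/api/work-orders",
  get_work_order_status: "/api/work-orders/status",
};

// Route one function call to the backend and wrap the outcome so the
// model always receives a response, even when the call fails.
async function routeToolCall(call: FunctionCall, post: HttpPost): Promise<FunctionResponse> {
  const path = TOOL_ROUTES[call.name];
  if (path === undefined) {
    return { id: call.id, name: call.name, response: { error: `Unknown tool: ${call.name}` } };
  }
  try {
    const result = await post(path, call.args);
    return { id: call.id, name: call.name, response: { result } };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { id: call.id, name: call.name, response: { error: message } };
  }
}
```

Returning an error payload instead of throwing keeps the voice session alive: the agent can tell the tenant that something went wrong rather than going silent.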

Vision Pipeline

  TENANT UPLOADS PHOTO
          │
          │  base64 in memory
          ▼
  ┌───────────────────┐
  │  Gemini Flash     │  ◄── "Take your time, I'll wait"
  │  Vision Analysis  │         (agent pauses)
  └────────┬──────────┘
           │  structured assessment
           ▼
  Injected into live voice session
           │
           ▼
  Agent reads findings aloud
           │
           ▼
  Work order created with photo + AI score attached
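
Before the structured assessment is injected into the voice session or attached to a work order, it is worth validating. The sketch below shows one way to do that; the field names (`severity`, `damageType`, `priority`, `recommendations`) and the four priority levels are assumptions made for illustration, since the exact schema is not given here.

```typescript
// Hypothetical priority levels; the real set may differ.
type Priority = "LOW" | "MEDIUM" | "HIGH" | "URGENT";

// Hypothetical shape of the vision model's structured output.
interface DamageAssessment {
  severity: number; // clamped to the 0-10 scale
  damageType: string;
  priority: Priority;
  recommendations: string[];
}

const PRIORITIES: readonly Priority[] = ["LOW", "MEDIUM", "HIGH", "URGENT"];

// Parse the model's JSON text and coerce it into a safe, typed object.
function parseAssessment(raw: string): DamageAssessment {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Assessment must be a JSON object");
  }
  const obj = data as Record<string, unknown>;

  const severity = Number(obj.severity);
  if (!Number.isFinite(severity)) {
    throw new Error("Assessment is missing a numeric severity");
  }

  const priority = PRIORITIES.find((p) => p === obj.priority);
  if (priority === undefined) {
    throw new Error("Assessment has an unrecognized priority");
  }

  // Keep only string recommendations; drop anything malformed.
  const recommendations = Array.isArray(obj.recommendations)
    ? obj.recommendations.filter((r): r is string => typeof r === "string")
    : [];

  return {
    severity: Math.min(10, Math.max(0, severity)),
    damageType: typeof obj.damageType === "string" ? obj.damageType : "unknown",
    priority,
    recommendations,
  };
}
```

Clamping the score and rejecting unknown priorities means a malformed model response fails loudly at the boundary instead of surfacing later as a bad dashboard value.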

Real-Time Dashboard

  Work Order Event (create / assign / update)
           │
           │  Socket.IO (role-filtered)
           ▼
  ┌─────────────────────────────────────────┐
  │          MANAGER DASHBOARD              │
  │                                         │
  │  ┌──────────┐  ┌──────────────────────┐ │
  │  │ KPI Cards│  │  SLA Countdown Timer │ │
  │  │(animated)│  │  (client-side ticks) │ │
  │  └──────────┘  └──────────────────────┘ │
  │  ┌─────────────────────────────────────┐ │
  │  │  Work Order Table (live in-place)   │ │
  │  └─────────────────────────────────────┘ │
  │  ┌─────────────────────────────────────┐ │
  │  │  Escalation Alerts + AI Severity    │ │
  │  └─────────────────────────────────────┘ │
  └─────────────────────────────────────────┘
           No page refresh required
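
The client-side SLA countdown can be driven from a single deadline timestamp, so the server never has to push tick events. A minimal sketch, assuming each work order carries an SLA deadline (the function name and the `HH:MM:SS` format are illustrative choices, not taken from the project):

```typescript
// Format the time remaining until an SLA deadline as HH:MM:SS.
// A leading "-" marks a deadline that has already passed.
function formatSlaRemaining(deadline: Date, now: Date): string {
  const diffMs = deadline.getTime() - now.getTime();
  const totalSeconds = Math.floor(Math.abs(diffMs) / 1000);

  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;

  // Zero-pad to two digits without relying on newer string helpers.
  const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));
  const text = `${pad(hours)}:${pad(minutes)}:${pad(seconds)}`;

  return diffMs < 0 ? `-${text}` : text;
}
```

A dashboard component could call this once per second from a timer, passing the current time, and re-render only the countdown cell.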

How We Built It

Layer	Technology
Voice AI	Gemini Live API (`gemini-2.5-flash-native-audio`) — bidirectional audio streaming with function calling
Vision AI	Gemini Flash — photo assessment returning structured severity scores, damage classification, and repair recommendations
Backend	NestJS + Prisma + PostgreSQL — REST API with RBAC guards, WebSocket gateway, ephemeral token management
Frontend	React 18 + Vite + TailwindCSS — glassmorphism design system, real-time WebSocket integration, TanStack Query
Real-time	Socket.IO — role-filtered event broadcasting for live dashboard updates
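
Role-filtered broadcasting maps naturally onto Socket.IO rooms, where `server.to(room).emit(event, payload)` delivers only to sockets that joined that room. The sketch below assumes sockets join a role room and a per-user room after authentication; the room names, event names, and the `WorkOrderEvent` shape are hypothetical, and `RoomEmitter` is a minimal stand-in for the part of the server API that is used.

```typescript
// Hypothetical event payload for work order changes.
interface WorkOrderEvent {
  type: "created" | "assigned" | "updated";
  workOrderId: string;
  tenantId: string;
  technicianId?: string;
}

// Minimal subset of a Socket.IO-style server: to(room).emit(event, payload).
interface RoomEmitter {
  to(room: string): { emit(event: string, payload: unknown): void };
}

// Decide which rooms may see an event: all managers, the reporting
// tenant, and the assigned technician (if there is one).
function roomsFor(event: WorkOrderEvent): string[] {
  const rooms = ["role:MANAGER", `user:${event.tenantId}`];
  if (event.technicianId !== undefined) {
    rooms.push(`user:${event.technicianId}`);
  }
  return rooms;
}

// Emit the event to each permitted room and to no one else.
function broadcast(server: RoomEmitter, event: WorkOrderEvent): void {
  for (const room of roomsFor(event)) {
    server.to(room).emit(`workOrder.${event.type}`, event);
  }
}
```

Keeping the audience decision in one pure function (`roomsFor`) makes the filtering rule easy to test without a running socket server.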

Challenges

1. Gemini Live + Tool Calls Timing (Error 1007/1008)

The hardest bug: Gemini Live's WebSocket would close with a cryptic 1007 or 1008 code whenever audio frames and tool call responses collided. We discovered that `sendToolResponse` must complete before audio streaming resumes, and that `sendClientContent` (text injection) must happen before mic capture starts, not after. Enforcing this ordering required a `toolCallActiveRef` flag that pauses audio while a tool call executes.
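
The pause-during-tool-call idea can be sketched as a small gate in front of the audio sender. This is an illustration only: the `AudioGate` class and its method names are hypothetical, and dropping (rather than buffering) mic frames during a tool call is an assumption.

```typescript
// Gate that forwards mic audio only while no tool call is in flight.
class AudioGate {
  private toolCallActive = false;
  private readonly sendAudio: (chunk: Int16Array) => void;

  constructor(sendAudio: (chunk: Int16Array) => void) {
    this.sendAudio = sendAudio;
  }

  // Call when a function call arrives from the model.
  beginToolCall(): void {
    this.toolCallActive = true;
  }

  // Call only after the tool response has been sent back.
  endToolCall(): void {
    this.toolCallActive = false;
  }

  // Forward a mic frame, or drop it while a tool call is active.
  // Returns true when the frame was actually sent.
  push(chunk: Int16Array): boolean {
    if (this.toolCallActive) {
      return false;
    }
    this.sendAudio(chunk);
    return true;
  }
}
```

In practice `endToolCall` would sit in a `finally` block around the tool execution and response, so a failed backend call cannot leave the microphone muted for the rest of the session.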

2. Ephemeral Token Format

The `authTokens.create()` API returns a token object, but the actual token string lives in `.name`, not in `.token` or the object itself. This undocumented behavior cost hours of debugging 401 errors.
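
A defensive wrapper makes the `.name` detail hard to get wrong again. In the sketch below, `TokenClient` is a hypothetical stand-in for the SDK client so the example stays self-contained, and the `uses` and `newSessionExpireTime` config fields reflect our reading of the SDK and should be checked against its current documentation.

```typescript
// Only the field this sketch relies on: the token string lives in `name`.
interface AuthTokenLike {
  name?: string;
}

// Hypothetical stand-in for the part of the SDK client that is used here.
interface TokenClient {
  authTokens: {
    create(params: { config: Record<string, unknown> }): Promise<AuthTokenLike>;
  };
}

// Create a single-use token with a 2-minute window for opening a session,
// and return the string the browser should connect with.
async function mintEphemeralToken(client: TokenClient, now: Date = new Date()): Promise<string> {
  const token = await client.authTokens.create({
    config: {
      uses: 1, // assumed field name: single-use token
      // assumed field name: deadline for starting a new session
      newSessionExpireTime: new Date(now.getTime() + 2 * 60 * 1000).toISOString(),
    },
  });
  if (!token.name) {
    throw new Error("Token response has no `name` field");
  }
  return token.name;
}
```

Failing fast on a missing `name` turns a confusing 401 in the browser into a clear error on the backend.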

3. Tool Response Payload Size

Sending full database objects (including Gemini Vision's JSON assessment blobs) back through `sendToolResponse` caused silent 1007 disconnections. The fix was to slim every tool response down to the fields the voice agent needs to speak aloud (order numbers, addresses, priorities) and to leave out the raw AI assessment JSON.
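
The slimming step amounts to an explicit allow-list of fields. A minimal sketch, with a hypothetical `WorkOrderRecord` shape standing in for the real database model:

```typescript
// Hypothetical full record as loaded from the database.
interface WorkOrderRecord {
  id: string;
  orderNumber: string;
  address: string;
  priority: string;
  photoUrl?: string;
  aiAssessment?: unknown; // large vision JSON blob
  createdAt?: string;
}

// Only what the voice agent needs to say out loud.
interface SpokenWorkOrder {
  orderNumber: string;
  address: string;
  priority: string;
}

// Copy the allow-listed fields and nothing else, so new columns added
// to the record later never leak into tool responses by accident.
function slimWorkOrder(order: WorkOrderRecord): SpokenWorkOrder {
  const { orderNumber, address, priority } = order;
  return { orderNumber, address, priority };
}
```

An allow-list is safer here than deleting known-large fields, because the payload stays small even as the schema grows.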

4. AudioWorklet Cross-Browser

AudioWorklet processors must be served as separate .js files from the public directory, not bundled. Because capture runs at \(16\,\text{kHz}\) and playback at \(24\,\text{kHz}\), the sample rate mismatch required separate `AudioContext` instances, one per direction.
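
On the capture side, the worklet hands over Float32 samples in the range -1 to 1, which must be converted to PCM before streaming. A minimal sketch, assuming 16-bit signed PCM is the target format (the function name is illustrative):

```typescript
// Convert Web Audio Float32 samples (-1..1) to 16-bit signed PCM.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative and positive halves have different magnitudes in int16.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```

Clamping before scaling matters because a loud input slightly above 1.0 would otherwise overflow and produce a click in the audio stream.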

What We Learned

Voice-first changes everything. When you design for voice, the entire UX simplifies. No forms, no dropdowns, no pagination — just conversation.
Gemini Live's function calling is powerful but unforgiving. The timing constraints around audio streaming + tool calls require defensive programming that isn't documented yet.
Vision + Voice together is the unlock. Neither alone is enough — a photo without context misses urgency, and a voice description without visual evidence misses severity. Combined, they produce accurate, trustworthy triage.
Real-time is table stakes. Once the manager sees one work order appear live, they never want to refresh again. WebSocket-driven dashboards aren't a feature — they're the expectation.